Make this work directly with AutoModelForCausalLM.from_pretrained

#2 opened by marksverdhei

Since 🤗transformers now has native support for GPTQ-quantized models,
quantized models can be loaded and used simply by calling
AutoModelForCausalLM.from_pretrained('your_model').
TheBloke's GPTQ models already support this, but yours doesn't yet.
It would be nice to see this change, since the model could then be used directly in many scripts with little code alteration.
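
For reference, a minimal sketch of how that call would look; 'your_model' is a placeholder repo id, and it assumes a recent transformers (>= 4.32) with optimum and auto-gptq installed:

from transformers import AutoModelForCausalLM, AutoTokenizer

# "your_model" is a placeholder for the actual repo id.
# device_map="auto" additionally requires accelerate to be installed.
tokenizer = AutoTokenizer.from_pretrained("your_model")
model = AutoModelForCausalLM.from_pretrained("your_model", device_map="auto")

# Quick smoke test that generation works with the quantized weights.
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))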

I have already done this on a private repo, so I'll let you know the steps I took to make it work:

  1. Rename the safetensors model file to model.safetensors
  2. Add metadata to the safetensors file: as exported, it lacks the metadata header that the 🤗transformers backend relies on.

I used the safetensors_util tool (https://github.com/by321/safetensors_util) to add the metadata.
I just added the equivalent of the metadata in TheBloke's Llama 2 variant, which was the following config:

{
    "__metadata__": {
        "format": "pt",
        "quantized_by": "RuterNorway"
    }
}
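
If it helps, here is a small sketch of adding that same metadata with the safetensors Python API instead of the CLI tool (this loads all tensors into memory and rewrites the file; the path is hypothetical):

from safetensors.torch import load_file, save_file

# Hypothetical path; metadata keys and values must be strings.
path = "model.safetensors"
tensors = load_file(path)
# Rewrite the file with the metadata header attached.
save_file(tensors, path, metadata={"format": "pt", "quantized_by": "RuterNorway"})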

If you'd like, I could make a pull request, but I figured you might just do it yourself so you don't have to spend time verifying everything.

FYI: I have not tested whether this still works with the exllama notebooks and your example code, only that it works with AutoModelForCausalLM.from_pretrained.

Hi. Thank you @marksverdhei. Can you please also make a pull request? :)

marksverdhei changed discussion status to closed
