About finetuning

#6
by Xiangyu1 - opened

Could you make your fine-tuning code publicly available?

Technology Innovation Institute org

Hi @Xiangyu1 ,
Since this model is compatible with the HF ecosystem, you can check out https://github.com/huggingface/trl/blob/main/examples/scripts/sft.py as a starting point for fine-tuning the model.
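For reference, a minimal sketch along those lines could look like this (the dataset, output directory, and hyperparameters are placeholders, and the exact `SFTTrainer` arguments depend on your TRL version):

```python
# Minimal SFT sketch for falcon-mamba-7b with TRL, loosely following
# trl/examples/scripts/sft.py. Dataset and hyperparameters are placeholders.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

train_dataset = load_dataset("trl-lib/Capybara", split="train")  # example dataset, swap in your own

training_args = SFTConfig(
    output_dir="falcon-mamba-7b-sft",   # placeholder output directory
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    num_train_epochs=1,
    logging_steps=10,
    bf16=True,
)

trainer = SFTTrainer(
    model="tiiuae/falcon-mamba-7b",     # loaded with from_pretrained internally
    args=training_args,
    train_dataset=train_dataset,
)
trainer.train()
```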

I wrote a blog post about fine-tuning falcon-mamba-7b with a 32k context on a single H100 GPU; I hope it helps! https://wdlctc.github.io/mstmamba.html

Can I train using just the LongLoRA code, or have you made any modifications to this code?

If you want to train from scratch, you may need to initialize the model weights yourself rather than load a pre-trained model.
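In case it helps, a minimal sketch of what that could look like with the standard transformers API (reusing the published config only for the architecture):

```python
# Sketch: build a randomly initialized FalconMamba instead of loading pre-trained weights.
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer

config = AutoConfig.from_pretrained("tiiuae/falcon-mamba-7b")    # architecture config only
tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-mamba-7b")
model = AutoModelForCausalLM.from_config(config)                 # random init, no pre-trained weights
```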

We made modifications to the Hugging Face code to support a 2x longer context length with the mini-sequence technology.
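To give a rough idea of the mini-sequence trick (a hypothetical illustration, not the actual patch from the blog): the loss over a long sequence can be computed chunk by chunk, so the full 32k-token logits tensor never has to be materialized at once.

```python
# Hypothetical illustration of chunked loss computation over the sequence dimension.
import torch
import torch.nn.functional as F

def chunked_lm_loss(hidden_states, lm_head, labels, chunk_size=2048):
    # hidden_states: (seq_len, hidden_size); labels: (seq_len,), already shifted.
    total_loss = hidden_states.new_zeros(())
    total_tokens = 0
    for start in range(0, hidden_states.size(0), chunk_size):
        h = hidden_states[start:start + chunk_size]
        y = labels[start:start + chunk_size]
        logits = lm_head(h)  # only a chunk of logits is in memory at a time
        total_loss = total_loss + F.cross_entropy(
            logits, y, reduction="sum", ignore_index=-100
        )
        total_tokens += int((y != -100).sum())
    return total_loss / max(total_tokens, 1)
```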

Technology Innovation Institute org

Note that training had some issues, which should be fixed by https://github.com/huggingface/transformers/pull/33195: the kernels did not consider the layer norms on the B, dt, and C states.

Technology Innovation Institute org

The fix is now merged into the transformers main branch; make sure to re-install transformers from the main branch until the next release.
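For example, installing from source picks up the fix:

```
pip install git+https://github.com/huggingface/transformers.git
```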
