Request for Assistance with Fine-Tuning the nomic-embed-text-v1 Model for the Spanish Language

#20 · opened by wilfoderek

I hope this message finds you well. My name is Wilfredo, and I am currently working on a project that involves fine-tuning the nomic-ai/nomic-embed-text-v1 model for a specific application in Spanish text processing.

I am reaching out to you to request your assistance in understanding the steps required to fine-tune this model effectively. Specifically, I am looking for guidance on:

Dataset Preparation: What are the recommended practices for preparing the dataset for fine-tuning? Are there any specific data formats or preprocessing steps that should be followed?

Fine-Tuning Process: Could you provide detailed instructions or a framework for fine-tuning the model, including any specific hyperparameters or training configurations that are crucial for achieving optimal performance?

Thank you very much for your time and consideration. I look forward to your response.

Best regards,
Wilfredo
zpn (Nomic AI org)

Hi! Sentence Transformers 3 might be a good place to start: https://x.com/tomaarsen/status/1795425797408235708

As far as data goes, I would curate a sizeable dataset of at least 10k examples to fine-tune on, although I'm not sure how well the model will do, since the tokenizer is optimized solely for English.

Thank you @zpn . Also, I would like to study the code of Nomic Embed.

zpn changed discussion status to closed
