Edit model card

Llama-3-Base-8B-DICE-Iter1

This model was developed using Bootstrapping Language Models with DPO Implicit Rewards (DICE) at iteration 1, based on the princeton-nlp/Llama-3-Base-8B-SFT-DPO architecture as the starting point.

Links to Other Models

Model Description

  • Model type: An 8B parameter GPT-like model fine-tuned on synthetic datasets.
  • Language(s) (NLP): Primarily English
  • License: MIT
  • Fine-tuned from model: princeton-nlp/Llama-3-Base-8B-SFT-DPO

AlpacaEval Leaderboard Evaluation Results

Model LC. Win Rate Win Rate
Llama-3-Base-8B-SFT-DPO 18.20 15.50
Llama-3-Base-8B-DICE-Iter1 25.08 25.77
Llama-3-Base-8B-DICE-Iter2 27.55 30.99

Citation

@article{chen2024bootstrapping,
  title={Bootstrapping Language Models with DPO Implicit Rewards},
  author={Chen, Changyu and Liu, Zichen and Du, Chao and Pang, Tianyu and Liu, Qian and Sinha, Arunesh and Varakantham, Pradeep and Lin, Min},
  journal={arXiv preprint arXiv:2406.09760},
  year={2024}
}
Downloads last month
4
Safetensors
Model size
8.03B params
Tensor type
BF16
·
Inference Examples
Inference API (serverless) is not available, repository is disabled.

Dataset used to train sail/Llama-3-Base-8B-DICE-Iter1

Collection including sail/Llama-3-Base-8B-DICE-Iter1