
llama3.1_8b_dpo_bwgenerator

This model is a version of meta-llama/Meta-Llama-3.1-8B-Instruct fine-tuned with DPO (Direct Preference Optimization) on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 0.0325
  • Rewards/chosen: -8.0882
  • Rewards/rejected: -39.4615
  • Rewards/accuracies: 0.9958
  • Rewards/margins: 31.3733
  • Logps/rejected: -504.7621
  • Logps/chosen: -165.4306
  • Logits/rejected: -1.1893
  • Logits/chosen: -1.7730
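
In DPO, Rewards/chosen and Rewards/rejected are the implicit rewards β · (log π_θ(y|x) − log π_ref(y|x)) on the preferred and dispreferred completions, and Rewards/margins is simply their difference, so larger margins mean the tuned policy separates the two more strongly. A quick arithmetic check on the final evaluation numbers above (plain Python, using only the values already reported):

```python
# Sanity check: Rewards/margins = Rewards/chosen - Rewards/rejected.
rewards_chosen = -8.0882
rewards_rejected = -39.4615
print(round(rewards_chosen - rewards_rejected, 4))  # 31.3733, matching Rewards/margins
```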

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-05
  • train_batch_size: 4
  • eval_batch_size: 4
  • seed: 42
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • num_epochs: 1
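
These settings map one-to-one onto transformers' TrainingArguments; the sketch below mirrors them (output_dir and bf16 are assumptions, not stated in this card):

```python
from transformers import TrainingArguments

# Mirrors the hyperparameters listed above; output_dir and bf16 are
# assumptions, not stated in this card.
training_args = TrainingArguments(
    output_dir="llama3.1_8b_dpo_bwgenerator",  # assumed
    learning_rate=5e-5,
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    seed=42,
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    num_train_epochs=1,
    bf16=True,  # assumed from the BF16 tensor type reported below
)
```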

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:-----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 0.0854        | 0.0719 | 1000  | 0.1058          | -28.5182       | -64.6284         | 0.9929             | 36.1101         | -756.4312      | -369.7310    | -1.1763         | -1.7541       |
| 0.078         | 0.1438 | 2000  | 0.0582          | -16.5113       | -45.2514         | 0.9938             | 28.7401         | -562.6614      | -249.6615    | -1.1262         | -1.7216       |
| 0.0458        | 0.2157 | 3000  | 0.0506          | -12.8337       | -41.3538         | 0.9942             | 28.5201         | -523.6852      | -212.8855    | -1.3210         | -1.8884       |
| 0.0295        | 0.2876 | 4000  | 0.0534          | -12.7034       | -45.1669         | 0.9942             | 32.4635         | -561.8164      | -211.5826    | -1.2303         | -1.8040       |
| 0.0442        | 0.3595 | 5000  | 0.0428          | -10.9032       | -42.1320         | 0.9955             | 31.2288         | -531.4679      | -193.5811    | -1.2327         | -1.8028       |
| 0.0329        | 0.4313 | 6000  | 0.0365          | -8.5207        | -36.8790         | 0.9951             | 28.3583         | -478.9377      | -169.7559    | -1.2024         | -1.7841       |
| 0.0384        | 0.5032 | 7000  | 0.0418          | -12.1405       | -46.4364         | 0.9955             | 34.2959         | -574.5117      | -205.9535    | -1.1646         | -1.7549       |
| 0.0596        | 0.5751 | 8000  | 0.0344          | -8.7801        | -39.5544         | 0.9951             | 30.7743         | -505.6917      | -172.3499    | -1.2145         | -1.7970       |
| 0.0437        | 0.6470 | 9000  | 0.0347          | -9.4417        | -41.5833         | 0.9955             | 32.1416         | -525.9807      | -178.9660    | -1.1796         | -1.7709       |
| 0.0203        | 0.7189 | 10000 | 0.0357          | -9.3723        | -41.8496         | 0.9951             | 32.4773         | -528.6439      | -178.2718    | -1.1694         | -1.7593       |
| 0.0257        | 0.7908 | 11000 | 0.0347          | -8.6569        | -40.6073         | 0.9961             | 31.9505         | -516.2208      | -171.1173    | -1.1821         | -1.7676       |
| 0.0355        | 0.8627 | 12000 | 0.0332          | -8.4060        | -40.1402         | 0.9964             | 31.7342         | -511.5494      | -168.6083    | -1.1878         | -1.7722       |
| 0.0553        | 0.9346 | 13000 | 0.0325          | -8.0882        | -39.4615         | 0.9958             | 31.3733         | -504.7621      | -165.4306    | -1.1893         | -1.7730       |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.44.0
  • PyTorch 2.3.0+cu121
  • Datasets 2.14.7
  • Tokenizers 0.19.1
Model details

  • Model size: 8.03B params
  • Tensor type: BF16 (Safetensors)
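
Given the PEFT and Transformers versions above, a minimal inference sketch might look like the following (assumptions: the repo id is NanQiangHF/llama3.1_8b_dpo_bwgenerator and it holds a checkpoint loadable with plain transformers; if it only stores a LoRA adapter, load it with peft's AutoPeftModelForCausalLM instead):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumption: the repo holds a full checkpoint loadable with plain
# transformers; if it only stores a LoRA adapter, use
# peft.AutoPeftModelForCausalLM.from_pretrained(model_id) instead.
model_id = "NanQiangHF/llama3.1_8b_dpo_bwgenerator"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the card reports BF16 tensors
    device_map="auto",
)

messages = [{"role": "user", "content": "Hello!"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```

At BF16, the 8.03B parameters need roughly 16 GB of accelerator memory for the weights alone, which is why the sketch keeps torch_dtype=torch.bfloat16 rather than upcasting to float32.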