sft-sum-chosen-10lp-shuff-full-tiny

This model is a fine-tuned version of TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T on the martimfasantos/openai-summarize-tldr dataset. It achieves the following results on the evaluation set:

  • Loss: 1.9409
  • Nll Loss: 1.9409
  • Logps/best: -72.8478
  • Rewards/chosen: 2.0114
  • Rewards/rejected: -0.4229
  • Rewards/accuracies: 0.9998
  • Rewards/margins: 2.4343
  • Logps/rejected: -11.6536
  • Logps/chosen: -72.8478
  • Logits/rejected: -2.6479
  • Logits/chosen: -2.9522
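
As a quick orientation, a minimal usage sketch follows. It is not taken from the card; the TL;DR-style prompt template is an assumption, so check the training setup for the exact format.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "martimfasantos/sft-sum-chosen-10lp-shuff-full-tiny"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Assumed TL;DR-style prompt; the exact template used during training is not stated in the card.
prompt = "POST: I keep writing overly long emails at work and my manager asked me to be more concise.\nTL;DR:"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Decode only the newly generated tokens (the summary).
summary = tokenizer.decode(output[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True)
print(summary.strip())
```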

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed
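
The training dataset is named above (martimfasantos/openai-summarize-tldr). The sketch below assumes it loads as a standard `datasets` repository; the split and column names are not confirmed by the card.

```python
from datasets import load_dataset

ds = load_dataset("martimfasantos/openai-summarize-tldr")
print(ds)               # available splits and columns
print(ds["train"][0])   # one raw example, assuming a "train" split exists
```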

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 1e-05
  • train_batch_size: 1
  • eval_batch_size: 4
  • seed: 42
  • gradient_accumulation_steps: 16
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.95) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
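
For readers who want to set up a comparable run, the sketch below shows how these hyperparameters might map onto Hugging Face `TrainingArguments`. This is an assumption; the card does not include the actual training script.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="sft-sum-chosen-10lp-shuff-full-tiny",
    learning_rate=1e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=16,  # effective total train batch size of 16
    num_train_epochs=1,
    lr_scheduler_type="linear",
    warmup_ratio=0.1,
    adam_beta1=0.9,
    adam_beta2=0.95,
    adam_epsilon=1e-8,
    seed=42,
    bf16=True,  # assumption; the published weights are stored in BF16
)
```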

Training results

| Training Loss | Epoch | Step | Validation Loss | Nll Loss | Logps/best | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2.3573 | 0.0137 | 100 | 2.3703 | 2.3703 | -88.8140 | 0.4147 | 0.0412 | 1.0 | 0.3735 | -7.0125 | -88.8140 | -2.6551 | -2.9658 |
| 2.1904 | 0.0274 | 200 | 2.1322 | 2.1322 | -79.9647 | 1.2997 | 0.0373 | 1.0 | 1.2624 | -7.0516 | -79.9647 | -2.6656 | -2.9758 |
| 1.9956 | 0.0411 | 300 | 2.0629 | 2.0629 | -77.3844 | 1.5577 | -0.1097 | 0.9995 | 1.6674 | -8.5217 | -77.3844 | -2.6813 | -2.9915 |
| 2.0379 | 0.0548 | 400 | 2.0405 | 2.0405 | -76.5483 | 1.6413 | -0.1759 | 0.9994 | 1.8173 | -9.1840 | -76.5483 | -2.6918 | -3.0033 |
| 1.9476 | 0.0685 | 500 | 2.0250 | 2.0250 | -75.9762 | 1.6985 | -0.1561 | 0.9991 | 1.8546 | -8.9858 | -75.9762 | -2.6981 | -3.0089 |
| 2.0151 | 0.0822 | 600 | 2.0134 | 2.0133 | -75.5465 | 1.7415 | -0.1979 | 0.9991 | 1.9394 | -9.4039 | -75.5465 | -2.6956 | -3.0066 |
| 1.9972 | 0.0960 | 700 | 2.0037 | 2.0037 | -75.1909 | 1.7770 | -0.2110 | 0.9997 | 1.9881 | -9.5345 | -75.1909 | -2.6886 | -2.9996 |
| 1.9851 | 0.1097 | 800 | 1.9950 | 1.9950 | -74.8615 | 1.8100 | -0.2127 | 0.9997 | 2.0226 | -9.5511 | -74.8615 | -2.6861 | -2.9971 |
| 2.0271 | 0.1234 | 900 | 1.9890 | 1.9890 | -74.6372 | 1.8324 | -0.2530 | 0.9995 | 2.0854 | -9.9543 | -74.6372 | -2.6818 | -2.9925 |
| 2.0501 | 0.1371 | 1000 | 1.9845 | 1.9845 | -74.4788 | 1.8483 | -0.3242 | 0.9997 | 2.1724 | -10.6661 | -74.4788 | -2.6491 | -2.9545 |
| 1.9699 | 0.1508 | 1100 | 1.9813 | 1.9812 | -74.3528 | 1.8609 | -0.3208 | 0.9997 | 2.1817 | -10.6327 | -74.3528 | -2.6664 | -2.9755 |
| 1.9448 | 0.1645 | 1200 | 1.9773 | 1.9772 | -74.2031 | 1.8758 | -0.2738 | 0.9997 | 2.1496 | -10.1623 | -74.2031 | -2.6739 | -2.9842 |
| 1.9606 | 0.1782 | 1300 | 1.9746 | 1.9746 | -74.0931 | 1.8868 | -0.3353 | 0.9997 | 2.2221 | -10.7775 | -74.0931 | -2.6755 | -2.9850 |
| 1.8795 | 0.1919 | 1400 | 1.9716 | 1.9715 | -73.9887 | 1.8973 | -0.3115 | 0.9997 | 2.2088 | -10.5398 | -73.9887 | -2.6658 | -2.9741 |
| 1.9585 | 0.2056 | 1500 | 1.9703 | 1.9703 | -73.9430 | 1.9018 | -0.3353 | 0.9997 | 2.2371 | -10.7774 | -73.9430 | -2.6721 | -2.9814 |
| 1.9508 | 0.2193 | 1600 | 1.9664 | 1.9664 | -73.7942 | 1.9167 | -0.4138 | 0.9998 | 2.3305 | -11.5624 | -73.7942 | -2.6751 | -2.9840 |
| 1.9041 | 0.2330 | 1700 | 1.9657 | 1.9656 | -73.7736 | 1.9188 | -0.3353 | 0.9997 | 2.2541 | -10.7776 | -73.7736 | -2.6703 | -2.9794 |
| 1.9507 | 0.2467 | 1800 | 1.9634 | 1.9634 | -73.6847 | 1.9277 | -0.3964 | 0.9998 | 2.3240 | -11.3880 | -73.6847 | -2.6728 | -2.9810 |
| 1.8942 | 0.2604 | 1900 | 1.9620 | 1.9620 | -73.6314 | 1.9330 | -0.3368 | 0.9998 | 2.2698 | -10.7926 | -73.6314 | -2.6631 | -2.9695 |
| 2.0088 | 0.2742 | 2000 | 1.9604 | 1.9603 | -73.5703 | 1.9391 | -0.3303 | 0.9997 | 2.2694 | -10.7277 | -73.5703 | -2.6651 | -2.9720 |
| 2.0277 | 0.2879 | 2100 | 1.9596 | 1.9596 | -73.5404 | 1.9421 | -0.3122 | 0.9997 | 2.2543 | -10.5463 | -73.5404 | -2.6687 | -2.9765 |
| 1.9697 | 0.3016 | 2200 | 1.9578 | 1.9578 | -73.4823 | 1.9479 | -0.3187 | 0.9998 | 2.2666 | -10.6117 | -73.4823 | -2.6615 | -2.9674 |
| 1.9756 | 0.3153 | 2300 | 1.9564 | 1.9564 | -73.4282 | 1.9533 | -0.3217 | 0.9997 | 2.2750 | -10.6410 | -73.4282 | -2.6624 | -2.9692 |
| 1.9471 | 0.3290 | 2400 | 1.9552 | 1.9551 | -73.3780 | 1.9583 | -0.3660 | 0.9997 | 2.3244 | -11.0849 | -73.3780 | -2.6636 | -2.9703 |
| 1.9646 | 0.3427 | 2500 | 1.9546 | 1.9546 | -73.3608 | 1.9601 | -0.3453 | 0.9997 | 2.3054 | -10.8779 | -73.3608 | -2.6522 | -2.9582 |
| 2.0034 | 0.3564 | 2600 | 1.9536 | 1.9536 | -73.3221 | 1.9639 | -0.4025 | 0.9998 | 2.3665 | -11.4498 | -73.3221 | -2.6635 | -2.9708 |
| 1.9853 | 0.3701 | 2700 | 1.9522 | 1.9522 | -73.2647 | 1.9697 | -0.3826 | 0.9998 | 2.3523 | -11.2507 | -73.2647 | -2.6548 | -2.9612 |
| 1.9648 | 0.3838 | 2800 | 1.9518 | 1.9518 | -73.2540 | 1.9707 | -0.4008 | 0.9998 | 2.3716 | -11.4329 | -73.2540 | -2.6557 | -2.9618 |
| 1.992 | 0.3975 | 2900 | 1.9514 | 1.9513 | -73.2347 | 1.9727 | -0.3741 | 0.9998 | 2.3468 | -11.1657 | -73.2347 | -2.6585 | -2.9649 |
| 1.9098 | 0.4112 | 3000 | 1.9501 | 1.9501 | -73.1879 | 1.9773 | -0.3653 | 0.9998 | 2.3426 | -11.0774 | -73.1879 | -2.6623 | -2.9691 |
| 2.0089 | 0.4249 | 3100 | 1.9496 | 1.9496 | -73.1694 | 1.9792 | -0.3960 | 0.9998 | 2.3752 | -11.3848 | -73.1694 | -2.6570 | -2.9627 |
| 2.0138 | 0.4386 | 3200 | 1.9487 | 1.9487 | -73.1364 | 1.9825 | -0.3799 | 0.9998 | 2.3624 | -11.2233 | -73.1364 | -2.6524 | -2.9576 |
| 1.9295 | 0.4524 | 3300 | 1.9489 | 1.9489 | -73.1488 | 1.9813 | -0.3977 | 0.9998 | 2.3790 | -11.4018 | -73.1488 | -2.6569 | -2.9628 |
| 1.9276 | 0.4661 | 3400 | 1.9479 | 1.9479 | -73.1079 | 1.9853 | -0.3945 | 0.9998 | 2.3799 | -11.3697 | -73.1079 | -2.6537 | -2.9590 |
| 1.9594 | 0.4798 | 3500 | 1.9472 | 1.9472 | -73.0821 | 1.9879 | -0.4255 | 0.9998 | 2.4135 | -11.6798 | -73.0821 | -2.6542 | -2.9600 |
| 1.9141 | 0.4935 | 3600 | 1.9471 | 1.9471 | -73.0800 | 1.9881 | -0.4024 | 0.9998 | 2.3906 | -11.4487 | -73.0800 | -2.6500 | -2.9555 |
| 1.8611 | 0.5072 | 3700 | 1.9460 | 1.9460 | -73.0338 | 1.9928 | -0.3865 | 0.9998 | 2.3793 | -11.2897 | -73.0338 | -2.6542 | -2.9599 |
| 1.8907 | 0.5209 | 3800 | 1.9460 | 1.9460 | -73.0372 | 1.9924 | -0.3918 | 0.9998 | 2.3843 | -11.3429 | -73.0372 | -2.6504 | -2.9556 |
| 1.9147 | 0.5346 | 3900 | 1.9456 | 1.9456 | -73.0218 | 1.9940 | -0.3939 | 0.9998 | 2.3879 | -11.3637 | -73.0218 | -2.6498 | -2.9550 |
| 1.9485 | 0.5483 | 4000 | 1.9454 | 1.9454 | -73.0146 | 1.9947 | -0.4036 | 0.9998 | 2.3983 | -11.4605 | -73.0146 | -2.6513 | -2.9565 |
| 1.9379 | 0.5620 | 4100 | 1.9448 | 1.9448 | -72.9908 | 1.9971 | -0.3932 | 0.9998 | 2.3902 | -11.3561 | -72.9908 | -2.6501 | -2.9550 |
| 1.8956 | 0.5757 | 4200 | 1.9444 | 1.9443 | -72.9738 | 1.9988 | -0.4097 | 0.9998 | 2.4084 | -11.5214 | -72.9738 | -2.6477 | -2.9518 |
| 1.9916 | 0.5894 | 4300 | 1.9440 | 1.9440 | -72.9580 | 2.0003 | -0.4049 | 0.9998 | 2.4053 | -11.4737 | -72.9580 | -2.6473 | -2.9514 |
| 1.8885 | 0.6031 | 4400 | 1.9441 | 1.9441 | -72.9673 | 1.9994 | -0.3808 | 0.9998 | 2.3802 | -11.2320 | -72.9673 | -2.6464 | -2.9503 |
| 1.9078 | 0.6169 | 4500 | 1.9437 | 1.9436 | -72.9481 | 2.0013 | -0.4206 | 0.9998 | 2.4220 | -11.6308 | -72.9481 | -2.6465 | -2.9503 |
| 1.9037 | 0.6306 | 4600 | 1.9435 | 1.9434 | -72.9426 | 2.0019 | -0.3718 | 0.9998 | 2.3737 | -11.1427 | -72.9426 | -2.6441 | -2.9481 |
| 1.9558 | 0.6443 | 4700 | 1.9427 | 1.9427 | -72.9121 | 2.0049 | -0.3758 | 0.9998 | 2.3807 | -11.1827 | -72.9121 | -2.6445 | -2.9484 |
| 1.9416 | 0.6580 | 4800 | 1.9429 | 1.9428 | -72.9187 | 2.0043 | -0.3698 | 0.9998 | 2.3741 | -11.1227 | -72.9187 | -2.6447 | -2.9486 |
| 1.9471 | 0.6717 | 4900 | 1.9427 | 1.9427 | -72.9167 | 2.0045 | -0.4041 | 0.9998 | 2.4085 | -11.4650 | -72.9167 | -2.6447 | -2.9486 |
| 1.9237 | 0.6854 | 5000 | 1.9425 | 1.9425 | -72.9062 | 2.0055 | -0.4023 | 0.9998 | 2.4079 | -11.4479 | -72.9062 | -2.6451 | -2.9490 |
| 1.9687 | 0.6991 | 5100 | 1.9422 | 1.9421 | -72.8930 | 2.0068 | -0.4106 | 0.9998 | 2.4174 | -11.5306 | -72.8930 | -2.6475 | -2.9516 |
| 1.9274 | 0.7128 | 5200 | 1.9420 | 1.9420 | -72.8846 | 2.0077 | -0.3934 | 0.9998 | 2.4011 | -11.3589 | -72.8846 | -2.6454 | -2.9492 |
| 1.8258 | 0.7265 | 5300 | 1.9418 | 1.9418 | -72.8788 | 2.0083 | -0.3905 | 0.9998 | 2.3987 | -11.3293 | -72.8788 | -2.6458 | -2.9498 |
| 1.8978 | 0.7402 | 5400 | 1.9416 | 1.9416 | -72.8710 | 2.0090 | -0.4199 | 0.9998 | 2.4289 | -11.6232 | -72.8710 | -2.6475 | -2.9515 |
| 1.9706 | 0.7539 | 5500 | 1.9416 | 1.9416 | -72.8733 | 2.0088 | -0.4296 | 0.9998 | 2.4384 | -11.7202 | -72.8733 | -2.6467 | -2.9506 |
| 1.8711 | 0.7676 | 5600 | 1.9416 | 1.9415 | -72.8708 | 2.0091 | -0.4093 | 0.9998 | 2.4183 | -11.5174 | -72.8708 | -2.6454 | -2.9492 |
| 1.925 | 0.7813 | 5700 | 1.9412 | 1.9411 | -72.8550 | 2.0106 | -0.4237 | 0.9998 | 2.4344 | -11.6619 | -72.8550 | -2.6463 | -2.9502 |
| 1.952 | 0.7951 | 5800 | 1.9412 | 1.9411 | -72.8554 | 2.0106 | -0.4179 | 0.9998 | 2.4285 | -11.6032 | -72.8554 | -2.6463 | -2.9503 |
| 1.9295 | 0.8088 | 5900 | 1.9413 | 1.9413 | -72.8621 | 2.0099 | -0.4133 | 0.9998 | 2.4233 | -11.5578 | -72.8621 | -2.6463 | -2.9503 |
| 1.9457 | 0.8225 | 6000 | 1.9413 | 1.9413 | -72.8636 | 2.0098 | -0.4083 | 0.9998 | 2.4180 | -11.5072 | -72.8636 | -2.6459 | -2.9499 |
| 1.9016 | 0.8362 | 6100 | 1.9412 | 1.9412 | -72.8592 | 2.0102 | -0.4150 | 0.9998 | 2.4252 | -11.5748 | -72.8592 | -2.6471 | -2.9513 |
| 1.9789 | 0.8499 | 6200 | 1.9413 | 1.9413 | -72.8632 | 2.0098 | -0.4221 | 0.9998 | 2.4319 | -11.6458 | -72.8632 | -2.6477 | -2.9520 |
| 1.944 | 0.8636 | 6300 | 1.9411 | 1.9411 | -72.8542 | 2.0107 | -0.4232 | 0.9998 | 2.4339 | -11.6568 | -72.8542 | -2.6475 | -2.9518 |
| 1.9435 | 0.8773 | 6400 | 1.9410 | 1.9409 | -72.8496 | 2.0112 | -0.4278 | 0.9998 | 2.4390 | -11.7027 | -72.8496 | -2.6479 | -2.9523 |
| 1.917 | 0.8910 | 6500 | 1.9410 | 1.9410 | -72.8519 | 2.0109 | -0.4237 | 0.9998 | 2.4346 | -11.6610 | -72.8519 | -2.6482 | -2.9525 |
| 1.9243 | 0.9047 | 6600 | 1.9410 | 1.9410 | -72.8520 | 2.0109 | -0.4202 | 0.9998 | 2.4311 | -11.6265 | -72.8520 | -2.6480 | -2.9523 |
| 1.8624 | 0.9184 | 6700 | 1.9409 | 1.9409 | -72.8485 | 2.0113 | -0.4202 | 0.9998 | 2.4314 | -11.6260 | -72.8485 | -2.6477 | -2.9520 |
| 1.8998 | 0.9321 | 6800 | 1.9410 | 1.9409 | -72.8489 | 2.0112 | -0.4227 | 0.9998 | 2.4340 | -11.6518 | -72.8489 | -2.6478 | -2.9521 |
| 1.9654 | 0.9458 | 6900 | 1.9410 | 1.9409 | -72.8490 | 2.0112 | -0.4228 | 0.9998 | 2.4341 | -11.6529 | -72.8490 | -2.6478 | -2.9521 |
| 1.9113 | 0.9595 | 7000 | 1.9409 | 1.9409 | -72.8471 | 2.0114 | -0.4228 | 0.9998 | 2.4342 | -11.6520 | -72.8471 | -2.6477 | -2.9520 |
| 1.951 | 0.9733 | 7100 | 1.9410 | 1.9410 | -72.8501 | 2.0111 | -0.4228 | 0.9998 | 2.4339 | -11.6524 | -72.8501 | -2.6478 | -2.9521 |
| 1.9863 | 0.9870 | 7200 | 1.9409 | 1.9409 | -72.8478 | 2.0114 | -0.4229 | 0.9998 | 2.4343 | -11.6536 | -72.8478 | -2.6479 | -2.9522 |

Framework versions

  • Transformers 4.41.2
  • Pytorch 2.1.2
  • Datasets 2.20.0
  • Tokenizers 0.19.1