Fine-tune of CultriX/MistralTrix-v1 on Symbolic Logic content from Lewis Carroll. Because the dataset is very small, training used a very low learning rate. This is an experiment, and I have no idea whether it was effective at changing the model's output.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric	Value
Avg.	73.33
AI2 Reasoning Challenge (25-Shot)	72.53
HellaSwag (10-Shot)	88.34
MMLU (5-Shot)	65.26
TruthfulQA (0-shot)	70.93
Winogrande (5-shot)	80.66
GSM8k (5-shot)	62.24
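The Avg. row appears to be the unweighted mean of the six benchmark scores. This is an assumption about how the leaderboard aggregates, not something the card states. The quick check below recomputes it from the values in the table.

```python
# Benchmark scores copied from the Open LLM Leaderboard table.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 72.53,
    "HellaSwag (10-Shot)": 88.34,
    "MMLU (5-Shot)": 65.26,
    "TruthfulQA (0-shot)": 70.93,
    "Winogrande (5-shot)": 80.66,
    "GSM8k (5-shot)": 62.24,
}

# Assumption: Avg. is a plain arithmetic mean with equal weight per benchmark.
avg = sum(scores.values()) / len(scores)
print(f"Avg. = {avg:.2f}")  # Avg. = 73.33
```

The result matches the reported 73.33.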


Model size: 8.99B params (Safetensors, tensor type FP16)
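The sketch below gives a rough estimate of the memory the weights alone occupy. It assumes 2 bytes per FP16 parameter and takes the listed 8.99B figure at face value, although that figure is already rounded. It ignores activations, KV cache, and framework overhead, so real inference needs more than this.

```python
# Rough weight-memory estimate for an FP16 checkpoint.
# Assumptions: 8.99B parameters (as listed, already rounded) and 2 bytes per FP16 value.
# Activations, KV cache, and runtime overhead are NOT included.
PARAMS = 8.99e9
BYTES_PER_FP16 = 2

weight_bytes = PARAMS * BYTES_PER_FP16
gb = weight_bytes / 1e9      # decimal gigabytes
gib = weight_bytes / 2**30   # binary gibibytes

print(f"{gb:.2f} GB ({gib:.2f} GiB)")  # 17.98 GB (16.75 GiB)
```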


Dataset used to train ryandt/MusingCaterpillar


Evaluation results

Open LLM Leaderboard, by benchmark (split: metric, value):

AI2 Reasoning Challenge (25-Shot), test set: normalized accuracy 72.53
HellaSwag (10-Shot), validation set: normalized accuracy 88.34
MMLU (5-Shot), test set: accuracy 65.26
TruthfulQA (0-shot), validation set: mc2 70.93
Winogrande (5-shot), validation set: accuracy 80.66
GSM8k (5-shot), test set: accuracy 62.24
