RichardErkhov/1bitLLM_-_bitnet_b1_58-3B-gguf

Quantization made by Richard Erkhov.

bitnet_b1_58-3B - GGUF

Model creator: https://huggingface.co/1bitLLM/
Original model: https://huggingface.co/1bitLLM/bitnet_b1_58-3B/

Name	Quant method	Size
bitnet_b1_58-3B.Q2_K.gguf	Q2_K	1.79GB
bitnet_b1_58-3B.IQ3_XS.gguf	IQ3_XS	1.79GB
bitnet_b1_58-3B.IQ3_S.gguf	IQ3_S	1.79GB
bitnet_b1_58-3B.Q3_K_S.gguf	Q3_K_S	1.79GB
bitnet_b1_58-3B.IQ3_M.gguf	IQ3_M	1.86GB
bitnet_b1_58-3B.Q3_K.gguf	Q3_K	1.94GB
bitnet_b1_58-3B.Q3_K_M.gguf	Q3_K_M	1.94GB
bitnet_b1_58-3B.Q3_K_L.gguf	Q3_K_L	2.01GB
bitnet_b1_58-3B.IQ4_XS.gguf	IQ4_XS	1.81GB
bitnet_b1_58-3B.Q4_0.gguf	Q4_0	1.79GB
bitnet_b1_58-3B.IQ4_NL.gguf	IQ4_NL	1.81GB
bitnet_b1_58-3B.Q4_K_S.gguf	Q4_K_S	2.17GB
bitnet_b1_58-3B.Q4_K.gguf	Q4_K	2.34GB
bitnet_b1_58-3B.Q4_K_M.gguf	Q4_K_M	2.34GB
bitnet_b1_58-3B.Q4_1.gguf	Q4_1	1.98GB
bitnet_b1_58-3B.Q5_0.gguf	Q5_0	2.17GB
bitnet_b1_58-3B.Q5_K_S.gguf	Q5_K_S	2.35GB
bitnet_b1_58-3B.Q5_K.gguf	Q5_K	2.5GB
bitnet_b1_58-3B.Q5_K_M.gguf	Q5_K_M	2.5GB
bitnet_b1_58-3B.Q5_1.gguf	Q5_1	2.35GB
bitnet_b1_58-3B.Q6_K.gguf	Q6_K	3.29GB
bitnet_b1_58-3B.Q8_0.gguf	Q8_0	3.29GB
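The sketch below shows one way to pick and fetch a file from the table above. It assumes the `huggingface_hub` package is installed. The repo id is taken from this card's title and the sizes from the table. The helper names (`gguf_filename`, `quants_within`, `download_quant`) are hypothetical. File size is only a lower bound on the memory a runtime needs, because the context/KV cache comes on top of it.

```python
from typing import List, Optional, Tuple

# Repo id taken from this card's title; file sizes (GB) copied from the table above.
REPO_ID = "RichardErkhov/1bitLLM_-_bitnet_b1_58-3B-gguf"
QUANTS: List[Tuple[str, float]] = [
    ("Q2_K", 1.79), ("IQ3_XS", 1.79), ("IQ3_S", 1.79), ("Q3_K_S", 1.79),
    ("IQ3_M", 1.86), ("Q3_K", 1.94), ("Q3_K_M", 1.94), ("Q3_K_L", 2.01),
    ("IQ4_XS", 1.81), ("Q4_0", 1.79), ("IQ4_NL", 1.81), ("Q4_K_S", 2.17),
    ("Q4_K", 2.34), ("Q4_K_M", 2.34), ("Q4_1", 1.98), ("Q5_0", 2.17),
    ("Q5_K_S", 2.35), ("Q5_K", 2.5), ("Q5_K_M", 2.5), ("Q5_1", 2.35),
    ("Q6_K", 3.29), ("Q8_0", 3.29),
]


def gguf_filename(quant: str) -> str:
    """Build the file name used in the table (hypothetical helper)."""
    if quant not in dict(QUANTS):
        raise ValueError(f"unknown quant method: {quant}")
    return f"bitnet_b1_58-3B.{quant}.gguf"


def quants_within(budget_gb: float) -> List[str]:
    """Quant methods whose file fits in budget_gb, smallest file first (table order for ties)."""
    fitting = [(q, size) for q, size in QUANTS if size <= budget_gb]
    return [q for q, _ in sorted(fitting, key=lambda item: item[1])]


def download_quant(quant: str, cache_dir: Optional[str] = None) -> str:
    """Download one GGUF file and return its local path (needs network access)."""
    # Imported lazily so the helpers above work without huggingface_hub installed.
    from huggingface_hub import hf_hub_download

    return hf_hub_download(repo_id=REPO_ID, filename=gguf_filename(quant), cache_dir=cache_dir)
```

For example, `quants_within(1.80)` lists the five 1.79GB files, and `download_quant("Q4_K_M")` fetches `bitnet_b1_58-3B.Q4_K_M.gguf` into the local Hugging Face cache.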

Original model description:

License: MIT

This is a reproduction of the BitNet b1.58 paper. The models are trained on the RedPajama dataset for 100B tokens. The hyperparameters, as well as the two-stage learning-rate and weight-decay schedules, are implemented as suggested in the authors' follow-up paper. All models are open-source in the repo. We will train larger models and/or on more tokens when resources become available.
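The core of what is being reproduced is the BitNet b1.58 weight quantizer. Each weight matrix is scaled by the mean of its absolute values, γ, and every entry is then rounded and clipped to {-1, 0, +1}. The pure-Python sketch below covers only that step. The function names and the ε value are illustrative assumptions, and activation quantization and training are not shown. The second function illustrates why ternary weights are cheap: the matrix-vector product needs only additions and subtractions, followed by one rescale.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def absmean_ternary(weights: Matrix, eps: float = 1e-5) -> Tuple[List[List[int]], float]:
    """Quantize a weight matrix to {-1, 0, +1} with a single absmean scale."""
    flat = [abs(w) for row in weights for w in row]
    gamma = sum(flat) / len(flat)  # gamma = mean(|W|)

    def round_clip(x: float) -> int:
        # round() rounds exact halves to even; the clip keeps values in [-1, 1]
        return max(-1, min(1, round(x)))

    ternary = [[round_clip(w / (gamma + eps)) for w in row] for row in weights]
    return ternary, gamma


def ternary_matvec(ternary: List[List[int]], gamma: float, x: List[float]) -> List[float]:
    """Approximate W @ x with additions/subtractions only, then rescale by gamma."""
    out = []
    for row in ternary:
        acc = 0.0
        for t, xi in zip(row, x):
            if t == 1:
                acc += xi
            elif t == -1:
                acc -= xi
            # t == 0 contributes nothing, so that weight is skipped entirely
        out.append(gamma * acc)
    return out


W = [[0.4, -1.2, 0.05], [0.9, -0.3, 2.0]]
q, gamma = absmean_ternary(W)
print(q, round(gamma, 4))
print(ternary_matvec(q, gamma, [1.0, 2.0, 3.0]))
```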

Results

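Perplexity (PPL, lower is better) and zero-shot accuracy (%, higher is better) on ARC-Easy (ARCe), ARC-Challenge (ARCc), HellaSwag (HS), BoolQ (BQ), OpenBookQA (OQ), PIQA (PQ), and WinoGrande (WGe). Avg is the mean of the seven accuracies: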

Models	PPL	ARCe	ARCc	HS	BQ	OQ	PQ	WGe	Avg
FP16 700M (reported)	12.33	54.7	23.0	37.0	60.0	20.2	68.9	54.8	45.5
BitNet b1.58 700M (reported)	12.87	51.8	21.4	35.1	58.2	20.0	68.1	55.2	44.3
BitNet b1.58 700M (reproduced)	12.78	51.4	21.8	35.0	59.6	20.6	67.5	55.4	44.5
FP16 1.3B (reported)	11.25	56.9	23.5	38.5	59.1	21.6	70.0	53.9	46.2
BitNet b1.58 1.3B (reported)	11.29	54.9	24.2	37.7	56.7	19.6	68.8	55.8	45.4
BitNet b1.58 1.3B (reproduced)	11.19	55.8	23.7	37.6	59.0	20.2	69.2	56.0	45.9
FP16 3B (reported)	10.04	62.1	25.6	43.3	61.8	24.6	72.1	58.2	49.7
BitNet b1.58 3B (reported)	9.91	61.4	28.3	42.9	61.5	26.6	71.5	59.3	50.2
BitNet b1.58 3B (reproduced)	9.88	60.9	28.0	42.3	58.3	26.0	71.4	60.3	49.6
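As a quick arithmetic check on the table, Avg is the unweighted mean of the seven accuracy columns and does not include PPL. The sketch below recomputes it for the three 3B rows. It then lists the per-task gap between the reproduced and reported BitNet b1.58 3B results. The numbers are copied from the table, and the variable names are illustrative.

```python
from statistics import mean
from typing import Dict, List, Tuple

TASKS = ["ARCe", "ARCc", "HS", "BQ", "OQ", "PQ", "WGe"]

# (seven accuracies, Avg as printed) copied from the 3B rows of the table above
ROWS: Dict[str, Tuple[List[float], float]] = {
    "FP16 3B (reported)": ([62.1, 25.6, 43.3, 61.8, 24.6, 72.1, 58.2], 49.7),
    "BitNet b1.58 3B (reported)": ([61.4, 28.3, 42.9, 61.5, 26.6, 71.5, 59.3], 50.2),
    "BitNet b1.58 3B (reproduced)": ([60.9, 28.0, 42.3, 58.3, 26.0, 71.4, 60.3], 49.6),
}

for name, (scores, avg) in ROWS.items():
    recomputed = round(mean(scores), 1)
    print(f"{name}: recomputed Avg {recomputed} vs table {avg}")

reported = ROWS["BitNet b1.58 3B (reported)"][0]
reproduced = ROWS["BitNet b1.58 3B (reproduced)"][0]
# Positive gap = reproduced run scored higher than the reported number.
gaps = {task: round(r - p, 1) for task, r, p in zip(TASKS, reproduced, reported)}
worst = max(gaps, key=lambda t: abs(gaps[t]))
print(gaps, worst)
```

For the 3B model, the largest single-task gap is on BoolQ (-3.2 points), while WinoGrande comes out 1.0 point higher than reported.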

The differences between the reported and reproduced numbers may come from variance in training-data processing, random seeds, or other random factors.

Evaluation

The evaluation pipelines are from the paper authors. Here are the commands to run the evaluation:

pip install lm-eval==0.3.0

python eval_ppl.py --hf_path 1bitLLM/bitnet_b1_58-3B --seqlen 2048
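For reference, perplexity is the exponential of the average per-token negative log-likelihood. `--seqlen 2048` presumably sets the length of the text windows the model is evaluated on. The sketch below takes per-token log-probabilities as given and pools the tokens from all windows before averaging. It follows the standard definition and is not the code of `eval_ppl.py`, whose dataset and chunking are not documented here. The function names are hypothetical.

```python
import math
from typing import Iterable, List


def perplexity(token_logprobs: List[float]) -> float:
    """exp of the mean negative log-likelihood (natural log) over the tokens."""
    if not token_logprobs:
        raise ValueError("need at least one token")
    nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(nll)


def corpus_perplexity(windows: Iterable[List[float]]) -> float:
    """Pool the tokens of every evaluation window (e.g. 2048-token chunks), then average."""
    pooled: List[float] = []
    for window in windows:
        pooled.extend(window)
    return perplexity(pooled)


# A model that assigns probability 1/4 to every token has perplexity 4.
print(perplexity([math.log(0.25)] * 8))
print(corpus_perplexity([[math.log(0.5), math.log(0.5)], [math.log(0.25)]]))
```

Pooling tokens and averaging per-window losses give the same result when all windows have equal length. They differ only when windows have different lengths, for example a shorter final chunk.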

python eval_task.py --hf_path 1bitLLM/bitnet_b1_58-3B \
    --batch_size 1 \
    --tasks \
    --output_path result.json \
    --num_fewshot 0 \
    --ctx_size 2048
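With `--num_fewshot 0`, each question is posed without solved examples in the prompt. Multiple-choice tasks in lm-eval-style harnesses are typically scored by computing the model's log-likelihood of each answer option and picking the highest. The sketch below shows that rule plus a variant normalized by option length in characters. This card does not state which variant the table uses, and the `Example` structure and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Example:
    options: List[str]           # candidate answer strings
    loglikelihoods: List[float]  # log P(option | prompt) from the model, one per option
    gold: int                    # index of the correct option


def predict(ex: Example, length_normalize: bool = False) -> int:
    scores = ex.loglikelihoods
    if length_normalize:
        # divide by option length in characters so long answers are not penalized for length alone
        scores = [ll / max(len(opt), 1) for ll, opt in zip(ex.loglikelihoods, ex.options)]
    return max(range(len(scores)), key=lambda i: scores[i])


def accuracy(examples: List[Example], length_normalize: bool = False) -> float:
    correct = sum(predict(ex, length_normalize) == ex.gold for ex in examples)
    return 100.0 * correct / len(examples)  # percent, as in the table above


demo = [
    Example(options=["yes", "no"], loglikelihoods=[-1.0, -2.0], gold=0),
    Example(options=["a cat", "a very long answer"], loglikelihoods=[-6.0, -9.0], gold=1),
]
print(accuracy(demo), accuracy(demo, length_normalize=True))
```

On the toy data, the raw rule prefers the short option in the second example, while the normalized rule picks the longer, correct one. This is why the two variants can report different accuracies on the same model outputs.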