
πŸš€ Custom quantizations of the base Meta-Llama-3.1-405B πŸ–₯️

🐧 On Linux: sudo apt install -y aria2

🍎 On Mac: brew install aria2

Feel free to paste these all in at once or one at a time.

For faster downloads, paste each command into its own terminal so the shards pull in parallel.

Then copy the commands below into your terminal to download at full speed on either Mac or Linux.

q3q8 custom quant optimized for M2 Ultra 192GB

aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00001-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00001-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00002-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00002-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00003-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00003-of-00004.gguf
aria2c -x 16 -s 16 -k 1M -o meta-405b-base-q3q8-00004-of-00004.gguf https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00004-of-00004.gguf
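
If you would rather kick everything off with a single command, aria2 can also read the URLs from a list file via its -i flag. A minimal sketch, where urls.txt is just an example filename (output names default to the URL basenames, matching the -o names above):

cat > urls.txt <<'EOF'
https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00001-of-00004.gguf
https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00002-of-00004.gguf
https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00003-of-00004.gguf
https://huggingface.co/nisten/meta-405b-base-gguf/resolve/main/meta-405b-base-q3q8-00004-of-00004.gguf
EOF
# fetch all four shards listed in urls.txt, 16 connections each, 1M chunks
aria2c -x 16 -s 16 -k 1M -i urls.txt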

Perplexity benchmarks (WORK IN PROGRESS, THIS IS JUST A DUMP)
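
Numbers like those below are raw output from llama.cpp's perplexity tool. A minimal sketch of that invocation, assuming a recent llama.cpp build (older builds name the binary ./perplexity) and a WikiText-style test file; the file path is only an example, as the source does not say which dataset was used:

# compute perplexity over a raw text file; -m can take the first shard of a split GGUF
./llama-perplexity -m meta-405b-base-q3q8-00001-of-00004.gguf -f wiki.test.raw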

llama 405b - instruct - old (pre-update) BF16
perplexity: 2197.87 seconds per pass - ETA 1 hours 49.88 min 
[1]2.1037,[2]2.4201,[3]2.0992,[4]1.8446,[5]1.6823,[6]1.5948,[7]1.5575,[8]1.5121,[9]1.4750,[10]1.4570,[11]1.4567,[12]1.4666,
Final estimate: PPL = 1.4666 +/- 0.03184

Hermes 405b-Q8_0
perplexity: 716.47 seconds per pass - ETA 35.82 min
[1]1.5152,[2]1.8253,[3]1.6906,[4]1.5438,[5]1.4252,[6]1.3592,[7]1.3464,[8]1.3212,[9]1.2882,[10]1.2663,[11]1.2626,[12]1.2698,
Final estimate: PPL = 1.2698 +/- 0.02620

Hermes 405b-BF16
perplexity: 592.52 seconds per pass - ETA 1 hours 58.50 min
[1]1.5147,[2]1.8220,[3]1.6890,[4]1.5437,[5]1.4250,[6]1.3588,[7]1.3458,[8]1.3216,[9]1.2887,[10]1.2667,[11]1.2630,[12]1.2693,
Final estimate: PPL = 1.2693 +/- 0.02605

meta-405b-base-q8 
perplexity: 167.37 seconds per pass - ETA 33.47 minutes
[1]1.3927,[2]1.6952,[3]1.5905,[4]1.4674,[5]1.3652,[6]1.3054,[7]1.2885,[8]1.2673,[9]1.2397,[10]1.2179,[11]1.2149,[12]1.2162,
Final estimate: PPL = 1.2162 +/- 0.02128


meta-base-q3q8
perplexity: 92.20 seconds per pass - ETA 4.60 minutes
[1]1.6445,[2]2.0909,[3]1.8369,[4]1.6788,[5]1.5438,[6]1.4754,[7]1.4604,[8]1.4321,[9]1.3941,[10]1.3698,[11]1.3691,[12]1.3845,
Final estimate: PPL = 1.3845 +/- 0.02785

meta-base-2bit
perplexity: 35.04 seconds per pass - ETA 7.00 minutes
[1]2.9667,[2]3.5432,[3]3.0714,[4]2.9515,[5]2.8404,[6]2.8713,[7]2.9628,[8]2.9945,[9]3.0155,[10]2.9973,[11]3.0522,[12]3.1619,
Final estimate: PPL = 3.1619 +/- 0.10580
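
To actually run a quant once the shards are downloaded: recent llama.cpp builds load a split GGUF when pointed at the first shard, and should pick up the remaining -0000X-of-00004.gguf files automatically as long as they sit in the same directory. A minimal sketch, assuming llama-cli is built and on your PATH (the prompt, token count, and context size are just examples):

# plain completion; this is a base model, so give it text to continue rather than a chat prompt
./llama-cli -m meta-405b-base-q3q8-00001-of-00004.gguf -p "Once upon a time" -n 128 -c 4096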