Could not run this in lmdeploy

#12
by hulianxue - opened

Hi,
Thanks for providing this quantized Llama 3.1 model!
When I deploy this version with the latest lmdeploy package, it keeps failing with the following error:

```
Unrecognized keys in `rope_scaling` for 'rope_type'='llama3': {'type'}
Unrecognized keys in `rope_scaling` for 'rope_type'='llama3': {'type'}
Traceback (most recent call last):
  File "/usr/local/bin/lmdeploy", line 8, in <module>
    sys.exit(run())
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/cli/entrypoint.py", line 36, in run
    args.run(args)
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/cli/serve.py", line 298, in api_server
    run_api_server(args.model_path,
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/openai/api_server.py", line 1285, in serve
    VariableInterface.async_engine = pipeline_class(
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/async_engine.py", line 190, in __init__
    self._build_turbomind(model_path=model_path,
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/serve/async_engine.py", line 235, in _build_turbomind
    self.engine = tm.TurboMind.from_pretrained(
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/turbomind/turbomind.py", line 340, in from_pretrained
    return cls(model_path=pretrained_model_name_or_path,
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/turbomind/turbomind.py", line 144, in __init__
    self.model_comm = self._from_hf(model_source=model_source,
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/turbomind/turbomind.py", line 235, in _from_hf
    output_model = OUTPUT_MODELS.get(output_model_name)(
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/turbomind/deploy/target_model/w4.py", line 80, in __init__
    super().__init__(input_model, cfg, to_file, out_dir)
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 172, in __init__
    self.cfg = self.get_config(cfg)
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/turbomind/deploy/target_model/w4.py", line 84, in get_config
    final_cfg = super().get_config(cfg).__dict__
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/turbomind/deploy/target_model/base.py", line 187, in get_config
    final_cfg.update(self.input_model.model_info())
  File "/usr/local/lib/python3.10/dist-packages/lmdeploy/turbomind/deploy/source_model/llama.py", line 224, in model_info
    raise ValueError(
ValueError: Ambiguous rope_scaling in config: {'_name_or_path': '/fsx/alvaro.bartolome/70b-instruct', 'architectures': ['LlamaForCausalLM'], 'attention_bias': False, 'attention_dropout': 0.0, 'bos_token_id': 128000, 'eos_token_id': [128001, 128008, 128009], 'hidden_act': 'silu', 'hidden_size': 8192, 'initializer_range': 0.02, 'intermediate_size': 28672, 'max_position_embeddings': 131072, 'mlp_bias': False, 'model_type': 'llama', 'num_attention_heads': 64, 'num_hidden_layers': 80, 'num_key_value_heads': 8, 'pretraining_tp': 1, 'quantization_config': {'bits': 4, 'group_size': 128, 'modules_to_not_convert': None, 'quant_method': 'awq', 'version': 'gemm', 'zero_point': True}, 'rms_norm_eps': 1e-05, 'rope_scaling': {'factor': 8.0, 'low_freq_factor': 1.0, 'high_freq_factor': 4.0, 'original_max_position_embeddings': 8192, 'rope_type': 'llama3', 'type': 'llama3'}, 'rope_theta': 500000.0, 'tie_word_embeddings': False, 'torch_dtype': 'float16', 'transformers_version': '4.43.0.dev0', 'use_cache': False, 'vocab_size': 128256}
```

How can I fix it?
Thanks a lot!

Hugging Quants org

Hi @hulianxue, I think that may be related to the transformers version used, which should be 4.43.0 or higher; the 4.43.0.dev0 pre-release you are on may not include the fix. Also, if you can, it would be nice to update to 4.44.0 instead, as it has recently been released πŸ€—

So nice of you to answer; I will try it out!
