Problem
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> model_dir = "qwen/Qwen-VL-Chat-Int4"
>>> tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained(model_dir, device_map="auto", trust_remote_code=True).eval()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/ssd1/miniconda3/envs/pytorch2.1.2/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 511, in from_pretrained
    return model_class.from_pretrained(
  File "/ssd1/miniconda3/envs/pytorch2.1.2/lib/python3.8/site-packages/transformers/modeling_utils.py", line 2945, in from_pretrained
    model = quantizer.convert_model(model)
  File "/ssd1/miniconda3/envs/pytorch2.1.2/lib/python3.8/site-packages/optimum/gptq/quantizer.py", line 229, in convert_model
    self._replace_by_quant_layers(model, layers_to_be_replaced)
  File "/ssd1/miniconda3/envs/pytorch2.1.2/lib/python3.8/site-packages/optimum/gptq/quantizer.py", line 256, in _replace_by_quant_layers
    QuantLinear = dynamically_import_QuantLinear(
TypeError: dynamically_import_QuantLinear() got an unexpected keyword argument 'disable_exllamav2'
Solution
The installed optimum release passes the disable_exllamav2 keyword to auto-gptq's dynamically_import_QuantLinear, but the auto-gptq version in this environment does not accept that argument. Downgrading to optimum 1.12.0, which does not pass it, restores compatibility:
pip install optimum==1.12.0
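
After the downgrade, the original load should complete. A minimal sketch to confirm the pin took effect before retrying (the version check uses importlib.metadata from the Python 3.8+ standard library; model_dir is the same directory as above):

>>> from importlib.metadata import version
>>> version("optimum")  # confirm the pinned release is active
'1.12.0'
>>> from transformers import AutoModelForCausalLM, AutoTokenizer
>>> model_dir = "qwen/Qwen-VL-Chat-Int4"
>>> tokenizer = AutoTokenizer.from_pretrained(model_dir, trust_remote_code=True)
>>> model = AutoModelForCausalLM.from_pretrained(
...     model_dir, device_map="auto", trust_remote_code=True
... ).eval()  # no longer raises the disable_exllamav2 TypeError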