[LLM] return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg) TypeError: not a string
Error message
Running the LLM Qwen1.5-14B-Chat produces the following error:
Traceback (most recent call last):
  File "/abc/llm/./transformers_low_bit_pipeline.py", line 44, in <module>
    tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/abc/.conda/envs/llm/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2048, in from_pretrained
    return cls._from_pretrained(
           ^^^^^^^^^^^^^^^^^^^^^
  File "/abc/.conda/envs/llm/lib/python3.11/site-packages/transformers/tokenization_utils_base.py", line 2287, in _from_pretrained
    tokenizer = cls(*init_inputs, **init_kwargs)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/abc/.conda/envs/llm/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 182, in __init__
    self.sp_model = self.get_spm_processor(kwargs.pop("from_slow", False))
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/abc/.conda/envs/llm/lib/python3.11/site-packages/transformers/models/llama/tokenization_llama.py", line 209, in get_spm_processor
    tokenizer.Load(self.vocab_file)
  File "/abc/.conda/envs/llm/lib/python3.11/site-packages/sentencepiece/__init__.py", line 961, in Load
    return self.LoadFromFile(model_file)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/abc/.conda/envs/llm/lib/python3.11/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: not a string
Key information
This line of code:
tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)
throws the error:
  File "/abc/.conda/envs/llm/lib/python3.11/site-packages/sentencepiece/__init__.py", line 316, in LoadFromFile
    return _sentencepiece.SentencePieceProcessor_LoadFromFile(self, arg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: not a string
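Root cause: LlamaTokenizer is the slow, sentencepiece-based tokenizer and expects a tokenizer.model file, but Qwen1.5 checkpoints ship a BPE tokenizer (tokenizer.json plus vocab.json/merges.txt) and no tokenizer.model. from_pretrained therefore resolves vocab_file to None, and sentencepiece's LoadFromFile(None) fails with TypeError: not a string. A quick way to confirm this, assuming model_path points at your local Qwen1.5-14B-Chat directory:

import os

model_path = "/path/to/Qwen1.5-14B-Chat"  # assumption: your local model directory
for name in ("tokenizer.model", "tokenizer.json", "vocab.json", "merges.txt"):
    # Qwen1.5 ships the last three files but no sentencepiece tokenizer.model
    print(name, os.path.exists(os.path.join(model_path, name)))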
Solution
Change this code:
from transformers import LlamaTokenizer, TextGenerationPipeline
...
tokenizer = LlamaTokenizer.from_pretrained(model_path, trust_remote_code=True)
to:
from transformers import LlamaTokenizer, TextGenerationPipeline, AutoTokenizer
...
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
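For context, here is a minimal self-contained sketch of the fixed loading code. It assumes model_path points at your local Qwen1.5-14B-Chat directory and uses the plain transformers model class rather than whatever low-bit loader transformers_low_bit_pipeline.py actually uses:

from transformers import AutoModelForCausalLM, AutoTokenizer, TextGenerationPipeline

model_path = "/path/to/Qwen1.5-14B-Chat"  # assumption: your local model directory

# AutoTokenizer reads the checkpoint's config and dispatches to the matching
# tokenizer class (Qwen2TokenizerFast here), so no tokenizer.model is required.
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path, trust_remote_code=True)

pipeline = TextGenerationPipeline(model=model, tokenizer=tokenizer)
print(pipeline("Hello, who are you?", max_new_tokens=32)[0]["generated_text"])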
Verification
Run the model again: it works!