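1 System Environment
Hardware environment (Ascend/GPU/CPU): Ascend
MindSpore version: 2.2
Execution mode (PyNative/Graph): any
2 Error Information
2.1 Problem Description
When running WizardCoderTokenizer.from_pretrained(…), the call resolves to GPT2Tokenizer instead of WizardCoderTokenizer, so loading the tokenizer fails.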
Traceback (most recent call last):
  File "/home/wizardcoder/1_wizardcoder-mindformers/mindformers/tools/register/register.py", line 217, in get_instance
    return obj_cls(**kwargs)
TypeError: __init__() missing 2 required positional arguments: 'vocab_file' and 'merges_file'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/wizardcoder/1_wizardcoder-mindformers/mindformers/models/base_tokenizer.py", line 2004, in from_pretrained
    return build_tokenizer(class_name=class_name, **kwargs)
  File "/home/wizardcoder/1_wizardcoder-mindformers/mindformers/models/build_tokenizer.py", line 76, in build_tokenizer
    return MindFormerRegister.get_instance(module_type, class_name, **kwargs)
  File "/home/wizardcoder/1_wizardcoder-mindformers/mindformers/tools/register/register.py", line 219, in get_instance
    raise type(e)('{}: {}'.format(obj_cls.__name__, e))
TypeError: GPT2Tokenizer: __init__() missing 2 required positional arguments: 'vocab_file' and 'merges_file'
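3 Root Cause Analysis
The registration or one of the configuration files may point to GPT2Tokenizer. Check the following:
1) In mindformer_book.py, check whether the tokenizer is registered.
2) In run_wizardcoder.yaml, check whether the tokenizer entry points to the wrong class.
3) In the pretrained model files referenced by from_pretrained, check whether anything points to the wrong class.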
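The three checks above amount to searching the code, the YAML config, and the pretrained model directory for the unexpected class name. A minimal sketch of that search follows, using only the Python standard library. The helper name find_references and the directory passed to it are hypothetical and not part of MindFormers.

```python
from pathlib import Path


def find_references(root, needle="GPT2Tokenizer",
                    suffixes=(".py", ".yaml", ".json")):
    """Return (path, line_number, line) for each line under root containing needle.

    Hypothetical helper for illustration only; it is not a MindFormers API.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        # Only look at source, YAML, and JSON files.
        if not path.is_file() or path.suffix not in suffixes:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip unreadable or binary files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling find_references on the repository root and on the directory passed to from_pretrained should list every file that still names GPT2Tokenizer, which narrows the search to the registration, the YAML, or the model files.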
4 Solution
The tokenizer_config.json file in the pretrained model directory turned out to reference GPT2Tokenizer. Change that entry to WizardCoderTokenizer.
"tokenizer_class":"GPT2Tokenizer"
Change it to:
"tokenizer_class":"WizardCoderTokenizer"
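The same edit can be applied with a short script instead of by hand. This is a minimal sketch that assumes tokenizer_config.json is a plain JSON object. The function name set_tokenizer_class is hypothetical, and the file path must be supplied by the caller.

```python
import json
from pathlib import Path


def set_tokenizer_class(config_path, new_class="WizardCoderTokenizer"):
    """Rewrite the "tokenizer_class" entry and return its previous value.

    Hypothetical helper for illustration only; it is not a MindFormers API.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    old_class = config.get("tokenizer_class")
    if old_class != new_class:
        config["tokenizer_class"] = new_class
        # Write the file back, leaving all other keys unchanged.
        path.write_text(
            json.dumps(config, indent=2, ensure_ascii=False) + "\n",
            encoding="utf-8",
        )
    return old_class
```

After the file is updated, rerunning WizardCoderTokenizer.from_pretrained(…) should resolve to WizardCoderTokenizer rather than GPT2Tokenizer.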