Environment: loading the tokenizer of a quantized Baichuan model with AutoTokenizer
Original tokenizer-loading code:
import torch
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    torch_dtype=torch.float32,
    use_fast=False,
)
Error: ValueError: Tokenizer class BaichuanTokenizer does not exist or is not currently imported
This happens because BaichuanTokenizer is custom code shipped inside the model repository rather than a tokenizer class built into transformers.
Fixed code:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    model_id,
    use_fast=False,
    trust_remote_code=True,  # allow running the custom BaichuanTokenizer class from the repo
)
# torch_dtype is a model-loading argument; tokenizers ignore it, so it is omitted here.
Adding trust_remote_code=True tells transformers to execute the custom tokenizer code bundled with the model repository. Once the tokenizer loads, it can also be saved during quantization:
tokenizer.save_pretrained('YOUR_PATH')
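The load/save/reload round trip can be sketched as below. This is a minimal sketch: `model_id`, the save path, and the helper name `save_and_reload_tokenizer` are placeholders, not part of the original note. Note that reloading from the saved directory still requires trust_remote_code=True, since the saved files reference the same custom BaichuanTokenizer class.

```python
from transformers import AutoTokenizer


def save_and_reload_tokenizer(model_id: str, save_path: str):
    # Load the tokenizer; trust_remote_code=True lets transformers run the
    # custom BaichuanTokenizer class shipped inside the model repository.
    tokenizer = AutoTokenizer.from_pretrained(
        model_id,
        use_fast=False,
        trust_remote_code=True,
    )
    # Persist the tokenizer alongside the quantized weights so both can be
    # reloaded together later.
    tokenizer.save_pretrained(save_path)
    # Reloading from the saved directory still needs trust_remote_code=True,
    # because the directory points at the same custom tokenizer code.
    return AutoTokenizer.from_pretrained(save_path, trust_remote_code=True)
```

In practice `save_path` would be the same directory the quantized model weights are written to, so the whole artifact can be shipped and loaded as one unit.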