Fine-tuning command
CUDA_VISIBLE_DEVICES=0 python /aaabbb/LLaMA-Factory/src/train_bash.py \
--stage sft \
--model_name_or_path /aaabbb/LLaMA-Factory/models/chatglm2-6b \
--do_train \
--dataset self_cognition \
--template chatglm2 \
--finetuning_type lora \
--lora_target query_key_value \
--output_dir output/chatglm2_sft_lora_self/ \
--overwrite_cache \
--per_device_train_batch_size 4 \
--gradient_accumulation_steps 4 \
--lr_scheduler_type cosine \
--logging_steps 10 \
--save_steps 10 \
--learning_rate 5e-5 \
--num_train_epochs 10 \
--plot_loss \
--fp16
After fine-tuning, a checkpoint-50 directory is produced. Then run the following cli_demo command:
python src/cli_demo.py \
--model_name_or_path /aaabbb/LLaMA-Factory/models/chatglm2-6b \
--template chatglm2 \
--finetuning_type lora \
--checkpoint_dir output/chatglm2_sft_lora_self/checkpoint-50/ \
--fp16
At this point, typing any prompt (here 你是谁?, "Who are you?") produces off-topic garbage like the following, and then the demo crashes with an error:
User: 你是谁?
Assistant: 许多人教育教学 Derby问题导向辛亥革命捗xtonodus玖冇 conting在今年析osto国画゚瞟结核otech灑Exception in thread Thread-4 (generate):
Traceback (most recent call last):
File "/xx_llama_factory_py310/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/xx_llama_factory_py310/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/xx_llama_factory_py310/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "/xx_llama_factory_py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 1652, in generate
return self.sample(
File "/xx_llama_factory_py310/lib/python3.10/site-packages/transformers/generation/utils.py", line 2781, in sample
streamer.put(next_tokens.cpu())
File "/xx_llama_factory_py310/lib/python3.10/site-packages/transformers/generation/streamers.py", line 97, in put
text = self.tokenizer.decode(self.token_cache, **self.decode_kwargs)
File "/xx_llama_factory_py310/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 3738, in decode
return self._decode(
File "/xx_llama_factory_py310/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 1001, in _decode
filtered_tokens = self.convert_ids_to_tokens(token_ids, skip_special_tokens=skip_special_tokens)
File "/xx_llama_factory_py310/lib/python3.10/site-packages/transformers/tokenization_utils.py", line 982, in convert_ids_to_tokens
tokens.append(self._convert_id_to_token(index))
File "/xxxcache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 125, in _convert_id_to_token
return self.tokenizer.convert_id_to_token(index)
File "/xxxcache/huggingface/modules/transformers_modules/chatglm2-6b/tokenization_chatglm.py", line 60, in convert_id_to_token
return self.sp_model.IdToPiece(index)
File "/xx_llama_factory_py310/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1045, in _batched_func
return _func(self, arg)
File "/xx_llama_factory_py310/lib/python3.10/site-packages/sentencepiece/__init__.py", line 1038, in _func
raise IndexError('piece id is out of range.')
IndexError: piece id is out of range.
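This IndexError hints at the mechanism: ChatGLM2 pads its output embedding beyond the SentencePiece vocabulary, so a model with broken weights can sample token ids in the padding range, which sp_model.IdToPiece cannot map back to a piece. A minimal sketch to compare the two sizes, assuming the stock chatglm2-6b files (whose custom config exposes padded_vocab_size):

# Compare the decodable vocabulary with the padded embedding size; any
# sampled id between the two cannot be decoded and raises the error above.
from transformers import AutoConfig, AutoTokenizer

path = "/aaabbb/LLaMA-Factory/models/chatglm2-6b"
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
cfg = AutoConfig.from_pretrained(path, trust_remote_code=True)
print("tokenizer vocab size:", tok.vocab_size)
print("padded_vocab_size:", cfg.padded_vocab_size)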
With this training command, the loss stays at 0 for the entire run and never changes.
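To confirm this from the run artifacts rather than the console, the HF Trainer writes a trainer_state.json into every checkpoint, and its log_history list records each logged step; a short sketch:

# Print the loss recorded at every logging step of the run.
import json

with open("output/chatglm2_sft_lora_self/checkpoint-50/trainer_state.json") as f:
    state = json.load(f)
for entry in state["log_history"]:
    if "loss" in entry:
        print(entry["step"], entry["loss"])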
Cause:
- A loss stuck at 0 means the (fp16) computation overflowed; check whether the model files are the latest version.
- Also check that the downloaded .bin weight files are intact and correct; see the hash sketch below.
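A simple way to check the .bin shards is to hash them and compare against the SHA256 values that Hugging Face displays under "LFS" on each file's page in the repo; a minimal sketch:

# Hash every weight shard; compare each digest by eye with the SHA256
# listed on the corresponding file page of the THUDM/chatglm2-6b repo.
import hashlib
import pathlib

model_dir = pathlib.Path("/aaabbb/LLaMA-Factory/models/chatglm2-6b")
for shard in sorted(model_dir.glob("*.bin")):
    h = hashlib.sha256()
    with open(shard, "rb") as f:
        while block := f.read(1 << 20):
            h.update(block)
    print(shard.name, h.hexdigest())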
Fix:
- Use the latest model files and make sure the download is complete (on some networks it is hard to guarantee that files this large come down intact); with correct weights the problem goes away.
- See this post for a way to download the model reliably:
- https://blog.csdn.net/ybdesire/article/details/134204332
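If the shards turn out to be corrupt, one dependable option is to re-fetch them with huggingface_hub, which resumes interrupted transfers instead of keeping a truncated file; a sketch, assuming direct access to the Hub (swap in a mirror endpoint if your network requires one):

# Re-download the full repo into the directory the training command expects.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="THUDM/chatglm2-6b",
    local_dir="/aaabbb/LLaMA-Factory/models/chatglm2-6b",
)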