[INFO|modeling_utils.py:4288] 2024-09-03 20:26:29,014 >> All the weights of ChatGLMForConditionalGeneration were initialized from the model checkpoint at /mnt/workspace/models/glm-4-9b-chat.
If your task is similar to the task the model of the checkpoint was trained on, you can already use ChatGLMForConditionalGeneration for predictions without further training.
[INFO|configuration_utils.py:915] 2024-09-03 20:26:29,018 >> loading configuration file /mnt/workspace/models/glm-4-9b-chat/generation_config.json
[INFO|configuration_utils.py:962] 2024-09-03 20:26:29,018 >> Generate config GenerationConfig {
"do_sample": true,
"eos_token_id": [
151329,
151336,
151338
],
"max_length": 128000,
"pad_token_id": 151329,
"temperature": 0.8,
"top_p": 0.8
}
09/03/2024 20:26:29 - INFO - llamafactory.model.model_utils.attention - Using torch SDPA for faster training and inference.
09/03/2024 20:26:29 - INFO - llamafactory.model.loader - all params: 9,399,951,360
09/03/2024 20:26:29 - WARNING - llamafactory.chat.hf_engine - There is no current event loop, creating a new one.
Exception in thread Thread-8 (generate):
Traceback (most recent call last):
File "/usr/local/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
self.run()
File "/usr/local/lib/python3.10/threading.py", line 953, in run
self._target(*self._args, **self._kwargs)
File "/mnt/workspace/projects/LLaMA-Factory-0.8.3/lf_8.3_env/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/mnt/workspace/projects/LLaMA-Factory-0.8.3/lf_8.3_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 1758, in generate
result = self._sample(
File "/mnt/workspace/projects/LLaMA-Factory-0.8.3/lf_8.3_env/lib/python3.10/site-packages/transformers/generation/utils.py", line 2449, in _sample
model_kwargs = self._update_model_kwargs_for_generation(
File "/root/.cache/huggingface/modules/transformers_modules/glm-4-9b-chat/modeling_chatglm.py", line 929, in _update_model_kwargs_for_generation
cache_name, cache = self._extract_past_from_model_output(outputs)
ValueError: too many values to unpack (expected 2)
glm4-9b-chat throws the error above when it is loaded and fine-tuned with LLaMA-Factory. The root cause is a package-version mismatch in the project. This has already been fixed in the latest LLaMA-Factory, so make sure to download the latest version of the project.
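To see why a version mismatch produces exactly this traceback, here is a minimal, standalone sketch of the unpacking failure. The two stub functions are stand-ins for different transformers versions (not the real APIs): glm-4's `modeling_chatglm.py` unpacks a `(cache_name, cache)` pair, but an incompatible transformers returns only the cache object itself, so Python tries to unpack one per-layer entry per target and fails.

```python
# Minimal illustration of the mismatch behind
# "ValueError: too many values to unpack (expected 2)".
# Both functions below are hypothetical stand-ins, not real transformers code.

def extract_past_incompatible(outputs):
    # Incompatible transformers version: returns the cache object itself,
    # here a stand-in tuple with one entry per layer.
    return ("layer0_kv", "layer1_kv", "layer2_kv")

def extract_past_expected(outputs):
    # Compatible version: returns a (cache_name, cache) pair,
    # which is what modeling_chatglm.py expects to unpack.
    return "past_key_values", ("layer0_kv", "layer1_kv", "layer2_kv")

cache_name, cache = extract_past_expected(None)  # fine: exactly two values

try:
    cache_name, cache = extract_past_incompatible(None)  # three values, two targets
except ValueError as e:
    msg = str(e)
print(msg)  # too many values to unpack (expected 2)
```

This is why the fix is an environment update rather than a code change in your training script: the unpack site and the function it calls must come from matching versions.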
Do not download the zip archive from the releases page on GitHub:
Instead, download the main branch from GitHub directly:
glm4-9b-chat can be downloaded from ModelScope or Hugging Face:
ModelScope: https://www.modelscope.cn/models/ZhipuAI/glm-4-9b-chat/files
Hugging Face mirror (for mainland China): https://hf-mirror.com/THUDM/glm-4-9b-chat