@[Lora][Fine-tuning] Qwen-VL/Qwen-VL-Chat Fine-tuning Issues
A summary of problems encountered while fine-tuning Qwen-VL with LoRA.
Model Pre-training
- Error 1: "erfinv_cuda" not implemented for 'BFloat16'
RuntimeError: "erfinv_cuda" not implemented for 'BFloat16'
Following the advice given in GitHub issue #253, modify the relevant code in Qwen-VL-Chat/visual.py:
# visual.py, line 18
# from torch.nn.init import trunc_normal_
from torch.nn.init import normal_

# visual.py, line 117
# trunc_normal_(self.query, std=.02)
normal_(self.query, std=.02)

# visual.py, line 132
# trunc_normal_(m.weight, std=.02)
normal_(m.weight, std=.02)
The root cause is plainly a compatibility problem between erfinv_cuda and BFloat16, so in theory setting --bf16 to False should sidestep it. However, the Qwen documentation notes:
"We support mixed-precision training, so you can set --bf16 True or --fp16 True. Empirically, if your machine supports bf16, we recommend bf16, which keeps your fine-tuning consistent with our pre-training and alignment training; this is also why we make it the default configuration."
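If you would rather keep --bf16 True and still use a truncated initialization, a possible middle ground (my own sketch, not from the issue) is to run trunc_normal_ on a float32 copy and copy the result back, since erfinv, which trunc_normal_ calls internally, lacks a BFloat16 CUDA kernel on the affected PyTorch versions:

```python
import torch
from torch.nn.init import trunc_normal_

# Sketch: keep truncated-normal init under bf16 by initializing in fp32.
# erfinv (called inside trunc_normal_) has a float32 CUDA kernel, so the
# init runs in fp32 and the result is copied back into the bf16 tensor.
param = torch.empty(256, 4096, dtype=torch.bfloat16, device="cuda")
tmp = param.detach().float()   # fp32 working copy
trunc_normal_(tmp, std=.02)
param.copy_(tmp)               # cast back to bfloat16 in place
```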
Also note that normal_ differs from trunc_normal_ in that trunc_normal_ constrains values to the interval [a, b], which changes the attainable minimum and maximum of the initialization; it is unclear whether this meaningfully affects training. From the PyTorch docs:
torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0, generator=None)
Fill the input Tensor with values drawn from a truncated normal distribution. The values are effectively drawn from the normal distribution N(mean, std²) with values outside [a, b] redrawn until they are within the bounds. The method used for generating the random values works best when a ≤ mean ≤ b.
torch.nn.init.normal_(tensor, mean=0.0, std=1.0, generator=None)
Fill the input Tensor with values drawn from the normal distribution N(mean, std²).
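Note also that with std=.02 the default bounds a=-2.0 and b=2.0 lie 100 standard deviations from the mean, so the truncation almost never triggers and the two initializers should behave nearly identically in this setting. A quick sanity check:

```python
import torch
from torch.nn.init import normal_, trunc_normal_

t_normal = normal_(torch.empty(100_000), std=.02)
t_trunc = trunc_normal_(torch.empty(100_000), std=.02)  # default a=-2.0, b=2.0

# With std=.02, values beyond +/-2.0 are ~100-sigma events, so both
# tensors land in roughly the same range in practice.
print(t_normal.abs().max().item())  # typically around 0.08-0.1
print(t_trunc.abs().max().item())   # about the same, hard-capped at 2.0
```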
- Error 2: Only Support Self-Attention Currently
AssertionError: Only Support Self-Attention Currently
Following the advice from 魩雨 on Zhihu, modify the relevant code in Qwen-VL-Chat/visual.py. (That post also lists some other errors for reference, e.g. DeepSpeed-related ones; I have not encountered those in training yet and will verify the fixes if they come up later.)
# visual.py, line 192
# assert query is key, 'Only Support Self-Attention Currently'
assert torch.allclose(query, key), 'Only Support Self-Attention Currently'
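The reason the patch works: is tests object identity, so the original assertion fails whenever query and key hold equal values but are distinct tensor objects, while torch.allclose compares the values themselves. A minimal illustration:

```python
import torch

q = torch.randn(4, 8)
k = q.clone()   # same values, but a different tensor object

print(q is k)                # False: identity check fails
print(torch.allclose(q, k))  # True: element-wise value comparison passes
```

Note that torch.allclose performs an element-wise comparison on every forward pass, so it adds a small overhead; torch.equal would be a stricter exact-match alternative.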
Model Merging
- Error 3: 'QWenTokenizer' object has no attribute 'IMAGE_ST'
AttributeError: 'QWenTokenizer' object has no attribute 'IMAGE_ST'
Following the suggestion in GitHub issue #287, modify tokenization_qwen.py so that super().__init__(**kwargs) runs only after the tag attributes are assigned:
class QWenTokenizer(PreTrainedTokenizer):
    ...
        # super().__init__(**kwargs)   # moved: must run after the attributes below
        self.image_start_tag = image_start_tag
        self.image_end_tag = image_end_tag
        self.image_pad_tag = image_pad_tag
        self.ref_start_tag = ref_start_tag
        self.ref_end_tag = ref_end_tag
        self.box_start_tag = box_start_tag
        self.box_end_tag = box_end_tag
        self.quad_start_tag = quad_start_tag
        self.quad_end_tag = quad_end_tag
        self.IMAGE_ST = (
            ref_start_tag, ref_end_tag,
            box_start_tag, box_end_tag,
            quad_start_tag, quad_end_tag,
            image_start_tag, image_end_tag,
            image_pad_tag,
        )
        super().__init__(**kwargs)
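The reordering matters because newer transformers releases have PreTrainedTokenizer.__init__ call back into subclass methods while registering special tokens, and in QWenTokenizer those methods touch self.IMAGE_ST. A stripped-down illustration of the failure mode (class and method names here are hypothetical stand-ins):

```python
class Base:
    def __init__(self):
        # Stand-in for PreTrainedTokenizer.__init__, which in newer
        # transformers releases calls back into subclass methods.
        self.register()

class BrokenTokenizer(Base):
    def __init__(self):
        super().__init__()                   # hook fires before IMAGE_ST exists
        self.IMAGE_ST = ("<img>", "</img>")  # assigned too late

    def register(self):
        print(self.IMAGE_ST)                 # needs the attribute already set

try:
    BrokenTokenizer()
except AttributeError as e:
    print(e)  # 'BrokenTokenizer' object has no attribute 'IMAGE_ST'
```

Setting the attributes first and calling super().__init__(**kwargs) last, as in the patch above, avoids this.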
- Error 4: size mismatch for base_model.model.transformer.wte.modules_to_save.default.weight
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
    size mismatch for base_model.model.transformer.wte.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151860, 4096]).
    size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151860, 4096]).
Per issue #415, modify line 45 of tokenization_qwen.py in the Qwen-VL model folder to pad out the 76 untrained tokens. The change is as follows:
# EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(281)))
The tokenizer currently reports self.tokenizer.n_vocab as 151860; that number is the length of qwen.tiktoken (151643) plus 217 special tokens. The model's config file, however, declares "vocab_size": 151936, so a LoRA-fine-tuned Qwen-VL checkpoint cannot be aligned with the base model: 76 tokens are still missing.
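The bookkeeping can be checked directly; assuming the usual tokenization_qwen.py layout, the 217 special tokens break down as 3 control tokens (ENDOFTEXT, IM_START, IM_END), the 205 EXTRAS, and the 9 image/ref/box/quad tags:

```python
base_vocab = 151643                     # entries in qwen.tiktoken
special = 3 + 205 + 9                   # control tokens + EXTRAS + image tags
print(base_vocab + special)             # 151860, matches self.tokenizer.n_vocab
print(151936 - (base_vocab + special))  # 76, the gap to config vocab_size
print(281 - 205)                        # 76, exactly what the enlarged EXTRAS adds
```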
If you use Qwen-VL-Chat instead, this is not a concern, because finetune.py contains:
if lora_args.q_lora or "chat" in model_args.model_name_or_path.lower():
    modules_to_save = None
else:
    modules_to_save = ["wte", "lm_head"]
With Q-LoRA or a chat model, these two modules are never added to modules_to_save, so the size mismatch cannot occur.
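For reference, modules_to_save is then passed through to the PEFT configuration roughly as below (a sketch: the r/alpha/dropout values and target_modules are illustrative, check your local finetune.py for the exact ones):

```python
from peft import LoraConfig

modules_to_save = None  # the Q-LoRA / chat branch from the snippet above

lora_config = LoraConfig(
    r=64,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["c_attn", "attn.c_proj", "w1", "w2"],
    modules_to_save=modules_to_save,  # None => wte/lm_head are not wrapped or
                                      # saved, so no vocab-size mismatch on load
    task_type="CAUSAL_LM",
)
```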