[Lora][微调] Qwen-VL/Qwen-VL-chat微调问题

kai_io

已于 2024-06-21 11:30:55 修改

阅读量1k

点赞数 5

文章标签： python

于 2024-06-20 09:21:08 首次发布

本文链接：https://blog.csdn.net/qq_16759959/article/details/139818605

版权

@[Lora][微调] Qwen-VL/Qwen-VL-chat微调问题

关于Qwen-VL在lora过程中出现的问题总结。

模型预训练

错误一 “erfinv_cuda” not implemented for ‘BFloat16’

RuntimeError: "erfinv_cuda" not implemented for 'BFloat16'

参考github中issue253给出的意见，修改Qwen-VL-Chat/visual.py下的相关内容。

# visual.py 第18行
# from torch.nn.init import trunc_normal_
from torch.nn.init import normal_

# visual.py 第117行
# trunc_normal_(self.query, std=.02)
normal_(self.query, std=.02)

# visual.py 第132行
# trunc_normal_(m.weight, std=.02)
normal_(m.weight, std=.02)

其实报错原因可以很明显的看出来是由于erfinv_cuda和BFloat16之间的兼容性问题，理论上应该可以将 --bf16 置为False避免这个问题，不过：

我们支持混合精度训练，因此你可以设置–bf16 True或者–fp16 True。经验上，如果你的机器支持bf16，我们建议使用bf16，这样可以和我们的预训练和对齐训练保持一致，这也是为什么我们把默认配置设为它的原因。

另外，normal_与trunc_normal_不同之处在于,trunc_normal_会将数据控制在a和b之间，有造成模型初始化最大值和最小值不同，不清楚对于训练的影响是否会存在很大差异：

torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0, generator=None)

torch.nn.init.trunc_normal_(tensor, mean=0.0, std=1.0, a=-2.0, b=2.0, generator=None)
Fill the input Tensor with values drawn from a truncated normal distribution.
The values are effectively drawn from the normal distribution N(mean,std²)N(mean,std²) with values outside [*a,b*][*a,b*] redrawn until they are within the bounds. The method used for generating the random values works best when a≤mean≤b.

torch.nn.init.normal_(tensor, mean=0.0, std=1.0, generator=None)

Fill the input Tensor with values drawn from the normal distribution.
N(mean,std²)N(mean,std²).

错误二 Only Support Self-Attention Currently

AssertionError: Only Support Self-Attention Currently

参考知乎魩雨给出的意见，还有一些其他报错可以参考，比如deepspeed等，这些暂时训练时还未遇到，若后期遇到进一步验证，修改Qwen-VL-Chat/visual.py下的相关内容。

# visual.py 第192行
# assert query is key, 'Only Support Self-Attention Currently'
assert torch.allclose(query, key), 'Only Support Self-Attention Currently'

模型整合

错误三 ‘QWenTokenizer’ object has no attribute ‘IMAGE_ST’

AttributeError: 'QWenTokenizer' object has no attribute 'IMAGE_ST'

参考github中issue287给出的意见，修改tokenization_qwen.py下的相关内容。

class QWenTokenizer(PreTrainedTokenizer):
    ...
    # super().__init__(**kwargs)
    self.image_start_tag = image_start_tag
    self.image_end_tag = image_end_tag
    self.image_pad_tag = image_pad_tag
    self.ref_start_tag = ref_start_tag
    self.ref_end_tag = ref_end_tag
    self.box_start_tag = box_start_tag
    self.box_end_tag = box_end_tag
    self.quad_start_tag = quad_start_tag
    self.quad_end_tag = quad_end_tag
    self.IMAGE_ST = (
        ref_start_tag, ref_end_tag,
        box_start_tag, box_end_tag,
        quad_start_tag, quad_end_tag,
        image_start_tag, image_end_tag,
        image_pad_tag
    )
    super().__init__(**kwargs)

错误四：size mismatch for base_model.model.transformer.wte.modules_to_save.default.weight

RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.transformer.wte.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151860, 4096]).
size mismatch for base_model.model.lm_head.modules_to_save.default.weight: copying a param with shape torch.Size([151936, 4096]) from checkpoint, the shape in current model is torch.Size([151860, 4096]).

预留参考issue415，修改模型文件夹Qwen-VL下的tokenization_qwen.py第45行内容，填补未训练的76个tokens：
代码修改如下：

# EXTRAS = tuple((f"<|extra_{i}|>" for i in range(205)))
EXTRAS = tuple((f"<|extra_{i}|>" for i in range(281)))

目前模型给出的self.tokenizer.n_vocab的长度为151860 ，这个数字是qwen.tiktoken的长度151643 + 217个特殊字符的个数计算而来，而模型的配置文件中的长度为 “vocab_size”: 151936 ，造成Qwen-VL经过lora微调后无法对齐，目前还缺少76个字符

如果采用Qwen-VL-chat的话则不用担心，因为finetune.py：

if lora_args.q_lora or "chat" in model_args.model_name_or_path.lower():
    modules_to_save = None
else:
    modules_to_save = ["wte", "lm_head"]

使用q_lora和chat模型的话，是不会引入这两个参数的。

kai_io

关注

5
点赞
踩
8

收藏

觉得还不错? 一键收藏
2
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫