MiniCPM-V 2.6, ModelBest's New-Generation "Little Steel Cannon": Fine-tuning Guide

强化学习曾小健

Published 2024-08-10 22:37:31

Original link: https://blog.csdn.net/sinat_37574187/article/details/141097645

OpenBMB open-source community, 2024-08-09 13:15, Beijing

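MiniCPM-V 2.6, ModelBest's new "Little Steel Cannon" model, is here! It topped GitHub Trending within two days of release and has been warmly received by the open-source community. The MiniCPM-V series now has more than 9,000 stars 🌟 Thank you all for supporting Little Steel Cannon, let's keep going 💪🏻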

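Today we bring you the "MiniCPM-V 2.6 Fine-tuning Guide". It is a hands-on walkthrough of both full-parameter and LoRA fine-tuning of MiniCPM-V 2.6, so you can build your own customized on-device multimodal model!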

➤ Model introduction

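🔗 Multi-image and video understanding on device for the first time! Three SOTA results: ModelBest's Little Steel Cannon opens a new era of fully matching GPT-4V on device!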

➤ GitHub repository

🔗 https://github.com/OpenBMB/MiniCPM-V

➤ HuggingFace page

🔗 https://huggingface.co/openbmb/MiniCPM-Llama3-V-2_5

➤ Companion video on Bilibili (best watched alongside this guide)

🔗 https://www.bilibili.com/video/BV1YT42167mF/

Reply "小钢炮" to our WeChat official account to unlock the full knowledge base.

MiniCPM-V 2.6 Training Guide

1. Clone the MiniCPM-V code from GitHub

git clone https://github.com/OpenBMB/MiniCPM-V.git

2. Install the dependencies

cd MiniCPM-V
pip install -r requirements.txt

3. Prepare the dataset

Convert your dataset into the following format:

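Each "id" must be unique.
"<image>\n" should appear at the beginning of each sample's conversation (the start of the first user turn).
The path given in "image" must point to an existing image file.
Each "conversations" list holds one multi-turn dialogue: "content" is the message text, "role": "user" marks user input, and "role": "assistant" marks model output.
Each sample contains only one image.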

[
  {
    "id": "0",
    "conversations": [
      { "content": "<image>\nWho are they?", "role": "user" },
      { "content": "They're Kane and Gretzka from Bayern Munich.", "role": "assistant" },
      { "content": "What are they doing?", "role": "user" },
      { "content": "They are celebrating on the soccer field.", "role": "assistant" }
    ],
    "image": "/root/ld/ld_project/LLaMA-Factory/data/mllm_demo_data/1.jpg"
  }
]

The example above is a single sample. The list can contain any number of samples in the same format.
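It can help to check a data file against the rules above before launching a training run. The following is a minimal Python sketch. The function name validate_minicpmv_dataset is hypothetical and not part of the MiniCPM-V repository. The strict user/assistant alternation check is an assumption drawn from the description of multi-turn dialogues, not a documented requirement.

```python
import json
import os
from typing import Any


def validate_minicpmv_dataset(json_path: str, check_images: bool = True) -> list[str]:
    """Return a list of problems found in a fine-tuning JSON file (hypothetical helper).

    Checks: unique ids, "<image>\\n" at the start of the first turn, a single
    image path that exists on disk, and (assumed) alternating user/assistant turns.
    """
    with open(json_path, encoding="utf-8") as f:
        samples: list[dict[str, Any]] = json.load(f)

    problems: list[str] = []
    seen_ids: set[str] = set()
    for idx, sample in enumerate(samples):
        # Rule: id values must not repeat
        sid = str(sample.get("id"))
        if sid in seen_ids:
            problems.append(f"sample {idx}: duplicate id {sid!r}")
        seen_ids.add(sid)

        # Rule: exactly one image per sample, and the file must exist
        image = sample.get("image")
        if not isinstance(image, str):
            problems.append(f"sample {idx}: 'image' must be a single path string")
        elif check_images and not os.path.isfile(image):
            problems.append(f"sample {idx}: image not found: {image}")

        turns = sample.get("conversations", [])
        if not turns:
            problems.append(f"sample {idx}: empty conversations")
            continue
        # Rule: the conversation starts with the image placeholder
        if not str(turns[0].get("content", "")).startswith("<image>\n"):
            problems.append(f"sample {idx}: first turn must start with '<image>\\n'")
        # Assumption: turns alternate user, assistant, user, ...
        for t, turn in enumerate(turns):
            expected = "user" if t % 2 == 0 else "assistant"
            if turn.get("role") != expected:
                problems.append(f"sample {idx}, turn {t}: expected role {expected!r}")
    return problems
```

For example, validate_minicpmv_dataset("finetune/mllm_demo.json") returns an empty list when no problem is found. Pass check_images=False to validate the structure on a machine that does not have the images.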

4. LoRA fine-tuning

Edit MiniCPM-V/finetune/finetune_lora.sh:

#!/bin/bash

GPUS_PER_NODE=8   # number of GPUs per node; 8 for a single machine with eight GPUs
NNODES=1          # number of nodes; 1 for a single server
NODE_RANK=0       # rank of this node among the training nodes
MASTER_ADDR=localhost
MASTER_PORT=6001

MODEL="/root/ld/ld_model_pretrained/Minicpmv2_6"  # local model path or Hugging Face model ID
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/root/ld/ld_project/MiniCPM-V/finetune/mllm_demo.json"       # training data file
EVAL_DATA="/root/ld/ld_project/MiniCPM-V/finetune/mllm_demo.json"  # validation data file
LLM_TYPE="qwen2"  # if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm

export NCCL_P2P_DISABLE=1  # remove this line on GPUs that support NCCL P2P, such as the A100
export NCCL_IB_DISABLE=1   # remove this line on GPUs such as the A100

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

# The arguments are kept in a bash array so each one can carry its own comment.
# A comment placed after a trailing "\" would end the command early.
TRAIN_ARGS=(
    --model_name_or_path "$MODEL"
    --llm_type "$LLM_TYPE"
    --data_path "$DATA"
    --eval_data_path "$EVAL_DATA"
    --remove_unused_columns false
    --label_names "labels"              # required by the data construction; do not change
    --prediction_loss_only false
    --bf16 false                        # train in bf16; can be enabled on 4090, A100, H100, etc.
    --bf16_full_eval false              # evaluate in bf16
    --fp16 true                         # train in fp16
    --fp16_full_eval true               # evaluate in fp16
    --do_train                          # run training
    --do_eval                           # run evaluation during training
    --tune_vision true                  # fine-tune the SigLIP (ViT) vision module
    --tune_llm false                    # fine-tune the language model
    --use_lora true                     # use LoRA fine-tuning
    --lora_target_modules "llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj)"  # regex for the layers LoRA is inserted into; best left unchanged
    --model_max_length 2048             # maximum sequence length during training
    --max_slice_nums 9                  # maximum number of image slices
    --max_steps 10000                   # maximum number of training steps
    --eval_steps 1000                   # evaluate every N steps
    --output_dir output/output_minicpmv2_lora   # where LoRA checkpoints are saved
    --logging_dir output/output_minicpmv2_lora  # where logs are saved
    --logging_strategy "steps"          # logging strategy ("epoch" also possible)
    --per_device_train_batch_size 2     # training batch size per GPU
    --per_device_eval_batch_size 1      # evaluation batch size per GPU
    --gradient_accumulation_steps 8     # with limited GPU memory, raise this and lower per_device_train_batch_size
    --evaluation_strategy "steps"       # evaluation strategy ("epoch" also possible)
    --save_strategy "steps"             # save strategy ("epoch" also possible); works together with save_steps
    --save_steps 10                     # save a checkpoint every 10 steps
    --save_total_limit 10               # maximum number of checkpoints kept
    --learning_rate 1e-6                # learning rate
    --weight_decay 0.1                  # weight decay
    --adam_beta2 0.95                   # Adam beta2
    --warmup_ratio 0.01                 # warmup steps = total training steps * warmup_ratio
    --lr_scheduler_type "cosine"        # learning-rate scheduler
    --logging_steps 1
    --gradient_checkpointing true       # recommended; greatly reduces GPU memory use
    --deepspeed ds_config_zero3.json    # ZeRO-3; with enough GPU memory, ds_config_zero2.json is recommended
    --report_to "tensorboard"           # log the loss to "tensorboard" or "wandb"
)

torchrun $DISTRIBUTED_ARGS finetune.py "${TRAIN_ARGS[@]}"
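The comments on gradient_accumulation_steps and warmup_ratio describe simple arithmetic, sketched below in Python. Two assumptions apply. First, torchrun starts one process per GPU (--nproc_per_node). Second, max_steps × warmup_ratio is rounded up to a whole number of steps, which is the Hugging Face Trainer's behavior. The helper names are hypothetical.

```python
import math


def effective_batch_size(per_device: int, grad_accum: int, gpus_per_node: int, nnodes: int = 1) -> int:
    """Samples consumed per optimizer step across all GPUs (one process per GPU assumed)."""
    return per_device * grad_accum * gpus_per_node * nnodes


def warmup_steps(max_steps: int, warmup_ratio: float) -> int:
    """Warmup steps = total training steps * warmup_ratio, rounded up to an integer."""
    return math.ceil(max_steps * warmup_ratio)


# Values taken from finetune_lora.sh above
print(effective_batch_size(per_device=2, grad_accum=8, gpus_per_node=8))  # 128
print(warmup_steps(max_steps=10000, warmup_ratio=0.01))                    # 100
```

Halving per_device_train_batch_size to 1 while doubling gradient_accumulation_steps to 16 keeps the effective batch at 128. This is the trade-off the comment suggests when GPU memory is short.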

Key parameters to pay attention to:

MODEL="/root/ld/ld_model_pretrained/Minicpmv2_6"                   # local model path or Hugging Face model ID
DATA="/root/ld/ld_project/MiniCPM-V/finetune/mllm_demo.json"       # training data file
EVAL_DATA="/root/ld/ld_project/MiniCPM-V/finetune/mllm_demo.json"  # validation data file

--tune_vision true                  # fine-tune the SigLIP (ViT) vision module
--tune_llm false                    # fine-tune the language model
--use_lora true                     # use LoRA fine-tuning
--lora_target_modules "llm\..*layers\.\d+\.self_attn\.(q_proj|k_proj|v_proj|o_proj)"  # regex for the layers LoRA is inserted into; best left unchanged
--model_max_length 2048             # maximum training sequence length; rule of thumb: 1000 + number of text characters / 1.5
--per_device_train_batch_size 2     # training batch size per GPU
--per_device_eval_batch_size 1      # evaluation batch size per GPU
--gradient_accumulation_steps 1     # with limited GPU memory, raise this and lower per_device_train_batch_size
--learning_rate 1e-6                # learning rate
--gradient_checkpointing true       # recommended; greatly reduces GPU memory use
--deepspeed ds_config_zero3.json    # ZeRO-3; with enough GPU memory, ds_config_zero2.json is recommended
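The rule of thumb for --model_max_length (1000 + number of text characters / 1.5) can be applied to a whole data file. The following minimal Python sketch does this. The helper name is hypothetical, and both constants are the heuristic from the comment, not values read from the model or tokenizer.

```python
import json
import math


def estimate_model_max_length(json_path: str, image_tokens: int = 1000, chars_divisor: float = 1.5) -> int:
    """Estimate the longest sample as image_tokens + text_characters / chars_divisor.

    Hypothetical helper; the defaults mirror the rule of thumb
    "1000 + characters / 1.5" and are heuristics only.
    """
    with open(json_path, encoding="utf-8") as f:
        samples = json.load(f)
    longest = 0
    for sample in samples:
        # Count the text of every turn, ignoring the "<image>\n" placeholder
        chars = sum(len(turn["content"].replace("<image>\n", "")) for turn in sample["conversations"])
        longest = max(longest, math.ceil(image_tokens + chars / chars_divisor))
    return longest
```

If the estimate for your data exceeds 2048, the longest samples will not fit. You can raise --model_max_length, which costs more GPU memory, or shorten those samples.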

Start training:

cd MiniCPM-V/finetune
bash finetune_lora.sh

Merge the LoRA adapter into the model and save it:

import os
import shutil

from peft import PeftModel
from transformers import AutoModel, AutoTokenizer

model_type = "/root/ld/ld_model_pretrained/Minicpmv2_6"  # Local model path or huggingface id
path_to_adapter = "/root/ld/ld_project/minicpmv2_6/MiniCPM-V/finetune/output/output_minicpmv2_lora/checkpoint-30"  # Path to the saved LoRA adapter
merge_path = "/root/ld/ld_project/minicpmv2_6/MiniCPM-V/finetune/output/merge_minicpmv"  # Path to save the merged model


# Make sure no file of the original model is missing from merge_path
def copy_files_not_in_B(A_path, B_path):
    """
    Copies files from directory A to directory B if they exist in A but not in B.

    :param A_path: Path to the source directory (A).
    :param B_path: Path to the destination directory (B).
    """
    # Make sure both paths exist
    if not os.path.exists(A_path):
        raise FileNotFoundError(f"The directory {A_path} does not exist.")
    if not os.path.exists(B_path):
        os.makedirs(B_path)

    # List all non-weight files in directory A
    files_in_A = os.listdir(A_path)
    files_in_A = set([file for file in files_in_A if not (".bin" in file or "safetensors" in file)])
    # List all files in directory B
    files_in_B = set(os.listdir(B_path))

    # Find the files that exist in A but not in B
    files_to_copy = files_in_A - files_in_B

    # Copy those files into B (sub-directories are skipped: shutil.copy2 only copies files)
    for file in files_to_copy:
        src_file = os.path.join(A_path, file)
        dst_file = os.path.join(B_path, file)
        if os.path.isfile(src_file):
            shutil.copy2(src_file, dst_file)


# Load the original model
model = AutoModel.from_pretrained(
    model_type,
    trust_remote_code=True
)

# Load the LoRA adapter on top of the original model
lora_model = PeftModel.from_pretrained(
    model,
    path_to_adapter,
    device_map="auto",
    trust_remote_code=True
).eval()

# Merge the loaded LoRA weights into the original model
merge_model = lora_model.merge_and_unload()

# Save the newly merged model
merge_model.save_pretrained(merge_path, safe_serialization=False)

# Load the tokenizer and save it next to the merged model
tokenizer = AutoTokenizer.from_pretrained(model_type, trust_remote_code=True)
tokenizer.save_pretrained(merge_path)

copy_files_not_in_B(model_type, merge_path)
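merge_and_unload() folds each LoRA adapter into the weight it wraps, so the saved model no longer needs peft at inference time. The toy illustration below is plain Python, not PEFT's code. It assumes PEFT's default LoRA scaling of lora_alpha / r and shows that the merged weight W + (lora_alpha / r)·B·A gives the same output as running the base layer and the adapter side by side. The matrices and the lora_alpha and r values are made up.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


def merge_lora(w, lora_a, lora_b, lora_alpha, r):
    """Return the folded weight W + (lora_alpha / r) * B @ A."""
    scale = lora_alpha / r
    delta = matmul(lora_b, lora_a)
    return [[w[i][j] + scale * delta[i][j] for j in range(len(w[0]))] for i in range(len(w))]


# Toy 2x3 linear layer with a rank-1 adapter (hypothetical numbers)
LORA_ALPHA, R = 16, 1
W = [[1.0, 0.0, 2.0], [0.0, 1.0, -1.0]]
A = [[1.0, 2.0, 0.0]]    # shape r x in_features  (1 x 3)
B = [[0.5], [-1.0]]      # shape out_features x r (2 x 1)
x = [1.0, 1.0, 1.0]

# Unmerged forward pass: W x + (alpha / r) * B (A x)
scale = LORA_ALPHA / R
unmerged = [wx + scale * bax for wx, bax in zip(matvec(W, x), matvec(B, matvec(A, x)))]
# Merged forward pass: (W + (alpha / r) * B A) x
merged = matvec(merge_lora(W, A, B, LORA_ALPHA, R), x)
print(unmerged, merged)  # [27.0, -48.0] [27.0, -48.0]
```

Because the two paths are mathematically identical, merging changes only how the weights are stored, not what the fine-tuned model computes.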

5. Full-parameter fine-tuning

Edit the parameters in MiniCPM-V/finetune/finetune_ds.sh:

#!/bin/bash

GPUS_PER_NODE=8   # number of GPUs per node; 8 for a single machine with eight GPUs
NNODES=1          # number of nodes; 1 for a single server
NODE_RANK=0       # rank of this node among the training nodes
MASTER_ADDR=localhost
MASTER_PORT=6001

MODEL="/root/ld/ld_model_pretrained/Minicpmv2_6"  # local model path or Hugging Face model ID
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/root/ld/ld_project/MiniCPM-V/finetune/mllm_demo.json"       # training data file
EVAL_DATA="/root/ld/ld_project/MiniCPM-V/finetune/mllm_demo.json"  # validation data file
LLM_TYPE="qwen2"  # if use openbmb/MiniCPM-V-2, please set LLM_TYPE=minicpm

export NCCL_P2P_DISABLE=1  # remove this line on GPUs that support NCCL P2P, such as the A100
export NCCL_IB_DISABLE=1   # remove this line on GPUs such as the A100

DISTRIBUTED_ARGS="
    --nproc_per_node $GPUS_PER_NODE \
    --nnodes $NNODES \
    --node_rank $NODE_RANK \
    --master_addr $MASTER_ADDR \
    --master_port $MASTER_PORT
"

# The arguments are kept in a bash array so each one can carry its own comment.
# A comment placed after a trailing "\" would end the command early.
TRAIN_ARGS=(
    --model_name_or_path "$MODEL"
    --llm_type "$LLM_TYPE"
    --data_path "$DATA"
    --eval_data_path "$EVAL_DATA"
    --remove_unused_columns false
    --label_names "labels"              # required by the data construction; do not change
    --prediction_loss_only false
    --bf16 false                        # train in bf16; can be enabled on 4090, A100, H100, etc.
    --bf16_full_eval false              # evaluate in bf16
    --fp16 true                         # train in fp16
    --fp16_full_eval true               # evaluate in fp16
    --do_train                          # run training
    --do_eval                           # run evaluation during training
    --tune_llm true                     # fine-tune the language model
    --tune_vision true                  # fine-tune the vision module
    --model_max_length 2048             # maximum sequence length during training
    --max_slice_nums 9                  # maximum number of image slices
    --max_steps 10000                   # maximum number of training steps
    --eval_steps 1000                   # evaluate every N steps
    --output_dir output/output_minicpmv2_lora   # where model checkpoints are saved
    --logging_dir output/output_minicpmv2_lora  # where logs are saved
    --logging_strategy "steps"          # logging strategy ("epoch" also possible)
    --per_device_train_batch_size 2     # training batch size per GPU
    --per_device_eval_batch_size 1      # evaluation batch size per GPU
    --gradient_accumulation_steps 1     # with limited GPU memory, raise this and lower per_device_train_batch_size
    --evaluation_strategy "steps"       # evaluation strategy ("epoch" also possible)
    --save_strategy "steps"             # save strategy ("epoch" also possible); works together with save_steps
    --save_steps 10                     # save a checkpoint every 10 steps
    --save_total_limit 10               # maximum number of checkpoints kept
    --learning_rate 1e-6                # learning rate
    --weight_decay 0.1                  # weight decay
    --adam_beta2 0.95                   # Adam beta2
    --warmup_ratio 0.01                 # warmup steps = total training steps * warmup_ratio
    --lr_scheduler_type "cosine"        # learning-rate scheduler
    --logging_steps 1
    --gradient_checkpointing true       # recommended; greatly reduces GPU memory use
    --deepspeed ds_config_zero3.json    # ZeRO-3; with enough GPU memory, ds_config_zero2.json is recommended
    --report_to "tensorboard"           # log the loss to "tensorboard" or "wandb"
)

torchrun $DISTRIBUTED_ARGS finetune.py "${TRAIN_ARGS[@]}"
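The --deepspeed comment weighs ZeRO-3 against ZeRO-2. The back-of-the-envelope Python sketch below shows why, using the ZeRO paper's model-state accounting for mixed-precision Adam. That accounting assigns 2 bytes of fp16 weights, 2 bytes of fp16 gradients and 12 bytes of fp32 optimizer state per parameter. Three assumptions apply:

- MiniCPM-V 2.6 has roughly 8B parameters.
- The helper name is hypothetical.
- Activations and temporary buffers are ignored, so real memory use is higher.

```python
def zero_model_state_gb(num_params: float, num_gpus: int, stage: int) -> float:
    """Per-GPU memory (GB) for weights, gradients and Adam states in mixed precision.

    Uses the ZeRO paper's 2 + 2 + 12 bytes-per-parameter accounting.
    Activations, buffers and fragmentation are NOT included.
    """
    params, grads, optim = 2 * num_params, 2 * num_params, 12 * num_params
    if stage == 2:      # gradients and optimizer states are partitioned across GPUs
        total = params + (grads + optim) / num_gpus
    elif stage == 3:    # the weights are partitioned as well
        total = (params + grads + optim) / num_gpus
    else:
        raise ValueError("only ZeRO stages 2 and 3 are modeled here")
    return total / 1e9


# Assumed ~8B parameters, 8 GPUs on one node (GPUS_PER_NODE=8, NNODES=1)
for stage in (2, 3):
    print(f"ZeRO-{stage}: {zero_model_state_gb(8e9, num_gpus=8, stage=stage):.1f} GB per GPU")
```

Under these assumptions, ZeRO-2 needs about 30 GB per GPU for model states and ZeRO-3 about 16 GB. ZeRO-3 saves memory by also partitioning the weights, but gathering them adds communication during every forward and backward pass. This is why ZeRO-2 is the better choice when GPU memory allows.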

Start training:

cd MiniCPM-V/finetune
bash finetune_ds.sh