Qwen-VL大模型LoRA微调、融合及部署

天道酬勤者

已于 2024-07-19 15:49:08 修改

阅读量3.9k

点赞数 11

文章标签： python 算法 transformer 人工智能

于 2024-07-19 15:40:29 首次发布

本文链接：https://blog.csdn.net/songyang66/article/details/140486880

版权

1.服务器租赁

在 AutoDL 平台中租赁一个4090 等 24G 显存大小的容器实例

2.环境配置

conda create -n qwenvl python=3.11 -y
source activate qwenvl

conda install -y -c "nvidia/label/cuda-12.1.0" cuda-runtim #安装 cuda-runtime

3.下载模型

cd ~/autodl-tmp/ // 在~/autodl-tmp/创建如下目录

mkdir model //存放模型文件
cd model

git lfs install //确保 lfs 已经被正确安装

sudo apt-get update //否则更新包列表

sudo apt-get install git-lfs //安装 Git LFS

git clone https://github.com/QwenLM/Qwen-VL.git //下载包含所需依赖的github文件

git clone https://www.modelscope.cn/qwen/Qwen-VL-Chat.git //下载模型

git clone https://www.modelscope.cn/qwen/Qwen-VL-Chat-Int4.git //下载量化模型

4.依赖配置

cd Qwen-VL //进入包含所需依赖的github文件目录

pip3 install -r requirements.txt
pip3 install -r requirements_openai_api.txt
pip3 install -r requirements_web_demo.txt
pip3 install deepspeed
pip3 install peft
pip3 install optimum
pip3 install auto-gptq
pip3 install modelscope -U

5.测试

通过网页端Web UI使用：

python /root/autodl-tmp/model/Qwen-VL/web_demo_mm.py --checkpoint-path /root/autodl-tmp/model/Qwen-VL-Chat

通过代码使用：

import os
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from PIL import Image

torch.manual_seed(1234)

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/model/Qwen-VL-Chat", trust_remote_code=True)

# Load the model with GPU device map
model = AutoModelForCausalLM.from_pretrained(
    "/root/autodl-tmp/model/Qwen-VL-Chat", 
    device_map="auto", 
    trust_remote_code=True
).eval()

# 1st dialogue turn
query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': '这是什么'},
])
response, history = model.chat(tokenizer, query=query, history=None)
print(response)
# Expected output: 图中是一名女子在沙滩上和狗玩耍，旁边的狗是一只拉布拉多犬，它们处于沙滩上。

# 2nd dialogue turn
response, history = model.chat(tokenizer, '输出"击掌"的检测框', history=history)
print(response)
# Expected output: <ref>击掌</ref><box>(536,509),(588,602)</box>

# Debugging: Print history and response before drawing bbox
print("History:", history)
print("Response before drawing bbox:", response)

# Check the return type and save the image
result_image = tokenizer.draw_bbox_on_latest_picture(response, history)

if result_image:
    # Save the generated image to a file
    result_image.save('output_image.jpg')
    print(f"Generated image saved as output_image.jpg")
else:
    print("No box detected or no image generated")

5.数据集准备

需要将所有样本数据放到一个列表中并存入JSON文件中。每个样本对应一个字典，包含id和conversation，其中后者为一个列表。示例如下所示：

[  
  {  
    "id": "identity_0",  
    "conversations": [  
      {  
        "from": "user",  
        "value": "你好"  
      },  
      {  
        "from": "assistant",  
        "value": "我是Qwen-VL,一个支持视觉输入的大模型。"  
      }  
    ]  
  },  
  {  
    "id": "identity_1",  
    "conversations": [  
      {  
        "from": "user",  
        "value": "Picture 1: <img>https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg</img>\n图中的狗是什么品种？"  
      },  
      {  
        "from": "assistant",  
        "value": "图中是一只拉布拉多犬。"  
      },  
      {  
        "from": "user",  
        "value": "框出图中的格子衬衫"  
      },  
      {  
        "from": "assistant",  
        "value": "<ref>格子衬衫</ref><box>(588,499),(725,789)</box>"  
      }  
    ]  
  },  
  {   
    "id": "identity_2",  
    "conversations": [  
      {  
        "from": "user",  
        "value": "Picture 1: <img>assets/mm_tutorial/Chongqing.jpeg</img>\nPicture 2: <img>assets/mm_tutorial/Beijing.jpeg</img>\n图中都是哪"  
      },  
      {  
        "from": "assistant",  
        "value": "第一张图片是重庆的城市天际线，第二张图片是北京的天际线。"  
      }  
    ]  
  }  
]
对数据格式的解释：

1.为针对多样的VL任务，增加了一下的特殊tokens： <img> </img> <ref> </ref> <box> </box>.

2.对于带图像输入的内容可表示为 Picture id: <img>img_path</img>\n{your prompt}，其中id表示对话中的第几张图片。"img_path"可以是本地的图片或网络地址。

3.对话中的检测框可以表示为<box>(x1,y1),(x2,y2)</box>，其中 (x1, y1) 和(x2, y2)分别对应左上角和右下角的坐标，并且被归一化到[0, 1000)的范围内. 检测框对应的文本描述也可以通过<ref>text_caption</ref>表示。

6.微调

python3 /root/autodl-tmp/model/Qwen-VL/finetune.py \
--model_name_or_path /root/autodl-tmp/model/Qwen-VL-Chat \
--data_path /root/autodl-tmp/data/data.json \
--bf16 True \
--fix_vit True \
--output_dir output_qwen \
--num_train_epochs 5 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 1000 \
--save_total_limit 10 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--report_to "none" \
--model_max_length 600 \
--lazy_preprocess True \
--gradient_checkpointing true \
--use_lora

7.模型合并及推理

与全参数微调不同，LoRA的训练只需存储adapter部分的参数。因此需要先合并并存储模型

from peft import AutoPeftModelForCausalLM  # 确保导入所需的模块
from modelscope import (
     AutoTokenizer
)

path_to_adapter = "/root/autodl-tmp/model/output_qwen"
# 从预训练模型中加载自定义适配器模型
model = AutoPeftModelForCausalLM.from_pretrained(
    path_to_adapter,  # 适配器的路径
    device_map="auto",  # 自动映射设备
    trust_remote_code=True  # 信任远程代码
).eval()  # 设置为评估模式
new_model_directory = "/root/autodl-tmp/model/New-Model"
tokenizer = AutoTokenizer.from_pretrained(
    path_to_adapter, trust_remote_code=True,
)
tokenizer.save_pretrained(new_model_directory)

# 合并并卸载模型
merged_model = model.merge_and_unload()
# 保存合并后的模型
merged_model.save_pretrained(new_model_directory, max_shard_size="2048MB", safe_serialization=True)

8.部署

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from PIL import Image
torch.manual_seed(1234)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("/root/autodl-tmp/model/New-Model", trust_remote_code=True)

# Load the model with GPU device map
model = AutoModelForCausalLM.from_pretrained(
    "/root/autodl-tmp/model/New-Model", 
    device_map="auto", 
    trust_remote_code=True
).eval()

query = tokenizer.from_list_format([
    {'image': 'https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/demo.jpeg'},
    {'text': '这是什么'},
])

response, history = model.chat(tokenizer, query, history=None)
print("回答如下:\n", response)

9.保存依赖包信息

pip freeze > requirements_qwen_vl_sy.txt

依赖包内容：

absl-py==2.1.0
accelerate==0.32.1
aiofiles==23.2.1
aiohttp==3.9.5
aiosignal==1.3.1
altair==5.3.0
annotated-types==0.7.0
anyio==4.4.0
attrs==23.2.0
auto_gptq==0.7.1
certifi==2024.7.4
charset-normalizer==3.3.2
click==8.1.7
coloredlogs==15.0.1
contourpy==1.2.1
cycler==0.12.1
datasets==2.20.0
deepspeed==0.14.4
dill==0.3.8
distro==1.9.0
dnspython==2.6.1
einops==0.8.0
email_validator==2.2.0
fastapi==0.111.1
fastapi-cli==0.0.4
ffmpy==0.3.2
filelock==3.13.1
fonttools==4.53.1
frozenlist==1.4.1
fsspec==2024.2.0
gekko==1.2.1
gradio==4.38.1
gradio_client==1.1.0
grpcio==1.65.1
h11==0.14.0
hjson==3.1.0
httpcore==1.0.5
httptools==0.6.1
httpx==0.27.0
huggingface-hub==0.24.0
humanfriendly==10.0
idna==3.7
importlib_resources==6.4.0
Jinja2==3.1.3
jsonschema==4.23.0
jsonschema-specifications==2023.12.1
kiwisolver==1.4.5
Markdown==3.6
markdown-it-py==3.0.0
MarkupSafe==2.1.5
matplotlib==3.9.1
mdurl==0.1.2
modelscope==1.16.1
mpmath==1.3.0
multidict==6.0.5
multiprocess==0.70.16
networkx==3.2.1
ninja==1.11.1.1
numpy==1.26.3
nvidia-cublas-cu12==12.1.3.1
nvidia-cuda-cupti-cu12==12.1.105
nvidia-cuda-nvrtc-cu12==12.1.105
nvidia-cuda-runtime-cu12==12.1.105
nvidia-cudnn-cu12==8.9.2.26
nvidia-cufft-cu12==11.0.2.54
nvidia-curand-cu12==10.3.2.106
nvidia-cusolver-cu12==11.4.5.107
nvidia-cusparse-cu12==12.1.0.106
nvidia-ml-py==12.555.43
nvidia-nccl-cu12==2.20.5
nvidia-nvjitlink-cu12==12.1.105
nvidia-nvtx-cu12==12.1.105
openai==1.35.15
optimum==1.21.2
orjson==3.10.6
packaging==24.1
pandas==2.2.2
peft==0.11.1
pillow==10.2.0
protobuf==4.25.3
psutil==6.0.0
py-cpuinfo==9.0.0
pyarrow==17.0.0
pyarrow-hotfix==0.6
pydantic==2.8.2
pydantic_core==2.20.1
pydub==0.25.1
Pygments==2.18.0
pyparsing==3.1.2
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
PyYAML==6.0.1
referencing==0.35.1
regex==2024.5.15
requests==2.32.3
rich==13.7.1
rouge==1.0.1
rpds-py==0.19.0
ruff==0.5.3
safetensors==0.4.3
scipy==1.14.0
semantic-version==2.10.0
sentencepiece==0.2.0
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
sse-starlette==2.1.2
starlette==0.37.2
sympy==1.12
tensorboard==2.17.0
tensorboard-data-server==0.7.2
tiktoken==0.7.0
tokenizers==0.13.3
tomlkit==0.12.0
toolz==0.12.1
torch==2.3.1+cu121
torchaudio==2.3.1+cu121
torchvision==0.18.1+cu121
tqdm==4.66.4
transformers==4.32.0
transformers-stream-generator==0.0.4
triton==2.3.1
typer==0.12.3
typing_extensions==4.9.0
tzdata==2024.1
urllib3==2.2.2
uvicorn==0.30.1
uvloop==0.19.0
watchfiles==0.22.0
websockets==11.0.3
Werkzeug==3.0.3
xxhash==3.4.1
yarl==1.9.4