1 Qwen-VL model download
2 Pretrained model download
2.1 Outside China
2.2 Mirror inside China (recommended)
https://hf-mirror.com/models?search=qwen-vl
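One possible way to pull the checkpoint through the mirror is `huggingface_hub.snapshot_download` with the `HF_ENDPOINT` environment variable pointing at the mirror. This is a minimal sketch, not necessarily the procedure used for these notes: the helper names `use_mirror` and `download_from_mirror` are made up for illustration, and the repository ID and target directory in the usage note are assumptions based on the paths that appear later in this document.

```python
import os

MIRROR_ENDPOINT = "https://hf-mirror.com"  # mirror listed above


def use_mirror(endpoint=MIRROR_ENDPOINT):
    """Point huggingface_hub at the mirror for this process."""
    # huggingface_hub reads HF_ENDPOINT when it is first imported, so this
    # has to run before the import inside download_from_mirror.
    os.environ["HF_ENDPOINT"] = endpoint
    return os.environ["HF_ENDPOINT"]


def download_from_mirror(repo_id, local_dir, endpoint=MIRROR_ENDPOINT):
    """Download a whole model repository via the mirror (hypothetical helper).

    Requires `pip install huggingface_hub`.
    """
    use_mirror(endpoint)
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

A call such as `download_from_mirror("Qwen/Qwen-VL-Chat", "/home/pod/Qwen-VL-Chat")` would then fetch the files into the given directory. If `huggingface_hub` was already imported earlier in the same process, the environment variable may not take effect.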
3 Upload the code and pretrained model to the GPU server
4 Model modifications
4.1 tokenization_qwen.py (Line 30)
# Use the local copy of the font; the original cache lookup and download are commented out below.
FONT_PATH = '/home/pod/Qwen-VL-Chat/SimSun.ttf'
# FONT_PATH = try_to_load_from_cache("Qwen/Qwen-VL-Chat", "SimSun.ttf")
# if FONT_PATH is None:
#     if not os.path.exists("SimSun.ttf"):
#         ttf = requests.get("https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen-VL/assets/SimSun.ttf")
#         open("SimSun.ttf", "wb").write(ttf.content)
#     FONT_PATH = "SimSun.ttf"
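A hard-coded font path is easy to get wrong, so it can help to verify it before launching the demo. The following is only a sketch: `check_font_path` is a hypothetical helper and is not part of tokenization_qwen.py.

```python
import os


def check_font_path(font_path):
    """Return the absolute font path, or raise if the file is unusable."""
    if not os.path.isfile(font_path):
        raise FileNotFoundError(f"Font file not found: {font_path}")
    if os.path.getsize(font_path) == 0:
        raise ValueError(f"Font file is empty: {font_path}")
    return os.path.abspath(font_path)
```

Running `check_font_path('/home/pod/Qwen-VL-Chat/SimSun.ttf')` on the server raises immediately if the font was not uploaded with the rest of the checkpoint.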
4.2 web_demo_mm.py(Line )
# DEFAULT_CKPT_PATH = 'qwen/Qwen-VL-Chat'
DEFAULT_CKPT_PATH = '/home/pod/Qwen-VL-Chat'
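Because `DEFAULT_CKPT_PATH` now points to a local directory, a quick check that the upload from step 3 is complete can save a failed launch. The sketch below assumes the directory should contain at least `config.json` and `SimSun.ttf`; that file list is an assumption, so adjust it to what the downloaded repository actually contains. `missing_checkpoint_files` is a hypothetical helper.

```python
import os

# Assumed minimal file list; extend it to match the downloaded repository.
EXPECTED_FILES = ("config.json", "SimSun.ttf")


def missing_checkpoint_files(ckpt_dir, expected=EXPECTED_FILES):
    """Return the expected files that are absent from ckpt_dir."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(ckpt_dir, name))
    ]
```

An empty list from `missing_checkpoint_files('/home/pod/Qwen-VL-Chat')` means every file in the assumed list is present.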
5 Local access
5.1 Open a port on the GPU server
5.2 Run web_demo_mm.py
python web_demo_mm.py --server-name=0.0.0.0 --server-port=A
5.3 Access from the local machine
http://B (B = IP address:port number)
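If the page at the address above does not load, it helps to separate network problems from demo problems. This sketch tests whether the TCP port is reachable from the local machine. `port_is_open` is a hypothetical helper, and the host and port you pass in are the same placeholders (the server IP in B and the port A) used above.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

A `False` result while the demo is running on the server suggests the port from step 5.1 is not actually open, or that the demo was not started with `--server-name=0.0.0.0`.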
6 Issues recorded during installation (unresolved; the demo still runs normally)
6.1 pip install -r requirements.txt
6.2 pip install -r requirements_openai_api.txt
6.3 pip install -r requirements_web_demo.txt
7 Model fine-tuning
7.1 References
7.1.1 Fine-Tuning and Deploying the Qwen-VL Multimodal Large Model
https://zhuanlan.zhihu.com/p/701818093
7.1.2 Qwen-VL Local Deployment and Fine-Tuning in Practice
https://blog.csdn.net/weixin_44455388/article/details/136251662
7.2 Code
7.2.1 finetune_lora_single_gpu.sh
#!/bin/bash
export CUDA_DEVICE_MAX_CONNECTIONS=1
DIR=`pwd`
# Local checkpoint path. Use "Qwen/Qwen-VL-Chat" or "Qwen/Qwen-VL" instead to load from Hugging Face directly.
MODEL='/home/pod/Qwen-VL-Chat'
# ATTENTION: specify the path to your training data, which should be a json file consisting of a list of conversations.
# See the section for finetuning in README for more information.
DATA="/home/pod/Qwen-VL-master/label.json"
export CUDA_VISIBLE_DEVICES=0
python finetune.py \
    --model_name_or_path $MODEL \
    --data_path $DATA \
    --bf16 True \
    --fix_vit True \
    --output_dir output_qwen \
    --num_train_epochs 5 \
    --per_device_train_batch_size 1 \
    --per_device_eval_batch_size 1 \
    --gradient_accumulation_steps 8 \
    --evaluation_strategy "no" \
    --save_strategy "steps" \
    --save_steps 1000 \
    --save_total_limit 10 \
    --learning_rate 1e-5 \
    --weight_decay 0.1 \
    --adam_beta2 0.95 \
    --warmup_ratio 0.01 \
    --lr_scheduler_type "cosine" \
    --logging_steps 1 \
    --report_to "none" \
    --model_max_length 1024 \
    --lazy_preprocess True \
    --gradient_checkpointing \
    --use_lora
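The batch-related flags in this script interact, so a little arithmetic shows what they imply. The sketch below computes the effective batch size and a rough count of optimizer steps. It is an approximation: the exact step count depends on the installed transformers version, the function names are hypothetical, and the 1000-sample dataset size in the usage note is an invented example rather than the size of the data used here.

```python
import math


def effective_batch_size(per_device_batch, grad_accum_steps, num_gpus=1):
    """Samples consumed per optimizer update."""
    return per_device_batch * grad_accum_steps * num_gpus


def approx_optimizer_steps(num_samples, per_device_batch, grad_accum_steps,
                           epochs, num_gpus=1):
    """Rough number of optimizer updates over the whole run."""
    batches_per_epoch = math.ceil(num_samples / (per_device_batch * num_gpus))
    # At least one update per epoch, even for very small datasets.
    updates_per_epoch = max(batches_per_epoch // grad_accum_steps, 1)
    return updates_per_epoch * epochs
```

With the values from the script, `effective_batch_size(1, 8)` gives 8. For a hypothetical dataset of 1000 samples, `approx_optimizer_steps(1000, 1, 8, 5)` gives 625, which is below `--save_steps 1000`, so no intermediate checkpoint would be triggered by that flag in such a run.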
7.2.2 label.json
[
  {
    "id": "identity_0",
    "conversations": [
      {
        "from": "user",
        "value": "你好"
      },
      {
        "from": "assistant",
        "value": "我是Qwen-VL,一个支持视觉输入的大模型。"
      }
    ]
  },
  {
    "id": "identity_1",
    "conversations": [
      {
        "from": "user",
        "value": "Picture 1: <img>assets/demo.jpeg</img>\n图中的狗是什么品种?"
      },
      {
        "from": "assistant",
        "value": "图中是一只拉布拉多犬。"
      },
      {
        "from": "user",
        "value": "框出图中的格子衬衫"
      },
      {
        "from": "assistant",
        "value": "<ref>格子衬衫</ref><box>(588,499),(725,789)</box>"
      }
    ]
  },
  {
    "id": "identity_2",
    "conversations": [
      {
        "from": "user",
        "value": "Picture 1: <img>assets/mm_tutorial/Chongqing.jpeg</img>\nPicture 2: <img>assets/mm_tutorial/Beijing.jpeg</img>\n图中都是哪"
      },
      {
        "from": "assistant",
        "value": "第一张图片是重庆的城市天际线,第二张图片是北京的天际线。"
      }
    ]
  }
]
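Before starting a fine-tuning run it can be worth checking that the training file has the shape shown above. This is a minimal sketch under assumptions inferred from the sample only: each record has an `id` and a non-empty `conversations` list, and turns alternate between `user` and `assistant`, starting with `user`. The function names are hypothetical and are not part of the Qwen-VL code.

```python
import json


def validate_conversations(records):
    """Return a list of human-readable problems found in the records."""
    if not isinstance(records, list):
        return ["top level must be a list of records"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: must be an object")
            continue
        if "id" not in rec:
            problems.append(f"record {i}: missing 'id'")
        turns = rec.get("conversations")
        if not isinstance(turns, list) or not turns:
            problems.append(f"record {i}: 'conversations' must be a non-empty list")
            continue
        for j, turn in enumerate(turns):
            # Assumed convention: user speaks first, then roles alternate.
            expected = "user" if j % 2 == 0 else "assistant"
            if not isinstance(turn, dict) or turn.get("from") != expected:
                problems.append(f"record {i}, turn {j}: expected 'from' = '{expected}'")
            elif not isinstance(turn.get("value"), str):
                problems.append(f"record {i}, turn {j}: 'value' must be a string")
    return problems


def validate_file(path):
    """Load a JSON training file and validate it."""
    with open(path, encoding="utf-8") as f:
        return validate_conversations(json.load(f))
```

Calling `validate_file("/home/pod/Qwen-VL-master/label.json")` returns an empty list when no problems are found under these assumptions.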