Fine-tuning Llama3 for Multimodal Image Understanding with XTuner

Latest recommended article published on 2024-05-15 15:53:19

KD335

Tags: Artificial Intelligence

Article link: https://blog.csdn.net/weixin_65461886/article/details/138639574

Reference: Llama3-Tutorial/docs/llava.md at main · SmartFlowAI/Llama3-Tutorial (github.com)

Set up the environment

conda create -n llama3 python=3.10
conda activate llama3
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

cd ~
git clone -b v0.1.18 https://github.com/InternLM/XTuner
cd XTuner
pip install -e .[all]
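
Before moving on, it can help to confirm that the packages installed above are importable from the active conda environment. The following is a minimal sketch; `check_py_modules` is a hypothetical helper name introduced here for illustration, not part of XTuner or PyTorch.

```shell
# check_py_modules is a hypothetical helper, not part of XTuner.
# It tries to import each named Python module with the `python` on PATH,
# prints one status line per module, and returns non-zero if any import fails.
check_py_modules() {
  rc=0
  for mod in "$@"; do
    if python -c "import $mod" >/dev/null 2>&1; then
      echo "ok: $mod"
    else
      echo "missing: $mod"
      rc=1
    fi
  done
  return "$rc"
}

# Usage (inside the activated llama3 environment):
# check_py_modules torch torchvision torchaudio xtuner
```

A "missing" line usually means the wrong environment is active or the corresponding install step did not finish.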

Clone the tutorial repository

cd ~
git clone https://github.com/SmartFlowAI/Llama3-Tutorial

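On InternStudio, run the following commands to symlink the shared base model, visual encoder, and image projector weights into ~/model: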

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct .

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/openai/clip-vit-large-patch14-336 .

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/xtuner/llama3-llava-iter_2181.pth .
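
Since `ln -s` succeeds even when its target does not exist, a quick check that the three links resolve can save a failed training run later. The sketch below assumes the links were created exactly as above; `check_model_links` is a hypothetical helper name, not an XTuner command.

```shell
# check_model_links is a hypothetical helper, not part of XTuner.
# It prints one status line per entry and returns non-zero if any entry is missing.
check_model_links() {
  model_dir="$1"
  shift
  rc=0
  for name in "$@"; do
    # -e follows symlinks, so a dangling link is reported as missing
    if [ -e "$model_dir/$name" ]; then
      echo "ok: $name"
    else
      echo "missing: $name"
      rc=1
    fi
  done
  return "$rc"
}

# Usage:
# check_model_links ~/model Meta-Llama-3-8B-Instruct clip-vit-large-patch14-336 llama3-llava-iter_2181.pth
```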

cd ~
git clone https://github.com/InternLM/tutorial -b camp2
python ~/tutorial/xtuner/llava/llava_data/repeat.py \
  -i ~/tutorial/xtuner/llava/llava_data/unique_data.json \
  -o ~/tutorial/xtuner/llava/llava_data/repeated_data.json \
  -n 200
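
To sanity-check the data preparation step, one option is to compare the number of records before and after repeating. This is a sketch under two assumptions that the post does not state: that both files hold a JSON list at the top level, and that `repeat.py` writes the input list repeated `-n` times. `count_json_records` is a hypothetical helper name.

```shell
# count_json_records is a hypothetical helper.
# Assumption: the file's top level is a JSON list; the helper prints its length.
count_json_records() {
  python -c 'import json, sys; print(len(json.load(open(sys.argv[1], "rb"))))' "$1"
}

# Usage:
# count_json_records ~/tutorial/xtuner/llava/llava_data/unique_data.json
# count_json_records ~/tutorial/xtuner/llava/llava_data/repeated_data.json
```

If both assumptions hold, the second count should be 200 times the first.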

Then start training:

xtuner train ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py --work-dir ~/llama3_llava_pth --deepspeed deepspeed_zero2
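
The later conversion step refers to a checkpoint named `iter_1200.pth` in the work directory. The iteration number may differ if the dataset or config changes, so a small helper that picks the highest-numbered `iter_*.pth` entry can be handy. This is a sketch for bash or sh; `latest_checkpoint` is a hypothetical name, and the `iter_<N>.pth` naming is taken from the paths used in this post.

```shell
# latest_checkpoint is a hypothetical helper, not part of XTuner.
# It prints the iter_<N>.pth entry with the largest N in the given directory
# and returns non-zero if none is found.
latest_checkpoint() {
  work_dir="$1"
  latest=""
  latest_n=-1
  for ckpt in "$work_dir"/iter_*.pth; do
    # Skip the unexpanded pattern when nothing matches
    [ -e "$ckpt" ] || continue
    n=${ckpt##*/}   # strip the directory part
    n=${n#iter_}    # strip the prefix
    n=${n%.pth}     # strip the suffix
    case "$n" in
      ''|*[!0-9]*) continue ;;  # ignore names whose middle part is not a number
    esac
    # Compare numerically so that 1200 ranks above 900
    if [ "$n" -gt "$latest_n" ]; then
      latest_n=$n
      latest=$ckpt
    fi
  done
  if [ -z "$latest" ]; then
    return 1
  fi
  echo "$latest"
}

# Usage:
# latest_checkpoint ~/llama3_llava_pth
```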

After training finishes, convert both the original image projector and the fine-tuned image projector to HuggingFace format.

xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  ~/model/llama3-llava-iter_2181.pth \
  ~/llama3_llava_pth/pretrain_iter_2181_hf

xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  ~/llama3_llava_pth/iter_1200.pth \
  ~/llama3_llava_pth/iter_1200_hf

Once the conversion is done, we can try out the fine-tuned model from the command line.

Question 1: Describe this image. Question 2: What is the equipment in the image?

Pretrain model

export MKL_SERVICE_FORCE_INTEL=1
xtuner chat /root/model/Meta-Llama-3-8B-Instruct \
  --visual-encoder /root/model/clip-vit-large-patch14-336 \
  --llava /root/llama3_llava_pth/pretrain_iter_2181_hf \
  --prompt-template llama3_chat \
  --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg

At this point we can see that the Pretrain model only produces a label for the image and cannot answer the questions.

Model after fine-tuning

export MKL_SERVICE_FORCE_INTEL=1
xtuner chat /root/model/Meta-Llama-3-8B-Instruct \
  --visual-encoder /root/model/clip-vit-large-patch14-336 \
  --llava /root/llama3_llava_pth/iter_1200_hf \
  --prompt-template llama3_chat \
  --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg

After fine-tuning, the model can answer our questions about the image.
