Llama3实战记录之llama3图片理解能力微调

最新推荐文章于 2024-06-21 19:42:43 发布

m0_65719612

最新推荐文章于 2024-06-21 19:42:43 发布

阅读量2.6k

点赞数 12

文章标签：语言模型 llama 人工智能

本文链接：https://blog.csdn.net/m0_65719612/article/details/138571725

版权

随着 XTuner 团队放出了基于 Llama3-8B 的 LLaVA 模型，我们也是第一时间与 XTuner 团队取得了联系，并获得了他们已经预训练好的 Image Projector。接下来，我们将带大家基于 Llama3-8B-Instruct 和 XTuner 团队预训练好的 Image Projector 微调自己的多模态图文理解模型 LLaVA。

环境、模型、数据准备

配置环境

我们先来配置相关环境。使用如下指令便可以安装好一个 python=3.10 pytorch=2.1.2+cu121 的基础环境了。

conda create -n llama3 python=3.10
conda activate llama3
conda install pytorch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 pytorch-cuda=12.1 -c pytorch -c nvidia

接下来我们安装 XTuner。

cd ~
git clone -b v0.1.18 https://github.com/InternLM/XTuner
cd XTuner
pip install -e .[all]

如果在前面的课程中已经配置好了环境，在这里也可以选择直接执行 conda activate llama3 以进入环境。

最后我们 clone 本教程仓库。

cd ~
git clone https://github.com/SmartFlowAI/Llama3-Tutorial

模型准备

准备 Llama3 权重

在微调开始前，我们首先来准备 Llama3-8B-Instruct 模型权重。

InternStudio

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/meta-llama/Meta-Llama-3-8B-Instruct .

非 InternStudio

我们选择从 OpenXLab 上下载 Meta-Llama-3-8B-Instruct 的权重。

mkdir -p ~/model
cd ~/model
git lfs install
git clone https://code.openxlab.org.cn/MrCat/Llama-3-8B-Instruct.git Meta-Llama-3-8B-Instruct

准备 Visual Encoder 权重

我们接下来准备 Llava 所需要的 openai/clip-vit-large-patch14-336，权重，即 Visual Encoder 权重。

InternStudio

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/openai/clip-vit-large-patch14-336 .

非 InternStudio

可以访问 https://huggingface.co/openai/clip-vit-large-patch14-336 以进行下载。

准备 Image Projector 权重

然后我们准备 Llava 将要用到的 Image Projector 部分权重。

InternStudio

mkdir -p ~/model
cd ~/model
ln -s /root/share/new_models/xtuner/llama3-llava-iter_2181.pth .

非 InternStudio

相关权重可以访问：https://huggingface.co/xtuner/llava-llama-3-8b 以及 https://huggingface.co/xtuner/llava-llama-3-8b-v1_1 。（已经过微调，并非 Pretrain 阶段的 Image Projector）

都下载好了。

数据准备

我们按照 https://github.com/InternLM/Tutorial/blob/camp2/xtuner/llava/xtuner_llava.md 中的教程来准备微调数据。为了让大家可以快速上手，我们选择了使用过拟合的方式快速实现。

可以执行以下代码：

cd ~
git clone https://github.com/InternLM/tutorial -b camp2
python ~/tutorial/xtuner/llava/llava_data/repeat.py \
-i ~/tutorial/xtuner/llava/llava_data/unique_data.json \
-o ~/tutorial/xtuner/llava/llava_data/repeated_data.json \
-n 200

XTurner微调

训练启动
我们已经为大家准备好了可以一键启动的配置文件，主要是修改好了模型路径、对话模板以及数据路径。

我们使用如下指令以启动训练：

xtuner train ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py --work-dir ~/llama3_llava_pth --deepspeed deepspeed_zero2_offload

训练过程所需显存约为44447 MiB，在单卡24G显存的 A100 上训练所需时间约为5小时。

在训练好之后，我们将原始 image projector 和我们微调得到的 image projector 都转换为 HuggingFace 格式，为了下面的效果体验做准备。

xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  ~/model/llama3-llava-iter_2181.pth \
  ~/llama3_llava_pth/pretrain_iter_2181_hf

我重新给预训练的hf更换了名字，重新生成了一下：
xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
>   ~/model/llama3-llava-iter_2181.pth \
>   ~/llama3_llava_pth/pretrain_iter_hf
最后用这个命令。

xtuner convert pth_to_hf ~/Llama3-Tutorial/configs/llama3-llava/llava_llama3_8b_instruct_qlora_clip_vit_large_p14_336_lora_e1_finetune.py \
  ~/llama3_llava_pth/iter_1200.pth \
  ~/llama3_llava_pth/iter_1200_hf

提问

问题1：Describe this image.
问题2：What is the equipment in the image?

Pretrain模型—只会为图片打标签，并不能回答问题。

xtuner chat /root/model/Meta-Llama-3-8B-Instruct

--visual-encoder /root/model/clip-vit-large-patch14-336

--llava /root/llama3_llava_pth/pretrain_iter_2181_hf

--prompt-template llama3_chat

--image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg

最后把预训练的hf模型改名字，用这个命令就解决了：

xtuner chat /root/model/Meta-Llama-3-8B-Instruct \
> --visual-encoder /root/model/clip-vit-large-patch14-336 \
> --llava /root/llama3_llava_pth/pretrain_iter_hf \
> --prompt-template llama3_chat \
> --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg

只能给图像打标签。

有一说一，预训练模型到这个程序也不错啦。后续对其他图片再微调下。

微调训练后的模型表现不错：

export MKL_SERVICE_FORCE_INTEL=1
xtuner chat /root/model/Meta-Llama-3-8B-Instruct \
--visual-encoder /root/model/clip-vit-large-patch14-336 \
--llava /root/llama3_llava_pth/iter_1200_hf \
--prompt-template llama3_chat \
--image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg

经过 Finetune 后，我们可以发现，模型已经可以根据图片回答我们的问题了。
————————————————

PS:

遇到坑坑记录

验证Pretrain 模型时
export MKL_SERVICE_FORCE_INTEL=1
xtuner chat /root/model/Meta-Llama-3-8B-Instruct \
--visual-encoder /root/model/clip-vit-large-patch14-336 \
--llava /root/llama3_llava_pth/pretrain_iter_2181_hf \
--prompt-template llama3_chat \
--image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
报这个错误是什么原因呢？
huggingface_hub.errors.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/root/llama3_llava_pth/pretrain_iter_2181_hf/llm_adapter'. Use `repo_type` argument if needed.

解答：

from setfit import SetFitModel

model = SetFitModel.from_pretrained("/path/to/model-directory", local_files_only=True)

Finetune 后模型—可以根据图片回答我们的问题了

要增加这个参数local_files_only=True。

最后给预训练的hf改了个名字重新生成就ok啦。

m0_65719612

关注

12
点赞
踩
21

收藏

觉得还不错? 一键收藏
1
评论
Llama3实战记录之llama3图片理解能力微调

随着 XTuner 团队放出了基于 Llama3-8B 的 LLaVA 模型，我们也是第一时间与 XTuner 团队取得了联系，并获得了他们已经预训练好的 Image Projector。接下来，我们将带大家基于 Llama3-8B-Instruct 和 XTuner 团队预训练好的 Image Projector 微调自己的多模态图文理解模型 LLaVA。
复制链接

扫一扫