XTuner is a fine-tuning toolbox for large language models and multimodal models, jointly developed by the MMRazor and MMDeploy teams.
1. XTuner Installation
First create a conda virtual environment.
Then git clone the XTuner repository and install it from source.
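A minimal sketch of these steps, assuming the environment name xtuner0.1.17 that is activated later in this note; the pinned branch and the install extras are assumptions and may differ in your setup:
# Create and activate a conda environment
conda create --name xtuner0.1.17 python=3.10 -y
conda activate xtuner0.1.17
# Clone XTuner (pinning v0.1.17 to match the environment name) and install it from source
git clone -b v0.1.17 https://github.com/InternLM/xtuner.git
cd xtuner && pip install -e '.[all]'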
A successful installation looks like this:
We will construct our own <question text><image>--<answer text> data pairs and, starting from the text-only model InternLM2_Chat_1.8B, use the LLaVA approach to train an Image Projector for InternLM2_Chat_1.8B.
The process is split into a pretrain stage and a fine-tuning stage.
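The tutorial repository cloned below already ships a prepared unique_data.json containing such pairs. If you build your own, a single record presumably follows the same conversation schema as llava_v1_5_mix665k.json (the dataset referenced in the config); the sketch below writes one such record, where my_unique_data.json is a placeholder file name and <answer text> stands in for the real ground-truth answer:
# Sketch of one <question text><image>--<answer text> pair in LLaVA conversation format
cat > /root/tutorial/xtuner/llava/llava_data/my_unique_data.json << 'EOF'
[
  {
    "id": "0",
    "image": "test_img/oph.jpg",
    "conversations": [
      {"from": "human", "value": "<image>\nWhat is the equipment in the image?"},
      {"from": "gpt", "value": "<answer text>"}
    ]
  }
]
EOF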
cd ~ && git clone https://github.com/InternLM/tutorial -b camp2 && conda activate xtuner0.1.17 && cd tutorial
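# repeat.py presumably just duplicates the handful of unique pairs 200 times (-n 200) so the finetune stage has enough training samples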
python /root/tutorial/xtuner/llava/llava_data/repeat.py \
-i /root/tutorial/xtuner/llava/llava_data/unique_data.json \
-o /root/tutorial/xtuner/llava/llava_data/repeated_data.json \
-n 200
cp /root/tutorial/xtuner/llava/llava_data/internlm2_chat_1_8b_llava_tutorial_fool_config.py \
/root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py
# List XTuner's built-in config files matching this pattern
xtuner list-cfg -p llava_internlm2_chat_1_8b
# Copy the config file into the llava working directory
xtuner copy-cfg \
llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune \
/root/tutorial/xtuner/llava
Modify the following fields in llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py:
- pretrained_pth
- llm_name_or_path
- visual_encoder_name_or_path
- data_root
- data_path
- image_folder
# Model
- llm_name_or_path = 'internlm/internlm2-chat-1_8b'
+ llm_name_or_path = '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b'
- visual_encoder_name_or_path = 'openai/clip-vit-large-patch14-336'
+ visual_encoder_name_or_path = '/root/share/new_models/openai/clip-vit-large-patch14-336'
# Specify the pretrained pth
- pretrained_pth = './work_dirs/llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain/iter_2181.pth' # noqa: E501
+ pretrained_pth = '/root/share/new_models/xtuner/iter_2181.pth'
# Data
- data_root = './data/llava_data/'
+ data_root = '/root/tutorial/xtuner/llava/llava_data/'
- data_path = data_root + 'LLaVA-Instruct-150K/llava_v1_5_mix665k.json'
+ data_path = data_root + 'repeated_data.json'
- image_folder = data_root + 'llava_images'
+ image_folder = data_root
# Scheduler & Optimizer
- batch_size = 16 # per_device
+ batch_size = 1 # per_device
# evaluation_inputs
- evaluation_inputs = ['请描述一下这张图片','Please describe this picture']
+ evaluation_inputs = ['Please describe this picture','What is the equipment in the image?']
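With the config modified, start the finetune. The invocation below is a sketch: xtuner train is the standard training entry point, while the working directory and the optional --deepspeed deepspeed_zero2 flag are assumptions. The run produces checkpoints such as iter_1200.pth under work_dirs/, which are used in the comparison below.
cd /root/tutorial/xtuner/llava/
xtuner train llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py --deepspeed deepspeed_zero2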
Before finetuning
That is, load the 1.8B model together with the pretrain-stage artifact (iter_2181) into GPU memory.
# Work around an MKL threading issue
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
# Convert the .pth checkpoint to HuggingFace format
xtuner convert pth_to_hf \
llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain \
/root/share/new_models/xtuner/iter_2181.pth \
/root/tutorial/xtuner/llava/llava_data/iter_2181_hf
# Launch the chat demo
xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
--visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
--llava /root/tutorial/xtuner/llava/llava_data/iter_2181_hf \
--prompt-template internlm2_chat \
--image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
Q1: Describe this image.
Q2: What is the equipment in the image?
After finetuning
That is, load the 1.8B model together with the finetune-stage artifact (iter_1200) into GPU memory.
# Work around the same MKL threading issue
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU
# Convert the .pth checkpoint to HuggingFace format
xtuner convert pth_to_hf \
/root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py \
/root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_1200.pth \
/root/tutorial/xtuner/llava/llava_data/iter_1200_hf
# Launch the chat demo
xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
--visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
--llava /root/tutorial/xtuner/llava/llava_data/iter_1200_hf \
--prompt-template internlm2_chat \
--image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
Q1: Describe this image.
Q2: What is the equipment in the image?
Before finetuning, the model could only produce caption-like titles; the comparison shows that our finetuning was quite effective.