Main topics:
1. Fine-tuning theory and an introduction to XTuner
2. Hands-on: fine-tuning a personal-assistant self-cognition with XTuner
3. Hands-on: fine-tuning the LLaVA image-understanding multimodal model with XTuner
1 Course Notes
1.1 Theory
Fine-tuning
Two paradigms: incremental pre-training and instruction fine-tuning.
Example: with incremental pre-training alone, the model only produces content similar to the new corpus (plain continuation); it does not learn to follow instructions.
Data
- Standard data format: a format the training framework can parse (System / User / Assistant roles).
- Data format in XTuner: JSON.
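A minimal example of the OpenAI-style message format used throughout this tutorial (the same structure is produced by `generate_data.py` in section 2.1 and consumed by XTuner's `openai_map_fn`; the system message is optional and is omitted in that script):

```python
# One training sample in the OpenAI-style message format.
sample = {
    "messages": [
        {"role": "user", "content": "请做一下自我介绍"},
        {"role": "assistant", "content": "我是XXX的小助手,内在是书生·浦语的1.8B大模型哦"},
    ]
}
```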
LoRA
- Base model + role adapter (LoRA): the base weights stay frozen and only a small low-rank adapter is trained.
- 4-bit loading (QLoRA): the base weights are rounded/quantized to 4-bit when loaded, cutting memory further.
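A minimal sketch of the LoRA idea (my own illustration, not XTuner's implementation): the frozen weight matrix is augmented with a trainable low-rank product B·A; QLoRA additionally stores the frozen weights in 4-bit (e.g. NF4) to save memory.

```python
import torch.nn as nn

class LoRALinear(nn.Module):
    """Toy LoRA wrapper: y = base(x) + B(A(x)) * (alpha / r).

    The base layer is frozen; only the rank-r matrices A and B are trained.
    In QLoRA the frozen base weights would additionally be kept in 4-bit.
    """
    def __init__(self, base: nn.Linear, r: int = 64, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False           # freeze the base weights
        self.lora_A = nn.Linear(base.in_features, r, bias=False)
        self.lora_B = nn.Linear(r, base.out_features, bias=False)
        nn.init.zeros_(self.lora_B.weight)    # start as a zero update
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.lora_B(self.lora_A(x)) * self.scale
```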
XTuner
- "Fool-proof": beginner-friendly tooling.
- Lightweight: a 7B model can be fine-tuned on an 8 GB consumer GPU.
Getting started
- Models named "chat" are the instruction-tuned variants.
- All hyperparameters live in the config file.
- Dataset format conversion (see the sketch after this list).
- pack_to_max_length: pack samples up to max_length so GPU memory is used fully.
- Training acceleration (e.g. the --deepspeed flag used later).
Demo
- Base model → fine-tuned weights (intermediate artifact) → merged final model.
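A hedged sketch of what "dataset format conversion" means here: turning a custom record (here an Alpaca-style instruction/input/output triple, whose field names are assumptions for illustration) into the OpenAI message format shown above. XTuner itself ships map functions (e.g. the `openai_map_fn` used in the config in section 2.1) that do this internally.

```python
def alpaca_to_openai(record: dict) -> dict:
    """Convert one Alpaca-style record into the OpenAI message format.

    `record` is assumed to look like:
        {"instruction": "...", "input": "...", "output": "..."}
    """
    user_content = record["instruction"]
    if record.get("input"):
        user_content += "\n" + record["input"]
    return {
        "messages": [
            {"role": "user", "content": user_content},
            {"role": "assistant", "content": record["output"]},
        ]
    }
```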
Multimodal (LLaVA)
- The Image Projector vectorizes images, i.e. maps image features into the LLM's embedding space.
- Training an image+text multimodal model is essentially training this Image Projector (see the sketch below).
- Image understanding vs. image generation: this pipeline covers understanding only.
- Pre-training stage data: image + caption pairs.
- Fine-tuning stage data: image + question–answer conversations.
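A minimal sketch (my own illustration, not XTuner's code) of where the projector sits: frozen CLIP vision features are mapped by a small trainable MLP into the LLM's token-embedding space; the dimensions below are illustrative assumptions.

```python
import torch
import torch.nn as nn

class ImageProjector(nn.Module):
    """Toy LLaVA-style projector: CLIP patch features -> LLM embedding space."""
    def __init__(self, clip_dim: int = 1024, llm_dim: int = 2048):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(clip_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, clip_feats: torch.Tensor) -> torch.Tensor:
        # clip_feats: (batch, num_patches, clip_dim) from a frozen CLIP ViT
        return self.mlp(clip_feats)  # (batch, num_patches, llm_dim)

# The projected "image tokens" are prepended to the text embeddings
# before being fed into the (frozen or LoRA-tuned) language model.
```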
1.2 Hands-on
Fine-tuning — the personal-assistant project
- Create a dev machine (10% GPU quota).
- Create and activate the conda environment.
- Clone the XTuner source code and install it.
- Create the dataset directory /root/ft/data.
- Create the data-generation script generate_data.py, which writes an OpenAI-format JSON file.
- Prepare the model: symlink it (instead of copying) with ln.
- Config file: chosen according to the model and fine-tuning method; there is a command to list the built-in configs and one to filter them by model name.
- Copy the config file to /root/ft/config.
- Modify the config file.
- Train, specifying the config path and the --work-dir output path.
- Convert the PyTorch .pth checkpoint to HuggingFace format: xtuner convert pth_to_hf <config> <checkpoint> <output dir>. This step is needed for both full-parameter and QLoRA fine-tuning.
- Merge into the /root/ft/final_model folder: xtuner convert merge /root/ft/model <hf dir> <output dir>.
- Test: xtuner chat /root/ft/final_model --prompt-template internlm2_chat. (In the tutorial the model overfits and loses its general abilities.)
- Web deployment with streamlit:
  - Clone the code; mainly web_demo.py is used.
  - Reconfigure it: model path, tokenizer path, avatar, system_prompt, aligned with the CLI chat.
  - Local port forwarding over SSH: ssh -CNg -L 6006:127.0.0.1:6006 <account>@<host> -p <port>.
Multimodal training and testing
- Dev machine: 30% GPU quota (24 GB).
- Before fine-tuning, the model only emits caption-style tags no matter what is asked.
- After fine-tuning, it actually describes the image.
- Steps:
  - Create the dev machine.
  - Install and activate the environment.
  - Create the project folder /root/xtuner0117.
  - Pull the XTuner source code.
  - Install from source.
  - Build the pre-training data pairs: image + caption (omitted).
  - Pre-training command (omitted).
  - Build the image–text conversation file: use GPT to generate Q&A pairs in JSON format (omitted); the tutorial simply repeats the same Q&A pairs 200 times.
  - Config file: query, then copy.
  - Modify it: base LLM, vision model, pretrained projector checkpoint, data folder, JSON file, image folder, batch size.
  - Run the fine-tuning command.
  - Compare:
    - Before: convert the pretrained checkpoint to HF format, then run xtuner chat <base model> --visual-encoder <vision model> --llava <llava adapter> --image img.jpg.
    - After: convert the fine-tuned checkpoint to HF format and run the same command.
- The Image Projector is essentially an image encoder (my own take).
2 Assignments
Record the reproduction process with screenshots.
Basic assignment (required to complete the camp)
- Train the model's self-cognition as your personal assistant (record the process with screenshots).
Advanced assignments
- Upload the self-cognition model to OpenXLab and deploy the application on OpenXLab (required for outstanding trainees).
- Reproduce the multimodal fine-tuning (required for outstanding trainees).
OpenXLab deployment tutorial: https://github.com/InternLM/Tutorial/tree/camp2/tools/openxlab-deploy
2.1 Training the personal assistant
Reference: https://github.com/InternLM/Tutorial/blob/camp2/xtuner/personal_assistant_document.md
- Create a dev machine (10% GPU quota).
- Connect with VS Code.
- Create and activate the conda environment:

```bash
# Create the virtual environment
studio-conda xtuner0.1.17
# Activate it
conda activate xtuner0.1.17
```

![screenshot: environment creation](https://img-blog.csdnimg.cn/direct/358e76d27b51442296304cb404684274.png#pic_center)
- Clone the source code and install it.
  - Create the project folder, pull the repo, install from source:

```bash
# Create the folder
mkdir -p /root/xtuner0117 && cd /root/xtuner0117
# Pull the repo
git clone -b v0.1.17 https://github.com/InternLM/xtuner
# Install
cd /root/xtuner0117/xtuner
pip install -e '.[all]'
```
- Create the dataset directory /root/ft/data.
  - Fine-tuning file management: create /root/ft and /root/ft/data:

```bash
mkdir -p /root/ft
mkdir -p /root/ft/data
```
- Create the data-generation script generate_data.py, which writes an OpenAI-format JSON file.
  - Create the script under data/ and run it:

```bash
touch /root/ft/data/generate_data.py && cd /root/ft/data/
python /root/ft/data/generate_data.py
```

  - generate_data.py:

```python
import json

# Set the user's name (change it to your own)
name = '齐天大圣孙悟空'
# How many times the sample is repeated
n = 500

# Initialize one sample in the OpenAI message format
data = [
    {
        "messages": [
            {
                "role": "user",
                "content": "请做一下自我介绍"
            },
            {
                "role": "assistant",
                "content": "我是{}的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦".format(name)
            }
        ]
    }
]

# Repeat the initial conversation n times
for i in range(n):
    data.append(data[0])

# Write the data to 'personal_assistant.json'
with open('personal_assistant.json', 'w', encoding='utf-8') as f:
    # ensure_ascii=False keeps Chinese characters readable
    # indent=4 pretty-prints the file
    json.dump(data, f, ensure_ascii=False, indent=4)
```
- Prepare the model: symlink it instead of copying, using ln.
  - Model symlink:

```bash
# Create the target folder and make sure it exists.
# -p also creates missing parent directories and does not fail if the folder already exists.
# mkdir -p /root/ft/model

# Copy the weights into the target folder; -r copies the whole directory recursively.
# cp -r /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b/* /root/ft/model/

# Remove the /root/ft/model directory (if the copy approach was used before)
# rm -rf /root/ft/model

# Create the symbolic link
ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b /root/ft/model
```

  - Directory listing before / after creating the link (screenshots omitted).
- Config file: chosen according to the model and fine-tuning method. Commands to list all built-in configs and to search by model name:

```bash
# List all built-in config files
# xtuner list-cfg
# Find the config files that support internlm2-1.8b
xtuner list-cfg -p internlm2_1_8b   # search for configs matching internlm2_1_8b
```
- Copy the config file to /root/ft/config:

```bash
# Create a folder for the config file
mkdir -p /root/ft/config
# Use XTuner's copy-cfg to copy the config to the given location
xtuner copy-cfg internlm2_1_8b_qlora_alpaca_e3 /root/ft/config
```
- Modify the config file. The full modified internlm2_1_8b_qlora_alpaca_e3_copy.py:
```python
# Copyright (c) OpenMMLab. All rights reserved.
import torch
from datasets import load_dataset
from mmengine.dataset import DefaultSampler
from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                            LoggerHook, ParamSchedulerHook)
from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
from peft import LoraConfig
from torch.optim import AdamW
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig)

from xtuner.dataset import process_hf_dataset
from xtuner.dataset.collate_fns import default_collate_fn
from xtuner.dataset.map_fns import openai_map_fn, template_map_fn_factory
from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                 VarlenAttnArgsToMessageHubHook)
from xtuner.engine.runner import TrainLoop
from xtuner.model import SupervisedFinetune
from xtuner.parallel.sequence import SequenceParallelSampler
from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE

#######################################################################
#                          PART 1  Settings                           #
#######################################################################
# Model
pretrained_model_name_or_path = '/root/ft/model'
use_varlen_attn = False

# Data
alpaca_en_path = '/root/ft/data/personal_assistant.json'
prompt_template = PROMPT_TEMPLATE.default
max_length = 1024
pack_to_max_length = True

# parallel
sequence_parallel_size = 1

# Scheduler & Optimizer
batch_size = 1  # per_device
accumulative_counts = 16
accumulative_counts *= sequence_parallel_size
dataloader_num_workers = 0
max_epochs = 2
optim_type = AdamW
lr = 2e-4
betas = (0.9, 0.999)
weight_decay = 0
max_norm = 1  # grad clip
warmup_ratio = 0.03

# Save
save_steps = 300
save_total_limit = 3  # Maximum checkpoints to keep (-1 means unlimited)

# Evaluate the generation performance during the training
evaluation_freq = 300
SYSTEM = ''
evaluation_inputs = ['请你介绍一下你自己', '你是谁', '你是我的小助手吗']

#######################################################################
#                      PART 2  Model & Tokenizer                      #
#######################################################################
tokenizer = dict(
    type=AutoTokenizer.from_pretrained,
    pretrained_model_name_or_path=pretrained_model_name_or_path,
    trust_remote_code=True,
    padding_side='right')

model = dict(
    type=SupervisedFinetune,
    use_varlen_attn=use_varlen_attn,
    llm=dict(
        type=AutoModelForCausalLM.from_pretrained,
        pretrained_model_name_or_path=pretrained_model_name_or_path,
        trust_remote_code=True,
        torch_dtype=torch.float16,
        quantization_config=dict(
            type=BitsAndBytesConfig,
            load_in_4bit=True,
            load_in_8bit=False,
            llm_int8_threshold=6.0,
            llm_int8_has_fp16_weight=False,
            bnb_4bit_compute_dtype=torch.float16,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type='nf4')),
    lora=dict(
        type=LoraConfig,
        r=64,
        lora_alpha=16,
        lora_dropout=0.1,
        bias='none',
        task_type='CAUSAL_LM'))

#######################################################################
#                      PART 3  Dataset & Dataloader                   #
#######################################################################
alpaca_en = dict(
    type=process_hf_dataset,
    dataset=dict(
        type=load_dataset, path='json',
        data_files=dict(train=alpaca_en_path)),
    tokenizer=tokenizer,
    max_length=max_length,
    dataset_map_fn=openai_map_fn,
    template_map_fn=dict(
        type=template_map_fn_factory, template=prompt_template),
    remove_unused_columns=True,
    shuffle_before_pack=True,
    pack_to_max_length=pack_to_max_length,
    use_varlen_attn=use_varlen_attn)

sampler = SequenceParallelSampler \
    if sequence_parallel_size > 1 else DefaultSampler

train_dataloader = dict(
    batch_size=batch_size,
    num_workers=dataloader_num_workers,
    dataset=alpaca_en,
    sampler=dict(type=sampler, shuffle=True),
    collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))

#######################################################################
#                    PART 4  Scheduler & Optimizer                    #
#######################################################################
# optimizer
optim_wrapper = dict(
    type=AmpOptimWrapper,
    optimizer=dict(
        type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
    clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
    accumulative_counts=accumulative_counts,
    loss_scale='dynamic',
    dtype='float16')

# learning policy
# More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
param_scheduler = [
    dict(
        type=LinearLR,
        start_factor=1e-5,
        by_epoch=True,
        begin=0,
        end=warmup_ratio * max_epochs,
        convert_to_iter_based=True),
    dict(
        type=CosineAnnealingLR,
        eta_min=0.0,
        by_epoch=True,
        begin=warmup_ratio * max_epochs,
        end=max_epochs,
        convert_to_iter_based=True)
]

# train, val, test setting
train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)

#######################################################################
#                           PART 5  Runtime                           #
#######################################################################
# Log the dialogue periodically during the training process, optional
custom_hooks = [
    dict(type=DatasetInfoHook, tokenizer=tokenizer),
    dict(
        type=EvaluateChatHook,
        tokenizer=tokenizer,
        every_n_iters=evaluation_freq,
        evaluation_inputs=evaluation_inputs,
        system=SYSTEM,
        prompt_template=prompt_template)
]

if use_varlen_attn:
    custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]

# configure default hooks
default_hooks = dict(
    # record the time of every iteration.
    timer=dict(type=IterTimerHook),
    # print log every 10 iterations.
    logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
    # enable the parameter scheduler.
    param_scheduler=dict(type=ParamSchedulerHook),
    # save checkpoint per `save_steps`.
    checkpoint=dict(
        type=CheckpointHook,
        by_epoch=False,
        interval=save_steps,
        max_keep_ckpts=save_total_limit),
    # set sampler seed in distributed evrionment.
    sampler_seed=dict(type=DistSamplerSeedHook),
)

# configure environment
env_cfg = dict(
    # whether to enable cudnn benchmark
    cudnn_benchmark=False,
    # set multi process parameters
    mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
    # set distributed parameters
    dist_cfg=dict(backend='nccl'),
)

# set visualizer
visualizer = None

# set log level
log_level = 'INFO'

# load from which checkpoint
load_from = None

# whether to resume training from the loaded checkpoint
resume = False

# Defaults to use random seed and disable `deterministic`
randomness = dict(seed=None, deterministic=False)

# set log processor
log_processor = dict(by_epoch=False)
```
- Train, specifying the config path and the --work-dir output path:

```bash
# Specify the save path
xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train

# Use deepspeed to accelerate training
xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train_deepspeed --deepspeed deepspeed_zero2

# Resume training from a checkpoint
xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train --resume /root/ft/train/iter_600.pth
```
- Convert the PyTorch .pth checkpoint to HuggingFace format (.bin files): xtuner convert pth_to_hf <config> <checkpoint> <output dir>. This step is needed for both full-parameter and QLoRA fine-tuning.

```bash
# Create a folder for the converted HuggingFace-format files
mkdir -p /root/ft/huggingface

# Model conversion
# xtuner convert pth_to_hf ${CONFIG_FILE} ${PTH_FILE} ${SAVE_PATH}
xtuner convert pth_to_hf /root/ft/train/internlm2_1_8b_qlora_alpaca_e3_copy.py /root/ft/train/iter_64.pth /root/ft/huggingface
```
- Merge into the /root/ft/final_model folder: xtuner convert merge /root/ft/model <hf dir> <output dir>.

```bash
# Create a folder named final_model for the merged model
mkdir -p /root/ft/final_model

# Work around an MKL threading conflict
export MKL_SERVICE_FORCE_INTEL=1

# Merge the adapter into the base model
# xtuner convert merge ${NAME_OR_PATH_TO_LLM} ${NAME_OR_PATH_TO_ADAPTER} ${SAVE_PATH}
xtuner convert merge /root/ft/model /root/ft/huggingface /root/ft/final_model
```
- Test with xtuner chat /root/ft/final_model --prompt-template internlm2_chat. (In the tutorial the model overfits and loses its general abilities.)
  - After fine-tuning:

```bash
# Chat with the model
xtuner chat /root/ft/final_model --prompt-template internlm2_chat
```

```text
double enter to end input (EXIT: exit chat, RESET: reset history) >>> 你是谁
我是剑锋大佬的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦</s>

double enter to end input (EXIT: exit chat, RESET: reset history) >>> 请你介绍一下你自己
我是剑锋大佬的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦</s>

double enter to end input (EXIT: exit chat, RESET: reset history) >>> 你是我的小助手吗?
我是剑锋大佬的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦</s>

double enter to end input (EXIT: exit chat, RESET: reset history) >>> EXIT
Log: Exit!
```

  The model answers every question with the same sentence because it was trained on too little data variety (one sample repeated many times).
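A hedged sketch of one way to mitigate this (my own suggestion, not from the tutorial): repeat the identity sample fewer times and mix in a few general-purpose dialogues so the model keeps its basic abilities. The extra Q&A pair below is purely illustrative.

```python
import json

name = '齐天大圣孙悟空'
n = 100  # fewer repetitions of the identity sample than the original 500

identity = {
    "messages": [
        {"role": "user", "content": "请做一下自我介绍"},
        {"role": "assistant",
         "content": "我是{}的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦".format(name)}
    ]
}

# A few illustrative general-purpose samples to dilute the identity data
general = [
    {"messages": [
        {"role": "user", "content": "今天天气怎么样?"},
        {"role": "assistant", "content": "我无法获取实时天气,建议查看天气预报应用哦。"}
    ]},
]

data = [identity] * n + general * 50
with open('personal_assistant.json', 'w', encoding='utf-8') as f:
    json.dump(data, f, ensure_ascii=False, indent=4)
```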
- Web deployment with streamlit:

```bash
pip install streamlit==1.24.0
```

  - Clone the code; mainly web_demo.py is used:

```bash
# Create a folder for the InternLM code
mkdir -p /root/ft/web_demo && cd /root/ft/web_demo
# Pull the InternLM source
git clone https://github.com/InternLM/InternLM.git
# Enter the repo
cd /root/ft/web_demo/InternLM
```
- Reconfigure web_demo.py: model path, tokenizer path, avatar, system_prompt, aligned with the CLI chat.
  - Change the model and tokenizer paths:

```diff
# Change the model path (around line 183)
- model = (AutoModelForCausalLM.from_pretrained('/root/ft/final_model',
+ model = (AutoModelForCausalLM.from_pretrained('/root/ft/model',
# Change the tokenizer path (around line 186)
- tokenizer = AutoTokenizer.from_pretrained('/root/ft/final_model',
+ tokenizer = AutoTokenizer.from_pretrained('/root/ft/model',
```

  - Full web_demo.py:
"""This script refers to the dialogue example of streamlit, the interactive generation code of chatglm2 and transformers. We mainly modified part of the code logic to adapt to the generation of our model. Please refer to these links below for more information: 1. streamlit chat example: https://docs.streamlit.io/knowledge-base/tutorials/build-conversational-apps 2. chatglm2: https://github.com/THUDM/ChatGLM2-6B 3. transformers: https://github.com/huggingface/transformers Please run with the command `streamlit run path/to/web_demo.py --server.address=0.0.0.0 --server.port 7860`. Using `python path/to/web_demo.py` may cause unknown problems. """ # isort: skip_file import copy import warnings from dataclasses import asdict, dataclass from typing import Callable, List, Optional import streamlit as st import torch from torch import nn from transformers.generation.utils import (LogitsProcessorList, StoppingCriteriaList) from transformers.utils import logging from transformers import AutoTokenizer, AutoModelForCausalLM # isort: skip logger = logging.get_logger(__name__) @dataclass class GenerationConfig: # this config is used for chat to provide more diversity max_length: int = 2048 top_p: float = 0.75 temperature: float = 0.1 do_sample: bool = True repetition_penalty: float = 1.000 @torch.inference_mode() def generate_interactive( model, tokenizer, prompt, generation_config: Optional[GenerationConfig] = None, logits_processor: Optional[LogitsProcessorList] = None, stopping_criteria: Optional[StoppingCriteriaList] = None, prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor], List[int]]] = None, additional_eos_token_id: Optional[int] = None, **kwargs, ): inputs = tokenizer([prompt], padding=True, return_tensors='pt') input_length = len(inputs['input_ids'][0]) for k, v in inputs.items(): inputs[k] = v.cuda() input_ids = inputs['input_ids'] _, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1] if generation_config is None: generation_config = model.generation_config generation_config = copy.deepcopy(generation_config) model_kwargs = generation_config.update(**kwargs) bos_token_id, eos_token_id = ( # noqa: F841 # pylint: disable=W0612 generation_config.bos_token_id, generation_config.eos_token_id, ) if isinstance(eos_token_id, int): eos_token_id = [eos_token_id] if additional_eos_token_id is not None: eos_token_id.append(additional_eos_token_id) has_default_max_length = kwargs.get( 'max_length') is None and generation_config.max_length is not None if has_default_max_length and generation_config.max_new_tokens is None: warnings.warn( f"Using 'max_length''s default ({repr(generation_config.max_length)}) \ to control the generation length. " 'This behaviour is deprecated and will be removed from the \ config in v5 of Transformers -- we' ' recommend using `max_new_tokens` to control the maximum \ length of the generation.', UserWarning, ) elif generation_config.max_new_tokens is not None: generation_config.max_length = generation_config.max_new_tokens + \ input_ids_seq_length if not has_default_max_length: logger.warn( # pylint: disable=W4902 f"Both 'max_new_tokens' (={generation_config.max_new_tokens}) " f"and 'max_length'(={generation_config.max_length}) seem to " "have been set. 'max_new_tokens' will take precedence. " 'Please refer to the documentation for more information. 
' '(https://huggingface.co/docs/transformers/main/' 'en/main_classes/text_generation)', UserWarning, ) if input_ids_seq_length >= generation_config.max_length: input_ids_string = 'input_ids' logger.warning( f"Input length of {input_ids_string} is {input_ids_seq_length}, " f"but 'max_length' is set to {generation_config.max_length}. " 'This can lead to unexpected behavior. You should consider' " increasing 'max_new_tokens'.") # 2. Set generation parameters if not already defined logits_processor = logits_processor if logits_processor is not None \ else LogitsProcessorList() stopping_criteria = stopping_criteria if stopping_criteria is not None \ else StoppingCriteriaList() logits_processor = model._get_logits_processor( generation_config=generation_config, input_ids_seq_length=input_ids_seq_length, encoder_input_ids=input_ids, prefix_allowed_tokens_fn=prefix_allowed_tokens_fn, logits_processor=logits_processor, ) stopping_criteria = model._get_stopping_criteria( generation_config=generation_config, stopping_criteria=stopping_criteria) logits_warper = model._get_logits_warper(generation_config) unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1) scores = None while True: model_inputs = model.prepare_inputs_for_generation( input_ids, **model_kwargs) # forward pass to get next token outputs = model( **model_inputs, return_dict=True, output_attentions=False, output_hidden_states=False, ) next_token_logits = outputs.logits[:, -1, :] # pre-process distribution next_token_scores = logits_processor(input_ids, next_token_logits) next_token_scores = logits_warper(input_ids, next_token_scores) # sample probs = nn.functional.softmax(next_token_scores, dim=-1) if generation_config.do_sample: next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1) else: next_tokens = torch.argmax(probs, dim=-1) # update generated ids, model inputs, and length for next step input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1) model_kwargs = model._update_model_kwargs_for_generation( outputs, model_kwargs, is_encoder_decoder=False) unfinished_sequences = unfinished_sequences.mul( (min(next_tokens != i for i in eos_token_id)).long()) output_token_ids = input_ids[0].cpu().tolist() output_token_ids = output_token_ids[input_length:] for each_eos_token_id in eos_token_id: if output_token_ids[-1] == each_eos_token_id: output_token_ids = output_token_ids[:-1] response = tokenizer.decode(output_token_ids) yield response # stop when each sentence is finished # or if we exceed the maximum length if unfinished_sequences.max() == 0 or stopping_criteria( input_ids, scores): break def on_btn_click(): del st.session_state.messages @st.cache_resource def load_model(): model = (AutoModelForCausalLM.from_pretrained('/root/ft/final_model', trust_remote_code=True).to( torch.bfloat16).cuda()) tokenizer = AutoTokenizer.from_pretrained('/root/ft/final_model', trust_remote_code=True) return model, tokenizer def prepare_generation_config(): with st.sidebar: max_length = st.slider('Max Length', min_value=8, max_value=32768, value=2048) top_p = st.slider('Top P', 0.0, 1.0, 0.75, step=0.01) temperature = st.slider('Temperature', 0.0, 1.0, 0.1, step=0.01) st.button('Clear Chat History', on_click=on_btn_click) generation_config = GenerationConfig(max_length=max_length, top_p=top_p, temperature=temperature) return generation_config user_prompt = '<|im_start|>user\n{user}<|im_end|>\n' robot_prompt = '<|im_start|>assistant\n{robot}<|im_end|>\n' cur_query_prompt = '<|im_start|>user\n{user}<|im_end|>\n\ 
<|im_start|>assistant\n' def combine_history(prompt): messages = st.session_state.messages meta_instruction = ('') total_prompt = f"<s><|im_start|>system\n{meta_instruction}<|im_end|>\n" for message in messages: cur_content = message['content'] if message['role'] == 'user': cur_prompt = user_prompt.format(user=cur_content) elif message['role'] == 'robot': cur_prompt = robot_prompt.format(robot=cur_content) else: raise RuntimeError total_prompt += cur_prompt total_prompt = total_prompt + cur_query_prompt.format(user=prompt) return total_prompt def main(): # torch.cuda.empty_cache() print('load model begin.') model, tokenizer = load_model() print('load model end.') st.title('InternLM2-Chat-1.8B') generation_config = prepare_generation_config() # Initialize chat history if 'messages' not in st.session_state: st.session_state.messages = [] # Display chat messages from history on app rerun for message in st.session_state.messages: with st.chat_message(message['role'], avatar=message.get('avatar')): st.markdown(message['content']) # Accept user input if prompt := st.chat_input('What is up?'): # Display user message in chat message container with st.chat_message('user'): st.markdown(prompt) real_prompt = combine_history(prompt) # Add user message to chat history st.session_state.messages.append({ 'role': 'user', 'content': prompt, }) with st.chat_message('robot'): message_placeholder = st.empty() for cur_response in generate_interactive( model=model, tokenizer=tokenizer, prompt=real_prompt, additional_eos_token_id=92542, **asdict(generation_config), ): # Display robot response in chat message container message_placeholder.markdown(cur_response + '▌') message_placeholder.markdown(cur_response) # Add robot response to chat history st.session_state.messages.append({ 'role': 'robot', 'content': cur_response, # pylint: disable=undefined-loop-variable }) torch.cuda.empty_cache() if __name__ == '__main__': main()
- Run:

```bash
streamlit run /root/ft/web_demo/InternLM/chat/web_demo.py --server.address 127.0.0.1 --server.port 6006
```

- Local port forwarding over SSH: ssh -CNg -L <local port>:127.0.0.1:<remote port> <account>@<host> -p <ssh port>

```bash
ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p <your dev machine SSH port>
```
2.2 Uploading the model to OpenXLab
Reference: https://github.com/InternLM/Tutorial/tree/camp2/tools/openxlab-deploy
2.3 Reproducing the multimodal fine-tuning
Reference: https://github.com/InternLM/Tutorial/blob/camp2/xtuner/llava/xtuner_llava.md
- Create the dev machine.
- Install and activate the environment:

```bash
cd ~ && studio-conda xtuner0.1.17
conda activate xtuner0.1.17
```

- Create the project folder /root/xtuner0117:

```bash
mkdir -p /root/xtuner0117 && cd /root/xtuner0117
```

- Pull the XTuner source code:

```bash
git clone -b v0.1.17 https://github.com/InternLM/xtuner
```

- Install from source:

```bash
cd /root/xtuner0117/xtuner
pip install -e '.[all]' && cd ~
```

  These steps were already completed in the first assignment (section 2.1).
- Build the pre-training data pairs: image + caption (omitted).
- Pre-training commands (omitted in practice; they require 8 GPUs):

```bash
NPROC_PER_NODE=8 xtuner train llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2
NPROC_PER_NODE=8 xtuner train llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune --deepspeed deepspeed_zero2
```
- Build the image–text conversation file: use GPT to generate Q&A pairs in JSON format (omitted). The tutorial's approach is to repeat the same Q&A pairs 200 times.
  - Repeat 200 times:

```bash
cd ~ && git clone https://github.com/InternLM/tutorial -b camp2 && conda activate xtuner0.1.17 && cd tutorial

python /root/tutorial/xtuner/llava/llava_data/repeat.py \
  -i /root/tutorial/xtuner/llava/llava_data/unique_data.json \
  -o /root/tutorial/xtuner/llava/llava_data/repeated_data.json \
  -n 200
```
- Config file: query, then copy.
  - Directly copy the prepared config:

```bash
cp /root/tutorial/xtuner/llava/llava_data/internlm2_chat_1_8b_llava_tutorial_fool_config.py /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py
```

  - Or query XTuner's built-in configs and copy one:

```bash
# Query the built-in config files
xtuner list-cfg -p llava_internlm2_chat_1_8b
# Copy the config file to the target directory
xtuner copy-cfg \
  llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune \
  /root/tutorial/xtuner/llava
```
- Modify it: base LLM, vision model, pretrained projector checkpoint, data folder, JSON file, image folder, batch size, and evaluation inputs:

```diff
# Model
- llm_name_or_path = 'internlm/internlm2-chat-1_8b'
+ llm_name_or_path = '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b'
- visual_encoder_name_or_path = 'openai/clip-vit-large-patch14-336'
+ visual_encoder_name_or_path = '/root/share/new_models/openai/clip-vit-large-patch14-336'

# Specify the pretrained pth
- pretrained_pth = './work_dirs/llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain/iter_2181.pth'  # noqa: E501
+ pretrained_pth = '/root/share/new_models/xtuner/iter_2181.pth'

# Data
- data_root = './data/llava_data/'
+ data_root = '/root/tutorial/xtuner/llava/llava_data/'
- data_path = data_root + 'LLaVA-Instruct-150K/llava_v1_5_mix665k.json'
+ data_path = data_root + 'repeated_data.json'
- image_folder = data_root + 'llava_images'
+ image_folder = data_root

# Scheduler & Optimizer
- batch_size = 16  # per_device
+ batch_size = 1  # per_device

# evaluation_inputs
- evaluation_inputs = ['请描述一下这张图片','Please describe this picture']
+ evaluation_inputs = ['Please describe this picture','What is the equipment in the image?']
```
- Fine-tuning command:

```bash
cd /root/tutorial/xtuner/llava/
xtuner train /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py --deepspeed deepspeed_zero2
```
- Compare before vs. after:
  - Before fine-tuning: convert the pretrained checkpoint to HF format, then start xtuner chat with the base model, visual encoder, LLaVA adapter, and a test image:

```bash
# Work around a small MKL bug
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU

# Convert pth to HuggingFace format
xtuner convert pth_to_hf \
  llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain \
  /root/share/new_models/xtuner/iter_2181.pth \
  /root/tutorial/xtuner/llava/llava_data/iter_2181_hf

# Start the chat
xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
  --visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
  --llava /root/tutorial/xtuner/llava/llava_data/iter_2181_hf \
  --prompt-template internlm2_chat \
  --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
```

    Test questions:

```text
Q1: Describe this image.
Q2: What is the equipment in the image?
```

  - After fine-tuning: convert the fine-tuned checkpoint to HF format, then start xtuner chat the same way:

```bash
# Work around a small MKL bug
export MKL_SERVICE_FORCE_INTEL=1
export MKL_THREADING_LAYER=GNU

# Convert pth to HuggingFace format
xtuner convert pth_to_hf \
  /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py \
  /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_1200.pth \
  /root/tutorial/xtuner/llava/llava_data/iter_1200_hf

# Start the chat
xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
  --visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
  --llava /root/tutorial/xtuner/llava/llava_data/iter_1200_hf \
  --prompt-template internlm2_chat \
  --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
```

    Test questions:

```text
Q1: Describe this image.
Q2: What is the equipment in the image?
```

- Fine-tuning succeeded!