InternLM (书生·浦语) Practical Camp, Season 2 — Lesson 4: XTuner

Main topics:
1. Fine-tuning theory and an introduction to XTuner
2. Hands-on: fine-tuning a personal-assistant self-cognition model with XTuner
3. Hands-on: fine-tuning the LLaVA image-understanding multimodal model with XTuner

1 Course Notes

1.1 Theory

Fine-tuning

Two paradigms: incremental pretraining and instruction fine-tuning

(slide)

Example:

(slide)

With incremental pretraining alone, the model only continues text that resembles the corpus; it does not follow instructions.

Data

(slide)

Standard data format: a format the training framework can recognize (System / User / Assistant roles).

Data format in XTuner: JSON.
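
A minimal illustration of that messages-style JSON (the conversation content here is invented for illustration):

    [
        {
            "messages": [
                { "role": "system",    "content": "You are a helpful personal assistant." },
                { "role": "user",      "content": "Please introduce yourself." },
                { "role": "assistant", "content": "I am XX's assistant, built on the InternLM 1.8B model." }
            ]
        }
    ]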

(slide)

One base model, different role adapters (LoRA)

(slide)
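
A hedged sketch of the LoRA idea behind "one base model, many roles": the base weight stays frozen and each adapter learns only a low-rank update (dimensions below are illustrative, not InternLM's actual sizes):

    import torch
    import torch.nn as nn

    # The frozen base weight W is shared; each "role" adapter only learns B @ A.
    class LoRALinear(nn.Module):
        def __init__(self, in_features, out_features, r=8, alpha=16):
            super().__init__()
            self.base = nn.Linear(in_features, out_features, bias=False)
            self.base.weight.requires_grad_(False)            # freeze the base model
            self.lora_A = nn.Parameter(torch.randn(r, in_features) * 0.01)
            self.lora_B = nn.Parameter(torch.zeros(out_features, r))
            self.scaling = alpha / r

        def forward(self, x):
            # y = x W^T + x (B A)^T * scaling
            return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

    layer = LoRALinear(2048, 2048)
    # only the small adapter matrices are trainable
    print(sum(p.numel() for p in layer.parameters() if p.requires_grad))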

4-bit (QLoRA) loading: the weights are rounded during quantization, trading a little precision for much lower memory use.
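
A toy illustration of that rounding (a simplified absmax scheme for intuition only, not the exact NF4 algorithm that bitsandbytes uses):

    import torch

    def quantize_4bit(w: torch.Tensor):
        scale = w.abs().max() / 7                        # 4-bit signed range is roughly [-8, 7]
        q = torch.clamp(torch.round(w / scale), -8, 7)   # <-- the rounding step
        return q.to(torch.int8), scale

    def dequantize_4bit(q: torch.Tensor, scale: torch.Tensor):
        return q.float() * scale

    w = torch.randn(8)
    q, s = quantize_4bit(w)
    w_hat = dequantize_4bit(q, s)
    print(torch.max((w - w_hat).abs()))   # small but non-zero rounding error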

(slide)

XTuner

  • Beginner-friendly (fool-proof to use)
  • Lightweight: a 7B model can be fine-tuned on an 8 GB consumer GPU

Getting started

(slide)

Models with the `chat` suffix are the instruction-tuned versions.

(slide)

All hyperparameters live in the config file.

(slide)

Dataset format conversion

(slide)
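
XTuner handles this conversion through dataset map functions (for example the openai_map_fn used in the config later in these notes). As an illustrative sketch only (not XTuner's own tooling; the raw field names `question`/`answer` are assumptions), converting a custom Q&A record into the OpenAI messages format could look like:

    import json

    # hypothetical raw record -> OpenAI-style messages entry
    def to_openai_format(record):
        return {
            "messages": [
                {"role": "user", "content": record["question"]},
                {"role": "assistant", "content": record["answer"]},
            ]
        }

    raw = [{"question": "请做一下自我介绍", "answer": "我是……的小助手"}]
    converted = [to_openai_format(r) for r in raw]
    with open("converted.json", "w", encoding="utf-8") as f:
        json.dump(converted, f, ensure_ascii=False, indent=4)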

pack_to_max_length: pack multiple samples up to max_length so the GPU memory is fully utilized.

(slide)

Training acceleration (slide)

Example

(slide)

Base model → fine-tuned adapter (intermediate artifact) → final merged model

(slide)

The Image Projector vectorizes images (maps visual features into the LLM's embedding space).

Training an image+text multimodal model is essentially the process of training the Image Projector.
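
A conceptual sketch of what such a projector looks like (dimensions and layer count are illustrative, not the exact LLaVA/XTuner implementation):

    import torch
    import torch.nn as nn

    # The projector is essentially a small MLP that maps visual-encoder (e.g. CLIP ViT)
    # patch features into the LLM's embedding space.
    class ImageProjector(nn.Module):
        def __init__(self, vision_dim=1024, llm_dim=2048):
            super().__init__()
            self.proj = nn.Sequential(
                nn.Linear(vision_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )

        def forward(self, patch_features):      # (batch, num_patches, vision_dim)
            return self.proj(patch_features)    # (batch, num_patches, llm_dim)

    # The projected "image tokens" are concatenated with the text token embeddings
    # and fed into the (frozen or LoRA-tuned) LLM.
    tokens = ImageProjector()(torch.randn(1, 576, 1024))
    print(tokens.shape)  # torch.Size([1, 576, 2048])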

(slide)

Image understanding vs. image generation

(slide)

Data for the pretraining stage: image + caption pairs.

(slide)

Data for the fine-tuning (FT) stage: image + multi-turn conversation.

(slide)
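
For reference, LLaVA-style fine-tuning data pairs each image with multi-turn Q&A. The entry below is only an illustrative sketch of that format; every field value is made up:

    [
        {
            "id": "0001",
            "image": "images/0001.jpg",
            "conversations": [
                { "from": "human", "value": "<image>\nDescribe this image." },
                { "from": "gpt",   "value": "A fundus camera used for eye examinations." },
                { "from": "human", "value": "What is the equipment in the image?" },
                { "from": "gpt",   "value": "An ophthalmic imaging device." }
            ]
        }
    ]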

1.2 Hands-on

(screenshot)

Fine-tuning: the personal-assistant project

  1. Create a dev machine (10% A100 allocation)
  2. Create the conda environment
  3. Clone the source code and install it
  4. Create the dataset: data directory /root/ft/data
  5. Create the data-generation script generate_data.py, which writes an OpenAI-format JSON file.
  6. Prepare the model: create a symlink with ln (instead of copying).
  7. Config file, chosen according to the model and the fine-tuning method. One command lists the built-in configs; another searches them by model name.
  8. Copy the config file to /root/ft/config.
  9. Modify the config file.
  10. Train with explicit paths: the config path and the --work-dir output path.
  11. Convert the PyTorch-format .pth checkpoint into an HF-format model: xtuner convert pth_to_hf <config> <pth checkpoint> <output folder>. Needed for both full-parameter and QLoRA fine-tuning.
  12. Merge: into the /root/ft/final_model folder, xtuner convert merge /root/ft/model <hf folder> <output folder>.
  13. Test: xtuner chat /root/ft/final_model --prompt-template internlm2_chat. (In the tutorial the model overfits and loses its general abilities.)
  14. Web deployment with Streamlit.
  15. Clone the code; mainly use the web_demo code.
  16. Reconfigure: model path, tokenizer path, avatar, system_prompt; keep it consistent with the CLI test.
  17. Local port forwarding over SSH: ssh -CNg -L 6006:<local address> <account>@<host> -p <port>

Multimodal training and testing

Dev machine: 30% A100 allocation, 24 GB VRAM.

Before fine-tuning, the model only emits image captions, no matter what is asked.

After fine-tuning, it can describe the image and answer questions about it.

  1. Create a dev machine
  2. Install and activate the environment
  3. Create the project folder /root/xtuner
  4. Pull the xtuner source
  5. Install from source
  6. Build image-caption data pairs (omitted)
  7. Pretraining command (omitted)
  8. Build the image-conversation file: Q&A pairs generated with GPT, in JSON format (omitted). The tutorial repeats the same Q&A pairs 200 times.
  9. Config file: search and copy
  10. Modify: base LLM, visual encoder, pretrained projector, data folder, JSON file, image folder, batch size
  11. Fine-tuning command
  12. Comparison:
    1. Before: convert the pretrained projector to HF format, then run xtuner chat with the base model, visual encoder, LLaVA model and a test image.
    2. After: convert the fine-tuned model to HF format and run the same command.
    3. The Image Projector is essentially doing image encoding (personal take).

2 Assignments

Record the reproduction process and take screenshots.

Basic assignment (required to complete the camp)

  • Train your own personal-assistant self-cognition model (record the process with screenshots)

Advanced assignments

  • Upload the self-cognition model to OpenXLab and deploy the application on OpenXLab (required for outstanding students)
  • Reproduce the multimodal fine-tuning (required for outstanding students)

OpenXLab deployment tutorial: https://github.com/InternLM/Tutorial/tree/camp2/tools/openxlab-deploy

2.1 Training the personal assistant

Reference: https://github.com/InternLM/Tutorial/blob/camp2/xtuner/personal_assistant_document.md

  1. Create a dev machine (10% A100 allocation)

(screenshot)

  • Connect with VS Code
    (screenshot)
  1. Create and activate the environment

    • Commands:
      # create the conda environment
      studio-conda xtuner0.1.17
      # activate the environment
      conda activate xtuner0.1.17

      (screenshot)

  2. Clone the source code and install it

    • Create the project folder, pull the code, and install from source

    • # create
      mkdir /root/xtuner0117 && cd /root/xtuner0117
      # clone
      git clone -b v0.1.17  https://github.com/InternLM/xtuner
      # install
      cd /root/xtuner0117/xtuner
      pip install -e '.[all]'
      

(screenshot)

  1. Create the dataset: data directory /root/ft/data

    • Fine-tuning file layout: create /root/ft and /root/ft/data
      mkdir -p /root/ft
      mkdir -p /root/ft/data
    
  2. Create the data-generation script generate_data.py; it writes an OpenAI-format JSON file.

    • Create the script under the data directory and run it.

    •  touch /root/ft/data/generate_data.py  && cd /root/ft/data/
       python /root/ft/data/generate_data.py
      
    • import json
      
      # set the user's name
      name = '齐天大圣孙悟空'   # change this to your own name
      # how many times to repeat the sample
      n = 500
      
      # initialize the OpenAI-format data structure
      data = [
          {
              "messages": [
                  {
                      "role": "user",
                      "content": "请做一下自我介绍"
                  },
                  {
                      "role": "assistant",
                      "content": "我是{}的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦".format(name)
                  }
              ]
          }
      ]
      
      # append the initial conversation to the data list n times
      for i in range(n):
          data.append(data[0])
      
      # write the data list to a file named 'personal_assistant.json'
      with open('personal_assistant.json', 'w', encoding='utf-8') as f:
          # json.dump writes the data in JSON format
          # ensure_ascii=False keeps Chinese characters readable
          # indent=4 pretty-prints the file for easier reading
          json.dump(data, f, ensure_ascii=False, indent=4)
      
      

(screenshot)
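
    • As a quick sanity check (not part of the tutorial), the generated file can be loaded back to confirm it parses and to count the samples:
      import json

      # hypothetical check: load the generated dataset and inspect it
      with open('/root/ft/data/personal_assistant.json', encoding='utf-8') as f:
          samples = json.load(f)
      print(len(samples))                       # expect n + 1 entries (501)
      print(samples[0]['messages'][0]['role'])  # 'user'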

  1. Prepare the model: create a symlink with ln (instead of copying).

    • Model symlink
      # Create the target folder if you want to copy instead.
      # -p also creates missing parent directories and does not error if the folder already exists.
      # mkdir -p /root/ft/model
      
      # Copy the model into the target folder; -r copies the whole directory recursively.
      # cp -r /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b/* /root/ft/model/
      
      # Remove the /root/ft/model directory
      # rm -rf /root/ft/model
      
      # Create the symbolic link
      ln -s /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b /root/ft/model
    
    • (screenshot)

    • After: (screenshot)

  2. Config file, chosen according to the model and the fine-tuning method. One command lists the built-in configs; another searches them by model name.

       # list all built-in config files
       # xtuner list-cfg
       
       # find the config files that support the internlm2-1.8b model
       xtuner list-cfg -p internlm2_1_8b  # search for configs matching internlm2_1_8b
    

    (screenshot)

  3. Copy the config file to /root/ft/config

      # create a folder to hold the config file
      mkdir -p /root/ft/config
      
      # use XTuner's copy-cfg command to copy the config file to the target location
      xtuner copy-cfg internlm2_1_8b_qlora_alpaca_e3 /root/ft/config
      
    
  4. Modify the config file.

      # Copyright (c) OpenMMLab. All rights reserved.
      import torch
      from datasets import load_dataset
      from mmengine.dataset import DefaultSampler
      from mmengine.hooks import (CheckpointHook, DistSamplerSeedHook, IterTimerHook,
                                  LoggerHook, ParamSchedulerHook)
      from mmengine.optim import AmpOptimWrapper, CosineAnnealingLR, LinearLR
      from peft import LoraConfig
      from torch.optim import AdamW
      from transformers import (AutoModelForCausalLM, AutoTokenizer,
                                BitsAndBytesConfig)
      
      from xtuner.dataset import process_hf_dataset
      from xtuner.dataset.collate_fns import default_collate_fn
      from xtuner.dataset.map_fns import openai_map_fn, template_map_fn_factory
      from xtuner.engine.hooks import (DatasetInfoHook, EvaluateChatHook,
                                       VarlenAttnArgsToMessageHubHook)
      from xtuner.engine.runner import TrainLoop
      from xtuner.model import SupervisedFinetune
      from xtuner.parallel.sequence import SequenceParallelSampler
      from xtuner.utils import PROMPT_TEMPLATE, SYSTEM_TEMPLATE
      
      #######################################################################
      #                          PART 1  Settings                           #
      #######################################################################
      # Model
      pretrained_model_name_or_path = '/root/ft/model'
      use_varlen_attn = False
      
      # Data
      alpaca_en_path = '/root/ft/data/personal_assistant.json'
      prompt_template = PROMPT_TEMPLATE.default
      max_length = 1024
      pack_to_max_length = True
      
      # parallel
      sequence_parallel_size = 1
      
      # Scheduler & Optimizer
      batch_size = 1  # per_device
      accumulative_counts = 16
      accumulative_counts *= sequence_parallel_size
      dataloader_num_workers = 0
      max_epochs = 2
      optim_type = AdamW
      lr = 2e-4
      betas = (0.9, 0.999)
      weight_decay = 0
      max_norm = 1  # grad clip
      warmup_ratio = 0.03
      
      # Save
      save_steps = 300
      save_total_limit = 3  # Maximum checkpoints to keep (-1 means unlimited)
      
      # Evaluate the generation performance during the training
      evaluation_freq = 300
      SYSTEM = ''
      evaluation_inputs = ['请你介绍一下你自己', '你是谁', '你是我的小助手吗']
      
      #######################################################################
      #                      PART 2  Model & Tokenizer                      #
      #######################################################################
      tokenizer = dict(
          type=AutoTokenizer.from_pretrained,
          pretrained_model_name_or_path=pretrained_model_name_or_path,
          trust_remote_code=True,
          padding_side='right')
      
      model = dict(
          type=SupervisedFinetune,
          use_varlen_attn=use_varlen_attn,
          llm=dict(
              type=AutoModelForCausalLM.from_pretrained,
              pretrained_model_name_or_path=pretrained_model_name_or_path,
              trust_remote_code=True,
              torch_dtype=torch.float16,
              quantization_config=dict(
                  type=BitsAndBytesConfig,
                  load_in_4bit=True,
                  load_in_8bit=False,
                  llm_int8_threshold=6.0,
                  llm_int8_has_fp16_weight=False,
                  bnb_4bit_compute_dtype=torch.float16,
                  bnb_4bit_use_double_quant=True,
                  bnb_4bit_quant_type='nf4')),
          lora=dict(
              type=LoraConfig,
              r=64,
              lora_alpha=16,
              lora_dropout=0.1,
              bias='none',
              task_type='CAUSAL_LM'))
      
      #######################################################################
      #                      PART 3  Dataset & Dataloader                   #
      #######################################################################
      alpaca_en = dict(
          type=process_hf_dataset,
          dataset=dict(type=load_dataset, path='json', data_files=dict(train=alpaca_en_path)),
          tokenizer=tokenizer,
          max_length=max_length,
          dataset_map_fn=openai_map_fn,
          template_map_fn=dict(
              type=template_map_fn_factory, template=prompt_template),
          remove_unused_columns=True,
          shuffle_before_pack=True,
          pack_to_max_length=pack_to_max_length,
          use_varlen_attn=use_varlen_attn)
      
      sampler = SequenceParallelSampler \
          if sequence_parallel_size > 1 else DefaultSampler
      train_dataloader = dict(
          batch_size=batch_size,
          num_workers=dataloader_num_workers,
          dataset=alpaca_en,
          sampler=dict(type=sampler, shuffle=True),
          collate_fn=dict(type=default_collate_fn, use_varlen_attn=use_varlen_attn))
      
      #######################################################################
      #                    PART 4  Scheduler & Optimizer                    #
      #######################################################################
      # optimizer
      optim_wrapper = dict(
          type=AmpOptimWrapper,
          optimizer=dict(
              type=optim_type, lr=lr, betas=betas, weight_decay=weight_decay),
          clip_grad=dict(max_norm=max_norm, error_if_nonfinite=False),
          accumulative_counts=accumulative_counts,
          loss_scale='dynamic',
          dtype='float16')
      
      # learning policy
      # More information: https://github.com/open-mmlab/mmengine/blob/main/docs/en/tutorials/param_scheduler.md  # noqa: E501
      param_scheduler = [
          dict(
              type=LinearLR,
              start_factor=1e-5,
              by_epoch=True,
              begin=0,
              end=warmup_ratio * max_epochs,
              convert_to_iter_based=True),
          dict(
              type=CosineAnnealingLR,
              eta_min=0.0,
              by_epoch=True,
              begin=warmup_ratio * max_epochs,
              end=max_epochs,
              convert_to_iter_based=True)
      ]
      
      # train, val, test setting
      train_cfg = dict(type=TrainLoop, max_epochs=max_epochs)
      
      #######################################################################
      #                           PART 5  Runtime                           #
      #######################################################################
      # Log the dialogue periodically during the training process, optional
      custom_hooks = [
          dict(type=DatasetInfoHook, tokenizer=tokenizer),
          dict(
              type=EvaluateChatHook,
              tokenizer=tokenizer,
              every_n_iters=evaluation_freq,
              evaluation_inputs=evaluation_inputs,
              system=SYSTEM,
              prompt_template=prompt_template)
      ]
      
      if use_varlen_attn:
          custom_hooks += [dict(type=VarlenAttnArgsToMessageHubHook)]
      
      # configure default hooks
      default_hooks = dict(
          # record the time of every iteration.
          timer=dict(type=IterTimerHook),
          # print log every 10 iterations.
          logger=dict(type=LoggerHook, log_metric_by_epoch=False, interval=10),
          # enable the parameter scheduler.
          param_scheduler=dict(type=ParamSchedulerHook),
          # save checkpoint per `save_steps`.
          checkpoint=dict(
              type=CheckpointHook,
              by_epoch=False,
              interval=save_steps,
              max_keep_ckpts=save_total_limit),
          # set sampler seed in distributed environment.
          sampler_seed=dict(type=DistSamplerSeedHook),
      )
      
      # configure environment
      env_cfg = dict(
          # whether to enable cudnn benchmark
          cudnn_benchmark=False,
          # set multi process parameters
          mp_cfg=dict(mp_start_method='fork', opencv_num_threads=0),
          # set distributed parameters
          dist_cfg=dict(backend='nccl'),
      )
      
      # set visualizer
      visualizer = None
      
      # set log level
      log_level = 'INFO'
      
      # load from which checkpoint
      load_from = None
      
      # whether to resume training from the loaded checkpoint
      resume = False
      
      # Defaults to use random seed and disable `deterministic`
      randomness = dict(seed=None, deterministic=False)
      
      # set log processor
      log_processor = dict(by_epoch=False)
    

(screenshot)

  1. Train with explicit paths: the config path and the --work-dir output path.

       # specify the save path
       xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train
       
       # use deepspeed to accelerate training
       xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train_deepspeed --deepspeed deepspeed_zero2
       
       # resume training from a checkpoint
       xtuner train /root/ft/config/internlm2_1_8b_qlora_alpaca_e3_copy.py --work-dir /root/ft/train --resume /root/ft/train/iter_600.pth
    

(screenshot)

  1. Convert the PyTorch-format .pth checkpoint into an HF (Hugging Face) format model (.bin files): xtuner convert pth_to_hf <config> <pth checkpoint> <output folder>. This step is needed for both full-parameter and QLoRA fine-tuning.

      # create a folder for the converted Hugging Face format model
      mkdir -p /root/ft/huggingface
      
      # model conversion
      # xtuner convert pth_to_hf ${config path} ${pth checkpoint path} ${output path}
      xtuner convert pth_to_hf /root/ft/train/internlm2_1_8b_qlora_alpaca_e3_copy.py /root/ft/train/iter_64.pth /root/ft/huggingface
    

    (screenshot)

  2. Merge. Into the /root/ft/final_model folder: xtuner convert merge /root/ft/model <hf folder> <output folder>

      # create a folder named final_model to store the merged model
      mkdir -p /root/ft/final_model
      
      # work around a threading conflict bug
      export MKL_SERVICE_FORCE_INTEL=1
      
      # merge the model
      # xtuner convert merge  ${NAME_OR_PATH_TO_LLM} ${NAME_OR_PATH_TO_ADAPTER} ${SAVE_PATH} 
      xtuner convert merge /root/ft/model /root/ft/huggingface /root/ft/final_model
    

    (screenshot)

  3. Test: xtuner chat /root/ft/final_model --prompt-template internlm2_chat. (In the tutorial's run the model overfits and loses its general abilities.)

    • After fine-tuning:
      # chat with the model
      xtuner chat /root/ft/final_model --prompt-template internlm2_chat
    
      double enter to end input (EXIT: exit chat, RESET: reset history) >>> 你是谁
      我是剑锋大佬的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
      
      double enter to end input (EXIT: exit chat, RESET: reset history) >>>  请你介绍一下你自己
      我是剑锋大佬的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
      
      double enter to end input (EXIT: exit chat, RESET: reset history) >>> 你是我的小助手吗?
      我是剑锋大佬的小助手,内在是上海AI实验室书生·浦语的1.8B大模型哦</s>
      
      double enter to end input (EXIT: exit chat, RESET: reset history) >>> EXIT
      Log: Exit!
    

    (screenshot)

    The data had been trained for too few steps here.

  4. Web deployment with Streamlit: pip install streamlit==1.24.0

  5. Clone the code; we mainly use the web_demo code.

      # create a folder to hold the InternLM code
      mkdir -p /root/ft/web_demo && cd /root/ft/web_demo
      
      # pull the InternLM repository
      git clone https://github.com/InternLM/InternLM.git
      
      # enter the repository
      cd /root/ft/web_demo/InternLM
    
  6. Reconfigure web_demo.py: model path, tokenizer path, avatar, system_prompt; keep it consistent with the CLI test.

    • Change the model paths
       # change the model path (line 183)
       - model = (AutoModelForCausalLM.from_pretrained('/root/ft/final_model',
       + model = (AutoModelForCausalLM.from_pretrained('/root/ft/model',
       
       # change the tokenizer path (line 186)
       - tokenizer = AutoTokenizer.from_pretrained('/root/ft/final_model',
       + tokenizer = AutoTokenizer.from_pretrained('/root/ft/model',
    
    • web_demo.py
      """This script refers to the dialogue example of streamlit, the interactive
      generation code of chatglm2 and transformers.
      
      We mainly modified part of the code logic to adapt to the
      generation of our model.
      Please refer to these links below for more information:
          1. streamlit chat example:
              https://docs.streamlit.io/knowledge-base/tutorials/build-conversational-apps
          2. chatglm2:
              https://github.com/THUDM/ChatGLM2-6B
          3. transformers:
              https://github.com/huggingface/transformers
      Please run with the command `streamlit run path/to/web_demo.py
          --server.address=0.0.0.0 --server.port 7860`.
      Using `python path/to/web_demo.py` may cause unknown problems.
      """
      # isort: skip_file
      import copy
      import warnings
      from dataclasses import asdict, dataclass
      from typing import Callable, List, Optional
      
      import streamlit as st
      import torch
      from torch import nn
      from transformers.generation.utils import (LogitsProcessorList,
                                                 StoppingCriteriaList)
      from transformers.utils import logging
      
      from transformers import AutoTokenizer, AutoModelForCausalLM  # isort: skip
      
      logger = logging.get_logger(__name__)
      
      
      @dataclass
      class GenerationConfig:
          # this config is used for chat to provide more diversity
          max_length: int = 2048
          top_p: float = 0.75
          temperature: float = 0.1
          do_sample: bool = True
          repetition_penalty: float = 1.000
      
      
      @torch.inference_mode()
      def generate_interactive(
          model,
          tokenizer,
          prompt,
          generation_config: Optional[GenerationConfig] = None,
          logits_processor: Optional[LogitsProcessorList] = None,
          stopping_criteria: Optional[StoppingCriteriaList] = None,
          prefix_allowed_tokens_fn: Optional[Callable[[int, torch.Tensor],
                                                      List[int]]] = None,
          additional_eos_token_id: Optional[int] = None,
          **kwargs,
      ):
          inputs = tokenizer([prompt], padding=True, return_tensors='pt')
          input_length = len(inputs['input_ids'][0])
          for k, v in inputs.items():
              inputs[k] = v.cuda()
          input_ids = inputs['input_ids']
          _, input_ids_seq_length = input_ids.shape[0], input_ids.shape[-1]
          if generation_config is None:
              generation_config = model.generation_config
          generation_config = copy.deepcopy(generation_config)
          model_kwargs = generation_config.update(**kwargs)
          bos_token_id, eos_token_id = (  # noqa: F841  # pylint: disable=W0612
              generation_config.bos_token_id,
              generation_config.eos_token_id,
          )
          if isinstance(eos_token_id, int):
              eos_token_id = [eos_token_id]
          if additional_eos_token_id is not None:
              eos_token_id.append(additional_eos_token_id)
          has_default_max_length = kwargs.get(
              'max_length') is None and generation_config.max_length is not None
          if has_default_max_length and generation_config.max_new_tokens is None:
              warnings.warn(
                  f"Using 'max_length''s default ({repr(generation_config.max_length)}) \
                      to control the generation length. "
                  'This behaviour is deprecated and will be removed from the \
                      config in v5 of Transformers -- we'
                  ' recommend using `max_new_tokens` to control the maximum \
                      length of the generation.',
                  UserWarning,
              )
          elif generation_config.max_new_tokens is not None:
              generation_config.max_length = generation_config.max_new_tokens + \
                  input_ids_seq_length
              if not has_default_max_length:
                  logger.warn(  # pylint: disable=W4902
                      f"Both 'max_new_tokens' (={generation_config.max_new_tokens}) "
                      f"and 'max_length'(={generation_config.max_length}) seem to "
                      "have been set. 'max_new_tokens' will take precedence. "
                      'Please refer to the documentation for more information. '
                      '(https://huggingface.co/docs/transformers/main/'
                      'en/main_classes/text_generation)',
                      UserWarning,
                  )
      
          if input_ids_seq_length >= generation_config.max_length:
              input_ids_string = 'input_ids'
              logger.warning(
                  f"Input length of {input_ids_string} is {input_ids_seq_length}, "
                  f"but 'max_length' is set to {generation_config.max_length}. "
                  'This can lead to unexpected behavior. You should consider'
                  " increasing 'max_new_tokens'.")
      
          # 2. Set generation parameters if not already defined
          logits_processor = logits_processor if logits_processor is not None \
              else LogitsProcessorList()
          stopping_criteria = stopping_criteria if stopping_criteria is not None \
              else StoppingCriteriaList()
      
          logits_processor = model._get_logits_processor(
              generation_config=generation_config,
              input_ids_seq_length=input_ids_seq_length,
              encoder_input_ids=input_ids,
              prefix_allowed_tokens_fn=prefix_allowed_tokens_fn,
              logits_processor=logits_processor,
          )
      
          stopping_criteria = model._get_stopping_criteria(
              generation_config=generation_config,
              stopping_criteria=stopping_criteria)
          logits_warper = model._get_logits_warper(generation_config)
      
          unfinished_sequences = input_ids.new(input_ids.shape[0]).fill_(1)
          scores = None
          while True:
              model_inputs = model.prepare_inputs_for_generation(
                  input_ids, **model_kwargs)
              # forward pass to get next token
              outputs = model(
                  **model_inputs,
                  return_dict=True,
                  output_attentions=False,
                  output_hidden_states=False,
              )
      
              next_token_logits = outputs.logits[:, -1, :]
      
              # pre-process distribution
              next_token_scores = logits_processor(input_ids, next_token_logits)
              next_token_scores = logits_warper(input_ids, next_token_scores)
      
              # sample
              probs = nn.functional.softmax(next_token_scores, dim=-1)
              if generation_config.do_sample:
                  next_tokens = torch.multinomial(probs, num_samples=1).squeeze(1)
              else:
                  next_tokens = torch.argmax(probs, dim=-1)
      
              # update generated ids, model inputs, and length for next step
              input_ids = torch.cat([input_ids, next_tokens[:, None]], dim=-1)
              model_kwargs = model._update_model_kwargs_for_generation(
                  outputs, model_kwargs, is_encoder_decoder=False)
              unfinished_sequences = unfinished_sequences.mul(
                  (min(next_tokens != i for i in eos_token_id)).long())
      
              output_token_ids = input_ids[0].cpu().tolist()
              output_token_ids = output_token_ids[input_length:]
              for each_eos_token_id in eos_token_id:
                  if output_token_ids[-1] == each_eos_token_id:
                      output_token_ids = output_token_ids[:-1]
              response = tokenizer.decode(output_token_ids)
      
              yield response
              # stop when each sentence is finished
              # or if we exceed the maximum length
              if unfinished_sequences.max() == 0 or stopping_criteria(
                      input_ids, scores):
                  break
      
      
      def on_btn_click():
          del st.session_state.messages
      
      
      @st.cache_resource
      def load_model():
          model = (AutoModelForCausalLM.from_pretrained('/root/ft/final_model',
                                                        trust_remote_code=True).to(
                                                            torch.bfloat16).cuda())
          tokenizer = AutoTokenizer.from_pretrained('/root/ft/final_model',
                                                    trust_remote_code=True)
          return model, tokenizer
      
      
      def prepare_generation_config():
          with st.sidebar:
              max_length = st.slider('Max Length',
                                     min_value=8,
                                     max_value=32768,
                                     value=2048)
              top_p = st.slider('Top P', 0.0, 1.0, 0.75, step=0.01)
              temperature = st.slider('Temperature', 0.0, 1.0, 0.1, step=0.01)
              st.button('Clear Chat History', on_click=on_btn_click)
      
          generation_config = GenerationConfig(max_length=max_length,
                                               top_p=top_p,
                                               temperature=temperature)
      
          return generation_config
      
      
      user_prompt = '<|im_start|>user\n{user}<|im_end|>\n'
      robot_prompt = '<|im_start|>assistant\n{robot}<|im_end|>\n'
      cur_query_prompt = '<|im_start|>user\n{user}<|im_end|>\n\
          <|im_start|>assistant\n'
      
      
      def combine_history(prompt):
          messages = st.session_state.messages
          meta_instruction = ('')
          total_prompt = f"<s><|im_start|>system\n{meta_instruction}<|im_end|>\n"
          for message in messages:
              cur_content = message['content']
              if message['role'] == 'user':
                  cur_prompt = user_prompt.format(user=cur_content)
              elif message['role'] == 'robot':
                  cur_prompt = robot_prompt.format(robot=cur_content)
              else:
                  raise RuntimeError
              total_prompt += cur_prompt
          total_prompt = total_prompt + cur_query_prompt.format(user=prompt)
          return total_prompt
      
      
      def main():
          # torch.cuda.empty_cache()
          print('load model begin.')
          model, tokenizer = load_model()
          print('load model end.')
      
      
          st.title('InternLM2-Chat-1.8B')
      
          generation_config = prepare_generation_config()
      
          # Initialize chat history
          if 'messages' not in st.session_state:
              st.session_state.messages = []
      
          # Display chat messages from history on app rerun
          for message in st.session_state.messages:
              with st.chat_message(message['role'], avatar=message.get('avatar')):
                  st.markdown(message['content'])
      
          # Accept user input
          if prompt := st.chat_input('What is up?'):
              # Display user message in chat message container
              with st.chat_message('user'):
                  st.markdown(prompt)
              real_prompt = combine_history(prompt)
              # Add user message to chat history
              st.session_state.messages.append({
                  'role': 'user',
                  'content': prompt,
              })
      
              with st.chat_message('robot'):
                  message_placeholder = st.empty()
                  for cur_response in generate_interactive(
                          model=model,
                          tokenizer=tokenizer,
                          prompt=real_prompt,
                          additional_eos_token_id=92542,
                          **asdict(generation_config),
                  ):
                      # Display robot response in chat message container
                      message_placeholder.markdown(cur_response + '▌')
                  message_placeholder.markdown(cur_response)
              # Add robot response to chat history
              st.session_state.messages.append({
                  'role': 'robot',
                  'content': cur_response,  # pylint: disable=undefined-loop-variable
              })
              torch.cuda.empty_cache()
      
      
      if __name__ == '__main__':
          main()
    
    • Run: streamlit run /root/ft/web_demo/InternLM/chat/web_demo.py --server.address 127.0.0.1 --server.port 6006
  7. Local port forwarding over SSH: ssh -CNg -L 6006:<local address> <account>@<host> -p <port>

    ssh -CNg -L 6006:127.0.0.1:6006 root@ssh.intern-ai.org.cn -p <dev machine SSH port>
    (screenshot)

2.2 Uploading the model to OpenXLab

Reference: https://github.com/InternLM/Tutorial/tree/camp2/tools/openxlab-deploy

2.3 Reproducing the multimodal fine-tuning

Reference: https://github.com/InternLM/Tutorial/blob/camp2/xtuner/llava/xtuner_llava.md

  1. Dev machine

  2. Install and activate the environment

       cd ~ && studio-conda xtuner0.1.17
       conda activate xtuner0.1.17
    
  3. Create the project folder /root/xtuner0117

     mkdir -p /root/xtuner0117 && cd /root/xtuner0117
    

    (screenshot)

  4. Pull the xtuner source

      git clone -b v0.1.17  https://github.com/InternLM/xtuner
    
  5. Install from source

      cd /root/xtuner0117/xtuner && pip install -e '.[all]' && cd ~
    

    The first assignment (section 2.1) had already completed up to this step.

    (screenshot)

  6. Build image-caption data pairs (omitted).

  7. Pretraining commands (omitted); 8 GPUs.

       NPROC_PER_NODE=8 xtuner train llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain --deepspeed deepspeed_zero2
       
       NPROC_PER_NODE=8 xtuner train llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune --deepspeed deepspeed_zero2
    
  8. Build the image-conversation file: Q&A pairs generated with GPT, in JSON format (omitted). The tutorial simply repeats the same Q&A pairs 200 times; a sketch of that repetition logic follows below.

    • Repeat 200 times
       cd ~ && git clone https://github.com/InternLM/tutorial -b camp2 && conda activate xtuner0.1.17 && cd tutorial
       
       python /root/tutorial/xtuner/llava/llava_data/repeat.py \
         -i /root/tutorial/xtuner/llava/llava_data/unique_data.json \
         -o /root/tutorial/xtuner/llava/llava_data/repeated_data.json \
         -n 200
    

    (screenshot)
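
    • A minimal sketch of what such a repetition script might look like (the real repeat.py in the tutorial repo may differ; only the -i/-o/-n arguments are taken from the command above):
      import argparse
      import json

      # parse -i (input json), -o (output json), -n (repeat count)
      parser = argparse.ArgumentParser()
      parser.add_argument('-i', '--input', required=True)
      parser.add_argument('-o', '--output', required=True)
      parser.add_argument('-n', '--repeat', type=int, default=200)
      args = parser.parse_args()

      with open(args.input, encoding='utf-8') as f:
          unique = json.load(f)

      # duplicate every unique sample n times so the tiny dataset dominates training
      repeated = unique * args.repeat

      with open(args.output, 'w', encoding='utf-8') as f:
          json.dump(repeated, f, ensure_ascii=False, indent=2)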

  9. Config file: search or copy.

    • Copy it directly: cp /root/tutorial/xtuner/llava/llava_data/internlm2_chat_1_8b_llava_tutorial_fool_config.py /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py
       # or query XTuner's built-in config files
       xtuner list-cfg -p llava_internlm2_chat_1_8b
       
       # and copy the config file to the target directory
       xtuner copy-cfg \
         llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune \
         /root/tutorial/xtuner/llava
    

    (screenshot)

  10. Modify: base LLM, visual encoder, pretrained projector checkpoint, data folder, JSON file, image folder, batch size, evaluation inputs.

       # Model
       - llm_name_or_path = 'internlm/internlm2-chat-1_8b'
       + llm_name_or_path = '/root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b'
       - visual_encoder_name_or_path = 'openai/clip-vit-large-patch14-336'
       + visual_encoder_name_or_path = '/root/share/new_models/openai/clip-vit-large-patch14-336'
       
       # Specify the pretrained pth
       - pretrained_pth = './work_dirs/llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain/iter_2181.pth'  # noqa: E501
       + pretrained_pth = '/root/share/new_models/xtuner/iter_2181.pth'
       
       # Data
       - data_root = './data/llava_data/'
       + data_root = '/root/tutorial/xtuner/llava/llava_data/'
       - data_path = data_root + 'LLaVA-Instruct-150K/llava_v1_5_mix665k.json'
       + data_path = data_root + 'repeated_data.json'
       - image_folder = data_root + 'llava_images'
       + image_folder = data_root
       
       # Scheduler & Optimizer
       - batch_size = 16  # per_device
       + batch_size = 1  # per_device
       
       
       # evaluation_inputs
       - evaluation_inputs = ['请描述一下这张图片','Please describe this picture']
       + evaluation_inputs = ['Please describe this picture','What is the equipment in the image?']
       
    
  11. Fine-tuning command

      cd /root/tutorial/xtuner/llava/
      xtuner train /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py --deepspeed deepspeed_zero2
    

(screenshot)

  1. Comparison:

    • Before. Convert the pretrained projector to HF format, then launch xtuner chat with the base model, the visual encoder, the LLaVA model, and a test image.

         # work around a small bug
         export MKL_SERVICE_FORCE_INTEL=1
         export MKL_THREADING_LAYER=GNU
         
         # convert the pth checkpoint to Hugging Face format
         xtuner convert pth_to_hf \
           llava_internlm2_chat_1_8b_clip_vit_large_p14_336_e1_gpu8_pretrain \
           /root/share/new_models/xtuner/iter_2181.pth \
           /root/tutorial/xtuner/llava/llava_data/iter_2181_hf
         
         # launch!
         xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
           --visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
           --llava /root/tutorial/xtuner/llava/llava_data/iter_2181_hf \
           --prompt-template internlm2_chat \
           --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
      
      Test questions:
      Q1: Describe this image.
      Q2: What is the equipment in the image?
      
    • After. Convert the fine-tuned model to HF format, then launch xtuner chat with the base model, the visual encoder, the LLaVA model, and the same test image.

         # work around a small bug
         export MKL_SERVICE_FORCE_INTEL=1
         export MKL_THREADING_LAYER=GNU
         
         # convert the pth checkpoint to Hugging Face format
         xtuner convert pth_to_hf \
           /root/tutorial/xtuner/llava/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy.py \
           /root/tutorial/xtuner/llava/work_dirs/llava_internlm2_chat_1_8b_qlora_clip_vit_large_p14_336_lora_e1_gpu8_finetune_copy/iter_1200.pth \
           /root/tutorial/xtuner/llava/llava_data/iter_1200_hf
         
         # launch!
         xtuner chat /root/share/new_models/Shanghai_AI_Laboratory/internlm2-chat-1_8b \
           --visual-encoder /root/share/new_models/openai/clip-vit-large-patch14-336 \
           --llava /root/tutorial/xtuner/llava/llava_data/iter_1200_hf \
           --prompt-template internlm2_chat \
           --image /root/tutorial/xtuner/llava/llava_data/test_img/oph.jpg
      
      Test questions:
      Q1: Describe this image.
      Q2: What is the equipment in the image?
      

(screenshot)

(screenshot)

Fine-tuning succeeded!
