Baichuan-7B Model Training and Inference

This post covers setting up the environment for the Baichuan-7B project from GitHub, including installing its dependencies, and walks through inference with the Baichuan2-7B-Chat-4bits model. It also covers troubleshooting an inference error, deploying the quantized models, and adjusting training parameters and checkpoint management.


Overview

  • Baichuan2 GitHub repository: https://github.com/baichuan-inc/Baichuan2

Environment Setup

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple deepspeed==0.9.2
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple numpy==1.23.5
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple sentencepiece==0.1.97
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple torch==2.0.0
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple transformers==4.29.1
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple xformers==0.0.20
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple tensorboard
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple datasets
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple accelerate
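
The pinned installs above can equivalently be collected into a requirements.txt (same packages and versions as above) and installed in one command:

```text
deepspeed==0.9.2
numpy==1.23.5
sentencepiece==0.1.97
torch==2.0.0
transformers==4.29.1
xformers==0.0.20
tensorboard
datasets
accelerate
```

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt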
# Run one of the options below BEFORE installing mpi4py, otherwise the install will fail!

# Option 1
sudo apt update
sudo apt install openmpi-bin

# Option 2
sudo apt update
sudo apt-get install libopenmpi-dev
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple mpi4py

Model Inference

Baichuan2-7B-Chat-4bits

Python code for running the quantized model

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig
# trust_remote_code=True is required: Baichuan ships its own modeling code with the weights
tokenizer = AutoTokenizer.from_pretrained("/home/shidonghai/Baichuan-7B/Baichuan2-7B-Chat-4bits", use_fast=False, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("/home/shidonghai/Baichuan-7B/Baichuan2-7B-Chat-4bits", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
# Load the generation settings (temperature, max tokens, etc.) shipped with the model
model.generation_config = GenerationConfig.from_pretrained("/home/shidonghai/Baichuan-7B/Baichuan2-7B-Chat-4bits")
messages = []
messages.append({"role": "user", "content": "解释一下“温故而知新”"})
response = model.chat(tokenizer, messages)
print(response)
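
model.chat above handles a single turn; for multi-turn conversation, the assistant's reply has to be appended back to the list before the next user message. A minimal sketch of that bookkeeping (the helper name and the sample assistant reply are mine; only the {"role", "content"} dict format comes from the snippet above):

```python
def add_turn(messages, role, content):
    """Append one turn in the {"role", "content"} format model.chat expects."""
    messages.append({"role": role, "content": content})
    return messages

# Two-turn conversation: feed the first reply back as an "assistant" turn
history = []
add_turn(history, "user", "解释一下“温故而知新”")
# response = model.chat(tokenizer, history)       # first reply
add_turn(history, "assistant", "温故而知新:复习旧知识,从中获得新的理解。")
add_turn(history, "user", "请用英文再解释一遍")
# response = model.chat(tokenizer, history)       # second reply, with full context
```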

Troubleshooting Inference Errors

AttributeError: 'list' object has no attribute 'as_dict'

(baichuan) shidonghai@shidonghai:~/Baichuan-7B$ python 123.py
Traceback (most recent call last):
  File "123.py", line 5, in <module>
    model = AutoModelForCausalLM.from_pretrained("/home/shidonghai/Baichuan-7B/Baichuan2-7B-Chat-4bits", device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True)
  File "/home/shidonghai/anaconda3/envs/baichuan/lib/python3.8/site-packages/transformers/models/auto/auto_factory.py", line 561, in from_pretrained
    return model_class.from_pretrained(
  File "/home/shidonghai/.cache/huggingface/modules/transformers_modules/Baichuan2-7B-Chat-4bits/modeling_baichuan.py", line 656, in from_pretrained
    dispatch_model(model, device_map=device_map)
  File "/home/shidonghai/anaconda3/envs/baichuan/lib/python3.8/site-packages/accelerate/big_modeling.py", line 343, in dispatch_model
    check_device_map(model, device_map)
  File "/home/shidonghai/anaconda3/envs/baichuan/lib/python3.8/site-packages/accelerate/utils/modeling.py", line 1165, in check_device_map
    all_model_tensors = [name for name, _ in model.state_dict().items()]
  File "/home/shidonghai/anaconda3/envs/baichuan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1897, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/home/shidonghai/anaconda3/envs/baichuan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1897, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  File "/home/shidonghai/anaconda3/envs/baichuan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1897, in state_dict
    module.state_dict(destination=destination, prefix=prefix + name + '.', keep_vars=keep_vars)
  [Previous line repeated 2 more times]
  File "/home/shidonghai/anaconda3/envs/baichuan/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1894, in state_dict
    self._save_to_state_dict(destination, prefix, keep_vars)
  File "/home/shidonghai/anaconda3/envs/baichuan/lib/python3.8/site-packages/bitsandbytes/nn/modules.py", line 237, in _save_to_state_dict
    for k, v in self.weight.quant_state.as_dict(packed=True).items():
AttributeError: 'list' object has no attribute 'as_dict'

Fix: downgrade bitsandbytes with pip install bitsandbytes==0.41.0
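
Since this failure is version-specific, it can help to assert the pin at startup before loading the 4-bit model. A small stdlib-only sketch (the helper is mine; the 0.41.0 pin comes from the fix above):

```python
from importlib.metadata import version, PackageNotFoundError

def check_pinned(package: str, expected: str) -> bool:
    """Return True only if `package` is installed at exactly version `expected`."""
    try:
        return version(package) == expected
    except PackageNotFoundError:
        return False

# e.g. guard before from_pretrained():
# assert check_pinned("bitsandbytes", "0.41.0"), "wrong bitsandbytes version"
```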

Quantized Model Deployment

Based on the GPU memory usage for quantized inference published in the official repo, the int4 model is recommended if you just want to try the model locally on a single GPU.

Precision      GPU Mem (GB)
bf16 / fp16    26.0
int8           15.8
int4           9.7
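
Those numbers can be turned into a tiny helper that picks the highest-precision variant fitting in free GPU memory (the figures are copied from the table above; the 1 GB headroom default is my own guess):

```python
# GPU memory needs (GB) for Baichuan2-7B-Chat, from the table above
MEM_GB = {"bf16/fp16": 26.0, "int8": 15.8, "int4": 9.7}

def pick_precision(free_gb, headroom_gb=1.0):
    """Return the highest precision whose footprint (plus headroom) fits, or None."""
    for prec in ("bf16/fp16", "int8", "int4"):
        if MEM_GB[prec] + headroom_gb <= free_gb:
            return prec
    return None

print(pick_precision(24.0))  # a 24 GB card -> "int8"
```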

Model Links

https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat-4bits

https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat-4bits

https://huggingface.co/baichuan-inc/Baichuan2-7B-Chat

https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat

https://huggingface.co/baichuan-inc/Baichuan2-13B-Base

https://huggingface.co/baichuan-inc/Baichuan2-7B-Base

https://huggingface.co/baichuan-inc/Baichuan2-7B-Intermediate-Checkpoints

Model Training

Running on a Single GPU

Before running, change the DeepSpeed parameter "stage": 2 in the original config to "stage": 3. It is not yet clear why stage 2 errors out; it appears to be a version issue.

  "zero_optimization": {
    "stage": 3,
    "contiguous_gradients": false,
    "allgather_bucket_size": 1e8,
    "reduce_bucket_size": 1e8,
    "overlap_comm": true,
    "reduce_scatter": true
  },
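
For reference, the fragment above sits inside a complete deepspeed.json. A minimal sketch (the batch-size and fp16 settings here are illustrative assumptions; only the zero_optimization block comes from this post):

```json
{
  "train_micro_batch_size_per_gpu": 1,
  "gradient_accumulation_steps": 16,
  "fp16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "contiguous_gradients": false,
    "allgather_bucket_size": 1e8,
    "reduce_bucket_size": 1e8,
    "overlap_comm": true,
    "reduce_scatter": true
  }
}
```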

Create the following directories next to train.py:

  • data_dir: holds the training corpus files

  • checkpoints: stores the DeepSpeed checkpoint files;

    • The parameter below sets how many steps elapse between checkpoint saves. I recommend setting it to at least 2000; if you save on every iteration, checkpoints can reach about 3 TB within roughly 30 minutes.
        parser.add_argument("--steps_per_epoch", type=int, default=4096,
                            help="Step intervals to save checkpoint")

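The 3 TB figure above is easy to sanity-check. Assuming a full DeepSpeed checkpoint for a 7B model (weights plus optimizer state) is on the order of 100 GB and one iteration takes about a minute (both numbers are my own rough assumptions, not from the post), saving every step fills disk fast:

```python
def checkpoint_storage_gb(ckpt_gb, total_steps, save_every):
    """Total disk used if a ckpt_gb checkpoint is written every save_every steps."""
    return ckpt_gb * (total_steps // save_every)

# ~100 GB checkpoint every step for 30 steps (~30 min) -> ~3 TB
print(checkpoint_storage_gb(100, 30, 1))     # 3000 GB
# with save_every=2000, the same run has written no checkpoint yet
print(checkpoint_storage_gb(100, 30, 2000))  # 0 GB
```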
Then launch training with the following command.

#!/bin/bash
deepspeed train.py \
--deepspeed \
--deepspeed_config config/deepspeed.json