LLama2 Development with an Nvidia A100 on Huawei OpenEuler OS (2)

4. Building a local LLama2-Chinese-Alpaca

1. First, download the relevant models and repositories

# Download a Chinese llama-2 LLM framework
git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca-2.git

# Next, download a pretrained Chinese llama-2 model
# https://huggingface.co/hfl/chinese-alpaca-2-7b
# Google Drive is recommended here for a faster download
# (a programmatic alternative is sketched after this block)

# Then download some Chinese legal-domain instruction data in alpaca fine-tuning format
git clone https://github.com/AndrewZhe/lawyer-llama.git
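
If Google Drive is not an option, the same checkpoint can be pulled straight from the Hugging Face Hub. A minimal sketch, assuming huggingface_hub is installed (pip install huggingface_hub) and using the local path the rest of this walkthrough expects:

# Sketch: download hfl/chinese-alpaca-2-7b from the Hugging Face Hub.
# Assumes `pip install huggingface_hub`; local_dir matches the path used
# later in this post.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="hfl/chinese-alpaca-2-7b",
    local_dir="/home/sd/LLaMa2/chinese-alpaca-2-7b",
)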

2. Install the environment dependencies

pip install protobuf -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install llama2-wrapper -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install accelerate -i https://pypi.tuna.tsinghua.edu.cn/simple/

3. Install DeepSpeed

pip install deepspeed
ds_report

You may see the following warnings:

(/home/sd/conda_envs/llama2) [sd@worker85 chinese-alpaca-2-7b]$ ds_report
[2023-11-27 04:01:02,438] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
 [WARNING]  async_io requires the dev libaio .so object and headers but these were not found.
 [WARNING]  async_io: please install the libaio-devel package with yum
 [WARNING]  If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.

The fix is:

sudo yum install libaio-devel

After that, the warnings are gone:

(/home/sd/conda_envs/llama2) [sd@worker85 chinese-alpaca-2-7b]$ ds_report
[2023-11-27 04:02:25,275] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
      runtime if needed. Op compatibility means that your system
      meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
 [WARNING]  Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
 [WARNING]  sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
 [WARNING]  using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/sd/conda_envs/llama2/lib/python3.9/site-packages/torch']
torch version .................... 2.0.1+cu117
deepspeed install path ........... ['/home/sd/conda_envs/llama2/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.12.3, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
shared memory (/dev/shm) size .... 503.39 GB

Note that this torch version does not support the sparse-attention op (per the warning, sparse_attn wants torch >= 1.5 and < 2.0), so the torch version is something that could be tuned later.
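
To see at a glance whether the installed torch falls inside that range, a minimal check (a sketch; it assumes the packaging module, which ships with most pip-based environments):

# Sketch: compare the installed torch version against the sparse_attn
# requirement reported by ds_report (>= 1.5 and < 2.0).
import torch
from packaging import version

v = version.parse(torch.__version__.split("+")[0])  # strip the +cu117 suffix
compatible = version.parse("1.5") <= v < version.parse("2.0")
print(f"torch {torch.__version__}: sparse_attn compatible = {compatible}")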

4. Designing a legal-consultation LLM

1. Download the data
git clone https://github.com/AndrewZhe/lawyer-llama.git

The data directory in that repo holds several instruction sets in alpaca format (only judical_examination_v2.json is directly usable; a quick format check is sketched after the listing):

(/home/sd/conda_envs/llama2) [sd@worker85 LLaMa2]$ ll lawyer-llama/data
total 21044
-rw-r--r--. 1 sd llm 2123057 Nov 27 05:00 judical_examination.json
-rw-r--r--. 1 sd llm 5394689 Nov 27 05:00 judical_examination_v2.json
-rw-r--r--. 1 sd llm 4011110 Nov 27 05:00 legal_advice.json
-rw-r--r--. 1 sd llm 1487954 Nov 27 05:00 legal_counsel_multi_turn_with_article_v2.json
-rw-r--r--. 1 sd llm 6340889 Nov 27 05:00 legal_counsel_v2.json
-rw-r--r--. 1 sd llm 2173140 Nov 27 05:00 legal_counsel_with_article_v2.json
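
Before wiring this file into training, it is worth confirming what the records look like. A minimal sketch, assuming the file is a single JSON array in the usual alpaca schema (instruction/input/output fields):

# Sketch: peek at the alpaca-format instruction set.
import json

with open("lawyer-llama/data/judical_examination_v2.json", encoding="utf-8") as f:
    records = json.load(f)

print(f"{len(records)} records, fields of first record: {sorted(records[0])}")
print(json.dumps(records[0], ensure_ascii=False, indent=2))
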
2. Generate the LoRA weights

With the data in place, the next step is to prepare the parameters and configuration for LoRA fine-tuning:

# Read the wiki (https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_zh) carefully before running the script
lr=1e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05

pretrained_model=/home/sd/LLaMa2/chinese-alpaca-2-7b
chinese_tokenizer_path=/home/sd/LLaMa2/chinese-alpaca-2-7b
dataset_dir=/home/sd/LLaMa2/lawyer-llama/data_alpaca
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=8
max_seq_length=512
output_dir=/home/sd/LLaMa2/lawyer-llama/model
validation_file=/home/sd/LLaMa2/lawyer-llama/data_alpaca/judical_examination_v2.json

deepspeed_config_file=ds_zero2_no_offload.json

torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
    --deepspeed ${deepspeed_config_file} \
    --model_name_or_path ${pretrained_model} \
    --tokenizer_name_or_path ${chinese_tokenizer_path} \
    --dataset_dir ${dataset_dir} \
    --per_device_train_batch_size ${per_device_train_batch_size} \
    --per_device_eval_batch_size ${per_device_eval_batch_size} \
    --do_train \
    --do_eval \
    --seed $RANDOM \
    --fp16 \
    --num_train_epochs 1 \
    --lr_scheduler_type cosine \
    --learning_rate ${lr} \
    --warmup_ratio 0.03 \
    --weight_decay 0 \
    --logging_strategy steps \
    --logging_steps 10 \
    --save_strategy steps \
    --save_total_limit 3 \
    --evaluation_strategy steps \
    --eval_steps 100 \
    --save_steps 200 \
    --gradient_accumulation_steps ${gradient_accumulation_steps} \
    --preprocessing_num_workers 8 \
    --max_seq_length ${max_seq_length} \
    --output_dir ${output_dir} \
    --overwrite_output_dir \
    --ddp_timeout 30000 \
    --logging_first_step True \
    --lora_rank ${lora_rank} \
    --lora_alpha ${lora_alpha} \
    --trainable ${lora_trainable} \
    --lora_dropout ${lora_dropout} \
    --modules_to_save ${modules_to_save} \
    --torch_dtype float16 \
    --validation_file ${validation_file} \
    --load_in_kbits 16 \
    --save_safetensors False \
    --gradient_checkpointing \
    --ddp_find_unused_parameters False

Note that only a few of these parameters need to be adjusted:

# Path to the pretrained model
pretrained_model=
# Path to the tokenizer, usually the same as the model path
chinese_tokenizer_path=
# Path to the training data
dataset_dir=
...
# Output path for the fine-tuned model
output_dir=
# Path to the validation set
validation_file=

Once everything is configured, simply run:

bash run_sft.sh

One problem came up along the way:

Traceback (most recent call last):
  File "/home/sd/LLaMa2/Chinese-LLaMA-Alpaca-2/scripts/training/run_clm_sft_with_peft.py", line 513, in <module>
    main()
  File "/home/sd/LLaMa2/Chinese-LLaMA-Alpaca-2/scripts/training/run_clm_sft_with_peft.py", line 405, in main
    model = LlamaForCausalLM.from_pretrained(
  File "/home/sd/conda_envs/llama2/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() got an unexpected keyword argument 'use_flash_attention_2'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1601890) of binary: /home/sd/conda_envs/llama2/bin/python

This is caused by a transformers version that is too old; just upgrade transformers (a quick check of the active version is sketched after the install log):

pip install --upgrade transformers

######################################
Installing collected packages: tokenizers, transformers
  Attempting uninstall: tokenizers
    Found existing installation: tokenizers 0.13.3
    Uninstalling tokenizers-0.13.3:
      Successfully uninstalled tokenizers-0.13.3
  Attempting uninstall: transformers
    Found existing installation: transformers 4.31.0
    Uninstalling transformers-4.31.0:
      Successfully uninstalled transformers-4.31.0
Successfully installed tokenizers-0.15.0 transformers-4.35.2
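
Before rerunning, a quick sanity check that the environment really picks up the new version (a minimal sketch):

# Sketch: confirm the upgraded transformers is the one this env imports.
import transformers

print(transformers.__version__)  # should print 4.35.2 after the upgrade above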

Then rerun the bash run_sft.sh from above:

***** train metrics *****
  epoch                    =        1.0
  train_loss               =     0.7889
  train_runtime            = 2:16:53.63
  train_samples            =       5000
  train_samples_per_second =      0.609
  train_steps_per_second   =      0.076
11/27/2023 08:22:48 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:738] 2023-11-27 08:22:48,082 >> The following columns in the evaluation set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: lang. If lang are not expected by `PeftModelForCausalLM.forward`,  you can safely ignore this message.
[INFO|trainer.py:3158] 2023-11-27 08:22:48,084 >> ***** Running Evaluation *****
[INFO|trainer.py:3160] 2023-11-27 08:22:48,084 >>   Num examples = 5000
[INFO|trainer.py:3163] 2023-11-27 08:22:48,084 >>   Batch size = 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [14:43<00:00,  5.66it/s]
***** eval metrics *****
  epoch                   =        1.0
  eval_loss               =        nan
  eval_runtime            = 0:14:43.48
  eval_samples            =       5000
  eval_samples_per_second =      5.659
  eval_steps_per_second   =      5.659
  perplexity              =        nan

The whole fine-tuning run took about 2.5 hours (2:16:53 of training plus 14:43 of evaluation).

In the configured LoRA output directory you can see the resulting files (a quick smoke test of the adapter is sketched after the listing):

(/home/sd/conda_envs/llama2) [sd@worker85 sft_lora_model]$ ll
total 1197972
-rw-r--r--. 1 sd llm        492 Nov 27 08:22 adapter_config.json
-rw-r--r--. 1 sd llm 1225856189 Nov 27 08:22 adapter_model.bin
-rw-r--r--. 1 sd llm        549 Nov 27 08:22 special_tokens_map.json
-rw-r--r--. 1 sd llm       1123 Nov 27 08:22 tokenizer_config.json
-rw-r--r--. 1 sd llm     844403 Nov 27 08:22 tokenizer.model
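
Before merging, the adapter can be smoke-tested on top of the base model. A minimal sketch using peft (paths as above; float16 with device_map="auto" and the raw, template-free prompt are assumptions for a quick check, not a proper inference setup; if your peft rejects the legacy keys in adapter_config.json, apply the fix from the merge step below first):

# Sketch: load the base model, apply the freshly trained LoRA adapter,
# and generate a short answer just to confirm the weights load.
import torch
from peft import PeftModel
from transformers import LlamaForCausalLM, LlamaTokenizer

base_path = "/home/sd/LLaMa2/chinese-alpaca-2-7b"
lora_path = "/home/sd/LLaMa2/lawyer-llama/model/sft_lora_model"

tokenizer = LlamaTokenizer.from_pretrained(lora_path)
base = LlamaForCausalLM.from_pretrained(
    base_path, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base, lora_path)

inputs = tokenizer("什么是合同违约?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=128)[0]))
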
3. Merge the full model

Run the following command:

python ./Chinese-LLaMA-Alpaca-2/scripts/merge_llama2_with_chinese_lora_low_mem.py \
    --base_model ./chinese-alpaca-2-7b \
    --lora_model ./lawyer-llama/model/sft_lora_model \
    --output_type huggingface \
    --output_dir ./lawyer-llama/model/full_model

ERROR resolution

Running the merge then produced these errors:

Traceback (most recent call last):
  File "/home/sd/LLaMa2/./Chinese-LLaMA-Alpaca-2/scripts/merge_llama2_with_chinese_lora_low_mem.py", line 240, in <module>
    lora_config = peft.LoraConfig.from_pretrained(lora_model_path)
  File "/home/sd/conda_envs/llama2/lib/python3.9/site-packages/peft/config.py", line 134, in from_pretrained
    config = config_cls(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'enable_lora'

Traceback (most recent call last):
  File "/home/sd/LLaMa2/./Chinese-LLaMA-Alpaca-2/scripts/merge_llama2_with_chinese_lora_low_mem.py", line 240, in <module>
    lora_config = peft.LoraConfig.from_pretrained(lora_model_path)
  File "/home/sd/conda_envs/llama2/lib/python3.9/site-packages/peft/config.py", line 134, in from_pretrained
    config = config_cls(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'merge_weights'

The fix is simply to remove the offending keys from ./lawyer-llama/model/sft_lora_model/adapter_config.json (a scripted version is sketched after the before/after listings):

vim ./lawyer-llama/model/sft_lora_model/adapter_config.json

##############    Before    ###########
{
  "base_model_name_or_path": "/home/sd/LLaMa2/chinese-alpaca-2-7b",
  "bias": "none",
  "enable_lora": null,
  "fan_in_fan_out": false,
  "inference_mode": true,
  "lora_alpha": 128.0,
  "lora_dropout": 0.05,
  "merge_weights": false,
  "modules_to_save": [
    "embed_tokens",
    "lm_head"
  ],
  "peft_type": "LORA",
  "r": 64,
  "target_modules": [
    "q_proj",
    "v_proj",
    "k_proj",
    "o_proj",
    "gate_proj",
    "down_proj",
    "up_proj"
  ],
  "task_type": "CAUSAL_LM"
}

##############    After    ###########
{
  "base_model_name_or_path": "/home/sd/LLaMa2/chinese-alpaca-2-7b",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "lora_alpha": 128.0,
  "lora_dropout": 0.05,
  "modules_to_save": [
    "embed_tokens",
    "lm_head"
  ],
  "peft_type": "LORA",
  "r": 64,
  "target_modules": [
    "q_proj",
    "v_proj",
    "k_proj",
    "o_proj",
    "gate_proj",
    "down_proj",
    "up_proj"
  ],
  "task_type": "CAUSAL_LM"
}
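
The same fix can be scripted rather than done in vim; a minimal sketch:

# Sketch: strip the keys that newer peft's LoraConfig no longer accepts.
import json

path = "./lawyer-llama/model/sft_lora_model/adapter_config.json"
with open(path, encoding="utf-8") as f:
    cfg = json.load(f)

for key in ("enable_lora", "merge_weights"):
    cfg.pop(key, None)  # drop the key if present

with open(path, "w", encoding="utf-8") as f:
    json.dump(cfg, f, indent=2, ensure_ascii=False)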


4. Model demo

Launching the OpenAI API server demo first failed on a missing module:

Traceback (most recent call last):
  File "/home/sd/LLaMa2/./Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py", line 48, in <module>
    from openai_api_protocol import (
  File "/home/sd/LLaMa2/Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_protocol.py", line 5, in <module>
    import shortuuid
ModuleNotFoundError: No module named 'shortuuid'

No questions asked, just pip install shortuuid and move on.

python -m llama2_wrapper.server --model_path /home/sd/LLaMa2/chinese-alpaca-2-7b --backend_type transformers --host 0.0.0.0 --port 8000

Then open http://IP:8000/docs to run tests.

Alternatively:

python ./Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py --base_model ./lawyer-llama/model/full_model
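
Once the server is up, the endpoint can also be exercised from Python. A sketch using requests (pip install requests if needed); the port and payload fields are assumptions based on the demo's OpenAI-compatible API, so verify them against the server's startup log and /docs page:

# Sketch: call the demo's OpenAI-style completions endpoint.
# Port 19327 and the payload fields are assumptions; adjust them to the
# values the server prints at startup.
import requests

resp = requests.post(
    "http://127.0.0.1:19327/v1/completions",
    json={"prompt": "什么是合同违约?", "max_tokens": 256},
    timeout=300,
)
print(resp.json())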
