4. 搭建本地的 LLama2-Chinese-Alpaca
1. 首先下载一些相关的模型和库
# 下载一个 llama-2 中文大模型框架
git clone https://github.com/ymcui/Chinese-LLaMA-Alpaca-2.git
# 再下载一个已经经过预训练的 llama-2 中文大模型
# https://huggingface.co/hfl/chinese-alpaca-2-7b
# 这里的下载推荐使用 google drive 下载,速度更快
# 随后下载一些中文法律相关的 alpaca 精调指令格式的数据
git clone https://github.com/AndrewZhe/lawyer-llama.git
2. 安装一些环境依赖包
pip install protobuf -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install llama2-wrapper -i https://pypi.tuna.tsinghua.edu.cn/simple/
pip install accelerate -i https://pypi.tuna.tsinghua.edu.cn/simple/
3. 安装 deepspeed
pip install deepspeed
ds_report
可能看到下面的warning
(/home/sd/conda_envs/llama2) [sd@worker85 chinese-alpaca-2-7b]$ ds_report
[2023-11-27 04:01:02,438] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
[WARNING] async_io requires the dev libaio .so object and headers but these were not found.
[WARNING] async_io: please install the libaio-devel package with yum
[WARNING] If libaio is already installed (perhaps from source), try setting the CFLAGS and LDFLAGS environment variables to where it can be found.
解决方法是
sudo yum install libaio-devel
然后就可以看到warning消失
(/home/sd/conda_envs/llama2) [sd@worker85 chinese-alpaca-2-7b]$ ds_report
[2023-11-27 04:02:25,275] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
--------------------------------------------------
DeepSpeed C++/CUDA extension op report
--------------------------------------------------
NOTE: Ops not installed will be just-in-time (JIT) compiled at
runtime if needed. Op compatibility means that your system
meet the required dependencies to JIT install the op.
--------------------------------------------------
JIT compiled ops requires ninja
ninja .................. [OKAY]
--------------------------------------------------
op name ................ installed .. compatible
--------------------------------------------------
async_io ............... [NO] ....... [OKAY]
fused_adam ............. [NO] ....... [OKAY]
cpu_adam ............... [NO] ....... [OKAY]
cpu_adagrad ............ [NO] ....... [OKAY]
cpu_lion ............... [NO] ....... [OKAY]
[WARNING] Please specify the CUTLASS repo directory as environment variable $CUTLASS_PATH
evoformer_attn ......... [NO] ....... [NO]
fused_lamb ............. [NO] ....... [OKAY]
fused_lion ............. [NO] ....... [OKAY]
inference_core_ops ..... [NO] ....... [OKAY]
cutlass_ops ............ [NO] ....... [OKAY]
quantizer .............. [NO] ....... [OKAY]
ragged_device_ops ...... [NO] ....... [OKAY]
ragged_ops ............. [NO] ....... [OKAY]
random_ltd ............. [NO] ....... [OKAY]
[WARNING] sparse_attn requires a torch version >= 1.5 and < 2.0 but detected 2.0
[WARNING] using untested triton version (2.0.0), only 1.0.0 is known to be compatible
sparse_attn ............ [NO] ....... [NO]
spatial_inference ...... [NO] ....... [OKAY]
transformer ............ [NO] ....... [OKAY]
stochastic_transformer . [NO] ....... [OKAY]
transformer_inference .. [NO] ....... [OKAY]
--------------------------------------------------
DeepSpeed general environment info:
torch install path ............... ['/home/sd/conda_envs/llama2/lib/python3.9/site-packages/torch']
torch version .................... 2.0.1+cu117
deepspeed install path ........... ['/home/sd/conda_envs/llama2/lib/python3.9/site-packages/deepspeed']
deepspeed info ................... 0.12.3, unknown, unknown
torch cuda version ............... 11.7
torch hip version ................ None
nvcc version ..................... 11.8
deepspeed wheel compiled w. ...... torch 2.0, cuda 11.7
shared memory (/dev/shm) size .... 503.39 GB
注意这里的 torch 版本是不支持稀疏 transformer 的,所以之后可以对 torch 版本尝试做一些优化
4. 设计一款法律咨询类的LLM模型
1. 下载相关数据
git clone https://github.com/AndrewZhe/lawyer-llama.git
文件夹中的 data 路径下有一些 alpaca 格式的指令集(其中能直接用的只有 judical_examination_v2.json )
(/home/sd/conda_envs/llama2) [sd@worker85 LLaMa2]$ ll lawyer-llama/data
total 21044
-rw-r--r--. 1 sd llm 2123057 Nov 27 05:00 judical_examination.json
-rw-r--r--. 1 sd llm 5394689 Nov 27 05:00 judical_examination_v2.json
-rw-r--r--. 1 sd llm 4011110 Nov 27 05:00 legal_advice.json
-rw-r--r--. 1 sd llm 1487954 Nov 27 05:00 legal_counsel_multi_turn_with_article_v2.json
-rw-r--r--. 1 sd llm 6340889 Nov 27 05:00 legal_counsel_v2.json
-rw-r--r--. 1 sd llm 2173140 Nov 27 05:00 legal_counsel_with_article_v2.json
2. 生成 LoRA 权重集
由于由于有了数据,下一步就是准备LoRA 调试的参数和配置
# 运行脚本前请仔细阅读wiki(https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_zh)
# Read the wiki(https://github.com/ymcui/Chinese-LLaMA-Alpaca-2/wiki/sft_scripts_zh) carefully before running the script
lr=1e-4
lora_rank=64
lora_alpha=128
lora_trainable="q_proj,v_proj,k_proj,o_proj,gate_proj,down_proj,up_proj"
modules_to_save="embed_tokens,lm_head"
lora_dropout=0.05
pretrained_model=/home/sd/LLaMa2/chinese-alpaca-2-7b
chinese_tokenizer_path=/home/sd/LLaMa2/chinese-alpaca-2-7b
dataset_dir=/home/sd/LLaMa2/lawyer-llama/data_alpaca
per_device_train_batch_size=1
per_device_eval_batch_size=1
gradient_accumulation_steps=8
max_seq_length=512
output_dir=/home/sd/LLaMa2/lawyer-llama/model
validation_file=/home/sd/LLaMa2/lawyer-llama/data_alpaca/judical_examination_v2.json
deepspeed_config_file=ds_zero2_no_offload.json
torchrun --nnodes 1 --nproc_per_node 1 run_clm_sft_with_peft.py \
--deepspeed ${deepspeed_config_file} \
--model_name_or_path ${pretrained_model} \
--tokenizer_name_or_path ${chinese_tokenizer_path} \
--dataset_dir ${dataset_dir} \
--per_device_train_batch_size ${per_device_train_batch_size} \
--per_device_eval_batch_size ${per_device_eval_batch_size} \
--do_train \
--do_eval \
--seed $RANDOM \
--fp16 \
--num_train_epochs 1 \
--lr_scheduler_type cosine \
--learning_rate ${lr} \
--warmup_ratio 0.03 \
--weight_decay 0 \
--logging_strategy steps \
--logging_steps 10 \
--save_strategy steps \
--save_total_limit 3 \
--evaluation_strategy steps \
--eval_steps 100 \
--save_steps 200 \
--gradient_accumulation_steps ${gradient_accumulation_steps} \
--preprocessing_num_workers 8 \
--max_seq_length ${max_seq_length} \
--output_dir ${output_dir} \
--overwrite_output_dir \
--ddp_timeout 30000 \
--logging_first_step True \
--lora_rank ${lora_rank} \
--lora_alpha ${lora_alpha} \
--trainable ${lora_trainable} \
--lora_dropout ${lora_dropout} \
--modules_to_save ${modules_to_save} \
--torch_dtype float16 \
--validation_file ${validation_file} \
--load_in_kbits 16 \
--save_safetensors False \
--gradient_checkpointing \
--ddp_find_unused_parameters False
注意这里的参数只需要进行个别调整即可
# 预训练模型地址
pretrained_model=
# 分词器地址 一般和模型地址相同
chinese_tokenizer_path=
# 训练数据地址
dataset_dir=
...
# 微调后模型输出地址
output_dir=
# 验证集地址
validation_file=
配置好之后直接
bash run_sft.sh
中间遇到了一个问题
Traceback (most recent call last):
File "/home/sd/LLaMa2/Chinese-LLaMA-Alpaca-2/scripts/training/run_clm_sft_with_peft.py", line 513, in <module>
main()
File "/home/sd/LLaMa2/Chinese-LLaMA-Alpaca-2/scripts/training/run_clm_sft_with_peft.py", line 405, in main
model = LlamaForCausalLM.from_pretrained(
File "/home/sd/conda_envs/llama2/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2700, in from_pretrained
model = cls(config, *model_args, **model_kwargs)
TypeError: __init__() got an unexpected keyword argument 'use_flash_attention_2'
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 1601890) of binary: /home/sd/conda_envs/llama2/bin/python
这里是 transformers 版本过低的问题,直接升级下 transformers
pip install --upgrade transformers
######################################
Installing collected packages: tokenizers, transformers
Attempting uninstall: tokenizers
Found existing installation: tokenizers 0.13.3
Uninstalling tokenizers-0.13.3:
Successfully uninstalled tokenizers-0.13.3
Attempting uninstall: transformers
Found existing installation: transformers 4.31.0
Uninstalling transformers-4.31.0:
Successfully uninstalled transformers-4.31.0
Successfully installed tokenizers-0.15.0 transformers-4.35.2
然后再运行上面的 bash run_sft.sh
***** train metrics *****
epoch = 1.0
train_loss = 0.7889
train_runtime = 2:16:53.63
train_samples = 5000
train_samples_per_second = 0.609
train_steps_per_second = 0.076
11/27/2023 08:22:48 - INFO - __main__ - *** Evaluate ***
[INFO|trainer.py:738] 2023-11-27 08:22:48,082 >> The following columns in the evaluation set don't have a corresponding argument in `PeftModelForCausalLM.forward` and have been ignored: lang. If lang are not expected by `PeftModelForCausalLM.forward`, you can safely ignore this message.
[INFO|trainer.py:3158] 2023-11-27 08:22:48,084 >> ***** Running Evaluation *****
[INFO|trainer.py:3160] 2023-11-27 08:22:48,084 >> Num examples = 5000
[INFO|trainer.py:3163] 2023-11-27 08:22:48,084 >> Batch size = 1
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5000/5000 [14:43<00:00, 5.66it/s]
***** eval metrics *****
epoch = 1.0
eval_loss = nan
eval_runtime = 0:14:43.48
eval_samples = 5000
eval_samples_per_second = 5.659
eval_steps_per_second = 5.659
perplexity = nan
整个调试耗时 2.5 个小时左右
在设定的 LoRA 模型输出目录下,能看到输出的文件
(/home/sd/conda_envs/llama2) [sd@worker85 sft_lora_model]$ ll
total 1197972
-rw-r--r--. 1 sd llm 492 Nov 27 08:22 adapter_config.json
-rw-r--r--. 1 sd llm 1225856189 Nov 27 08:22 adapter_model.bin
-rw-r--r--. 1 sd llm 549 Nov 27 08:22 special_tokens_map.json
-rw-r--r--. 1 sd llm 1123 Nov 27 08:22 tokenizer_config.json
-rw-r--r--. 1 sd llm 844403 Nov 27 08:22 tokenizer.model
3. 合成大模型
执行以下命令
python ./Chinese-LLaMA-Alpaca-2/scripts/merge_llama2_with_chinese_lora_low_mem.py \
--base_model ./chinese-alpaca-2-7b \
--lora_model ./lawyer-llama/model/sft_lora_model \
--output_type huggingface \
--output_dir ./lawyer-llama/model/full_model
ERROR 解决
然后出现这个问题
Traceback (most recent call last):
File "/home/sd/LLaMa2/./Chinese-LLaMA-Alpaca-2/scripts/merge_llama2_with_chinese_lora_low_mem.py", line 240, in <module>
lora_config = peft.LoraConfig.from_pretrained(lora_model_path)
File "/home/sd/conda_envs/llama2/lib/python3.9/site-packages/peft/config.py", line 134, in from_pretrained
config = config_cls(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'enable_lora'
Traceback (most recent call last):
File "/home/sd/LLaMa2/./Chinese-LLaMA-Alpaca-2/scripts/merge_llama2_with_chinese_lora_low_mem.py", line 240, in <module>
lora_config = peft.LoraConfig.from_pretrained(lora_model_path)
File "/home/sd/conda_envs/llama2/lib/python3.9/site-packages/peft/config.py", line 134, in from_pretrained
config = config_cls(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'merge_weights'
解决方法,直接在 ./lawyer-llama/model/sft_lora_model/adapter_config.json 中去掉对应的 key 即可
vim ./lawyer-llama/model/sft_lora_model/adapter_config.json
############## 修改前 ###########
{
"base_model_name_or_path": "/home/sd/LLaMa2/chinese-alpaca-2-7b",
"bias": "none",
"enable_lora": null,
"fan_in_fan_out": false,
"inference_mode": true,
"lora_alpha": 128.0,
"lora_dropout": 0.05,
"merge_weights": false,
"modules_to_save": [
"embed_tokens",
"lm_head"
],
"peft_type": "LORA",
"r": 64,
"target_modules": [
"q_proj",
"v_proj",
"k_proj",
"o_proj",
"gate_proj",
"down_proj",
"up_proj"
],
"task_type": "CAUSAL_LM"
}
############## 修改后 ###########
{
"base_model_name_or_path": "/home/sd/LLaMa2/chinese-alpaca-2-7b",
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"lora_alpha": 128.0,
"lora_dropout": 0.05,
"modules_to_save": [
"embed_tokens",
"lm_head"
],
"peft_type": "LORA",
"r": 64,
"target_modules": [
"q_proj",
"v_proj",
"k_proj",
"o_proj",
"gate_proj",
"down_proj",
"up_proj"
],
"task_type": "CAUSAL_LM"
}
4. 模型展示
Traceback (most recent call last):
File "/home/sd/LLaMa2/./Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py", line 48, in <module>
from openai_api_protocol import (
File "/home/sd/LLaMa2/Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_protocol.py", line 5, in <module>
import shortuuid
ModuleNotFoundError: No module named 'shortuuid'
啥都不问,直接 pip install 即可
python -m llama2_wrapper.server --model_path /home/sd/LLaMa2/chinese-alpaca-2-7b --backend_type transformers --host 0.0.0.0 --port 8000
然后打开 http://IP:8000/docs 可以做测试
或者
python ./Chinese-LLaMA-Alpaca-2/scripts/openai_server_demo/openai_api_server.py --base_model ./lawyer-llama/model/full_model