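Overview
Someone else fine-tuned the Baichuan2-13B model with LoRA and asked me to deploy the result to the MindIE service.
The LoRA fine-tuning run produced the following files: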
# ls checkpoint-lora
adapter_config.json optimizer.pt rng_state.pth special_tokens_map.json tokenizer_config.json trainer_state.json
adapter_model.safetensors README.md scheduler.pt tokenization_baichuan.py tokenizer.model training_args.bin
# cat checkpoint-lora/adapter_config.json
{
"alpha_pattern": {},
"auto_mapping": null,
"base_model_name_or_path": "/home/xxxxx/baichuan-inc/Baichuan2-13B-Chat", # path of the base model used for fine-tuning
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layers_pattern": null,
"layers_to_transform": null,
...
}
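Because the adapter records where its base model lives, it can be worth confirming that base_model_name_or_path points to a directory that exists on the deployment machine before loading anything. The following is a minimal sketch using only the standard library; the helper name read_base_model_path is my own and is not part of peft. Note that the # comment in the listing above is an annotation, and the real file is plain JSON.

```python
import json
import os


def read_base_model_path(adapter_dir):
    """Return base_model_name_or_path from adapter_config.json in adapter_dir."""
    config_file = os.path.join(adapter_dir, "adapter_config.json")
    with open(config_file, "r", encoding="utf-8") as f:
        config = json.load(f)
    return config["base_model_name_or_path"]
```

For example, os.path.isdir(read_base_model_path("checkpoint-lora")) shows whether the recorded base model path is reachable from the current machine.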
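Testing the LoRA model before merging
To load the unmerged LoRA model, create the model instance with AutoPeftModelForCausalLM.from_pretrained; everything else is the same as for a regular model.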
import torch
import torch_npu  # Ascend NPU backend for PyTorch
from torch_npu.contrib import transfer_to_npu  # redirects CUDA calls to the NPU
from transformers import AutoTokenizer
from transformers.generation.utils import GenerationConfig
from peft import AutoPeftModelForCausalLM

model_path = "/home/xxxx/baichuan-inc/Baichuan2-13B-Chat/"
lora_path = "/home/xxxx/baichuan-inc/checkpoint-lora/"

# The tokenizer and generation config are taken from the base model directory.
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          revision="v2.0",
                                          use_fast=False,
                                          trust_remote_code=True)
# Reads adapter_config.json in lora_path, loads the base model it points to,
# and attaches the LoRA adapter on top.
model = AutoPeftModelForCausalLM.from_pretrained(lora_path,
                                                 revision="v2.0",
                                                 device_map="auto",
                                                 torch_dtype=torch.float16,
                                                 trust_remote_code=True).half().npu().eval()
model.generation_config = GenerationConfig.from_pretrained(model_path, revision="v2.0")

# Prompt: "Tell a story of about 100 characters".
messages = []
messages.append({"role": "user", "content": "讲一个100字左右的故事"})
response = model.chat(tokenizer, messages)
print(response)
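Merging the LoRA fine-tuned model into the Baichuan2-13B base model
The code below loads the base model from model_path and the LoRA adapter from lora_path, then saves the merged model to merge_path.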
import torch
import torch_npu  # Ascend NPU backend for PyTorch
from torch_npu.contrib import transfer_to_npu  # redirects CUDA calls to the NPU
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

model_path = "/home/xxxx/baichuan-inc/Baichuan2-13B-Chat/"
lora_path = "/home/xxxx/baichuan-inc/checkpoint-lora/"
merge_path = "/home/xxxx/baichuan-inc/Baichuan2-13B-Chat-lora-merge"

print(f"Loading the Base model from {model_path}")
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          revision="v2.0",
                                          use_fast=False,
                                          trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained(model_path,
                                                  revision="v2.0",
                                                  device_map="auto",
                                                  torch_dtype=torch.float16,
                                                  trust_remote_code=True)

print(f"Loading the LoRA from {lora_path}")
lora_model = PeftModel.from_pretrained(
    base_model,
    lora_path,
    torch_dtype=torch.float16,
)

print("Applying the LoRA")
# Fold the adapter weights into the base weights and drop the PEFT wrappers.
model = lora_model.merge_and_unload()

print(f"Saving the target model to {merge_path}")
model.save_pretrained(merge_path)
print(f"Saving the tokenizer to {merge_path}")
tokenizer.save_pretrained(merge_path)
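Conceptually, merge_and_unload folds each adapter pair back into the weight it modifies. For standard LoRA the merged weight is W + (lora_alpha / r) * (B x A), which has the same shape as W, so the merged model has the same layout as the base model. The toy sketch below illustrates that arithmetic on plain Python lists. It is not peft's implementation, and the function name merge_lora_weight is made up for illustration.

```python
def merge_lora_weight(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A) for nested-list matrices.

    Shapes: W is d_out x d_in, B is d_out x r, A is r x d_in.
    """
    scale = lora_alpha / r
    merged = [row[:] for row in W]  # copy so the input stays untouched
    for i in range(len(W)):
        for j in range(len(W[0])):
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged
```

With W as the 2x2 identity, B = [[1], [2]], A = [[3, 4]], lora_alpha = 2 and r = 1, the update B x A is [[3, 4], [6, 8]] and is scaled by 2 before being added to W.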
Inspect the merged model directory:
# ls Baichuan2-13B-Chat-lora-merge/ -lh
total 26G
-rw-r----- 1 root root 780 Jun 24 16:16 config.json
-rw------- 1 root root 1.6K Jun 24 16:16 configuration_baichuan.py
-rw-r----- 1 root root 285 Jun 24 16:16 generation_config.json
-rw------- 1 root root 2.9K Jun 24 16:16 generation_utils.py
-rw-r----- 1 root root 4.6G Jun 24 16:16 model-00001-of-00006.safetensors
-rw-r----- 1 root root 4.6G Jun 24 16:16 model-00002-of-00006.safetensors
-rw-r----- 1 root root 4.6G Jun 24 16:16 model-00003-of-00006.safetensors
-rw-r----- 1 root root 4.7G Jun 24 16:16 model-00004-of-00006.safetensors
-rw-r----- 1 root root 4.6G Jun 24 16:17 model-00005-of-00006.safetensors
-rw-r----- 1 root root 3.0G Jun 24 16:17 model-00006-of-00006.safetensors
-rw------- 1 root root 32K Jun 24 16:16 modeling_baichuan.py
-rw-r----- 1 root root 23K Jun 24 16:17 model.safetensors.index.json
-rw------- 1 root root 9.0K Jun 24 16:16 quantizer.py
-rw-r----- 1 root root 544 Jun 24 16:17 special_tokens_map.json
-rw------- 1 root root 8.9K Jun 24 16:17 tokenization_baichuan.py
-rw-r----- 1 root root 918 Jun 24 16:17 tokenizer_config.json
-rw-r----- 1 root root 2.0M Jun 24 16:17 tokenizer.model
The merged model directory looks much like the original base model directory, and the file sizes are about the same.
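One quick way to compare the two directories is to add up the weight files in each. The sketch below assumes the weights are stored as *.safetensors files, as in the listing above; a base model that ships .bin files would need a different pattern. The helper name total_safetensors_bytes is hypothetical.

```python
import glob
import os


def total_safetensors_bytes(model_dir):
    """Sum the on-disk size of every *.safetensors file in model_dir."""
    pattern = os.path.join(model_dir, "*.safetensors")
    return sum(os.path.getsize(path) for path in glob.glob(pattern))
```

Running it on the merged directory and on the base model directory should give totals that are close to each other if the merge kept the same dtype.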
Testing the model after merging the LoRA
import torch
import torch_npu  # Ascend NPU backend for PyTorch
from torch_npu.contrib import transfer_to_npu  # redirects CUDA calls to the NPU
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers.generation.utils import GenerationConfig

model_path = "/home/xxxx/baichuan-inc/Baichuan2-13B-Chat-lora-merge"  # the merged model is used here

tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          revision="v2.0",
                                          use_fast=False,
                                          trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(model_path,
                                             revision="v2.0",
                                             device_map="auto",
                                             torch_dtype=torch.float16,
                                             trust_remote_code=True).half().npu().eval()
model.generation_config = GenerationConfig.from_pretrained(model_path, revision="v2.0")

# Prompt: "Tell a story of about 100 characters".
messages = []
messages.append({"role": "user", "content": "讲一个100字左右的故事"})
response = model.chat(tokenizer, messages)
print(response)
Errors encountered
When I tested in a different environment, merging the LoRA failed with this error:
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/modeling_utils.py:2525: UserWarning: Attempting to save a model with offloaded modules. Ensure that unallocated cpu memory exceeds the `shard_size` (5GB default)
warnings.warn(
Traceback (most recent call last):
File "/app/merge.py", line 38, in <module>
model.save_pretrained(merge_path, max_shard_size='8GB')
File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2661, in save_pretrained
shard = {tensor: state_dict[tensor] for tensor in tensors}
File "/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/site-packages/transformers/modeling_utils.py", line 2661, in <dictcomp>
shard = {tensor: state_dict[tensor] for tensor in tensors}
NameError: free variable 'state_dict' referenced before assignment in enclosing scope
/home/ma-user/anaconda3/envs/PyTorch-2.1.0/lib/python3.9/tempfile.py:821: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp1h0ap7iu'>
_warnings.warn(warn_message, ResourceWarning)
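I found a similar issue on GitHub: NameError: free variable 'state_dict' referenced before assignment in enclosing scope
The fix is to set max_shard_size to a value slightly larger than the model itself when saving.
model.save_pretrained(merge_path, max_shard_size='16GB')
At first I did not set max_shard_size at all. Setting it to 8GB only made the error appear later, while setting it to 16GB fixed it. The safetensors file I ended up with was 14GB, so 16GB was enough.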
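Following that observation, the shard size can be derived from the model's size instead of being found by trial and error. The sketch below is a heuristic of my own and not a transformers API: it adds roughly 10% headroom and rounds up to a whole number of GB, on the assumption that "GB" in max_shard_size means 10^9 bytes. The function name and the headroom_percent parameter are hypothetical.

```python
def suggest_max_shard_size(model_bytes, headroom_percent=10):
    """Return a max_shard_size string slightly larger than model_bytes."""
    padded = model_bytes + model_bytes * headroom_percent // 100
    gigabytes = max(1, -(-padded // 10**9))  # ceiling division, at least 1 GB
    return f"{gigabytes}GB"
```

The byte count can be estimated from a loaded model with sum(p.numel() * p.element_size() for p in model.parameters()), and the result passed as model.save_pretrained(merge_path, max_shard_size=suggest_max_shard_size(n)). For the 14GB case described above this yields '16GB'.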
Other notes
While testing MindIE I found that a model saved as bfloat16 causes problems, so all the code above uses float16.
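Before handing a merged directory to MindIE, the dtype recorded in its config.json can be checked with a few lines. This is a sketch under the assumption that save_pretrained wrote the dtype under the "torch_dtype" key (some newer transformers releases may use "dtype" instead, so both are tried); the helper name saved_dtype is hypothetical.

```python
import json
import os


def saved_dtype(model_dir):
    """Return the dtype string recorded in model_dir/config.json, or None."""
    config_file = os.path.join(model_dir, "config.json")
    with open(config_file, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Assumption: the key is "torch_dtype", with "dtype" as a fallback.
    return config.get("torch_dtype", config.get("dtype"))
```

A result other than "float16" for the merged directory suggests the model was loaded or saved with a different dtype than the scripts above intend.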