Using your own fine-tuned model with ollama

AI数字生命

Last modified: 2024-07-12 10:56:10

First published: 2024-05-12 16:55:21

Article link: https://blog.csdn.net/spiderwower/article/details/138755776

Preface

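In the previous post I showed how to create a custom model with ollama, but those models were not fine-tuned, so their answers were not specialized enough. This post shows how to build an ollama model from a fine-tuned LLM so that its responses better fit a specific need.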

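The original conversation sample is shown below. The prompt says, roughly, "Mothballs are the worst hard candy I've ever eaten and taste strange, so why do people still buy them?"; the reference answer explains that mothballs are not candy but a common insect repellent that must not be eaten, and that people buy them because they repel insects effectively.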

{
  "instruction": "樟脑丸是我吃过最难吃的硬糖有奇怪的味道怎么还有人买",
  "input": "",
  "output": "樟脑丸并不是硬糖，而是一种常见的驱虫药，不能食用。虽然它的味道可能不太好，但是由于其有效的驱虫效果，所以仍然有很多人会购买。"
}

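1. The base model without fine-tuning, chatting via ollama + Open WebUI:

[Screenshot: base model response]

2. The base model with LoRA fine-tuning, then packaged as an ollama model:

[Screenshot: fine-tuned model response]

3. The comparison shows that the fine-tuned model behaves as intended.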

I. Fine-tune the large model

1. LoRA fine-tuning

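I won't go into fine-tuning methods in detail here; I used LoRA. Fine-tuning an LLM needs a lot of GPU memory, so training on a Linux cloud server is recommended. For the training procedure, see https://github.com/datawhalechina/self-llm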

If you'd like a web UI for training similar to Stable Diffusion's, see https://github.com/hiyouga/LLaMA-Factory

1.1 Choose a base model

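For the base model I chose Chinese-Mistral-7B-Instruct-v0.1. The model files can be downloaded from https://huggingface.co/, from the Hugging Face mirror site HF-Mirror (https://hf-mirror.com), or from the ModelScope community (魔搭社区). I used ModelScope's Python API; run pip install modelscope before executing the script.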

1.2 Download the base model

Create a download.py script:

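from modelscope import snapshot_download

# Directory where the model will be stored
model_path = '/root/autodl-tmp'
# Model ID on ModelScope
name = 'itpossible/Chinese-Mistral-7B-Instruct-v0.1'
model_dir = snapshot_download(name, cache_dir=model_path, revision='master')
# Print the actual download directory; use it as model_path in the training script
print(model_dir)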

2. Choose a dataset

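High-quality data is the key to good fine-tuning results. You can use an open-source dataset or build your own. This example uses the Chinese Ruozhiba (弱智吧) dataset, about 1,500 conversations, which can be downloaded from https://huggingface.co/ or the HF-Mirror site. I downloaded it manually and uploaded it to the server; the training script below reads it from ./dataset/ruozhiba.json.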
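Before training, it can help to check that the downloaded file has the shape the training script expects. This is a minimal sketch that assumes the file is a single JSON array of objects with string fields instruction, input and output (the fields process_func_mistral reads); check_dataset is a hypothetical helper, not part of any library.

```python
import json
from pathlib import Path

# Fields that process_func_mistral reads from every record (assumed schema)
REQUIRED_KEYS = ("instruction", "input", "output")


def check_dataset(path):
    """Return (record_count, problems) for an instruction-tuning JSON file.

    Assumes the file is one JSON array of objects, which is the layout
    pd.read_json(...) + Dataset.from_pandas(...) handle in the script.
    """
    records = json.loads(Path(path).read_text(encoding="utf-8"))
    if not isinstance(records, list):
        return 0, ["top-level JSON value is not a list"]
    problems = []
    for i, rec in enumerate(records):
        if not isinstance(rec, dict):
            problems.append(f"record {i}: not an object")
            continue
        for key in REQUIRED_KEYS:
            # Missing keys and non-string values would both break the f-string concatenation
            if not isinstance(rec.get(key), str):
                problems.append(f"record {i}: missing or non-string '{key}'")
    return len(records), problems
```

Usage: `count, problems = check_dataset('./dataset/ruozhiba.json')`; an empty problems list means every record has the three fields as strings.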

3. LoRA fine-tuning

3.1 Install dependencies

I created the Python environment with Miniconda, using Python 3.10.

The contents of requirements.txt:

transformers
streamlit==1.24.0
sentencepiece==0.1.99
accelerate==0.29.3
datasets
peft==0.10.0

pip install -r requirements.txt
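To confirm which versions actually ended up in the environment, the installed packages can be compared against the pins with the standard library's importlib.metadata. This is a small sketch; the helper names are hypothetical and only exact == pins are handled.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_requirements(lines):
    """Map package name -> pinned version (None if unpinned); only '==' pins are understood."""
    pins = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, pinned = line.partition("==")
        pins[name.strip()] = pinned.strip() or None
    return pins


def check_pins(pins):
    """Compare installed versions against the pins and return human-readable mismatches."""
    mismatches = []
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            mismatches.append(f"{name}: not installed")
            continue
        if pinned is not None and installed != pinned:
            mismatches.append(f"{name}: installed {installed}, pinned {pinned}")
    return mismatches
```

Usage: `print(check_pins(parse_requirements(open('requirements.txt'))))` prints an empty list when everything matches.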

3.2 Write the training script

3.2.1 Set the model and output paths

from datasets import Dataset
import pandas as pd
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForSeq2Seq,
    TrainingArguments,
    Trainer, )
import os
import torch
from peft import LoraConfig, TaskType, get_peft_model
import warnings
warnings.filterwarnings("ignore", category=UserWarning)  # suppress warnings

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# Path to the model files
model_path = r'/root/autodl-tmp/itpossible/Chinese-Mistral-7B-Instruct-v0.1'
# Dataset name and directory for training outputs (checkpoints, logs)
name = 'ruozhiba'
output_dir = f'./output/Mistral-7B-{name}'
# Whether to resume from the last checkpoint; set to True to continue an interrupted run
train_with_checkpoint = False

3.2.2 Load the tokenizer

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

3.2.3 Load the dataset

df = pd.read_json(f'./dataset/{name}.json')
ds = Dataset.from_pandas(df)
print(ds)

3.2.4 Process the dataset

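The dataset has to be converted into the model's chat format. Different models use different formats (Qwen1.5 and Llama 3, for example, each have their own). Take the following sample: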

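Before processing (the question asks "Can you still live with only one heart left?" and the answer is "Yes, people only have one heart to begin with."):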

  {
    "instruction": "只剩一个心脏了还能活吗？",
    "input": "",
    "output": "能，人本来就只有一个心脏。"
  }

After processing, the text fed to the model is:

<s>[INST] <<SYS>>

<</SYS>>

只剩一个心脏了还能活吗？ [/INST] 能，人本来就只有一个心脏。 </s>
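The processed text above can be reproduced with a few lines of plain Python. This sketch mirrors the f-string in process_func_mistral below, which, unlike the display above, has no space before [/INST]; whichever layout you train with should also be used at inference time (for example in the ollama template). build_training_text is a hypothetical helper for illustration only.

```python
def build_training_text(instruction, input_text, output):
    """Rebuild the prompt/answer strings that process_func_mistral tokenizes.

    Prompt: Llama-2-style [INST] block with an empty <<SYS>> section.
    Answer: the output followed by </s>; in the script the end token is
    appended as an id (tokenizer.pad_token_id, which equals eos there).
    """
    prompt = f"<s>[INST] <<SYS>>\n\n<</SYS>>\n\n{instruction + input_text}[/INST]"
    return prompt, output + "</s>"
```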

# Convert each sample into the model's chat format
def process_func_mistral(example):
    MAX_LENGTH = 384  # the Llama tokenizer may split one Chinese character into several tokens, so leave room to keep samples intact
    instruction = tokenizer(
        f"<s>[INST] <<SYS>>\n\n<</SYS>>\n\n{example['instruction']+example['input']}[/INST]",
        add_special_tokens=False)  # add_special_tokens=False: don't prepend special tokens again; <s> is already in the text
    response = tokenizer(f"{example['output']}", add_special_tokens=False)
    input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
    attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1]  # the appended pad token (= eos here) must be attended to, so its mask is 1
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]  # -100 excludes prompt tokens from the loss
    if len(input_ids) > MAX_LENGTH:  # truncate
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }

inputs_id = ds.map(process_func_mistral, remove_columns=ds.column_names)
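The labels list is what decides what the model actually learns. The sketch below reproduces the same logic on toy integer token IDs (hypothetical values, no tokenizer needed), assuming every token of an unpadded sequence has attention mask 1, which is what the tokenizer returns for a single unpadded string.

```python
def build_example(prompt_ids, response_ids, eos_id, max_length=384):
    """Mirror the input_ids / attention_mask / labels logic of process_func_mistral.

    Prompt tokens get label -100, so the loss ignores them; only the
    response tokens and the trailing EOS are supervised targets.
    """
    input_ids = prompt_ids + response_ids + [eos_id]
    attention_mask = [1] * len(input_ids)
    labels = [-100] * len(prompt_ids) + response_ids + [eos_id]
    # Truncate all three lists to the same length
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": attention_mask[:max_length],
        "labels": labels[:max_length],
    }
```

For example, `build_example([1, 2, 3], [7, 8], 2)` yields labels `[-100, -100, -100, 7, 8, 2]`. Note that when a sample exceeds max_length, truncation also cuts off the trailing EOS, so the model sees no end-of-answer signal for that sample.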

3.2.5 Load the model

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, torch_dtype=torch.bfloat16, use_cache=False)
model.enable_input_require_grads()  # required when gradient checkpointing is enabled
print(model)

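The printed model structure includes the projection layers q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj and down_proj. These are the layers we fine-tune: LoRA attaches small trainable adapters to them while the original weights stay frozen.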

3.2.6 Set the LoRA parameters

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; the adapter update is scaled by lora_alpha / r (see the LoRA method for details)
    lora_dropout=0.1  # dropout rate
)

3.2.7 Set the training arguments

model = get_peft_model(model, config)
model.print_trainable_parameters()
args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    logging_steps=20,
    num_train_epochs=2,
    save_steps=25,
    save_total_limit=2,
    learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True
)

The output shows: trainable params: 20,971,520 || all params: 7,523,799,040 || trainable%: 0.27873578080043987

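So the model has about 7.5 billion parameters in total, of which about 21 million are trained: only about 0.3% of the parameters are fine-tuned.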
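The 20,971,520 figure can be checked by hand. LoRA adds two matrices, A (r × in) and B (out × r), to each targeted linear layer, i.e. r·(in + out) weights per layer. This sketch assumes the Mistral-7B-v0.1 layer shapes (hidden size 4096, MLP size 14336, 32 layers, 32 query heads and 8 key/value heads of dimension 128); that Chinese-Mistral-7B-Instruct-v0.1 keeps these block shapes is an assumption, not something stated above.

```python
# Assumed Mistral-7B-v0.1 shapes; Chinese-Mistral is assumed to keep the same blocks
HIDDEN = 4096          # hidden_size
INTERMEDIATE = 14336   # intermediate_size (MLP)
N_HEADS, N_KV_HEADS, HEAD_DIM = 32, 8, 128
N_LAYERS = 32
R = 8                  # LoRA rank from LoraConfig

# (in_features, out_features) of each targeted linear layer
SHAPES = {
    "q_proj": (HIDDEN, N_HEADS * HEAD_DIM),
    "k_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "v_proj": (HIDDEN, N_KV_HEADS * HEAD_DIM),
    "o_proj": (N_HEADS * HEAD_DIM, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(shapes, r, n_layers):
    # A is r x in, B is out x r, so each adapted layer adds r * (in + out) weights
    per_layer = sum(r * (fan_in + fan_out) for fan_in, fan_out in shapes.values())
    return per_layer * n_layers


trainable = lora_params(SHAPES, R, N_LAYERS)
print(trainable)                             # 20971520
print(f"{trainable / 7_523_799_040:.4%}")    # 0.2787%
```

The count matches the printed trainable params exactly, and shows that the three MLP projections account for most of the adapter weights.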

3.2.8 Start training

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=inputs_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)

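# If training is interrupted, resume from the most recent saved checkpoint
if train_with_checkpoint:
    # Pick by step number: os.listdir() order is arbitrary, and a plain string sort puts checkpoint-100 before checkpoint-75
    checkpoints = [f for f in os.listdir(output_dir) if f.startswith('checkpoint-')]
    checkpoint = max(checkpoints, key=lambda f: int(f.split('-')[-1]))
    last_checkpoint = f'{output_dir}/{checkpoint}'
    print(last_checkpoint)
    trainer.train(resume_from_checkpoint=last_checkpoint)
else:
    trainer.train()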
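To get a feel for how long a run is and how often checkpoints appear, the number of optimizer steps can be estimated from the TrainingArguments above. This is a rough single-GPU sketch with a hypothetical helper; the sample count is the approximate 1,500 mentioned earlier, and the exact rounding differs slightly between transformers versions.

```python
import math


def estimate_optimizer_steps(n_samples, per_device_batch, grad_accum, epochs, n_devices=1):
    """Rough count of optimizer updates for a Trainer run.

    Batches per epoch = ceil(n / (batch * devices)); every grad_accum
    batches produce one optimizer update (at least one per epoch).
    """
    batches_per_epoch = math.ceil(n_samples / (per_device_batch * n_devices))
    updates_per_epoch = max(batches_per_epoch // grad_accum, 1)
    return updates_per_epoch * epochs
```

With 1,500 samples, batch size 2, gradient accumulation 2 and 2 epochs this gives 750 steps, so save_steps=25 writes about 30 checkpoints, of which save_total_limit=2 keeps only the last two.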

3.2.9 Complete training script

from datasets import Dataset
import pandas as pd
from transformers import (
    AutoTokenizer,
    AutoModelForCausalLM,
    DataCollatorForSeq2Seq,
    TrainingArguments,
    Trainer, )
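import os
import torch
from peft import LoraConfig, TaskType, get_peft_model
import warnings
warnings.filterwarnings("ignore", category=UserWarning)  # suppress warnings

device = 'cuda' if torch.cuda.is_available() else 'cpu'
# Path to the model files
model_path = r'/root/autodl-tmp/itpossible/Chinese-Mistral-7B-Instruct-v0.1'
# Dataset name and directory for training outputs (checkpoints, logs)
name = 'ruozhiba'
output_dir = f'./output/Mistral-7B-{name}'
# Whether to resume from the last checkpoint; True needs an existing checkpoint in output_dir, so use False for the first run
train_with_checkpoint = False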


# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token


# Load the dataset
df = pd.read_json(f'./dataset/{name}.json')
ds = Dataset.from_pandas(df)
print(ds)

# Convert each sample into the model's chat format
def process_func_mistral(example):
    MAX_LENGTH = 384  # the Llama tokenizer may split one Chinese character into several tokens, so leave room to keep samples intact
    instruction = tokenizer(
        f"<s>[INST] <<SYS>>\n\n<</SYS>>\n\n{example['instruction']+example['input']}[/INST]",
        add_special_tokens=False)  # add_special_tokens=False: don't prepend special tokens again; <s> is already in the text
    response = tokenizer(f"{example['output']}", add_special_tokens=False)
    input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
    attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1]  # the appended pad token (= eos here) must be attended to, so its mask is 1
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]  # -100 excludes prompt tokens from the loss
    if len(input_ids) > MAX_LENGTH:  # truncate
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }
inputs_id = ds.map(process_func_mistral, remove_columns=ds.column_names)

# Load the model
model = AutoModelForCausalLM.from_pretrained(model_path, device_map=device, torch_dtype=torch.bfloat16, use_cache=False)
print(model)
model.enable_input_require_grads()  # required when gradient checkpointing is enabled
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; the adapter update is scaled by lora_alpha / r
    lora_dropout=0.1  # dropout rate
)

model = get_peft_model(model, config)
model.print_trainable_parameters()
args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=2,
    logging_steps=20,
    num_train_epochs=2,
    save_steps=25,
    save_total_limit=2,
    learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True
)
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=inputs_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
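# If training is interrupted, resume from the most recent saved checkpoint
if train_with_checkpoint:
    # Pick by step number: os.listdir() order is arbitrary, and a plain string sort puts checkpoint-100 before checkpoint-75
    checkpoints = [f for f in os.listdir(output_dir) if f.startswith('checkpoint-')]
    checkpoint = max(checkpoints, key=lambda f: int(f.split('-')[-1]))
    last_checkpoint = f'{output_dir}/{checkpoint}'
    print(last_checkpoint)
    trainer.train(resume_from_checkpoint=last_checkpoint)
else:
    trainer.train()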

4. Convert the checkpoint into a LoRA adapter

Create checkpoint_to_lora.py to export the trained checkpoint as a LoRA adapter:

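from transformers import AutoModelForCausalLM, AutoTokenizer
import os
import torch

# Where to save the LoRA adapter
lora_path = "/root/lora/Mistral-7B-lora-ruozhiba"
# Base model path
model_path = '/root/autodl-tmp/itpossible/Chinese-Mistral-7B-Instruct-v0.1'
# Training output directory containing the checkpoints
checkpoint_dir = '/root/output/Mistral-7B-ruozhiba'
# Pick the checkpoint with the highest step number (os.listdir() order is arbitrary)
checkpoints = [f for f in os.listdir(checkpoint_dir) if f.startswith('checkpoint-')]
checkpoint = max(checkpoints, key=lambda f: int(f.split('-')[-1]))
# The checkpoint stores only the adapter; with peft installed, transformers loads the
# base model named in adapter_config.json and attaches the adapter to it
model = AutoModelForCausalLM.from_pretrained(f'{checkpoint_dir}/{checkpoint}', torch_dtype=torch.bfloat16)
# Save the adapter (only the LoRA weights and adapter_config.json are written)
model.save_pretrained(lora_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token
# Save the tokenizer
tokenizer.save_pretrained(lora_path)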
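Since a Trainer checkpoint of a PEFT model already contains the adapter files, an alternative that avoids loading the 7B base model at all is to copy them out. This sketch assumes the checkpoint holds adapter_config.json plus adapter_model.safetensors (or adapter_model.bin with older peft versions); check your checkpoint directory before relying on it. latest_checkpoint and export_adapter are hypothetical helpers.

```python
import os
import shutil

# Adapter file names assumed to be written by peft (verify in your checkpoint directory)
ADAPTER_FILES = ("adapter_config.json", "adapter_model.safetensors", "adapter_model.bin")


def latest_checkpoint(checkpoint_dir):
    """Return the checkpoint-N subdirectory with the largest step N."""
    names = [n for n in os.listdir(checkpoint_dir)
             if n.startswith("checkpoint-") and n.split("-")[-1].isdigit()]
    if not names:
        raise FileNotFoundError(f"no checkpoint-* directories in {checkpoint_dir}")
    return os.path.join(checkpoint_dir, max(names, key=lambda n: int(n.split("-")[-1])))


def export_adapter(checkpoint_dir, lora_path):
    """Copy whichever adapter files exist in the newest checkpoint into lora_path."""
    src = latest_checkpoint(checkpoint_dir)
    os.makedirs(lora_path, exist_ok=True)
    copied = []
    for name in ADAPTER_FILES:
        if os.path.exists(os.path.join(src, name)):
            shutil.copy2(os.path.join(src, name), lora_path)
            copied.append(name)
    return src, copied
```

Usage: `export_adapter('/root/output/Mistral-7B-ruozhiba', '/root/lora/Mistral-7B-lora-ruozhiba')`. The merge step loads its tokenizer from the base model path, so the adapter directory does not strictly need a tokenizer copy.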

5. Merge the models

Create merge.py to merge the base model and the LoRA adapter into a new standalone model:

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel
from peft import LoraConfig, TaskType

model_path = '/root/autodl-tmp/itpossible/Chinese-Mistral-7B-Instruct-v0.1'
lora_path = "/root/lora/Mistral-7B-lora-ruozhiba"
# Output path of the merged model
output_path = r'/root/autodl-tmp/itpossible/merge'

# Same as the LoRA config used for training
config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; the adapter update is scaled by lora_alpha / r
    lora_dropout=0.1  # dropout rate
)

base = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True)
base_tokenizer = AutoTokenizer.from_pretrained(model_path)
# Attach the trained adapter to the bf16 base model
lora_model = PeftModel.from_pretrained(
    base,
    lora_path,
    config=config
)
# Fold the LoRA weights into the base weights and remove the adapter modules
model = lora_model.merge_and_unload()
model.save_pretrained(output_path)
base_tokenizer.save_pretrained(output_path)
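What merge_and_unload() does can be written as one line of algebra. Assuming the standard LoRA formulation with peft's default scaling lora_alpha / r (no rsLoRA), each targeted layer with frozen weight W_0 computes:

```latex
h = W_0 x + \frac{\alpha}{r} B A x = \left(W_0 + \frac{\alpha}{r} B A\right) x,
\qquad W_0 \in \mathbb{R}^{d_{\mathrm{out}} \times d_{\mathrm{in}}},\;
B \in \mathbb{R}^{d_{\mathrm{out}} \times r},\;
A \in \mathbb{R}^{r \times d_{\mathrm{in}}},\;
\frac{\alpha}{r} = \frac{32}{8} = 4
```

Dropout acts only during training, so at inference the adapter is a fixed low-rank addition that can be folded into W_0. The merged model has exactly the base model's shapes, which is why it can then be converted by llama.cpp like an ordinary checkpoint.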

II. Quantize the model

1. Convert the model files

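The merged model is still split across several safetensors files and has to be converted into a single file. This is done with the convert.py script from llama.cpp (GitHub: https://github.com/ggerganov/llama.cpp), which writes a GGUF file (named convert.bin below). For details, see my earlier CSDN post "ollama 使用自定义大模型" (Using a custom LLM with ollama).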

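# 1. Clone the llama.cpp source and enter the directory
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
# 2. Create a virtual environment with Miniconda
conda create -n llama python=3.10
conda init bash && source /root/.bashrc
# 3. Activate the environment
conda activate llama
# 4. Install llama.cpp's Python dependencies
pip install -r requirements.txt
# 5. Run the conversion script (newer llama.cpp versions no longer ship convert.py; for Hugging Face models use convert_hf_to_gguf.py there)
python convert.py /root/autodl-tmp/itpossible/merge --outtype f16 --outfile /root/autodl-tmp/itpossible/convert.bin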

The conversion produces convert.bin, about 14 GB in size. To save disk space, the merged model directory can now be deleted:
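The "about 14 GB" figure follows from f16 storage at 2 bytes per weight. This sketch uses the parameter counts printed during training; the merged model no longer contains the LoRA matrices, so they are subtracted. GGUF metadata and tokenizer data are ignored, so the result is an approximation.

```python
ALL_PARAMS = 7_523_799_040   # "all params" printed during training (includes LoRA)
LORA_PARAMS = 20_971_520     # "trainable params": the adapter, folded away by the merge


def f16_size_bytes(n_params, bytes_per_param=2):
    """Approximate f16 file size: 2 bytes per weight, ignoring GGUF metadata."""
    return n_params * bytes_per_param


merged_params = ALL_PARAMS - LORA_PARAMS   # weights left after merge_and_unload()
size = f16_size_bytes(merged_params)
print(size)                       # 15005655040
print(f"{size / 2**30:.2f} GiB")  # 13.98 GiB
```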

rm -rf /root/autodl-tmp/itpossible/merge

2. Quantize the model

Building llama.cpp produces a quantize executable:

/root/ollama/llm/llama.cpp/quantize /root/autodl-tmp/itpossible/convert.bin q5_k_m

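This produces ggml-model-Q5_K_M.gguf. Several quantization types are available; they trade file size against inference quality, so the choice affects how well the quantized model responds.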

III. Create the model with ollama

Use ollama to create a model from ggml-model-Q5_K_M.gguf; for details, see my earlier CSDN post "ollama 使用自定义大模型" (Using a custom LLM with ollama).
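As a rough illustration only: the Modelfile should reproduce the prompt layout used in training. The sketch below writes a Modelfile whose Go-template TEMPLATE renders to the training prompt when the system prompt is empty, then calls `ollama create`. The template, the stop parameter, the assumption that the runtime adds the BOS token (<s>) itself, and the helper names are all my assumptions, not the configuration from the referenced post.

```python
import subprocess
from pathlib import Path

# Assumed template: matches the training prompt "[INST] <<SYS>>\n\n<</SYS>>\n\n{prompt}[/INST]"
# when the system prompt is empty; <s> is left out on the assumption the runtime adds BOS.
TEMPLATE = '[INST] <<SYS>>\n{{ .System }}\n<</SYS>>\n\n{{ .Prompt }}[/INST]'


def build_modelfile(gguf_path):
    """Return Modelfile text for a local GGUF file (hypothetical helper)."""
    return (
        f"FROM {gguf_path}\n"
        f'TEMPLATE """{TEMPLATE}"""\n'
        'PARAMETER stop "[INST]"\n'
    )


def create_model(name, gguf_path, modelfile="Modelfile"):
    """Write the Modelfile and register it via `ollama create NAME -f Modelfile`."""
    Path(modelfile).write_text(build_modelfile(gguf_path), encoding="utf-8")
    subprocess.run(["ollama", "create", name, "-f", modelfile], check=True)
```

Usage, with a hypothetical model name: `create_model("mistral-ruozhiba", "/root/autodl-tmp/itpossible/ggml-model-Q5_K_M.gguf")`, then chat with `ollama run mistral-ruozhiba`.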

IV. Summary

1. I also LoRA fine-tuned llama3-8b and qwen1.5-1.8b, but converting them with llama.cpp failed with NotImplementedError: BPE pre-tokenizer was not recognized - update get_vocab_base_pre(). I went through many issues in the official repository but have not found a fix yet, so far only Chinese-Mistral-7B-Instruct-v0.1 has worked.

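2. The ollama Modelfile also supports an ADAPTER instruction, which loads the LoRA separately as an adapter. I tried it, but the model's output was incorrect, and I haven't found the cause yet. So far the only approach that has worked for me is the one in this post: merge the base model with the LoRA, then create the ollama model; its responses met expectations.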

3. I trained for less than half an hour. Better conversation quality after fine-tuning needs more data and longer training.