Large Language Model Llama 3.1 (Part 3): Llama 3.1 Fine-Tuning in Practice

Latest recommended article published on 2024-09-12 21:00:04

束骞

Article tags: llama

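1. Dataset preparation

Fine-tuning a large language model (LLM) usually means instruction fine-tuning, a specific way of preparing data and training. In instruction fine-tuning, the dataset consists of entries that each contain an instruction, an input, and an output, for example: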

{
    "instruction": "Answer the following user question and output only the answer.",
    "input": "What is 1+1?",
    "output": "2"
}

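In this example, `instruction` is the task instruction given to the model, stating exactly what it should do; `input` is the user question or other information needed to complete the task; and `output` is the answer the model is expected to produce.

The goal is to train the model to understand and follow user instructions accurately, so the instruction set has to be designed for the specific application. For example, to build a personalized LLM that imitates a particular conversational style, we need an instruction set built for that style.

Take the open-source Zhen Huan (甄嬛传) dialogue dataset as an example: if we want the model to imitate Zhen Huan's way of speaking, we can construct instructions of the following form: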

[Image: example instruction entry built from the Zhen Huan dialogue dataset]

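In this example the `input` field is left empty, because the model's reply is based on the preset character background rather than on a separate user question. Training this way lets the model learn and imitate a specific character's language style and dialogue patterns, which gives more personalized and situational interactions in practice.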
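As a concrete illustration, one record in this format can be written out with the standard library alone. This is a minimal sketch: the dialogue text is the first sample from the dataset output shown in section 3, and the variable names `entry` and `serialized` are chosen here for illustration.

```python
import json

# One instruction-tuning record in the format described above.
# `input` is left empty: the line being answered is carried by `instruction`.
entry = {
    "instruction": "小姐，别的秀女都在求中选，唯有咱们小姐想被撂牌子，菩萨一定记得真真儿的——",
    "input": "",
    "output": "嘘——都说许愿说破是不灵的。",
}

# ensure_ascii=False keeps the Chinese text readable in the JSON output.
serialized = json.dumps(entry, ensure_ascii=False, indent=2)
print(serialized)
```

A dataset file is then simply a JSON list of such records.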

2. Import dependencies

from datasets import Dataset
import pandas as pd
from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq, TrainingArguments, Trainer, GenerationConfig

3. Load the dataset

# Load the JSON file into a pandas DataFrame, then convert it to a Hugging Face Dataset
df = pd.read_json('huanhuan.json')
ds = Dataset.from_pandas(df)
ds[:3]

Output:

{'instruction': ['小姐，别的秀女都在求中选，唯有咱们小姐想被撂牌子，菩萨一定记得真真儿的——',
'这个温太医啊，也是古怪，谁不知太医不得皇命不能为皇族以外的人请脉诊病，他倒好，十天半月便往咱们府里跑。',
'嬛妹妹，刚刚我去府上请脉，听甄伯母说你来这里进香了。'],
'input': ['', '', ''],
'output': ['嘘——都说许愿说破是不灵的。', '你们俩话太多了，我该和温太医要一剂药，好好治治你们。', '出来走走，也是散心。']}

4. Process the dataset

1) Define the tokenizer

tokenizer = AutoTokenizer.from_pretrained('/root/autodl-tmp/LLM-Research/Meta-Llama-3___1-8B-Instruct', use_fast=False, trust_remote_code=True)
tokenizer.pad_token = tokenizer.eos_token

2) Inspect the message format

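messages = [
    # System prompt: "You are now playing Zhen Huan, the woman at the emperor's side"
    {"role": "system", "content": "现在你要扮演皇帝身边的女人--甄嬛"},
    # User: "Hello"
    {"role": "user", "content": '你好呀'},
    # Assistant: "Hello, I am Zhen Huan. Is there something you would like to ask me?"
    {"role": "assistant", "content": "你好，我是甄嬛，你有什么事情要问我吗？"},
]
print(tokenizer.apply_chat_template(messages, tokenize=False))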

Output:

<|begin_of_text|><|start_header_id|>system<|end_header_id|>
现在你要扮演皇帝身边的女人--甄嬛<|eot_id|><|start_header_id|>user<|end_header_id|>
你好呀<|eot_id|><|start_header_id|>assistant<|end_header_id|>
你好，我是甄嬛，你有什么事情要问我吗？<|eot_id|><|start_header_id|>assistant<|end_header_id|>

3) Data processing function

def process_func(example):
    MAX_LENGTH = 384    # The Llama tokenizer may split one Chinese character into several tokens, so allow a generous maximum length to keep samples intact
    input_ids, attention_mask, labels = [], [], []
    # Prompt part: system prompt + user turn + assistant header, written in the Llama 3.1 chat format
    instruction = tokenizer(f"<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n现在你要扮演皇帝身边的女人--甄嬛<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n{example['instruction'] + example['input']}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n", add_special_tokens=False)  # add_special_tokens=False: do not prepend special tokens automatically
    response = tokenizer(f"{example['output']}<|eot_id|>", add_special_tokens=False)
    input_ids = instruction["input_ids"] + response["input_ids"] + [tokenizer.pad_token_id]
    attention_mask = instruction["attention_mask"] + response["attention_mask"] + [1]  # the trailing end token must also be attended to, so its mask is 1
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"] + [tokenizer.pad_token_id]  # -100 excludes the prompt from the loss
    if len(input_ids) > MAX_LENGTH:  # truncate overlong samples
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }
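The key idea in `process_func` is that the prompt tokens are masked out of the loss while the response tokens are kept. The sketch below shows the same construction with hypothetical toy token ids instead of a real tokenizer; `build_example` and `IGNORE_INDEX` are names introduced here for illustration.

```python
IGNORE_INDEX = -100  # label value that PyTorch's cross-entropy loss ignores by default


def build_example(prompt_ids, response_ids, pad_id, max_length):
    """Concatenate prompt and response ids, masking the prompt in the labels."""
    input_ids = prompt_ids + response_ids + [pad_id]
    attention_mask = [1] * len(input_ids)
    # Only the response (and the trailing end token) contributes to the loss.
    labels = [IGNORE_INDEX] * len(prompt_ids) + response_ids + [pad_id]
    # Truncate all three sequences to the same maximum length.
    return {
        "input_ids": input_ids[:max_length],
        "attention_mask": attention_mask[:max_length],
        "labels": labels[:max_length],
    }


# Toy token ids (hypothetical, not real Llama ids).
example = build_example(prompt_ids=[11, 12, 13], response_ids=[21, 22], pad_id=0, max_length=8)
print(example["input_ids"])  # [11, 12, 13, 21, 22, 0]
print(example["labels"])     # [-100, -100, -100, 21, 22, 0]
```

Because `input_ids`, `attention_mask`, and `labels` are truncated together, they always stay the same length.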

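4) Process the data

Apply `process_func` to every record, typically with `ds.map(process_func, remove_columns=ds.column_names)`; the result is the `tokenized_id` dataset that is passed to `Trainer` in section 8.

Output:

[Image: output of the data processing step]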

5) Decode and inspect input_ids (first sample)

Output:

'<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n现在你要扮演皇帝身边的女人--甄嬛<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n小姐，别的秀女都在求中选，唯有咱们小姐想被撂牌子，菩萨一定记得真真儿的——<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n嘘——都说许愿说破是不灵的。<|eot_id|><|eot_id|>'

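6) Decode and inspect labels (the output below is the second sample, with the masked prompt positions removed)

Output: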

'你们俩话太多了，我该和温太医要一剂药，好好治治你们。<|eot_id|><|eot_id|>'
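The labels contain -100 at the prompt positions, which is not a valid token id, so those positions have to be dropped before decoding. That is why only the response text appears in the decoded labels. A minimal sketch with a hypothetical toy vocabulary follows; with the real tokenizer, the filtered list would be passed to `tokenizer.decode` instead.

```python
IGNORE_INDEX = -100

# Hypothetical toy vocabulary standing in for a real tokenizer.
toy_vocab = {21: "hello", 22: "world", 0: "<eos>"}


def decode_labels(labels):
    """Drop masked positions, then map the remaining ids back to text."""
    kept = [token_id for token_id in labels if token_id != IGNORE_INDEX]
    return " ".join(toy_vocab[token_id] for token_id in kept)


labels = [-100, -100, -100, 21, 22, 0]
print(decode_labels(labels))  # hello world <eos>
```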

5. Define the model

import torch
model = AutoModelForCausalLM.from_pretrained('/root/autodl-tmp/LLM-Research/Meta-Llama-3___1-8B-Instruct', device_map="auto",torch_dtype=torch.bfloat16)
model

Output:

[Image: printed model architecture]

Check the precision the model was loaded in (for example via `model.dtype`):

Output:

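6. LoRA configuration

The `LoraConfig` class accepts many parameters; the main ones are:

task_type: the model (task) type
target_modules: the names of the layers to train, mainly the attention projection layers (the configuration below also includes the MLP projections); the names differ between models, and the value can be a list, a string, or a regular expression.
r: the LoRA rank; see the LoRA principles for details
lora_alpha: the LoRA alpha; see the LoRA principles for its exact role

What is the LoRA scaling factor? It is not r (the rank): the scaling factor is lora_alpha / r, so in this LoraConfig it is 32 / 8 = 4.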
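To make the scaling concrete, the sketch below computes lora_alpha / r for the values used in this section and applies a scaled low-rank update to a tiny weight matrix. The matrices and the helper names `lora_scaling` and `matmul` are illustrative assumptions, and the formula shown is the standard LoRA scaling (rank-stabilized LoRA, which is disabled in this configuration, scales differently).

```python
def lora_scaling(lora_alpha, r):
    """Scaling factor applied to the low-rank update B @ A."""
    return lora_alpha / r


def matmul(a, b):
    """Plain-Python matrix product for small illustrative matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


# Values from the LoraConfig in this section.
scale = lora_scaling(lora_alpha=32, r=8)
print(scale)  # 4.0

# Toy rank-1 update on a 2x2 weight: W' = W + scale * (B @ A).
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[0.5], [0.0]]  # shape (2, r) with r = 1
A = [[1.0, 2.0]]    # shape (r, 2)
toy_scale = lora_scaling(lora_alpha=4, r=1)
delta = matmul(B, A)
W_adapted = [
    [w + toy_scale * d for w, d in zip(w_row, d_row)]
    for w_row, d_row in zip(W, delta)
]
print(W_adapted)  # [[3.0, 4.0], [0.0, 1.0]]
```

Raising lora_alpha while keeping r fixed therefore amplifies the contribution of the adapter relative to the frozen base weights.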

from peft import LoraConfig, TaskType, get_peft_model

config = LoraConfig(
    task_type=TaskType.CAUSAL_LM, 
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    inference_mode=False,  # training mode
    r=8,  # LoRA rank
    lora_alpha=32,  # LoRA alpha; see the LoRA principles for its role
    lora_dropout=0.1,  # dropout ratio
)
config

Output:

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path=None, revision=None, task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, inference_mode=False, r=8, target_modules={'k_proj', 'v_proj', 'up_proj', 'o_proj', 'down_proj', 'gate_proj', 'q_proj'}, lora_alpha=32, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None)

Load the fine-tuning configuration into the model (presumably with the imported `get_peft_model`):

Output:

LoraConfig(peft_type=<PeftType.LORA: 'LORA'>, auto_mapping=None, base_model_name_or_path='/root/autodl-tmp/LLM-Research/Meta-Llama-3___1-8B-Instruct', revision=None, task_type=<TaskType.CAUSAL_LM: 'CAUSAL_LM'>, inference_mode=False, r=8, target_modules={'k_proj', 'v_proj', 'up_proj', 'o_proj', 'down_proj', 'gate_proj', 'q_proj'}, lora_alpha=32, lora_dropout=0.1, fan_in_fan_out=False, bias='none', use_rslora=False, modules_to_save=None, init_lora_weights=True, layers_to_transform=None, layers_pattern=None, rank_pattern={}, alpha_pattern={}, megatron_config=None, megatron_core='megatron.core', loftq_config={}, use_dora=False, layer_replication=None)

View the trainable parameters (for example with PEFT's `print_trainable_parameters()`):

Output:
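As a rough cross-check on the trainable-parameter count, the number of LoRA parameters can be estimated by hand: each adapted linear layer adds r * (in_features + out_features) parameters. The layer dimensions below are assumptions taken from the published Llama 3.1 8B configuration, not from this article, so treat the result as an estimate.

```python
# Assumed Llama 3.1 8B dimensions (from the published model config, not from this article).
HIDDEN = 4096         # hidden_size
INTERMEDIATE = 14336  # intermediate_size of the MLP
KV_DIM = 1024         # 8 key/value heads x 128 head dimension
NUM_LAYERS = 32
R = 8                 # LoRA rank from the config above

# (in_features, out_features) of each targeted linear layer.
target_shapes = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """LoRA adds A with shape (r, in_features) and B with shape (out_features, r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in target_shapes.values())
total = per_layer * NUM_LAYERS
print(per_layer, total)  # 655360 20971520
```

Under these assumptions roughly 21 million parameters are trained, a small fraction of the 8 billion in the base model.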

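7. Configure training arguments

The source code of the `TrainingArguments` class documents what every parameter does, and you are welcome to explore it yourself; only a few common ones are covered here.

output_dir: where the model output is saved
per_device_train_batch_size: the batch size per device
gradient_accumulation_steps: gradient accumulation; if GPU memory is limited, use a smaller batch size and more accumulation steps.
logging_steps: how many steps between log outputs
num_train_epochs: the number of training epochs
gradient_checkpointing: gradient checkpointing; once enabled, the model must call model.enable_input_require_grads()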

args = TrainingArguments(
    output_dir="./output/llama3_1_instruct_lora",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    logging_steps=10,
    num_train_epochs=3,
    save_steps=100,  # save a checkpoint every 100 steps
    learning_rate=1e-4,
    save_on_each_node=True,
    gradient_checkpointing=True
)
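The batch size and gradient accumulation settings combine into an effective batch size, which in turn determines roughly how many optimizer steps training takes. The sketch below works through the arithmetic; the device count and dataset size are hypothetical values chosen for illustration, and the exact step count reported by the trainer can differ slightly when the last batch is partial.

```python
import math

per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_train_epochs = 3
num_devices = 1     # assumption: single-GPU training
num_samples = 3200  # hypothetical dataset size, for illustration only

# Samples consumed per optimizer update.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
steps_per_epoch = math.ceil(num_samples / effective_batch_size)
total_steps = steps_per_epoch * num_train_epochs
print(effective_batch_size, steps_per_epoch, total_steps)  # 16 200 600
```

With `save_steps=100`, a run of this size would write checkpoints at steps 100, 200, and so on.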

8. Start training with Trainer

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_id,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
trainer.train()

When training finishes, the output looks like this:

[Image: output after training completes]

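9. Merge the model

Load the trained LoRA weights on top of the base model. Note that the code below attaches the adapter to the base model in memory; writing out new, fully merged model files would additionally require PEFT's `merge_and_unload()` followed by `save_pretrained()`.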

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch
from peft import PeftModel

mode_path = '/root/autodl-tmp/LLM-Research/Meta-Llama-3___1-8B-Instruct'
lora_path = '/root/autodl-tmp/output/llama3_1_instruct_lora/checkpoint-100'  # change this to the checkpoint path of your own LoRA output

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(mode_path, trust_remote_code=True)

# Load the base model
model = AutoModelForCausalLM.from_pretrained(mode_path, device_map="auto", torch_dtype=torch.bfloat16, trust_remote_code=True).eval()

# Load the LoRA weights
model = PeftModel.from_pretrained(model, model_id=lora_path)

When loading finishes, the output looks like this:

[Image: output after the LoRA weights are loaded]

10. Model inference

prompt = "你是谁？"  # "Who are you?"

messages = [
        # System prompt: "Suppose you are Zhen Huan, the woman at the emperor's side."
        {"role": "system", "content": "假设你是皇帝身边的女人--甄嬛。"},
        {"role": "user", "content": prompt}
]

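# add_generation_prompt=True appends the assistant header so the model starts its reply
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
# The templated text already starts with <|begin_of_text|>, so do not add special tokens again
model_inputs = tokenizer([text], return_tensors="pt", add_special_tokens=False).to(model.device)
generated_ids = model.generate(**model_inputs, max_new_tokens=512)
# Strip the prompt tokens so that only the newly generated tokens are decoded
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
print(response)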
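For a decoder-only model, `generate` returns the prompt tokens followed by the newly generated tokens, which is why the code slices each output row by the length of its prompt before decoding. The slicing step on its own looks like this, using hypothetical toy token ids:

```python
# Toy token ids (hypothetical): each generated row starts with its prompt.
prompt_batch = [[1, 5, 9], [1, 7, 8]]
generated_batch = [[1, 5, 9, 42, 43], [1, 7, 8, 50, 51]]

# Keep only what comes after the prompt in each row.
new_tokens = [
    output_ids[len(prompt_ids):]
    for prompt_ids, output_ids in zip(prompt_batch, generated_batch)
]
print(new_tokens)  # [[42, 43], [50, 51]]
```

Without this step, the decoded response would repeat the system prompt and the user question before the model's answer.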

Inference output:

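Closing remarks

Large AI models are a major technical breakthrough in artificial intelligence and are becoming a key force behind innovation and transformation across industries. As this trend continues, knowledge of and skills with large AI models will matter more and more.

Learning about large AI models is a systematic process: start from the fundamentals and work up step by step to more advanced techniques.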

Original author: u_15620990. Reposted from: https://blog.51cto.com/u_15620990/11715776
