LoRA Fine-Tuning of a Chinese Llama 3 Model

LLM挣扎学员

Last modified on 2024-07-13 18:56:05


Tags: deep learning, machine learning, artificial intelligence, natural language processing, llama

First published on 2024-07-12 17:07:53

Article link: https://blog.csdn.net/zc1226/article/details/140381535


Overview

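LoRA (Low-Rank Adaptation) is a method for fine-tuning large models. It keeps the pretrained weights frozen and injects small trainable low-rank matrices into selected layers, which greatly reduces the number of trainable parameters and the compute and memory needed for training. It is mainly used to fine-tune large pretrained language models. This article shows how to fine-tune the Chinese version of Llama 3 with LoRA. The experiments were run on Kaggle with two T4 GPUs.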
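To make the idea concrete, here is a minimal NumPy sketch of what a LoRA-adapted linear layer computes. It is an illustration, not PEFT's implementation: the class name `LoRALinear`, the small matrix sizes and the initialization scale are assumptions. The frozen weight `W0` is never modified; the update is the product of two small matrices `B` (d_out × r) and `A` (r × d_in), scaled by `alpha / r`, and because `B` starts at zero the adapted layer initially behaves exactly like the original one.

```python
import numpy as np


class LoRALinear:
    """Illustrative LoRA layer: y = x W0^T + (alpha / r) * x A^T B^T, with W0 frozen."""

    def __init__(self, W0, r=8, alpha=16, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W0.shape
        self.W0 = W0                                     # frozen pretrained weight, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))   # trainable down-projection, small random init
        self.B = np.zeros((d_out, r))                    # trainable up-projection, zero init
        self.scaling = alpha / r

    def __call__(self, x):
        # x: (batch, d_in) -> (batch, d_out); base output plus the scaled low-rank update
        return x @ self.W0.T + self.scaling * (x @ self.A.T) @ self.B.T

    def trainable_params(self):
        return self.A.size + self.B.size


def lora_param_count(d_out, d_in, r):
    """Trainable values LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_in + d_out)


rng = np.random.default_rng(1)
layer = LoRALinear(rng.normal(size=(64, 32)), r=8, alpha=16)
x = rng.normal(size=(4, 32))

# Because B starts at zero, the adapted layer initially matches the frozen layer exactly
print(np.allclose(layer(x), x @ layer.W0.T))            # True
print(layer.trainable_params(), layer.W0.size)           # 768 2048
print(lora_param_count(4096, 4096, 8), 4096 * 4096)      # 65536 16777216
```

For a 4096 × 4096 projection with r = 8, LoRA trains 65,536 values instead of 16,777,216, about 0.4% of the full matrix.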

Overall fine-tuning workflow

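1. Prepare the data: collect and preprocess the training data for fine-tuning.
2. Initialize the model: load the pretrained Llama 3 model.
3. Add LoRA layers: insert trainable low-rank matrices into specific layers while the original model weights stay frozen.
4. Fine-tune: train on the prepared data, updating only the low-rank matrices.
5. Evaluate and tune: evaluate the fine-tuned model on a validation set and adjust hyperparameters as needed.
Each step is described in detail below.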

Import the required libraries

# Mostly common libraries; some are not used below, which does no harm
import torch
import shutil
from torch import cuda
import torch.nn as nn
# Only DataLoader comes from torch; `Dataset` below is the Hugging Face datasets class
from torch.utils.data import DataLoader
# Disable PyTorch's memory-efficient and flash scaled-dot-product-attention kernels
torch.backends.cuda.enable_mem_efficient_sdp(False)
torch.backends.cuda.enable_flash_sdp(False)
from transformers import AutoModelForCausalLM, AutoTokenizer, Trainer, TrainingArguments, AutoConfig
from transformers import GenerationConfig, AutoModelForSequenceClassification, DataCollatorWithPadding
import os
os.environ["TOKENIZERS_PARALLELISM"] = "false"
# Synchronous CUDA launches make errors easier to trace but slow training down
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'
# Notebook shell command (Kaggle/Jupyter); in a plain script install peft==0.3.0 beforehand
!pip install -q peft==0.3.0
from peft import TaskType, LoraConfig, get_peft_model, set_peft_model_state_dict
from peft.utils import TRANSFORMERS_MODELS_TO_LORA_TARGET_MODULES_MAPPING
import pandas as pd
from datasets import Dataset
import json
from sklearn.model_selection import train_test_split
from transformers import TrainerCallback

Download the model

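model_id selects the model checkpoint. Because of the limited hardware, the model is loaded in 16-bit precision (float16), and device_map="auto" automatically splits it across the two GPUs.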

model_id = 'hfl/llama-3-chinese-8b-instruct-v3'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")
print('Model downloaded')
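The choice of float16 plus `device_map="auto"` follows from simple arithmetic. The sketch below estimates the memory needed just to hold the weights. It assumes the Chinese model keeps Llama-3-8B's roughly 8.03 billion parameters and that a Kaggle T4 exposes about 15 GiB; it ignores activations, the CUDA context and the KV cache, so real usage is higher.

```python
def weight_memory_gib(n_params, bytes_per_param):
    """GiB needed to store the weights alone (no activations, optimizer state or KV cache)."""
    return n_params * bytes_per_param / 1024**3


N_PARAMS = 8.03e9        # assumption: approximate size of Llama-3-8B
T4_MEMORY_GIB = 15.0     # assumption: usable memory per T4 on Kaggle (approximate)

fp32_gib = weight_memory_gib(N_PARAMS, 4)
fp16_gib = weight_memory_gib(N_PARAMS, 2)

print(f"fp32 weights: {fp32_gib:.1f} GiB")        # fp32 weights: 29.9 GiB
print(f"fp16 weights: {fp16_gib:.1f} GiB")        # fp16 weights: 15.0 GiB
print(f"two T4s: {2 * T4_MEMORY_GIB:.1f} GiB")    # two T4s: 30.0 GiB
```

In fp32 the weights alone would nearly fill both cards, leaving nothing for training. In fp16 they take about one card's worth, so the model still has to be split across the two GPUs to leave room for activations.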

Prepare the fine-tuning dataset. The dataset used in this experiment is at https://github.com/SCIR-HI/Huatuo-Llama-Med-Chinese/tree/main/data

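# Sample record (the raw file stores one JSON object per line):
# {"instruction": "一名年龄在70岁的女性，出现了晕厥、不自主颤抖、情绪不稳等症状，请详细说明其手术治疗和术前准备。",
#  "input": "",
#  "output": "该病需要进行电极导线、脉冲发生器和永久心脏起搏器置入术，并需要使用镇静药物和局麻对病人进行手术治疗。术前准备包括1-3天的时间进行术前检查和生活方式的调整。"}
# English gloss: a 70-year-old woman with syncope, involuntary tremor and emotional instability; the answer
# describes implantation of electrode leads, a pulse generator and a permanent pacemaker under sedation and
# local anesthesia, preceded by 1-3 days of preoperative examinations and lifestyle adjustments.

# Count how many samples reach 256 tokens under the prompt template.
# data_list is created by the loading code in the next step, so run that first.
n_long = 0
for entry in data_list:
    instruction = entry.get('instruction', '')
    output_text = entry.get('output', '')

    context = f"""下面是一个问题，运用医学知识来正确回答提问.\n问题:\n{instruction}\n"""
    target = f"回答:\n{output_text}"
    combined_input = context + target
    input_ids = tokenizer(combined_input)['input_ids']
    if len(input_ids) >= 256:
        n_long += 1
print(n_long)  # only 26 samples reach 256 tokens with this template, so max_length=256 is enough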

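The raw file is not a single JSON document: each line holds a separate JSON object (JSON Lines), so it has to be converted first.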

input_file = '/kaggle/input/medical/llama_data.json'
output_file = 'data.json'
# Read the raw file
with open(input_file, 'r', encoding='utf-8') as file:
    lines = file.readlines()
# Parse each non-empty line into a dict
data_list = [json.loads(line) for line in lines if line.strip()]
# Save the list as a regular JSON file
with open(output_file, 'w', encoding='utf-8') as file:
    json.dump(data_list, file, ensure_ascii=False, indent=2)
print(f"Data saved to {output_file}")

# Build tokenized samples from the instruction/output pairs
processed_data = []
# Llama 3 defines no pad token by default, so id 0 is reused for padding.
# Padded positions are excluded from attention (mask 0) and from the loss (label -100).
tokenizer.pad_token_id = 0

# 1. `input_ids` is the full sequence the model sees: prompt + answer.
# 2. `labels` is aligned position by position with `input_ids`; prompt and padding positions
#    get the ignore index -100 so that only the answer tokens contribute to the loss.
max_length = 256
for entry in data_list:
    instruction = entry.get('instruction', '')
    output_text = entry.get('output', '')

    # Prompt: "Below is a question; answer it correctly using medical knowledge. Question: ... Answer: ..."
    context = f"""下面是一个问题，运用医学知识来正确回答提问.\n问题:\n{instruction}\n"""
    target = f"回答:\n{output_text}"
    # Tokenize prompt and answer separately so the prompt length is known in tokens, not characters
    context_ids = tokenizer(context)['input_ids']  # includes the BOS token
    target_ids = tokenizer(target, add_special_tokens=False)['input_ids'] + [tokenizer.eos_token_id]

    input_ids = (context_ids + target_ids)[:max_length]
    labels = ([-100] * len(context_ids) + target_ids)[:max_length]
    attention_mask = [1] * len(input_ids)

    # Right-pad to max_length so labels always have the same length as input_ids
    pad_len = max_length - len(input_ids)
    input_ids += [tokenizer.pad_token_id] * pad_len
    attention_mask += [0] * pad_len
    labels += [-100] * pad_len

    processed_data.append({
        'input_ids': input_ids,
        'attention_mask': attention_mask,
        'labels': labels
    })

# Convert to a Hugging Face Dataset
dataset = Dataset.from_list(processed_data)
dataset.set_format(type='torch', columns=['input_ids', 'attention_mask', 'labels'])
dataset_dict = dataset.train_test_split(test_size=0.05)
# Training and test splits
train_dataset = dataset_dict['train']
test_dataset = dataset_dict['test']

print(len(train_dataset))
print(train_dataset)
print('Dataset ready')
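Why the `-100` labels matter: Hugging Face causal language models shift the labels internally, so the prediction made at position t is scored against `labels[t + 1]`, and positions labelled `-100` (the default `ignore_index` of PyTorch's cross-entropy) are skipped. The pure-Python sketch below reproduces that rule on toy log-probabilities; the function name and the numbers are illustrative, not library code.

```python
import math


def masked_causal_lm_loss(logprobs, labels, ignore_index=-100):
    """Mean negative log-likelihood over non-ignored positions, with the causal-LM shift.

    logprobs[t][v] is the log-probability the model assigns to token v at position t;
    the prediction made at position t is compared with labels[t + 1].
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        target = labels[t + 1]
        if target == ignore_index:
            continue  # prompt and padding positions do not contribute to the loss
        total -= logprobs[t][target]
        count += 1
    return total / count


# Toy example: vocabulary of 3 tokens, a 2-token prompt, a 2-token answer, 1 padding slot
uniform = [math.log(1 / 3)] * 3
logprobs = [uniform] * 5
labels = [-100, -100, 1, 2, -100]
loss = masked_causal_lm_loss(logprobs, labels)
print(round(loss, 4))  # 1.0986, i.e. log(3): only the two answer tokens are scored
```

If padding positions carried the pad id instead of `-100`, they would be scored as well, and the model would partly learn to predict padding.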

Configure the LoRA and training parameters

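# LoRA configuration
lora_config = LoraConfig(
    r=8,               # rank of the low-rank matrices
    lora_alpha=16,     # the LoRA update is scaled by lora_alpha / r
    # Which modules to adapt; print(model) to see the available module names
    target_modules=['q_proj', 'v_proj'],  # other options: 'k_proj', 'o_proj', 'gate_proj', 'up_proj', 'down_proj'
    lora_dropout=0.1,
    bias='none',
    task_type=TaskType.CAUSAL_LM
)
model_lora = get_peft_model(model, lora_config)

training_args = TrainingArguments(
    output_dir='./results',
    # Evaluate every N steps or once per epoch: steps suit early experiments, epochs once settings are fixed
    eval_strategy="epoch",  # "steps" or "epoch"; older transformers versions call this evaluation_strategy
#     max_steps=50,
    num_train_epochs=5,     # finding the best number of epochs takes repeated experiments
    learning_rate=1e-5,     # LoRA is often trained with larger rates, around 1e-4
    per_device_train_batch_size=4,
    per_device_eval_batch_size=4,
    weight_decay=0.01,
    remove_unused_columns=True,
    report_to="none",
    fp16=True,
    ddp_find_unused_parameters=True,  # only relevant for multi-process DDP training
    gradient_accumulation_steps=16,
    save_strategy="no",  # do not save checkpoints
    logging_steps=100,
)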
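With `gradient_accumulation_steps=16`, gradients from 16 micro-batches are accumulated before each optimizer step. Because `device_map="auto"` splits one copy of the model across the two GPUs (model parallelism in a single process), the two T4s should not multiply the batch size the way data-parallel training would; that single-process assumption is built into the sketch below. The helper is a rough approximation of the Trainer's bookkeeping, and the training-set size in the example is hypothetical.

```python
import math


def optimizer_steps_per_epoch(n_train, per_device_batch, grad_accum, n_processes=1):
    """Approximate effective batch size and optimizer updates per epoch."""
    effective_batch = per_device_batch * grad_accum * n_processes
    return effective_batch, math.ceil(n_train / effective_batch)


# Settings from TrainingArguments above; n_train=8000 is a made-up example size
effective_batch, steps = optimizer_steps_per_epoch(n_train=8000, per_device_batch=4, grad_accum=16)
print(effective_batch, steps)  # 64 125
```

The same count helps choose `logging_steps`: logging is counted in optimizer steps, so with 125 steps per epoch a value of 100 would log the training loss only about once per epoch.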

Set up the Trainer and start training

# Data collator (samples are already padded to max_length, so it mainly stacks them into batches)
data_collator = DataCollatorWithPadding(tokenizer)

trainer = Trainer(
    model=model_lora,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    tokenizer=tokenizer,
    data_collator=data_collator
)
torch.cuda.empty_cache()
print('Training started')
trainer.train()

def count_parameters(model):
    trainable_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
    frozen_params = sum(p.numel() for p in model.parameters() if not p.requires_grad)
    return trainable_params, frozen_params

model_lora = trainer.model

# Count trainable and frozen parameters
trainable_params, frozen_params = count_parameters(model_lora)

print(f"Total trainable parameters: {trainable_params}")
print(f"Total frozen parameters: {frozen_params}")
print('Training finished')

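Don't forget to save the weights after training.

Only the LoRA weights are saved, so the files are small; to use them, load them on top of the base model again.

# Save the LoRA adapter weights (not the full model)
model_lora.save_pretrained("save/lora")
print('LoRA weights saved')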
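Combining the adapter with the base model works because the LoRA update is linear: adding `(alpha / r) * B @ A` to the frozen weight gives exactly the same outputs as running base weight and adapter side by side. The NumPy sketch below checks this on random matrices with arbitrary sizes. In PEFT this corresponds to loading the adapter with `PeftModel.from_pretrained(base_model, "save/lora")` and, if a standalone model is wanted, calling `merge_and_unload()`; check that the installed PEFT version supports these calls. A merged model is full-size again.

```python
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 64, 48, 4, 8
scaling = alpha / r

W0 = rng.normal(size=(d_out, d_in))  # frozen base weight
A = rng.normal(size=(r, d_in))       # trained LoRA factors, i.e. what the adapter file stores
B = rng.normal(size=(d_out, r))
x = rng.normal(size=(5, d_in))

# Option 1: keep base and adapter separate and add their outputs at run time
y_separate = x @ W0.T + scaling * (x @ A.T) @ B.T

# Option 2: fold the adapter into the base weight once; the result is a dense d_out x d_in matrix
W_merged = W0 + scaling * (B @ A)
y_merged = x @ W_merged.T

print(np.allclose(y_separate, y_merged))  # True
print(A.size + B.size, W_merged.size)      # 448 3072
```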

Run inference to check the results

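messages = [
    {
        'role': 'user',
        # "A 60-year-old male patient has abnormal liver function due to a hepatic artery aneurysm,
        #  presenting as jaundice. How should it be diagnosed and treated?"
        'content': """一位60岁男性患者由于肝动脉瘤出现肝功能异常，具体表现为黄疸，该如何诊治？"""
    }
]
# Note: this uses the model's chat template, whereas training used the plain question/answer prompt above
input_ids = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt").to(model.device)

outputs = model_lora.generate(
    input_ids=input_ids,
    max_new_tokens=256,
    do_sample=True,   # temperature only takes effect when sampling is enabled
    temperature=0.8,
)
# Keep only the newly generated tokens
response = outputs[0][input_ids.shape[-1]:]
print(tokenizer.decode(response, skip_special_tokens=True))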
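The inference code above queries the model through its chat template, while training used the plain prompt built in the data-processing step. Querying with the same prompt as in training is usually the safer choice. A minimal helper, assuming the template string is copied verbatim from the training loop (the function and constant names are hypothetical):

```python
# Prompt template copied from the training loop; it must match character for character
PROMPT_TEMPLATE = "下面是一个问题，运用医学知识来正确回答提问.\n问题:\n{instruction}\n"
ANSWER_PREFIX = "回答:\n"


def build_inference_prompt(instruction):
    """Rebuild the training prompt and stop right after the answer prefix, so the model writes the answer."""
    return PROMPT_TEMPLATE.format(instruction=instruction) + ANSWER_PREFIX


prompt = build_inference_prompt("一位60岁男性患者由于肝动脉瘤出现肝功能异常，具体表现为黄疸，该如何诊治？")
print(prompt)

# With the fine-tuned model loaded as above (not run here):
# inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
# outputs = model_lora.generate(**inputs, max_new_tokens=256, do_sample=True, temperature=0.8)
# print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```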

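Training for more epochs may give better results. I trained for 3 epochs and did not notice much change; next I will try 5-10 epochs. One epoch takes a little over an hour on average.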

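Difficulties encountered

Some examples:

Model choice: pick a suitable base model. I first wanted to use the original Llama 3, but fine-tuning it for Chinese would probably not work well, so I fine-tuned the Chinese version instead.
Input template: at first I assumed the input could simply be a question and the output an answer, but that caused problems in the model's output; if problems remain, refer to the link above.
Hyperparameters: the more modules you fine-tune, the smaller the batch size or max_length has to be, since only two free T4 GPUs are available.
One thing I still don't understand: the saved fine-tuned weights are small, only a few tens of MB, while others report 2, 3 or 4 GB after fine-tuning (the original model is 16 GB). Is that due to quantization? I'm not sure yet.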
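A back-of-the-envelope check on the file size: the adapter stores only the LoRA matrices. The sketch below counts them for r = 8, assuming the standard Llama-3-8B shapes (hidden size 4096, 1024-dimensional key/value projections from grouped-query attention, MLP size 14336, 32 layers), which the Chinese variant is assumed to share. It covers both the q_proj/v_proj setting used here and the full commented-out module list.

```python
# Assumed Llama-3-8B shapes: (in_features, out_features) of each projection
HIDDEN, KV_DIM, MLP_DIM, N_LAYERS = 4096, 1024, 14336, 32
SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, MLP_DIM),
    "up_proj": (HIDDEN, MLP_DIM),
    "down_proj": (MLP_DIM, HIDDEN),
}


def adapter_params(modules, r=8):
    """LoRA parameters over all layers: each adapted d_in x d_out matrix adds r * (d_in + d_out)."""
    return N_LAYERS * sum(r * sum(SHAPES[m]) for m in modules)


def mib(n_params, bytes_per_param):
    return n_params * bytes_per_param / 1024**2


qv = adapter_params(["q_proj", "v_proj"])
all_modules = adapter_params(SHAPES)
print(qv, mib(qv, 2), mib(qv, 4))                             # 3407872 6.5 13.0
print(all_modules, mib(all_modules, 2), mib(all_modules, 4))  # 20971520 40.0 80.0
```

So an adapter of a few MB up to several tens of MB is expected. The full model in fp16 is roughly 15-16 GB, so a fine-tuned checkpoint of several GB most likely contains base-model weights, for example a merged model that was then quantized, whereas an adapter file alone stays small.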

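Summary

The advantage of LoRA is that it reduces the number of parameters that need to be trained, which lowers compute cost and memory requirements while maintaining model performance. There is also QLoRA, which additionally quantizes the frozen base model to 4 bits, but in my view plain LoRA is sufficient when the model is not extremely large. These are only my personal views; corrections are welcome, and feel free to contact me in the comments.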
