LLM fine-tuning notes: a hands-on text-generation model with QLoRA and RAG on Hugging Face (AI comment-reply task).

Latest recommended article published on 2024-08-03 01:32:56

努力冲锋的阿东-


Tags: artificial intelligence, AIGC

Article link: https://blog.csdn.net/qq_43700488/article/details/138352762


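This article shows how to use Colab's free GPU to load a large pretrained model (TheBloke/Mistral-7B-Instruct-v0.2-GPTQ), tokenize input, set up prompts, train on top of a quantized model (QLoRA), apply parameter-efficient fine-tuning (PEFT), and fine-tune the model. It also covers installing and using the required packages, and saving and loading the resulting model.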

(Summary generated automatically by CSDN.)

Training runs on Colab's free GPU.

1. Install the required libraries

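!pip install auto-gptq # loads and runs GPTQ-quantized models
!pip install optimum # Hugging Face optimization toolkit; transformers needs it to load GPTQ checkpoints
!pip install bitsandbytes # 8-bit optimizers and quantization kernels (used later by paged_adamw_8bit)
!pip install torch==2.1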

# Import transformers, the parameter-efficient fine-tuning package peft, datasets, and related packages
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import prepare_model_for_kbit_training
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
import transformers

2. Load the pretrained model from Hugging Face (which hosts a large collection of open-source pretrained NLP models)

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",        # automatically figures out how to best use CPU + GPU for loading model
    trust_remote_code=False,  # prevents running custom model files on your machine
    revision="main",          # which version of model to use in repo
)

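The pretrained model is large, and downloading it may require a proxy or VPN on networks where the Hugging Face Hub is not directly reachable.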

3. Load the tokenizer and test the base pretrained model

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model.eval() # evaluation mode (dropout is disabled)
# the comment to reply to (in English: "What you wrote helped me a lot, thank you!")
comment = "你写的内容对我很有帮助,谢谢!"
prompt = f'''[INST] {comment} [/INST]'''

# tokenize input
inputs = tokenizer(prompt, return_tensors="pt")

# generate output; its length is capped by max_new_tokens
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=140)

print(tokenizer.batch_decode(outputs)[0])

The tokenizer is much smaller than the pretrained model.

4. Prompt setup

# The instructions can be written in Chinese or English, though English may run faster and give better results.
instructions_string = f"""
TikaGPT, functioning as a virtual data science consultant on social media, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to feedback aptly and ends responses with its signature '–TikaGPT'. \
TikaGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.
Please translate all the final replies into appropriate Chinese.
And please respond to the following comment.
"""
# the lambda joins the instructions and the comment into a single prompt
prompt_template = lambda comment: f'''[INST] {instructions_string} \n{comment}\n[/INST]'''
prompt = prompt_template(comment)
print(prompt)

# Evaluate the base model with the full prompt
# tokenize input
inputs = tokenizer(prompt, return_tensors="pt")

# generate output
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=140)

print(tokenizer.batch_decode(outputs)[0])

The prompt is fully customizable: you can write your own and use it for testing and training.
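The decoded text printed above contains the prompt itself followed by the generated reply, because `generate` returns the input tokens plus the new ones. A minimal sketch of pulling out only the reply is shown here; `extract_reply` is a hypothetical helper (not part of transformers), the sample string is made up for illustration, and the sketch assumes the prompt ends with `[/INST]` and the end-of-sequence token is `</s>`.

```python
def extract_reply(decoded, end_of_prompt="[/INST]", eos_token="</s>"):
    """Return the text after the last end-of-prompt marker, without the EOS token.

    If the marker is missing, the whole string is treated as the reply.
    """
    _, _, reply = decoded.rpartition(end_of_prompt)
    return reply.replace(eos_token, "").strip()


# Hypothetical decoded output, shaped like tokenizer.batch_decode(outputs)[0]
decoded = "<s> [INST] Great video! [/INST] Glad it helped!</s>"
reply = extract_reply(decoded)
print(reply)
```

In the notebook this would be called as `extract_reply(tokenizer.batch_decode(outputs)[0])`.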

5. Prepare the model for training: quantization reduces the training cost (QLoRA)

model.train() # model in training mode (dropout modules are activated)

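# enable gradient checkpointing (recomputes activations during the backward pass to save memory)
model.gradient_checkpointing_enable()

# prepare the quantized model for k-bit training
model = prepare_model_for_kbit_training(model)

# LoRA configuration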
config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# wrap the base model with LoRA adapters (PEFT: parameter-efficient fine-tuning)

model = get_peft_model(model, config)

# print the number of trainable parameters
model.print_trainable_parameters()

As the output shows, the PEFT model has far fewer trainable parameters than the original model.
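To see where the small trainable-parameter count comes from, the arithmetic can be sketched by hand. LoRA adds two low-rank matrices to each adapted layer and leaves the base weights frozen. The sketch below assumes the Mistral-7B shapes (a 4096 to 4096 `q_proj` in each of 32 decoder blocks); those shapes and the helper name `lora_trainable_params` are assumptions not stated in the article, while `r` and `lora_alpha` come from the `LoraConfig` above.

```python
def lora_trainable_params(in_features, out_features, r, num_layers):
    """Count LoRA parameters when one linear layer per block is adapted.

    Each adapted layer gets A (r x in_features) and B (out_features x r);
    the frozen base weight is not counted, and bias="none" adds nothing.
    """
    per_layer = r * in_features + out_features * r
    return per_layer * num_layers


# Assumed Mistral-7B shapes: q_proj maps 4096 -> 4096 in 32 decoder blocks.
count = lora_trainable_params(in_features=4096, out_features=4096, r=8, num_layers=32)

# The low-rank update is scaled by lora_alpha / r before it is added to the output.
scaling = 32 / 8

print(count, scaling)
```

Adding more entries to `target_modules` (for example the other attention projections) would raise this count proportionally to their shapes.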

6. Load and preprocess the data (this step is fairly standard)

# load dataset
data = load_dataset("shawhin/shawgpt-youtube-comments")

# create tokenize function
def tokenize_function(examples):
    # extract text
    text = examples["example"]

    #tokenize and truncate text
    tokenizer.truncation_side = "left"
    tokenized_inputs = tokenizer(
        text,
        return_tensors="np",
        truncation=True,
        max_length=512
    )

    return tokenized_inputs

# tokenize training and validation datasets
tokenized_data = data.map(tokenize_function, batched=True)

# setting pad token
tokenizer.pad_token = tokenizer.eos_token

# data collator
data_collator = transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)

The dataset used for fine-tuning is not large.
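The tokenize function above sets `tokenizer.truncation_side = "left"`, so an example longer than `max_length` loses tokens from its beginning and keeps its end. The following is a minimal pure-Python sketch of that idea on a plain list of token ids; `truncate_left` is a hypothetical helper, and it ignores the special-token bookkeeping a real tokenizer performs.

```python
def truncate_left(token_ids, max_length):
    """Keep at most the last max_length tokens, dropping the earliest ones."""
    start = max(len(token_ids) - max_length, 0)
    return list(token_ids[start:])


ids = list(range(10))          # stand-in for a tokenized example
kept = truncate_left(ids, max_length=4)
print(kept)
```

For a prompt-plus-reply training example, keeping the tail means the reply at the end survives truncation, which is presumably why the left side was chosen.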

7. Fine-tuning the model

# hyperparameters
lr = 2e-4
batch_size = 4
num_epochs = 10

# define the training arguments
training_args = transformers.TrainingArguments(
    output_dir="Tika-ft",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    logging_strategy="epoch",
    evaluation_strategy="epoch",  # named eval_strategy in newer transformers releases
    save_strategy="epoch",
    load_best_model_at_end=True,
    gradient_accumulation_steps=4,
    warmup_steps=2,
    fp16=True,
    optim="paged_adamw_8bit",
)

All of these parameters can be tuned based on the training results.
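When tuning these values, it helps to keep in mind how the batch size and gradient accumulation interact: gradients from several small batches are summed before each optimizer update. A minimal sketch of the arithmetic follows, assuming a single GPU (as on a free Colab runtime); `effective_batch_size` is a hypothetical helper, not a transformers function.

```python
def effective_batch_size(per_device_batch_size, gradient_accumulation_steps, num_devices=1):
    """Number of examples that contribute to one optimizer update."""
    return per_device_batch_size * gradient_accumulation_steps * num_devices


# Values from the training arguments above: batch size 4, accumulation steps 4.
examples_per_update = effective_batch_size(4, 4)
print(examples_per_update)
```

Lowering `per_device_train_batch_size` while raising `gradient_accumulation_steps` keeps this product fixed and reduces peak memory, at the cost of more forward and backward passes per update.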

8. Training the model

# configure trainer
trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["test"],
    args=training_args,
    data_collator=data_collator
)


# train model
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

# re-enable the cache for inference
model.config.use_cache = True

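Both the training loss and the validation loss decrease. This run covered only 30 training steps; with more compute available, you can train for longer. The 10 epochs took about 10 minutes.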

9. Save the model to the Hugging Face Hub and use the saved model

from huggingface_hub import notebook_login
notebook_login()

hf_name = 'Xiaodong1' # your hf username or org name
model_id = hf_name + "/" + "Tika-ft"

# upload the fine-tuned LoRA adapter to the Hub under model_id
model.push_to_hub(model_id)

# Load the fine-tuned model
# load model from hub
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
# "Shawhin/shawgpt-ft" is the adapter from the referenced tutorial; pass model_id instead to load your own upload
config = PeftConfig.from_pretrained("Shawhin/shawgpt-ft")
model = PeftModel.from_pretrained(model, "Shawhin/shawgpt-ft")

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

# Usage:
comment = "What is fat-tailedness?"
prompt = prompt_template(comment)

model.eval()
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)
print(tokenizer.batch_decode(outputs)[0])

Saving the model to the Hugging Face Hub makes it easy to load and use later.

For more on QLoRA and RAG, see this other article:
Fine-Tuning Large Language Models (LLMs) | by Shawhin Talebi | Towards Data Science (towardsdatascience.com)

(It includes a detailed video walkthrough and written documentation, and is an excellent resource.)

The content of this article is based on: GitHub: https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/qlora