LLM Fine-Tuning Notes: QLoRA and RAG Text-Generation Practice with Hugging Face (an AI comment-reply task)

This post walks through, on Colab's free GPU, loading a large pretrained model (TheBloke/Mistral-7B-Instruct-v0.2-GPTQ), tokenization, prompt setup, quantized training (QLoRA), lightweight fine-tuning (PEFT), and the overall fine-tuning workflow. It also covers installing the required packages and best practices for saving and loading the model.

Training on Colab's free GPU

1. Install the required libraries

!pip install auto-gptq     # for loading GPTQ-quantized models
!pip install optimum       # Hugging Face's optimization toolkit (GPTQ inference backend)
!pip install bitsandbytes  # 8-bit/4-bit quantization support
!pip install torch==2.1
!pip install peft datasets # likely also needed on Colab for the imports below

#### load transformers, the lightweight fine-tuning package peft, datasets, etc.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from peft import prepare_model_for_kbit_training
from peft import LoraConfig, get_peft_model
from datasets import load_dataset
import transformers

2. Load a pretrained model from Hugging Face (which hosts a large number of open-source NLP models)

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",        # automatically figures out how to best use CPU + GPU for loading the model
    trust_remote_code=False,  # prevents running custom model files on your machine
    revision="main",          # which version of the model to use from the repo
)

The pretrained model is quite large, and depending on your region you may need a proxy to download it.

3. Load the tokenizer and test the base pretrained model

tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)
model.eval() # evaluation mode (dropout layers are disabled)
# the comment we want the model to reply to
comment = "你写的内容对我很有帮助,谢谢!"
prompt=f'''[INST] {comment} [/INST]'''
# tokenize input

inputs = tokenizer(prompt, return_tensors="pt")

# generate output (length is capped by max_new_tokens)
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=140)

print(tokenizer.batch_decode(outputs)[0])
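The decoded string includes the echoed prompt plus special tokens such as <s>; if you want cleaner output, skip_special_tokens helps (the [INST] markers are ordinary text and will remain):

# same decode, but dropping the <s>/</s> special tokens
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])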

The tokenizer is much smaller than the pretrained model itself.

4. Prompt setup

##### The prompt can be written in Chinese or English, but English is likely faster and may work better.
instructions_string = f"""
TikaGPT,functioning as a virtual data science consultant on Social media, communicates in clear, accessible language, escalating to technical depth upon request. \
It reacts to feedback aptly and ends responses with its signature '–TikaGPT'. \
TikaGPT will tailor the length of its responses to match the viewer's comment, providing concise acknowledgments to brief expressions of gratitude or feedback, \
thus keeping the interaction natural and engaging.
Please translate all the final replies into appropriate Chinese
And please respond to the following comment.
"""
### a lambda to splice the instruction string and the comment into one prompt
prompt_template = lambda comment: f'''[INST] {instructions_string} \n{comment}\n[/INST]'''
prompt = prompt_template(comment)
print(prompt)

## evaluate
# tokenize input
inputs = tokenizer(prompt, return_tensors="pt")

# generate output
outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=140)

print(tokenizer.batch_decode(outputs)[0])

The prompt is fully customizable; feel free to DIY your own for testing and training.

5. Prepare the model for training: quantization to reduce the training burden (QLoRA)

model.train() # model in training mode (dropout modules are activated)

# enable gradient checkpointing (saves GPU memory by recomputing activations during backprop)
model.gradient_checkpointing_enable()

# prepare the quantized model for k-bit (QLoRA) training
model = prepare_model_for_kbit_training(model)

# LoRA hyperparameter configuration
config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)

# wrap the model with LoRA adapters for retraining (PEFT, lightweight)

model = get_peft_model(model, config)

# print the number of trainable parameters
model.print_trainable_parameters()

As expected, the PEFT model has far fewer trainable parameters than the original model.
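As a rough sanity check on that number (assuming Mistral-7B's hidden size of 4096 and 32 decoder layers, so q_proj is a 4096×4096 projection), LoRA with r=8 on q_proj alone adds only about 2M trainable parameters:

# back-of-the-envelope LoRA parameter count (illustrative, not read from the model)
r = 8
d_in, d_out = 4096, 4096   # q_proj shape in Mistral-7B
n_layers = 32              # decoder layers in Mistral-7B
lora_params = n_layers * r * (d_in + d_out)   # A is r x d_in, B is d_out x r per layer
print(f"{lora_params:,}")  # 2,097,152 -- roughly 0.03% of the 7B base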

6. Load and preprocess the data (this part is fairly standard)

# load dataset
data = load_dataset("shawhin/shawgpt-youtube-comments")

# create tokenize function
def tokenize_function(examples):
    # extract text
    text = examples["example"]

    #tokenize and truncate text
    tokenizer.truncation_side = "left"
    tokenized_inputs = tokenizer(
        text,
        return_tensors="np",
        truncation=True,
        max_length=512
    )

    return tokenized_inputs

# tokenize training and validation datasets
tokenized_data = data.map(tokenize_function, batched=True)

# setting pad token
tokenizer.pad_token = tokenizer.eos_token

# data collator
data_collator = transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False)

The dataset used for retraining is not particularly large.
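Before tokenizing, it can help to peek at a raw example; judging from tokenize_function above, each row's "example" field should hold the full [INST]-formatted prompt-and-reply string:

# inspect the splits and one raw training example
print(data)                        # shows the train/test split sizes
print(data["train"][0]["example"])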

7. Fine-tune the model

# hyperparameter settings
lr = 2e-4
batch_size = 4
num_epochs = 10

# define training arguments
training_args = transformers.TrainingArguments(
    output_dir= "Tika-ft",
    learning_rate=lr,
    per_device_train_batch_size=batch_size,
    per_device_eval_batch_size=batch_size,
    num_train_epochs=num_epochs,
    weight_decay=0.01,
    logging_strategy="epoch",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    load_best_model_at_end=True,
    gradient_accumulation_steps=4,
    warmup_steps=2,
    fp16=True,
    optim="paged_adamw_8bit",
)

Any of these can be tuned against your own training results. Note the effective batch size is per_device_train_batch_size × gradient_accumulation_steps = 4 × 4 = 16.

8. Retrain the model

# configure trainer
trainer = transformers.Trainer(
    model=model,
    train_dataset=tokenized_data["train"],
    eval_dataset=tokenized_data["test"],
    args=training_args,
    data_collator=data_collator
)


# train model
model.config.use_cache = False  # silence the warnings. Please re-enable for inference!
trainer.train()

# re-enable cache (and the warnings) for inference
model.config.use_cache = True

Both the training loss and the validation loss are decreasing. We trained for only 10 epochs, which took about 10 minutes; with more compute you could train for more.
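To read the per-epoch losses programmatically instead of scrolling Colab's output, the Trainer keeps a log history (a minimal sketch):

# print the training/eval loss entries recorded at each epoch
for entry in trainer.state.log_history:
    if "loss" in entry or "eval_loss" in entry:
        print(entry)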

9. Save the model to Hugging Face and use the saved model in practice

from huggingface_hub import notebook_login
notebook_login()

hf_name = 'Xiaodong1' # your hf username or org name
model_id = hf_name + "/" + "Tika-ft"
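The upload step itself isn't shown above; a minimal sketch using the model_id just defined (this assumes the notebook_login() call succeeded):

# push the trained LoRA adapter (and tokenizer) to the Hub under your account
model.push_to_hub(model_id)
tokenizer.push_to_hub(model_id)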

## load the fine-tuned model
# load model from hub
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM

model_name = "TheBloke/Mistral-7B-Instruct-v0.2-GPTQ"
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             device_map="auto",
                                             trust_remote_code=False,
                                             revision="main")
config = PeftConfig.from_pretrained(model_id)  # the adapter pushed above, e.g. "Xiaodong1/Tika-ft"
model = PeftModel.from_pretrained(model, model_id)

# load tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_name, use_fast=True)

## usage:
comment = "What is fat-tailedness?"
prompt = prompt_template(comment)

model.eval()
inputs = tokenizer(prompt, return_tensors="pt")

outputs = model.generate(input_ids=inputs["input_ids"].to("cuda"), max_new_tokens=280)
print(tokenizer.batch_decode(outputs)[0])

Saving the model to Hugging Face makes it convenient to load and use later.

For more on QLoRA and RAG, see this companion article:
Fine-Tuning Large Language Models (LLMs) | by Shawhin Talebi | Towards Data Science (towardsdatascience.com)

(It includes detailed video walkthroughs and written documentation; a great resource.)

The content of this post is based on GitHub: https://github.com/ShawhinT/YouTube-Blog/tree/main/LLMs/qlora
