PEFT Model Fine-Tuning: Prefix-Tuning

Prefix-Tuning is a parameter-efficient fine-tuning method for pretrained language models (such as the GPT series or BERT), proposed by Li and Liang at Stanford in 2021. Unlike conventional fine-tuning, Prefix-Tuning does not update all of the model's parameters; instead, it prepends a sequence of learnable "prefix" vectors to the input of every Transformer layer and optimizes only these prefix vectors to adapt the model to a specific task.

In practice, each downstream task only requires training its own set of prefix vectors, while the original model parameters stay untouched. This approach retains the strong language understanding of the pretrained model, helps mitigate overfitting, reduces the amount of task data needed, and improves generalization and multi-task efficiency: one frozen base model can serve many tasks, each carrying only its own small prefix.
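To make the mechanism concrete, below is a minimal PyTorch sketch of the core idea; it is illustrative only, not the peft internals. The class name PrefixSketch is hypothetical, and the prefix_projection option mirrors the flag of the same name used later in PrefixTuningConfig. In the real implementation, the projected prefix is reshaped into per-layer key/value states (past_key_values) rather than returned as a single tensor.

import torch
import torch.nn as nn

class PrefixSketch(nn.Module):
    # num_virtual_tokens learnable embeddings, optionally re-parameterized
    # through a small MLP, which the original paper found stabilizes training
    def __init__(self, num_virtual_tokens: int, hidden_size: int, prefix_projection: bool = True):
        super().__init__()
        self.embedding = nn.Embedding(num_virtual_tokens, hidden_size)
        self.projection = (
            nn.Sequential(
                nn.Linear(hidden_size, hidden_size),
                nn.Tanh(),
                nn.Linear(hidden_size, hidden_size),
            )
            if prefix_projection
            else nn.Identity()
        )

    def forward(self, batch_size: int) -> torch.Tensor:
        ids = torch.arange(self.embedding.num_embeddings)
        prefix = self.projection(self.embedding(ids))  # (num_virtual_tokens, hidden_size)
        # one copy of the prefix per batch element, conceptually prepended ahead of the real tokens
        return prefix.unsqueeze(0).expand(batch_size, -1, -1)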

The fine-tuning procedure consists of two main steps:

1. Add a learnable prefix vector sequence to each Transformer layer, then train on the downstream task's dataset. The optimization objective is for the model's output, conditioned on the prefix vectors, to better match the task's ground-truth labels.

2. After training, save the optimal prefix vectors. At inference time, insert them at the front of the corresponding layers' input sequences and feed them into the pretrained model together with the original input text to obtain task-specific predictions (a save/reload sketch follows the training code below).

The following walks through a simple workflow for fine-tuning a model with prefix-tuning via the peft library.

# Fine-tune a generative dialogue model with Prefix-Tuning via peft
from datasets import Dataset
from transformers import AutoTokenizer, AutoModelForCausalLM, DataCollatorForSeq2Seq, TrainingArguments, Trainer
# Load the dataset
ds = Dataset.load_from_disk("../alpaca_data_zh")
print(ds[:3])
# Load the tokenizer used for preprocessing
tokenizer = AutoTokenizer.from_pretrained("../models/bloom-1b4-zh")
# Preprocessing function: build the prompt and mask prompt tokens in the labels
def process_func(example):
    MAX_LENGTH = 256
    # Tokenize the prompt ("Human: ...\n\nAssistant: ") and the response separately
    instruction = tokenizer("\n".join(["Human: " + example["instruction"], example["input"]]).strip() + "\n\nAssistant: ")
    response = tokenizer(example["output"] + tokenizer.eos_token)
    input_ids = instruction["input_ids"] + response["input_ids"]
    attention_mask = instruction["attention_mask"] + response["attention_mask"]
    # Mask the prompt tokens with -100 so the loss is computed only on the response
    labels = [-100] * len(instruction["input_ids"]) + response["input_ids"]
    # Truncate to the maximum sequence length
    if len(input_ids) > MAX_LENGTH:
        input_ids = input_ids[:MAX_LENGTH]
        attention_mask = attention_mask[:MAX_LENGTH]
        labels = labels[:MAX_LENGTH]
    return {
        "input_ids": input_ids,
        "attention_mask": attention_mask,
        "labels": labels
    }
# Apply the preprocessing to every example and drop the raw text columns
tokenized_ds = ds.map(process_func, remove_columns=ds.column_names)
print(tokenized_ds)
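# Optional sanity check (illustrative): decode one processed sample to confirm
# the prompt template and that prompt tokens are masked out of the labels
print(tokenizer.decode(tokenized_ds[0]["input_ids"]))
print(tokenizer.decode([t for t in tokenized_ds[0]["labels"] if t != -100]))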
# Load the frozen base model
model = AutoModelForCausalLM.from_pretrained("../models/bloom-1b4-zh", low_cpu_mem_usage=True)
from peft import PrefixTuningConfig, get_peft_model, TaskType
# Prefix-tuning configuration: 10 virtual prefix tokens per layer;
# prefix_projection=True re-parameterizes the prefix through an MLP, as in the original paper
config = PrefixTuningConfig(task_type=TaskType.CAUSAL_LM,
                            num_virtual_tokens=10,
                            prefix_projection=True)
# Wrap the base model so that only the prefix parameters are trainable
model = get_peft_model(model, config)

# Inspect how many parameters are actually trainable (only the prefix, a small fraction of the full model)
model.print_trainable_parameters()
# Training arguments
args = TrainingArguments(
    output_dir="./chatbot",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    logging_steps=10,
    num_train_epochs=1
)
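# Effective batch size = per_device_train_batch_size * gradient_accumulation_steps = 1 * 8 = 8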

# Build the trainer; DataCollatorForSeq2Seq dynamically pads the inputs and pads labels with -100
trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized_ds,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer, padding=True),
)
# Train the prefix
trainer.train()
# Inference: generate a reply with the trained prefix attached
model = model.cuda()
# The test prompt asks: "How should I spend a weekend in Chongqing?"
ipt = tokenizer("Human: {}\n{}".format("周末去重庆怎么玩?", "").strip() + "\n\nAssistant: ", return_tensors="pt").to(model.device)
res = tokenizer.decode(model.generate(**ipt, max_length=128, do_sample=True)[0], skip_special_tokens=True)
print(res)
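Finally, step 2 of the procedure above (persisting the learned prefix and reattaching it for inference) can be sketched as follows; the adapter path ./chatbot/prefix_adapter is a hypothetical choice, and the base checkpoint is the same one loaded earlier.

# Save only the trained prefix weights (a few MB), not the full base model
model.save_pretrained("./chatbot/prefix_adapter")

# Reload later: attach the saved prefix to a freshly loaded, frozen base model
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained("../models/bloom-1b4-zh", low_cpu_mem_usage=True)
inference_model = PeftModel.from_pretrained(base, "./chatbot/prefix_adapter")

This mirrors the deployment story of Prefix-Tuning: the large base model is shared across tasks, while each task ships only its own small prefix.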