(21-7-02) Intelligent Text Summarization System Based on the Gemma 2B Model: Fine-Tuning (2)

Latest recommended article published on 2024-07-05 14:37:41

码农三叔

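Columns: "NLP Algorithms in Practice", "Large Models: From Beginner to Practice"
Tags: deep learning, artificial intelligence, natural language processing, language models, NLP, langchain

Article link: https://blog.csdn.net/asd343442/article/details/139236344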

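(5) Use the Hugging Face libraries Transformers and TRL (Transformer Reinforcement Learning) to configure and run the fine-tuning job. The steps are as follows:

Create an SFTTrainer instance: SFTTrainer is the training class used to fine-tune the specified model.
Specify the model: the model parameter sets the model to fine-tune, which was prepared earlier with the parameter-efficient fine-tuning and quantization configuration.
Training dataset: the train_dataset parameter specifies the training data, here train_data, which holds the formatted training samples.
Maximum sequence length: max_seq_length=512 sets the maximum number of input tokens; longer sequences are truncated.
Training arguments: transformers.TrainingArguments sets the parameters of the training run.
PEFT configuration: peft_config=lora_config applies the LoRA configuration defined earlier, which is the parameter-efficient part of the fine-tuning.
Formatting function: formatting_func=formatting_prompts_func specifies how the training data is formatted, that is, the function that converts raw text into the format the model expects.
Run training: trainer.train() starts the training process.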

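import transformers
from trl import SFTTrainer

# model, train_data, lora_config and formatting_prompts_func are defined in the earlier steps.
trainer = SFTTrainer(
    model=model,
    train_dataset=train_data,
    max_seq_length=512,  # truncate inputs longer than 512 tokens
    args=transformers.TrainingArguments(
        per_device_train_batch_size=1,
        gradient_accumulation_steps=4,  # effective batch size of 4 per device
        warmup_steps=2,
        max_steps=12,  # very short demonstration run
        learning_rate=2e-4,
        fp16=True,
        logging_steps=1,
        report_to='none',
        output_dir='logs',
        optim="paged_adamw_8bit"
    ),
    peft_config=lora_config,
    formatting_func=formatting_prompts_func,
)
trainer.train()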
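The formatting_func passed to SFTTrainer is defined in the previous part of this series and is not shown here. As a rough, hypothetical sketch of what such a function can look like, the example below turns a batch of samples into Gemma-style chat strings. The function name formatting_prompts_func_sketch and the field names document and summary are assumptions made for illustration; the real dataset columns may differ.

```python
def formatting_prompts_func_sketch(examples):
    """Convert a batch (dict of equal-length lists) into training strings.

    Hypothetical stand-in for formatting_prompts_func; the column names
    "document" and "summary" are assumed, not taken from the original post.
    """
    texts = []
    for document, summary in zip(examples["document"], examples["summary"]):
        texts.append(
            "<start_of_turn>user\n"
            "Write a short summary of 2-3 sentences of the following text:\n\n"
            f"{document}<end_of_turn>\n"
            "<start_of_turn>model\n"
            f"{summary}<end_of_turn>"
        )
    return texts


# Tiny made-up batch to show the resulting string.
batch = {"document": ["Long article text ..."], "summary": ["Short summary."]}
formatted = formatting_prompts_func_sketch(batch)
print(formatted[0])
```

Each returned string contains both the user request and the expected model answer, so the trainer sees the complete turn it should learn to reproduce.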

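The goal of this training run is to fine-tune a pretrained language model so that it is better adapted to the summarization task. The transformers.TrainingArguments object sets the following parameters:

per_device_train_batch_size=1: the training batch size on each device.
gradient_accumulation_steps=4: the number of steps over which gradients are accumulated before an update, which simulates a larger batch (here an effective batch size of 4 per device).
warmup_steps=2: the number of warmup steps at the start of training, during which the learning rate is increased gradually.
max_steps=12: the maximum number of training steps, which caps the total training time.
learning_rate=2e-4: the peak learning rate.
fp16=True: use FP16 mixed-precision training to save memory and speed up training.
logging_steps=1: write a log entry every step.
report_to='none': do not report training progress to any external service.
output_dir='logs': the output directory for training logs and model checkpoints.
optim="paged_adamw_8bit": the optimizer, here the paged 8-bit AdamW variant, which keeps optimizer state in 8 bits and is a common choice when training quantized models on limited GPU memory.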
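To make the interaction of warmup_steps, max_steps, and learning_rate concrete, here is a small sketch of the learning rate at each step. It assumes the default linear scheduler of TrainingArguments (the code above does not set lr_scheduler_type explicitly); linear_schedule_lr is a name invented for this illustration.

```python
def linear_schedule_lr(step, base_lr=2e-4, warmup_steps=2, max_steps=12):
    """Learning rate after `step` optimizer updates.

    Linear warmup from 0 to base_lr, then linear decay back to 0.
    Assumes the default "linear" scheduler; other schedulers differ.
    """
    if step < warmup_steps:
        factor = step / max(1, warmup_steps)
    else:
        factor = max(0.0, (max_steps - step) / max(1, max_steps - warmup_steps))
    return base_lr * factor


# Samples per optimizer update on one device: batch size times accumulation steps.
effective_batch_size = 1 * 4

schedule = [linear_schedule_lr(step) for step in range(13)]
for step, lr in enumerate(schedule):
    print(f"step {step:2d}: lr = {lr:.2e}")
```

With only 12 steps, the peak learning rate is reached after the second update and then decays for the rest of the run.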

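Combining parameter-efficient fine-tuning with quantization makes training feasible in resource-constrained environments while largely preserving model quality. This approach is well suited to cases where you want to improve a model's performance on a specific dataset.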
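A quick calculation shows why LoRA counts as parameter-efficient. The sketch below compares the number of weights in one full projection matrix with the number of trainable weights in its low-rank adapter. The dimensions (2048 x 2048) and the rank (8) are illustrative assumptions, not the values from the lora_config defined earlier in this series.

```python
def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora): weights in a d_out x d_in matrix versus
    weights in its LoRA factors A (rank x d_in) and B (d_out x rank)."""
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


# Illustrative sizes only; the real layer shapes and rank may differ.
full, lora = lora_param_counts(d_in=2048, d_out=2048, rank=8)
ratio = lora / full
print(f"full matrix: {full:,} weights")
print(f"LoRA adapter: {lora:,} trainable weights ({ratio:.2%} of the full matrix)")
```

Only the adapter weights receive gradients, which is what keeps optimizer state and memory use small.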

Running the training code produces output similar to the following:

Map:   0%|          | 0/13368 [00:00<?, ? examples/s]
 [12/12 00:49, Epoch 0/1]
Step	Training Loss
1	3.514400
2	3.831700
3	3.750300
4	3.710400
5	3.267800
6	3.316900
7	3.217000
8	3.276300
9	2.719500
10	2.890400
11	2.966300
12	2.616700
TrainOutput(global_step=12, training_loss=3.256467600663503, metrics={'train_runtime': 54.5748, 'train_samples_per_second': 0.88, 'train_steps_per_second': 0.22, 'total_flos': 286748770713600.0, 'train_loss': 3.256467600663503, 'epoch': 0.0})
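The numbers in this log can be cross-checked with a few lines of arithmetic. The sketch below assumes a single training device and takes 13368 (from the Map progress bar) as the size of the training set; both are assumptions read off the output rather than stated in the text.

```python
# Per-step losses copied from the training log above.
losses = [3.5144, 3.8317, 3.7503, 3.7104, 3.2678, 3.3169,
          3.2170, 3.2763, 2.7195, 2.8904, 2.9663, 2.6167]

per_device_batch_size = 1
gradient_accumulation_steps = 4
steps = 12
train_runtime = 54.5748   # seconds, from TrainOutput
dataset_size = 13368      # assumed: size shown by the Map progress bar

# Samples consumed over the whole run (single device assumed).
samples_seen = per_device_batch_size * gradient_accumulation_steps * steps

mean_loss = sum(losses) / len(losses)
samples_per_second = samples_seen / train_runtime
steps_per_second = steps / train_runtime
epoch_fraction = samples_seen / dataset_size

print(f"samples seen:       {samples_seen}")
print(f"mean training loss: {mean_loss:.4f}")
print(f"samples per second: {samples_per_second:.2f}")
print(f"steps per second:   {steps_per_second:.2f}")
print(f"fraction of epoch:  {epoch_fraction:.4f}")
```

Under these assumptions the run sees only a small fraction of the data, which is why the log reports 'epoch': 0.0. A run this short demonstrates the workflow more than it trains the model.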

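(6) Save the fine-tuned model to a target directory. Because the model was trained through a LoRA configuration, this call writes the adapter weights rather than a full copy of the base model. Once saved, the model can be reloaded for inference or further training without starting from scratch, which is especially useful when you need to deploy it or share it across environments.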

trainer.model.save_pretrained('lora_adapter')

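(7) The following code frees the memory used during training and inference. First, release_memory releases the memory associated with trainer, model, and tokenizer, which may still hold a large amount of memory after training or inference has finished. Next, gc.collect() invokes Python's garbage collector to clean up objects that are no longer in use. Finally, torch.cuda.empty_cache() clears PyTorch's cache on the CUDA device (the GPU), making room for subsequent work.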

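import gc
import torch
from accelerate.utils import release_memory  # assumed source of release_memory

trainer, model, tokenizer = release_memory(trainer, model, tokenizer)
gc.collect()
torch.cuda.empty_cache()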

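Together these calls reduce memory usage, help avoid memory leaks, and keep system resources under control, which matters most when resources are limited or several training jobs need to run.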
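The reason the result of release_memory is assigned back to the same variables can be shown with the standard library alone. The sketch below is not the library's implementation; release_references and FakeModel are hypothetical stand-ins that mimic the pattern of replacing each reference with None so the object can be freed.

```python
import gc
import weakref


class FakeModel:
    """Stand-in for a large object such as a model or trainer."""

    def __init__(self):
        self.weights = [0.0] * 1000


def release_references(*objects):
    """Return one None per input so the caller can overwrite its variables."""
    return [None for _ in objects]


model_obj = FakeModel()
trainer_obj = FakeModel()

# Weak references let us observe whether the objects are still alive.
model_ref = weakref.ref(model_obj)
trainer_ref = weakref.ref(trainer_obj)
alive_before = model_ref() is not None and trainer_ref() is not None

# Overwriting the variables drops the last strong references.
model_obj, trainer_obj = release_references(model_obj, trainer_obj)
gc.collect()
alive_after = model_ref() is not None or trainer_ref() is not None

print("alive before release:", alive_before)
print("alive after release: ", alive_after)
```

If the return value were ignored, the original variables would still point at the objects and nothing could be reclaimed.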

(8) Load the base pretrained model, apply the saved parameter-efficient fine-tuning (PEFT) adapter, and save the merged model.

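import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_model_name = "/input/gemma/transformers/2b-it/3"
adapter_model_name = "/working/lora_adapter"

# Load the base model in half precision, then attach the saved LoRA adapter.
model = AutoModelForCausalLM.from_pretrained(base_model_name, device_map='auto', torch_dtype=torch.float16)
model = PeftModel.from_pretrained(model, adapter_model_name, device_map='auto', torch_dtype=torch.float16)

# Fold the adapter weights into the base weights and save a standalone model.
model = model.merge_and_unload()
model.save_pretrained('final_model')

# The tokenizer is loaded from the base model.
tokenizer = AutoTokenizer.from_pretrained(base_model_name)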

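These steps yield a model fine-tuned for the task and ready for deployment or further training. The merged model saved in final_model and the tokenizer loaded from the base model can be used together to generate text, run inference, or continue training.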
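Merging an adapter amounts to adding the scaled low-rank update to the base weight, W_merged = W + (lora_alpha / r) * B @ A in standard LoRA. The following plain-Python sketch checks on a tiny example that the merged matrix gives the same output as the base weight plus the adapter path. All matrices and the values of lora_alpha and rank are made up for illustration.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def merge_lora(weight, lora_a, lora_b, lora_alpha, rank):
    """Return W + (lora_alpha / rank) * B @ A."""
    scale = lora_alpha / rank
    delta = matmul(lora_b, lora_a)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


# Toy values: a 2x2 base weight with a rank-1 adapter.
W = [[1.0, 0.0], [0.0, 1.0]]   # base weight (out x in)
A = [[1.0, 2.0]]               # rank x in
B = [[0.5], [-1.0]]            # out x rank
lora_alpha, rank = 2, 1
x = [[3.0], [4.0]]             # input column vector

merged = merge_lora(W, A, B, lora_alpha, rank)
merged_out = matmul(merged, x)

# Unmerged path: base output plus the scaled adapter output.
base_out = matmul(W, x)
adapter_out = matmul(B, matmul(A, x))
unmerged_out = [[b[0] + (lora_alpha / rank) * a[0]]
                for b, a in zip(base_out, adapter_out)]

print("merged weight:  ", merged)
print("merged output:  ", merged_out)
print("unmerged output:", unmerged_out)
```

Because the two paths agree, the merged model needs no PEFT wrapper at inference time.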

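(9) Load the fine-tuned Gemma 2B model and create a Hugging Face text-generation pipeline. Through this pipeline, the fine-tuned model can perform text generation tasks such as summarizing articles, continuing text, or producing replies.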

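import torch
from transformers import pipeline

# Path of the merged model saved in the previous step.
model = "/working/final_model"

pipe_finetuned = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    model_kwargs={"torch_dtype": torch.float16},
    device_map='auto',
    max_new_tokens=512  # upper bound on the length of the generated text
)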

(10) Use the fine-tuned text-generation pipeline pipe_finetuned created above to generate text, and display the result as Markdown.

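from IPython.display import Markdown, display

outputs = pipe_finetuned(
    prompt,
    do_sample=True,
    temperature=0.1,
    top_k=20,
    top_p=0.3,
    add_special_tokens=True
)

# Drop the prompt from the returned text and strip '#' so it is not rendered as a heading.
display(Markdown(outputs[0]["generated_text"][len(prompt):].replace('#', '')))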

With this code, the summary generated by the model is rendered as formatted Markdown in Jupyter Notebook or any other environment that supports Markdown rendering.
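The sampling settings temperature=0.1, top_k=20, and top_p=0.3 make generation close to greedy decoding. The simplified sketch below shows why, by applying the three filters in sequence to a toy list of logits. It is an approximation written for illustration, not the library's implementation; filtered_distribution and the logits are invented for the example.

```python
import math


def filtered_distribution(logits, temperature=0.1, top_k=20, top_p=0.3):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    # Temperature scaling followed by a numerically stable softmax.
    scaled = [value / temperature for value in logits]
    peak = max(scaled)
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    probs = [value / total for value in exps]

    # Top-k: keep the k most likely token ids and renormalize.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    k_total = sum(probs[i] for i in ranked)

    # Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i] / k_total
        if cumulative >= top_p:
            break

    kept_total = sum(probs[i] for i in kept)
    return {i: probs[i] / kept_total for i in kept}


toy_logits = [2.0, 1.5, 0.5, -1.0]
sharp = filtered_distribution(toy_logits)                       # settings used above
loose = filtered_distribution(toy_logits, temperature=1.0, top_p=0.9)
print("temperature=0.1, top_p=0.3:", sharp)
print("temperature=1.0, top_p=0.9:", loose)
```

With the settings used in this step, only the single most likely token survives the filters in this toy case, so repeated runs tend to produce nearly identical summaries.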

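(11) Evaluate how the fine-tuned model performs against the base model on the target task (writing a short summary). This gives a direct view of whether fine-tuning improved the model and helps decide whether the fine-tuned model should be deployed. The code below first generates a summary with the base model pipeline.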

messages = [
    {
        "role": "user",
        "content": "Write a short summary of 2-3 sentences of the following text:\n\n{}".format(writeup)
    }
]

prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)

# Using the previous pipeline with the previous model
outputs = pipe(
    prompt,
    do_sample=True,
    temperature=0.1,
    top_k=20,
    top_p=0.3,
    add_special_tokens=True
)

display(Markdown(outputs[0]["generated_text"][len(prompt):].replace('#', '')))
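To show what apply_chat_template produces and why the output is sliced with len(prompt), here is a plain-Python approximation of the Gemma chat format. gemma_chat_prompt is a hypothetical helper that mimics the turn markers; the tokenizer's own template remains the source of truth. The sketch also assumes the pipeline returns the prompt followed by the completion.

```python
def gemma_chat_prompt(messages, add_generation_prompt=True):
    """Approximate the Gemma chat template for a list of role/content messages."""
    parts = ["<bos>"]
    for message in messages:
        # Gemma names the assistant role "model".
        role = "model" if message["role"] == "assistant" else message["role"]
        parts.append(f"<start_of_turn>{role}\n{message['content'].strip()}<end_of_turn>\n")
    if add_generation_prompt:
        # Open the model's turn so generation starts with the answer.
        parts.append("<start_of_turn>model\n")
    return "".join(parts)


sketch_messages = [
    {
        "role": "user",
        "content": "Write a short summary of 2-3 sentences of the following text:\n\nSome text.",
    }
]
sketch_prompt = gemma_chat_prompt(sketch_messages)

# Assumed pipeline behaviour: the returned text is the prompt plus the completion.
generated_text = sketch_prompt + "A short summary."
completion = generated_text[len(sketch_prompt):]

print(repr(sketch_prompt))
print(repr(completion))
```

Slicing by the prompt length leaves only the newly generated summary for display.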

(12) The base model returns the following summary, which describes a research paper on lip and pose recognition.

The paper proposes an ensemble of EfficientNet-B0, BERT, and DeBERTa models for lip and pose recognition.
The ensemble is trained on a single fold with a random split and then converted to TFlite format.
The EfficientNet-B0 model achieves a leaderboard score of 0.8, while the BERT and DeBERTa models achieve scores of 0.81 and 0.80, respectively.
The ensemble outperforms the individual models, with the EfficientNet-B0 model achieving the highest accuracy.
The paper also provides insights into the differences between different depthwise convolution implementations in TFlite.

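The summary above outlines the paper's main contributions: the proposed ensemble, the performance of the models, and the analysis of depthwise convolution implementations. It lets a reader grasp the core content and results quickly. The code below uses the fine-tuned pipeline pipe_finetuned to summarize the same text and displays the result as Markdown.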

outputs = pipe_finetuned(
    prompt,
    do_sample=True,
    temperature=0.1,
    top_k=20,
    top_p=0.3,
    add_special_tokens=True
)

display(Markdown(outputs[0]["generated_text"][len(prompt):].replace('#', '')))

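Running it produces the summary below. Comparing the summaries from the base model and the fine-tuned model shows whether fine-tuning improved performance on the summarization task, which helps determine whether the fine-tuned model is worth deploying for it.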

The text describes the training of an EfficientNet-B0 model and a BERT model on a dataset of lip and pose data. The EfficientNet-B0 model achieved a leaderboard score of 0.8, while the BERT model achieved a score of 0.81. The ensemble of these models achieved a score of 0.82.


The text also provides details about the data preprocessing, augmentation, and training process. It also discusses the differences between the EfficientNet-B0 and BERT models, and the reasons why the ensemble of these models achieved a higher score than either model alone.
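Reading the two summaries side by side is a qualitative comparison. If a reference summary were available, a simple word-overlap score could give a rough quantitative signal. The sketch below computes a unigram F1 score in the spirit of ROUGE-1. It is a simplified illustration rather than a full ROUGE implementation, and the example strings are made up, since this post does not include a reference summary.

```python
import re
from collections import Counter


def unigram_f1(candidate, reference):
    """F1 over overlapping words, with counts clipped to the reference."""
    cand_counts = Counter(re.findall(r"\w+", candidate.lower()))
    ref_counts = Counter(re.findall(r"\w+", reference.lower()))
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


# Made-up strings; replace them with real model outputs and a reference summary.
reference_summary = "the cat sat on the mat"
score_partial = unigram_f1("the cat sat", reference_summary)
score_exact = unigram_f1(reference_summary, reference_summary)
score_none = unigram_f1("dog", reference_summary)

print(f"partial match: {score_partial:.3f}")
print(f"exact match:   {score_exact:.3f}")
print(f"no overlap:    {score_none:.3f}")
```

A score like this only measures word overlap, so it complements manual inspection of the summaries rather than replacing it.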

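This project is now complete:

(21-1) Intelligent Text Summarization System Based on the Gemma 2B Model: Background + Project Overview

(21-2) Intelligent Text Summarization System Based on the Gemma 2B Model: System Setup

(21-3-01) Intelligent Text Summarization System Based on the Gemma 2B Model: Chat Module (1) Chat Template + Prompt Engineering

(21-3-02) Intelligent Text Summarization System Based on the Gemma 2B Model: Chat Module (2) Few-Shot Prompting + Key Parameter Summary

(21-5) Intelligent Text Summarization System Based on the Gemma 2B Model: Text Summarization

(21-6-01) Intelligent Text Summarization System Based on the Gemma 2B Model: Experiments (1)

(21-6-02) Intelligent Text Summarization System Based on the Gemma 2B Model: Experiments (2)

(21-6-03) Intelligent Text Summarization System Based on the Gemma 2B Model: Experiments (3)

(21-7-01) Intelligent Text Summarization System Based on the Gemma 2B Model: Fine-Tuning (1)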
