Study Notes on Generative AI with Large Language Models, 2.1.5 LLM Instruction Fine-Tuning: Scaling Instruct Models

Scaling instruct models

This paper introduces FLAN (Fine-tuned LAnguage Net), an instruction fine-tuning method, and presents the results of applying it. The study demonstrates that fine-tuning the 540B PaLM model on 1,836 tasks, while incorporating Chain-of-Thought reasoning data, improves generalization, human usability, and zero-shot reasoning over the base model. The paper also details how each of these aspects was evaluated.
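To make the mechanics of instruction fine-tuning concrete, here is a minimal sketch using the Hugging Face transformers and datasets libraries. This is not the paper's setup: the t5-small checkpoint, the two toy prompt/response pairs, and the hyperparameters are stand-in assumptions, whereas FLAN fine-tuned a 540B PaLM model on a mixture of 1,836 tasks.

```python
# Minimal instruction fine-tuning sketch (illustrative, not the FLAN setup).
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)
from datasets import Dataset

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

# Toy instruction/response pairs; a real run would draw from a large
# multi-task mixture that also includes Chain-of-Thought examples.
examples = [
    {"prompt": "Summarize: The quick brown fox jumps over the lazy dog.",
     "response": "A fox jumps over a dog."},
    {"prompt": "Translate English to French: Good morning.",
     "response": "Bonjour."},
]

def tokenize(batch):
    # Inputs are the instruction prompts; labels are the target responses.
    model_inputs = tokenizer(batch["prompt"], truncation=True, max_length=128)
    labels = tokenizer(text_target=batch["response"], truncation=True,
                       max_length=128)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

dataset = Dataset.from_list(examples).map(
    tokenize, batched=True, remove_columns=["prompt", "response"])

args = Seq2SeqTrainingArguments(
    output_dir="flan-style-finetune",  # hypothetical output directory
    per_device_train_batch_size=2,
    num_train_epochs=1,
    logging_steps=1,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # Pads inputs and labels dynamically per batch.
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```

The structural point the sketch preserves is that every training example is a natural-language instruction paired with a target completion, which is what lets a single model be fine-tuned on many tasks at once.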

Here is the image from the lecture slides illustrating the fine-tuning tasks and datasets used to train FLAN. The task selection expands on previous work by incorporating dialogue and program-synthesis tasks from Muffin and combining them with new Chain-of-Thought reasoning tasks. It also includes subsets of other task collections, such as T0 and Natural Instructions v2. Some tasks were held out during training and later used to evaluate the model's performance on unseen tasks.
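The held-out evaluation protocol can be sketched in code as well. The snippet below assembles a hypothetical multi-task mixture while reserving some tasks for zero-shot evaluation; the collection names, task names, and examples are invented for illustration and do not correspond to the actual FLAN task list.

```python
# Hypothetical task collections; names and contents are invented
# for illustration, not taken from the FLAN paper.
import random

task_collections = {
    "muffin_dialogue": {
        "dialog_task": [("User: Hi\nAssistant:", "Hello! How can I help?")],
    },
    "cot_reasoning": {
        "arithmetic_cot": [("Q: What is 2 + 3? Think step by step.",
                            "2 plus 3 equals 5. The answer is 5.")],
    },
    "t0_subset": {
        "sentiment": [("Review: a great film. Sentiment?", "positive")],
    },
}

# Tasks excluded from the training mixture and used only to measure
# performance on unseen tasks after fine-tuning.
HELD_OUT_TASKS = {"sentiment"}

train_examples, heldout_examples = [], []
for collection in task_collections.values():
    for task_name, pairs in collection.items():
        if task_name in HELD_OUT_TASKS:
            heldout_examples.extend(pairs)   # never seen during training
        else:
            train_examples.extend(pairs)     # part of the fine-tuning mix

random.shuffle(train_examples)
print(f"{len(train_examples)} training pairs, "
      f"{len(heldout_examples)} held-out evaluation pairs")
```

Because the held-out tasks never enter the training mixture, any accuracy the fine-tuned model achieves on them measures generalization to genuinely unseen tasks rather than memorization.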

