[Datawhale LLM Fundamentals] Chapter 10: Adaptation of LLMs

Chapter 10: Adaptation of LLMs

This blog post is based on Datawhale materials and a comprehensive survey.

Following pre-training, LLMs develop general capabilities for addressing a wide range of tasks. However, a growing body of research indicates that their abilities can be further tailored to specific objectives. This post presents two primary methods for adapting pre-trained LLMs: instruction tuning and alignment tuning. The former primarily seeks to enhance or unlock the capabilities of LLMs, while the latter aims to align their behavior with human values or preferences. The post also covers efficient tuning and quantization for model adaptation in resource-constrained environments. This topic is far too broad to cover exhaustively here; for further study, see the survey.

10.1 Instruction Tuning

Essentially, instruction tuning involves fine-tuning pre-trained LLMs using a set of formatted instances in natural language, which is closely related to supervised fine-tuning and multi-task prompted training. To carry out instruction tuning, the first step is to gather or create instances formatted as instructions. These formatted instances are then used to fine-tune LLMs in a supervised learning manner, such as training with sequence-to-sequence loss. Following instruction tuning, LLMs can exhibit enhanced abilities to generalize to unseen tasks, even in multilingual settings.
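
To make this concrete, the following is a minimal sketch of one supervised fine-tuning step on a single instruction-formatted instance, assuming a Hugging Face causal LM. The checkpoint name, the prompt template, and the example texts are illustrative placeholders, not the exact setup of any particular instruction-tuned model.

```python
# A minimal sketch of supervised instruction tuning on one formatted instance.
# The checkpoint and template below are illustrative placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM checkpoint works here
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Format one instance as instruction + (optional) input + response.
instruction = "Summarize the following sentence in five words."
input_text = "Large language models can be adapted to new tasks via instruction tuning."
response = "Instruction tuning adapts large models."

prompt = f"### Instruction:\n{instruction}\n\n### Input:\n{input_text}\n\n### Response:\n"
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
full_ids = tokenizer(prompt + response + tokenizer.eos_token, return_tensors="pt").input_ids

# Sequence-to-sequence style loss: only the response tokens are supervised,
# so the prompt positions are masked out with -100.
labels = full_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100

outputs = model(input_ids=full_ids, labels=labels)
outputs.loss.backward()  # one supervised fine-tuning step (optimizer omitted)
```

In practice the same formatting is applied to the whole instruction dataset, a data collator performs the masking, and a standard optimizer updates the model.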

Instruction tuning covers the following aspects:

  • Formatted Instance Construction

    • Formatting NLP Task Datasets
    • Formatting Daily Chat Data
    • Formatting Synthetic Data
    • Key Factors for Instance Construction
      • Scaling the instructions
      • Formatting design
  • Instruction Tuning Strategies

    • Balancing the Data Distribution
    • Combining Instruction Tuning and Pre-Training
    • Multi-stage Instruction Tuning
    • Other Practical Tricks
      • Efficient training for multi-turn chat data (see the loss-masking sketch after this list)
      • Establishing self-identification for LLM
  • The Effect of Instruction Tuning

    • Performance Improvement
    • Task Generalization
    • Domain Specialization
  • Empirical Analysis for Instruction Tuning

    • Task-specific instructions are better suited for the QA environment but may not be applicable in a chat context
    • Increasing the intricacy and variety of instructions results in enhanced model performance
    • A larger model size results in improved performance in following instructions
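
As an illustration of the efficient multi-turn training trick referenced above, the sketch below feeds one whole conversation to the model and supervises only the assistant turns (user turns are masked with -100), instead of splitting the dialogue into one training example per turn. The "User:"/"Assistant:" template and the gpt2 tokenizer are assumptions made for the example.

```python
# A minimal sketch of building loss masks for multi-turn chat data, assuming a
# simple "User:"/"Assistant:" template and an illustrative tokenizer.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # placeholder tokenizer

conversation = [
    ("user", "What is instruction tuning?"),
    ("assistant", "Fine-tuning an LLM on instruction-formatted data."),
    ("user", "Why is it useful?"),
    ("assistant", "It improves generalization to unseen tasks."),
]

input_ids, labels = [], []
for role, text in conversation:
    prefix = "User: " if role == "user" else "Assistant: "
    ids = tokenizer(prefix + text + "\n").input_ids
    input_ids.extend(ids)
    # Supervise only the assistant turns; user turns are ignored by the loss.
    labels.extend(ids if role == "assistant" else [-100] * len(ids))

assert len(input_ids) == len(labels)
# input_ids/labels can now be fed to a causal LM exactly as in the sketch above.
```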

10.2 Alignment Tuning

This section initially provides an overview of alignment, including its definition and criteria, then delves into the acquisition of human feedback data for aligning LLMs, and ultimately explores the pivotal technique of reinforcement learning from human feedback (RLHF) for alignment tuning.

Alignment tuning covers the following aspects:

  • Alignment Criteria

    • Helpfulness
    • Honesty
    • Harmlessness
  • Collecting Human Feedback

    • Human Labeler Selection
    • Human Feedback Collection
      • Ranking-based approach
      • Question-based approach
      • Rule-based approach
  • Reinforcement Learning from Human Feedback (see the reward-model sketch after this list)
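
As a sketch of the ranking-based feedback listed above, a reward model for RLHF is commonly trained with a pairwise (Bradley-Terry style) loss that scores the human-preferred response above the rejected one. The linear reward head and random features below are toy placeholders for a real LM-based scorer.

```python
# A minimal sketch of the pairwise ranking loss used to train a reward model
# from human preference comparisons. The reward head and features are toys.
import torch
import torch.nn as nn
import torch.nn.functional as F

reward_model = nn.Linear(128, 1)     # placeholder scalar reward head
chosen = torch.randn(4, 128)         # features of the human-preferred responses
rejected = torch.randn(4, 128)       # features of the rejected responses

r_chosen = reward_model(chosen).squeeze(-1)
r_rejected = reward_model(rejected).squeeze(-1)

# Push the reward of the chosen response above that of the rejected one.
loss = -F.logsigmoid(r_chosen - r_rejected).mean()
loss.backward()
```

The trained reward model then provides the reward signal that the RL stage of RLHF (typically PPO) uses to fine-tune the policy LLM.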

10.3 Parameter-Efficient Fine-Tuning

Prior research has focused heavily on parameter-efficient fine-tuning, which seeks to minimize the number of trainable parameters while preserving as much performance as possible. The main approaches include:


  • Adapter Tuning
  • Prefix Tuning
  • Prompt Tuning
  • Low-Rank Adaptation (LoRA), sketched below
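
Among these methods, LoRA is the most widely used in practice. Below is a minimal sketch of the core idea, freezing the pre-trained weight and learning a low-rank update; for real training one would use a library such as Hugging Face PEFT rather than this hand-rolled module.

```python
# A minimal LoRA sketch: y = W x + (alpha / r) * B A x, with W frozen.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():          # freeze the pre-trained weight
            p.requires_grad = False
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))
        self.scaling = alpha / r

    def forward(self, x):
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

layer = LoRALinear(nn.Linear(768, 768), r=8, alpha=16)
out = layer(torch.randn(2, 768))                  # only lora_A and lora_B are trainable
```

Because lora_B is initialized to zero, the adapted layer starts out identical to the frozen base layer, and only the two small low-rank matrices are updated during fine-tuning.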

10.4 Memory-Efficient Model Adaptation

Because of their enormous number of parameters, LLMs have a large memory footprint at inference time, which makes deployment in real-world applications costly. The quantization techniques below reduce this footprint:

  • Post-Training Quantization (PTQ)
    • Mixed-precision decomposition
    • Fine-grained quantization (see the INT8 sketch after this list)
    • Balancing the quantization difficulty
    • Layerwise quantization
  • Other Quantization Methods
    • Efficient fine-tuning enhanced quantization
    • Quantization-aware training (QAT) for LLMs
  • Important Findings from Existing Work
    • INT8 weight quantization frequently produces excellent results for LLMs, whereas the effectiveness of lower-precision weight quantization relies on specific methods
    • Quantizing activations is more challenging than quantizing weights
    • Efficient fine-tuning enhanced quantization is a good option to enhance the performance of quantized LLMs
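
To make the fine-grained (per-channel) weight quantization referenced above concrete, here is a minimal sketch of INT8 absmax post-training quantization of one weight matrix. Real PTQ methods build on this idea with calibration data, outlier handling via mixed-precision decomposition, and layerwise error minimization.

```python
# A minimal sketch of per-channel (fine-grained) INT8 absmax weight quantization.
import torch

def quantize_int8_per_channel(w: torch.Tensor):
    # One scale per output channel (row), so outliers in one row do not
    # hurt the precision of all the other rows.
    scale = w.abs().amax(dim=1, keepdim=True) / 127.0
    q = torch.clamp(torch.round(w / scale), -128, 127).to(torch.int8)
    return q, scale

def dequantize(q: torch.Tensor, scale: torch.Tensor):
    return q.to(torch.float32) * scale

w = torch.randn(4096, 4096)                      # a stand-in weight matrix
q, scale = quantize_int8_per_channel(w)
error = (dequantize(q, scale) - w).abs().mean()
print(f"mean absolute quantization error: {error.item():.6f}")
```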

Finally, I have collected some surveys on this topic; readers interested in this field can read them for further study:

| Domain | Title | Paper URL | Project URL | Release Month |
|---|---|---|---|---|
| Instruction Tuning | Are Prompts All the Story? No. A Comprehensive and Broader View of Instruction Learning | https://arxiv.org/pdf/2303.10475.pdf | https://github.com/RenzeLou/awesome-instruction-learning | 2023.03 |
| Instruction Tuning | Instruction Tuning for Large Language Models: A Survey | https://arxiv.org/pdf/2308.10792.pdf | None | 2023.08 |
| Instruction Tuning | Vision-Language Instruction Tuning: A Review and Analysis | https://arxiv.org/pdf/2311.08172.pdf | https://github.com/palchenli/VL-Instruction-Tuning | 2023.11 |
| Human Alignment for LLM | Aligning Large Language Models with Human: A Survey | https://arxiv.org/pdf/2307.12966.pdf | https://github.com/GaryYufei/AlignLLMHumanSurvey | 2023.07 |
| Human Alignment for LLM | From Instructions to Intrinsic Human Values – A Survey of Alignment Goals for Big Model | https://arxiv.org/pdf/2308.12014.pdf | https://github.com/ValueCompass/Alignment-Goal-Survey | 2023.08 |
| Human Alignment for LLM | Large Language Model Alignment: A Survey | https://arxiv.org/pdf/2309.15025.pdf | None | 2023.09 |
| Human Alignment for LLM | AI Alignment: A Comprehensive Survey | https://arxiv.org/pdf/2310.19852 | https://www.alignmentsurvey.com/ | 2023.10 |
| Efficient LLMs | The Efficiency Spectrum of Large Language Models: An Algorithmic Survey | https://arxiv.org/pdf/2310.10844.pdf | https://github.com/tding1/Efficient-LLM-Survey | 2023.12 |
| Efficient LLMs | Efficient Large Language Models: A Survey | https://arxiv.org/pdf/2312.03863 | https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey | 2023.12 |

END
