Re33: Reading the paper Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing

Latest recommended article published 2024-05-09 08:15:00

诸神缄默不语

Category: AI study notes. Tags: prompt, natural language processing, NLP, pre-trained language models, prompt learning

Article link: https://blog.csdn.net/PolarisRisingWar/article/details/126960113

Paper: Pre-train, Prompt, and Predict: A Systematic Survey of Prompting Methods in Natural Language Processing
arXiv: https://arxiv.org/abs/2107.13586
ACM Computing Surveys (official): https://dl.acm.org/doi/abs/10.1145/3560815
(These notes follow the arXiv version.)

Project website: Pretrain Language Models
(I do not reproduce screenshots of the paper's meta-analysis section here.)

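This prompting survey is quite well known in NLP; I remember several WeChat public accounts rushing to cover it when it came out.
The authors are from Carnegie Mellon University.
I recently noticed that ACM Computing Surveys accepted the survey in 2022. That version has fewer pages than the arXiv one, so I kept using the arXiv version for these notes.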

The paper gives a unified definition of prompting and surveys the methods in use at the time.

1. What is prompt-based learning

Traditional supervised learning predicts the probability $P(y|x;\theta)$ of an output $y$ given an input $x$
( $\theta$ denotes the model parameters)
label
label set

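Prompt-based learning instead models the probability of text directly with a pre-trained language model:
Input $x$
Template / prompting function (with 2 slots: one for the input, one for the answer to be generated)
The template turns $x$ into a textual string prompt $x^{'}$ (by filling $x$ into the template); $x^{'}$ still contains unfilled slots
The language model fills the unfilled slots according to probability, giving the final string $\hat{x}$
The final output $y$ is derived from $\hat{x}$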

Examples:

Tweet: I missed the bus today.
To predict its sentiment, append: I felt so ____
For translation, $x^{'}$ is: English: I missed the bus today. French: _____
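The two examples above can be sketched as a tiny prompting function. This is only an illustration: the `[X]` / `[Z]` slot markers and the name `fill_template` are assumptions made for this sketch, not code from the paper.

```python
def fill_template(template: str, x: str) -> str:
    """Fill the input slot [X] with x; the answer slot [Z] stays unfilled."""
    return template.replace("[X]", x)


# Hypothetical templates mirroring the two examples in the notes.
SENTIMENT_TEMPLATE = "[X] I felt so [Z]"
TRANSLATION_TEMPLATE = "English: [X] French: [Z]"

x = "I missed the bus today."
sentiment_prompt = fill_template(SENTIMENT_TEMPLATE, x)
translation_prompt = fill_template(TRANSLATION_TEMPLATE, x)

print(sentiment_prompt)
print(translation_prompt)
```

The resulting strings play the role of $x^{'}$ : the input is in place, and the `[Z]` slot is left for the language model to fill.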

Terminology list:
[image]

Subtask examples:
[image]

Advantage: the approach can be applied directly to few-shot and even zero-shot learning.

1.1 Prompt Addition

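A slot in the middle of the template gives a cloze prompt; a slot at the end gives a prefix prompt.
The template need not consist of natural-language tokens: it can use pseudo-tokens (which are likewise embedded as continuous vectors) or be continuous vectors directly.
The number of slots is not fixed.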
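As a rough illustration of the cloze/prefix distinction, the helper below classifies a text template by where its answer slot sits. The `[Z]` marker, the function name, and the choice to ignore trailing punctuation are all assumptions of this sketch, and it only handles discrete (text) templates with a single answer slot.

```python
def prompt_shape(template: str, answer_slot: str = "[Z]") -> str:
    """Return 'prefix' if the answer slot ends the template, else 'cloze'."""
    text = template.strip()
    if answer_slot not in text:
        raise ValueError("template has no answer slot")
    # Text that follows the last answer slot; trailing punctuation is ignored.
    tail = text.rsplit(answer_slot, 1)[1].strip(" .!?")
    return "prefix" if tail == "" else "cloze"


print(prompt_shape("English: [X] French: [Z]"))
print(prompt_shape("[X] Overall, it was a [Z] movie."))
```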

1.2 Answer Search

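Find the highest-scoring answer $\hat{z}$

$Z$ : the set of permissible values of $z$

The score of each candidate is computed with the pre-trained LM, as in the paper:
$\hat{z} = \underset{z \in Z}{\text{search}} \, P(f_{\text{fill}}(x^{'}, z); \theta)$
where $f_{\text{fill}}(x^{'}, z)$ fills the answer slot of $x^{'}$ with the candidate $z$

The search can be an argmax search or sampling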
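A minimal sketch of the two search modes follows. A real system would score each filled prompt with a pre-trained LM; here `toy_score` is a made-up lookup table standing in for that probability, and every name and number in the block is hypothetical.

```python
import random
from typing import Callable, Sequence


def fill_answer(prompt: str, z: str) -> str:
    """Put candidate answer z into the unfilled [Z] slot."""
    return prompt.replace("[Z]", z)


def argmax_search(prompt: str, answer_space: Sequence[str],
                  score: Callable[[str], float]) -> str:
    """Return the candidate whose filled prompt scores highest."""
    return max(answer_space, key=lambda z: score(fill_answer(prompt, z)))


def sampling_search(prompt: str, answer_space: Sequence[str],
                    score: Callable[[str], float], rng: random.Random) -> str:
    """Draw one candidate with probability proportional to its score."""
    weights = [score(fill_answer(prompt, z)) for z in answer_space]
    return rng.choices(list(answer_space), weights=weights, k=1)[0]


# Made-up scores standing in for an LM's probability of each filled prompt.
TOY_SCORES = {
    "I missed the bus today. I felt so bad": 0.6,
    "I missed the bus today. I felt so fine": 0.3,
    "I missed the bus today. I felt so great": 0.1,
}


def toy_score(filled_prompt: str) -> float:
    return TOY_SCORES.get(filled_prompt, 0.0)


prompt = "I missed the bus today. I felt so [Z]"
answer_space = ["bad", "fine", "great"]
z_hat = argmax_search(prompt, answer_space, toy_score)
z_sampled = sampling_search(prompt, answer_space, toy_score, random.Random(0))
print(z_hat, z_sampled)
```

Argmax search is deterministic, while sampling can return any candidate with a non-zero score, which matters mostly for generation-style tasks.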

1.3 Answer Mapping

Convert the highest-scoring answer $\hat{z}$ into the final output $\hat{y}$
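For classification, this mapping is often just a lookup from answer words to labels, with several answers allowed to share one label. The table below is an invented example for sentiment, not a mapping taken from the paper.

```python
# Hypothetical many-to-one mapping from answer words z to class labels y.
ANSWER_TO_LABEL = {
    "great": "positive",
    "fine": "positive",
    "bad": "negative",
    "terrible": "negative",
}


def map_answer(z_hat: str) -> str:
    """Map the selected answer to the final output label."""
    return ANSWER_TO_LABEL[z_hat]


y_hat = map_answer("bad")
print(y_hat)
```

For generation tasks such as translation, the answer itself is the output, so the mapping reduces to the identity.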

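2. Paradigm shifts in NLP

Fully supervised learning: the traditional machine learning paradigm
To give models a suitable inductive bias, early NLP models relied on feature engineering; once neural networks arrived, they relied on architecture engineering.
A small amount of pre-training appeared at this stage (e.g., word2vec and GloVe), but the pre-trained part accounted for only a small share of the model's parameters.
pre-train and fine-tune
A model with a fixed architecture is pre-trained as a language model (LM) that predicts the probability of observed text.
Relies on objective engineering.
Not conducive to exploring model architectures: 1. unsupervised pre-training lets models learn with fewer structural priors. 2. pre-training is too expensive to repeat for each architectural variant.
pre-train, prompt, and predict
Introducing textual prompts makes the downstream task look more like the pre-training task. The model can even be used with no further training.
Relies on prompt engineering.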

3. Design Considerations for Prompting

3.1 Pre-trained Model Choice

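My notes on the paper's section introducing pre-trained models are in a separate post: Overview of Pre-trained Language Models (continuously updated…)

The choice of training objective depends on how well it fits the prompting task at hand: left-to-right AR LMs suit prefix prompts, while reconstruction objectives suit cloze prompts. Standard LM and FTR (full text reconstruction) objectives are better suited to text generation tasks.

Prefix LMs and encoder-decoder architectures are a natural fit for text generation, but with suitable prompts they can be adapted to other tasks as well.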

3.2 Prompt Engineering

prompt template engineering → first choose the prompt shape, then decide between a manual and an automated approach

Prompt Shape
cloze prompts VS. prefix prompts
Manual Template Engineering
Automated Template Learning
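1. discrete prompts / hard prompts: text (this part always reminds me of traditional template/rule-based NLG; the paper's reference list does in fact include Re3Sum¹, although it does not seem to be cited in the main text)
  1. Prompt Mining: mine templates from a corpus
  2. Prompt Paraphrasing: paraphrase an existing seed prompt
  3. Gradient-based Search
  4. Prompt Generation: treat prompt construction directly as a text generation task
  5. Prompt Scoring
2. continuous prompts / soft prompts: vectors in the LM's embedding space
  1. Prefix Tuning
    $M_\phi$ : trainable prefix matrix
    $\theta$ : parameters of the fixed pre-trained LM
    
    If a time step falls within the prefix, its activation is copied directly from $M_\phi$ ; otherwise it is computed by the pre-trained LM.
    (I did not fully follow the detailed description later in the paper, so it is omitted.)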
  2. Tuning Initialized with Discrete Prompts
  3. Hard-Soft Prompt Hybrid Tuning
3. static
4. dynamic
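Returning to Prefix Tuning above, the copy-or-compute rule can be sketched in a few lines. This is a toy under stated assumptions: `toy_lm_step` is an invented stand-in for the frozen LM with parameters $\theta$ , `prefix_matrix` plays the role of $M_\phi$ , and training of the prefix is not shown at all.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def compute_activations(prefix_matrix: Sequence[Vector],
                        tokens: Sequence[int],
                        lm_step: Callable[[int, List[Vector]], Vector]) -> List[Vector]:
    """Copy activations for prefix positions; compute the rest with the frozen LM."""
    history: List[Vector] = []
    prefix_len = len(prefix_matrix)
    for i in range(prefix_len + len(tokens)):
        if i < prefix_len:
            h = list(prefix_matrix[i])  # copied from the trainable prefix
        else:
            h = lm_step(tokens[i - prefix_len], history)  # depends on earlier activations
        history.append(h)
    return history


def toy_lm_step(token: int, history: List[Vector]) -> Vector:
    """Invented stand-in for a frozen LM: mean of past activations plus the token id."""
    if not history:
        return [float(token), float(token)]
    dim = len(history[0])
    mean = [sum(h[d] for h in history) / len(history) for d in range(dim)]
    return [m + token for m in mean]


prefix_matrix = [[1.0, 2.0], [3.0, 4.0]]  # two trainable prefix rows
activations = compute_activations(prefix_matrix, tokens=[1, 2], lm_step=toy_lm_step)
print(activations)
```

Because later activations depend on the copied prefix rows, changing only the prefix steers the frozen LM's computation on the real tokens.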

3.3 Answer Engineering

Covers the design of the answer space $Z$ and of the mapping function

answer shape: granularity
1. tokens
2. spans: commonly used with cloze prompts
3. sentence: commonly used with prefix prompts
answer design method
1. Manual Design
  1. Unconstrained Spaces: the space of all possible fill-ins; the answer $z$ is usually mapped directly to $y$
  2. Constrained Spaces
2. Discrete Answer Search
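  1. Answer Paraphrasing: start from an initial answer space $\mathcal{Z}'$ (I did not follow the rest)
  2. Prune-then-Search
    $y \to z$ : verbalizer (I did not follow the rest)
  3. Label Decomposition: used for relation extraction
    Relation:
    Decomposed labels:
    The probability of an answer span is the sum of the probabilities of its tokens
3. Continuous Answer Search: omitted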
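For Label Decomposition above, the scoring rule (span probability as the sum of token probabilities) can be illustrated as follows. The relation name, its decomposition, and all probabilities are made-up values for the sketch, not figures from the paper.

```python
from typing import Dict, List


def span_probability(label_words: List[str], token_probs: Dict[str, float]) -> float:
    """Score an answer span as the sum of the probabilities of its tokens."""
    return sum(token_probs[word] for word in label_words)


# Hypothetical relation label and its decomposition into label words.
relation = "per:city_of_death"
decomposed_label = ["person", "city", "death"]

# Made-up LM probabilities for individual tokens.
token_probs = {"person": 0.2, "city": 0.1, "death": 0.05, "school": 0.3}

p = span_probability(decomposed_label, token_probs)
print(relation, round(p, 4))
```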

3.4 Multi-Prompt Learning

Prompt Ensembling: continuous prompts, for instance, may be learned from different initializations or random seeds
1. Uniform averaging
2. Weighted averaging
3. Majority voting
4. Knowledge distillation
5. Prompt ensembling for text generation: the ensemble is applied token by token²
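The first three ensembling strategies can be compared on a toy example. Each dictionary below stands for the answer distribution obtained from one prompt; the numbers and the weights are invented, and in practice the weights would have to come from somewhere (for example, from validation performance).

```python
from collections import Counter
from typing import Dict, List, Sequence

Distribution = Dict[str, float]


def uniform_average(dists: List[Distribution]) -> Distribution:
    """Average the answer probabilities over prompts with equal weight."""
    return {k: sum(d[k] for d in dists) / len(dists) for k in dists[0]}


def weighted_average(dists: List[Distribution], weights: Sequence[float]) -> Distribution:
    """Average the answer probabilities with one weight per prompt."""
    total = sum(weights)
    return {k: sum(w * d[k] for d, w in zip(dists, weights)) / total for k in dists[0]}


def majority_vote(dists: List[Distribution]) -> str:
    """Each prompt votes for its own top answer; the most common vote wins."""
    votes = Counter(max(d, key=d.get) for d in dists)
    return votes.most_common(1)[0][0]


# Made-up answer distributions from three different prompts.
dists = [
    {"pos": 0.9, "neg": 0.1},
    {"pos": 0.4, "neg": 0.6},
    {"pos": 0.3, "neg": 0.7},
]

avg = uniform_average(dists)
weighted = weighted_average(dists, weights=[0.2, 0.4, 0.4])
print(max(avg, key=avg.get), max(weighted, key=weighted.get), majority_vote(dists))
```

With these numbers, uniform averaging and majority voting disagree: one very confident prompt dominates the average, while voting follows the two less confident prompts.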
Prompt Augmentation / demonstration learning: details omitted
Provide answered prompts as analogies (the LM learns to repeat the pattern)
1. Sample Selection
2. Sample Ordering
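A sketch of prompt augmentation: a few answered prompts are placed in front of the unanswered one. The template, the demonstrations, and the function name are illustrative assumptions; which demonstrations are chosen and in what order (the two sub-problems listed above) is decided here simply by the contents of the `demonstrations` list.

```python
from typing import List, Tuple


def build_augmented_prompt(template: str,
                           demonstrations: List[Tuple[str, str]],
                           query: str) -> str:
    """Prepend answered prompts (demonstrations) to the unanswered query prompt."""
    answered = [template.replace("[X]", x).replace("[Z]", z) for x, z in demonstrations]
    unanswered = template.replace("[X]", query)  # [Z] stays open for the LM
    return " ".join(answered + [unanswered])


template = "[X]'s capital is [Z]."
demonstrations = [("Great Britain", "London"), ("Japan", "Tokyo")]
augmented_prompt = build_augmented_prompt(template, demonstrations, "China")
print(augmented_prompt)
```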
Prompt Composition
Prompt Decomposition

3.5 Training Strategies for Prompting Methods / Prompt-based Training Strategies

Training Settings
No training: zero-shot setting (arguably not truly zero-shot; details omitted)
full-data learning
few-shot learning
Parameter Update Methods
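1. Promptless Fine-tuning: the pre-train and fine-tune strategy
  Drawbacks: prone to overfitting or poor robustness, and to catastrophic forgetting
2. Tuning-free Prompting
  The input can be augmented with answered prompts: in-context learning
3. Fixed-LM Prompt Tuning: drawbacks omitted
4. Fixed-prompt LM Tuning
  Details omitted
  null prompt
5. Prompt+LM Tuning: pros and cons omitted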
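The five strategies differ mainly in which parameters get updated. The table below encodes my reading of the strategy names as a small data structure; the field names and the helper function are hypothetical, and "prompt parameters" here means additional trainable prompt parameters such as a continuous prefix.

```python
from typing import List, NamedTuple


class TuningStrategy(NamedTuple):
    uses_prompt: bool          # is a prompt used at all?
    lm_tuned: bool             # are the LM parameters updated?
    prompt_params_tuned: bool  # are additional prompt parameters trained?


STRATEGIES = {
    "Promptless Fine-tuning": TuningStrategy(False, True, False),
    "Tuning-free Prompting": TuningStrategy(True, False, False),
    "Fixed-LM Prompt Tuning": TuningStrategy(True, False, True),
    "Fixed-prompt LM Tuning": TuningStrategy(True, True, False),
    "Prompt+LM Tuning": TuningStrategy(True, True, True),
}


def trainable_groups(name: str) -> List[str]:
    """List which parameter groups would receive gradient updates."""
    strategy = STRATEGIES[name]
    groups = []
    if strategy.lm_tuned:
        groups.append("lm")
    if strategy.prompt_params_tuned:
        groups.append("prompt")
    return groups


for name in STRATEGIES:
    print(name, trainable_groups(name))
```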

4. Applications

The specific paper lists are omitted.

Knowledge Probing
1. Factual Probing / fact retrieval: measures how much factual knowledge the pre-trained model's representations contain; work here focuses on learning templates
2. Linguistic Probing
Classification-based Tasks: e.g., formulated as slot filling
1. Text Classification: commonly uses cloze prompts, prompt engineering + answer engineering, few-shot settings, and fixed-prompt LM Tuning
2. Natural Language Inference (NLI): commonly uses cloze prompts; prompt engineering focuses on template search in few-shot settings. Answer spaces are usually hand-picked from the vocabulary in advance.
Information Extraction: details omitted
1. Relation Extraction
2. Semantic Parsing
3. Named Entity Recognition (NER)
“Reasoning” in NLP: details omitted
1. Commonsense Reasoning
2. Mathematical Reasoning
Question Answering
extractive QA
multiple-choice QA
free-form QA
Text Generation: other details omitted
prefix prompts + AR pre-trained language models: text summarization, machine translation
in-context learning
fixed-LM prompt tuning：data-to-text generation
Automatic Evaluation of Text Generation: itself framed as a text generation task (rather recursive)
Multi-modal Learning
Meta-Applications
1. Domain Adaptation (this looks a bit like text style transfer, so presumably there is prompt-based work on text style transfer as well?)
2. Debiasing
3. Dataset Construction

Datasets:
[image]

5. Prompt-relevant Topics

Ensemble Learning VS. prompt ensembling
Few-shot Learning
Prompt augmentation / priming-based few-shot learning
Larger-context Learning
Query Reformulation
QA-based Task Formulation
Controlled Generation
Supervised Attention
Data Augmentation

6. Challenges

Prompt Design
1. Tasks beyond Classification and Generation
2. Prompting with Structured Information
3. Entanglement of Template and Answer
Answer Engineering
1. Many-class and Long-answer Classification Tasks
2. Multiple Answers for Generation Tasks
Selection of Tuning Strategy
Multiple Prompt Learning
1. Prompt Ensembling
2. Prompt Composition and Decomposition
3. Prompt Augmentation
4. Prompt Sharing
Selection of Pre-trained Models
Theoretical and Empirical Analysis of Prompting
Transferability of Prompts
Combination of Different Paradigms
Calibration of Prompting Methods
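Probability prediction? I did not understand what this term actually means here. Does it refer to a model's systematic tendency in some of its predictions, which is then corrected in some way?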