Encoder & Decoder Architectures: T5, a Unified Text-to-Text Generation Framework

DoYangTan

Article link: https://blog.csdn.net/Azperk/article/details/145897485

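1. Introduction

T5 (Text-to-Text Transfer Transformer) is a unified text generation model proposed by a Google research team. It casts every NLP task as a text-to-text problem: the input is a string and the output is a string. T5 uses an Encoder-Decoder architecture and, through large-scale pre-training, reached leading results on many NLP benchmarks at the time of its release.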

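2. T5 Overview

The T5 paper, "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer", proposes converting every NLP task into a text generation problem. Its core ideas are:

- Every task takes text as input and produces text as output, whether it is classification, translation, summarization, or question answering.
- It uses a standard Encoder-Decoder Transformer. Unlike BERT, which uses only an encoder, T5 can generate free-form output text directly.
- It is pre-trained at scale on the C4 dataset and then fine-tuned on individual tasks.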

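3. Key Techniques in T5

3.1 A Unified Text-to-Text Framework

T5's defining feature is that it converts every NLP task into a text-to-text format, with a short task prefix at the start of the input. The rows below are illustrative examples of the format, not recorded model outputs: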

| Task | Example input | Example output |
|---|---|---|
| Machine translation | `translate English to French: How are you?` | `Comment ça va?` |
| Summarization | `summarize: The article discusses...` | `Main idea of the article is...` |
| Grammar correction | `grammar correction: He go to school.` | `He goes to school.` |
| Question answering | `question: Who is the president of USA?` | `Joe Biden.` |
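The table above can be read as a set of string templates. The following is a minimal sketch of that idea, assuming a hypothetical helper `to_text_to_text` and hypothetical task keys; the prefixes are copied from the table and are not guaranteed to match the prefixes any particular checkpoint was trained on.

```python
# Hypothetical templates, one per row of the table above.
TEMPLATES = {
    "translate_en_fr": "translate English to French: {text}",
    "summarize": "summarize: {text}",
    "grammar": "grammar correction: {text}",
    "qa": "question: {question}",
}


def to_text_to_text(task: str, **fields: str) -> str:
    """Render one task instance as a single prefixed input string."""
    if task not in TEMPLATES:
        raise ValueError(f"unknown task: {task}")
    return TEMPLATES[task].format(**fields)


if __name__ == "__main__":
    print(to_text_to_text("translate_en_fr", text="How are you?"))
    print(to_text_to_text("grammar", text="He go to school."))
    print(to_text_to_text("qa", question="Who is the president of USA?"))
```

Because every task ends up as a plain string, the same model, loss, and decoding procedure can be reused across tasks; only the template changes.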

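3.2 Pre-training + Fine-tuning

T5 is pre-trained with self-supervised learning. Its objective resembles BERT's Masked Language Modeling (MLM), but T5 uses Span Corruption: contiguous spans of the input are dropped at random, each dropped span is replaced by a single sentinel token, and the model is trained to generate the missing spans.

Pre-training stage:
- Training data: C4 (Colossal Clean Crawled Corpus)
- Objective: Span Corruption (similar to a fill-in-the-blank task)
Fine-tuning stage:
- Adapt the model to a specific task (translation, summarization, QA, and so on)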
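To make the Span Corruption objective concrete, here is a minimal sketch that builds one (input, target) pair. It makes several simplifying assumptions: tokens are whitespace-separated words rather than SentencePiece pieces, the spans are supplied by hand rather than sampled at random, and `span_corrupt` is a hypothetical helper name. The sentinel names follow the `<extra_id_N>` convention used by the Hugging Face T5 tokenizer.

```python
def span_corrupt(tokens, spans):
    """Replace each (start, end) span with a sentinel and build the target.

    `spans` must be sorted, non-overlapping, half-open index ranges.
    Returns the corrupted input string and the target string.
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(tokens[cursor:start])  # keep text before the span
        inputs.append(sentinel)              # one sentinel per dropped span
        targets.append(sentinel)             # target repeats the sentinel...
        targets.extend(tokens[start:end])    # ...followed by the dropped tokens
        cursor = end
    inputs.extend(tokens[cursor:])           # keep text after the last span
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(inputs), " ".join(targets)


if __name__ == "__main__":
    tokens = "Thank you for inviting me to your party last week .".split()
    corrupted, target = span_corrupt(tokens, [(2, 4), (8, 9)])
    print(corrupted)
    print(target)
```

The target contains only the dropped spans, not the full sentence, which keeps target sequences short during pre-training.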

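3.3 Encoder-Decoder Transformer Architecture

T5 uses an encoder-decoder architecture that closely follows the original Transformer. It consists of:

- Encoder: processes the input text and produces contextual representations.
- Decoder: generates the target text token by token, conditioned on the encoder output and on the tokens it has already produced.

Compared with GPT (decoder only) and BERT (encoder only), T5 offers a more flexible combination of text understanding and text generation.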
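One way to see the difference between the two halves is through their attention masks. The sketch below is an illustration under simplifying assumptions (no padding, masks as nested lists of 0/1, hypothetical function names), not T5's actual implementation: the encoder lets every position see every other position, the decoder only lets a position see itself and earlier positions, and cross-attention lets every decoder position see the whole encoder output.

```python
def fully_visible_mask(n):
    """Encoder self-attention: position i may attend to every position j."""
    return [[1] * n for _ in range(n)]


def causal_mask(n):
    """Decoder self-attention: position i may attend only to positions j <= i."""
    return [[1 if j <= i else 0 for j in range(n)] for i in range(n)]


def cross_attention_mask(target_len, source_len):
    """Decoder-to-encoder attention: every target position sees all source positions."""
    return [[1] * source_len for _ in range(target_len)]


if __name__ == "__main__":
    for row in causal_mask(4):
        print(row)
```

The causal mask is what allows the decoder to be trained on whole target sequences in parallel while still generating one token at a time at inference.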

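4. T5 Code Example

We can use Hugging Face's transformers library to load T5 and generate text. Note that `T5Tokenizer` also requires the `sentencepiece` package to be installed.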

```python
from transformers import T5Tokenizer, T5ForConditionalGeneration

# Load the T5 model and tokenizer
tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Example task: text summarization (the "summarize: " prefix selects the task)
input_text = "summarize: The article discusses the effects of climate change..."
input_ids = tokenizer(input_text, return_tensors="pt").input_ids

# Generate the summary (at most 50 tokens) and decode it back to text
output_ids = model.generate(input_ids, max_length=50)
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print("Generated summary:", summary)
```

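5. T5 Compared with BERT and GPT

| Feature | BERT | GPT | T5 |
|---|---|---|---|
| Architecture | Encoder-only | Decoder-only | Encoder-Decoder |
| Training objective | Masked Language Model | Autoregressive language model | Span Corruption |
| Typical tasks | Classification, NER, QA | Generation tasks | Classification, generation, translation, summarization, etc. |
| Generation ability | None (not designed for generation) | Strong | Strong |
| Training data | Wikipedia + BooksCorpus | WebText (GPT-2) | C4 |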