[Comparison] A Quick Guide to the Difference Between BLEU and ROUGE

Latest recommended article published on 2025-05-22 16:41:47

TIM老师


Article tags: python, programming languages

Article link: https://blog.csdn.net/AuGuSt_81/article/details/148117498


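Example 1 scenario: the generated text covers every n-gram in the reference text, but it also contains many redundant, meaningless repetitions.

Reference text:
"The cat sat on the mat."
Generated text:
"The cat cat sat on the mat mat."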

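Analysis:

ROUGE-2:
- The reference text contains five 2-grams: "The cat", "cat sat", "sat on", "on the", "the mat".
- The generated text contains all five of them, so ROUGE-2 recall is 5/5 = 1.
BLEU-2:
- The generated text contains seven 2-grams: "The cat", "cat cat", "cat sat", "sat on", "on the", "the mat", "mat mat".
- Only 5 of them match the reference ("The cat", "cat sat", "sat on", "on the", "the mat"); "cat cat" and "mat mat" do not. With 7 generated 2-grams in total, BLEU-2 ≈ 5/7 ≈ 0.71. Here "BLEU-2" is shorthand for the 2-gram precision; the full BLEU score also combines the 1-gram precision and a brevity penalty.
- Adding more redundancy (e.g., "The cat cat cat sat on the mat mat mat") lowers it further, to 5/9 ≈ 0.56.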
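The counts in this example can be checked with a few lines of Python. This is a minimal sketch, not a full BLEU or ROUGE implementation: the helper names `bigrams` and `overlap` are made up for illustration, and it assumes lowercasing and dropping the final period, which the walkthrough does implicitly.

```python
from collections import Counter

def bigrams(text):
    """Return bigram counts (assumption: lowercase, periods removed, split on spaces)."""
    tokens = text.lower().replace(".", "").split()
    return Counter(zip(tokens, tokens[1:]))

def overlap(candidate, reference):
    """Return (matched, candidate_total, reference_total) bigram counts.

    Matches are clipped: a bigram counts at most as often as it occurs
    in the other text, so repeating it does not earn extra credit.
    """
    cand, ref = bigrams(candidate), bigrams(reference)
    matched = sum((cand & ref).values())  # Counter & keeps the minimum count
    return matched, sum(cand.values()), sum(ref.values())

reference = "The cat sat on the mat."
candidate = "The cat cat sat on the mat mat."
more_redundant = "The cat cat cat sat on the mat mat mat"

matched, cand_total, ref_total = overlap(candidate, reference)
rouge2_recall = matched / ref_total      # share of reference bigrams covered
bigram_precision = matched / cand_total  # share of generated bigrams that match

matched2, cand_total2, _ = overlap(more_redundant, reference)

print(f"ROUGE-2 recall:   {matched}/{ref_total} = {rouge2_recall:.2f}")        # 5/5 = 1.00
print(f"2-gram precision: {matched}/{cand_total} = {bigram_precision:.2f}")    # 5/7 = 0.71
print(f"more redundant:   {matched2}/{cand_total2} = {matched2 / cand_total2:.2f}")  # 5/9 = 0.56
```

Recall stays at 1 however much the output repeats itself, while precision drops with every extra unmatched 2-gram.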

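Example 2 scenario: the generated text is concise and most of its n-grams match the reference text, but it does not cover all of the reference's n-grams.

Reference text:
"The quick brown fox jumps over the lazy dog."
Generated text:
"The quick fox jumps over the dog."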

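Analysis:

ROUGE-2:
- The reference text contains eight 2-grams: "The quick", "quick brown", "brown fox", "fox jumps", "jumps over", "over the", "the lazy", "lazy dog".
- The generated text covers only four of them ("The quick", "fox jumps", "jumps over", "over the"), so ROUGE-2 recall is 4/8 = 0.5.
BLEU-2:
- The generated text contains six 2-grams: "The quick", "quick fox", "fox jumps", "jumps over", "over the", "the dog".
- Four of them match the reference. "quick fox" and "the dog" do not, because dropping "brown" and "lazy" creates adjacent pairs that never occur in the reference. The 2-gram precision is therefore 4/6 ≈ 0.67, higher than the ROUGE-2 recall of 0.5. The full BLEU score would also apply a brevity penalty here, since the generated text is shorter than the reference.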
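Counting the overlaps mechanically for this pair gives 4 matching 2-grams, out of 6 generated and 8 reference 2-grams. The sketch below reproduces that count and also shows how a cumulative BLEU-2 score (geometric mean of 1-gram and 2-gram precision, times the brevity penalty) would come out. It assumes a single reference, lowercasing, and no smoothing; all function names are hypothetical helpers, not a library API.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumption: lowercase and ignore periods, as the worked example does.
    return text.lower().replace(".", "").split()

def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def clipped_matches(cand, ref, n):
    """Number of candidate n-grams also found in the reference (clipped counts)."""
    return sum((ngrams(cand, n) & ngrams(ref, n)).values())

def modified_precision(cand, ref, n):
    return clipped_matches(cand, ref, n) / max(sum(ngrams(cand, n).values()), 1)

def rouge_n_recall(cand, ref, n):
    return clipped_matches(cand, ref, n) / max(sum(ngrams(ref, n).values()), 1)

def bleu2(cand, ref):
    """Cumulative BLEU-2 for one reference, without smoothing."""
    p1 = modified_precision(cand, ref, 1)
    p2 = modified_precision(cand, ref, 2)
    if p1 == 0 or p2 == 0:
        return 0.0
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.sqrt(p1 * p2)

ref = tokenize("The quick brown fox jumps over the lazy dog.")
cand = tokenize("The quick fox jumps over the dog.")

p2 = modified_precision(cand, ref, 2)  # 4/6
r2 = rouge_n_recall(cand, ref, 2)      # 4/8
score = bleu2(cand, ref)

print(f"2-gram precision: {p2:.2f}")
print(f"ROUGE-2 recall:   {r2:.2f}")
print(f"BLEU-2 with BP:   {score:.2f}")
```

Even after the brevity penalty, the precision-based score stays above the recall-based one for this pair, which is the contrast the example is meant to show.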

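Metric	Characteristic	Example 1 (higher ROUGE, lower BLEU)	Example 2 (lower ROUGE, higher BLEU)
ROUGE-2	Recall-based: the share of reference n-grams that the generated text covers.	Redundant output → covers all reference n-grams → high ROUGE.	Precise but incomplete → covers only some reference n-grams → low ROUGE.
BLEU-2	Precision-based: the share of generated n-grams that match the reference.	Redundant output → low match ratio → low BLEU.	Precise output → high match ratio → high BLEU.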

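This difference stems from what each metric is designed to measure: ROUGE asks whether the generated text contains the reference content, while BLEU asks whether the generated text is concise and matches the reference content.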
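The two goals can also be written as formulas. The following is a sketch in common notation, assuming a single reference; the symbols are introduced here for illustration: $C$ is the generated text, $R$ the reference, $G_n(X)$ the set of distinct n-grams in $X$, and $c_X(g)$ the number of times n-gram $g$ occurs in $X$.

```latex
\text{ROUGE-}N = \frac{\sum_{g \in G_N(R)} \min\big(c_C(g),\, c_R(g)\big)}{\sum_{g \in G_N(R)} c_R(g)}
\qquad
p_n = \frac{\sum_{g \in G_n(C)} \min\big(c_C(g),\, c_R(g)\big)}{\sum_{g \in G_n(C)} c_C(g)}

\text{BLEU-}N = \mathrm{BP} \cdot \exp\!\Big(\frac{1}{N} \sum_{n=1}^{N} \ln p_n\Big)
\qquad
\mathrm{BP} =
\begin{cases}
1 & \text{if } |C| > |R| \\
e^{\,1 - |R|/|C|} & \text{if } |C| \le |R|
\end{cases}
```

Both ratios share the same numerator, the clipped overlap. ROUGE-N divides it by the reference's n-gram count (recall), while the BLEU precision $p_n$ divides it by the generated text's n-gram count.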