230530 Paper Notes (Research Group 2): decoder-only or encoder-decoder? interpreting lang - CSDN Blog

Link to this article: https://blog.csdn.net/Hekena/article/details/130954600

I have only a mild interest in these studies.

Contents

Rethinking Dense Retrieval’s Few-Shot Ability
Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder
PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction
Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking

Rethinking Dense Retrieval’s Few-Shot Ability

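The authors build a standardized FewDR dataset and evaluation protocol for few-shot dense retrieval. The dataset is constructed from the Wikipedia corpus and contains 41,420 samples across 60 fine-grained classes.
As for the method itself, it did not strike me as very different from other dense retrieval methods.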

Decoder-Only or Encoder-Decoder? Interpreting Language Model as a Regularized Encoder-Decoder

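Traditionally, most seq2seq tasks are solved with the encoder-decoder framework, which uses an encoder to encode the source sequence and a decoder to generate the target text.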

The paper conducts a detailed comparison between the encoder-decoder architecture and the decoder-only language model framework by analyzing a regularized encoder-decoder structure.

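Key issues:
1. Which architecture has the advantage: encoder-decoder or decoder-only?
2. The paper reveals an attention degeneration problem in language models: as the number of generation steps increases, less and less attention is focused on the source sequence.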
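To make the dilution intuition concrete, here is a toy calculation (not from the paper). Assume a decoder-only LM gives every visible token the same raw attention score. After `step` target tokens have been generated, the share of attention left on a source of length n is then n / (n + step), which shrinks as generation proceeds. Real attention scores are learned, so this only illustrates the effect under that uniform-score assumption. The function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def source_attention_share(src_len, step, src_score=0.0, tgt_score=0.0):
    """Fraction of attention mass on source tokens at a given generation step.

    Toy assumption: every source token gets the raw score src_score and every
    previously generated target token gets tgt_score. With equal scores the
    result is src_len / (src_len + step).
    """
    scores = [src_score] * src_len + [tgt_score] * step
    weights = softmax(scores)
    return sum(weights[:src_len])


for step in (1, 10, 50, 200):
    # prints: 1 0.952 / 10 0.667 / 50 0.286 / 200 0.091
    print(step, round(source_attention_share(20, step), 3))
```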

Figure: a traditional ED structure, named the Regularized Encoder-Decoder (RED) framework.


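1. To avoid the attention degeneration problem, a unidirectional cross attention is proposed, which attends to both the source matrix and the target matrix at the same time.
2. Consecutive positional encoding: the position ids of the target sequence continue from those of the source sequence instead of restarting from the beginning in the target.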
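The two RED ingredients listed above can be sketched in plain Python. This is an illustration only, not the authors' code, and the function names are hypothetical. Target queries see every source token plus the target prefix up to and including themselves, and target position ids continue from the source rather than restarting at 0.

```python
def consecutive_positions(src_len, tgt_len):
    """Position ids for source then target, continuing the count into the target."""
    src_pos = list(range(src_len))
    tgt_pos = list(range(src_len, src_len + tgt_len))
    return src_pos, tgt_pos


def restarted_positions(src_len, tgt_len):
    """Conventional encoder-decoder numbering: target positions restart at 0."""
    return list(range(src_len)), list(range(tgt_len))


def unidirectional_cross_mask(src_len, tgt_len):
    """mask[i][j] is True if target query i may attend to key j.

    Keys are the concatenation [source tokens; target tokens]. Each target query
    sees every source token plus the target tokens up to itself (causal).
    """
    mask = []
    for i in range(tgt_len):
        row = [True] * src_len + [j <= i for j in range(tgt_len)]
        mask.append(row)
    return mask


print(consecutive_positions(3, 2))      # ([0, 1, 2], [3, 4])
print(restarted_positions(3, 2))        # ([0, 1, 2], [0, 1])
for row in unidirectional_cross_mask(3, 2):
    print(row)
```

Building the mask over the concatenated keys is what lets a single attention call look at the source and the target prefix together, which is the property the list above attributes to the unidirectional cross attention.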

PLOME: Pre-training with Misspelled Knowledge for Chinese Spelling Correction

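Phonetic and visual similarity knowledge is important for Chinese spelling correction. PLOME uses GRU networks to model such knowledge from each character's pronunciation and strokes.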

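The proposed model takes the strokes and pinyin of each character as input, which lets PLOME model the similarity between arbitrary characters.
By jointly recovering the true character and the pronunciation of each masked token, PLOME learns misspelling knowledge at both the character level and the phonetic level.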
Figure: model architecture.

PLOME randomly masks a percentage of the input tokens and then recovers them.
It masks 15% of the tokens in the corpus and uses a dynamic masking strategy.
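A minimal sketch of dynamic masking at a 15% rate. It assumes masked positions are simply replaced by a hypothetical `[MASK]` symbol; these notes do not describe PLOME's actual replacement scheme. The positions are re-sampled on every call, so each epoch sees a different mask, and the returned positions are the ones the model must recover.

```python
import random

MASK_TOKEN = "[MASK]"  # hypothetical mask symbol, not PLOME's actual scheme


def dynamic_mask(tokens, mask_rate=0.15, rng=None):
    """Return (masked_tokens, positions); call once per epoch so positions change."""
    rng = rng or random.Random()
    n_mask = max(1, round(len(tokens) * mask_rate))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    for p in positions:
        masked[p] = MASK_TOKEN  # the original token at p becomes a recovery target
    return masked, positions


tokens = [f"tok{i}" for i in range(20)]
epoch1, pos1 = dynamic_mask(tokens, rng=random.Random(1))
epoch2, pos2 = dynamic_mask(tokens, rng=random.Random(2))
print(pos1, pos2)  # 3 masked positions each; usually different across epochs
```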
The final embedding of each character is the sum of its character embedding, position embedding, phonic embedding and shape embedding.
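The summation is element-wise, which requires all four vectors to share one dimension. In this sketch the vectors are made-up 3-dimensional values. In PLOME the phonic and shape embeddings come from GRU encoders over pinyin and strokes. `add_embeddings` is a hypothetical helper.

```python
def add_embeddings(*embeddings):
    """Element-wise sum of equally sized embedding vectors."""
    dim = len(embeddings[0])
    if any(len(e) != dim for e in embeddings):
        raise ValueError("all embeddings must share the same dimension")
    return [sum(values) for values in zip(*embeddings)]


char_emb = [0.1, 0.2, 0.3]    # made-up values for illustration
pos_emb = [0.0, 0.1, 0.0]
phonic_emb = [0.2, 0.0, 0.1]
shape_emb = [0.0, 0.0, 0.1]
final_emb = add_embeddings(char_emb, pos_emb, phonic_emb, shape_emb)
print(final_emb)  # approximately [0.3, 0.3, 0.5]
```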

The probability of the character predicted for the i-th token in a given sentence is defined as:

[Equation image]
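The original equation is only available as an image. As a hedged reading aid, a character-prediction head of this kind is commonly written as a softmax classifier over the final hidden state $h_i$ of the i-th token. The symbols $W_c$, $b_c$ and $y_i$ are assumptions here, not transcribed from the image:

$$p_c(y_i = j \mid X) = \mathrm{softmax}(W_c h_i + b_c)[j]$$

Here $j$ indexes the character vocabulary.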

The probability of pronunciation prediction is defined as:

[Equation image]
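In the same hedged spirit, a pronunciation head can be written as a second softmax classifier over the final hidden state $h_i$ of the i-th token. The symbols $W_p$, $b_p$ and $g_i$ (the pronunciation label of the i-th token) are assumptions:

$$p_p(g_i = k \mid X) = \mathrm{softmax}(W_p h_i + b_p)[k]$$

Here $k$ indexes the set of pinyin pronunciations.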
Loss function:
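The loss formula is not reproduced in these notes. The sketch below assumes the objective is the unweighted sum of the character-level and pronunciation-level negative log-likelihoods over the masked positions, $\mathcal{L} = \mathcal{L}_c + \mathcal{L}_p$. That matches the joint recovery of characters and pronunciations described above. The inputs (`char_logits`, `pinyin_logits` and the target indices) are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def nll(logits_per_pos, targets):
    """Summed negative log-likelihood of the target index at each masked position."""
    return -sum(math.log(softmax(lg)[t]) for lg, t in zip(logits_per_pos, targets))


def joint_loss(char_logits, char_targets, pinyin_logits, pinyin_targets):
    """Assumed PLOME-style objective: character loss plus pronunciation loss."""
    return nll(char_logits, char_targets) + nll(pinyin_logits, pinyin_targets)


# One masked position, uniform scores: 4 candidate characters, 2 candidate pinyin.
print(joint_loss([[0.0, 0.0, 0.0, 0.0]], [0], [[0.0, 0.0]], [0]))  # log(4) + log(2) = log(8)
```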

Read, Listen, and See: Leveraging Multimodal Information Helps Chinese Spell Checking

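As mentioned above, the common error types for Chinese characters are phonetic (pinyin) errors and graphic (glyph shape) errors.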
Figure: model architecture.

The Semantic Encoder

The input tokens $X = (x_1, \ldots, x_N)$ are first projected into $H_t^0$ through the input embedding. Then the computation of the Transformer (Vaswani et al., 2017) encoder layers can be formulated as:

[Equation image]
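For readers without the image, stacked encoder layers are usually written as a recurrence on the layer index, starting from the embedded input $H_t^0$. The layer notation and the depth $L$ below are assumptions about how the paper writes it:

$$H_t^{l} = \mathrm{Transformer}\left(H_t^{l-1}\right), \quad l \in [1, L]$$

The last layer's output $H_t^{L}$ then serves as the semantic representation of the sentence.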

The Phonetic Encoder (pinyin encoder)

The 5 kinds of tones (take the final "a" as an example, {ā, á, ǎ, à, a}) can be mapped into numbers {1, 2, 3, 4, 0}.
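The tone mapping above can be applied to toned pinyin syllables with a small lookup table. This is a sketch under assumptions. The source only shows the final "a", so the extension to e, i, o, u and ü is an illustrative addition, and so is the helper name `split_tone`. Unmarked syllables get 0, matching the neutral tone.

```python
import unicodedata

# Tone-marked vowel -> (plain vowel, tone number); tones 1-4 as in the mapping above.
TONE_MARKS = {
    "ā": ("a", 1), "á": ("a", 2), "ǎ": ("a", 3), "à": ("a", 4),
    "ē": ("e", 1), "é": ("e", 2), "ě": ("e", 3), "è": ("e", 4),
    "ī": ("i", 1), "í": ("i", 2), "ǐ": ("i", 3), "ì": ("i", 4),
    "ō": ("o", 1), "ó": ("o", 2), "ǒ": ("o", 3), "ò": ("o", 4),
    "ū": ("u", 1), "ú": ("u", 2), "ǔ": ("u", 3), "ù": ("u", 4),
    "ǖ": ("ü", 1), "ǘ": ("ü", 2), "ǚ": ("ü", 3), "ǜ": ("ü", 4),
}


def split_tone(syllable):
    """Strip the tone mark; return (plain_pinyin, tone) with 0 for the neutral tone."""
    syllable = unicodedata.normalize("NFC", syllable)  # merge combining accents
    tone = 0
    plain = []
    for ch in syllable:
        if ch in TONE_MARKS:
            base, tone = TONE_MARKS[ch]
            plain.append(base)
        else:
            plain.append(ch)
    return "".join(plain), tone


print(split_tone("mǎ"))   # ('ma', 3)
print(split_tone("ma"))   # ('ma', 0)
```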

The Character-level Encoder

It uses a single-layer unidirectional GRU (Cho et al., 2014), which encodes the pinyin of the i-th character $x_i$ as:

[Equation image]
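A minimal pure-Python sketch of a single-layer unidirectional GRU reading one character's pinyin. It follows the Cho et al. (2014) update $h = z \odot h_{prev} + (1 - z) \odot \tilde{h}$. Assumptions: pinyin is spelled as letters plus a tone digit (using the 0 to 4 tone numbers above) and one-hot encoded. Weights are random and biases are omitted, so the sketch shows only the data flow, not trained behavior. `TinyGRU`, `SYMBOLS` and `one_hot` are hypothetical names.

```python
import math
import random

SYMBOLS = "abcdefghijklmnopqrstuvwxyz01234"  # letters plus tone digits (assumption)


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyGRU:
    """Single-layer unidirectional GRU (Cho et al., 2014 form), no biases."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.W_z, self.U_z = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.W_r, self.U_r = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)
        self.W_h, self.U_h = init(hidden_dim, input_dim), init(hidden_dim, hidden_dim)

    def step(self, x, h):
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W_z, x), matvec(self.U_z, h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W_r, x), matvec(self.U_r, h))]  # reset gate
        rh = [ri * hi for ri, hi in zip(r, h)]
        h_tilde = [math.tanh(a + b) for a, b in zip(matvec(self.W_h, x), matvec(self.U_h, rh))]
        return [zi * hi + (1 - zi) * ti for zi, hi, ti in zip(z, h, h_tilde)]

    def encode(self, inputs):
        h = [0.0] * self.hidden_dim
        for x in inputs:
            h = self.step(x, h)
        return h  # final hidden state represents the whole pinyin string


def one_hot(symbol):
    vec = [0.0] * len(SYMBOLS)
    vec[SYMBOLS.index(symbol)] = 1.0
    return vec


gru = TinyGRU(input_dim=len(SYMBOLS), hidden_dim=8)
h_ma3 = gru.encode([one_hot(c) for c in "ma3"])
print(len(h_ma3))  # 8
```

Reading the pinyin letter by letter is what lets similar spellings (for example "ma3" and "ma1") end up with related encodings, which is the point of a character-level phonetic encoder.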
The Graphic Encoder

**Fusion Module**
A gate mechanism is used to fuse the embeddings produced by the encoders above.

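A generic gated-fusion sketch, not necessarily the paper's exact formulation. Each extra modality (phonetic, graphic) gets an element-wise sigmoid gate computed from the concatenated vectors. The gated vectors are then added to the textual vector. Keeping the textual vector ungated is an assumption, and `gate` and `gated_fusion` are hypothetical names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gate(weights, biases, features):
    """One sigmoid gate value per output dimension: sigmoid(w_row . features + b)."""
    return [sigmoid(sum(w * f for w, f in zip(row, features)) + b)
            for row, b in zip(weights, biases)]


def gated_fusion(h_text, h_phonetic, h_graphic, w_ph, b_ph, w_gr, b_gr):
    """Fuse three same-sized vectors; gates decide how much of each extra modality to add."""
    features = h_text + h_phonetic + h_graphic  # list concatenation of the three vectors
    g_ph = gate(w_ph, b_ph, features)
    g_gr = gate(w_gr, b_gr, features)
    return [t + gp * p + gg * g
            for t, gp, p, gg, g in zip(h_text, g_ph, h_phonetic, g_gr, h_graphic)]


zeros = [[0.0] * 6 for _ in range(2)]  # 2 output dims x 6 concatenated inputs
fused = gated_fusion([1.0, 2.0], [2.0, 0.0], [0.0, 4.0], zeros, [0.0, 0.0], zeros, [0.0, 0.0])
print(fused)  # all gates are 0.5 here, so this prints [2.0, 4.0]
```

A learned gate lets the model down-weight a modality when, for example, the glyph information is irrelevant to a given error.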