The paper's main contribution is the idea of weight tying.
In the introduction, the authors point out that the model has an input embedding U and an output embedding V, that the two matrices have the same size, and that both can be used as word embeddings.
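To make the setup concrete, here is a rough formalization in my own notation (C for the vocabulary size, H for the embedding/hidden size; these labels are mine and not a verbatim quote of the paper):

```latex
% Input and output embeddings have the same shape: one row per vocabulary word.
U \in \mathbb{R}^{C \times H}, \qquad V \in \mathbb{R}^{C \times H}
% Input side: word w_t is looked up as the row  U_{w_t}.
% Output side: given the hidden state h_t, the next-word distribution is
p_t = \operatorname{softmax}(V h_t + b)
% Since both U and V contain one H-dimensional vector per word,
% either matrix can be read off as a word-embedding table.
```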
The authors then compare the quality of the input embedding to that of the output embedding and propose the following ways to improve neural network language models:
(i) We show that in the word2vec skip-gram model, the output embedding is only slightly inferior to the input embedding. This is shown using metrics that are commonly used in order to measure embedding quality.
In the traditional word2vec skip-gram model, the input embedding is slightly better than the output embedding (the output embedding is "only slightly inferior to the input embedding").
(ii) In recurrent neural network based language models, the output embedding outperforms the input embedding.
In recurrent-neural-network-based language models, the output embedding outperforms the input embedding.
(iii) By tying the two embeddings together, i.e., enforcing U = V, the joint embedding evolves in a more similar way to the output embedding than to the input embedding of the untied model.
Based on point (ii), the authors ask: why not represent the input embedding and the output embedding with one and the same embedding matrix?
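For concreteness, here is a minimal sketch of what tying U = V looks like in code. This is my own illustrative PyTorch snippet, not the authors' implementation; the LSTM architecture and the sizes are assumptions:

```python
import torch
import torch.nn as nn

class TiedLM(nn.Module):
    """Toy RNN language model with tied input/output embeddings (illustrative only)."""
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)        # input embedding U
        self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.Linear(hidden_size, vocab_size)         # output embedding V (plus bias)
        # Weight tying: the output projection reuses the input embedding matrix, i.e. U = V.
        self.decoder.weight = self.embed.weight

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.decoder(h)                                    # logits over the vocabulary

model = TiedLM(vocab_size=10000, hidden_size=256)
logits = model(torch.randint(0, 10000, (2, 35)))                  # (batch, seq_len, vocab)
```

Note that the shapes only line up when the embedding size equals the RNN output size; otherwise an extra projection is needed before the output layer.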
(iv) Tying the input and output embeddings leads to an improvement in the perplexity of various language models. This is true both when using dropout or when not using it.
Building on point (iii), the authors find that tying the input and output embeddings makes the model better: perplexity improves across various language models, both with and without dropout.
(v) When not using dropout, we propose adding an additional projection P before V, and apply regularization to P.
When dropout is not used, a projection matrix P (of shape embedding_size × embedding_size) can be inserted before V, with regularization applied to P.
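Continuing the illustrative snippet above, the projection P could be inserted like this; I use a plain L2 penalty here as an assumed stand-in for the paper's regularizer:

```python
import torch
import torch.nn as nn

class TiedLMWithProjection(nn.Module):
    """Tied embeddings plus an extra projection P in front of the output embedding."""
    def __init__(self, vocab_size: int, hidden_size: int):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)              # shared embedding (U = V)
        self.rnn = nn.LSTM(hidden_size, hidden_size, batch_first=True)
        self.proj = nn.Linear(hidden_size, hidden_size, bias=False)     # projection P, H x H
        self.decoder = nn.Linear(hidden_size, vocab_size)
        self.decoder.weight = self.embed.weight                         # weight tying

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.decoder(self.proj(h))                               # apply V to the projected state

    def proj_penalty(self, lam: float = 1e-4) -> torch.Tensor:
        # Regularization term on P only; add it to the training loss.
        return lam * self.proj.weight.pow(2).sum()
```

During training one would simply add model.proj_penalty() to the cross-entropy loss.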
(vi) Weight tying in neural translation models can reduce their size (number of parameters) to less than half of their original size without harming their performance.
Tying the input embedding and the output embedding reduces the number of model parameters (for translation models, to less than half of the original size) without hurting performance.
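A back-of-the-envelope calculation with made-up numbers (not the paper's configuration) shows why the savings are so large when the embeddings dominate the parameter count; in translation the paper additionally ties the source embedding to the target ones via a shared subword vocabulary:

```latex
% Illustrative numbers only: shared subword vocabulary C = 90\,000, embedding size H = 500.
\text{untied: } \underbrace{3\,CH}_{\text{src emb + tgt emb + output emb}}
  = 3 \times 90{,}000 \times 500 \approx 135\text{M}
\qquad
\text{tied: } \underbrace{CH}_{\text{one shared matrix}} \approx 45\text{M}
% If the encoder/decoder itself has about 40M parameters, the total drops
% from roughly 175M to roughly 85M, i.e. to less than half.
```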
Finally, the authors give their own explanation of why weight tying works.
Don't ask me how this backpropagation is derived; it is just the ordinary backpropagation of the cross-entropy loss.
The authors explain it from the angle of backpropagation. For the matrix V, since every word in the vocabulary takes part in the loss computation, every row of V is updated at each backpropagation step.
U is different: only the row of U corresponding to the current input word is updated. In short, V is updated far more often than U, so its word representations end up better.
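Written out in my own notation, paraphrasing the paper's analysis: let h_t be the hidden state at timestep t, p_t = softmax(V h_t + b) the predicted distribution, o_t the target word, and i_t the input word. The cross-entropy gradients then look like this:

```latex
% Loss at timestep t:  L_t = -\log p_t(o_t).
% Every row k of V receives a nonzero update at every step, because every word's
% score enters the softmax normalizer:
\frac{\partial L_t}{\partial V_k} = \bigl(p_t(k) - \mathbf{1}[k = o_t]\bigr)\, h_t^{\top}
% The forward pass only reads row i_t of U, so all other rows get no gradient:
\frac{\partial L_t}{\partial U_k} = 0 \quad \text{for every } k \neq i_t .
```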