Paper notes: Deep contextualized word representations

Isawany

Category: paper reading. Tags: paper reading, language models, NLP, natural language processing, neural networks

Article link: https://blog.csdn.net/weixin_38124427/article/details/130978247


1. About the paper
2. Summary
3. Key techniques
4. Highlights
5. Link to the original paper

1. About the paper

Title: Deep contextualized word representations
Authors: Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer
Date: 2018
Venue: arXiv preprint

2. Summary

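The paper proposes ELMo (Embeddings from Language Models), a pre-training approach built on language models. Conventional neural approaches use only the top hidden layer as the word representation. ELMo instead aggregates the hidden states of every layer of a bidirectional language model (biLM) through a learned linear combination, so both low-level and high-level features reach the output stage of the model. On every NLP task evaluated in the paper, ELMo matches or exceeds the previous state of the art (SOTA).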
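The layer aggregation described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names `softmax` and `combine_layers`, the array shapes, and the random stand-in hidden states are all assumptions made for the example. The softmax-normalized layer weights and the scalar `gamma` follow my reading of the paper's formulation.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    x = np.asarray(x, dtype=float)
    e = np.exp(x - x.max())
    return e / e.sum()

def combine_layers(layer_states, raw_weights, gamma=1.0):
    """Mix per-layer hidden states into one vector per token.

    layer_states: array of shape (num_layers, seq_len, dim)
    raw_weights:  array of shape (num_layers,), unnormalized layer scores
    gamma:        scalar that rescales the mixed representation
    """
    s = softmax(raw_weights)  # normalized layer weights, sum to 1
    # Weighted sum over the layer axis -> shape (seq_len, dim)
    return gamma * np.tensordot(s, layer_states, axes=1)

# Hypothetical stand-in for biLM outputs: 3 layers, 5 tokens, 4 dimensions.
rng = np.random.default_rng(0)
layers = rng.normal(size=(3, 5, 4))

# With all raw weights equal, the mix reduces to a plain average of the layers.
mixed = combine_layers(layers, raw_weights=np.zeros(3))
```

In a real model `raw_weights` and `gamma` would be trained together with the downstream task, which lets each task decide how much to rely on lower versus higher layers.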

3. Key techniques

3.1 BiLM (Bidirectional Language Model)

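Given a sequence of $N$ tokens $(t_1, \dots, t_N)$, a forward (generative) language model computes the probability of each token from the tokens that precede it. At position $k$, given the history $(t_1, \dots, t_{k-1})$, it models $p(t_k|t_1, \dots, t_{k-1})$, so the probability of the whole sequence factorizes as $p(t_1,\dots, t_N) = \prod_{k=1}^N p(t_k|t_1, \dots, t_{k-1}).$
A backward language model runs in the opposite direction: it predicts the token at each position from the tokens that follow it, giving $p(t_1,\dots, t_N) = \prod_{k=1}^N p(t_k|t_{k+1}, \dots, t_N).$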
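To make the two factorizations concrete, the sketch below builds a tiny random joint distribution over length-3 sequences and checks that the forward and backward products both recover the same joint probability. Everything here is a toy assumption for illustration (the two-token vocabulary, the helper names `marginal`, `forward_prob`, and `backward_prob`, and the use of exact marginalization). A real biLM estimates these conditionals with neural networks rather than computing them exactly.

```python
import itertools
import math
import random

VOCAB = (0, 1)  # toy vocabulary (assumption)
N = 3           # toy sequence length (assumption)

# Build an arbitrary joint distribution p(t_1, ..., t_N) over all sequences.
random.seed(0)
seqs = list(itertools.product(VOCAB, repeat=N))
raw = [random.random() for _ in seqs]
total = sum(raw)
joint = {s: r / total for s, r in zip(seqs, raw)}

def marginal(fixed):
    """Probability that a sequence matches `fixed`, a dict {position: token}."""
    return sum(p for s, p in joint.items()
               if all(s[i] == t for i, t in fixed.items()))

def forward_prob(seq):
    """Product over k of p(t_k | t_1, ..., t_{k-1})."""
    prob = 1.0
    for k in range(N):
        history = {i: seq[i] for i in range(k)}
        prob *= marginal({**history, k: seq[k]}) / marginal(history)
    return prob

def backward_prob(seq):
    """Product over k of p(t_k | t_{k+1}, ..., t_N)."""
    prob = 1.0
    for k in range(N):
        future = {i: seq[i] for i in range(k + 1, N)}
        prob *= marginal({**future, k: seq[k]}) / marginal(future)
    return prob

# Both factorizations agree with the joint probability for every sequence.
for s in seqs:
    assert math.isclose(forward_prob(s), joint[s])
    assert math.isclose(backward_prob(s), joint[s])
```

Both directions are exact applications of the chain rule, so they describe the same distribution. They differ only in which context, left or right, each conditional is allowed to see.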