Reading the Paper: Pooled Contextualized Embeddings for Named Entity Recognition

Author: 保护敌方输出

Original link: https://blog.csdn.net/qq_35376241/article/details/99433231

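I have recently been reading the latest papers on named entity recognition (NER). In this area, a paper from late 2018, <Contextual String Embeddings for Sequence Labeling>, reached an F1 score of 93.09 on the CoNLL03 dataset, surpassing BERT. Its approach is to pretrain a character-level language model, use it to generate each word embedding dynamically from the characters around the word, and then feed those embeddings into the classic BiLSTM-CRF architecture.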
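To make "generating word embeddings dynamically from characters" concrete, here is a minimal Python sketch. The pretrained character language model is replaced by a hypothetical toy RNN with fixed weights (`toy_char_lm_states`), and the helper names `word_spans` and `contextual_word_embeddings` are likewise invented for illustration. Following the idea of the earlier paper, each word vector concatenates the forward model's hidden state at the word's last character with the backward model's hidden state at its first character; the exact boundary positions used in the original work may differ slightly.

```python
import math
from typing import List, Tuple

Vector = List[float]


def toy_char_lm_states(text: str, reverse: bool = False) -> List[Vector]:
    """Hypothetical stand-in for a pretrained character LM: a tiny fixed-weight RNN.

    Returns one hidden state per character position. With reverse=True the text
    is read right-to-left, so states[i] summarizes text[i:].
    """
    positions = range(len(text) - 1, -1, -1) if reverse else range(len(text))
    states: List[Vector] = [[] for _ in text]
    h = [0.0, 0.0]
    for i in positions:
        x = (ord(text[i]) % 32) / 32.0  # crude scalar "character embedding"
        h = [math.tanh(0.5 * h[0] + x), math.tanh(0.3 * h[1] - x + 0.1)]
        states[i] = h
    return states


def word_spans(sentence: str) -> List[Tuple[int, int]]:
    """(start, end) character offsets of whitespace-separated words; end is exclusive."""
    spans: List[Tuple[int, int]] = []
    start = None
    for i, ch in enumerate(sentence):
        if ch.isspace():
            if start is not None:
                spans.append((start, i))
                start = None
        elif start is None:
            start = i
    if start is not None:
        spans.append((start, len(sentence)))
    return spans


def contextual_word_embeddings(sentence: str) -> List[Vector]:
    fwd_states = toy_char_lm_states(sentence)
    bwd_states = toy_char_lm_states(sentence, reverse=True)
    embeddings = []
    for start, end in word_spans(sentence):
        fwd = fwd_states[end - 1]  # forward LM has read the word and everything to its left
        bwd = bwd_states[start]    # backward LM has read the word and everything to its right
        embeddings.append(fwd + bwd)  # list concatenation -> 4-d word vector
    return embeddings


e1 = contextual_word_embeddings("George Washington went to Washington")
e2 = contextual_word_embeddings("I visited Washington yesterday")
```

Both occurrences of "Washington" in `e1`, and the one in `e2`, receive different vectors, because each hidden state depends on the surrounding characters. This context dependence is what lets the same string be tagged differently in different sentences.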

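Today let's look at <Pooled Contextualized Embeddings for Named Entity Recognition>, a NAACL paper from June 2019. Its CoNLL03 score is 93.18, a small improvement. The paper is essentially a refinement of <Contextual String Embeddings for Sequence Labeling>. It first points out a weakness in how the earlier model generates vectors: when a rare word appears in an underspecified context (a sentence that reveals little about what the word refers to), the word embedding it produces is unreliable, and the model tends to get the word wrong.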

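When reading, we share a common intuition, which the paper states as: "We intuit that entities are normally only used in underspecified contexts if they are expected to be known to the reader. That is, they are either more clearly introduced in an earlier sentence, or part of general in-domain knowledge a reader is expected to have." In other words, if an unfamiliar entity appears in a sentence, it has most likely been introduced in an earlier sentence of the same document, or it belongs to general domain knowledge the reader is expected to have. Building on this idea, the authors modified the architecture of the earlier paper.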

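The improvement works as follows. While processing a document, the model keeps a dictionary in memory that stores every word embedding generated for each word. For a rare word, it retrieves all of that word's stored embeddings from the dictionary, pools them, and concatenates the result with the current embedding to form a new word embedding (the pooling + concatenation step in the figure above). As a result, the word's embedding depends not only on the current sentence but also on earlier text in the document.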
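The memory-and-pooling step can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the paper's or the Flair library's implementation: `PooledEmbeddingMemory` and `pool` are hypothetical names, the contextual embeddings are passed in as plain lists (in the real model they would come from the character language model), and the element-wise pooling operation (mean, min or max) is a parameter chosen here for illustration. The sketch adds the current embedding to memory before pooling, and it applies the same procedure to every word, which also covers the rare-word case described above.

```python
from collections import defaultdict
from typing import DefaultDict, List

Vector = List[float]


def pool(vectors: List[Vector], op: str = "mean") -> Vector:
    """Element-wise pooling over all stored embeddings of one word."""
    columns = list(zip(*vectors))  # one tuple per embedding dimension
    if op == "mean":
        return [sum(c) / len(c) for c in columns]
    if op == "min":
        return [min(c) for c in columns]
    if op == "max":
        return [max(c) for c in columns]
    raise ValueError(f"unknown pooling op: {op}")


class PooledEmbeddingMemory:
    """Hypothetical helper: remembers every contextual embedding produced for each word string."""

    def __init__(self, op: str = "mean") -> None:
        self.op = op
        self.memory: DefaultDict[str, List[Vector]] = defaultdict(list)

    def embed(self, word: str, contextual: Vector) -> Vector:
        # 1. Store the embedding produced in the current sentence.
        self.memory[word].append(contextual)
        # 2. Pool over every embedding seen so far for this string, current one included.
        pooled = pool(self.memory[word], self.op)
        # 3. Concatenate the current embedding with the pooled one.
        return contextual + pooled

    def reset(self) -> None:
        """Forget all stored embeddings, e.g. when a new document starts."""
        self.memory.clear()


memory = PooledEmbeddingMemory(op="mean")
first = memory.embed("Indra", [1.0, 0.0])  # informative context: [1.0, 0.0, 1.0, 0.0]
later = memory.embed("Indra", [0.0, 0.5])  # underspecified context: [0.0, 0.5, 0.5, 0.25]
```

Because the pooled half of the vector aggregates every earlier occurrence, a word seen in an uninformative sentence still carries information from the sentences where it was introduced. Calling `reset()` when a new document starts confines the memory to one document, matching the description above; whether and when to reset is a design choice this sketch leaves to the caller.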
