NAACL 2019: Notes on Word and Document Representation Learning

The goal is to collect papers on representation learning for characters, words, documents, and other entities.

word embedding

Search keyword: word embedding

Vector of Locally-Aggregated Word Embeddings (VLAWE): A Novel Document-level Representation

In this paper, we propose a novel representation for text documents based on aggregating word embedding vectors into document embeddings.

Our approach is inspired by the Vector of Locally-Aggregated Descriptors used for image representation, and it works as follows.

First, the word embeddings gathered from a collection of documents are clustered by k-means in order to learn a codebook of semantically related word embeddings.

Each word embedding is then associated to its nearest cluster centroid (codeword).

The Vector of Locally-Aggregated Word Embeddings (VLAWE) representation of a document is then computed by accumulating the differences between each codeword vector and each word vector (from the document) associated to the respective codeword.

We plug the VLAWE representation, which is learned in an unsupervised manner, into a classifier and show that it is useful for a diverse set of text classification tasks.

We compare our approach with a broad range of recent state-of-the-art methods, demonstrating the effectiveness of our approach.

Furthermore, we obtain a considerable improvement on the Movie Review data set, reporting an accuracy of 93.3%, which represents an absolute gain of 10% over the state-of-the-art approach.
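
The aggregation step is straightforward to express in code. Below is a minimal sketch of the VLAWE pipeline, not the authors' implementation: it assumes pretrained word vectors are available as a Python dict, and the codebook size, k-means settings, and function names are illustrative.

```python
import numpy as np
from sklearn.cluster import KMeans

def vlawe_representations(docs, word_vectors, num_codewords=10):
    """docs: list of token lists; word_vectors: dict token -> np.ndarray of shape (dim,)."""
    # 1) Learn a codebook over all word embeddings in the collection.
    all_vecs = np.array([word_vectors[w] for doc in docs for w in doc if w in word_vectors])
    kmeans = KMeans(n_clusters=num_codewords, n_init=10, random_state=0).fit(all_vecs)
    codebook = kmeans.cluster_centers_                       # (k, dim)

    dim = codebook.shape[1]
    doc_reps = []
    for doc in docs:
        residuals = np.zeros((num_codewords, dim))
        for w in doc:
            if w not in word_vectors:
                continue
            v = word_vectors[w]
            k = kmeans.predict(v[None, :])[0]                # nearest codeword
            residuals[k] += v - codebook[k]                  # accumulate differences
        doc_reps.append(residuals.reshape(-1))               # concatenate residuals into one vector
    return np.vstack(doc_reps)                               # (num_docs, k * dim)
```

The flattened residual matrix can then be fed to any off-the-shelf classifier, e.g. a linear SVM.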

Learning Bilingual Sentiment-Specific Word Embeddings without Cross-lingual Supervision

Word embeddings learned in two languages can be mapped to a common space to produce Bilingual Word Embeddings (BWE).

Unsupervised BWE methods learn such a mapping without any parallel data.

However, these methods are mainly evaluated on tasks of word translation or word similarity.

We show that these methods fail to capture the sentiment information and do not perform well enough on cross-lingual sentiment analysis.

In this work, we propose UBiSE (Unsupervised Bilingual Sentiment Embeddings), which learns sentiment-specific word representations for two languages in a common space without any cross-lingual supervision.

Our method only requires a sentiment corpus in the source language and pretrained monolingual word embeddings of both languages.

We evaluate our method on three language pairs for cross-lingual sentiment analysis.

Experimental results show that our method outperforms previous unsupervised BWE methods and even supervised BWE methods.

Our method succeeds even for the distant language pair English-Basque.

ReWE: Regressing Word Embeddings for Regularization of Neural Machine Translation Systems

Regularization of neural machine translation is still a significant problem, especially in low-resource settings.

To mollify this problem, we propose regressing word embeddings (ReWE) as a new regularization technique in a system that is jointly trained to predict the next word in the translation (categorical value) and its word embedding (continuous value).

Such a joint training allows the proposed system to learn the distributional properties represented by the word embeddings, empirically improving the generalization to unseen sentences.

Experiments over three translation datasets have shown a consistent improvement over a strong baseline, ranging between 0.91 and 2.4 BLEU points, and also a marked improvement over a state-of-the-art system.
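
As a rough illustration of the idea rather than the authors' exact formulation, the decoder can be trained with a combined loss: the usual cross-entropy over the next token plus a regression term that pushes a predicted continuous vector toward the gold word's embedding. A hedged PyTorch sketch, where the layer names, the weight lambda_rewe, and the cosine-based regression term are assumptions:

```python
import torch
import torch.nn as nn

class ReWEHead(nn.Module):
    """Joint head: predicts the next-token distribution and regresses its embedding."""
    def __init__(self, hidden_size, vocab_size, emb_dim, lambda_rewe=0.2):
        super().__init__()
        self.vocab_proj = nn.Linear(hidden_size, vocab_size)    # categorical prediction
        self.emb_proj = nn.Linear(hidden_size, emb_dim)          # continuous prediction
        self.lambda_rewe = lambda_rewe
        self.xent = nn.CrossEntropyLoss()

    def forward(self, decoder_states, gold_ids, gold_embeddings):
        # decoder_states: (batch, hidden); gold_ids: (batch,); gold_embeddings: (batch, emb_dim)
        ce_loss = self.xent(self.vocab_proj(decoder_states), gold_ids)
        pred_emb = self.emb_proj(decoder_states)
        # Regression term: 1 - cos(predicted embedding, gold embedding)
        rewe_loss = (1 - nn.functional.cosine_similarity(pred_emb, gold_embeddings, dim=-1)).mean()
        return ce_loss + self.lambda_rewe * rewe_loss
```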

Attentive Mimicking: Better Word Embeddings by Attending to Informative Contexts

Learning high-quality embeddings for rare words is a hard problem because of sparse context information.

Mimicking (Pinter et al., 2017) has been proposed as a solution: given embeddings learned by a standard algorithm, a model is first trained to reproduce embeddings of frequent words from their surface form and then used to compute embeddings for rare words.

In this paper, we introduce attentive mimicking: the mimicking model is given access not only to a word’s surface form, but also to all available contexts and learns to attend to the most informative and reliable contexts for computing an embedding.

In an evaluation on four tasks, we show that attentive mimicking outperforms previous work for both rare and medium-frequency words.

Thus, compared to previous work, attentive mimicking improves embeddings for a much larger part of the vocabulary, including the medium-frequency range.
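
A minimal numpy sketch of the attention step, assuming a surface-form embedding (e.g. built from character n-grams) and a set of context embeddings are already available; the dot-product scoring and the interpolation weight alpha are illustrative stand-ins for the learned components of the model.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attentive_mimicking_embedding(form_vec, context_vecs, alpha=0.5):
    """form_vec: (dim,) surface-form embedding; context_vecs: (n_contexts, dim)."""
    # Score each context (here by similarity to the form embedding, a stand-in for a
    # learned scoring function) and attend to the most informative ones.
    scores = context_vecs @ form_vec
    weights = softmax(scores)                      # (n_contexts,)
    context_summary = weights @ context_vecs       # weighted average of contexts
    # Interpolate the form-based and context-based estimates.
    return alpha * form_vec + (1 - alpha) * context_summary
```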

Better Word Embeddings by Disentangling Contextual n-Gram Information

Pre-trained word vectors are ubiquitous in Natural Language Processing applications.

In this paper, we show how training word embeddings jointly with bigram and even trigram embeddings results in improved unigram embeddings.

We claim that training word embeddings along with higher n-gram embeddings helps in the removal of the contextual information from the unigrams, resulting in better stand-alone word embeddings.

We empirically show the validity of our hypothesis by outperforming other competing word representation models by a significant margin on a wide variety of tasks.

We make our models publicly available.

Word-Node2Vec: Improving Word Embedding with Document-Level Non-Local Word Co-occurrences

A standard word embedding algorithm, such as word2vec or GloVe, makes the strong assumption that words are likely to be semantically related only if they co-occur locally within a window of fixed size.

However, this strong assumption may not capture the semantic association between words that co-occur frequently but non-locally within documents.

In this paper, we propose a graph-based word embedding method, named ‘word-node2vec’.

By relaxing the strong constraint of locality, our method is able to capture both the local and non-local co-occurrences.

Word-node2vec constructs a graph where every node represents a word and an edge between two nodes represents a combination of both local (e.g. word2vec) and document-level co-occurrences.

Our experiments show that word-node2vec outperforms word2vec and GloVe on a range of different tasks, such as predicting word-pair similarity, word analogy, and concept categorization.
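
A hedged sketch of how such a word graph could be assembled, with edge weights mixing window-based and document-level co-occurrence counts; the mixing weight and the exact counting scheme are illustrative, not necessarily the paper's.

```python
from collections import Counter
from itertools import combinations

def build_word_graph(docs, window=5, doc_weight=0.3):
    """docs: list of token lists. Returns {(w1, w2): weight} with w1 < w2."""
    local, global_ = Counter(), Counter()
    for doc in docs:
        # Local co-occurrences: pairs within a fixed-size sliding window (word2vec-style).
        for i, w in enumerate(doc):
            for v in doc[i + 1:i + 1 + window]:
                if w != v:
                    local[tuple(sorted((w, v)))] += 1
        # Document-level (non-local) co-occurrences: any pair of distinct words
        # appearing in the same document, counted once per document.
        for w, v in combinations(sorted(set(doc)), 2):
            global_[(w, v)] += 1
    edges = {}
    for pair in set(local) | set(global_):
        edges[pair] = local[pair] + doc_weight * global_[pair]
    return edges
```

The resulting weighted graph can then be embedded with a node-embedding method such as node2vec to obtain the word vectors.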

Density Matching for Bilingual Word Embedding

Recent approaches to cross-lingual word embedding have generally been based on linear transformations between the sets of embedding vectors in the two languages.

In this paper, we propose an approach that instead expresses the two monolingual embedding spaces as probability densities defined by a Gaussian mixture model, and matches the two densities using a method called normalizing flow.

The method requires no explicit supervision, and can be learned with only a seed dictionary of words that have identical strings.

We argue that this formulation has several intuitively attractive properties, particularly with respect to improving robustness and generalization to mappings between difficult language pairs or word pairs.

On a benchmark data set of bilingual lexicon induction and cross-lingual word similarity, our approach can achieve competitive or superior performance compared to state-of-the-art published results, with particularly strong results being found on etymologically distant and/or morphologically rich languages.

Learning Unsupervised Multilingual Word Embeddings with Incremental Multilingual Hubs

Recent research has discovered that a shared bilingual word embedding space can be induced by projecting monolingual word embedding spaces from two languages using a self-learning paradigm without any bilingual supervision.

However, it has also been shown that for distant language pairs such fully unsupervised self-learning methods are unstable and often get stuck in poor local optima due to reduced isomorphism between starting monolingual spaces.

In this work, we propose a new robust framework for learning unsupervised multilingual word embeddings that mitigates the instability issues.

We learn a shared multilingual embedding space for a variable number of languages by incrementally adding new languages one by one to the current multilingual space.

Through the gradual language addition the method can leverage the interdependencies between the new language and all other languages in the current multilingual space.

We find that it is beneficial to project more distant languages later in the iterative process.

Our fully unsupervised multilingual embedding spaces yield results that are on par with the state-of-the-art methods in the bilingual lexicon induction (BLI) task, and simultaneously obtain state-of-the-art scores on two downstream tasks: multilingual document classification and multilingual dependency parsing, outperforming even supervised baselines.

This finding also accentuates the need to establish evaluation protocols for cross-lingual word embeddings beyond the omnipresent intrinsic BLI task in future work.

VCWE: Visual Character-Enhanced Word Embeddings

Chinese uses a logographic writing system, and the shapes of its characters contain rich syntactic and semantic information.

In this paper, we propose a model to learn Chinese word embeddings via three-level composition:

(1) a convolutional neural network to extract the intra-character compositionality from the visual shape of a character;

(2) a recurrent neural network with self-attention to compose character representation into word embeddings;

(3) the Skip-Gram framework to capture non-compositionality directly from the contextual information.

Evaluations demonstrate the superior performance of our model on four tasks: word similarity, sentiment analysis, named entity recognition and part-of-speech tagging.
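
A compact PyTorch sketch of the first two composition levels (a CNN over character bitmaps, then a self-attentive recurrent composition over the character vectors); the dimensions and layer choices are assumptions for illustration, and the Skip-Gram objective over the resulting word vectors is omitted.

```python
import torch
import torch.nn as nn

class VCWEComposer(nn.Module):
    """Composes a word embedding from the bitmap images of its characters."""
    def __init__(self, emb_dim=100):
        super().__init__()
        # (1) Intra-character composition: CNN over a character's visual shape.
        self.char_cnn = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
        )
        self.char_proj = nn.Linear(32, emb_dim)
        # (2) Inter-character composition: RNN + self-attention over character vectors.
        self.rnn = nn.GRU(emb_dim, emb_dim, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * emb_dim, 1)
        self.out = nn.Linear(2 * emb_dim, emb_dim)

    def forward(self, char_images):
        # char_images: (num_chars, 1, H, W) bitmaps of the characters in one word.
        feats = self.char_cnn(char_images).flatten(1)         # (num_chars, 32)
        char_vecs = self.char_proj(feats).unsqueeze(0)        # (1, num_chars, emb_dim)
        states, _ = self.rnn(char_vecs)                       # (1, num_chars, 2*emb_dim)
        weights = torch.softmax(self.attn(states), dim=1)     # attention over characters
        word_vec = (weights * states).sum(dim=1)              # (1, 2*emb_dim)
        return self.out(word_vec).squeeze(0)                  # (emb_dim,) word embedding
```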

Improving Cross-Domain Chinese Word Segmentation with Word Embeddings

Cross-domain Chinese Word Segmentation (CWS) remains a challenge despite recent progress in neural-based CWS.

The limited amount of annotated data in the target domain has been the key obstacle to a satisfactory performance.

In this paper, we propose a semi-supervised word-based approach to improving cross-domain CWS given a baseline segmenter.

Particularly, our model only deploys word embeddings trained on raw text in the target domain, discarding complex hand-crafted features and domain-specific dictionaries.

Innovative subsampling and negative sampling methods are proposed to derive word embeddings optimized for CWS.

We conduct experiments on five datasets in special domains, covering novels, medicine, and patents.

Results show that our model can obviously improve cross-domain CWS, especially in the segmentation of domain-specific noun entities.

The word F-measure increases by over 3.0% on four datasets, outperforming state-of-the-art semi-supervised and unsupervised cross-domain CWS approaches by a large margin.

We make our data and code available on Github.

Misspelling Oblivious Word Embeddings

In this paper we present a method to learn word embeddings that are resilient to misspellings.

Existing word embeddings have limited applicability to malformed texts, which contain a non-negligible amount of out-of-vocabulary words.

We propose a method combining FastText with subwords and a supervised task of learning misspelling patterns.

In our method, misspellings of each word are embedded close to their correct variants.

We train these embeddings on a new dataset we are releasing publicly.

Finally, we experimentally show the advantages of this approach on both intrinsic and extrinsic NLP tasks using public test sets.
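
The abstract does not spell out the training objective, but one simple way to realize the stated goal, embedding misspellings close to their correct variants, is an auxiliary term that pulls the (subword-composed) vector of a misspelling toward the vector of its correct form. A hedged numpy sketch of such a term, which would be added with some weight to a fastText-style skip-gram loss:

```python
import numpy as np

def misspelling_loss_and_grad(misspelling_vec, correct_vec):
    """Squared-distance term pulling a misspelling's vector toward its correct form.
    Both inputs are (dim,) vectors; in a fastText-style model the misspelling's vector
    would itself be the sum of its character n-gram vectors."""
    diff = misspelling_vec - correct_vec
    loss = 0.5 * float(diff @ diff)
    return loss, diff  # gradient of the loss w.r.t. misspelling_vec
```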

Subword-based Compact Reconstruction of Word Embeddings

The idea of subword-based word embeddings has been proposed in the literature, mainly for solving the out-of-vocabulary (OOV) word problem observed in standard word-based word embeddings.

In this paper, we propose a method of reconstructing pre-trained word embeddings using subword information that can effectively represent a large number of subword embeddings in a considerably small fixed space.

The key techniques of our method are twofold: memory-shared embeddings and a variant of the key-value-query self-attention mechanism.

Our experiments show that our reconstructed subword-based embeddings can successfully imitate well-trained word embeddings in a small fixed space while preventing quality degradation across several linguistic benchmark datasets, and can simultaneously predict effective embeddings of OOV words.

We also demonstrate the effectiveness of our reconstruction method when we apply them to downstream tasks.
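
A minimal sketch of the reconstruction idea: the word acts as a query over a small shared memory of subword embeddings (key-value attention), and the attended sum approximates the original pretrained vector. The hashing scheme, memory sizes, and scoring function below are illustrative assumptions.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def reconstruct_word_vector(subwords, query_table, keys, values, num_buckets=50000):
    """subwords: character n-grams of the word.
    query_table: (num_buckets, d) memory-shared subword query vectors (hash buckets).
    keys, values: shared (m, d) key and value memories."""
    buckets = [hash(s) % num_buckets for s in subwords]
    query = query_table[buckets].mean(axis=0)   # (d,) query built from shared subword slots
    weights = softmax(keys @ query)             # attention over the shared value memory
    return weights @ values                     # approximation of the pretrained word vector
```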

Learning to Respond to Mixed-code Queries using Bilingual Word Embeddings

We present a method for learning bilingual word embeddings in order to support second language (L2) learners in finding recurring phrases and example sentences that match mixed-code queries (e.g., “接受 sentence”) composed of words in both the target language and the native language (L1).

In our approach, mixed-code queries are transformed into target language queries aimed at maximizing the probability of retrieving relevant target language phrases and sentences.

The method involves converting a given parallel corpus into mixed-code data, generating word embeddings from mixed-code data, and expanding queries in target languages based on bilingual word embeddings.

We present a prototype search engine, x.Linggle, which applies the method to a linguistic search engine for a parallel corpus. Preliminary evaluation on a list of common word translations shows that the method performs reasonably well.
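
A hedged sketch of the query-transformation step: L1 words in a mixed-code query are replaced by their nearest target-language neighbours in the shared bilingual embedding space. The data structures and the top-k expansion choice are illustrative.

```python
import numpy as np

def expand_mixed_code_query(tokens, l1_vectors, l2_vectors, l2_vocab, top_k=3):
    """tokens: mixed-code query; l1_vectors: dict L1 word -> (d,) vector in the shared space;
    l2_vectors: (V, d) matrix for the target-language vocabulary l2_vocab (list of words)."""
    norms = np.linalg.norm(l2_vectors, axis=1)
    expanded = []
    for tok in tokens:
        if tok not in l1_vectors:              # already a target-language word: keep it
            expanded.append(tok)
            continue
        q = l1_vectors[tok]
        sims = (l2_vectors @ q) / (norms * np.linalg.norm(q) + 1e-9)
        best = np.argsort(-sims)[:top_k]       # most similar target-language words
        expanded.extend(l2_vocab[i] for i in best)
    return expanded
```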

SWOW-8500: Word Association task for Intrinsic Evaluation of Word Embeddings

Downstream evaluation of pretrained word embeddings is expensive, all the more so for tasks where current state-of-the-art models are very large architectures.

Intrinsic evaluation using word similarity or analogy datasets, on the other hand, suffers from several disadvantages.

We propose a novel intrinsic evaluation task employing large word association datasets (particularly the Small World of Words dataset).

We observe correlations not just between performance on SWOW-8500 and previously proposed intrinsic tasks of word similarity prediction, but also with downstream tasks (e.g., text classification and natural language inference).

Most importantly, we report better confidence intervals for scores on our word association task, with no fall in correlation with downstream performance.
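
Evaluation on a word-association dataset is conceptually close to word-similarity evaluation: correlate embedding similarity with human association strength. A minimal sketch, where the (cue, response, strength) triple format is an assumption about how the dataset is stored:

```python
import numpy as np
from scipy.stats import spearmanr

def association_score(pairs, word_vectors):
    """pairs: list of (cue, response, association_strength); word_vectors: dict word -> (d,)."""
    gold, predicted = [], []
    for cue, response, strength in pairs:
        if cue in word_vectors and response in word_vectors:
            u, v = word_vectors[cue], word_vectors[response]
            cos = float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))
            gold.append(strength)
            predicted.append(cos)
    # Spearman correlation between human association strength and embedding similarity.
    rho, _pvalue = spearmanr(gold, predicted)
    return rho
```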

Word Representation

Search keyword: Word Representation

A Systematic Study of Leveraging Subword Information for Learning Word Representations

The use of subword-level information (e.g., characters, character n-grams, morphemes) has become ubiquitous in modern word representation learning.

Its importance is attested especially for morphologically rich languages which generate a large number of rare words.

Despite a steadily increasing interest in such subword-informed word representations, their systematic comparative analysis across typologically diverse languages and different tasks is still missing.

In this work, we deliver such a study focusing on the variation of two crucial components required for subword-level integration into word representation models:

1) segmentation of words into subword units, and 2) subword composition functions to obtain final word representations.

We propose a general framework for learning subword-informed word representations that allows for easy experimentation with different segmentation and composition components, also including more advanced techniques based on position embeddings and self-attention.

Using the unified framework, we run experiments over a large number of subword-informed word representation configurations (60 in total) on 3 tasks (general and rare word similarity, dependency parsing, fine-grained entity typing) for 5 languages representing 3 language types.

Our main results clearly indicate that there is no “one-size-fits-all” configuration, as performance is both language- and task-dependent.

We also show that configurations based on unsupervised segmentation (e.g., BPE, Morfessor) are sometimes comparable to or even outperform the ones based on supervised word segmentation.
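
The two components are easy to picture in code. Below is a hedged skeleton of such a framework, with a pluggable segmentation function and two simple composition functions (plain addition and a position-weighted sum); these stand-ins are illustrative, not the exact variants studied in the paper.

```python
import numpy as np

def segment_char_ngrams(word, n=3):
    """One possible segmentation: character n-grams with boundary markers (fastText-style)."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]

def compose_sum(subword_vecs):
    """Simplest composition: elementwise sum of the subword embeddings."""
    return np.sum(subword_vecs, axis=0)

def compose_position_weighted(subword_vecs, position_weights):
    """Each position gets an elementwise weight vector of the same shape as the embeddings."""
    return np.sum([w * v for w, v in zip(position_weights, subword_vecs)], axis=0)

def word_representation(word, subword_table, dim=300,
                        segment=segment_char_ngrams, compose=compose_sum):
    """subword_table: dict subword -> (dim,) vector; unknown subwords fall back to zeros."""
    vecs = [subword_table.get(s, np.zeros(dim)) for s in segment(word)]
    return compose(vecs)
```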

Gating Mechanisms for Combining Character and Word-level Word Representations: an Empirical Study

In this paper we study how different ways of combining character and word-level representations affect the quality of both final word and sentence representations.

We provide strong empirical evidence that modeling characters improves the learned representations at the word and sentence levels, and that doing so is particularly useful when representing less frequent words.

We further show that a feature-wise sigmoid gating mechanism is a robust method for creating representations that encode semantic similarity, as it performed reasonably well in several word similarity datasets.

Finally, our findings suggest that properly capturing semantic similarity at the word level does not consistently yield improved performance in downstream sentence-level tasks.
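
The gating mechanism itself is compact. A minimal PyTorch sketch of a feature-wise sigmoid gate combining a word-level vector and a character-level vector of the same dimensionality (the layer name and initialization are illustrative):

```python
import torch
import torch.nn as nn

class FeatureWiseGate(nn.Module):
    """v = g * word_vec + (1 - g) * char_vec, with the gate g computed per dimension."""
    def __init__(self, dim):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, word_vec, char_vec):
        # word_vec, char_vec: (batch, dim)
        g = torch.sigmoid(self.gate(torch.cat([word_vec, char_vec], dim=-1)))
        return g * word_vec + (1 - g) * char_vec
```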

*2vec

Search keyword: 2vec

pair2vec: Compositional Word-Pair Embeddings for Cross-Sentence Inference

Reasoning about implied relationships (e.g. paraphrastic, common sense, encyclopedic) between pairs of words is crucial for many cross-sentence inference problems.

This paper proposes new methods for learning and using embeddings of word pairs that implicitly represent background knowledge about such relationships.

Our pairwise embeddings are computed as a compositional function of each word’s representation, which is learned by maximizing the pointwise mutual information (PMI) with the contexts in which the two words co-occur.

We add these representations to the cross-sentence attention layer of existing inference models (e.g. BiDAF for QA, ESIM for NLI), instead of extending or replacing existing word embeddings.

Experiments show a gain of 2.7% on the recently released SQuAD 2.0 and 1.3% on MultiNLI.

Our representations also aid in better generalization with gains of around 6-7% on adversarial SQuAD datasets, and 8.8% on the adversarial entailment test set by Glockner et al. (2018).
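
A hedged sketch of the pair-composition step: the pair representation is a small MLP over the two word vectors and their elementwise product (the exact composition in the paper may differ). In training, this pair vector would be scored against an encoding of the contexts in which the two words co-occur, with negative sampling approximating the PMI objective.

```python
import torch
import torch.nn as nn

class PairComposer(nn.Module):
    """R(x, y): compositional representation of a word pair."""
    def __init__(self, emb_dim, pair_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 * emb_dim, pair_dim), nn.ReLU(),
            nn.Linear(pair_dim, pair_dim),
        )

    def forward(self, x_vec, y_vec):
        # x_vec, y_vec: (batch, emb_dim) embeddings of the two words in each pair.
        features = torch.cat([x_vec, y_vec, x_vec * y_vec], dim=-1)
        return self.mlp(features)
```

At inference time, the resulting pair vectors are concatenated into the cross-sentence attention layer of the downstream model rather than replacing its word embeddings.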

Augmenting word2vec with latent Dirichlet allocation within a clinical application

This paper presents three hybrid models that directly combine latent Dirichlet allocation and word embedding for distinguishing between speakers with and without Alzheimer’s disease from transcripts of picture descriptions.

Two of our models achieve F-scores above the current state of the art among automatic methods on the DementiaBank dataset.
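
One straightforward hybrid in this spirit, not necessarily the authors' exact models, concatenates each transcript's LDA topic distribution with its averaged word vectors and feeds the result to a classifier. A hedged sketch using gensim (version 4 API assumed) and scikit-learn:

```python
import numpy as np
from gensim.corpora import Dictionary
from gensim.models import LdaModel, Word2Vec
from sklearn.linear_model import LogisticRegression

def build_features(docs, num_topics=10, vector_size=100):
    """docs: list of token lists (one per transcript). Returns an array of hybrid features."""
    dictionary = Dictionary(docs)
    bows = [dictionary.doc2bow(d) for d in docs]
    lda = LdaModel(bows, num_topics=num_topics, id2word=dictionary, random_state=0)
    w2v = Word2Vec(sentences=docs, vector_size=vector_size, min_count=1, seed=0)

    feats = []
    for doc, bow in zip(docs, bows):
        topic_vec = np.zeros(num_topics)
        for topic_id, prob in lda.get_document_topics(bow, minimum_probability=0.0):
            topic_vec[topic_id] = prob
        word_vecs = [w2v.wv[w] for w in doc if w in w2v.wv]
        avg_vec = np.mean(word_vecs, axis=0) if word_vecs else np.zeros(vector_size)
        feats.append(np.concatenate([topic_vec, avg_vec]))   # LDA topics + averaged word2vec
    return np.vstack(feats)

# Usage sketch: X = build_features(transcripts); clf = LogisticRegression(max_iter=1000).fit(X, labels)
```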

vector

Search keyword: vector

Vector of Locally Aggregated Embeddings for Text Representation

We present Vector of Locally Aggregated Embeddings (VLAE) for effective and, ultimately, lossless representation of textual content.

Our model encodes each input text by effectively identifying and integrating the representations of its semantically-relevant parts.

The proposed model generates high quality representation of textual content and improves the classification performance of current state-of-the-art deep averaging networks across several text classification tasks.

Reposted from: https://www.cnblogs.com/fengyubo/p/11052269.html
