Tailoring Word Embeddings for Dependency Parsing

Improving Word2Vec for Dependency Parsing

License: CC 4.0 BY-SA

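This post summarizes a modified Word2Vec setup aimed at dependency parsing. The window size and the sampling strategy are adjusted so that the word vectors capture more part-of-speech (POS) information. The reported experiments indicate that these changes improve how well the word vectors perform on a POS-tagging-based evaluation.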

Reference: Bansal M, Gimpel K, Livescu K. Tailoring Continuous Word Representations for Dependency Parsing[C]//ACL (2). 2014: 809-815.

Modifications

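(1) The paper uses a relatively small window size w. The experiments show that in word2vec a larger w makes the vectors more likely to capture word semantics, while a smaller w makes them more likely to capture a word's POS.
(2) The sampling targets in negative sampling. Standard word2vec samples from the neighborhood (context window) of the target word $v$, whereas this paper samples from a specific set of words related to $v$ in the dependency tree: the grandparent, parent, and children of the target word.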
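The difference between the two kinds of context can be illustrated with a minimal sketch. It only shows how (target, context) pairs could be collected, either from a linear window of size w or from the grandparent, parent, and children in a dependency tree. The function names, the head-index encoding (`-1` for the root), and the toy sentence are assumptions made for illustration; this is not the paper's actual pipeline, and the training step itself is omitted.

```python
from typing import List, Tuple


def window_contexts(tokens: List[str], w: int) -> List[Tuple[str, str]]:
    """Collect (target, context) pairs from a symmetric window of size w."""
    pairs = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - w), min(len(tokens), i + w + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


def dependency_contexts(tokens: List[str], heads: List[int]) -> List[Tuple[str, str]]:
    """Collect (target, context) pairs from parent, grandparent, and children.

    heads[i] is the index of the head of token i, or -1 for the root
    (a hypothetical encoding chosen for this sketch).
    """
    children = [[] for _ in tokens]
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)

    pairs = []
    for i, target in enumerate(tokens):
        parent = heads[i]
        if parent >= 0:
            pairs.append((target, tokens[parent]))
            grandparent = heads[parent]
            if grandparent >= 0:
                pairs.append((target, tokens[grandparent]))
        for c in children[i]:
            pairs.append((target, tokens[c]))
    return pairs


# Toy sentence with a hand-written dependency tree (illustrative only).
tokens = ["the", "cat", "chased", "a", "mouse"]
heads = [1, 2, -1, 4, 2]  # "chased" is the root

print(window_contexts(tokens, w=1))
print(dependency_contexts(tokens, heads))
```

With w=1, "chased" only sees its linear neighbors "cat" and "a", while the dependency-based contexts pair it with its children "cat" and "mouse".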

Experimental Setup

Evaluation metric 1: We compute cosine similarity between the two vectors in each word pair, then order the word pairs by similarity and compute Spearman’s rank correlation coefficient (ρ) with the gold similarities.
Evaluation metric 2: We use a metric based on unsupervised evaluation of POS taggers, and perform clustering and map each cluster to one POS tag so as to maximize tagging accuracy.
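Both metrics can be sketched in a few lines of plain Python. The sketch below assumes that the word vectors are given as a dictionary, that the gold similarities come as (word, word, score) triples, and that cluster ids for the second metric have already been produced by some clustering step that is not shown. It also assumes a many-to-one mapping (each cluster goes to its most frequent gold tag), which is one common way to maximize accuracy; the quoted description does not spell out the mapping. All names and toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def average_ranks(values):
    """Ranks starting at 1; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho, computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def similarity_score(vectors, gold_pairs):
    """Metric 1: correlate cosine similarities with gold similarity scores."""
    predicted = [cosine(vectors[a], vectors[b]) for a, b, _ in gold_pairs]
    gold = [score for _, _, score in gold_pairs]
    return spearman(predicted, gold)


def many_to_one_accuracy(cluster_ids, gold_tags):
    """Metric 2 (assumed many-to-one): map each cluster to its majority tag."""
    counts = defaultdict(Counter)
    for c, t in zip(cluster_ids, gold_tags):
        counts[c][t] += 1
    mapping = {c: tags.most_common(1)[0][0] for c, tags in counts.items()}
    correct = sum(mapping[c] == t for c, t in zip(cluster_ids, gold_tags))
    return mapping, correct / len(gold_tags)


# Toy data (illustrative only, not taken from the paper).
vectors = {"cat": [1.0, 0.0], "dog": [0.9, 0.1], "car": [0.0, 1.0], "truck": [0.2, 0.8]}
gold_pairs = [("cat", "dog", 9.0), ("car", "truck", 8.5), ("cat", "car", 1.0), ("dog", "truck", 2.0)]
print(similarity_score(vectors, gold_pairs))

cluster_ids = [0, 0, 0, 1, 1, 2]
gold_tags = ["NOUN", "NOUN", "VERB", "VERB", "VERB", "DET"]
print(many_to_one_accuracy(cluster_ids, gold_tags))
```

The rank correlation only depends on the ordering of the word pairs, so rescaling the vectors or the gold scores leaves the first metric unchanged.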