CS224N-Notes02-GloVe, Evaluation and Training

CS224n: Natural Language Processing with Deep Learning
Lecture Notes: Part 2
Authors: Francois Chaubard et al.

This set of notes first introduces the GloVe model for training word vectors. It then extends our discussion of word vectors (interchangeably called word embeddings) by seeing how they can be evaluated intrinsically and extrinsically. As we proceed, we discuss the example of word analogies as an intrinsic evaluation technique and how it can be used to tune word embedding techniques. We then discuss training model weights/parameters and word vectors for extrinsic tasks. Lastly, we motivate artificial neural networks as a class of models for natural language processing tasks.

1. Global Vectors for Word Representation
1.1 Comparison with Previous Methods

So far, we have looked at two main classes of methods to find word embeddings. The first set are count-based and rely on matrix factorization. While these methods effectively leverage global statistical information, they are primarily used to capture word similarities and do poorly on tasks such as analogy, indicating a sub-optimal vector space structure. The other set of methods are shallow window-based (e.g. the skip-gram and CBOW models), which learn word embeddings by making predictions in local context windows. These models demonstrate the capacity to capture complex linguistic patterns beyond word similarity, but fail to make use of the global co-occurrence statistics.
In comparison, GloVe consists of a weighted least squares model that trains on global word-word co-occurrence counts and thus makes efficient use of statistics. The model produces a word vector space with meaningful sub-structure. It shows state-of-the-art performance on the word analogy task, and outperforms other current methods on several word similarity tasks.

1.2 Co-occurrence Matrix

Let $X$ denote the word-word co-occurrence matrix, where $X_{ij}$ indicates the number of times word $j$ occurs in the context of word $i$. Let $X_i = \sum_k X_{ik}$ be the number of times any word $k$ appears in the context of word $i$. Finally, let $P_{ij} = P(w_j \mid w_i) = \frac{X_{ij}}{X_i}$ be the probability of word $j$ appearing in the context of word $i$.
Populating this matrix requires a single pass through the entire corpus to collect the statistics. For a large corpus, this pass can be computationally expensive, but it is a one-time up-front cost.
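As a rough illustration of this counting pass, here is a minimal sketch in Python (not code from the GloVe authors; the function name build_cooccurrence, the window_size parameter, and the toy corpus are invented for this example):

from collections import defaultdict

def build_cooccurrence(corpus, window_size=2):
    # X[(i, j)] counts how many times word j appears within
    # window_size tokens of center word i, over the whole corpus.
    X = defaultdict(float)
    for sentence in corpus:
        for center, word_i in enumerate(sentence):
            lo = max(0, center - window_size)
            hi = min(len(sentence), center + window_size + 1)
            for ctx in range(lo, hi):
                if ctx != center:
                    X[(word_i, sentence[ctx])] += 1.0
    return X

# Toy usage: two pre-tokenized sentences.
corpus = [["deep", "learning", "for", "nlp"],
          ["glove", "vectors", "for", "nlp"]]
X = build_cooccurrence(corpus)
print(X[("for", "nlp")])  # how often "nlp" appears in the context of "for"

From such counts, the quantities $X_i$ and $P_{ij}$ defined above follow by summing over and normalizing each row.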

1.3 Least Squares Objective

Recall that for the skip-gram model, we use a softmax to compute the probability of word $j$ appearing in the context of word $i$.
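Writing $\vec{v}_i$ for the input (center) vector of word $i$, $\vec{u}_j$ for the output (context) vector of word $j$, and $W$ for the vocabulary size (notation assumed here for concreteness), this softmax takes the form

$$Q_{ij} = \frac{\exp(\vec{u}_j^\top \vec{v}_i)}{\sum_{w=1}^{W} \exp(\vec{u}_w^\top \vec{v}_i)}$$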

Training proceeds in an online, stochastic fashion, but the implied global cross-entropy loss can be calculated as:
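(in the notation assumed above, with $\text{context}(i)$ denoting the words in the window around word $i$)

$$J = -\sum_{i \in \text{corpus}} \sum_{j \in \text{context}(i)} \log Q_{ij}$$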

As the same words $i$ and $j$ can appear multiple times in the corpus, it is more efficient to first group together the same values for $i$ and $j$:
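(again under the assumed notation, with $W$ the vocabulary size)

$$J = -\sum_{i=1}^{W} \sum_{j=1}^{W} X_{ij} \log Q_{ij}$$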

where the co-occurring frequency is given by the co-occurrence matrix $X$. One significant drawback of the cross-entropy loss is that it requires the distribution $Q$ to be properly normalized, which involves an expensive summation over the entire vocabulary. Instead, we use a least squares objective in which the normalization factors in $P$ and $Q$ are discarded:
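(one possible formulation, with $\hat{P}_{ij} = X_{ij}$ and $\hat{Q}_{ij} = \exp(\vec{u}_j^\top \vec{v}_i)$ denoting the unnormalized counterparts of $P_{ij}$ and $Q_{ij}$ under the notation assumed above)

$$\hat{J} = \sum_{i=1}^{W} \sum_{j=1}^{W} X_i \left( \hat{P}_{ij} - \hat{Q}_{ij} \right)^2$$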

1.4 Conclusion

In conclusion, the GloVe model efficiently leverages global statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, and produces a vector space with meaningful sub-structure. It consistently outperforms word2vec on the word analogy task, given the same corpus, vocabulary, window size, and training time. It achieves better results faster, and also obtains the best results irrespective of speed.

2. Evaluation of Word Vectors

(To be completed)
