[cs224n] Lecture 2 – Word Vectors and Word Senses

1. Review: Main idea of word2vec

Word2vec parameters and computations

Word2vec maximizes objective function by putting similar words nearby in space
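As a quick refresher of that objective (using the notation from Lecture 1): for each position $t$ in a corpus of $T$ words, skip-gram predicts the words within a window of radius $m$ around the center word. Writing $v_w$ for a word's center vector and $u_w$ for its outside vector, the model and loss are

$$
P(o \mid c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in V} \exp(u_w^\top v_c)},
\qquad
J(\theta) = -\frac{1}{T}\sum_{t=1}^{T}\sum_{\substack{-m \le j \le m \\ j \ne 0}} \log P(w_{t+j} \mid w_t;\, \theta),
$$

so words that appear in similar contexts are driven toward similar vectors.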

 

2. Optimization: Gradient Descent

Gradient Descent
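The vanilla update rule: take a small step in the direction of the negative gradient, with step size (learning rate) $\alpha$:

$$
\theta^{\text{new}} = \theta^{\text{old}} - \alpha \, \nabla_{\theta} J(\theta)
$$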

Stochastic Gradient Descent

Stochastic gradients with word vectors!
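The practical point here is that each window contains only a handful of words, so the stochastic gradient is very sparse: only the center vector and the (sampled) outside vectors receive updates. Below is a minimal sketch of one such step for skip-gram with $K$ negative samples; the arrays `U` (outside vectors) and `W` (center vectors), the uniform negative sampler, and all sizes are illustrative assumptions, not the HW2 starter code.

```python
import numpy as np

rng = np.random.default_rng(0)
V, d = 10_000, 100                           # vocab size, embedding dim (toy values)
U = rng.normal(scale=0.01, size=(V, d))      # "outside" word vectors
W = rng.normal(scale=0.01, size=(V, d))      # "center" word vectors
lr, K = 0.05, 5                              # learning rate, negative samples

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(center, outside):
    """One SGD step on a single (center, outside) training pair."""
    negs = rng.integers(0, V, size=K)        # real impls sample ∝ unigram^(3/4)
    v_c = W[center]
    # positive pair: pull u_o toward v_c
    g = sigmoid(U[outside] @ v_c) - 1.0
    grad_vc = g * U[outside]
    U[outside] -= lr * g * v_c
    # negative pairs: push each sampled u_k away from v_c
    for k in negs:
        g_k = sigmoid(U[k] @ v_c)
        grad_vc += g_k * U[k]
        U[k] -= lr * g_k * v_c
    W[center] -= lr * grad_vc                # only the touched rows change
```

Real implementations update only these touched rows (or use sparse matrices) rather than materializing a full dense gradient over the embedding matrices.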

1b. Word2vec: More details

So far, we have looked at two main classes of methods for learning word embeddings. The first class is count-based and relies on matrix factorization (e.g. LSA, HAL). These methods make effective use of global statistical information, but they primarily capture word similarity and do poorly on tasks such as word analogy, indicating a sub-optimal vector space structure. The second class consists of shallow, window-based methods (e.g. the skip-gram and CBOW models), which learn word embeddings by making predictions in local context windows. These models can capture complex linguistic patterns beyond word similarity, but fail to make use of the global co-occurrence statistics.

The skip-gram model with negative sampling (HW2)
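For reference, this is the loss implemented in HW2: with $\sigma(x) = 1/(1+e^{-x})$, an observed center/outside pair $(c, o)$, and $K$ negative words $w_1, \dots, w_K$ drawn from a noise distribution (in practice the unigram distribution raised to the $3/4$ power),

$$
J_{\text{neg-sample}}(v_c, o, U) = -\log \sigma(u_o^\top v_c) - \sum_{k=1}^{K} \log \sigma(-u_{w_k}^\top v_c).
$$

Minimizing the first term pulls the true outside word toward the center word; the second term pushes the sampled noise words away.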

In comparison, GloVe consists of a weighted least squares model that trains on global word-word co-occurrence counts and thus makes efficient use of statistics. The model produces a word vector space with meaningful sub-structure. It shows state-of-the-art performance on the word analogy task, and outperforms other current methods on several word similarity tasks. 
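Concretely, with $X_{ij}$ the number of times word $j$ occurs in the context of word $i$, GloVe fits word vectors $w_i$ and context vectors $\tilde{w}_j$ (plus biases) so that their dot product approximates the log co-occurrence count:

$$
J = \sum_{i,j=1}^{|V|} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2,
$$

where the weighting function $f$ down-weights rare pairs and caps the influence of very frequent ones (the paper uses $f(x) = (x/x_{\max})^{3/4}$ for $x < x_{\max}$ and $1$ otherwise).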


3. But why not capture co-occurrence counts directly?

Example: Window based co-occurrence matrix

Window based co-occurrence matrix
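A minimal sketch of how such a matrix is built, using the three-sentence toy corpus from this slide and a symmetric window of size 1 (the variable names are my own):

```python
import numpy as np

corpus = ["I like deep learning .",
          "I like NLP .",
          "I enjoy flying ."]
window = 1                                   # symmetric window of size 1

tokens = [s.split() for s in corpus]
vocab = sorted({w for sent in tokens for w in sent})
idx = {w: i for i, w in enumerate(vocab)}

# count how often each word pair co-occurs within the window
X = np.zeros((len(vocab), len(vocab)), dtype=int)
for sent in tokens:
    for i, w in enumerate(sent):
        for j in range(max(0, i - window), min(len(sent), i + window + 1)):
            if j != i:
                X[idx[w], idx[sent[j]]] += 1
```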

Problems with simple co-occurrence vectors

Solution: Low dimensional vectors

Method 1: Dimensionality Reduction on X (HW1)

Simple SVD word vectors in Python
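The slide's demo does essentially the following: a full SVD of the (small, dense) co-occurrence matrix, keeping the first two singular vectors as 2-D word coordinates. A hedged reconstruction, reusing `X` and `vocab` from the sketch above:

```python
import numpy as np
import matplotlib.pyplot as plt

# Factorize the co-occurrence matrix and keep the top two
# singular directions as 2-D word vectors.
U, s, Vt = np.linalg.svd(X.astype(float), full_matrices=False)

for w, (x, y) in zip(vocab, U[:, :2]):
    plt.text(x, y, w)
plt.xlim(U[:, 0].min() - 0.5, U[:, 0].max() + 0.5)
plt.ylim(U[:, 1].min() - 0.5, U[:, 1].max() + 0.5)
plt.show()
```

For a realistic vocabulary a dense SVD is too expensive (cost scales quadratically in the matrix dimension), so one would use a truncated sparse SVD such as `scipy.sparse.linalg.svds` instead.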

Hacks to X (several used in Rohde et al. 2005)

Interesting syntactic patterns emerge in the vectors

Count based vs. direct prediction

How to evaluate word vectors?

Intrinsic word vector evaluation

GloVe Visualizations

GloVe Visualizations: Company - CEO

GloVe Visualizations: Superlatives

Details of intrinsic word vector evaluation
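These evaluations use the analogy test "a is to b as c is to ?", answered by the vocabulary word whose vector is most cosine-similar to $x_b - x_a + x_c$ (excluding the three query words). A sketch, assuming a row-normalized embedding matrix `E`, a word-to-row dict `idx`, and a row-to-word list `vocab` (all hypothetical names):

```python
import numpy as np

def analogy(a, b, c, E, idx, vocab):
    """Return the answer to 'a : b :: c : ?' by cosine similarity."""
    q = E[idx[b]] - E[idx[a]] + E[idx[c]]
    q /= np.linalg.norm(q)
    scores = E @ q                       # cosine similarity: rows of E are unit norm
    for w in (a, b, c):                  # exclude the query words themselves
        scores[idx[w]] = -np.inf
    return vocab[int(np.argmax(scores))]
```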

Analogy evaluation and hyperparameters

Another intrinsic word vector evaluation

Closest words to “Sweden” (cosine similarity)
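The "closest words" lists on this slide come from a plain nearest-neighbour query under cosine similarity; a sketch with the same hypothetical `E`/`idx`/`vocab` as above:

```python
import numpy as np

def nearest(word, E, idx, vocab, n=10):
    """Top-n neighbours of `word` by cosine similarity."""
    En = E / np.linalg.norm(E, axis=1, keepdims=True)
    scores = En @ En[idx[word]]
    scores[idx[word]] = -np.inf          # skip the query word itself
    top = np.argsort(-scores)[:n]
    return [(vocab[i], float(scores[i])) for i in top]
```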

Correlation evaluation
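Here an embedding is scored by how well its cosine similarities rank-correlate (Spearman) with human similarity judgments on datasets such as WordSim-353. A sketch, where `pairs` is a hypothetical list of `(word1, word2, human_score)` triples:

```python
import numpy as np
from scipy.stats import spearmanr

def correlation_eval(pairs, E, idx):
    """Spearman correlation between model and human similarity scores."""
    model, human = [], []
    for w1, w2, score in pairs:
        if w1 in idx and w2 in idx:      # skip out-of-vocabulary pairs
            v1, v2 = E[idx[w1]], E[idx[w2]]
            model.append(v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2)))
            human.append(score)
    return spearmanr(model, human).correlation
```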

Word senses and word sense ambiguity

Example: “pike”

Improving Word Representations Via Global Context And Multiple Word Prototypes (Huang et al. 2012)

Linear Algebraic Structure of Word Senses, with Applications to Polysemy
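The result highlighted here (Arora et al., TACL 2018) is that the single vector a model like word2vec learns for a polysemous word behaves like a frequency-weighted linear superposition of its sense vectors, e.g. for three senses of "pike":

$$
v_{\text{pike}} = \alpha_1 v_{\text{pike}_1} + \alpha_2 v_{\text{pike}_2} + \alpha_3 v_{\text{pike}_3},
\qquad
\alpha_i = \frac{f_i}{f_1 + f_2 + f_3},
$$

where $f_i$ is the corpus frequency of sense $i$. Surprisingly, ideas from sparse coding can often recover the individual sense vectors from this superposition.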

Extrinsic word vector evaluation

Course plan: coming weeks

Office Hours / Help sessions
