# CS224D Lecture 2 Notes

1. Video/Slides

simple co-occurrence vectors -> Method 1: SVD -> SVD in Python -> interesting semantic patterns emerge in the vectors -> problems with SVD -> Method 2: word2vec -> Main idea of word2vec -> Details of word2vec -> interesting semantic patterns emerge in the vectors of word2vec

a) Discrete representation of words

b) Distributional similarity based representations

c) Make neighbors represent words

1. With a co-occurrence matrix

2. Method 1: Dimensionality reduction on CM

SVD captures the variation in a matrix well, which makes it useful for both denoising and dimensionality reduction.

```python
import numpy as np
import matplotlib.pyplot as plt

la = np.linalg
words = ["I", "like", "enjoy", "deep", "learning", "NLP", "flying", "."]
# Window-based co-occurrence matrix over the toy corpus from the slides
X = np.array([[0, 2, 1, 0, 0, 0, 0, 0],
              [2, 0, 0, 1, 0, 1, 0, 0],
              [1, 0, 0, 0, 0, 0, 1, 0],
              [0, 1, 0, 0, 1, 0, 0, 0],
              [0, 0, 0, 1, 0, 0, 0, 1],
              [0, 1, 0, 0, 0, 0, 0, 1],
              [0, 0, 1, 0, 0, 0, 0, 1],
              [0, 0, 0, 0, 1, 1, 1, 0]])
U, s, Vh = la.svd(X, full_matrices=False)

# Plot each word at its coordinates in the first two left singular vectors
for i in range(len(words)):
    plt.text(U[i, 0], U[i, 1], words[i])
plt.axis([-1, 1, -1, 1])
plt.show()
```

3. Hacks to CM

4. Interesting semantic patterns emerge in the vectors

5. Problems with SVD

6. Main idea of word2vec

word2vec differs from the previous approach. Before, the co-occurrence matrix was cast in one go; word2vec instead builds its model incrementally, one word at a time. First initialize the parameters, then use them to predict the context of each incoming word: if the predicted context resembles the ground truth, leave the parameters unchanged; if not, penalize them. (A bit like training a puppy: feed it when it does the right thing, scold it when it doesn't.)

7. Details of word2vec

p(w_o|w_t) is expressed as a softmax. Honestly, I haven't figured out why it is written this way; according to the syllabus the next lecture will cover it.
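Mechanically, the softmax scores each candidate context word o by the dot product of its output vector u_o with the center word's input vector v_t, then normalizes over the vocabulary. The random vectors below are placeholders just to show the computation, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(42)
V, dim = 6, 4
U = rng.normal(size=(V, dim))  # output vectors u_w, one row per word
v_t = rng.normal(size=dim)     # input vector of the center word w_t

scores = U @ v_t                   # u_w . v_t for every word w
p = np.exp(scores - scores.max())  # subtract the max for numerical stability
p /= p.sum()                       # p[o] = p(w_o | w_t), sums to 1
```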

8. Interesting semantic patterns emerge in the vectors of word2vec

9. Introduction to GloVe

2. Lecture Notes 1

Sections 1 through 3 of Lecture Note 1 cover material already presented in the videos and slides of the first two lectures, so I won't repeat it here.

a) Language Models

b) Continuous Bag of Words Model (CBOW)

CBOW computes the center word from its context. It has several important parameters: the input word matrix (W_1) and the output word matrix (W_2). w_i, the i-th column of W_1, is the input vector of the i-th word; v_i, the i-th row of W_2, is the output vector of the i-th word.
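A minimal sketch of the CBOW forward pass under the notation above: average the input vectors of the context words, then score every vocabulary word with the output matrix. The vocabulary size, dimension, and context indices are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
V, dim = 8, 5
W1 = rng.normal(size=(dim, V))  # input word matrix: column i = input vector of word i
W2 = rng.normal(size=(V, dim))  # output word matrix: row i = output vector of word i

context = [1, 3, 5, 7]           # indices of the context words
h = W1[:, context].mean(axis=1)  # average the context words' input vectors
scores = W2 @ h                  # one score per candidate center word
p = np.exp(scores - scores.max())
p /= p.sum()                     # softmax: p[i] = p(center word = i | context)
```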

c) Skip-gram Model

As I understand it, skip-gram is essentially CBOW run in reverse: it computes the context from the center word. The six computation steps are also similar, except that step 3 no longer averages the computed vectors; the center word's vector is assigned to h directly. The objective function, like CBOW's, uses a softmax.
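The one-line difference from CBOW can be made concrete with the same placeholder matrices as before (shapes and the center-word index are illustrative assumptions): h is simply the center word's input vector, and the identical softmax then scores every candidate context word.

```python
import numpy as np

rng = np.random.default_rng(2)
V, dim = 8, 5
W1 = rng.normal(size=(dim, V))  # input word matrix
W2 = rng.normal(size=(V, dim))  # output word matrix

center = 4
h = W1[:, center]               # step 3: h is the center word's vector, no averaging
scores = W2 @ h
p = np.exp(scores - scores.max())
p /= p.sum()                    # p[o] = p(context word = o | center)
```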

d) Negative sampling

Negative sampling is essentially an improved version of the skip-gram model. It changes three things: the objective function, the gradients, and the update rules.

e.g. 0.9^(3/4) ≈ 0.92, 0.09^(3/4) ≈ 0.16, 0.01^(3/4) ≈ 0.032; raising the unigram frequencies to the 3/4 power boosts rare words relative to frequent ones before renormalizing.
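The 3/4-power trick from the example above can be checked directly; the three frequencies are the ones in the example, and the sample count is arbitrary.

```python
import numpy as np

freqs = np.array([0.9, 0.09, 0.01])  # unigram frequencies
powered = freqs ** 0.75              # the 3/4-power transform
p_neg = powered / powered.sum()      # negative-sampling distribution

# The rarest word's share grows from 1% to roughly 2.8%
rng = np.random.default_rng(3)
negatives = rng.choice(len(freqs), size=5, p=p_neg)  # draw 5 negative samples
```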

3. Paper: Distributed Representations of Words and Phrases and their Compositionality

4. Paper: Efficient Estimation of Word Representations in Vector Space
