Word2Vec
Abstract
Similar structure to NNLM, but focused on word embeddings.
Two learning approaches:
Continuous Bag of Words (CBOW)
Continuous Skip-gram (Skip-gram)
1. CBOW
Given its context w(t-N), ..., w(t-1), w(t+1), ..., w(t+N), predict the center word w(t).
Use softmax after projection to get the probability of the center word w(t) (see the sketch below).
"The meaning of a word depends on its context." -- NLP Group
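A minimal numpy sketch of the CBOW forward pass, assuming a toy vocabulary size V and embedding dimension D (both hypothetical): it averages the context vectors, projects them, and applies the softmax described above.

```python
import numpy as np

V, D = 10_000, 100                            # hypothetical vocab size / dimension
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, D))    # input (projection) embeddings
W_out = rng.normal(scale=0.01, size=(V, D))   # output (softmax) weights

def cbow_probs(context_ids):
    """P(center word | context): average the context vectors, then softmax."""
    h = W_in[context_ids].mean(axis=0)        # projection: mean of context vectors
    scores = W_out @ h                        # one score per vocabulary word
    exp = np.exp(scores - scores.max())       # numerically stable softmax
    return exp / exp.sum()

# Context w(t-2), w(t-1), w(t+1), w(t+2) given as (made-up) vocabulary indices.
probs = cbow_probs(np.array([3, 17, 42, 8]))
print(probs.sum())                            # 1.0: a distribution over all V words
```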
2. Skip-gram
Training is slower than CBOW (each center word generates multiple training pairs), though it tends to work better for infrequent words.
Given the word w(t), predict its context w(t-N), ..., w(t-1), w(t+1), ..., w(t+N), where N = (n-1)/2 for a window of size n.
Use softmax after projection to get the probability of each context word, e.g. w(t-2), w(t-1), w(t+1), w(t+2) for N = 2.
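A matching sketch for skip-gram under the same toy dimensions; here the projection is just an embedding lookup, and the same output distribution is scored against each word in the window.

```python
import numpy as np

V, D = 10_000, 100                            # hypothetical vocab size / dimension
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, D))    # input embeddings
W_out = rng.normal(scale=0.01, size=(V, D))   # output weights

def skipgram_probs(center_id):
    """P(context word | center word): one distribution, scored against
    every word in the window."""
    h = W_in[center_id]                       # projection is just a lookup
    scores = W_out @ h
    exp = np.exp(scores - scores.max())       # softmax over the full vocabulary
    return exp / exp.sum()

# For the (made-up) pair (center=42, context=17), the loss is -log P(17 | 42).
loss = -np.log(skipgram_probs(42)[17])
print(loss)
```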
Difficulty: training is incredibly expensive, because the softmax must be normalized over the entire vocabulary at every step.
Huffman Encoding and Hierarchical Softmax
This method addresses the difficulty above: each word is a leaf of a Huffman tree, and its probability is the product of binary (sigmoid) decisions along the path from the root, so an update costs O(log V) instead of O(V).
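A sketch of the hierarchical-softmax idea, with a hypothetical Huffman path; real implementations derive the node indices and branch codes from word frequencies.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, D = 10_000, 100                            # hypothetical sizes, as above
rng = np.random.default_rng(0)
node_vecs = rng.normal(scale=0.01, size=(V - 1, D))  # one vector per inner node

def hs_prob(h, path_nodes, path_codes):
    """P(word | h) as a product of binary decisions along the word's
    Huffman path: code 1 = go right, code 0 = go left at each node."""
    p = 1.0
    for node, code in zip(path_nodes, path_codes):
        s = sigmoid(node_vecs[node] @ h)
        p *= s if code == 1 else 1.0 - s      # O(log V) factors instead of O(V)
    return p

h = rng.normal(size=D)                        # projection-layer output
print(hs_prob(h, path_nodes=[0, 5, 12], path_codes=[1, 0, 1]))
```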
Negative Sampling
This method also addresses the difficulty above: instead of the full softmax, each true (word, context) pair is scored against a handful of randomly sampled negative words, e.g. for the positive context "we play the game", a sampled negative word might be "eat". This speeds up convergence.
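A sketch of the negative-sampling loss for one (center, context) pair; for brevity the negatives are drawn uniformly here, whereas word2vec samples from the unigram distribution raised to the 3/4 power.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

V, D, K = 10_000, 100, 5                      # K = number of negative samples
rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.01, size=(V, D))
W_out = rng.normal(scale=0.01, size=(V, D))

def neg_sampling_loss(center_id, context_id):
    """Score the true pair high and K random 'noise' words low --
    no softmax over the full vocabulary is needed."""
    h = W_in[center_id]
    pos = np.log(sigmoid(W_out[context_id] @ h))        # attract the real pair
    neg_ids = rng.integers(0, V, size=K)                # uniform negatives (sketch)
    neg = np.log(sigmoid(-(W_out[neg_ids] @ h))).sum()  # repel the sampled noise
    return -(pos + neg)

print(neg_sampling_loss(42, 17))
```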
Reference
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, Jeffrey Dean. Distributed Representations of Words and Phrases and their Compositionality. NIPS 2013. https://arxiv.org/pdf/1310.4546.pdf