Main idea of word2vec
- Instead of capturing cooccurrence counts directly
- Predict surrounding words of every word
- Faster and can easily incorporate a new sentence/document or add a word to the vocabulary
Details
- Predict the surrounding words within a window of c words on each side of every word.
- Objective function: Maximize the log probability of any context word given the current center word:
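$$J(\theta) = \frac{1}{T} \sum_{t=1}^{T} \sum_{-c \le j \le c,\, j \ne 0} \log p(w_{t+j} \mid w_t)$$
$$p(w_{t+j} \mid w_t) = p(w_o \mid w_i) = \frac{\exp\left({v'_{w_o}}^{\top} v_{w_i}\right)}{\sum_{w=1}^{W} \exp\left({v'_{w}}^{\top} v_{w_i}\right)}$$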
where v and v′ are the input and output vector representations of w (so every word has two vectors)
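
The objective and softmax above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not a reference implementation: the function names (`context_pairs`, `log_prob`, `objective`) and the list-of-lists layout for the vectors are assumptions made here, words are assumed to be already mapped to integer ids 0..W-1, and window positions that fall outside the text are simply skipped (a common convention that the formula itself does not specify).

```python
import math


def context_pairs(tokens, c):
    """Yield (t, t + j) index pairs with -c <= j <= c, j != 0.

    Assumption: positions outside the text are skipped.
    """
    for t in range(len(tokens)):
        for j in range(-c, c + 1):
            if j != 0 and 0 <= t + j < len(tokens):
                yield t, t + j


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def log_prob(o, i, v_in, v_out):
    """log p(w_o | w_i) under the full softmax.

    v_in[w] plays the role of v_w (input vector),
    v_out[w] plays the role of v'_w (output vector).
    """
    scores = [dot(v_out[w], v_in[i]) for w in range(len(v_out))]
    m = max(scores)  # subtract the max before exponentiating, for stability
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return scores[o] - log_z


def objective(token_ids, c, v_in, v_out):
    """J(theta): log-probabilities summed over all pairs, divided by T."""
    T = len(token_ids)
    total = sum(
        log_prob(token_ids[k], token_ids[t], v_in, v_out)
        for t, k in context_pairs(token_ids, c)
    )
    return total / T
```

As a sanity check, if all vectors are zero then every score is zero and the softmax is uniform, so each log probability equals -log W. Note that the denominator sums over the whole vocabulary, so one evaluation of `log_prob` costs on the order of W inner products.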