Adapted from Sebastian Ruder's blog post The secret ingredients of word2vec
Questions
Q1. Are embeddings superior to distributional methods?
With the right hyperparameters, no approach has a consistent advantage over another.
Q2. Is GloVe superior to SGNS?
SGNS outperforms GloVe on all tasks.
Q3. Is CBOW a good word2vec configuration?
CBOW does not outperform SGNS on any task.
Guidelines
- DON’T use shifted PPMI with SVD.
- DON’T use SVD “correctly”, i.e. without eigenvalue weighting (performance drops 15 points compared to using eigenvalue weighting with p = 0.5).
- DO use PPMI and SVD with short contexts (window size of 2).
- DO use many negative samples with SGNS.
- DO always use context distribution smoothing (raise the unigram distribution to the power of α = 0.75) for all methods.
- DO use SGNS as a baseline (robust, fast and cheap to train).
- DO try adding context vectors in SGNS and GloVe.
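Context distribution smoothing is the one trick above that applies to every method. A minimal sketch of how it plugs into PPMI (the function name, toy counts, and NumPy implementation are illustrative, not from the post): the context distribution is raised to α = 0.75 before normalizing, which inflates the probability of rare contexts and so lowers their PMI, mimicking the negative-sampling distribution SGNS uses.

```python
import numpy as np

def smoothed_ppmi(cooc, alpha=0.75):
    """PPMI with context distribution smoothing.

    cooc: (n_words, n_contexts) co-occurrence count matrix.
    Raising the context distribution to the power alpha lifts the
    relative probability of rare contexts, which dampens the PMI
    bias toward rare (word, context) pairs.
    """
    total = cooc.sum()
    p_wc = cooc / total                            # joint P(w, c)
    p_w = cooc.sum(axis=1, keepdims=True) / total  # marginal P(w)
    # smoothed context distribution: P(c)^alpha / sum_c' P(c')^alpha
    ctx_pow = cooc.sum(axis=0) ** alpha
    p_c_smooth = ctx_pow / ctx_pow.sum()
    with np.errstate(divide="ignore"):             # log(0) -> -inf is fine
        pmi = np.log(p_wc / (p_w * p_c_smooth[None, :]))
    return np.maximum(pmi, 0.0)                    # clip to positive PMI

# toy 2-word, 3-context co-occurrence counts
cooc = np.array([[4.0, 1.0, 0.0],
                 [1.0, 2.0, 3.0]])
M = smoothed_ppmi(cooc)
```

Zero counts map to PMI of negative infinity and are clipped to 0 by the positive-PMI step, so the output stays finite and sparse-friendly.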