Models: word2vec (skip-gram, CBOW), GloVe, DNN / backpropagation / tips for training, RNN/GRU/LSTM, attention, CNN, TreeRNN
Applications: Neural Machine Translation, Dependency Parsing, Coreference Resolution
Assignments: skip-gram and window-based sentiment classification; dependency parsing; named entity recognition with RNN and GRU; question answering!
I learned a lot from this course!
======================= Models and methods covered in the course
Methods for training word embeddings:
word2vec: skip-gram (predict the surrounding words from the center word), CBOW (predict the center word from the (averaged) surrounding words)
This captures co-occurrence of words one at a time
==> The input matrix W is usually taken as the final embedding
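The skip-gram idea above can be sketched as a single softmax-based SGD step in numpy; the vocabulary, dimensions, and word indices here are hypothetical toy values, and real word2vec would use negative sampling rather than a full softmax:

```python
import numpy as np

# Minimal skip-gram step on a toy vocabulary (hypothetical data).
# W is the input (center-word) matrix -- the one usually kept as the
# final embedding; U holds the output (context-word) vectors.
rng = np.random.default_rng(0)
V, d = 5, 3                              # vocab size, embedding dim
W = rng.normal(scale=0.1, size=(V, d))   # center embeddings
U = rng.normal(scale=0.1, size=(V, d))   # context embeddings

def skipgram_step(center, context, lr=0.1):
    """One SGD step: predict one context word from the center word."""
    global W, U
    v = W[center]                        # center-word vector
    scores = U @ v                       # dot-product score per vocab word
    p = np.exp(scores - scores.max())
    p /= p.sum()                         # softmax over the vocabulary
    loss = -np.log(p[context])
    dscores = p.copy()
    dscores[context] -= 1.0              # gradient of softmax cross-entropy
    W[center] -= lr * (U.T @ dscores)    # update the center vector
    U -= lr * np.outer(dscores, v)       # update all context vectors
    return loss

losses = [skipgram_step(2, 3) for _ in range(50)]
```

Repeating the step on the same (center, context) pair drives the loss down, which is a quick sanity check that the gradients are right.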
GloVe: models word-word co-occurrence as well as the frequency of each individual word
This captures co-occurrence counts directly
==> U + V is usually taken as the final embedding
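A sketch of the GloVe objective on a tiny hand-made co-occurrence matrix: fit dot products plus biases to log counts, weighted by the capped function f; the matrix X and all hyperparameters here are illustrative, not from the course:

```python
import numpy as np

# GloVe sketch: minimize sum_ij f(X_ij) * (u_i.v_j + b_i + c_j - log X_ij)^2
rng = np.random.default_rng(0)
X = np.array([[0., 4., 2.],
              [4., 0., 8.],
              [2., 8., 0.]])            # toy word-word co-occurrence counts
n, d = X.shape[0], 2
U = rng.normal(scale=0.1, size=(n, d))  # "center" vectors
V = rng.normal(scale=0.1, size=(n, d))  # "context" vectors
b = np.zeros(n); c = np.zeros(n)        # per-word biases

def f(x, x_max=10.0, alpha=0.75):       # GloVe weighting function
    return min((x / x_max) ** alpha, 1.0)

def glove_epoch(lr=0.05):
    total = 0.0
    for i in range(n):
        for j in range(n):
            if X[i, j] == 0:            # only nonzero counts contribute
                continue
            err = U[i] @ V[j] + b[i] + c[j] - np.log(X[i, j])
            w = f(X[i, j])
            total += w * err ** 2
            U[i] -= lr * w * err * V[j]
            V[j] -= lr * w * err * U[i]
            b[i] -= lr * w * err
            c[j] -= lr * w * err
    return total

losses = [glove_epoch() for _ in range(100)]
embedding = U + V                       # the U + V convention noted above
```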
DNN, BP and Tips for training:
The lecture notes cover these very well: gradient checks, regularization, dropout, activation functions, data preprocessing, parameter initialization, optimizers
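Of those tips, the gradient check is easy to show concretely: compare the analytic gradient against a centered finite difference. The toy loss below is my own example, not from the notes:

```python
import numpy as np

def loss_fn(w):
    return np.sum(w ** 2)               # toy loss with known gradient 2w

def analytic_grad(w):
    return 2 * w

def grad_check(w, eps=1e-5):
    """Max relative error between numerical and analytic gradients."""
    num = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = eps
        # centered difference approximates dL/dw_i
        num[i] = (loss_fn(w + e) - loss_fn(w - e)) / (2 * eps)
    ana = analytic_grad(w)
    denom = np.maximum(1e-8, np.abs(num) + np.abs(ana))
    return np.max(np.abs(num - ana) / denom)

err = grad_check(np.array([0.3, -1.2, 2.0]))
```

A relative error around 1e-7 or smaller usually means the backprop implementation is correct.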
RNN, GRU, LSTM
Vanishing gradients: orthogonal initialization + ReLU; or use GRU / LSTM
Exploding gradients: gradient clipping; [-5, +5] is a good choice
Fancy RNNs: GRU, LSTM, bi-directional RNN, multi-layer RNN
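Gradient clipping can be sketched in a few lines; this version rescales by the global norm (elementwise clipping of each component to [-5, 5] is the other common variant), with 5.0 matching the range above:

```python
import numpy as np

def clip_by_norm(grads, max_norm=5.0):
    """Rescale a list of gradient arrays so their global norm <= max_norm."""
    norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if norm > max_norm:
        scale = max_norm / norm
        grads = [g * scale for g in grads]
    return grads

g = [np.array([30.0, 40.0])]            # global norm 50, well over threshold
clipped = clip_by_norm(g)               # rescaled to norm 5, direction kept
```

Clipping only changes the step size, not the direction, which is why it stabilizes RNN training without biasing the update.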
Attention
Vanilla attention: global attention / local attention, soft attention / hard attention -> dot-product attention; multiplicative attention; additive attention
Controlling where attention goes: encourage covering ALL important parts; prevent attending to the same part repeatedly (mainly done by modifying the attention weights)
Self-attention: within the same RNN, the current position attends to all previous positions
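The simplest of the variants above, dot-product attention, can be sketched as follows: a decoder state scores each encoder hidden state by a dot product, and a softmax over the scores gives the weights for the context vector (all shapes and values here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 6
H = rng.normal(size=(T, d))             # encoder hidden states, one per step
s = rng.normal(size=d)                  # current decoder state (the query)

def dot_product_attention(query, keys):
    scores = keys @ query               # one score per encoder position
    a = np.exp(scores - scores.max())
    a /= a.sum()                        # softmax -> attention weights
    context = a @ keys                  # weighted sum of encoder states
    return a, context

weights, context = dot_product_attention(s, H)
```

Multiplicative attention inserts a learned matrix into the score (s^T W h), and additive attention scores with a small feed-forward net instead of a dot product.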