Problem
When an RNN repeatedly applies the input-to-hidden state-transition operation to build a fixed-length representation of an arbitrarily long sequence, it runs into the difficulty of being overly sensitive to perturbations of the hidden state.
dropout
Mathematical formalization of dropout:
- $y = f(W \cdot d(x))$, where
  $$d(x) = \begin{cases} \text{mask} * x, & \text{train phase} \\ (1-p)\,x, & \text{otherwise} \end{cases}$$
  Here $p$ is the dropout rate, and mask is a binary vector whose entries are drawn from a Bernoulli distribution with keep probability $1-p$.
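A minimal numpy sketch of the formalization above; the function name `dropout`, the toy shapes, and the use of tanh as $f$ are illustrative assumptions, not taken from any particular paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, train=True):
    """d(x) from the formula above: p is the drop rate; during training a
    binary mask is drawn element-wise from Bernoulli(1 - p)."""
    if train:
        mask = rng.binomial(1, 1.0 - p, size=x.shape)  # keep a unit with prob 1-p
        return mask * x
    # inference: no sampling, scale activations by the keep probability instead
    return (1.0 - p) * x

# y = f(W . d(x)), with f = tanh as an illustrative nonlinearity
x = rng.standard_normal(8)
W = rng.standard_normal((4, 8))
y_train = np.tanh(W @ dropout(x, p=0.5, train=True))
y_test  = np.tanh(W @ dropout(x, p=0.5, train=False))
```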
rnn dropout
In contrast to the conventional practice of sampling a different mask for the hidden units at every time step, rnnDrop proposes a new strategy (illustrated in the paper's figure) characterized by: 1) generating the dropout mask only at the beginning of each training sequence and keeping it fixed throughout the sequence; 2) dropping both the non-recurrent and the recurrent connections. A sketch of the idea follows the reference below.
Moon T, Choi H, Lee H, et al. RNNDROP: A novel dropout for RNNs in ASR[C]// IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU). IEEE, 2016: 65-70.
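Below is a minimal numpy sketch of the per-sequence mask idea, written for a vanilla RNN for brevity; the paper itself works with LSTM cells for ASR, and all names and shapes here are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rnndrop_forward(xs, W_xh, W_hh, b, p):
    """Per-sequence dropout: one mask for the inputs and one for the hidden
    state are sampled once and reused at every time step."""
    n_in, n_hid = W_xh.shape[1], W_hh.shape[0]
    mask_x = rng.binomial(1, 1.0 - p, size=n_in)   # fixed for the whole sequence
    mask_h = rng.binomial(1, 1.0 - p, size=n_hid)  # fixed for the whole sequence
    h = np.zeros(n_hid)
    hs = []
    for x_t in xs:
        # both the non-recurrent (input) and recurrent (hidden) connections are dropped
        h = np.tanh(W_xh @ (mask_x * x_t) + W_hh @ (mask_h * h) + b)
        hs.append(h)
    return np.stack(hs)

# toy sequence: 5 time steps, 8-dim inputs, 4 hidden units
xs = rng.standard_normal((5, 8))
W_xh = rng.standard_normal((4, 8)) * 0.1
W_hh = rng.standard_normal((4, 4)) * 0.1
hs = rnndrop_forward(xs, W_xh, W_hh, b=np.zeros(4), p=0.3)
```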
recurrent dropout
- Idea: apply dropout to the input/update-gate path inside the LSTM/GRU, which prevents the loss of long-term memories built up in the states/cells. A minimal sketch is given below.
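A minimal numpy sketch of this style of recurrent dropout on a single LSTM step, assuming stacked gate weights `W`, `U`, `b`; the mask touches only the candidate update $g_t$, so the additive path $c_{t-1} \to c_t$ that carries long-term memory is never zeroed out (names and shapes are illustrative, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step_recurrent_dropout(x_t, h_prev, c_prev, W, U, b, p, train=True):
    """One LSTM step with dropout applied only to the candidate update g_t.
    W, U, b stack the pre-activations of the four gates (i, f, o, g)."""
    n = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b               # (4n,) pre-activations
    i = sigmoid(z[0 * n:1 * n])                # input gate
    f = sigmoid(z[1 * n:2 * n])                # forget gate
    o = sigmoid(z[2 * n:3 * n])                # output gate
    g = np.tanh(z[3 * n:4 * n])                # candidate update
    if train:
        g = rng.binomial(1, 1.0 - p, size=n) * g   # drop the update only
    else:
        g = (1.0 - p) * g
    c = f * c_prev + i * g                     # memory cell keeps its full history
    h = o * np.tanh(c)
    return h, c

# toy step: 8-dim input, 4 hidden units
n_in, n = 8, 4
x_t = rng.standard_normal(n_in)
h, c = np.zeros(n), np.zeros(n)
W = rng.standard_normal((4 * n, n_in)) * 0.1
U = rng.standard_normal((4 * n, n)) * 0.1
h, c = lstm_step_recurrent_dropout(x_t, h, c, W, U, np.zeros(4 * n), p=0.3)
```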
A simple RNN and its dropout:
RNN: