Dropout vs. L2 vs. ensemble learning
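- Ensemble learning that uses a different set of hidden units in every iteration (which is what dropout does) performs better than ensemble learning that uses the same set of hidden units throughout training. Note that overfitting did not occur, even though dropout learning used more hidden units than ensemble learning.
- L2 regularization and dropout have comparable regularization effects. However, the SGD + L2 setup requires repeatedly trying different learning rates α, whereas dropout has no corresponding parameter that needs this kind of fine-tuning.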
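The "different set of hidden units in every iteration" view can be made concrete with a small NumPy sketch. It assumes standard inverted dropout (kept units scaled by 1 / (1 - p_drop)) on a single ReLU hidden layer; the function names, layer sizes and drop rate are illustrative choices, not taken from the notes. Each training iteration samples a fresh mask, i.e. trains a different weight-sharing sub-network, and averaging the masked activations over many sampled sub-networks recovers the unmasked activation that is used at test time.

```python
import numpy as np


def dropout_mask(shape, p_drop: float, rng: np.random.Generator) -> np.ndarray:
    """Inverted-dropout mask: each unit is kept with probability 1 - p_drop and
    kept units are scaled by 1 / (1 - p_drop), so the expected activation
    equals that of the full (unmasked) network."""
    keep = rng.random(shape) >= p_drop
    return keep.astype(float) / (1.0 - p_drop)


def hidden_layer(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """ReLU hidden layer whose weights are shared by every sub-network."""
    return np.maximum(0.0, x @ W + b)


rng = np.random.default_rng(0)
n_in, n_hidden, p_drop = 4, 8, 0.5  # illustrative sizes and drop rate
W = rng.normal(size=(n_in, n_hidden))
b = np.zeros(n_hidden)
x = rng.normal(size=(1, n_in))
h_full = hidden_layer(x, W, b)  # test time: no mask, full network

# Training: every iteration draws a fresh mask, i.e. updates a different
# sub-network ("ensemble member") that shares W and b with all the others.
for it in range(3):
    m = dropout_mask(n_hidden, p_drop, rng)
    print(f"iteration {it}: active hidden units {np.flatnonzero(m).tolist()}")

# Averaging masked activations over many sampled sub-networks approaches the
# full-network activation. For a single masked layer this holds exactly in
# expectation; through further nonlinear layers it is only an approximation.
many_masks = dropout_mask((20000, n_hidden), p_drop, rng)
ensemble_avg = (h_full * many_masks).mean(axis=0, keepdims=True)
print(np.round(ensemble_avg - h_full, 3))
```

The 1 / (1 - p_drop) scaling is applied during training so that the test-time network can be used as-is, without rescaling its weights.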
Selective Dropout
Reference: Barrow E, Eastwood M, Jayne C. Selective Dropout for Deep Neural Networks[M]// Neural Information Processing. Springer International Publishing, 2016.
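Method: the dropout rate determines how many units are dropped in each layer. Each of the three values below is used separately to produce a selection probability over the neurons; the larger the value, the more …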
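- Weight change: $\mathrm{avg}_k = \frac{1}{n}\sum_{j=1}^{n}\left|W^{(i)}_{jk} - W^{(i-1)}_{jk}\right|$. A larger change means the unit is still actively learning, so its dropout probability should be lower.
- Mean weight: $\mathrm{avg}_k = \frac{1}{n}\sum_{j=1}^{n} W^{(i)}_{jk}$. A larger value means the neuron has largely finished learning, so its dropout probability should be higher.
- Output variance: $\mathrm{N\_Variance}_k = \operatorname{variance}\left(X^{(i-1)}_k\right)$. A larger value means the unit is largely stable, so its dropout probability should be higher.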
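A minimal NumPy sketch of the three scores and a selection step, under several assumptions that the notes do not spell out: column `k` of a weight matrix of shape `(n_in, n_units)` holds the incoming weights of unit `k` (so `j` runs over the `n` inputs); `X` holds unit `k`'s outputs over a batch; scores are turned into probabilities by a simple shift-and-normalize (`to_drop_probs`, a hypothetical choice, not necessarily the paper's mapping); and `round(p_drop * n_units)` units are drawn without replacement with `Generator.choice`. All function names are illustrative.

```python
import numpy as np


def weight_change_score(W_curr, W_prev):
    """avg_k = (1/n) * sum_j |W_curr[j, k] - W_prev[j, k]|, n = fan-in."""
    return np.abs(W_curr - W_prev).mean(axis=0)


def mean_weight_score(W_curr):
    """avg_k = (1/n) * sum_j W_curr[j, k]."""
    return W_curr.mean(axis=0)


def output_variance_score(X):
    """N_Variance_k: variance of unit k's outputs over a batch X (batch, units)."""
    return X.var(axis=0)


def to_drop_probs(scores, higher_drops_more):
    """Hypothetical mapping from scores to a distribution over units:
    flip the sign if needed, shift to strictly positive, then normalize."""
    s = scores if higher_drops_more else -scores
    s = s - s.min() + 1e-8  # every unit keeps a nonzero chance of being chosen
    return s / s.sum()


def selective_dropout_mask(p_drop, probs, rng):
    """Drop round(p_drop * n_units) units sampled without replacement by probs."""
    n_units = probs.size
    n_drop = int(round(p_drop * n_units))
    dropped = rng.choice(n_units, size=n_drop, replace=False, p=probs)
    keep = np.ones(n_units, dtype=bool)
    keep[dropped] = False
    return keep


rng = np.random.default_rng(0)
n_in, n_units, p_drop = 6, 10, 0.3  # illustrative sizes and drop rate
W_prev = rng.normal(size=(n_in, n_units))
W_curr = W_prev + 0.01 * rng.normal(size=(n_in, n_units))
X = rng.normal(size=(32, n_units))

scores = {
    # larger weight change -> still learning -> LOWER drop probability
    "weight_change": (weight_change_score(W_curr, W_prev), False),
    # larger mean weight -> largely learned -> HIGHER drop probability
    "mean_weight": (mean_weight_score(W_curr), True),
    # larger output variance -> largely stable -> HIGHER drop probability
    "output_variance": (output_variance_score(X), True),
}
for name, (s, higher) in scores.items():
    keep = selective_dropout_mask(p_drop, to_drop_probs(s, higher), rng)
    print(name, "dropped units:", np.flatnonzero(~keep).tolist())
```

Sampling by probability, rather than always dropping the top-scoring units, keeps some randomness in which sub-network is trained at each iteration. This is a design choice of the sketch and is not taken from the cited paper.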