Tuning Tips
Data Augmentation
Preprocessing
1️⃣ Zero-center
[9] Center the data by subtracting the mean (a minimal sketch follows).
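A minimal numpy sketch of zero-centering; the data shape and variable names are illustrative. The mean is computed on the training set only and reused for every other split, so no statistics leak from validation/test data.

```python
import numpy as np

X_train = np.random.randn(1000, 3072).astype(np.float32)  # placeholder data

# Per-feature mean, computed on the training set only.
mean = X_train.mean(axis=0)

X_train_centered = X_train - mean
# Reuse the same training-set mean for the other splits:
# X_val_centered = X_val - mean
```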
Initialization
1️⃣ Xavier initialization [7]
Suitable [9] for saturating activations (tanh, sigmoid): scale = np.sqrt(3/n)
2️⃣ He initialization [8]
Suitable [9] for ReLU: scale = np.sqrt(6/n) (see the first sketch after this list)
3️⃣ Batch normalization [10]
4️⃣ RNN/LSTM initial hidden state
Hinton [3] suggests making the initial hidden state of an RNN/LSTM a learnable weight (second sketch after this list).
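Both scales above are uniform-distribution bounds: drawing W from U(-scale, scale) with scale = sqrt(3/n) gives Var(W) = 1/n (Xavier [7]), while scale = sqrt(6/n) gives Var(W) = 2/n (He [8]), where n is the fan-in. A minimal numpy sketch, with illustrative layer sizes:

```python
import numpy as np

def xavier_uniform(n_in, n_out):
    # U(-a, a) with a = sqrt(3/n_in) has variance 1/n_in: suits tanh/sigmoid.
    scale = np.sqrt(3.0 / n_in)
    return np.random.uniform(-scale, scale, size=(n_in, n_out))

def he_uniform(n_in, n_out):
    # U(-a, a) with a = sqrt(6/n_in) has variance 2/n_in: suits ReLU.
    scale = np.sqrt(6.0 / n_in)
    return np.random.uniform(-scale, scale, size=(n_in, n_out))

W_tanh = xavier_uniform(784, 256)  # layer followed by tanh
W_relu = he_uniform(256, 128)      # layer followed by ReLU
```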
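For item 4️⃣, one common realization of the idea in [3] is to register the initial hidden/cell states as trainable parameters instead of zeros, so the optimizer learns them along with the other weights. A PyTorch sketch; the module name and sizes are assumptions, not from the source:

```python
import torch
import torch.nn as nn

class LSTMWithLearnedInit(nn.Module):
    def __init__(self, input_size=32, hidden_size=64):
        super().__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, batch_first=True)
        # Trainable initial states, shape (num_layers, 1, hidden_size).
        self.h0 = nn.Parameter(torch.zeros(1, 1, hidden_size))
        self.c0 = nn.Parameter(torch.zeros(1, 1, hidden_size))

    def forward(self, x):
        batch = x.size(0)
        # Broadcast the learned states across the batch dimension.
        h0 = self.h0.expand(-1, batch, -1).contiguous()
        c0 = self.c0.expand(-1, batch, -1).contiguous()
        out, _ = self.lstm(x, (h0, c0))
        return out
```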
Training Tips
1️⃣ Gradient clipping [5,6] (first sketch after this list)
2️⃣ Learning rate
Rule of thumb: when the validation loss starts to rise, reduce the learning rate (second sketch after this list).
[1] Time-based, drop-based, and cyclical learning-rate schedules
3️⃣ Batch size
[2] argues in detail that increasing the batch size, rather than decaying the learning rate, improves model performance: keep the learning rate fixed and grow the batch size until it reaches roughly 1/10 of the training-set size, then switch to a learning-rate decay schedule (third sketch after this list).
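A sketch of one training step with gradient-norm clipping in the spirit of [5], using PyTorch's clip_grad_norm_; the max_norm value and function names are illustrative assumptions:

```python
import torch

def training_step(model, optimizer, loss_fn, x, y, max_norm=1.0):
    optimizer.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Rescale all gradients so their global L2 norm is at most max_norm.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
    optimizer.step()
    return loss.item()
```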
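The learning-rate rule above ("reduce when validation loss starts to rise") maps onto reduce-on-plateau scheduling. A PyTorch sketch, where the model, train_one_epoch, evaluate, and the factor/patience values are placeholder assumptions:

```python
import torch

model = torch.nn.Linear(10, 1)  # placeholder model
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Halve the LR whenever the validation loss fails to improve for 3 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.5, patience=3)

for epoch in range(50):
    train_one_epoch(model, optimizer)  # hypothetical training loop
    val_loss = evaluate(model)         # hypothetical validation pass
    scheduler.step(val_loss)           # scheduler tracks the validation loss
```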
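And a schematic of the batch-size schedule from [2] in plain Python: hold the learning rate fixed while the batch size grows, cap growth near train_size / 10, then fall back to decay. The doubling cadence and run_epoch are illustrative assumptions, not the paper's exact recipe:

```python
train_size = 50_000           # e.g. a CIFAR-10-sized training set
batch_size = 128
lr = 0.1
max_batch = train_size // 10  # stop growing around train_size / 10

for epoch in range(30):
    run_epoch(lr=lr, batch_size=batch_size)  # hypothetical training loop
    if batch_size < max_batch:
        batch_size = min(batch_size * 2, max_batch)  # grow batch, keep lr fixed
    else:
        lr *= 0.5  # batch size capped: switch to decaying the learning rate
```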
References
[1] How to make your model happy again — part 1
[2] Don't Decay the Learning Rate, Increase the Batch Size
[3] CSC2535 (2013): Advanced Machine Learning, Lecture 10: Recurrent Neural Networks
[4] https://zhuanlan.zhihu.com/p/25110150
[5] On the difficulty of training Recurrent Neural Networks
[6] Language Modeling with Gated Convolutional Networks
[7] Understanding the difficulty of training deep feedforward neural networks
[8] Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
[9] Zhihu: 你有哪些deep learning(rnn、cnn)调参的经验?(What are your deep learning (RNN/CNN) tuning tips?)
[10] Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift