- TensorFlow: learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08
- Keras: lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08, decay=0.0
- Blocks: learning_rate=0.002, beta1=0.9, beta2=0.999, epsilon=1e-08, decay_factor=1
- Lasagne: learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08
- Caffe: learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08
- MxNet: learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08
- Torch: learning_rate=0.001, beta1=0.9, beta2=0.999, epsilon=1e-08
Adam is generally the best-performing of these strategies in practice. However, the default learning rates above are not universal: if the network uses normalization (e.g. batch normalization), a larger value such as 0.005 or 0.01 tends to work better. When training from scratch, the learning rate should be larger still; when fine-tuning a pretrained model, it should be smaller.
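A minimal sketch of these settings, assuming the tf.keras API; the larger and smaller learning rates below are illustrative values following the guidance above, not prescribed defaults.

```python
import tensorflow as tf

# Framework default, matching the table above:
# learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08
default_adam = tf.keras.optimizers.Adam(
    learning_rate=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-08
)

# With batch normalization and no pretrained weights, a larger
# learning rate (e.g. 0.005 or 0.01) is often a better starting point.
scratch_adam = tf.keras.optimizers.Adam(learning_rate=0.01)

# When fine-tuning a pretrained model, shrink the learning rate
# instead (1e-4 here is an illustrative choice).
finetune_adam = tf.keras.optimizers.Adam(learning_rate=1e-4)
```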