20230117 -
When I first started building neural networks with Keras, beyond choosing the number of neurons, the number of layers, and the activation functions, I rarely touched the network internals, so most parameters were left at their defaults. For the datasets I usually worked with, this was enough to get by; the networks were rarely very deep, though I did occasionally run into exploding or vanishing gradients.
Recently, however, I came across a dataset where the defaults performed acceptably but I wanted to do better, so I decided to approach the problem from the angle of weight initialization. Keras's default weight initialization [1] is described as follows:
Each layer has its own default value for initializing the weights. For most of the layers, such as Dense, convolution and RNN layers, the default kernel initializer is ‘glorot_uniform’ and the default bias initializer is ‘zeros’ (you can find this by going to the related section for each layer in the documentation; for example here is the Dense layer doc). You can find the definition of glorot_uniform initializer here in the Keras documentation.
This is also confirmed by the official Keras documentation.
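It is also easy to check directly in code. Below is a minimal sketch using the tf.keras API (the layer size of 64 is an arbitrary choice):

```python
from tensorflow import keras

# Build a Dense layer without specifying any initializer and inspect
# the defaults: glorot_uniform for the kernel, zeros for the bias.
layer = keras.layers.Dense(units=64)
print(type(layer.kernel_initializer).__name__)  # GlorotUniform
print(type(layer.bias_initializer).__name__)    # Zeros
```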
Under that Q&A, another answer mentioned an article [2] that works through two initialization schemes, Xavier and Kaiming, in detail. Summarizing its conclusion:
I think this article is very interesting and it shows roughly that for “tanh” activations you should use ‘glorot_uniform’ and for “relu” layers you should use “he_uniform”
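In Keras terms, that advice amounts to picking the kernel initializer to match each layer's activation. The following is a sketch of that rule of thumb; the layer sizes and input shape are placeholders, not something taken from the article:

```python
from tensorflow import keras

# Rule of thumb from [2]: He initialization for relu layers,
# Glorot (Xavier) initialization for tanh layers.
model = keras.Sequential([
    keras.layers.Dense(128, activation="relu",
                       kernel_initializer="he_uniform",
                       input_shape=(32,)),
    keras.layers.Dense(64, activation="tanh",
                       kernel_initializer="glorot_uniform"),
    keras.layers.Dense(1, activation="sigmoid"),
])
model.summary()
```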
The article's theoretical analysis does look convincing, although I have not verified it experimentally myself. Separately, there is a Q&A [3] comparing the Keras and PyTorch implementations of he_normal: Keras's he_normal samples from a truncated normal distribution, while PyTorch's kaiming_normal_ does not truncate.
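That difference can be observed by sampling from both (a rough sketch assuming both tensorflow and torch are installed; fan_in = 512 and fan_out = 256 are arbitrary choices):

```python
import numpy as np
import tensorflow as tf
import torch

fan_in, fan_out = 512, 256

# Keras: he_normal draws from a *truncated* normal, so samples are bounded.
k = tf.keras.initializers.HeNormal()(shape=(fan_in, fan_out)).numpy()

# PyTorch: kaiming_normal_ draws from a full normal, so tail values occur.
t = torch.empty(fan_out, fan_in)
torch.nn.init.kaiming_normal_(t, mode="fan_in", nonlinearity="relu")

print(np.sqrt(2.0 / fan_in))                 # target He stddev, ~0.0625 here
print(k.std(), np.abs(k).max())              # bounded by the truncation point
print(t.std().item(), t.abs().max().item())  # max can fall well outside it
```

The most visible difference is in the tails: the Keras draw is bounded, while the PyTorch draw occasionally produces values several standard deviations out.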
References
[1] Where to find a documentation about default weight initializer in Keras?
[2] Weight Initialization in Neural Networks: A Journey From the Basics to Kaiming
[3] he_normal (Keras) is truncated when kaiming_normal_ (pytorch) is not