If the weights of a deep network are initialized too small, the signal shrinks as it passes through each layer until it is too weak to have any effect; if the weights are initialized too large, the signal grows at each layer until it diverges and the network fails to train. According to the Xavier initialization scheme (Glorot and Bengio), the weights should have mean 0 and variance 2/(n_in + n_out), where n_in and n_out are the sizes of the two layers the weights connect. Any distribution satisfying this condition works in practice; the usual choices are a uniform or a Gaussian distribution.
For example:
import numpy as np
import tensorflow as tf

def xavier_init(fan_in, fan_out, distribution='Gaussian'):
    if distribution == 'Gaussian':
        # The target is a *variance* of 2/(fan_in+fan_out), so the standard
        # deviation passed to random_normal must be its square root.
        stddev = np.sqrt(2.0 / (fan_in + fan_out))
        return tf.random_normal((fan_in, fan_out), mean=0.0, stddev=stddev,
                                dtype=tf.float32)
    else:
        # For U(-a, a), Var = a^2/3; setting a^2/3 = 2/(fan_in+fan_out)
        # gives a = sqrt(6/(fan_in+fan_out)).
        low = -np.sqrt(6.0 / (fan_in + fan_out))
        high = np.sqrt(6.0 / (fan_in + fan_out))
        return tf.random_uniform((fan_in, fan_out), minval=low, maxval=high,
                                 dtype=tf.float32)

with tf.Session() as sess:
    # No tf.Variable is created here, so no initializer op is needed.
    print(sess.run(xavier_init(3, 5)))
    print(sess.run(xavier_init(3, 5, distribution='uniform')))
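As a sanity check on the variance rule above, we can draw Xavier-uniform samples with plain NumPy (a minimal sketch, independent of the TensorFlow snippet; the layer sizes 300 and 500 are arbitrary) and confirm that their empirical variance matches the 2/(fan_in + fan_out) target:

```python
import numpy as np

fan_in, fan_out = 300, 500

# Uniform bound: Var(U(-a, a)) = a^2/3, so a = sqrt(6/(fan_in+fan_out))
# yields the target variance 2/(fan_in+fan_out).
limit = np.sqrt(6.0 / (fan_in + fan_out))

rng = np.random.default_rng(0)
w = rng.uniform(-limit, limit, size=(fan_in, fan_out))

target = 2.0 / (fan_in + fan_out)
print(w.var(), target)  # the two values should agree to within about 1%
```

With 150,000 samples the empirical variance lands very close to the theoretical value, which is a quick way to convince yourself the sqrt(6/...) bound is correct.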