One hidden layer Neural Network - Random Initialization

When training a neural network (NN), it is essential to initialize the weights randomly: if the weights are all zero, every hidden unit computes the same function of the input and training goes nowhere. The fix is to initialize the weights with small random values, which breaks the symmetry and lets gradient descent and learning proceed quickly. Using a simple NN as an example, this post shows how to use `numpy.random.randn` to initialize the weights with small random values scaled by 0.01, and the biases with zeros.


When you train your NN, it's important to initialize the weights (W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, etc.) randomly. For logistic regression it's OK to initialize the weights to 0; but for a NN, if you initialize the weights all to 0 and then apply gradient descent, it won't work!

The reason is that if you initialize the weights W to zero, then all the hidden units are symmetric. No matter how long you train your NN, all hidden units keep computing exactly the same function, so there is really no point in having more than one hidden unit.
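To see this concretely, here is a minimal sketch (with hypothetical toy data, a tanh hidden layer and a sigmoid output unit, none of which come from the post itself) that runs gradient descent from an all-zero initialization and checks that the two hidden units never become different:

import numpy as np

# Toy problem: 2 inputs, 2 hidden units, 1 output (made-up data, only to
# illustrate the symmetry argument).
np.random.seed(0)
X = np.random.randn(2, 5)
Y = (np.random.rand(1, 5) > 0.5) * 1.0

# All weights start at zero, so the two hidden units start out identical.
W1 = np.zeros((2, 2)); b1 = np.zeros((2, 1))
W2 = np.zeros((1, 2)); b2 = np.zeros((1, 1))
lr = 1.0
m = X.shape[1]

for _ in range(100):
    # Forward pass: tanh hidden layer, sigmoid output.
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))
    # Backward pass for the binary cross-entropy loss.
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# However long we train, the two hidden units remain exact copies of each
# other: the rows of W1 are equal and the two entries of W2 are equal.
print(np.allclose(W1[0], W1[1]), np.allclose(W2[0, 0], W2[0, 1]))  # True True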

The solution is to initialize the weights randomly. Take the following NN as an example:

figure-1: a NN with 2 input units, 2 hidden units and 1 output unit
  • Firstly, import numpy and initialize W^{[1]} as follows:
>>> import numpy as np
>>> W1 = np.random.randn(2, 2) * 0.01
>>> W1
array([[-3.61903370e-03, -8.29109877e-05],
       [ 2.39660772e-03,  2.34993952e-03]])
  • Initializing b^{[1]} to all zeros is fine:
>>> b1 = np.zeros((2, 1))
>>> b1
array([[0.],
       [0.]])
  • Similarly, you can initialize W^{[2]} and b^{[2]} (a consolidated helper covering all four parameters is sketched at the end of this section):
>>> W2 = np.random.randn(1, 2) * 0.01
>>> W2
array([[ 0.01702848, -0.00522487]])
>>> b2 = np.zeros((1, 1))
>>> b2
array([[0.]])
>>>
  • Note that we initialize W^{[1]} and W^{[2]} to very small random values by multiplying by 0.01. The reason is that if we're using a sigmoid or tanh activation function and the weights are too large in magnitude, then Z will be very large or very small (far from zero). You're then likely to end up on the flat parts of the activation function, where the slope of the gradient is very small; as a result, gradient descent will be very slow and learning will be very slow (a quick numerical check is sketched right after this list). If you don't have any tanh or sigmoid functions anywhere in your NN, this is less of an issue. But if you're doing binary classification and your output unit is a sigmoid function, don't initialize the weights too large.
  • When you train a one-hidden-layer shallow NN, it's probably OK to use * 0.01. But when you train a very deep NN, you'll probably pick a different constant. We'll discuss this later.
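As a quick numerical check of this scaling argument (a hypothetical experiment with made-up layer sizes, not from the post itself), you can compare how saturated a layer of tanh hidden units is, on average, when the weights are drawn with and without the * 0.01 factor:

import numpy as np

np.random.seed(1)
X = np.random.randn(500, 1000)               # 500 input features, 1000 examples

for scale in (1.0, 0.01):
    W1 = np.random.randn(100, 500) * scale   # 100 hidden units
    A1 = np.tanh(W1 @ X)                     # biases start at zero, so ignore them
    # (1 - A1**2) is tanh's local gradient; values near 0 mean saturated units.
    print(scale, np.mean(1 - A1 ** 2))

With scale = 1.0 the pre-activations have a large magnitude, so most units sit on the flat tails of tanh and the average gradient factor is close to 0; with scale = 0.01 it stays close to 1, which is what keeps gradient descent moving.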
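Putting the individual steps above together, here is a minimal sketch of an initialization helper for this one-hidden-layer network (the function and argument names are my own, not from the post):

import numpy as np

def initialize_parameters(n_x, n_h, n_y, scale=0.01):
    """Randomly initialize a one-hidden-layer NN.

    n_x: number of input features, n_h: number of hidden units,
    n_y: number of output units, scale: the small constant (0.01 here).
    """
    W1 = np.random.randn(n_h, n_x) * scale   # small random values break symmetry
    b1 = np.zeros((n_h, 1))                  # biases can safely start at zero
    W2 = np.random.randn(n_y, n_h) * scale
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

# For the 2-2-1 network of figure-1:
params = initialize_parameters(n_x=2, n_h=2, n_y=1)
print(params["W1"].shape, params["b1"].shape)   # (2, 2) (2, 1)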

<end>
