One hidden layer Neural Network - Random Initialization

When training a neural network (NN), it is essential to initialize the weights randomly: if the weights are all zero, every hidden unit computes the same function of the input and training goes nowhere. The fix is to initialize the weights with small random values, which breaks the symmetry and lets gradient descent and learning proceed quickly. Using a simple NN as an example, this post shows how to use `numpy.random.randn` to initialize the weights with small random values scaled by 0.01, and the biases with zeros.


When you train your NN, it's important to initialize the weights (W^{[1]}, b^{[1]}, W^{[2]}, b^{[2]}, etc.) randomly. For logistic regression it's OK to initialize the weights to 0; but for a NN, if you initialize the weights all to 0 and then apply gradient descent, it won't work!

The reason is that if you initialize the weights W to zero, then all the hidden units are symmetric. No matter how long you train your NN, all hidden units keep computing exactly the same function, so there is really no point in having more than one hidden unit.
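To see this concretely, here is a minimal sketch (with hypothetical toy data, a tanh hidden layer and a sigmoid output unit, none of which come from the post itself) that runs gradient descent from an all-zero initialization and checks that the two hidden units never become different:

import numpy as np

# Toy problem: 2 inputs, 2 hidden units, 1 output (made-up data, only to
# illustrate the symmetry argument).
np.random.seed(0)
X = np.random.randn(2, 5)
Y = (np.random.rand(1, 5) > 0.5) * 1.0

# All weights start at zero, so the two hidden units start out identical.
W1 = np.zeros((2, 2)); b1 = np.zeros((2, 1))
W2 = np.zeros((1, 2)); b2 = np.zeros((1, 1))
lr = 1.0
m = X.shape[1]

for _ in range(100):
    # Forward pass: tanh hidden layer, sigmoid output.
    A1 = np.tanh(W1 @ X + b1)
    A2 = 1 / (1 + np.exp(-(W2 @ A1 + b2)))
    # Backward pass for the binary cross-entropy loss.
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.sum(axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.sum(axis=1, keepdims=True) / m
    # Gradient-descent update.
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

# However long we train, the two hidden units remain exact copies of each
# other: the rows of W1 are equal and the two entries of W2 are equal.
print(np.allclose(W1[0], W1[1]), np.allclose(W2[0, 0], W2[0, 1]))  # True True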

The solution is to initialize the weights randomly. Take the following NN as an example:

figure-1: a NN with 2 input units, 2 hidden units and 1 output unit
  • Firstly, import numpy and initialize W^{[1]} as follows:
>>> import numpy as np
>>> W1 = np.random.randn(2, 2) * 0.01
>>> W1
array([[-3.61903370e-03, -8.29109877e-05],
       [ 2.39660772e-03,  2.34993952e-03]])
  • Initializing b^{[1]} to all zeros is fine:
>>> b1 = np.zeros((2, 1))
>>> b1
array([[0.],
       [0.]])
  • Similarly, you can initialize W^{[2]} and b^{[2]} (a consolidated helper covering all four parameters is sketched at the end of this section):
>>> W2 = np.random.randn(1, 2) * 0.01
>>> W2
array([[ 0.01702848, -0.00522487]])
>>> b2 = np.zeros((1, 1))
>>> b2
array([[0.]])
>>>
  • Note that we initialize W^{[1]} and W^{[2]} to very small random values by multiplying by 0.01. The reason is that if we're using a sigmoid or tanh activation function and the weights are too large in magnitude, then Z will be very large or very small (far from zero). You're then likely to end up on the flat parts of the activation function, where the slope of the gradient is very small; as a result, gradient descent will be very slow and learning will be very slow (a quick numerical check is sketched right after this list). If you don't have any tanh or sigmoid functions anywhere in your NN, this is less of an issue. But if you're doing binary classification and your output unit is a sigmoid function, don't initialize the weights too large.
  • When you train a one-hidden-layer shallow NN, it's probably OK to use * 0.01. But when you train a very deep NN, you'll probably pick a different constant. We'll discuss this later.
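As a quick numerical check of this scaling argument (a hypothetical experiment with made-up layer sizes, not from the post itself), you can compare how saturated a layer of tanh hidden units is, on average, when the weights are drawn with and without the * 0.01 factor:

import numpy as np

np.random.seed(1)
X = np.random.randn(500, 1000)               # 500 input features, 1000 examples

for scale in (1.0, 0.01):
    W1 = np.random.randn(100, 500) * scale   # 100 hidden units
    A1 = np.tanh(W1 @ X)                     # biases start at zero, so ignore them
    # (1 - A1**2) is tanh's local gradient; values near 0 mean saturated units.
    print(scale, np.mean(1 - A1 ** 2))

With scale = 1.0 the pre-activations have a large magnitude, so most units sit on the flat tails of tanh and the average gradient factor is close to 0; with scale = 0.01 it stays close to 1, which is what keeps gradient descent moving.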
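Putting the individual steps above together, here is a minimal sketch of an initialization helper for this one-hidden-layer network (the function and argument names are my own, not from the post):

import numpy as np

def initialize_parameters(n_x, n_h, n_y, scale=0.01):
    """Randomly initialize a one-hidden-layer NN.

    n_x: number of input features, n_h: number of hidden units,
    n_y: number of output units, scale: the small constant (0.01 here).
    """
    W1 = np.random.randn(n_h, n_x) * scale   # small random values break symmetry
    b1 = np.zeros((n_h, 1))                  # biases can safely start at zero
    W2 = np.random.randn(n_y, n_h) * scale
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

# For the 2-2-1 network of figure-1:
params = initialize_parameters(n_x=2, n_h=2, n_y=1)
print(params["W1"].shape, params["b1"].shape)   # (2, 2) (2, 1)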

<end>
