One hidden layer Neural Network - Gradient descent for neural networks


These are my notes from studying the Coursera class by Mr. Andrew Ng, "Neural Networks & Deep Learning", section 3.9 "Gradient descent for neural networks". They show the computation graph for an NN and how to compute back propagation when there is one training example and when there are multiple training examples. Sharing them with you, and I hope they help!

For a one-hidden-layer NN as shown in figure-1, its computation graph is as shown in figure-2:

figure-1

 

figure-2
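For readers without the figure, the forward computation such a graph traces, in the course notation (assuming a sigmoid output unit \sigma and the binary cross-entropy loss, as in the lectures), is:

z^{[1]} = W^{[1]} x + b^{[1]}

a^{[1]} = g^{[1]}(z^{[1]})

z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}

a^{[2]} = \sigma(z^{[2]}) = \hat{y}

L(a^{[2]}, y) = -( y \log a^{[2]} + (1-y) \log (1-a^{[2]}) )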

 

  •  When there is just one training example, the back propagation for the NN is as shown in figure-3, with the formulas restated below:
figure-3
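For reference, the standard single-example back-propagation equations for this network (assuming the sigmoid output and cross-entropy loss above, so that dz^{[2]} = a^{[2]} - y; they are consistent with the dimension check below) are:

dz^{[2]} = a^{[2]} - y

dW^{[2]} = dz^{[2]} (a^{[1]})^{T}

db^{[2]} = dz^{[2]}

dz^{[1]} = (W^{[2]})^{T} dz^{[2]} * g^{[1]'}(z^{[1]})

dW^{[1]} = dz^{[1]} x^{T}

db^{[1]} = dz^{[1]}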

 

Note: we can check the dimensions of some of the variables to make sure the calculation is correct. Take the computation of dz^{[1]} as an example:

W^{[2]} is n^{[2]} \times n^{[1]} = 1 \times 4, so (W^{[2]})^{T} is n^{[1]} \times n^{[2]} = 4 \times 1

dz^{[2]} is 1 \times 1

(W^{[2]})^{T} dz^{[2]} is 4 \times 1

g^{[1]'}(z^{[1]}) is 4 \times 1

* denotes element-wise multiplication, thus the final dz^{[1]} is 4 \times 1
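A quick way to reproduce this dimension check is with numpy shapes. This is a minimal sketch: the sizes n^{[1]} = 4 and n^{[2]} = 1 match the example above, and a tanh hidden activation is assumed for g^{[1]}.

```python
import numpy as np

n1, n2 = 4, 1                      # hidden units, output units, as in the example above

W2  = np.random.randn(n2, n1)      # W^{[2]}: (1, 4)
dz2 = np.random.randn(n2, 1)       # dz^{[2]}: (1, 1)
z1  = np.random.randn(n1, 1)       # z^{[1]}: (4, 1)

g1_prime = 1 - np.tanh(z1) ** 2    # g^{[1]'}(z^{[1]}) for a tanh activation: (4, 1)

dz1 = (W2.T @ dz2) * g1_prime      # (4, 1) matrix product, then element-wise *
print(dz1.shape)                   # -> (4, 1)
```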

  • When there are m training examples, the back propagation for the NN is as shown in figure-4, with a vectorized numpy sketch given below:

 

figure-4
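As a rough sketch of how the batch formulas translate to numpy (assuming a tanh hidden activation for g^{[1]}, a sigmoid output with cross-entropy loss, and the m examples stacked as columns, so X is (n_x, m) and Y, A^{[2]} are (1, m)):

```python
import numpy as np

def backward_propagation(X, Y, W2, Z1, A1, A2):
    """Batch gradients for the one-hidden-layer NN over m examples.

    Shapes: X (n_x, m), Y (1, m), W2 (n2, n1), Z1/A1 (n1, m), A2 (n2, m).
    Assumes tanh hidden activation and sigmoid output with cross-entropy loss.
    """
    m = X.shape[1]

    dZ2 = A2 - Y                                         # (n2, m)
    dW2 = (1 / m) * dZ2 @ A1.T                           # (n2, n1)
    db2 = (1 / m) * np.sum(dZ2, axis=1, keepdims=True)   # (n2, 1)

    dZ1 = (W2.T @ dZ2) * (1 - np.tanh(Z1) ** 2)          # (n1, m), element-wise *
    dW1 = (1 / m) * dZ1 @ X.T                            # (n1, n_x)
    db1 = (1 / m) * np.sum(dZ1, axis=1, keepdims=True)   # (n1, 1)

    return dW1, db1, dW2, db2
```

Each gradient-descent step then updates every parameter with its gradient, e.g. W^{[1]} := W^{[1]} - \alpha dW^{[1]} for a learning rate \alpha.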

 

 <end>

 

 
