Backpropagation in Practice: GD and SGD

northeastsqure

Category: Deep Learning | Tags: deep-learning

Original link: https://blog.csdn.net/northeastsqure/article/details/44275141

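SGD depends on the order of the training data: because the parameters are updated after every single sample, visiting the samples in a different order produces a different sequence of updates.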
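A quick toy run makes this order dependence visible. The sketch below is only an illustration under assumptions that are not in the original: a one-parameter model y ≈ w * x, a per-sample squared-error loss 0.5 * (w * x - y)**2, three hypothetical data points and a learning rate of 0.1. It runs one epoch of SGD over the same samples in two different orders.

```python
# Hypothetical toy setup: model y ≈ w * x, per-sample loss 0.5 * (w*x - y)**2.

def sgd_epoch(samples, w=0.0, lr=0.1):
    """Run one pass of SGD, updating w after every single sample."""
    for x, y in samples:
        grad = (w * x - y) * x  # d/dw of 0.5 * (w*x - y)**2
        w -= lr * grad
    return w

data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]  # hypothetical data
w_forward = sgd_epoch(data)
w_reversed = sgd_epoch(list(reversed(data)))
print(w_forward, w_reversed)  # approximately 2.172 and 1.874
```

Same samples, same learning rate, different final w. This is one reason SGD implementations commonly shuffle the training set before each epoch.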
Differences between SGD and GD:
In both gradient descent (GD) and stochastic gradient descent (SGD), you update a set of parameters in an iterative manner to minimize an error function.
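For concreteness, here is that iterative update written out. The notation is an assumption taken from the usual least-squares setting, not something this post defines: m training samples (x^(i), y^(i)), a hypothesis h_θ, a squared-error function J(θ) and a learning rate α. Each iteration moves θ a small step against the gradient of J:

```latex
J(\theta) = \sum_{i=1}^{m} \frac{1}{2}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2,
\qquad
\theta := \theta - \alpha \, \nabla_\theta J(\theta)
```

Because J is a sum of one term per training sample, its gradient is also a sum of per-sample gradients; GD and SGD differ in how many of those terms go into each update.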

In GD, you have to run through ALL the samples in your training set to make a single parameter update in a given iteration. In SGD, on the other hand, you use ONLY ONE training sample from your training set for the update in a given iteration.
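The two update rules can be sketched side by side in a few lines of Python. This is an illustrative sketch, not code from the post: it assumes a one-parameter linear model y ≈ w * x with per-sample loss 0.5 * (w * x - y)**2, hypothetical data, and hypothetical helper names (sample_grad, gd_step, sgd_step). The batch step sums the per-sample gradients; many implementations average them instead, which only rescales the learning rate.

```python
# Hypothetical helpers for a 1-D model y ≈ w * x with loss 0.5 * (w*x - y)**2.

def sample_grad(w, x, y):
    """Gradient of 0.5 * (w*x - y)**2 with respect to w, for one sample."""
    return (w * x - y) * x

def gd_step(w, data, lr):
    """Batch GD: sum the gradient over ALL samples, then update w once."""
    total = sum(sample_grad(w, x, y) for x, y in data)
    return w - lr * total

def sgd_step(w, sample, lr):
    """SGD: update w using the gradient of ONE sample only."""
    x, y = sample
    return w - lr * sample_grad(w, x, y)

data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9)]  # hypothetical data
w_gd = gd_step(0.0, data, lr=0.01)       # reads all 3 samples
w_sgd = sgd_step(0.0, data[0], lr=0.01)  # reads only data[0]
print(w_gd, w_sgd)
```

Starting from w = 0, the GD step moves w to about 0.279 after reading every sample, while the SGD step moves it to about 0.02 after reading just the first one.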

Thus, if the number of training samples is large (in fact, very large), gradient descent may take too long, because in every iteration, when you update the parameter values, you run through the complete training set. SGD, on the other hand, is faster because each update uses only one training sample, so it starts improving the parameters right away from the first sample.
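The cost argument is simple arithmetic: one GD update reads all m samples, while one SGD update reads just one. The hypothetical helper below (not from the post) counts how many parameter updates each method gets out of the same budget of per-sample gradient evaluations.

```python
# Hypothetical counter: parameter updates per budget of per-sample gradients.

def updates_for_budget(m, budget):
    """Return (gd_updates, sgd_updates) for m training samples and a
    budget measured in per-sample gradient evaluations."""
    gd_updates = budget // m  # each GD update reads all m samples
    sgd_updates = budget      # each SGD update reads exactly one sample
    return gd_updates, sgd_updates

print(updates_for_budget(m=1_000_000, budget=1_000_000))  # (1, 1000000)
```

With a million samples, the work GD spends on a single update is enough for SGD to make a million updates.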

SGD often converges much faster than GD, but the error function is not minimized as well as it is with GD. In most cases, though, the close approximation of the parameter values that SGD gives you is good enough, because the parameters reach the neighborhood of the optimal values and keep oscillating around them.
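The "reach the optimum and keep oscillating" behavior can be reproduced on a toy problem. This sketch assumes things the post does not specify: hypothetical data (y ≈ 2x with alternating ±0.5 noise, so no single w fits every sample exactly), the one-parameter squared-error model y ≈ w * x, and a constant learning rate of 0.01. Both methods are compared against the closed-form least-squares solution w* = Σxy / Σx².

```python
# Hypothetical toy data: y ≈ 2x with alternating +/-0.5 noise.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.5, 3.5, 6.5, 7.5, 10.5]
lr = 0.01  # constant learning rate (assumption)

# Closed-form minimizer of 0.5 * sum((w*x - y)**2) for the model y ≈ w * x.
w_star = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# Batch GD: every update uses the gradient summed over all samples.
w_gd = 0.0
for _ in range(200):
    w_gd -= lr * sum((w_gd * x - y) * x for x, y in zip(xs, ys))

# SGD: one update per sample, cycling through the data for many epochs.
w_sgd = 0.0
last_epoch = []  # values of w during the final epoch
for epoch in range(500):
    for x, y in zip(xs, ys):
        w_sgd -= lr * (w_sgd * x - y) * x
        if epoch == 499:
            last_epoch.append(w_sgd)

print(f"w* = {w_star:.6f}")
print(f"GD = {w_gd:.6f}")
print(f"SGD ranges over {min(last_epoch):.4f} .. {max(last_epoch):.4f} in the last epoch")
```

GD settles on w* itself, while SGD keeps moving within a narrow band around w*, because each single-sample update pulls w toward that sample's own best fit. Decaying the learning rate over time is the usual way to shrink that band.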

If you want to see this in a practical case, check Andrew Ng's notes here, where he clearly shows the steps involved in both cases: http://cs229.stanford.edu/notes/…
