- Learning with gradient descent
Now we have:
a. Inputs and outputs from examples (we call them training data in machine learning)
b. The neuron function that calculates the output from the input data. The function should be smooth and differentiable, so we introduced the sigmoid function (a small code sketch of such a neuron follows below):
σ(z) = 1/(1+exp(-z))
z = w*x + b (x is the input)
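As a minimal sketch (assuming NumPy, with made-up weights and inputs), the sigmoid neuron above can be written as:

```python
import numpy as np

def sigmoid(z):
    """sigma(z) = 1 / (1 + exp(-z)), smooth and differentiable."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(w, x, b):
    """Compute the neuron's output sigma(w . x + b)."""
    z = np.dot(w, x) + b
    return sigmoid(z)

# Example with made-up numbers: 3 inputs, 3 weights, one bias.
w = np.array([0.5, -0.2, 0.1])
x = np.array([1.0, 2.0, 3.0])
b = 0.4
print(neuron_output(w, x, b))  # a single activation between 0 and 1
```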
3.1 Cost Function
We need a way to represent the difference between the output of our neurons (y(x)) and the result in the training data (a). Based on experience from physics and statistics, we start with the mean squared error (MSE). Given n as the total number of training examples, the cost function is as below:
C(w, b) = (1/2n) * Σ_x || y(x) - a ||^2, where the sum runs over all training inputs x (the factor 1/2 is a common convention that makes the derivative cleaner).
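A small sketch of this cost in Python (the function name and the example numbers are my own, and it uses the 1/(2n) convention from the formula above):

```python
import numpy as np

def mse_cost(predictions, targets):
    """C = (1 / 2n) * sum over examples of ||prediction - target||^2."""
    n = len(targets)
    return sum(np.linalg.norm(p - t) ** 2 for p, t in zip(predictions, targets)) / (2.0 * n)

# Made-up example: network outputs y(x) vs. desired results a from the training data.
predictions = [np.array([0.8]), np.array([0.3])]
targets     = [np.array([1.0]), np.array([0.0])]
print(mse_cost(predictions, targets))  # small positive number; 0 only for a perfect fit
```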
3.2 Gradient descent
Given the cost function, our goal is to find the w and b that make it minimal. If we picture the cost function as a surface, like a valley between mountains, we can take small steps towards the bottom. This is gradient descent.
This is a good link to learn what gradient descent is: https://www.jianshu.com/p/c7e642877b0e
Suppose for the moment that the cost function has only one parameter v (in reality we have two, w and b); the cost is C(v) and its derivative is C'(v).
If we move the position by a small amount △v, the cost changes by approximately
△C ≈ C'(v) * △v
To make sure the cost always goes down, define △v = -λ * C'(v),
which gives △C ≈ -λ * C'(v)^2 ≤ 0.
λ is the learning rate.
Then at each step we move the ball's position v by v -> v' = v - λ * C'(v),
and the cost moves from C to approximately C - λ * C'(v)^2, i.e. it keeps decreasing.
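As a minimal sketch of this update rule, here is plain-Python gradient descent on a toy one-parameter cost C(v) = (v - 3)^2 (the cost, the starting point and the learning rate are all made up for illustration):

```python
def C(v):
    """Toy one-parameter cost with its minimum at v = 3."""
    return (v - 3.0) ** 2

def C_prime(v):
    """Derivative C'(v) of the toy cost."""
    return 2.0 * (v - 3.0)

v = 0.0      # starting position of the "ball"
lam = 0.1    # learning rate lambda
for step in range(100):
    v = v - lam * C_prime(v)   # v -> v' = v - lambda * C'(v)

print(v, C(v))   # v is close to 3, C(v) is close to 0
```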
When the cost function has multiple variables (for us, w and b), we could in principle also use second-order information such as the second partial derivatives ∂^2 C/∂w∂b, but that is computationally costly. So we use only the first-order partial derivatives, updating each parameter along its own component of the gradient:
w -> w' = w - λ * ∂C/∂w
b -> b' = b - λ * ∂C/∂b
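A similar sketch with two parameters, using only the first partial derivatives of a made-up cost C(w, b) (purely illustrative; in a real network the partial derivatives would come from the training data):

```python
def C(w, b):
    """Toy two-parameter cost with its minimum at w = 1, b = -2."""
    return (w - 1.0) ** 2 + (b + 2.0) ** 2

def dC_dw(w, b):
    """First partial derivative dC/dw of the toy cost."""
    return 2.0 * (w - 1.0)

def dC_db(w, b):
    """First partial derivative dC/db of the toy cost."""
    return 2.0 * (b + 2.0)

w, b = 0.0, 0.0
lam = 0.1
for step in range(200):
    # Each parameter moves along its own first partial derivative.
    w, b = w - lam * dC_dw(w, b), b - lam * dC_db(w, b)

print(w, b)  # close to (1, -2)
```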
3.3 Stochastic gradient descent
In practice we have many training examples, and we would need to compute the gradient ∇C_x for each training input x separately and then average them; this becomes very slow when the number of training inputs is large.
So we use stochastic gradient descent to speed up learning. The idea is to estimate the gradient ∇C by computing the average of ∇C_x over a small sample (a mini-batch) of randomly chosen training inputs.
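A small sketch of that idea: the training data, the per-example cost and its gradient below are all made up (a tiny linear model), purely to show how averaging over a random mini-batch estimates the full gradient:

```python
import random

# Made-up training data: inputs x and desired outputs y for a toy linear model y = 2x + 1.
training_data = [(x, 2.0 * x + 1.0) for x in [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]]

def grad_Cx(w, b, x, y):
    """Gradient of the per-example cost C_x = (w*x + b - y)^2 with respect to w and b."""
    err = w * x + b - y
    return 2.0 * err * x, 2.0 * err

w, b = 0.0, 0.0
lam = 0.05         # learning rate
batch_size = 3
for epoch in range(2000):
    batch = random.sample(training_data, batch_size)   # small random sample of inputs
    # Estimate the full gradient by averaging the per-example gradients over the mini-batch.
    gw = sum(grad_Cx(w, b, x, y)[0] for x, y in batch) / batch_size
    gb = sum(grad_Cx(w, b, x, y)[1] for x, y in batch) / batch_size
    w, b = w - lam * gw, b - lam * gb

print(w, b)  # roughly (2, 1)
```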