From perceptrons to machine learning (Part 2)

  1. Learning with gradient descent
    So far we have:
    a. Inputs and outputs from examples (we call them training data in machine learning).
    b. The neuron function that computes the output from the input data. This function should be smooth and differentiable, so we introduced the sigmoid function (a short code sketch follows after this list):
    σ(z) = 1/(1+exp(-z))
    z = w*x + b (x is the input)
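Below is a minimal NumPy sketch of such a sigmoid neuron. The helper names (`sigmoid`, `neuron_output`) and the example numbers are arbitrary choices for illustration, not part of the original text.

```python
import numpy as np

def sigmoid(z):
    # Sigmoid activation: 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(x, w, b):
    # Single sigmoid neuron: sigma(w . x + b)
    z = np.dot(w, x) + b
    return sigmoid(z)

# Example with two inputs (values chosen arbitrarily)
x = np.array([0.5, -1.0])   # inputs
w = np.array([0.8, 0.2])    # weights
b = 0.1                     # bias
print(neuron_output(x, w, b))
```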

3.1 Cost Function
We need a way to measure the difference between the output of our neuron, y(x), and the desired result from the training data, a. Based on experience from physics experiments and statistics, we start with the mean squared error (MSE). With n the total number of training examples, the cost function is:
C(w, b) = 1/(2n) * Σ_x (y(x) - a)^2
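As a rough sketch, this is what the MSE cost looks like in code for a single sigmoid neuron as above; the toy inputs and targets below are made up purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mse_cost(w, b, xs, targets):
    # C(w, b) = 1/(2n) * sum over x of (y(x) - a)^2
    n = len(xs)
    total = 0.0
    for x, a in zip(xs, targets):
        y = sigmoid(np.dot(w, x) + b)   # neuron output y(x)
        total += (y - a) ** 2
    return total / (2 * n)

# Toy training data (illustrative only)
xs = [np.array([0.0, 0.0]), np.array([1.0, 1.0])]
targets = [0.0, 1.0]
print(mse_cost(np.array([0.5, 0.5]), 0.0, xs, targets))
```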

3.2 Gradient descent
Given the cost function, our goal is to find the w and b that minimize it. Picture the cost surface as a valley (see the chart below): we can take repeated small steps towards the bottom of the valley. This is gradient descent.
This is a good link for learning what gradient descent is: https://www.jianshu.com/p/c7e642877b0e
(Figure: the cost surface as a valley, with a ball taking small steps towards the minimum.)

Suppose the cost function has only one variable v (in reality it has two, w and b); call the cost C(v) and its gradient C'(v).
If we move the position by a small amount Δv, the cost changes by approximately
ΔC ≈ C'(v) * Δv
Choose Δv = -λ * C'(v), where λ is the learning rate. Then
ΔC ≈ -λ * C'(v) * C'(v) ≤ 0
so every step moves the cost towards the bottom. At each step we update the ball's position v by
v -> v' = v - λ * C'(v)
and the cost becomes approximately C - λ * C'(v) * C'(v).
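A quick sketch of this one-variable rule, using a made-up toy cost C(v) = (v - 3)^2 whose minimum is at v = 3; the starting point and learning rate are arbitrary choices for illustration.

```python
# Toy cost and its derivative (illustrative; not from the original text)
def C(v):
    return (v - 3.0) ** 2

def C_prime(v):
    return 2.0 * (v - 3.0)

v = 0.0      # starting position of the "ball"
lr = 0.1     # learning rate (lambda in the text)
for step in range(50):
    v = v - lr * C_prime(v)   # v -> v' = v - lambda * C'(v)
print(v)     # close to 3.0, the bottom of the valley
```

With lr = 0.1 each step shrinks the distance to the minimum by a factor of 0.8, so after 50 steps v is essentially 3.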

If the cost function has several variables, a precise second-order analysis would require the second partial derivatives of C, such as ∂²C/∂w∂b, which is costly to compute. So we use only the first-order partial derivatives and apply the same update rule to each parameter:
w_k -> w_k' = w_k - λ * ∂C/∂w_k
b_l -> b_l' = b_l - λ * ∂C/∂b_l
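As a sketch of these update rules for a single sigmoid neuron with the MSE cost above: the gradient uses σ'(z) = σ(z)(1 - σ(z)), and the OR-like toy data and learning rate are arbitrary illustration choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gradient_step(w, b, xs, targets, lr):
    # One full-batch update: w_k -> w_k - lr * dC/dw_k, b -> b - lr * dC/db
    n = len(xs)
    grad_w = np.zeros_like(w)
    grad_b = 0.0
    for x, a in zip(xs, targets):
        y = sigmoid(np.dot(w, x) + b)
        delta = (y - a) * y * (1.0 - y)   # (y - a) * sigma'(z)
        grad_w += delta * x
        grad_b += delta
    return w - lr * grad_w / n, b - lr * grad_b / n

# Toy data resembling logical OR (illustrative only)
xs = [np.array([0., 0.]), np.array([0., 1.]), np.array([1., 0.]), np.array([1., 1.])]
targets = [0., 1., 1., 1.]
w, b = np.zeros(2), 0.0
for epoch in range(5000):
    w, b = gradient_step(w, b, xs, targets, lr=1.0)
print(w, b)   # weights and bias that roughly reproduce OR
```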

3.3 Stochastic gradient descent
In practice we have many training examples, and the update above requires computing the gradient ∇C for each training input separately and then averaging them. This becomes very slow when the training set is large.
So we use stochastic gradient descent to speed up learning. The idea is to estimate the gradient ∇C by computing the gradients for only a small sample (a mini-batch) of randomly chosen training inputs and averaging those.
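A minimal sketch of mini-batch stochastic gradient descent for the same single-neuron setup; the batch size, learning rate, epoch count, and toy data are arbitrary illustration choices, not values from the original text.

```python
import random
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def minibatch_gradient(w, b, batch):
    # Average gradient of the MSE cost over one mini-batch
    grad_w = np.zeros_like(w)
    grad_b = 0.0
    for x, a in batch:
        y = sigmoid(np.dot(w, x) + b)
        delta = (y - a) * y * (1.0 - y)   # (y - a) * sigma'(z)
        grad_w += delta * x
        grad_b += delta
    m = len(batch)
    return grad_w / m, grad_b / m

def sgd(w, b, data, lr=1.0, batch_size=2, epochs=2000):
    for _ in range(epochs):
        random.shuffle(data)                         # sample mini-batches at random
        for i in range(0, len(data), batch_size):
            gw, gb = minibatch_gradient(w, b, data[i:i + batch_size])
            w, b = w - lr * gw, b - lr * gb          # update from the estimated gradient
    return w, b

# Toy data resembling logical OR (illustrative only)
data = [(np.array([0., 0.]), 0.), (np.array([0., 1.]), 1.),
        (np.array([1., 0.]), 1.), (np.array([1., 1.]), 1.)]
w, b = sgd(np.zeros(2), 0.0, data)
print(w, b)
```

Because each update only looks at a small batch, the per-step cost no longer grows with the size of the full training set.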
