Andrew Ng machine learning course notes -- supervised learning applications, gradient descent

Last lesson I talked about supervised learning. Supervised learning was this machine-learning problem where I said we're going to tell the algorithm what the right answer is for a number of examples.

 

We applied supervised learning to get a car to drive itself. The essential learning algorithm for this is something called gradient descent.

 

Autonomous driving (supervised learning)

 

In supervised learning, this is what we're going to do. We're given a training set, and we're going to feed our training set, comprising our m training examples (so 47 training examples), into a learning algorithm. Our algorithm then outputs a function that, by tradition and for historical reasons, is usually denoted by the lowercase letter h and is called a hypothesis.

The hypothesis h maps from inputs x to outputs y.

What we'll do is minimize, as a function of the parameters theta, the quantity J of theta.
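For concreteness, this is the standard linear-regression form of the hypothesis and of J of theta used in the lecture (my reconstruction of the board, with x^(i), y^(i) denoting the i-th training input and output and n the number of input features):

$$h_\theta(x) = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n = \theta^{T}x, \qquad J(\theta) = \frac{1}{2}\sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)^{2}$$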

 

Search algorithm: we'll start with some value of my parameter vector theta, and then I'm going to keep changing my parameter vector theta to reduce J of theta a little bit, until we hopefully end up at the minimum, with respect to theta, of J of theta.

I'm going to take a small step in the direction of steepest descent, or the direction that the gradient turns out to be. And then you take a small step and end up at a new point shown there, and you keep on: you take another step, and you sort of keep going until you end up at a local minimum of this function J of theta. (Is it guaranteed to terminate?) If you use a slightly different initial starting point, descend from that point, and take the steepest-descent direction again, you can end up at a completely different local optimum. This is another property of gradient descent.

Gradient descent: we're going to repeatedly take a step in the direction of steepest descent, and it turns out that you can write that as: theta i becomes theta i minus alpha times the partial derivative, with respect to theta i, of J of theta.
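Written out (again my reconstruction of the board, with j indexing the parameters and := denoting assignment), the update and, for the least-squares J of theta above, its partial derivative are:

$$\theta_j := \theta_j - \alpha\,\frac{\partial}{\partial\theta_j}J(\theta), \qquad \frac{\partial}{\partial\theta_j}J(\theta) = \sum_{i=1}^{m}\bigl(h_\theta(x^{(i)}) - y^{(i)}\bigr)\,x_j^{(i)}$$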

This Greek letter alpha here is a parameter of the algorithm called the learning rate. (The gradient determines the direction of the step, and alpha controls how large a step you take.) This parameter is usually set by hand. If you set it too small, the algorithm, which wants to descend in the steepest direction, moves only a tiny bit each time, so it takes a long time to converge; if you set it too large, your algorithm may overshoot the minimum, because your steps are too big. 37:47
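A toy example (not from the lecture) of why alpha matters: for the one-dimensional cost J(θ) = θ², one gradient step gives

$$\theta := \theta - \alpha \cdot 2\theta = (1 - 2\alpha)\,\theta,$$

so with a small alpha (say 0.01) the parameter creeps toward the minimum at 0 very slowly, while with alpha > 1 we have |1 - 2α| > 1, each step overshoots farther than the last, and the iterates diverge.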

This ordinary least-squares cost turns out to be a quadratic function, so it will always have a nice bowl shape and have only one global minimum with no other local optima. So when you run gradient descent, here are actually the contours of the function J.

It can converge to the minimum of that region.

Is alpha changing every time? Because the step is not … This has nothing to do with alpha; it is a property of gradient descent: as you approach a local minimum, the steps do get smaller and smaller, until you finally converge. When you reach a local minimum, the gradient becomes 0.

How do you detect convergence? For example, compare two successive iterations: if the parameter values, or the quantity you are trying to minimize, no longer change very much from one iteration to the next, you can probably consider it converged.
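A minimal sketch of that kind of check (my own illustration, not code from the course); theta_prev/theta_curr and J_prev/J_curr are the parameters and the cost from two successive iterations, and tol is an arbitrary threshold:

```python
import numpy as np

def has_converged(theta_prev, theta_curr, J_prev, J_curr, tol=1e-6):
    """Declare convergence when neither the parameter vector nor the
    cost being minimized changes appreciably between two iterations."""
    return (np.linalg.norm(theta_curr - theta_prev) < tol
            and abs(J_curr - J_prev) < tol)
```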

 

How does gradient descent "look around" and pick the direction of steepest descent? When you stand at a point and compute the gradient, the negative of the gradient is the direction of steepest descent. You would never walk in the opposite direction, because that opposite direction, the gradient itself, is the direction of steepest ascent. So when you take the partial derivatives, the negative of the resulting gradient vector is the direction in which the function decreases fastest.
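One way to see this (a standard first-order argument, not spelled out in these notes): for a unit vector u and a small step size ε,

$$J(\theta + \epsilon u) \approx J(\theta) + \epsilon\,u^{T}\nabla J(\theta),$$

and the right-hand side is smallest when u points along $-\nabla J(\theta)/\|\nabla J(\theta)\|$, so the negative gradient is locally the steepest-descent direction.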

Batch gradient descent: every update sweeps over the entire training set.
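A minimal NumPy sketch of this loop (my own illustration using the linear hypothesis and least-squares J of theta written above; the function name, default learning rate, and stopping rule are illustrative choices, not from the lecture):

```python
import numpy as np

def batch_gradient_descent(X, y, alpha=0.01, max_iters=1000, tol=1e-6):
    """Batch gradient descent for linear least squares.

    X is an (m, n) design matrix whose first column is all ones (the
    intercept term), y is the (m,) vector of targets, alpha is the
    learning rate.
    """
    m, n = X.shape
    theta = np.zeros(n)
    J_prev = np.inf
    for _ in range(max_iters):
        residuals = X @ theta - y          # h_theta(x^(i)) - y^(i) for every i
        grad = X.T @ residuals             # sum of (h - y) * x over the whole set
        theta -= alpha * grad              # theta_j := theta_j - alpha * dJ/dtheta_j
        J = 0.5 * (residuals @ residuals)  # J(theta), half the sum of squared errors
        if abs(J_prev - J) < tol:          # stop once J barely changes
            break
        J_prev = J
    return theta
```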

Stochastic gradient descent (incremental gradient descent): as before, you update every i-th position of the parameter vector in this way. But to start learning you only need to look at your first training example and perform an update using it; then you use the second training example to perform the next update, and so on. This way you adjust the parameters much faster. However, this algorithm will not converge exactly to the global minimum; it wanders around near the global minimum and may keep wandering in its neighborhood. The result you get is usually very close to the global minimum.
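A matching sketch of the incremental version (again my own illustration, reusing the conventions of the batch code above):

```python
import numpy as np

def stochastic_gradient_descent(X, y, alpha=0.01, epochs=10):
    """Stochastic (incremental) gradient descent for linear least squares.

    Unlike the batch version, the parameters are updated after each single
    training example rather than after a full pass over the training set.
    """
    m, n = X.shape
    theta = np.zeros(n)
    for _ in range(epochs):
        for i in range(m):
            error = X[i] @ theta - y[i]    # h_theta(x^(i)) - y^(i)
            theta -= alpha * error * X[i]  # update using this one example only
    return theta
```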

 

 

The closed-form method for solving for the coefficients of linear least squares (the normal equations).
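For reference (a standard result, with the derivation omitted here): setting the gradient of J of theta to zero gives the normal equations, where X is the design matrix with a leading column of ones and y the vector of targets,

$$X^{T}X\,\theta = X^{T}y \;\Longrightarrow\; \theta = (X^{T}X)^{-1}X^{T}y,$$

assuming $X^{T}X$ is invertible.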

 

 

 

 

 

 
