#These are notes on Andrew Ng's Machine Learning lecture (Stanford University) on the Coursera website.#
#Notes taken by WONG Zinhoo. If you reproduce them, please credit the source and include the original link.#
Lecture 2: Gradient descent
Algorithm idea
We already know that the cost function J(θ₀, θ₁) describes the distance between the fitted line and the sample points under different parameter choices.
We want to find the line that fits the samples best; to accomplish this, we just need to make the function J minimal.
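As a concrete illustration (a minimal sketch of my own, not code from the lecture), here is the squared-error cost J(θ₀, θ₁) = (1/2m) · Σ(h(x) − y)² for the line h(x) = θ₀ + θ₁·x; the toy data below is invented for the example:

```python
import numpy as np

def compute_cost(theta0, theta1, x, y):
    """Squared-error cost J(theta0, theta1) = 1/(2m) * sum((h(x) - y)^2)."""
    m = len(x)                           # number of training samples
    predictions = theta0 + theta1 * x    # hypothesis h(x) for every sample
    return np.sum((predictions - y) ** 2) / (2 * m)

# Toy data: points roughly on the line y = 1 + 2x (invented for this note)
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
print(compute_cost(1.0, 2.0, x, y))   # near-perfect line -> small J (~0.0125)
print(compute_cost(0.0, 0.0, x, y))   # flat line at 0 -> large J (~10.36)
```

The better the line fits the points, the smaller J gets, which is why minimizing J picks out the best line.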
This is the basic method:
As Pic<1> shows, the red parts are the maxima (peaks) and the minimum lies in the valley.
Pic<1>
Gradient descent algorithm:
repeat until convergence { θⱼ := θⱼ − α · ∂J(θ₀, θ₁)/∂θⱼ } for j = 0 and j = 1.
All the θⱼ should be updated simultaneously; that is, we change the parameters as a group rather than one by one, as the sketch below shows.
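Here is a minimal sketch (my own, assuming the squared-error cost above) of one such step; note that both partial derivatives are computed from the old θ values before either parameter is overwritten, which is exactly the simultaneous update:

```python
import numpy as np

def gradient_step(theta0, theta1, x, y, alpha):
    """One gradient-descent step with a simultaneous update of theta0 and theta1."""
    m = len(x)
    error = (theta0 + theta1 * x) - y    # h(x) - y, using the OLD parameters
    grad0 = np.sum(error) / m            # dJ/dtheta0
    grad1 = np.sum(error * x) / m        # dJ/dtheta1
    # Both gradients are ready before either theta changes -> simultaneous update.
    return theta0 - alpha * grad0, theta1 - alpha * grad1
```

Writing `theta0 -= alpha * grad0` first and only then computing `grad1` would silently use the new θ₀ inside the old cost, which is the one-by-one update the lecture warns against.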
Function analysis: when the initial θ lies on a slope:
Theorem: the gradient points along the normal direction of the level curves, and the magnitude of the gradient equals the maximum rate of change.
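A quick worked example of this theorem (my own, not from the lecture): take f(x, y) = x² + y², whose level curves are circles centred at the origin.

```latex
f(x, y) = x^2 + y^2, \qquad
\nabla f = \left( \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right) = (2x,\; 2y)
```

The vector (2x, 2y) points radially outward, i.e. along the normal of the circular level curve through (x, y), and its length ‖∇f‖ = 2√(x² + y²) is the maximum rate of change of f at that point.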
Based on this theorem, the steeper the slope, the longer the gradient vector; each step therefore moves θ a larger distance, and the value of J becomes smaller.
Through this process the new θ gets closer to the minimum. Because the value of the gradient is proportional to θ's distance from the minimum (gradient ∝ θ for a bowl-shaped J), the gradient itself becomes smaller with each step.
Repeating this process, θ settles at the local bottom (a local minimum).
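To see the "gradient ∝ θ" step concretely (my own one-line example): take J(θ) = θ², whose minimum sits at θ = 0.

```latex
\frac{dJ}{d\theta} = 2\theta \;\propto\; \theta, \qquad
\theta \leftarrow \theta - \alpha \cdot 2\theta = (1 - 2\alpha)\,\theta
```

For 0 < α < 1/2 every iteration multiplies θ by a factor of magnitude less than 1, so θ, the gradient, and the step size all shrink geometrically toward the minimum.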
A simpler case: a single parameter θ:
θ := θ − α · dJ(θ)/dθ
As we approach a local minimum, gradient descent will automatically take smaller steps. So there is no need to decrease α over time.
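A tiny runnable demonstration (my own, reusing J(θ) = θ² from above): α stays fixed, yet the printed steps shrink on their own because the gradient does.

```python
# Minimize J(theta) = theta^2 with a FIXED learning rate alpha.
theta, alpha = 5.0, 0.1

for i in range(8):
    grad = 2 * theta        # dJ/dtheta for J = theta^2
    step = alpha * grad     # the step is proportional to the gradient
    theta -= step
    print(f"iter {i}: theta = {theta:+.4f}, step taken = {step:+.4f}")

# The steps get smaller every iteration even though alpha never changes,
# because the gradient itself shrinks as theta approaches the minimum at 0.
```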
------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Email: zihaowang@live.cn
Facebook: http://www.facebook.com/zinhoowong
Twitter: https://twitter.com/WongZinhoo