21Fall · Univariate Linear Regression

Notes taken in the course Machine Learning by Andrew Ng.


  • Intuition about Gradient Descent: how the algorithm works & why the updating step makes sense

To understand how the formula works, we again reduce the original problem to a simplified one with only a single parameter $a_0$.

$$a_0 := a_0 - \alpha \frac{dJ(a_0)}{da_0} \qquad (j = 0)$$

$\frac{dJ(a_0)}{da_0}$ is the derivative of $J$ at the point $a_0$, and it has a geometrical meaning: the slope of the tangent line to the curve at that point.
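The update rule can be sketched in a few lines of Python. The cost $J(a_0) = (a_0 - 3)^2$ below is a hypothetical example chosen only so the derivative is easy to write by hand; the minimizer is $a_0 = 3$.

```python
# A minimal sketch of the update rule a0 := a0 - alpha * dJ/da0,
# on the hypothetical cost J(a0) = (a0 - 3)**2 with derivative 2*(a0 - 3).
def gradient_descent_1d(a0, alpha, steps):
    for _ in range(steps):
        grad = 2 * (a0 - 3)     # dJ(a0)/da0: the slope of the tangent at a0
        a0 = a0 - alpha * grad  # step downhill, scaled by the learning rate
    return a0

a0 = gradient_descent_1d(a0=0.0, alpha=0.1, steps=100)
# a0 converges toward the minimizer a0 = 3
```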

If alpha is too small, gradient descent may be slow; if alpha is too large, it may overshoot the minimum, fail to converge, or even diverge.

If you have already reached the local optimum, the derivative term will be 0, so you won't take any more steps.

Remember that the magnitude of each step depends both on the learning rate and on the derivative at the current point. So as you step closer to the minimum, you automatically take smaller steps.
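This shrinking-step behaviour is easy to check numerically. Reusing the same hypothetical cost $J(a_0) = (a_0 - 3)^2$, the step magnitude $|\alpha \cdot dJ/da_0|$ decreases on every iteration even though alpha itself never changes:

```python
# Record the size of each step |alpha * grad| while descending
# the hypothetical cost J(a0) = (a0 - 3)**2.
def step_sizes(a0, alpha, steps):
    sizes = []
    for _ in range(steps):
        grad = 2 * (a0 - 3)
        sizes.append(abs(alpha * grad))  # step shrinks as the slope flattens
        a0 -= alpha * grad
    return sizes

sizes = step_sizes(a0=0.0, alpha=0.1, steps=10)
# each step is smaller than the previous one, with alpha held fixed
```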

Finally, if we put the cost function and gradient descent together, we obtain our first learning algorithm - Linear Regression.

That is, finally, how the algorithm is realized.
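Putting the two pieces together can be sketched as follows. This is a plain-Python sketch, not the course's official implementation: the hypothesis is $h(x) = a_0 + a_1 x$ and the cost is the usual squared-error $J(a_0, a_1) = \frac{1}{2m}\sum (h(x^{(i)}) - y^{(i)})^2$.

```python
# Univariate linear regression trained by gradient descent (a sketch).
# Hypothesis: h(x) = a0 + a1 * x; cost: J = (1 / (2m)) * sum((h(x) - y)**2).
def train_linear_regression(xs, ys, alpha=0.05, steps=5000):
    a0, a1 = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        preds = [a0 + a1 * x for x in xs]
        # Partial derivatives of J with respect to a0 and a1
        d_a0 = sum(p - y for p, y in zip(preds, ys)) / m
        d_a1 = sum((p - y) * x for p, y, x in zip(preds, ys, xs)) / m
        # Simultaneous update of both parameters
        a0, a1 = a0 - alpha * d_a0, a1 - alpha * d_a1
    return a0, a1

# Toy data generated from y = 2x + 1; the fit recovers a0 ≈ 1, a1 ≈ 2
a0, a1 = train_linear_regression([0, 1, 2, 3], [1, 3, 5, 7])
```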

Now we look back to the original problem.

For a one-parameter function f(x_1), the graph of f is just a 2-D curve. For a two-parameter function, however, the graph is a 3-D curved surface with three axes, which we call the x-, y-, and z-axes.

We assume that the coordinates of the point P are $(x_0, y_0, z_0)$, where the vertical axis z represents the value of the function J(x, y). Then the meaning of a partial derivative is not hard to see: the partial derivative with respect to x is the slope of the curve cut out by the plane $y = y_0$, and the partial derivative with respect to y is the slope of the curve cut out by the plane $x = x_0$.
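These slope-along-a-slice interpretations can be checked numerically with central differences. The function $J(x, y) = x^2 + 3y^2$ below is a hypothetical example; its analytic partials at $(1, 2)$ are $\partial J/\partial x = 2$ and $\partial J/\partial y = 12$.

```python
# Numerical partial derivatives: each one holds the other coordinate fixed,
# i.e. it measures the slope inside the plane y = y0 (or x = x0).
def partial_x(J, x0, y0, h=1e-6):
    return (J(x0 + h, y0) - J(x0 - h, y0)) / (2 * h)  # slope in the plane y = y0

def partial_y(J, x0, y0, h=1e-6):
    return (J(x0, y0 + h) - J(x0, y0 - h)) / (2 * h)  # slope in the plane x = x0

J = lambda x, y: x**2 + 3 * y**2  # hypothetical cost surface
# At (1, 2): partial_x ≈ 2 and partial_y ≈ 12, matching the analytic slopes
```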

On this 3-D surface, you can imagine you are walking down a real hill and must decide which direction to go.

The parameters $a_1$ and $a_2$ should be updated on every iteration so that you find the point $(a_1, a_2)$ in the horizontal plane corresponding to the minimum of J.

That's why $a_1$ and $a_2$ must be updated simultaneously rather than separately.
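The difference between the two update orders is concrete. On the hypothetical cost $J(a_1, a_2) = a_1^2 + a_1 a_2 + a_2^2$ below, a sequential update computes the second gradient at a point that has already moved, so it lands somewhere slightly different:

```python
# Gradient of the hypothetical cost J(a1, a2) = a1**2 + a1*a2 + a2**2.
def grad(a1, a2):
    return 2 * a1 + a2, a1 + 2 * a2

def simultaneous_step(a1, a2, alpha):
    g1, g2 = grad(a1, a2)           # both partials evaluated at the SAME point
    return a1 - alpha * g1, a2 - alpha * g2

def sequential_step(a1, a2, alpha):
    g1, _ = grad(a1, a2)
    a1 = a1 - alpha * g1            # a1 has already moved...
    _, g2 = grad(a1, a2)            # ...so a2's partial is taken at the wrong point
    return a1, a2 - alpha * g2

# Starting from (1, 1) with alpha = 0.1, the two orders disagree on a2
sim = simultaneous_step(1.0, 1.0, 0.1)
seq = sequential_step(1.0, 1.0, 0.1)
```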

So congrats on finishing the first Machine Learning Algorithm.

  • Two-parameter Linear Regression Implementation