1. A very simple explanation of gradient descent:
--Suppose we have a function y = f(x). The derivative of this function is denoted as f'(x) or as dy/dx. It specifies how to scale a small change in the input in order to obtain the corresponding change in the output: f(x + ε) ≈ f(x) + ε f'(x).
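Personal understanding: the derivative of f(x) carries two pieces of information, its sign and its scale (the absolute value). The sign tells us whether y increases (positive derivative) or decreases (negative derivative) when x grows by a small amount. The absolute value tells us by how much: for a small step ε in x, y changes by roughly |f'(x)| · ε.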
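To make the approximation concrete, here is a minimal sketch. The function f(x) = x², the point x = 3 and the step ε = 0.001 are my own illustrative choices, not from the book. It compares f(x + ε) with f(x) + ε f'(x), then takes one small step against the sign of the derivative, which is the basic idea behind gradient descent.

```python
# Illustrative assumptions: f(x) = x^2, x = 3.0, eps = 1e-3.

def f(x):
    return x ** 2

def f_prime(x):
    # Analytic derivative of x^2.
    return 2 * x

def sign(v):
    # Returns 1, 0, or -1 depending on the sign of v.
    return (v > 0) - (v < 0)

x = 3.0
eps = 1e-3

# First-order approximation: f(x + eps) ≈ f(x) + eps * f'(x).
exact = f(x + eps)
approx = f(x) + eps * f_prime(x)
error = abs(exact - approx)  # for x^2 this equals eps^2, up to rounding

# Moving a small step against the sign of the derivative decreases f.
x_new = x - eps * sign(f_prime(x))

print("exact:", exact, "approx:", approx, "error:", error)
print("f(x):", f(x), "f(x_new):", f(x_new))
```

The error shrinks quadratically as ε gets smaller, which is why the approximation is only trusted for small steps.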
2. What is a saddle point:
--Some critical points are neither maxima nor minima. These are known as saddle points.
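A small one-dimensional sketch of this, using f(x) = x³ as my own example (not from the book). At x = 0 the derivative is zero, so it is a critical point, but f is lower on the left and higher on the right, so it is neither a minimum nor a maximum. Comparing against a single offset h is only a rough numerical check, not a proof.

```python
# Illustrative assumptions: f(x) = x^3, critical point x0 = 0, offset h = 1e-2.

def f(x):
    return x ** 3

def f_prime(x):
    # Analytic derivative of x^3.
    return 3 * x ** 2

x0 = 0.0
h = 1e-2

# A critical point is where the derivative is zero.
is_critical = f_prime(x0) == 0.0

# Compare the value at x0 with its neighbours on both sides.
left, center, right = f(x0 - h), f(x0), f(x0 + h)
is_local_min = left > center and right > center
is_local_max = left < center and right < center

# Critical, but neither a local minimum nor a local maximum.
is_saddle = is_critical and not is_local_min and not is_local_max

print("critical:", is_critical, "min:", is_local_min,
      "max:", is_local_max, "saddle:", is_saddle)
```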
3. The relationship between gradients and derivatives: