Gradient and Directional Derivatives--How CNNs Learn

Before reading this article, read this one first:
Partial Derivatives and Vector Fields

1. Gradient


The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla f$, packages all its partial derivative information into a vector:

$$\nabla f = \begin{bmatrix} \frac{\partial f}{\partial x} \\ \frac{\partial f}{\partial y} \\ \vdots \end{bmatrix}$$

In particular, this means $\nabla f$ is a vector-valued function, even though $f$ itself is scalar-valued.

  • If you imagine standing at a point $(x_0, y_0, \dots)$ in the input space of $f$, the vector $\nabla f$ tells you which direction you should travel to increase the value of $f$ most rapidly.

  • These gradient vectors $\nabla f$ are also perpendicular to the contour lines of $f$.

In the case of scalar-valued multivariable functions, those with a multidimensional input but a one-dimensional output, the full derivative of such a function is the gradient.
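
To make the "direction of fastest increase" claim concrete, here is a minimal Python sketch. The function f(x, y) = x^2 - xy, the point (2, 1), the step sizes, and the helper names `numerical_grad` and `best_direction` are all assumptions chosen for illustration; none of them come from the credited source. The sketch compares a finite-difference gradient with the hand-derived one, then scans unit directions to see which one raises f the most.

```python
import math

def f(x, y):
    # Example function chosen for illustration: f(x, y) = x^2 - x*y
    return x**2 - x*y

def grad_f(x, y):
    # Gradient worked out by hand: [df/dx, df/dy] = [2x - y, -x]
    return (2*x - y, -x)

def numerical_grad(func, x, y, h=1e-6):
    # Central differences approximate each partial derivative
    dfdx = (func(x + h, y) - func(x - h, y)) / (2*h)
    dfdy = (func(x, y + h) - func(x, y - h)) / (2*h)
    return (dfdx, dfdy)

def best_direction(func, x, y, step=1e-3, samples=360):
    # Try unit directions at 1-degree spacing and keep the one
    # whose small step increases func the most
    best_angle, best_gain = 0.0, -math.inf
    for k in range(samples):
        theta = 2*math.pi*k/samples
        gain = func(x + step*math.cos(theta), y + step*math.sin(theta)) - func(x, y)
        if gain > best_gain:
            best_angle, best_gain = theta, gain
    return (math.cos(best_angle), math.sin(best_angle))

print("analytic gradient: ", grad_f(2.0, 1.0))
print("numerical gradient:", numerical_grad(f, 2.0, 1.0))
print("best unit direction found by scanning:", best_direction(f, 2.0, 1.0))
```

The scanned direction should line up with the gradient to within the 1-degree sampling resolution, which is what the first bullet above describes.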

Credit To: The gradient

2. Directional Derivatives


Consider some multivariable function:

$$f(x, y) = x^2 - xy$$

We know that the partial derivatives with respect to $x$ and $y$ tell us the rate of change of $f$ as we nudge the input either in the $x$ or $y$ direction.

The question now is what happens when we nudge the input of $f$ in a direction which is not parallel to the $x$ or $y$ axes.

For example, the image below shows the graph of $f$ along with a small step along a vector $\overrightarrow{v}$ in the input space, meaning the $xy$-plane in this case. Is there an operation which tells us how the height of the graph above the tip of $\overrightarrow{v}$ compares to the height of the graph above its tail?

[Figure: the graph of $f$ with a small step along $\overrightarrow{v}$ in the $xy$-plane]

As you have probably guessed, there is a new type of derivative, called the directional derivative, which answers this question.

Just as the partial derivative is taken with respect to some input variable (e.g., $x$ or $y$), the directional derivative is taken along some vector $\overrightarrow{v}$ in the input space.

One very helpful way to think about this is to picture a point in the input space moving with velocity $\overrightarrow{v}$. The directional derivative of $f$ along $\overrightarrow{v}$ is the resulting rate of change in the output of the function.
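
The moving-point picture can be written as a limit. The following is the standard textbook definition, stated here as an assumption because the text above gives no formula; $\mathbf{x}_0$ (the point where the derivative is taken) and $h$ (a small time step) are symbols introduced only for this illustration.

```latex
\nabla_{\overrightarrow{v}} f(\mathbf{x}_0)
  = \lim_{h \to 0}
    \frac{f(\mathbf{x}_0 + h\,\overrightarrow{v}) - f(\mathbf{x}_0)}{h}
```

Under this definition, doubling the velocity $\overrightarrow{v}$ doubles the rate of change, so the value depends on the length of $\overrightarrow{v}$ as well as its direction.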

3. Compute the Directional Derivative


Let's say you have a multivariable function $f(x, y, z)$, which takes in three variables ($x$, $y$ and $z$), and you want to compute its directional derivative (the rate of change of the function's output) along the following vector:

$$\overrightarrow{v} = \begin{bmatrix} 2 \\ 3 \\ -1 \end{bmatrix}$$

The answer, as it turns out, is

$$\nabla_{\overrightarrow{v}} f = 2\frac{\partial f}{\partial x} + 3\frac{\partial f}{\partial y} + (-1)\frac{\partial f}{\partial z}$$

This should make sense because a tiny nudge along $\overrightarrow{v}$ can be broken down into two tiny nudges in the $x$-direction, three tiny nudges in the $y$-direction, and a tiny nudge backwards, by $-1$, in the $z$-direction.

More generally, we can write the vector v → \overrightarrow{v} v abstractly as follows:

$$\overrightarrow{v} = \begin{bmatrix} v_1 \\ v_2 \\ v_3 \end{bmatrix}$$

The directional derivative looks like this:

$$\nabla_{\overrightarrow{v}} f = v_1\frac{\partial f}{\partial x} + v_2\frac{\partial f}{\partial y} + v_3\frac{\partial f}{\partial z}$$

This can be written in a super-pleasing compact way using the dot product and the gradient:

$$\nabla_{\overrightarrow{v}} f = \nabla f \cdot \overrightarrow{v}$$

Take a moment to delight in the fact that one single operation, the gradient, packs enough information to compute the rate of change of a function in every possible direction! That's so many directions! Left, right, up, down, north-north-east, $34.8^\circ$ clockwise from the $x$-axis... Madness!
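
The dot-product formula can be checked numerically. The sketch below is a minimal illustration under stated assumptions: the function f(x, y, z) = xy + z^2, the point (1, 2, 3), and the helper names are hypothetical and were picked only to keep the arithmetic easy; the direction vector [2, 3, -1] is the one from the text. It computes the directional derivative once as a dot product with the gradient and once by actually nudging the input along the vector.

```python
def f(x, y, z):
    # Hypothetical example function, not from the article: f = x*y + z^2
    return x*y + z**2

def grad_f(x, y, z):
    # Partial derivatives worked out by hand: [y, x, 2z]
    return (y, x, 2*z)

def directional_derivative(grad, v):
    # Dot product of the gradient with the direction vector
    return sum(g*vi for g, vi in zip(grad, v))

def finite_difference(func, point, v, h=1e-6):
    # Rate of change of func when the input is nudged along v
    forward = func(*(p + h*vi for p, vi in zip(point, v)))
    backward = func(*(p - h*vi for p, vi in zip(point, v)))
    return (forward - backward) / (2*h)

point = (1.0, 2.0, 3.0)
v = (2.0, 3.0, -1.0)

via_gradient = directional_derivative(grad_f(*point), v)
via_nudge = finite_difference(f, point, v)
print(via_gradient, via_nudge)
```

At this point the gradient is (2, 1, 6), so the dot product is 2·2 + 3·1 + (-1)·6 = 1, and the finite-difference estimate should agree up to rounding error.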

Credit To: Directional derivatives (introduction)

4. Difference Between the Gradient and the Directional Derivative


For those who are a little confused about the difference between the gradient and the directional derivative:

In the case given in the video, the gradient is a vector whose components are scalars, each representing the rate of change of the function along one of the standard unit vectors of whatever basis is being used. (Most of the time this is the Cartesian coordinate system, and the unit basis vectors are $i$, $j$ and $k$.)

Read component by component, the gradient tells us how the function is changing along the axes of our coordinate system. But our mathematical interests rarely lie solely along those axes, which is why we need the directional derivative.

The directional derivative is a scalar value which represents the rate of change of the function along a direction which is typically NOT in the direction of one of the standard basis vectors.

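In conclusion, if you want to find the slope of a multivariable function in the direction of a vector $V$, first find the unit vector in the direction of $V$, called $u$, and then take $\nabla f \cdot u$. If $u = \langle a, b \rangle$, then $\nabla f \cdot u = a\frac{\partial f}{\partial x} + b\frac{\partial f}{\partial y}$. Normalizing makes the result a rate of change per unit distance; the formula in Section 3 does not normalize, so its value also scales with the length of the vector.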
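
The unit-vector recipe can be sketched in a few lines of Python. This is a minimal illustration, not the commenter's own code: it reuses the Section 2 example f(x, y) = x^2 - xy, while the point (2, 1), the vector V = (3, 4), and the helper names `unit_vector` and `slope_along` are assumptions made for the example.

```python
import math

def grad_f(x, y):
    # Gradient of the Section 2 example f(x, y) = x^2 - x*y
    return (2*x - y, -x)

def unit_vector(v):
    # Scale v to length 1; the zero vector has no direction
    length = math.hypot(v[0], v[1])
    if length == 0:
        raise ValueError("the zero vector has no direction")
    return (v[0] / length, v[1] / length)

def slope_along(grad, v):
    # Dot product of the gradient with the unit vector u = v / |v|
    u = unit_vector(v)
    return grad[0]*u[0] + grad[1]*u[1]

grad = grad_f(2.0, 1.0)        # gradient at the point (2, 1)
V = (3.0, 4.0)                 # direction of interest, length 5
print(unit_vector(V))          # u = (0.6, 0.8)
print(slope_along(grad, V))    # approximately 0.2
```

Here the gradient is (3, -2), so the unnormalized dot product with V is 3·3 + 4·(-2) = 1, and dividing by the length 5 gives a slope of about 0.2.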


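This video shows how a CNN learns through gradient descent, that is, how gradient descent changes each weight. The model used in the video consists entirely of fully connected layers.

$\nabla C$ is the gradient of the cost function at the point $(1, 1)$. It tells us the direction in which the output of the cost function increases fastest, so to reduce the cost we simply subtract it from the current point. This changes $x$ and $y$ at the same time, and by different amounts: $x$ may increase while $y$ decreases, or $x$ may decrease while $y$ increases.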
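
A single update of this kind can be sketched as follows. Everything specific here is an assumption made for illustration: the bowl-shaped cost C(x, y) = (x - 3)^2 + (y + 1)^2, the learning rate 0.1, and the helper names are hypothetical and do not come from the video; only the starting point (1, 1) is taken from the text.

```python
def cost(x, y):
    # Hypothetical bowl-shaped cost with its minimum at (3, -1)
    return (x - 3)**2 + (y + 1)**2

def grad_cost(x, y):
    # Gradient worked out by hand: [2(x - 3), 2(y + 1)]
    return (2*(x - 3), 2*(y + 1))

def descent_step(x, y, learning_rate=0.1):
    # Move against the gradient, scaled by the learning rate
    gx, gy = grad_cost(x, y)
    return (x - learning_rate*gx, y - learning_rate*gy)

x, y = 1.0, 1.0
new_x, new_y = descent_step(x, y)
print("gradient at (1, 1):", grad_cost(x, y))
print("after one step:", (new_x, new_y))
print("cost before and after:", cost(x, y), cost(new_x, new_y))

# Repeating the step moves the point toward the minimum
px, py = x, y
for _ in range(100):
    px, py = descent_step(px, py)
print("after 100 steps:", (px, py))
```

With this cost the gradient at (1, 1) is (-4, 4), so subtracting it increases x and decreases y in the same step, which is the "different amounts, different directions" behavior described above.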

After watching the video above, the next one is even better; it explains in detail how a CNN learns: What is backpropagation really doing?
