Gradient and Directional Derivatives: How CNNs Learn

橘猫小八的鱼

Last modified: 2022-04-06 11:53:10

Tags: C language

First published: 2022-04-05 21:42:40

Article link: https://blog.csdn.net/weixin_44890472/article/details/123969410

Before reading this article, read this one first:
Partial Derivatives and Vector Fields

1. Gradient

The gradient of a scalar-valued multivariable function $f(x, y, \dots)$, denoted $\nabla{f}$, packages all its partial derivative information into a vector:
$\nabla{f}=\begin{bmatrix}\frac{\partial f}{\partial x}\\\frac{\partial f}{\partial y}\\\vdots\end{bmatrix}$

In particular, this means $\nabla{f}$ is a vector-valued function.

If you imagine standing at a point $(x_{0}, y_{0}, \dots)$ in the input space of $f$, the vector $\nabla f(x_{0}, y_{0}, \dots)$ tells you which direction you should travel to increase the value of $f$ most rapidly.
These gradient vectors are also perpendicular to the contour lines of $f$.

In the case of scalar-valued multivariable functions, those with a multidimensional input but a one-dimensional output, the full derivative of such a function is the gradient.
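To make the definition concrete, here is a minimal sketch that builds a gradient from partial derivatives. The sample function $f(x,y)=x^{2}-xy$, the evaluation point $(2, 1)$, and the helper names are assumptions chosen for illustration; the numerical version uses central differences, which is only one of several ways to approximate a partial derivative.

```python
def f(x, y):
    # Sample scalar-valued function (an assumption for illustration).
    return x**2 - x*y

def analytic_gradient(x, y):
    # Partial derivatives worked out by hand: df/dx = 2x - y, df/dy = -x.
    return (2*x - y, -x)

def numerical_gradient(func, x, y, h=1e-6):
    # Approximate each partial derivative with a central difference:
    # nudge one input at a time and keep the other fixed.
    df_dx = (func(x + h, y) - func(x - h, y)) / (2*h)
    df_dy = (func(x, y + h) - func(x, y - h)) / (2*h)
    return (df_dx, df_dy)

grad_exact = analytic_gradient(2.0, 1.0)       # (3.0, -2.0)
grad_approx = numerical_gradient(f, 2.0, 1.0)  # close to grad_exact
print(grad_exact)
print(grad_approx)
```

Both results are vectors with one entry per input variable, which is what "packaging the partial derivatives into a vector" means in practice.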

Credit To: The gradient

2. Directional Derivatives

Consider some multivariable function:

$f(x,y)=x^{2}-xy$

We know that the partial derivatives with respect to $x$ and $y$ tell us the rate of change of $f$ as we nudge the input either in the $x$ or $y$ direction.

The question now is what happens when we nudge the input of $f$ in a direction which is not parallel to the $x$ or $y$ axes.

For example, the image below shows the graph of $f$ along with a small step along a vector $\overrightarrow{v}$ in the input space, meaning the $xy$-plane in this case. Is there an operation which tells us how the height of the graph above the tip of $\overrightarrow{v}$ compares to the height of the graph above its tail?

[Figure: graph of $f$ with a small step along $\overrightarrow{v}$ in the $xy$-plane]

As you have probably guessed, there is a new type of derivative, called the directional derivative, which answers this question.

Just as the partial derivative is taken with respect to some input variable (e.g., $x$ or $y$), the directional derivative is taken along some vector $\overrightarrow{v}$ in the input space.

One very helpful way to think about this is to picture a point in the input space moving with velocity $\overrightarrow{v}$.
The directional derivative of $f$ along $\overrightarrow{v}$ is the resulting rate of change in the output of the function.
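The "moving point" picture can be sketched numerically: step a tiny amount along $\overrightarrow{v}$, and divide the change in output by the step size. The code below is a rough illustration using the function $f(x,y)=x^{2}-xy$ from this section; the point $(2, 1)$, the vector $(1, 1)$, and the function name `directional_derivative` are assumptions picked for the example.

```python
def f(x, y):
    # The example function from this section.
    return x**2 - x*y

def directional_derivative(func, point, v, h=1e-6):
    # Move a small step h*v forward and backward from the point and
    # measure how fast the output changes per unit of h.
    x, y = point
    vx, vy = v
    forward = func(x + h*vx, y + h*vy)
    backward = func(x - h*vx, y - h*vy)
    return (forward - backward) / (2*h)

# Rate of change of f at (2, 1) when moving with velocity (1, 1).
rate = directional_derivative(f, (2.0, 1.0), (1.0, 1.0))
# Choosing v along the x-axis recovers the ordinary partial derivative df/dx.
rate_x = directional_derivative(f, (2.0, 1.0), (1.0, 0.0))
print(rate, rate_x)
```

With $\overrightarrow{v}$ pointing along an axis, the result reduces to the familiar partial derivative, so the directional derivative is a generalization rather than a separate idea.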

3. Compute the Directional Derivative

Let’s say you have a multivariable function $f(x, y, z)$, which takes in three variables, $x$, $y$ and $z$, and you want to compute its directional derivative (the rate of change of the function's output) along the following vector:

$\overrightarrow{v}=\begin{bmatrix}2\\3\\-1\end{bmatrix}$

The answer, as it turns out, is

$\nabla_{\overrightarrow{v}}{f}=2\frac{\partial f}{\partial x}+3\frac{\partial f}{\partial y}+(-1)\frac{\partial f}{\partial z}$

This should make sense because a tiny nudge along $\overrightarrow{v}$ can be broken down into two tiny nudges in the $x$-direction, three tiny nudges in the $y$-direction, and a tiny nudge backwards, by $-1$, in the $z$-direction.

More generally, we can write the vector $\overrightarrow{v}$ abstractly as follows:

$\overrightarrow{v}=\begin{bmatrix}v_{1}\\v_{2}\\v_{3}\end{bmatrix}$

The directional derivative looks like this:

$\nabla_{\overrightarrow{v}}{f}=v_{1}\frac{\partial f}{\partial x}+v_{2}\frac{\partial f}{\partial y}+v_{3}\frac{\partial f}{\partial z}$

This can be written in a super-pleasing compact way using the dot product and the gradient:

$\nabla_{\overrightarrow{v}}{f}=\nabla{f}\cdot{\overrightarrow{v}}$

Take a moment to delight in the fact that one single operation, the gradient, packs enough information to compute the rate of change of a function in every possible direction! That’s so many directions! Left, right, up, down, north-north-east, $34.8^\circ$ clockwise from the $x$-axis… Madness!
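As a quick sanity check of the dot-product formula, the sketch below evaluates $\nabla{f}\cdot\overrightarrow{v}$ for $\overrightarrow{v}=(2, 3, -1)$ and compares it against a small finite-difference step along $\overrightarrow{v}$. The three-variable function $f(x,y,z)=xy+z^{2}$ and the point $(1, 2, 3)$ are assumptions made up for this example; they do not come from the original material.

```python
def f(x, y, z):
    # Hypothetical three-variable function, chosen only for illustration.
    return x*y + z**2

def gradient(x, y, z):
    # Partial derivatives by hand: df/dx = y, df/dy = x, df/dz = 2z.
    return (y, x, 2*z)

def dot(a, b):
    # Plain dot product of two equal-length tuples.
    return sum(ai*bi for ai, bi in zip(a, b))

v = (2.0, 3.0, -1.0)
p = (1.0, 2.0, 3.0)

# Directional derivative via the compact formula: grad f . v
via_gradient = dot(gradient(*p), v)

# The same quantity estimated by stepping a tiny amount along v.
h = 1e-6
p_plus = tuple(pi + h*vi for pi, vi in zip(p, v))
p_minus = tuple(pi - h*vi for pi, vi in zip(p, v))
via_difference = (f(*p_plus) - f(*p_minus)) / (2*h)

print(via_gradient, via_difference)
```

The two numbers should agree up to floating-point error, which is the practical content of the formula: once the gradient is known, no further limits need to be taken for any direction.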

Credit To: Directional derivatives (introduction)

4. Difference Between the Gradient and the Directional Derivative

For those who are a little confused about the difference between the gradient and the directional derivative:

In the case given in the video, the gradient is a vector whose components are scalars, each representing the rate of change of the function along one of the standard unit vectors of whatever basis is being used. (Often this is the Cartesian coordinate system, where the unit basis vectors are $i$, $j$ and $k$.)

Each component of the gradient, taken on its own, only tells us how the function is changing along one axis of our coordinate system. But our mathematical interests rarely lie solely along the axes of our coordinate system, so we need the directional derivative.

The directional derivative is a scalar value which represents the rate of change of the function along a direction which is typically NOT in the direction of one of the standard basis vectors.

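In conclusion, if you want the rate of change of a multivariable function per unit distance in the direction of a vector $\overrightarrow{v}$, first find the unit vector $u$ in the direction of $\overrightarrow{v}$, and then take $\nabla{f}\cdot u$. If $u = \langle a, b \rangle$, then $\nabla{f}\cdot u = a\frac{\partial f}{\partial x} + b\frac{\partial f}{\partial y}$. (Section 3 takes the dot product with $\overrightarrow{v}$ itself, without normalizing; the two conventions differ by a factor of the length of $\overrightarrow{v}$.)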
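The effect of normalizing can be seen in a small sketch. It reuses $f(x,y)=x^{2}-xy$ from section 2; the point $(2, 1)$, the vector $(3, 4)$, and the helper names are assumptions chosen so that the arithmetic is easy to follow.

```python
import math

def gradient_f(x, y):
    # Gradient of f(x, y) = x^2 - x*y, worked out by hand.
    return (2*x - y, -x)

def unit_vector(v):
    # Scale v to length 1 while keeping its direction.
    length = math.hypot(*v)
    return tuple(c / length for c in v)

def dot(a, b):
    return sum(ai*bi for ai, bi in zip(a, b))

v = (3.0, 4.0)                 # length 5
u = unit_vector(v)             # same direction, length 1
grad = gradient_f(2.0, 1.0)    # (3.0, -2.0)

along_v = dot(grad, v)         # derivative along v, without normalizing
along_u = dot(grad, u)         # rate of change per unit distance
print(along_v, along_u)
```

Here `along_v` is five times `along_u`, because $\overrightarrow{v}$ has length 5. Which value is wanted depends on whether the question is about moving with velocity $\overrightarrow{v}$ or about the slope in that direction.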

This video shows how a CNN learns through gradient descent, that is, how gradient descent changes each weight. The model in the video consists entirely of fully connected layers.

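$\nabla{C}$ is the gradient of the cost function at the point $(1, 1)$. It tells us the direction in which the output of the cost function increases fastest, so all we need to do is subtract it. This changes the values of $x$ and $y$ at the same time, and by different amounts: $x$ may increase while $y$ decreases, or $x$ may decrease while $y$ increases.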
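A single update of this kind might look like the sketch below. The cost function $C(x,y)=(x-3)^{2}+(y+1)^{2}$ and the learning rate $0.1$ are assumptions invented for illustration (the video's actual cost function is not reproduced here); only the starting point $(1, 1)$ comes from the text above.

```python
def cost(x, y):
    # Hypothetical cost function with its minimum at (3, -1).
    return (x - 3)**2 + (y + 1)**2

def cost_gradient(x, y):
    # Partial derivatives by hand: dC/dx = 2(x - 3), dC/dy = 2(y + 1).
    return (2*(x - 3), 2*(y + 1))

learning_rate = 0.1   # assumed step size
x, y = 1.0, 1.0       # starting point from the text

# The gradient points toward the fastest increase of the cost,
# so subtracting a small multiple of it moves toward lower cost.
gx, gy = cost_gradient(x, y)
new_x = x - learning_rate * gx
new_y = y - learning_rate * gy

print(cost(x, y), cost(new_x, new_y))
```

In this example the gradient has a negative $x$ component and a positive $y$ component, so the same step increases $x$ and decreases $y$, each by an amount proportional to its own partial derivative.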

After watching the video above, the next one is even better: it explains in detail how a CNN learns. What is backpropagation really doing?
