2.4.3. Gradients
We can concatenate the partial derivatives of a multivariate function with respect to all of its variables to obtain the gradient vector of that function. Specifically, let the input of the function $f:\mathbb{R}^{n}\to\mathbb{R}$ be an $n$-dimensional vector $\vec x=\begin{bmatrix} x_1\\x_2\\\vdots\\x_n\end{bmatrix}$, and let its output be a scalar. The gradient of $f(\vec x)$ with respect to $\vec x$ is a vector of $n$ partial derivatives:

$$\nabla_{\vec x} f(\vec x) = \begin{bmatrix}\frac{\partial f(\vec x)}{\partial x_1}\\ \frac{\partial f(\vec x)}{\partial x_2}\\ \vdots\\ \frac{\partial f(\vec x)}{\partial x_n}\end{bmatrix}$$

where $\nabla_{\vec x} f(\vec x)$ is usually replaced by $\nabla f(\vec x)$ when there is no ambiguity.
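The definition can be checked numerically. The following is a minimal sketch using only the Python standard library; the example function `f`, the helper name `numerical_gradient`, and the step size `h` are assumptions chosen for illustration, not part of the original text. It approximates each partial derivative with a central difference and stacks the results into a gradient vector.

```python
def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h   # perturb only the k-th variable
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def f(x):
    # Hypothetical example: f(x) = x_1^2 + 3 x_1 x_2 + x_3
    return x[0] ** 2 + 3 * x[0] * x[1] + x[2]


def analytic_gradient(x):
    # Partial derivatives of f, one entry per variable
    return [2 * x[0] + 3 * x[1], 3 * x[0], 1.0]


point = [1.0, 2.0, -0.5]
approx = numerical_gradient(f, point)
exact = analytic_gradient(point)
print(approx)
print(exact)
```

The two printed vectors should agree up to a small numerical error, since each entry of the gradient is just the partial derivative with respect to one coordinate.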
Let $\vec x$ be an $n$-dimensional vector. The following rules are often used when differentiating multivariate functions:
1. For all $A \in \mathbb{R}^{m\times n}$, we have $\nabla_{\vec x} A\vec x = A^\top$;
Proof: Let (subscripts in parentheses denote the shape)

$$A_{(m,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m,1} & a_{m,2} &\cdots&a_{m,n} \end{bmatrix},$$

then

$$(A\vec x)_{(m,1)} = \begin{bmatrix} a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n \\ a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n \\ \vdots \\ a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n \end{bmatrix}.$$

Row $k$ of the gradient lists the derivatives of the $m$ components of $A\vec x$ with respect to $x_k$, so

$$\begin{aligned}
\nabla_{\vec x}A\vec x
&= \begin{bmatrix}\frac{\partial A\vec x}{\partial x_1}\\ \frac{\partial A\vec x}{\partial x_2}\\ \vdots\\ \frac{\partial A\vec x}{\partial x_n}\end{bmatrix} \\
&= \begin{bmatrix}
\frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_1} & \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_1} & \cdots & \frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_1}\\
\frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_2} & \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_2} & \cdots & \frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_2}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial (a_{1,1}x_1+a_{1,2}x_2+\cdots+a_{1,n}x_n)}{\partial x_n} & \frac{\partial (a_{2,1}x_1+a_{2,2}x_2+\cdots+a_{2,n}x_n)}{\partial x_n} & \cdots & \frac{\partial (a_{m,1}x_1+a_{m,2}x_2+\cdots+a_{m,n}x_n)}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix} a_{1,1} & a_{2,1} & \cdots & a_{m,1}\\ a_{1,2} & a_{2,2} & \cdots & a_{m,2} \\ \vdots & \vdots & \ddots & \vdots \\ a_{1,n}&a_{2,n}&\cdots&a_{m,n} \end{bmatrix} \\
&= A^\top.
\end{aligned}$$
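As a sanity check on rule 1, the sketch below compares a finite-difference gradient of $A\vec x$ with $A^\top$. It uses only the Python standard library. The matrix `A`, the point `x`, and the helper names are assumptions made up for illustration. The rows of the result are indexed by $x_k$ and the columns by the components of $A\vec x$, matching the layout used in the proof.

```python
def matvec(A, x):
    """Multiply an m x n matrix (list of rows) by a length-n vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def gradient_of_vector_function(g, x, h=1e-6):
    """Row k holds the derivative of every component of g with respect to x_k."""
    rows = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        g_plus, g_minus = g(x_plus), g(x_minus)
        rows.append([(p - m) / (2 * h) for p, m in zip(g_plus, g_minus)])
    return rows


# Hypothetical example with m = 2, n = 3
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]
x = [0.5, -1.0, 2.0]

grad = gradient_of_vector_function(lambda v: matvec(A, v), x)
A_T = [list(col) for col in zip(*A)]  # transpose of A, shape n x m

max_error = max(
    abs(grad[i][j] - A_T[i][j]) for i in range(len(A_T)) for j in range(len(A_T[0]))
)
print(max_error)
```

Because $A\vec x$ is linear in $\vec x$, the result does not depend on the chosen point `x`; any other point should give the same matrix up to rounding.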
2. For all $A \in \mathbb{R}^{n\times m}$, we have $\nabla_{\vec x} \vec x^\top A = A$;
Proof: Let

$$A_{(n,m)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,m} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,m} \end{bmatrix},$$

then

$$\vec x^\top A = \begin{bmatrix} a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n & a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n & \cdots & a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n \end{bmatrix},$$

and therefore

$$\begin{aligned}
\nabla_{\vec x}\vec x^\top A
&= \begin{bmatrix}\frac{\partial \vec x^\top A}{\partial x_1}\\ \frac{\partial \vec x^\top A}{\partial x_2}\\ \vdots\\ \frac{\partial \vec x^\top A}{\partial x_n}\end{bmatrix} \\
&= \begin{bmatrix}
\frac{\partial (a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n)}{\partial x_1} & \frac{\partial (a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n)}{\partial x_1} & \cdots & \frac{\partial (a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n)}{\partial x_1}\\
\frac{\partial (a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n)}{\partial x_2} & \frac{\partial (a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n)}{\partial x_2} & \cdots & \frac{\partial (a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n)}{\partial x_2}\\
\vdots & \vdots & \ddots & \vdots\\
\frac{\partial (a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n)}{\partial x_n} & \frac{\partial (a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n)}{\partial x_n} & \cdots & \frac{\partial (a_{1,m}x_1+a_{2,m}x_2+\cdots+a_{n,m}x_n)}{\partial x_n}
\end{bmatrix} \\
&= \begin{bmatrix} a_{1,1} & a_{1,2}&\cdots&a_{1,m}\\ a_{2,1}&a_{2,2}&\cdots&a_{2,m} \\ \vdots & \vdots & \ddots & \vdots\\ a_{n,1}&a_{n,2}&\cdots&a_{n,m} \end{bmatrix} \\
&= A.
\end{aligned}$$
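A small worked instance may make rule 2 more concrete. The numbers below are assumed purely for illustration, with $n=2$ and $m=3$:

$$A = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{bmatrix},\qquad \vec x^\top A = \begin{bmatrix} x_1+4x_2 & 2x_1+5x_2 & 3x_1+6x_2 \end{bmatrix},$$

$$\nabla_{\vec x}\vec x^\top A = \begin{bmatrix} \frac{\partial (x_1+4x_2)}{\partial x_1} & \frac{\partial (2x_1+5x_2)}{\partial x_1} & \frac{\partial (3x_1+6x_2)}{\partial x_1}\\ \frac{\partial (x_1+4x_2)}{\partial x_2} & \frac{\partial (2x_1+5x_2)}{\partial x_2} & \frac{\partial (3x_1+6x_2)}{\partial x_2} \end{bmatrix} = \begin{bmatrix} 1 & 2 & 3\\ 4 & 5 & 6 \end{bmatrix} = A.$$

Differentiating with respect to $x_k$ simply picks out row $k$ of $A$, which is why no transpose appears here, unlike in rule 1.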
3. For all $A \in \mathbb{R}^{n\times n}$, we have $\nabla_{\vec x} \vec x^\top A \vec x = (A+A^\top)\vec x$;
Proof: Let

$$A_{(n,n)} = \begin{bmatrix} a_{1,1}&a_{1,2}&\cdots&a_{1,n} \\ a_{2,1}&a_{2,2}&\cdots&a_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n,1} & a_{n,2} &\cdots&a_{n,n} \end{bmatrix},$$

then

$$\vec x^\top A = \begin{bmatrix} a_{1,1}x_1+a_{2,1}x_2+\cdots+a_{n,1}x_n & a_{1,2}x_1+a_{2,2}x_2+\cdots+a_{n,2}x_n & \cdots & a_{1,n}x_1+a_{2,n}x_2+\cdots+a_{n,n}x_n \end{bmatrix},$$

$$\vec x^\top A \vec x = \begin{bmatrix} \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} a_{i,j}x_ix_j \end{bmatrix}.$$

The variable $x_k$ appears in the terms with $i=k$ and in the terms with $j=k$, so by the product rule $\frac{\partial}{\partial x_k}\sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} a_{i,j}x_ix_j = \sum\limits_{j=1}^{n} a_{k,j}x_j + \sum\limits_{i=1}^{n} a_{i,k}x_i = \sum\limits_{i=1}^{n}(a_{i,k}+a_{k,i})x_i$. Hence

$$\begin{aligned}
\nabla_{\vec x}\vec x^\top A \vec x
&= \begin{bmatrix} \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} a_{i,j}x_ix_j}{\partial x_1} \\ \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} a_{i,j}x_ix_j}{\partial x_2} \\ \vdots\\ \frac{\partial \sum\limits_{i=1}^{n}\sum\limits_{j=1}^{n} a_{i,j}x_ix_j}{\partial x_n} \end{bmatrix}
= \begin{bmatrix} \sum\limits_{i=1}^{n}(a_{i,1}+a_{1,i})x_i \\ \sum\limits_{i=1}^{n}(a_{i,2}+a_{2,i})x_i \\ \vdots\\ \sum\limits_{i=1}^{n}(a_{i,n}+a_{n,i})x_i \end{bmatrix} \\
&= \begin{bmatrix} 2a_{1,1} & a_{1,2}+a_{2,1} & \cdots&a_{1,n}+a_{n,1} \\ a_{2,1}+a_{1,2} & 2a_{2,2} & \cdots&a_{2,n}+a_{n,2} \\ \vdots & \vdots & \ddots & \vdots\\ a_{n,1}+a_{1,n} & a_{n,2}+a_{2,n} & \cdots&2a_{n,n} \end{bmatrix} \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix} \\
&= (A+A^\top)\vec x.
\end{aligned}$$
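Rule 3 is easy to misremember as $2A\vec x$, which holds only when $A$ is symmetric. The sketch below, using only the Python standard library, compares a finite-difference gradient of $\vec x^\top A\vec x$ with both $(A+A^\top)\vec x$ and $2A\vec x$ for a deliberately non-symmetric matrix. The matrix, the point, and the helper names are assumptions chosen for illustration.

```python
def quadratic_form(A, x):
    """Compute x^T A x as the double sum of a_ij * x_i * x_j."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


def numerical_gradient(f, x, h=1e-6):
    """Approximate the gradient of a scalar function f at x by central differences."""
    grad = []
    for k in range(len(x)):
        x_plus = list(x)
        x_minus = list(x)
        x_plus[k] += h
        x_minus[k] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


# Hypothetical non-symmetric example with n = 3
A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [5.0, 0.5, -2.0]]
x = [1.0, -2.0, 0.5]
n = len(x)

approx = numerical_gradient(lambda v: quadratic_form(A, v), x)
# (A + A^T) x, written entry by entry as in the proof
expected = [sum((A[k][i] + A[i][k]) * x[i] for i in range(n)) for k in range(n)]
# 2 A x, which matches only when A is symmetric
naive = [2 * sum(A[k][i] * x[i] for i in range(n)) for k in range(n)]

print(approx)
print(expected)
print(naive)
```

For this matrix the first two printed vectors should agree up to rounding, while the third differs, which illustrates why the transpose term cannot be dropped in general.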
4. $\nabla_{\vec x} \Vert \vec x \Vert ^2=\nabla_{\vec x}\vec x^\top\vec x = 2\vec x$.
Proof: Since $\Vert \vec x \Vert^2 = \left(\sqrt{x_1^2+x_2^2+\cdots+x_n^2}\right)^2 = x_1^2+x_2^2+\cdots+x_n^2 = \vec x^\top \vec x$, we have $\nabla_{\vec x}\Vert \vec x \Vert ^2 = \nabla_{\vec x}\vec x^\top \vec x$. Differentiating term by term,

$$\nabla_{\vec x}\Vert \vec x \Vert ^2 = \nabla_{\vec x}\left(x_1^2+x_2^2+\cdots+x_n^2\right) = \begin{bmatrix} 2x_1\\ 2x_2\\ \vdots\\ 2x_n \end{bmatrix} = 2\vec x.$$
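As a cross-check, rule 4 can also be read as a special case of rule 3. This observation is not part of the derivation above and relies only on taking $A$ to be the $n\times n$ identity matrix $I$:

$$\vec x^\top \vec x = \vec x^\top I \vec x \quad\Longrightarrow\quad \nabla_{\vec x}\vec x^\top \vec x = (I + I^\top)\vec x = 2I\vec x = 2\vec x.$$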
Similarly, for any matrix $X$, we have $\nabla_X \Vert X \Vert_F^2=2X$. As we will see later, gradients are very useful for designing optimization algorithms in deep learning.
5. For any matrix $X$, we have $\nabla_X \Vert X \Vert_F^2=2X$.
Proof: Let $X$ be an $m\times n$ matrix,

$$X = \begin{bmatrix} x_{1,1}& x_{1,2}&\cdots&x_{1,n}\\ x_{2,1}& x_{2,2}&\cdots&x_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ x_{m,1}& x_{m,2}&\cdots&x_{m,n} \end{bmatrix},$$

then

$$\Vert X \Vert_F^2 = \left(\sqrt{\sum\limits_{i=1}^{m}\sum\limits_{j=1}^n x_{i,j}^2}\right)^2 = \sum\limits_{i=1}^{m}\sum\limits_{j=1}^n x_{i,j}^2.$$

The entry of the gradient in position $(i,j)$ is the partial derivative with respect to $x_{i,j}$, which is $2x_{i,j}$, so

$$\nabla_X \Vert X \Vert_F^2 = \begin{bmatrix} 2x_{1,1}& 2x_{1,2}&\cdots&2x_{1,n}\\ 2x_{2,1}& 2x_{2,2}&\cdots&2x_{2,n}\\ \vdots & \vdots & \ddots & \vdots\\ 2x_{m,1}& 2x_{m,2}&\cdots&2x_{m,n} \end{bmatrix} = 2X.$$
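The matrix case can be checked the same way, entry by entry. The following is a minimal sketch using only the Python standard library; the example matrix `X` and the helper names are assumptions made up for illustration. It perturbs one entry $x_{i,j}$ at a time and compares the resulting finite-difference gradient with $2X$.

```python
def frobenius_norm_squared(X):
    """Sum of squares of all entries of X, i.e. the squared Frobenius norm."""
    return sum(v * v for row in X for v in row)


def numerical_matrix_gradient(f, X, h=1e-6):
    """Entry (i, j) approximates the partial derivative of f with respect to x_ij."""
    rows, cols = len(X), len(X[0])
    grad = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            grad[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return grad


# Hypothetical 2 x 3 example
X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.5]]

approx = numerical_matrix_gradient(frobenius_norm_squared, X)
max_error = max(
    abs(approx[i][j] - 2 * X[i][j]) for i in range(len(X)) for j in range(len(X[0]))
)
print(max_error)
```

The gradient has the same shape as $X$, which is the convention the proof uses when it arranges the partial derivatives into an $m\times n$ matrix.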
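I did not understand these formulas when I first read them, so I worked through the derivations myself to reinforce my understanding. The content above is that derivation; questions and discussion are welcome.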