1. Derivative of a matrix
\qquad
If every entry $a_{ij}(t)$ of the matrix $\boldsymbol A(t)=[a_{ij}(t)]_{m\times n}$ is a differentiable function of the variable $t$, then $\boldsymbol A(t)$ is said to be differentiable, and its derivative is defined entry by entry as:
$$
\dfrac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\dfrac{\mathrm{d}a_{ij}(t)}{\mathrm{d}t}\right]_{m\times n}=\left[\begin{matrix}
\dfrac{\mathrm{d}a_{11}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{12}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{1n}(t)}{\mathrm{d}t} \\ \\
\dfrac{\mathrm{d}a_{21}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{22}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{2n}(t)}{\mathrm{d}t} \\ \\
\vdots & \vdots & \ddots & \vdots \\ \\
\dfrac{\mathrm{d}a_{m1}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{m2}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{mn}(t)}{\mathrm{d}t}
\end{matrix}\right]
$$
\qquad
- When $m=1$, the matrix $\boldsymbol A(t)=[a_1(t),a_2(t),\cdots,a_n(t)]$ is a (row) vector-valued function, with derivative:
$$
\dfrac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\dfrac{\mathrm{d}a_{j}(t)}{\mathrm{d}t}\right]_{1\times n}=\left[\begin{matrix} \dfrac{\mathrm{d}a_{1}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{2}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{n}(t)}{\mathrm{d}t} \end{matrix}\right]_{1\times n}
$$
\qquad - When $n=1$, the matrix $\boldsymbol A(t)=[a_1(t),a_2(t),\cdots,a_m(t)]^T$ is a (column) vector-valued function, with derivative:
$$
\dfrac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\dfrac{\mathrm{d}a_{i}(t)}{\mathrm{d}t}\right]_{m\times 1}=\left[\begin{matrix} \dfrac{\mathrm{d}a_{1}(t)}{\mathrm{d}t} \\ \\ \dfrac{\mathrm{d}a_{2}(t)}{\mathrm{d}t} \\ \\ \vdots \\ \\ \dfrac{\mathrm{d}a_{m}(t)}{\mathrm{d}t} \end{matrix}\right]_{m\times 1}
$$
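The entrywise definition can be checked numerically. The sketch below is only an illustration: the rotation matrix, the helper names `matrix_derivative`, `rotation`, and `rotation_exact_derivative`, and the step size `h` are assumptions chosen for the example, not part of the original text. It differentiates each entry of $\boldsymbol A(t)$ with a central difference and compares the result with the derivative obtained by hand.

```python
import math

def matrix_derivative(A, t, h=1e-6):
    """Approximate dA/dt entry by entry with a central difference.

    A is a callable that returns a matrix as a list of rows.
    """
    plus, minus = A(t + h), A(t - h)
    return [
        [(p - m) / (2 * h) for p, m in zip(row_p, row_m)]
        for row_p, row_m in zip(plus, minus)
    ]

def rotation(t):
    # Illustrative 2x2 example: A(t) = [[cos t, -sin t], [sin t, cos t]]
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t), math.cos(t)]]

def rotation_exact_derivative(t):
    # Each entry of A(t) differentiated by hand
    return [[-math.sin(t), -math.cos(t)],
            [math.cos(t), -math.sin(t)]]

t0 = 0.7
approx = matrix_derivative(rotation, t0)
exact = rotation_exact_derivative(t0)

# Largest entrywise gap between the numerical and the hand-computed derivative
max_error = max(
    abs(a - e)
    for row_a, row_e in zip(approx, exact)
    for a, e in zip(row_a, row_e)
)
print(max_error)
```

The row and column cases are the same computation applied to a matrix with a single row or a single column.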
\qquad
2. Derivative of a multivariate function with respect to a matrix
\qquad
Let $\boldsymbol X=[x_{ij}]_{m\times n}$ be a matrix, and consider a function of its $mn$ entries, $f(\boldsymbol X)=f(x_{11},x_{12},\cdots,x_{m1},x_{m2},\cdots,x_{mn})$. The derivative of $f(\boldsymbol X)$ with respect to the matrix $\boldsymbol X$ is defined as:
$$
\dfrac{\mathrm{d}f(\boldsymbol X)}{\mathrm{d}\boldsymbol X}=\left[\dfrac{\partial f}{\partial x_{ij}}\right]_{m\times n}=\left[\begin{matrix}
\dfrac{\partial f}{\partial x_{11}} & \dfrac{\partial f}{\partial x_{12}} & \cdots & \dfrac{\partial f}{\partial x_{1n}} \\ \\
\dfrac{\partial f}{\partial x_{21}} & \dfrac{\partial f}{\partial x_{22}} & \cdots & \dfrac{\partial f}{\partial x_{2n}} \\ \\
\vdots & \vdots & \ddots & \vdots \\ \\
\dfrac{\partial f}{\partial x_{m1}} & \dfrac{\partial f}{\partial x_{m2}} & \cdots & \dfrac{\partial f}{\partial x_{mn}}
\end{matrix}\right]
$$
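As a small illustration of this definition, the sketch below assumes the example function $f(\boldsymbol X)=\sum_{i,j}x_{ij}^2$, for which each partial derivative is $2x_{ij}$, so the derivative matrix should equal $2\boldsymbol X$. The function choice, the helper name `derivative_wrt_matrix`, and the step size are assumptions made for the example.

```python
def f(X):
    # Illustrative choice: f(X) = sum of the squared entries of X
    return sum(x * x for row in X for x in row)

def derivative_wrt_matrix(f, X, h=1e-6):
    """Return the m-by-n matrix [df/dx_ij], using central differences."""
    m, n = len(X), len(X[0])
    D = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            X_plus = [row[:] for row in X]
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h    # perturb only the (i, j) entry
            X_minus[i][j] -= h
            D[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return D

X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, 4.0]]
D = derivative_wrt_matrix(f, X)
expected = [[2 * x for x in row] for row in X]  # hand-derived: df/dX = 2X
print(D)
print(expected)
```

The result has the same shape as $\boldsymbol X$, which is the point of this layout convention.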
\qquad
3. Derivative of a multivariate function with respect to a (column) vector
\qquad
Let $\boldsymbol x=[x_1,x_2,\cdots,x_n]^T$ be an $n$-dimensional (column) vector, and consider a function of its $n$ components, $f(\boldsymbol x)=f(x_{1},x_{2},\cdots,x_{n})$. Then:
$$
\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\dfrac{\partial f}{\partial x_1},\dfrac{\partial f}{\partial x_2},\cdots,\dfrac{\partial f}{\partial x_n}\right]^T=\left[\begin{matrix}\dfrac{\partial f}{\partial x_1}\\ \\ \dfrac{\partial f}{\partial x_2}\\ \\ \vdots\\ \\ \dfrac{\partial f}{\partial x_n}\end{matrix}\right]
$$
that is, the gradient of $f(\boldsymbol x)$: $\nabla f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}$.
$$
\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}=\left[\dfrac{\partial f}{\partial x_1},\dfrac{\partial f}{\partial x_2},\cdots,\dfrac{\partial f}{\partial x_n}\right]
$$
that is, the transpose of the gradient of $f(\boldsymbol x)$: $\nabla^T f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}$.
\qquad
\qquad
Therefore $\nabla f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}\right]^T$.
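To make the two layouts concrete, the sketch below assumes the example $f(\boldsymbol x)=x_1^2+3x_1x_2$, whose gradient is $[2x_1+3x_2,\;3x_1]^T$. It stores $\mathrm{d}f/\mathrm{d}\boldsymbol x$ as an $n\times 1$ nested list and $\mathrm{d}f/\mathrm{d}\boldsymbol x^T$ as a $1\times n$ nested list. The function, the helper name `gradient`, and the list-based representation are illustrative assumptions.

```python
def gradient(f, x, h=1e-6):
    """Central-difference approximation of [df/dx_1, ..., df/dx_n]."""
    g = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        g.append((f(x_plus) - f(x_minus)) / (2 * h))
    return g

def f(x):
    # Illustrative choice: f(x) = x_1^2 + 3 x_1 x_2
    return x[0] ** 2 + 3 * x[0] * x[1]

x = [1.0, 2.0]
g = gradient(f, x)

grad_column = [[gk] for gk in g]  # df/dx   : n-by-1 (the gradient)
grad_row = [g]                    # df/dx^T : 1-by-n (its transpose)

# Transposing the row layout recovers the column layout
transposed_row = [[grad_row[0][k]] for k in range(len(g))]
print(grad_column, grad_row)
```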
\qquad
Common formulas
\qquad(1) Hessian matrix:
\qquad $\nabla^T \{\nabla f(\boldsymbol x)\}=\dfrac{\mathrm{d}}{\mathrm{d}\boldsymbol x^T}\left(\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}\right)$ or $\nabla \{\nabla^T f(\boldsymbol x)\}=\dfrac{\mathrm{d}}{\mathrm{d}\boldsymbol x}\left(\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}\right)$. The two matrices are transposes of each other and coincide when the second partial derivatives of $f$ are continuous.
\qquad
$$
\dfrac{\mathrm{d}}{\mathrm{d}\boldsymbol x^T}\left(\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right)=\left[\begin{matrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\partial x_n} \\ \\
\dfrac{\partial^2 f}{\partial x_2\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\partial x_n} \\ \\
\vdots & \vdots & \ddots & \vdots \\ \\
\dfrac{\partial^2 f}{\partial x_n\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{matrix}\right]
$$
\qquad
\qquad(2) The quadratic function $f(\boldsymbol x)=\boldsymbol x^T \boldsymbol A \boldsymbol x$ has derivative $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=(\boldsymbol A+\boldsymbol A^T )\boldsymbol x$.
\quad If $\boldsymbol A=[a_{ij}]_{n\times n}$ is symmetric, then $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=2\boldsymbol A \boldsymbol x$.
\qquad
Proof: expand the quadratic form and group the terms by the leading factor $x_k$:
$$
\begin{aligned}f(\boldsymbol x)&=\boldsymbol x^T \boldsymbol A \boldsymbol x=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_ix_j \\ &=x_1\sum_{j=1}^{n}a_{1j}x_j +x_2\sum_{j=1}^{n}a_{2j}x_j+\cdots +x_k\sum_{j=1}^{n}a_{kj}x_j+\cdots+x_n\sum_{j=1}^{n}a_{nj}x_j \end{aligned}
$$
$$
\begin{aligned}\dfrac{\partial f}{\partial x_k}&=x_1a_{1k}+x_2a_{2k}+\cdots+\left(\sum_{j=1}^{n}a_{kj}x_j+x_ka_{kk}\right)+\cdots+x_na_{nk}\\ &=(x_1a_{1k}+x_2a_{2k}+\cdots+x_ka_{kk}+\cdots+x_na_{nk}) +\sum_{j=1}^{n}a_{kj}x_j \\ &=\sum_{i=1}^{n}a_{ik}x_i +\sum_{j=1}^{n}a_{kj}x_j \end{aligned}
$$
$$
\begin{aligned} \dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}&=\left[\begin{matrix}\dfrac{\partial f}{\partial x_1}\\ \\ \vdots\\ \\ \dfrac{\partial f}{\partial x_k}\\ \\ \vdots\\ \\ \dfrac{\partial f}{\partial x_n}\end{matrix}\right]=\left[\begin{matrix}\displaystyle\sum_{i=1}^{n}a_{i1}x_i +\displaystyle\sum_{j=1}^{n}a_{1j}x_j\\ \\ \vdots\\ \\ \displaystyle\sum_{i=1}^{n}a_{ik}x_i +\displaystyle\sum_{j=1}^{n}a_{kj}x_j\\ \\ \vdots\\ \\ \displaystyle\sum_{i=1}^{n}a_{in}x_i +\displaystyle\sum_{j=1}^{n}a_{nj}x_j \end{matrix}\right]=\left[\begin{matrix}\displaystyle\sum_{i=1}^{n}a_{i1}x_i \\ \\ \vdots\\ \\ \displaystyle\sum_{i=1}^{n}a_{ik}x_i \\ \\ \vdots\\ \\ \displaystyle\sum_{i=1}^{n}a_{in}x_i \end{matrix}\right]+\left[\begin{matrix}\displaystyle\sum_{j=1}^{n}a_{1j}x_j\\ \\ \vdots\\ \\ \displaystyle\sum_{j=1}^{n}a_{kj}x_j\\ \\ \vdots\\ \\ \displaystyle\sum_{j=1}^{n}a_{nj}x_j \end{matrix}\right] \\ &=\boldsymbol A^T\boldsymbol x+\boldsymbol A\boldsymbol x \\ &=(\boldsymbol A +\boldsymbol A^T)\boldsymbol x \end{aligned}
$$
\qquad
\qquad(3) The linear function $f(\boldsymbol x)=\boldsymbol b^T \boldsymbol x$ has derivative $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\boldsymbol b$, or equivalently $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}=\boldsymbol b^T$.
\quad If $\boldsymbol b$ is treated as the variable instead, then since $\boldsymbol b^T \boldsymbol x= \boldsymbol x^T \boldsymbol b$, we have $\dfrac{\mathrm{d}f(\boldsymbol b)}{\mathrm{d}\boldsymbol b}=\boldsymbol x$.
\qquad Proof: $f(\boldsymbol x) =\boldsymbol b^T \boldsymbol x=\displaystyle\sum_{i=1}^{n}b_ix_i$, so $\dfrac{\partial f}{\partial x_k}=b_k$ and
$$
\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\begin{matrix}\dfrac{\partial f}{\partial x_1}\\ \\ \vdots\\ \\ \dfrac{\partial f}{\partial x_k}\\ \\ \vdots\\ \\ \dfrac{\partial f}{\partial x_n}\end{matrix}\right]= \left[\begin{matrix} b_1\\ \\ \vdots\\ \\ b_k\\ \\ \vdots\\ \\ b_n\end{matrix}\right]=\boldsymbol b
$$
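The three formulas above can be sanity-checked numerically. The sketch below is a minimal illustration under stated assumptions: the matrix `A` (deliberately not symmetric), the vectors `b` and `x`, the helper names, and the finite-difference step sizes are all made up for the example. It compares finite-difference gradients with $(\boldsymbol A+\boldsymbol A^T)\boldsymbol x$ and $\boldsymbol b$, and compares a finite-difference Hessian of $\boldsymbol x^T\boldsymbol A\boldsymbol x$ with $\boldsymbol A+\boldsymbol A^T$, which follows from differentiating formula (2) once more.

```python
def gradient(f, x, h=1e-6):
    """Central-difference approximation of df/dx, returned as a flat list."""
    g = []
    for k in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[k] += h
        x_minus[k] -= h
        g.append((f(x_plus) - f(x_minus)) / (2 * h))
    return g

def hessian(f, x, h=1e-4):
    """Central-difference approximation of the matrix of second partials."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate f with x_i moved by si*h and x_j moved by sj*h
                y = list(x)
                y[i] += si * h
                y[j] += sj * h
                return f(y)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4 * h * h)
    return H

def matvec(M, v):
    return [sum(m * vk for m, vk in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

# Hypothetical test data; A is deliberately not symmetric.
A = [[1.0, 2.0, 0.0],
     [4.0, 3.0, -1.0],
     [0.0, 5.0, 2.0]]
b = [0.5, -1.0, 2.0]
x = [1.0, -2.0, 0.5]

def quadratic(x):
    # f(x) = x^T A x
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))

def linear(x):
    # f(x) = b^T x
    return sum(bi * xi for bi, xi in zip(b, x))

grad_quadratic = gradient(quadratic, x)
formula_quadratic = [p + q for p, q in zip(matvec(A, x), matvec(transpose(A), x))]
grad_linear = gradient(linear, x)
H = hessian(quadratic, x)

print(grad_quadratic, formula_quadratic)  # formula (2)
print(grad_linear, b)                     # formula (3)
print(H)                                  # formula (1); should be close to A + A^T
```

Using a non-symmetric `A` matters here: with a symmetric matrix the check could not distinguish $(\boldsymbol A+\boldsymbol A^T)\boldsymbol x$ from $2\boldsymbol A\boldsymbol x$.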
\qquad
\qquad
4. Chain rule for a single-variable function composed with a vector
\qquad Let $\boldsymbol x(t)=[x_1(t),x_2(t),\cdots,x_n(t)]^T$ be a vector-valued function, and consider the composite function of the single variable $t$, $f(\boldsymbol x(t))=f(x_1(t),x_2(t),\cdots,x_n(t))$. Then:
$$
\dfrac{\mathrm{d}f}{\mathrm{d}t}=\left[\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right]^T\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}=\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x^T}\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}
$$
\qquad Since $\nabla^T f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}$, it follows that $\dfrac{\mathrm{d}f}{\mathrm{d}t}=\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x^T}\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}=\nabla^T f(\boldsymbol x)\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}$.
\qquad Proof: by the chain rule for multivariate functions,
$$
\begin{aligned}\dfrac{\mathrm{d}f}{\mathrm{d}t}&=\dfrac{\partial f}{\partial x_1}\dfrac{\mathrm{d}x_1}{\mathrm{d}t}+\dfrac{\partial f}{\partial x_2}\dfrac{\mathrm{d}x_2}{\mathrm{d}t}+\cdots+\dfrac{\partial f}{\partial x_n}\dfrac{\mathrm{d}x_n}{\mathrm{d}t}\\ &=\left[\dfrac{\partial f}{\partial x_1},\dfrac{\partial f}{\partial x_2},\cdots,\dfrac{\partial f}{\partial x_n}\right] \left[\begin{matrix}\dfrac{\mathrm{d} x_1}{\mathrm{d} t}\\ \\ \dfrac{\mathrm{d} x_2}{\mathrm{d} t}\\ \\ \vdots\\ \\ \dfrac{\mathrm{d} x_n}{\mathrm{d} t}\end{matrix}\right]=\left[\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right]^T\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}=\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x^T}\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t} \end{aligned}
$$
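The chain rule can be checked on a concrete case. The sketch below assumes the illustrative curve $\boldsymbol x(t)=[\cos t,\sin t,t]^T$ and the function $f(\boldsymbol x)=x_1x_2+x_3^2$, neither of which comes from the original text. It evaluates $\nabla^T f(\boldsymbol x)\,\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}$ with hand-derived derivatives and compares it with a central difference of $t\mapsto f(\boldsymbol x(t))$.

```python
import math

def x_of_t(t):
    # Illustrative curve x(t) = [cos t, sin t, t]^T
    return [math.cos(t), math.sin(t), t]

def dx_dt(t):
    # Componentwise derivative of x(t)
    return [-math.sin(t), math.cos(t), 1.0]

def f(x):
    # Illustrative choice: f(x) = x_1 x_2 + x_3^2
    return x[0] * x[1] + x[2] ** 2

def grad_f(x):
    # Hand-derived gradient of f
    return [x[1], x[0], 2 * x[2]]

def chain_rule(t):
    # (df/dx)^T (dx/dt): inner product of the gradient and dx/dt
    return sum(g * v for g, v in zip(grad_f(x_of_t(t)), dx_dt(t)))

def direct(t, h=1e-6):
    # Central difference of the composite function t -> f(x(t))
    return (f(x_of_t(t + h)) - f(x_of_t(t - h))) / (2 * h)

t0 = 0.4
via_chain_rule = chain_rule(t0)
via_difference = direct(t0)
print(via_chain_rule, via_difference)
```

For this example the composite is $\cos t\sin t+t^2$, so both values should agree with $\cos 2t+2t$ at $t=0.4$.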
\qquad
5. Taylor series
\qquad First consider the two-dimensional case, $\boldsymbol x=[x_1,x_2]^T$. Then
$$
\begin{aligned}f(x_1+\delta_1,x_2+\delta_2)&=f(x_1,x_2)+\dfrac{\partial f}{\partial x_1}\delta_1+\dfrac{\partial f}{\partial x_2}\delta_2\\ &\quad+\dfrac{1}{2}\left( \dfrac{\partial^2 f}{\partial x_1^2}\delta_1^2+2\dfrac{\partial^2 f}{\partial x_1\partial x_2}\delta_1\delta_2+\dfrac{\partial^2 f}{\partial x_2^2}\delta_2^2 \right) \\ &\quad+o\left(\Vert\boldsymbol\delta\Vert^2\right) \end{aligned}
$$
where the factor $2$ on the mixed term collects the two equal contributions $\dfrac{\partial^2 f}{\partial x_1\partial x_2}\delta_1\delta_2$ and $\dfrac{\partial^2 f}{\partial x_2\partial x_1}\delta_2\delta_1$.
\qquad Extending to the $n$-dimensional case, $\boldsymbol x=[x_1,x_2,\cdots,x_n]^T$, we have
$$
\begin{aligned}f(x_1+\delta_1,x_2+\delta_2,\cdots,x_n+\delta_n)&=f(x_1,x_2,\cdots,x_n)+\sum_{i=1}^n\dfrac{\partial f}{\partial x_i}\delta_i \\ &\quad+\dfrac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\dfrac{\partial^2 f}{\partial x_i\partial x_j}\delta_i\delta_j\\ &\quad+o\left(\Vert\boldsymbol\delta\Vert^2\right) \end{aligned}
$$
\qquad
\qquad
In matrix form:
$$
f(\boldsymbol x+\boldsymbol\delta)=f(\boldsymbol x)+\nabla f(\boldsymbol x)^T\boldsymbol\delta+\dfrac{1}{2}\boldsymbol\delta^T\nabla^2 f(\boldsymbol x)\boldsymbol\delta+o\left(\Vert\boldsymbol\delta\Vert^2\right)
$$
where $\boldsymbol\delta=[\delta_1,\delta_2,\cdots,\delta_n]^T$.
\qquad
\qquad
Alternatively, written as the expansion of the function $f(\boldsymbol x)$ of the vector $\boldsymbol x$ about the point $\bar{\boldsymbol x}$:
$$
f(\boldsymbol x)=f(\bar{\boldsymbol x})+\nabla f(\bar{\boldsymbol x})^T(\boldsymbol x-\bar{\boldsymbol x})+\dfrac{1}{2}(\boldsymbol x-\bar{\boldsymbol x})^T\nabla^2 f(\bar{\boldsymbol x})(\boldsymbol x-\bar{\boldsymbol x})+o\left(\Vert\boldsymbol x-\bar{\boldsymbol x}\Vert^2\right)
$$
\qquad [Note] Here $\nabla f(\boldsymbol x)$ denotes the gradient and $\nabla^2 f(\boldsymbol x)$ denotes the Hessian matrix (not the Laplacian operator used in PDEs).
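The meaning of the $o\left(\Vert\boldsymbol\delta\Vert^2\right)$ remainder can be illustrated numerically. The sketch below assumes the example function $f(x_1,x_2)=e^{x_1}\sin x_2+x_1x_2^2$, with gradient and Hessian derived by hand. The function, the expansion point, and the direction of $\boldsymbol\delta$ are illustrative choices. It evaluates the second-order expansion in matrix form and prints the ratio of the approximation error to $\Vert\boldsymbol\delta\Vert^2$, which should shrink as $\boldsymbol\delta$ shrinks.

```python
import math

def f(x1, x2):
    # Illustrative choice: f = exp(x1) sin(x2) + x1 x2^2
    return math.exp(x1) * math.sin(x2) + x1 * x2 ** 2

def grad(x1, x2):
    # Hand-derived gradient [df/dx1, df/dx2]
    return [math.exp(x1) * math.sin(x2) + x2 ** 2,
            math.exp(x1) * math.cos(x2) + 2 * x1 * x2]

def hess(x1, x2):
    # Hand-derived Hessian; symmetric because the mixed partials agree
    f11 = math.exp(x1) * math.sin(x2)
    f12 = math.exp(x1) * math.cos(x2) + 2 * x2
    f22 = -math.exp(x1) * math.sin(x2) + 2 * x1
    return [[f11, f12], [f12, f22]]

def taylor2(x, d):
    """f(x) + grad^T d + (1/2) d^T H d, the second-order expansion at x."""
    g, H = grad(*x), hess(*x)
    first = sum(gi * di for gi, di in zip(g, d))
    second = sum(d[i] * H[i][j] * d[j] for i in range(2) for j in range(2))
    return f(*x) + first + 0.5 * second

x = [0.3, 0.8]
ratios = []
for s in (1e-1, 1e-2, 1e-3):
    d = [s, -0.5 * s]  # shrink delta along a fixed direction
    error = abs(f(x[0] + d[0], x[1] + d[1]) - taylor2(x, d))
    ratios.append(error / (d[0] ** 2 + d[1] ** 2))
print(ratios)  # error / ||delta||^2 for each step size
```

Note that the double sum in `taylor2` visits both `(1, 2)` and `(2, 1)`, so the mixed partial derivative enters the quadratic term twice.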