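Rules for differentiating matrices and vectors. Throughout, we use the denominator layout by default.
I. Basic Theorems
1.1 Derivatives with respect to a scalar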
- Row vector
Let $\mathbf{y}^T=[y_1 \dots y_n]$ be an $n$-dimensional row vector and $x$ a scalar. Then
$$\frac{\partial \mathbf{y}^{T}}{\partial x}=\left[\begin{array}{lll} \frac{\partial y_{1}}{\partial x} & \cdots & \frac{\partial y_{n}}{\partial x} \end{array}\right]$$
- Column vector
Let $\mathbf{y}=\left[\begin{array}{c} y_{1} \\ \vdots \\ y_{m} \end{array}\right]$ be an $m$-dimensional column vector and $x$ a scalar. Then
$$\frac{\partial \mathbf{y}}{\partial x}=\left[\begin{array}{c} \frac{\partial y_{1}}{\partial x} \\ \vdots \\ \frac{\partial y_{m}}{\partial x} \end{array}\right]$$
- Matrix
Let $Y=\left[\begin{array}{ccc} y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n} \end{array}\right]$ be an $m\times n$ matrix and $x$ a scalar. Then
$$\frac{\partial Y}{\partial x}=\left[\begin{array}{ccc} \frac{\partial y_{11}}{\partial x} & \cdots & \frac{\partial y_{1 n}}{\partial x} \\ \vdots & & \vdots \\ \frac{\partial y_{m 1}}{\partial x} & \cdots & \frac{\partial y_{m n}}{\partial x} \end{array}\right]$$
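The entrywise rule for differentiating with respect to a scalar can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the matrix-valued function `Y` and the helper names are hypothetical and chosen only for illustration.

```python
import numpy as np

def Y(x):
    # Hypothetical 2x2 matrix-valued function of a scalar x (illustration only).
    return np.array([[x, x ** 2],
                     [np.sin(x), 1.0]])

def dY_dx_analytic(x):
    # Differentiate every entry with respect to x; the shape stays 2x2.
    return np.array([[1.0, 2.0 * x],
                     [np.cos(x), 0.0]])

def dY_dx_numeric(x, h=1e-6):
    # Central difference applied entrywise.
    return (Y(x + h) - Y(x - h)) / (2.0 * h)

x0 = 0.7
err_scalar = np.max(np.abs(dY_dx_analytic(x0) - dY_dx_numeric(x0)))
print(dY_dx_numeric(x0).shape)   # (2, 2): same shape as Y
print(err_scalar < 1e-6)
```

The derivative has the same shape as the object being differentiated, which is what all three rules in 1.1 state.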
1.2 Derivatives with respect to a row vector
- Scalar
Let $y$ be a scalar and $\mathbf{x}^{T}=\left[\begin{array}{lll} x_{1} & \cdots & x_{q} \end{array}\right]$ a $q$-dimensional row vector. Then
$$\frac{\partial y}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll} \frac{\partial y}{\partial x_{1}} & \cdots & \frac{\partial y}{\partial x_{q}} \end{array}\right]$$
- Column vector
Let $\mathbf{y}=\left[\begin{array}{c} y_{1} \\ \vdots \\ y_{m} \end{array}\right]$ be an $m$-dimensional column vector and $\mathbf{x}^{T}=\left[\begin{array}{lll} x_{1} & \cdots & x_{q} \end{array}\right]$ a $q$-dimensional row vector. Then
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}^{T}}=\left[\begin{array}{ccc} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{1}}{\partial x_{q}} \\ \vdots & & \vdots \\ \frac{\partial y_{m}}{\partial x_{1}} & \cdots & \frac{\partial y_{m}}{\partial x_{q}} \end{array}\right]$$
- Row vector
Let $\mathbf{y}^T=[y_1 \dots y_n]$ be an $n$-dimensional row vector and $\mathbf{x}^{T}=\left[\begin{array}{lll} x_{1} & \cdots & x_{q} \end{array}\right]$ a $q$-dimensional row vector. Then
$$\frac{\partial \mathbf{y}^{T}}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll} \frac{\partial \mathbf{y}^{T}}{\partial x_{1}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{q}} \end{array}\right]$$
- Matrix
Let $Y=\left[\begin{array}{ccc} y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n} \end{array}\right]$ be an $m\times n$ matrix and $\mathbf{x}^{T}=\left[\begin{array}{lll} x_{1} & \cdots & x_{q} \end{array}\right]$ a $q$-dimensional row vector. Then
$$\frac{\partial Y}{\partial \mathbf{x}^{T}}=\left[\begin{array}{lll} \frac{\partial Y}{\partial x_{1}} & \cdots & \frac{\partial Y}{\partial x_{q}} \end{array}\right]$$
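For a column vector differentiated with respect to a row vector, the rule above produces the familiar $m\times q$ Jacobian. The sketch below, assuming NumPy, builds that matrix by central differences for the linear map $\mathbf{y}=A\mathbf{x}$, for which the result should equal $A$. The helper name `jacobian_wrt_row` and the test matrix are hypothetical.

```python
import numpy as np

def jacobian_wrt_row(f, x, h=1e-6):
    # Builds d y / d x^T: column j holds the partial derivatives of y w.r.t. x_j.
    x = np.asarray(x, dtype=float)
    cols = []
    for j in range(x.size):
        e = np.zeros_like(x)
        e[j] = h
        cols.append((f(x + e) - f(x - e)) / (2.0 * h))
    return np.stack(cols, axis=1)   # shape (m, q)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))     # m = 3, q = 4 (arbitrary test sizes)
x = rng.standard_normal(4)

J = jacobian_wrt_row(lambda v: A @ v, x)
print(J.shape)                      # (3, 4)
print(np.allclose(J, A, atol=1e-6))
```

Each column of the result is one derivative with respect to a scalar $x_j$, laid out side by side, which mirrors how every rule in 1.2 is assembled.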
1.3 Derivatives with respect to a column vector
- Scalar
Let $y$ be a scalar and $\mathbf{x}=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{p} \end{array}\right]$ a $p$-dimensional column vector. Then
$$\frac{\partial y}{\partial \mathbf{x}}=\left[\begin{array}{c} \frac{\partial y}{\partial x_{1}} \\ \vdots \\ \frac{\partial y}{\partial x_{p}} \end{array}\right]$$
- Row vector
Let $\mathbf{y}^T=[y_1 \dots y_n]$ be an $n$-dimensional row vector and $\mathbf{x}=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{p} \end{array}\right]$ a $p$-dimensional column vector. Then
$$\frac{\partial \mathbf{y}^{T}}{\partial \mathbf{x}}=\left[\begin{array}{ccc} \frac{\partial y_{1}}{\partial x_{1}} & \cdots & \frac{\partial y_{n}}{\partial x_{1}} \\ \vdots & & \vdots \\ \frac{\partial y_{1}}{\partial x_{p}} & \cdots & \frac{\partial y_{n}}{\partial x_{p}} \end{array}\right]$$
- Column vector
Let $\mathbf{y}=\left[\begin{array}{c} y_{1} \\ \vdots \\ y_{m} \end{array}\right]$ be an $m$-dimensional column vector and $\mathbf{x}=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{p} \end{array}\right]$ a $p$-dimensional column vector. Then
$$\frac{\partial \mathbf{y}}{\partial \mathbf{x}}=\left[\begin{array}{c} \frac{\partial y_{1}}{\partial \mathbf{x}} \\ \vdots \\ \frac{\partial y_{m}}{\partial \mathbf{x}} \end{array}\right]$$
- Matrix
Let $Y=\left[\begin{array}{ccc} y_{11} & \cdots & y_{1 n} \\ \vdots & & \vdots \\ y_{m 1} & \cdots & y_{m n} \end{array}\right]$ be an $m\times n$ matrix and $\mathbf{x}=\left[\begin{array}{c} x_{1} \\ \vdots \\ x_{p} \end{array}\right]$ a $p$-dimensional column vector. Then
$$\frac{\partial Y}{\partial \mathbf{x}}=\left[\begin{array}{ccc} \frac{\partial y_{11}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{1 n}}{\partial \mathbf{x}} \\ \vdots & & \vdots \\ \frac{\partial y_{m 1}}{\partial \mathbf{x}} & \cdots & \frac{\partial y_{m n}}{\partial \mathbf{x}} \end{array}\right]$$
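The shapes produced by the rules for a column-vector denominator are easy to mix up, so a small numerical check may help. This is a sketch under stated assumptions: NumPy is available, the helper names `d_row_by_col` and `d_col_by_col` are hypothetical, and the test function is the linear map $\mathbf{y}=A\mathbf{x}$ with $A$ of size $m\times p$. Under the rules above, $\partial \mathbf{y}^{T}/\partial \mathbf{x}$ should be the $p\times m$ matrix $A^{T}$, while $\partial \mathbf{y}/\partial \mathbf{x}$ stacks the gradients of $y_1,\dots,y_m$ into an $mp\times 1$ column.

```python
import numpy as np

def d_row_by_col(f, x, h=1e-6):
    # d y^T / d x: row i holds the partial derivatives of all y_j w.r.t. x_i.
    x = np.asarray(x, dtype=float)
    rows = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        rows.append((f(x + e) - f(x - e)) / (2.0 * h))
    return np.stack(rows, axis=0)       # shape (p, m)

def d_col_by_col(f, x, h=1e-6):
    # d y / d x under the stacking rule: the gradients of y_1, ..., y_m
    # are placed one below another.
    G = d_row_by_col(f, x, h)           # column k is the gradient of y_k
    return G.T.reshape(-1, 1)           # shape (m * p, 1)

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))         # m = 3, p = 4 (arbitrary test sizes)
x = rng.standard_normal(4)
f = lambda v: A @ v

D_row = d_row_by_col(f, x)
D_col = d_col_by_col(f, x)
print(D_row.shape, D_col.shape)         # (4, 3) (12, 1)
print(np.allclose(D_row, A.T, atol=1e-6))
```

The stacked form carries the same numbers as the $p\times m$ matrix, only rearranged, so in practice the row-vector form $\partial \mathbf{y}^{T}/\partial \mathbf{x}$ is usually the more convenient one to work with.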
1.4 Derivatives with respect to a matrix
- Scalar
Let $y$ be a scalar and $X=\left[\begin{array}{ccc} x_{11} & \cdots & x_{1 q} \\ \vdots & & \vdots \\ x_{p 1} & \cdots & x_{p q} \end{array}\right]$ a $p\times q$ matrix. Then
$$\frac{\partial y}{\partial X}=\left[\begin{array}{ccc} \frac{\partial y}{\partial x_{11}} & \cdots & \frac{\partial y}{\partial x_{1 q}} \\ \vdots & & \vdots \\ \frac{\partial y}{\partial x_{p 1}} & \cdots & \frac{\partial y}{\partial x_{p q}} \end{array}\right]$$
- Row vector
Let $\mathbf{y}^T=[y_1 \dots y_n]$ be an $n$-dimensional row vector and $X=\left[\begin{array}{ccc} x_{11} & \cdots & x_{1 q} \\ \vdots & & \vdots \\ x_{p 1} & \cdots & x_{p q} \end{array}\right]$ a $p\times q$ matrix. Then
$$\frac{\partial \mathbf{y}^{T}}{\partial X}=\left[\begin{array}{ccc} \frac{\partial \mathbf{y}^{T}}{\partial x_{11}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{1 q}} \\ \vdots & & \vdots \\ \frac{\partial \mathbf{y}^{T}}{\partial x_{p 1}} & \cdots & \frac{\partial \mathbf{y}^{T}}{\partial x_{p q}} \end{array}\right]$$
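In the row-vector case each entry of the result is itself a $1\times n$ block, so the full derivative is a $p\times qn$ matrix. The sketch below, assuming NumPy, illustrates this for the hypothetical example $\mathbf{y}^{T}=\mathbf{a}^{T}X$ with a constant vector $\mathbf{a}\in\mathbb{R}^{p}$ (so $n=q$). Here $\partial y_k/\partial x_{ij}=a_i\delta_{kj}$, so block $(i,j)$ should be $a_i\mathbf{e}_j^{T}$. The helper name `d_rowvec_d_matrix` is made up for this illustration.

```python
import numpy as np

def d_rowvec_d_matrix(f, X, h=1e-6):
    # Block (i, j) is the 1 x n row vector d y^T / d x_ij;
    # the assembled result has shape (p, q * n).
    X = np.asarray(X, dtype=float)
    p, q = X.shape
    block_rows = []
    for i in range(p):
        blocks = []
        for j in range(q):
            E = np.zeros_like(X)
            E[i, j] = h
            blocks.append((f(X + E) - f(X - E)) / (2.0 * h))
        block_rows.append(np.concatenate(blocks))
    return np.stack(block_rows, axis=0)

rng = np.random.default_rng(0)
p, q = 3, 2
a = rng.standard_normal(p)
X = rng.standard_normal((p, q))

D = d_rowvec_d_matrix(lambda M: a @ M, X)          # y^T = a^T X, so n = q
expected = np.outer(a, np.eye(q).reshape(-1))      # block (i, j) = a_i * e_j^T
print(D.shape)                                     # (3, 4)
print(np.allclose(D, expected, atol=1e-6))
```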
II. Examples
The examples in this part are mainly drawn from videos by the Bilibili uploader 杂谈博士; they are worth watching if you have time.
Example 1. Let $\boldsymbol{x}=\left(\xi_{1}, \xi_{2}, \cdots, \xi_{n}\right)^{T}$ and let $f(\boldsymbol{x})$ be a function of $n$ variables. Find $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}^{T}}$, $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}$, and $\frac{\mathrm{d}^{2} f}{\mathrm{d} \boldsymbol{x}^{2}}$.
Solution: By definition,
$$\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}^{T}}=\left(\frac{\partial f}{\partial \xi_{1}}, \frac{\partial f}{\partial \xi_{2}}, \cdots, \frac{\partial f}{\partial \xi_{n}}\right)$$
Gradient:
$$\nabla f(\boldsymbol{x})=\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}=\left(\begin{array}{c} \frac{\partial f}{\partial \xi_{1}} \\ \vdots \\ \frac{\partial f}{\partial \xi_{n}} \end{array}\right)$$
Hessian matrix:
$$\boldsymbol{H}(\boldsymbol{x})=\nabla^{2} f(\boldsymbol{x})=\frac{\mathrm{d}^{2} f}{\mathrm{d} \boldsymbol{x}^{2}}=\left(\begin{array}{cccc} \frac{\partial^{2} f}{\partial \xi_{1}^{2}} & \frac{\partial^{2} f}{\partial \xi_{1} \partial \xi_{2}} & \cdots & \frac{\partial^{2} f}{\partial \xi_{1} \partial \xi_{n}} \\ \frac{\partial^{2} f}{\partial \xi_{2} \partial \xi_{1}} & \frac{\partial^{2} f}{\partial \xi_{2}^{2}} & \cdots & \frac{\partial^{2} f}{\partial \xi_{2} \partial \xi_{n}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^{2} f}{\partial \xi_{n} \partial \xi_{1}} & \frac{\partial^{2} f}{\partial \xi_{n} \partial \xi_{2}} & \cdots & \frac{\partial^{2} f}{\partial \xi_{n}^{2}} \end{array}\right)$$
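To make Example 1 concrete, here is a minimal sketch, assuming NumPy, for the hypothetical function $f(\boldsymbol{x})=\xi_1^2\xi_2+\sin\xi_3$ (not from the original text). It writes out the gradient and Hessian by hand and checks the Hessian against central differences of the gradient.

```python
import numpy as np

def grad_f(x):
    # df/dx for the hypothetical f(x) = x1^2 * x2 + sin(x3).
    return np.array([2.0 * x[0] * x[1], x[0] ** 2, np.cos(x[2])])

def hess_f(x):
    # d^2 f / dx^2: entry (i, j) is the second partial w.r.t. xi_i and xi_j.
    return np.array([[2.0 * x[1], 2.0 * x[0], 0.0],
                     [2.0 * x[0], 0.0,        0.0],
                     [0.0,        0.0,        -np.sin(x[2])]])

def numeric_hessian(grad, x, h=1e-5):
    # Column j is the central difference of the gradient along xi_j.
    n = x.size
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros(n)
        e[j] = h
        H[:, j] = (grad(x + e) - grad(x - e)) / (2.0 * h)
    return H

x0 = np.array([1.0, -2.0, 0.5])
df_dxT = grad_f(x0).reshape(1, -1)   # row vector, df/dx^T
df_dx = grad_f(x0).reshape(-1, 1)    # column vector (gradient), df/dx
H_num = numeric_hessian(grad_f, x0)

print(df_dxT.shape, df_dx.shape)     # (1, 3) (3, 1)
print(np.allclose(H_num, hess_f(x0), atol=1e-6))
print(np.allclose(hess_f(x0), hess_f(x0).T))
```

The last line reflects the symmetry of the Hessian, which holds here because the second partial derivatives of this particular $f$ are continuous.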
Example 2. Let $\boldsymbol{A}=\left(a_{i j}\right)_{m \times n}$ be a constant matrix, let $\boldsymbol{X}=\left(x_{i j}\right)_{n \times m}$ be a matrix variable, and let $f(\boldsymbol{X})=\operatorname{tr}(\boldsymbol{A} \boldsymbol{X})$. Find $\frac{\partial f}{\partial \boldsymbol{X}}$.
Analysis: Write the product $\boldsymbol{A}\boldsymbol{X}$ entrywise as an $m\times m$ matrix $(c_{ij})$:
$$\left(\begin{array}{ccc} c_{11} & \cdots & c_{1 m} \\ \vdots & \ddots & \vdots \\ c_{m 1} & \cdots & c_{m m} \end{array}\right)=\left(\begin{array}{ccc} a_{11} & \cdots & a_{1 n} \\ \vdots & \ddots & \vdots \\ a_{m 1} & \cdots & a_{m n} \end{array}\right)\left(\begin{array}{ccc} x_{11} & \cdots & x_{1 m} \\ \vdots & \ddots & \vdots \\ x_{n 1} & \cdots & x_{n m} \end{array}\right)$$
Solution: Since
$$\boldsymbol{A} \boldsymbol{X}=\left(\sum_{k=1}^{n} a_{i k} x_{k j}\right)_{m \times m}$$
we have
$$f(\boldsymbol{X})=\operatorname{tr}(\boldsymbol{A} \boldsymbol{X})=\sum_{s=1}^{m}\left(\sum_{k=1}^{n} a_{s k} x_{k s}\right)$$
and
$$\left(\frac{\partial f}{\partial x_{i j}}\right)_{n \times m}=\left(a_{j i}\right)_{n \times m} \quad(i=1,2, \cdots, n;\ j=1,2, \cdots, m)$$
Hence
$$\frac{\partial f}{\partial \boldsymbol{X}}=\left(\frac{\partial f}{\partial x_{i j}}\right)_{n \times m}=\left(a_{j i}\right)_{n \times m}=\boldsymbol{A}^{T}$$
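The result $\partial \operatorname{tr}(\boldsymbol{A}\boldsymbol{X})/\partial \boldsymbol{X}=\boldsymbol{A}^{T}$ can be verified numerically. The following is a sketch, assuming NumPy; `grad_wrt_matrix` is a hypothetical helper that perturbs one entry $x_{ij}$ at a time, and the matrix sizes are arbitrary.

```python
import numpy as np

def grad_wrt_matrix(f, X, h=1e-6):
    # Entry (i, j) approximates d f / d x_ij, so the result has the shape of X.
    X = np.asarray(X, dtype=float)
    G = np.zeros_like(X)
    for i in range(X.shape[0]):
        for j in range(X.shape[1]):
            E = np.zeros_like(X)
            E[i, j] = h
            G[i, j] = (f(X + E) - f(X - E)) / (2.0 * h)
    return G

rng = np.random.default_rng(0)
m, n = 2, 3
A = rng.standard_normal((m, n))     # constant matrix, m x n
X = rng.standard_normal((n, m))     # matrix variable, n x m

G = grad_wrt_matrix(lambda M: np.trace(A @ M), X)
print(G.shape)                      # (3, 2), the shape of X
print(np.allclose(G, A.T, atol=1e-6))
```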
Example 3. Let $\boldsymbol{x}=\left(\xi_{1}, \xi_{2}, \cdots, \xi_{n}\right)^{T}$, let $\boldsymbol{A}=\left(a_{i j}\right)_{n \times n}$ be a constant matrix, and let $f(\boldsymbol{x})=\boldsymbol{x}^{T} \boldsymbol{A} \boldsymbol{x}$ be a function of $n$ variables. Find $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}$.
Solution: Since
$$f(\boldsymbol{x})=\xi_{1} \sum_{j=1}^{n} a_{1 j} \xi_{j}+\cdots+\xi_{k} \sum_{j=1}^{n} a_{k j} \xi_{j}+\cdots+\xi_{n} \sum_{j=1}^{n} a_{n j} \xi_{j}$$
we have
$$\begin{aligned} \frac{\partial f(\boldsymbol{x})}{\partial \xi_{k}} &=\xi_{1} a_{1 k}+\cdots+\xi_{k-1} a_{k-1, k}+\left(\sum_{j=1}^{n} a_{k j} \xi_{j}+\xi_{k} a_{k k}\right)+\xi_{k+1} a_{k+1, k}+\cdots+\xi_{n} a_{n k} \\ &=\sum_{i=1}^{n} a_{i k} \xi_{i}+\sum_{j=1}^{n} a_{k j} \xi_{j}, \quad k=1,2, \cdots, n \end{aligned}$$
Therefore
$$\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}=\left(\begin{array}{c} \frac{\partial f}{\partial \xi_{1}} \\ \frac{\partial f}{\partial \xi_{2}} \\ \vdots \\ \frac{\partial f}{\partial \xi_{n}} \end{array}\right)=\left(\begin{array}{c} \sum_{j=1}^{n} a_{1 j} \xi_{j} \\ \sum_{j=1}^{n} a_{2 j} \xi_{j} \\ \vdots \\ \sum_{j=1}^{n} a_{n j} \xi_{j} \end{array}\right)+\left(\begin{array}{c} \sum_{i=1}^{n} a_{i 1} \xi_{i} \\ \sum_{i=1}^{n} a_{i 2} \xi_{i} \\ \vdots \\ \sum_{i=1}^{n} a_{i n} \xi_{i} \end{array}\right)=\boldsymbol{A} \boldsymbol{x}+\boldsymbol{A}^{T} \boldsymbol{x}=\left(\boldsymbol{A}+\boldsymbol{A}^{T}\right) \boldsymbol{x}$$
In particular, when $\boldsymbol{A}$ is symmetric, $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}=2 \boldsymbol{A} \boldsymbol{x}$.
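A quick numerical check of the quadratic-form gradient, assuming NumPy; the helper `numeric_gradient` and the random test matrices are hypothetical. A deliberately non-symmetric $\boldsymbol{A}$ is used so that $(\boldsymbol{A}+\boldsymbol{A}^{T})\boldsymbol{x}$ and $2\boldsymbol{A}\boldsymbol{x}$ differ, and a symmetric matrix is then used for the special case.

```python
import numpy as np

def numeric_gradient(f, x, h=1e-6):
    # Central-difference approximation of the column vector df/dx.
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))     # generally not symmetric
x = rng.standard_normal(n)

g_num = numeric_gradient(lambda v: v @ A @ v, x)
g_general = (A + A.T) @ x           # formula from Example 3

S = A + A.T                         # a symmetric matrix built from A
g_sym_num = numeric_gradient(lambda v: v @ S @ v, x)
g_sym = 2.0 * S @ x                 # special case for symmetric matrices

print(np.allclose(g_num, g_general, atol=1e-6))
print(np.allclose(g_sym_num, g_sym, atol=1e-6))
```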
Example 4. Let $\boldsymbol{A} \in \mathbb{R}^{m \times n}$, $\boldsymbol{b} \in \mathbb{R}^{m}$, $\boldsymbol{x} \in \mathbb{R}^{n}$, and $f(\boldsymbol{x})=\|\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}\|_{2}^{2}$. Find $\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}$.
Solution: We have
$$\begin{aligned} f(\boldsymbol{x}) & =\|\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}\|_{2}^{2}=(\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}, \boldsymbol{A} \boldsymbol{x}-\boldsymbol{b})=(\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b})^{T}(\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}) \\ & =\left(\boldsymbol{x}^{T} \boldsymbol{A}^{T}-\boldsymbol{b}^{T}\right)(\boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}) \\ & =\boldsymbol{x}^{T} \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x}-\boldsymbol{b}^{T} \boldsymbol{A} \boldsymbol{x}-\boldsymbol{x}^{T} \boldsymbol{A}^{T} \boldsymbol{b}+\boldsymbol{b}^{T} \boldsymbol{b} \\ & =\boldsymbol{x}^{T}\left(\boldsymbol{A}^{T} \boldsymbol{A}\right) \boldsymbol{x}-2\left(\boldsymbol{A}^{T} \boldsymbol{b}\right)^{T} \boldsymbol{x}+\boldsymbol{b}^{T} \boldsymbol{b} \end{aligned}$$
The last line uses the fact that the two cross terms are the same scalar: $\boldsymbol{b}^{T} \boldsymbol{A} \boldsymbol{x}=\boldsymbol{x}^{T} \boldsymbol{A}^{T} \boldsymbol{b}=\left(\boldsymbol{A}^{T} \boldsymbol{b}\right)^{T} \boldsymbol{x}$. The derivative of the quadratic term can be written as $2 \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x}$ because $\boldsymbol{A}^{T} \boldsymbol{A}$ is a symmetric matrix (see Example 3).
Hence
$$\frac{\mathrm{d} f}{\mathrm{d} \boldsymbol{x}}=2 \boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x}-2 \boldsymbol{A}^{T} \boldsymbol{b}$$
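Setting this derivative to zero gives $\boldsymbol{A}^{T} \boldsymbol{A} \boldsymbol{x}=\boldsymbol{A}^{T} \boldsymbol{b}$. This system always has a solution, because the coefficient matrix and the augmented matrix have the same rank: $\operatorname{rank}(\boldsymbol{A}^{T} \boldsymbol{A})=\operatorname{rank}\left(\left[\boldsymbol{A}^{T} \boldsymbol{A}, \boldsymbol{A}^{T} \boldsymbol{b}\right]\right)$.
I have not yet studied the detailed proof.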
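The gradient from Example 4 and the solvability of $\boldsymbol{A}^{T}\boldsymbol{A}\boldsymbol{x}=\boldsymbol{A}^{T}\boldsymbol{b}$ can be illustrated numerically. The following is a sketch, assuming NumPy; the matrices are made-up test data and `grad_ls` is a hypothetical helper. The first case has full column rank, so the system is solved directly. The second case is rank-deficient, so $\boldsymbol{A}^{T}\boldsymbol{A}$ is singular, and `np.linalg.lstsq` is used to obtain one least-squares solution, which should still satisfy the equations.

```python
import numpy as np

def grad_ls(A, b, x):
    # Gradient of ||Ax - b||_2^2 from Example 4: 2 A^T A x - 2 A^T b.
    return 2.0 * A.T @ (A @ x) - 2.0 * A.T @ b

# Case 1: full column rank, so A^T A is invertible and the solution is unique.
rng = np.random.default_rng(0)
A1 = rng.standard_normal((6, 3))
b1 = rng.standard_normal(6)
x1 = np.linalg.solve(A1.T @ A1, A1.T @ b1)
x1_ref = np.linalg.lstsq(A1, b1, rcond=None)[0]

# Case 2: rank-deficient (third column = first + second), so A^T A is singular.
A2 = np.array([[1.0, 2.0, 3.0],
               [2.0, 4.0, 6.0],
               [1.0, 0.0, 1.0],
               [0.0, 1.0, 1.0]])
b2 = np.array([1.0, 0.0, 2.0, 1.0])
# An explicit rcond is an assumption made here so that the numerically zero
# singular value is reliably discarded.
x2 = np.linalg.lstsq(A2, b2, rcond=1e-10)[0]

print(np.allclose(x1, x1_ref))
print(np.linalg.norm(grad_ls(A1, b1, x1)) < 1e-8)
print(np.allclose(A2.T @ A2 @ x2, A2.T @ b2, atol=1e-8))
```

In the rank-deficient case the solution is not unique, and `lstsq` returns the one with the smallest Euclidean norm. Forming $\boldsymbol{A}^{T}\boldsymbol{A}$ explicitly squares the condition number, which is why library routines generally avoid it.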