1. Derivative of a vector with respect to a scalar
The derivative of the vector $\boldsymbol{y} = (y_1, y_2, \cdots, y_n)^T$ with respect to the scalar $x$ is:

$$\frac{\partial \boldsymbol{y}}{\partial x} = \left(\frac{\partial y_1}{\partial x}, \frac{\partial y_2}{\partial x}, \cdots, \frac{\partial y_n}{\partial x}\right)^T$$
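The definition can be checked numerically by differentiating each component separately. The following is a minimal sketch using central differences; the function `y`, the helper `vector_derivative`, the step size `h`, and the evaluation point `x0` are all illustrative assumptions, not part of the original notes.

```python
import math

def vector_derivative(f, x, h=1e-6):
    """Central-difference estimate of d f / d x for a vector-valued f of a scalar x."""
    forward = f(x + h)
    backward = f(x - h)
    # One derivative per component, in the same order as the components of f.
    return [(fp - fm) / (2 * h) for fp, fm in zip(forward, backward)]

def y(x):
    # Hypothetical example: y(x) = (x^2, sin x, e^x)^T
    return [x ** 2, math.sin(x), math.exp(x)]

x0 = 0.5
numeric = vector_derivative(y, x0)
exact = [2 * x0, math.cos(x0), math.exp(x0)]  # component-wise derivatives by hand

for n, e in zip(numeric, exact):
    print(f"numeric={n:.6f}  exact={e:.6f}")
```

The result has one entry per component of $\boldsymbol{y}$, matching the column vector in the formula above.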
2. Derivative of a scalar with respect to a vector
The derivative of the scalar $y$ with respect to the vector $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)^T$, in numerator layout, is the row vector:

$$\frac{\partial y}{\partial \boldsymbol{x}} = \left(\frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \cdots, \frac{\partial y}{\partial x_n}\right)$$
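As a small worked example (the function is a hypothetical illustration, not taken from the original notes), let $\boldsymbol{x} = (x_1, x_2)^T$ and $y = x_1^2 + 3 x_1 x_2$. Differentiating with respect to each component and arranging the results as a row gives:

$$\frac{\partial y}{\partial \boldsymbol{x}} = \left(\frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}\right) = \left(2 x_1 + 3 x_2,\; 3 x_1\right)$$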
3. Derivative of a vector with respect to a vector
The derivative of the vector $\boldsymbol{y} = (y_1, y_2, \cdots, y_m)^T$ with respect to the vector $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)^T$, in numerator layout, is the $m \times n$ matrix:

$$\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} = \begin{bmatrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{bmatrix}$$
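To make the row and column convention concrete, the sketch below builds this matrix by finite differences for the linear map $\boldsymbol{y} = A\boldsymbol{x}$, whose numerator-layout derivative is $A$ itself. The helper `jacobian`, the matrix `A`, and the evaluation point are assumptions chosen for illustration.

```python
def jacobian(f, x, h=1e-6):
    """Numerator-layout Jacobian: entry [i][j] approximates d y_i / d x_j."""
    m = len(f(x))
    n = len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only x_j, keeping the other components fixed.
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

# Hypothetical example with m = 2 outputs and n = 3 inputs.
A = [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0]]

def linear_map(x):
    # y = A x
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

J = jacobian(linear_map, [0.3, -1.2, 2.0])
for row in J:
    print([round(v, 6) for v in row])
```

Row $i$ of the result corresponds to $y_i$ and column $j$ to $x_j$, so the output has $m$ rows and $n$ columns.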
4. Derivative of a matrix with respect to a scalar
The derivative of the matrix $Y_{m\times n}$ with respect to the scalar $x$, in numerator layout, is:

$$\frac{\partial Y}{\partial x} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{bmatrix}$$
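For instance, taking a hypothetical $2 \times 2$ matrix whose entries depend on $x$ (an illustration only, not from the original notes), each entry is differentiated in place:

$$Y = \begin{bmatrix} x^2 & \sin x \\ e^x & 1 \end{bmatrix} \quad\Longrightarrow\quad \frac{\partial Y}{\partial x} = \begin{bmatrix} 2x & \cos x \\ e^x & 0 \end{bmatrix}$$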
5. Derivative of a scalar with respect to a matrix
The derivative of the scalar $y$ with respect to the matrix $X_{m\times n}$, in numerator layout, is the $n \times m$ matrix:

$$\frac{\partial y}{\partial X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{m1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1n}} & \frac{\partial y}{\partial x_{2n}} & \cdots & \frac{\partial y}{\partial x_{mn}} \end{bmatrix}$$
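A small example may help show the transposed shape. Assume a hypothetical $2 \times 3$ matrix $X$ and the linear function $y = x_{11} + 2x_{12} + 3x_{13} + 4x_{21} + 5x_{22} + 6x_{23}$ (chosen only for illustration). Placing $\frac{\partial y}{\partial x_{ji}}$ in row $i$, column $j$ gives a $3 \times 2$ result:

$$\frac{\partial y}{\partial X} = \begin{bmatrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} \\ \frac{\partial y}{\partial x_{13}} & \frac{\partial y}{\partial x_{23}} \end{bmatrix} = \begin{bmatrix} 1 & 4 \\ 2 & 5 \\ 3 & 6 \end{bmatrix}$$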
6. Derivative of a matrix with respect to a vector
The derivative of the matrix $Y_{m\times n}$ with respect to the vector $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)$, in numerator layout, is:

$$\frac{\partial Y}{\partial \boldsymbol{x}} = \begin{bmatrix} \frac{\partial y_{11}}{\partial x_1} & \frac{\partial y_{12}}{\partial x_2} & \cdots & \frac{\partial y_{1n}}{\partial x_n} \\ \frac{\partial y_{21}}{\partial x_1} & \frac{\partial y_{22}}{\partial x_2} & \cdots & \frac{\partial y_{2n}}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x_1} & \frac{\partial y_{m2}}{\partial x_2} & \cdots & \frac{\partial y_{mn}}{\partial x_n} \end{bmatrix}$$

Note that this form pairs column $j$ of $Y$ only with $x_j$; the full set of partial derivatives $\frac{\partial y_{ij}}{\partial x_k}$ has $m \times n \times n$ entries and does not fit in a single $m \times n$ matrix.
7. The trace of a matrix
The trace of a square matrix is the sum of its diagonal elements, which also equals the sum of its eigenvalues. It is written $\operatorname{tr}(A)$. Below, $A'$ denotes the transpose of $A$, and the derivative of a scalar with respect to a matrix is arranged in the same shape as that matrix, so that its $(i,j)$ entry is the partial derivative with respect to the $(i,j)$ element.

- $\operatorname{tr}(A) = \operatorname{tr}(A')$

  Proof: transposition leaves the diagonal elements unchanged, so the trace is unchanged.

- $\operatorname{tr}(AB) = \operatorname{tr}(BA)$

  Proof: let $A_{m\times n}$ and $B_{n\times m}$. Then

  $$\operatorname{tr}(AB) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} b_{ji}$$

  $$\operatorname{tr}(BA) = \sum_{i=1}^{n}\sum_{j=1}^{m} b_{ij} a_{ji} = \sum_{i=1}^{m}\sum_{j=1}^{n} b_{ji} a_{ij} = \operatorname{tr}(AB)$$

- $\operatorname{tr}(ABC) = \operatorname{tr}(BCA) = \operatorname{tr}(CAB)$

  Proof: $\operatorname{tr}(ABC) = \operatorname{tr}(A(BC)) = \operatorname{tr}(BCA) = \operatorname{tr}(B(CA)) = \operatorname{tr}(CAB)$

- $\frac{\partial \operatorname{tr}(AB)}{\partial A} = \frac{\partial \operatorname{tr}(BA)}{\partial A} = B'$

  Proof: let $A_{m\times n}$ and $B_{n\times m}$. The $(i,j)$ entry of the derivative is

  $$\frac{\partial \operatorname{tr}(AB)}{\partial a_{ij}} = \frac{\partial}{\partial a_{ij}} \sum_{k=1}^{m}\sum_{l=1}^{n} a_{kl} b_{lk} = b_{ji}$$

  which is the $(i,j)$ entry of $B'$. The same holds for $\operatorname{tr}(BA)$ because $\operatorname{tr}(BA) = \operatorname{tr}(AB)$.

- $\frac{\partial \operatorname{tr}(A'B)}{\partial A} = \frac{\partial \operatorname{tr}(BA')}{\partial A} = B$

  Proof: let $A_{m\times n}$, $A'_{n\times m}$ and $B_{m\times n}$. Then

  $$\operatorname{tr}(A'B) = \sum_{k=1}^{n}\sum_{l=1}^{m} a_{lk} b_{lk}, \qquad \frac{\partial \operatorname{tr}(A'B)}{\partial a_{ij}} = b_{ij}$$

  which is the $(i,j)$ entry of $B$.

- $\frac{\partial \operatorname{tr}(A'XB')}{\partial X} = \frac{\partial \operatorname{tr}(BX'A)}{\partial X} = AB$

  Proof: $\operatorname{tr}(A'XB') = \operatorname{tr}((A'XB')') = \operatorname{tr}(BX'A) = \operatorname{tr}(ABX')$. By the previous identity, with $AB$ in place of $B$ and $X$ in place of $A$, $\frac{\partial \operatorname{tr}(ABX')}{\partial X} = AB$. Therefore

  $$\frac{\partial \operatorname{tr}(A'XB')}{\partial X} = \frac{\partial \operatorname{tr}(BX'A)}{\partial X} = AB$$

- $\frac{\partial \operatorname{tr}(AXBX')}{\partial X} = AXB + A'XB'$

- $\frac{\partial \operatorname{tr}(AXBX)}{\partial X} = A'X'B' + B'X'A'$
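The trace identities above, including the last two that are stated without proof, can be sanity-checked numerically. The sketch below is one way to do so with plain Python lists and central differences; all helper names (`matmul`, `transpose`, `grad`, and so on), the random $3 \times 3$ test matrices, and the step size are assumptions made for illustration. The gradient is stored in the same shape as $X$, matching the convention used by these identities.

```python
import random

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def trace(A):
    return sum(A[i][i] for i in range(len(A)))

def rand_matrix(m, n):
    return [[random.uniform(-1, 1) for _ in range(n)] for _ in range(m)]

def grad(f, X, h=1e-6):
    """Entry [i][j] approximates d f / d x_ij (same shape as X)."""
    G = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            Xp = [row[:] for row in X]
            Xp[i][j] += h
            Xm = [row[:] for row in X]
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

random.seed(0)
A, B, X = rand_matrix(3, 3), rand_matrix(3, 3), rand_matrix(3, 3)
At, Bt, Xt = transpose(A), transpose(B), transpose(X)

# tr(AB) = tr(BA)
err_cyclic = abs(trace(matmul(A, B)) - trace(matmul(B, A)))

# d tr(XB) / dX = B'
err_linear = max_abs_diff(grad(lambda M: trace(matmul(M, B)), X), Bt)

# d tr(A'XB') / dX = AB
err_sandwich = max_abs_diff(
    grad(lambda M: trace(matmul(matmul(At, M), Bt)), X), matmul(A, B))

# d tr(AXBX') / dX = AXB + A'XB'
err_quad_t = max_abs_diff(
    grad(lambda M: trace(matmul(matmul(matmul(A, M), B), transpose(M))), X),
    add(matmul(matmul(A, X), B), matmul(matmul(At, X), Bt)))

# d tr(AXBX) / dX = A'X'B' + B'X'A'
err_quad = max_abs_diff(
    grad(lambda M: trace(matmul(matmul(matmul(A, M), B), M)), X),
    add(matmul(matmul(At, Xt), Bt), matmul(matmul(Bt, Xt), At)))

print(err_cyclic, err_linear, err_sandwich, err_quad_t, err_quad)
```

Because every function tested here is at most quadratic in the entries of $X$, central differences are exact up to rounding, so all printed errors should be close to zero.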