Matrix Trace and Matrix Derivatives

Tutorials

Matrix calculus (wiki)
Linear Algebra in Machine Learning: Matrix Derivatives

Notes

1. Derivative of a vector with respect to a scalar

The derivative of $\boldsymbol{y} = (y_1, y_2, \cdots, y_n)^T$ with respect to the scalar $x$:
$$\frac{\partial \boldsymbol{y}}{\partial x} = \left(\frac{\partial y_1}{\partial x}, \frac{\partial y_2}{\partial x}, \cdots, \frac{\partial y_n}{\partial x}\right)^T$$
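
As a small illustration (the vector function below is chosen for this example and does not come from the original notes), each component is simply differentiated on its own:

```latex
\boldsymbol{y}(x) = \begin{pmatrix} x^2 \\ \sin x \\ e^{3x} \end{pmatrix}
\quad\Longrightarrow\quad
\frac{\partial \boldsymbol{y}}{\partial x} = \begin{pmatrix} 2x \\ \cos x \\ 3e^{3x} \end{pmatrix}
```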

2. Derivative of a scalar with respect to a vector

The derivative of the scalar $y$ with respect to the vector $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)^T$:
Numerator layout (a $1 \times n$ row vector):
$$\frac{\partial y}{\partial \boldsymbol{x}} = \left(\frac{\partial y}{\partial x_1}, \frac{\partial y}{\partial x_2}, \cdots, \frac{\partial y}{\partial x_n}\right)$$

3. Derivative of a vector with respect to a vector

The derivative of $\boldsymbol{y} = (y_1, y_2, \cdots, y_m)^T$ with respect to $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)^T$:
Numerator layout (an $m \times n$ matrix, the Jacobian):
$$\frac{\partial \boldsymbol{y}}{\partial \boldsymbol{x}} = \left[ \begin{matrix} \frac{\partial y_1}{\partial x_1} & \frac{\partial y_1}{\partial x_2} & \cdots & \frac{\partial y_1}{\partial x_n} \\ \frac{\partial y_2}{\partial x_1} & \frac{\partial y_2}{\partial x_2} & \cdots & \frac{\partial y_2}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_m}{\partial x_1} & \frac{\partial y_m}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_n} \end{matrix} \right]$$
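
The layout can be checked numerically. The sketch below approximates the Jacobian by central differences and stores $\partial y_i / \partial x_j$ in row $i$, column $j$, so the result is $m \times n$. The function `f`, the helper `numerical_jacobian`, and the step size `eps` are hypothetical names and values chosen for this example; they are not part of the original notes.

```python
import math

def f(x):
    """Hypothetical example map from R^2 to R^3 (not from the original notes)."""
    x1, x2 = x
    return [x1 * x2, x1 + x2 ** 2, math.sin(x1)]

def numerical_jacobian(func, x, eps=1e-6):
    """Central-difference Jacobian in numerator layout: entry (i, j) is dy_i/dx_j."""
    m = len(func(x))
    n = len(x)
    jac = [[0.0] * n for _ in range(m)]
    for j in range(n):
        # Perturb only the j-th input; this fills column j of the Jacobian.
        x_plus = list(x)
        x_minus = list(x)
        x_plus[j] += eps
        x_minus[j] -= eps
        y_plus = func(x_plus)
        y_minus = func(x_minus)
        for i in range(m):
            jac[i][j] = (y_plus[i] - y_minus[i]) / (2 * eps)
    return jac

x0 = [1.0, 2.0]
J = numerical_jacobian(f, x0)

# Analytic Jacobian of f at x0, one row per output component.
J_exact = [[x0[1], x0[0]],
           [1.0, 2 * x0[1]],
           [math.cos(x0[0]), 0.0]]

print(len(J), "x", len(J[0]))  # shape m x n, here 3 x 2
```

With $m = 3$ outputs and $n = 2$ inputs the result has three rows and two columns, matching the numerator-layout matrix above.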

4. Derivative of a matrix with respect to a scalar

The derivative of the matrix $Y_{m\times n}$ with respect to the scalar $x$:
Numerator layout (an $m \times n$ matrix):
$$\frac{\partial Y}{\partial x} = \left[ \begin{matrix} \frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \cdots & \frac{\partial y_{1n}}{\partial x} \\ \frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \cdots & \frac{\partial y_{2n}}{\partial x} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x} & \frac{\partial y_{m2}}{\partial x} & \cdots & \frac{\partial y_{mn}}{\partial x} \end{matrix} \right]$$

5. Derivative of a scalar with respect to a matrix

The derivative of the scalar $y$ with respect to the matrix $X_{m\times n}$:
Numerator layout (an $n \times m$ matrix, whose $(i,j)$ entry is $\partial y / \partial x_{ji}$):
$$\frac{\partial y}{\partial X} = \left[ \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{m1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{m2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1n}} & \frac{\partial y}{\partial x_{2n}} & \cdots & \frac{\partial y}{\partial x_{mn}} \end{matrix} \right]$$

6. Derivative of a matrix with respect to a vector

The derivative of the matrix $Y_{m\times n}$ with respect to the vector $\boldsymbol{x} = (x_1, x_2, \cdots, x_n)$:
Numerator layout:
$$\frac{\partial Y}{\partial \boldsymbol{x}} = \left[ \begin{matrix} \frac{\partial y_{11}}{\partial x_1} & \frac{\partial y_{12}}{\partial x_2} & \cdots & \frac{\partial y_{1n}}{\partial x_n} \\ \frac{\partial y_{21}}{\partial x_1} & \frac{\partial y_{22}}{\partial x_2} & \cdots & \frac{\partial y_{2n}}{\partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{m1}}{\partial x_1} & \frac{\partial y_{m2}}{\partial x_2} & \cdots & \frac{\partial y_{mn}}{\partial x_n} \end{matrix} \right]$$
Note that this arrangement pairs column $j$ of $Y$ with $x_j$, so it lists only the partial derivatives $\partial y_{ij} / \partial x_j$. The full set of partial derivatives $\partial y_{ij} / \partial x_k$ forms a third-order array with $m \times n \times n$ entries.

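7. Trace of a matrix

The trace of a square matrix is the sum of its diagonal entries, which also equals the sum of its eigenvalues (counted with multiplicity). It is written $\mathrm{tr}(A)$.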
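
A small worked example, with a matrix chosen here for illustration rather than taken from the original notes, shows both descriptions giving the same number:

```latex
A = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}, \qquad
\mathrm{tr}(A) = 2 + 2 = 4, \qquad
\det(A - \lambda I) = (2-\lambda)^2 - 1 = (\lambda - 1)(\lambda - 3)
\;\Longrightarrow\; \lambda_1 + \lambda_2 = 1 + 3 = 4
```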

  1. $\mathrm{tr}(A) = \mathrm{tr}(A')$, where $A'$ denotes the transpose of $A$
    Proof: transposing a matrix leaves its diagonal entries unchanged, so the trace is unchanged.

  2. $\mathrm{tr}(AB) = \mathrm{tr}(BA)$
    Proof: let $A$ be $m \times n$ and $B$ be $n \times m$, with entries $a_{ij}$ and $b_{ij}$. Then
    $$\mathrm{tr}(AB) = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij} b_{ji}$$
    $$\mathrm{tr}(BA) = \sum_{i=1}^{n}\sum_{j=1}^{m} b_{ij} a_{ji} = \sum_{i=1}^{m}\sum_{j=1}^{n} b_{ji} a_{ij} = \mathrm{tr}(AB)$$
    where the middle step swaps the names of the indices $i$ and $j$ and the order of summation.
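
The identity holds even when $AB$ and $BA$ have different sizes. The sketch below checks this on one pair of non-square matrices; the helpers `matmul` and `trace` and the example entries are assumptions made for this illustration, written with plain Python lists.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2, 3],
     [4, 5, 6]]        # 2 x 3
B = [[1, 0],
     [0, 1],
     [2, 3]]           # 3 x 2

AB = matmul(A, B)      # 2 x 2
BA = matmul(B, A)      # 3 x 3

print(trace(AB), trace(BA))  # both traces are 30
```

Here $AB$ is $2 \times 2$ and $BA$ is $3 \times 3$, yet the diagonal sums agree.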

  3. $\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$
    Proof: applying item 2 twice, $\mathrm{tr}(ABC) = \mathrm{tr}(A(BC)) = \mathrm{tr}((BC)A) = \mathrm{tr}(BCA) = \mathrm{tr}(B(CA)) = \mathrm{tr}((CA)B) = \mathrm{tr}(CAB)$.
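
Only cyclic shifts of the factors are covered by this property; an arbitrary reordering can change the trace. The sketch below illustrates both points on three small matrices picked for this example (the matrices and helper names are assumptions, not from the original notes).

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]
C = [[1, 0], [0, 2]]

t_abc = trace(matmul(matmul(A, B), C))
t_bca = trace(matmul(matmul(B, C), A))   # cyclic shift of ABC
t_cab = trace(matmul(matmul(C, A), B))   # cyclic shift of ABC
t_acb = trace(matmul(matmul(A, C), B))   # not a cyclic shift of ABC

print(t_abc, t_bca, t_cab)  # 8 8 8
print(t_acb)                # 7
```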

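  4. $\frac{\partial\, \mathrm{tr}(AB)}{\partial A} = \frac{\partial\, \mathrm{tr}(BA)}{\partial A} = B'$
    Proof: here and in the identities that follow, the derivative of a scalar with respect to a matrix is arranged with the same shape as that matrix, so its $(i,j)$ entry is the partial derivative with respect to $a_{ij}$ (the transpose of the numerator-layout matrix in section 5). With $A$ of size $m \times n$ and $B$ of size $n \times m$,
    $$\frac{\partial\, \mathrm{tr}(AB)}{\partial a_{ij}} = \frac{\partial}{\partial a_{ij}} \sum_{k=1}^{m}\sum_{l=1}^{n} a_{kl} b_{lk} = b_{ji}$$
    so the matrix of partial derivatives is $B'$. Since $\mathrm{tr}(BA) = \mathrm{tr}(AB)$, the same result holds for $\mathrm{tr}(BA)$.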

  5. $\frac{\partial\, \mathrm{tr}(A'B)}{\partial A} = \frac{\partial\, \mathrm{tr}(BA')}{\partial A} = B$
    Proof: let $A$ be $m \times n$, so that $A'$ is $n \times m$, and let $B$ be $m \times n$. Then
    $$\frac{\partial\, \mathrm{tr}(A'B)}{\partial a_{ij}} = \frac{\partial}{\partial a_{ij}} \sum_{l=1}^{n}\sum_{k=1}^{m} a_{kl} b_{kl} = b_{ij}$$
    so the matrix of partial derivatives is $B$. Since $\mathrm{tr}(BA') = \mathrm{tr}(A'B)$, the same result holds for $\mathrm{tr}(BA')$.
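
Items 4 and 5 can be sanity-checked with finite differences. The sketch below perturbs one entry $a_{ij}$ at a time and stores the resulting slope at position $(i,j)$, so the gradient has the same shape as $A$. The helper names, the step size `eps`, and the example matrices are assumptions for this illustration; `C` plays the role of the $m \times n$ matrix called $B$ in item 5, to avoid reusing one name for two shapes.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def numerical_grad(f, A, eps=1e-6):
    """Central-difference gradient of the scalar f at A; entry (i, j) is df/da_ij."""
    grad = [[0.0] * len(A[0]) for _ in A]
    for i in range(len(A)):
        for j in range(len(A[0])):
            plus = [row[:] for row in A]
            minus = [row[:] for row in A]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (f(plus) - f(minus)) / (2 * eps)
    return grad

def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices of equal shape."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

A = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]        # 2 x 3
B = [[1.0, -1.0], [0.5, 2.0], [3.0, 0.0]]     # 3 x 2, used in tr(AB)
C = [[2.0, 0.0, 1.0], [-1.0, 3.0, 4.0]]       # 2 x 3, used in tr(A'C)

grad_item4 = numerical_grad(lambda M: trace(matmul(M, B)), A)
grad_item5 = numerical_grad(lambda M: trace(matmul(transpose(M), C)), A)

print(max_abs_diff(grad_item4, transpose(B)))  # close to zero: gradient matches B'
print(max_abs_diff(grad_item5, C))             # close to zero: gradient matches C
```

Both functions are linear in $A$, so the central difference has no truncation error and the remaining discrepancy is floating-point rounding.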

  6. $\frac{\partial\, \mathrm{tr}(A'XB')}{\partial X} = \frac{\partial\, \mathrm{tr}(BX'A)}{\partial X} = AB$
    Proof: by items 1 and 3,
    $$\mathrm{tr}(A'XB') = \mathrm{tr}\big((A'XB')'\big) = \mathrm{tr}(BX'A) = \mathrm{tr}(ABX')$$
    By item 5, $\frac{\partial\, \mathrm{tr}(ABX')}{\partial X} = AB$. Therefore
    $$\frac{\partial\, \mathrm{tr}(A'XB')}{\partial X} = \frac{\partial\, \mathrm{tr}(BX'A)}{\partial X} = AB$$
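
A finite-difference check of item 6 is sketched below, under the assumption that $X$ is $m \times n$, $A$ is $m \times p$, and $B$ is $p \times n$, so that $AB$ has the same shape as $X$. The helper names, `eps`, and the example entries are choices made for this illustration.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def numerical_grad(f, X, eps=1e-6):
    """Central-difference gradient of the scalar f at X; entry (i, j) is df/dx_ij."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (f(plus) - f(minus)) / (2 * eps)
    return grad

A = [[1.0, 2.0], [0.0, 1.0]]               # m x p = 2 x 2
B = [[1.0, 0.0, 2.0], [3.0, 1.0, 0.0]]     # p x n = 2 x 3
X = [[0.5, -1.0, 2.0], [1.5, 0.0, 4.0]]    # m x n = 2 x 3

def f(M):
    # tr(A' M B'), a scalar function of the 2 x 3 matrix M
    return trace(matmul(matmul(transpose(A), M), transpose(B)))

grad = numerical_grad(f, X)
expected = matmul(A, B)                    # the claimed gradient AB

max_err = max(abs(g - e) for rg, re in zip(grad, expected) for g, e in zip(rg, re))
print(max_err)  # close to zero
```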

  7. $\frac{\partial\, \mathrm{tr}(AXBX')}{\partial X} = AXB + A'XB'$
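
The notes state item 7 without proof. One way to see it is sketched here with differentials; this argument is added for illustration and is not taken from the original. It assumes $X$ is $m \times n$, $A$ is $m \times m$, and $B$ is $n \times n$, and it uses the convention that $df = \mathrm{tr}(G'\,dX)$ identifies $G = \partial f / \partial X$ with the same shape as $X$.

```latex
\begin{aligned}
d\,\mathrm{tr}(AXBX')
  &= \mathrm{tr}(A\,dX\,BX') + \mathrm{tr}(AXB\,dX') \\
  &= \mathrm{tr}(BX'A\,dX) + \mathrm{tr}\big((AXB)'\,dX\big) \\
  &= \mathrm{tr}\Big(\big(A'XB' + AXB\big)'\,dX\Big)
\end{aligned}
\qquad\Longrightarrow\qquad
\frac{\partial\,\mathrm{tr}(AXBX')}{\partial X} = AXB + A'XB'
```

The first term uses the cyclic property from item 3, and the second uses $\mathrm{tr}(M\,dX') = \mathrm{tr}(M'\,dX)$, which follows from item 1.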

  8. $\frac{\partial\, \mathrm{tr}(AXBX)}{\partial X} = A'X'B' + B'X'A'$
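
Item 8 follows from the same kind of differential argument: $d\,\mathrm{tr}(AXBX) = \mathrm{tr}\big((BXA + AXB)\,dX\big)$, and transposing the factor in front of $dX$ gives $A'X'B' + B'X'A'$. The sketch below checks items 7 and 8 numerically on $2 \times 2$ matrices; the helper names, the step size, and the example entries are assumptions made for this illustration, and square matrices are used only to keep every product well defined.

```python
def matmul(P, Q):
    """Product of two matrices stored as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(row) for row in zip(*M)]

def trace(M):
    """Sum of the diagonal entries of a square matrix."""
    return sum(M[i][i] for i in range(len(M)))

def add(P, Q):
    """Entrywise sum of two matrices of equal shape."""
    return [[p + q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]

def numerical_grad(f, X, eps=1e-4):
    """Central-difference gradient of the scalar f at X; entry (i, j) is df/dx_ij."""
    grad = [[0.0] * len(X[0]) for _ in X]
    for i in range(len(X)):
        for j in range(len(X[0])):
            plus = [row[:] for row in X]
            minus = [row[:] for row in X]
            plus[i][j] += eps
            minus[i][j] -= eps
            grad[i][j] = (f(plus) - f(minus)) / (2 * eps)
    return grad

def max_abs_diff(P, Q):
    """Largest entrywise absolute difference between two matrices of equal shape."""
    return max(abs(p - q) for rp, rq in zip(P, Q) for p, q in zip(rp, rq))

X = [[1.0, 2.0], [3.0, 4.0]]
A = [[0.0, 1.0], [2.0, 1.0]]
B = [[1.0, -1.0], [0.5, 2.0]]
At, Bt, Xt = transpose(A), transpose(B), transpose(X)

# Item 7: f(X) = tr(A X B X'), claimed gradient A X B + A' X B'
grad7 = numerical_grad(lambda M: trace(matmul(matmul(matmul(A, M), B), transpose(M))), X)
expected7 = add(matmul(matmul(A, X), B), matmul(matmul(At, X), Bt))

# Item 8: f(X) = tr(A X B X), claimed gradient A' X' B' + B' X' A'
grad8 = numerical_grad(lambda M: trace(matmul(matmul(matmul(A, M), B), M)), X)
expected8 = add(matmul(matmul(At, Xt), Bt), matmul(matmul(Bt, Xt), At))

print(max_abs_diff(grad7, expected7))  # close to zero
print(max_abs_diff(grad8, expected8))  # close to zero
```

Both functions are quadratic in $X$, so the central difference carries no truncation error and any remaining gap is floating-point rounding.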
