Notes on Matrix Derivatives

0. Notation

  • Field: let $\mathbb{F}$ denote a number field.
  • Scalars: let $y$ and $x$ be scalars; the differentials $\mathrm{d}y$ and $\mathrm{d}x$ are then scalars as well, i.e. $x, \mathrm{d}x, y, \mathrm{d}y \in \mathbb{F}^{1}$.
  • Vectors: let $\vec{y}$ and $\vec{x}$ be column vectors of dimension $m$ and $n$ respectively; $\mathrm{d}\vec{y}$ and $\mathrm{d}\vec{x}$ are then column vectors of dimension $m$ and $n$ as well:
    $\vec{x}, \mathrm{d}\vec{x} \in \mathbb{F}^{n}$, $\vec{y}, \mathrm{d}\vec{y} \in \mathbb{F}^{m}$.
  • Matrices: let $Y$ and $X$ be matrices; $\mathrm{d}Y$ and $\mathrm{d}X$ are then matrices as well:
    $X, \mathrm{d}X \in \mathbb{F}^{r \times s}$, $Y, \mathrm{d}Y \in \mathbb{F}^{p \times q}$.

Here $\mathrm{d}\vec{x} = ( \mathrm{d}x_1, \mathrm{d}x_2, \cdots, \mathrm{d}x_n )^T$, and similarly for $\mathrm{d}\vec{y}$. Likewise,
$$
\mathrm{d}X = \left( \begin{matrix} \mathrm{d}\vec{x}_1, & \mathrm{d}\vec{x}_2, & \cdots, & \mathrm{d}\vec{x}_s \end{matrix} \right) = \left( \begin{matrix}
\mathrm{d}x_{11} & \mathrm{d}x_{12} & \mathrm{d}x_{13} & \cdots & \mathrm{d}x_{1s} \\
\mathrm{d}x_{21} & \mathrm{d}x_{22} & \mathrm{d}x_{23} & \cdots & \mathrm{d}x_{2s} \\
\mathrm{d}x_{31} & \mathrm{d}x_{32} & \mathrm{d}x_{33} & \cdots & \mathrm{d}x_{3s} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\mathrm{d}x_{r1} & \mathrm{d}x_{r2} & \mathrm{d}x_{r3} & \cdots & \mathrm{d}x_{rs}
\end{matrix} \right)_{r \times s}
$$
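Types of matrix derivatives [1]

Rows are indexed by the type of the variable ($x$, $\vec{x}$, $X$); columns are indexed by the type of the function being differentiated.

|        | Scalar | Vector | Matrix |
|--------|--------|--------|--------|
| Scalar | $\frac{\partial y}{\partial x}$ | $\frac{\partial \vec{y}}{\partial x}$ | |
| Vector | $\frac{\partial y}{\partial \vec{x}}$ | $\frac{\partial \vec{y}}{\partial \vec{x}}$ | |
| Matrix | $\frac{\partial y}{\partial X}$ | $\frac{\partial \vec{y}}{\partial X}$ | |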

1. Derivatives with respect to a scalar

1.1 Derivative of a scalar with respect to a scalar

To keep the notation uniform throughout these notes, we write
$$
\mathrm{d}y = \frac{\partial y}{\partial x} \mathrm{d}x.
$$
Properties

  • SS1 (linearity): $\frac{\partial (u + v)}{\partial x} = \frac{\partial u}{\partial x} + \frac{\partial v}{\partial x}$
  • SS2 (product rule): $\frac{\partial (uv)}{\partial x} = u \frac{\partial v}{\partial x} + v \frac{\partial u}{\partial x}$
  • SS3 (chain rule): $\frac{\partial g(u)}{\partial x} = \frac{\partial g(u)}{\partial u} \frac{\partial u}{\partial x}$

1.2 Derivative of a vector with respect to a scalar

$$
\frac{\partial \vec{y}}{\partial x} = \left( \begin{matrix}
\frac{\partial y_1}{\partial x} \\
\frac{\partial y_2}{\partial x} \\
\frac{\partial y_3}{\partial x} \\
\vdots \\
\frac{\partial y_m}{\partial x}
\end{matrix} \right)_{m \times 1}
$$

Hence $\mathrm{d}y_i = \frac{\partial y_i}{\partial x} \mathrm{d}x$ for $i = 1, 2, \cdots, m$, that is,
$$
\mathrm{d}\vec{y} = \left( \begin{matrix}
\frac{\partial y_1}{\partial x} \mathrm{d}x \\
\frac{\partial y_2}{\partial x} \mathrm{d}x \\
\frac{\partial y_3}{\partial x} \mathrm{d}x \\
\vdots \\
\frac{\partial y_m}{\partial x} \mathrm{d}x
\end{matrix} \right)_{m \times 1} = \frac{\partial \vec{y}}{\partial x} \otimes \mathrm{d}x
$$
Since $\mathrm{d}x$ is a scalar, the product $\otimes \, \mathrm{d}x$ simply multiplies every entry by $\mathrm{d}x$.
Properties

  • VS1 (constant vector): for any constant column vector $\vec{a} \in \mathbb{F}^{n \times 1}$, $\frac{\partial \vec{a}}{\partial x} = \vec{0}_{n \times 1}$
  • VS2 (scalar multiple): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and constant $a \in \mathbb{F}^{1}$, $\frac{\partial (a\vec{u})}{\partial x} = a \frac{\partial \vec{u}}{\partial x}$
  • VS3 (matrix-vector product): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and constant matrix $A \in \mathbb{F}^{m \times n}$, $\frac{\partial (A\vec{u})}{\partial x} = A \frac{\partial \vec{u}}{\partial x}$
  • VS4 (transpose): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$, $\frac{\partial ( \vec{u}^T ) }{\partial x} = \left( \frac{\partial \vec{u}}{\partial x} \right)^T$
  • VS5 (addition): for any $\vec{u}(x), \vec{v}(x) \in \mathbb{F}^{n \times 1}$, $\frac{\partial (\vec{u} + \vec{v})}{\partial x} = \frac{\partial \vec{u}}{\partial x} + \frac{\partial \vec{v}}{\partial x}$
  • VS6 (chain rule): for any $\vec{u}(x) \in \mathbb{F}^{n \times 1}$ and $\vec{g}(\vec{u}) \in \mathbb{F}^{m \times 1}$, $\frac{\partial \vec{g} }{\partial x} = \left( \frac{\partial \vec{g}}{\partial \vec{u}} \right)^T \frac{\partial \vec{u}}{\partial x}$

A short proof of VS3 (matrix-vector product). Since the entries of $A$ are constants,
$$
\begin{aligned}
\frac{\partial \left( A\vec{u} \right)}{\partial x} = & \frac{\partial}{\partial x} \left( \left( \begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix} \right) \left( \begin{matrix} u_1 \\ u_2 \\ \vdots \\ u_n \end{matrix} \right) \right) \\
= & \frac{\partial}{\partial x} \left( u_1 \left( \begin{matrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{matrix} \right) + u_2 \left( \begin{matrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{matrix} \right) + \cdots + u_n \left( \begin{matrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{matrix} \right) \right) \\
= & \frac{\partial u_1}{\partial x} \left( \begin{matrix} a_{11} \\ a_{21} \\ \vdots \\ a_{m1} \end{matrix} \right) + \frac{\partial u_2}{\partial x} \left( \begin{matrix} a_{12} \\ a_{22} \\ \vdots \\ a_{m2} \end{matrix} \right) + \cdots + \frac{\partial u_n}{\partial x} \left( \begin{matrix} a_{1n} \\ a_{2n} \\ \vdots \\ a_{mn} \end{matrix} \right) \\
= & \left( \begin{matrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{matrix} \right) \left( \begin{matrix} \frac{\partial u_1}{\partial x} \\ \frac{\partial u_2}{\partial x} \\ \vdots \\ \frac{\partial u_n}{\partial x} \end{matrix} \right) \\
= & A \frac{\partial \vec{u} }{\partial x}
\end{aligned}
$$
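VS3 can also be checked numerically. The sketch below is only an illustration: the function `u`, the matrix `A`, and the point `x0` are made-up examples that do not come from the notes. It compares a central-difference estimate of $\frac{\partial (A\vec{u})}{\partial x}$ with $A \frac{\partial \vec{u}}{\partial x}$.

```python
import math

def u(x):
    # Hypothetical vector-valued function u: R -> R^3 (illustration only).
    return [math.sin(x), x * x, math.exp(x)]

def du_dx(x):
    # Analytic derivative of u, taken component by component.
    return [math.cos(x), 2.0 * x, math.exp(x)]

def matvec(A, v):
    # Product of an m x n matrix (list of rows) with an n-vector.
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def central_diff(f, x, h=1e-6):
    # Componentwise central difference of a vector-valued f at the scalar x.
    fp, fm = f(x + h), f(x - h)
    return [(p - m) / (2.0 * h) for p, m in zip(fp, fm)]

A = [[1.0, 2.0, 3.0],
     [0.0, -1.0, 4.0]]  # constant 2 x 3 matrix (assumed example)
x0 = 0.7

lhs = central_diff(lambda x: matvec(A, u(x)), x0)  # d(Au)/dx, numerically
rhs = matvec(A, du_dx(x0))                         # A du/dx, analytically
max_err = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_err)  # expected to be tiny (finite-difference error only)
```

The two sides should agree up to finite-difference error, which is what the linearity argument in the proof predicts.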

We now give a short proof of VS6 (chain rule). First, $\frac{\partial \vec{g}}{\partial \vec{u}}$ is a vector-by-vector derivative, so
$$
\frac{\partial \vec{g}}{\partial \vec{u}} = \left( \begin{matrix} \frac{\partial g_1}{\partial \vec{u}}, & \frac{\partial g_2}{\partial \vec{u}}, & \frac{\partial g_3}{\partial \vec{u}}, & \cdots, & \frac{\partial g_m}{\partial \vec{u}} \end{matrix} \right) = \left( \begin{matrix}
\frac{\partial g_1}{\partial u_1} & \frac{\partial g_2}{\partial u_1} & \frac{\partial g_3}{\partial u_1} & \cdots & \frac{\partial g_m}{\partial u_1} \\
\frac{\partial g_1}{\partial u_2} & \frac{\partial g_2}{\partial u_2} & \frac{\partial g_3}{\partial u_2} & \cdots & \frac{\partial g_m}{\partial u_2} \\
\frac{\partial g_1}{\partial u_3} & \frac{\partial g_2}{\partial u_3} & \frac{\partial g_3}{\partial u_3} & \cdots & \frac{\partial g_m}{\partial u_3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_1}{\partial u_n} & \frac{\partial g_2}{\partial u_n} & \frac{\partial g_3}{\partial u_n} & \cdots & \frac{\partial g_m}{\partial u_n}
\end{matrix} \right)_{n \times m}
$$
while $\frac{\partial \vec{u}}{\partial x}$ is a vector-by-scalar derivative, so
$$
\frac{\partial \vec{u}}{\partial x} = \left( \begin{matrix}
\frac{\partial u_1}{\partial x} \\
\frac{\partial u_2}{\partial x} \\
\frac{\partial u_3}{\partial x} \\
\vdots \\
\frac{\partial u_n}{\partial x}
\end{matrix} \right)_{n \times 1}
$$
Therefore
$$
\begin{aligned}
RHS = \left( \frac{\partial \vec{g}}{\partial \vec{u}} \right)^T \frac{\partial \vec{u}}{\partial x} = & \left( \begin{matrix}
\frac{\partial g_1}{\partial u_1} & \frac{\partial g_1}{\partial u_2} & \frac{\partial g_1}{\partial u_3} & \cdots & \frac{\partial g_1}{\partial u_n} \\
\frac{\partial g_2}{\partial u_1} & \frac{\partial g_2}{\partial u_2} & \frac{\partial g_2}{\partial u_3} & \cdots & \frac{\partial g_2}{\partial u_n} \\
\frac{\partial g_3}{\partial u_1} & \frac{\partial g_3}{\partial u_2} & \frac{\partial g_3}{\partial u_3} & \cdots & \frac{\partial g_3}{\partial u_n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial g_m}{\partial u_1} & \frac{\partial g_m}{\partial u_2} & \frac{\partial g_m}{\partial u_3} & \cdots & \frac{\partial g_m}{\partial u_n}
\end{matrix} \right)_{m \times n} \left( \begin{matrix}
\frac{\partial u_1}{\partial x} \\
\frac{\partial u_2}{\partial x} \\
\frac{\partial u_3}{\partial x} \\
\vdots \\
\frac{\partial u_n}{\partial x}
\end{matrix} \right)_{n \times 1} \\
= & \left( \begin{matrix}
\sum_{i=1}^{n} \frac{\partial g_1}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\sum_{i=1}^{n} \frac{\partial g_2}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\sum_{i=1}^{n} \frac{\partial g_3}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\vdots \\
\sum_{i=1}^{n} \frac{\partial g_m}{\partial u_i} \frac{\partial u_i}{\partial x}
\end{matrix} \right)_{m \times 1}
\end{aligned}
$$

On the other hand, treating $\frac{\partial \vec{g} }{\partial x}$ as a vector-by-scalar derivative and applying the chain rule to each component $g_k(u_1, \cdots, u_n)$ gives
$$
LHS = \frac{\partial \vec{g} }{\partial x} = \left( \begin{matrix}
\frac{\partial g_1}{\partial x} \\
\frac{\partial g_2}{\partial x} \\
\frac{\partial g_3}{\partial x} \\
\vdots \\
\frac{\partial g_m}{\partial x}
\end{matrix} \right)_{m \times 1} = \left( \begin{matrix}
\sum_{i=1}^{n} \frac{\partial g_1}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\sum_{i=1}^{n} \frac{\partial g_2}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\sum_{i=1}^{n} \frac{\partial g_3}{\partial u_i} \frac{\partial u_i}{\partial x} \\
\vdots \\
\sum_{i=1}^{n} \frac{\partial g_m}{\partial u_i} \frac{\partial u_i}{\partial x}
\end{matrix} \right)_{m \times 1} = RHS.
$$
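To see the transpose in VS6 at work, here is a small numerical sketch. The maps `u` and `g` and the point `x0` are hypothetical examples chosen for illustration; `dg_du` stores $\frac{\partial \vec{g}}{\partial \vec{u}}$ in the $n \times m$ layout used in these notes, with entry $(i, j)$ equal to $\frac{\partial g_j}{\partial u_i}$.

```python
import math

def u(x):
    # Hypothetical inner map u: R -> R^2 (n = 2).
    return [x * x, math.sin(x)]

def du_dx(x):
    # Analytic derivative of u, an n x 1 column stored as a list.
    return [2.0 * x, math.cos(x)]

def g(w):
    # Hypothetical outer map g: R^2 -> R^3 (m = 3).
    w1, w2 = w
    return [w1 * w2, w1 + w2, math.exp(w1)]

def dg_du(w):
    # Layout of the notes: entry (i, j) is d g_j / d u_i, shape n x m = 2 x 3.
    w1, w2 = w
    return [[w2, 1.0, math.exp(w1)],
            [w1, 1.0, 0.0]]

x0 = 0.5
J = dg_du(u(x0))
d = du_dx(x0)

# (dg/du)^T du/dx: component j sums (d g_j / d u_i) * (d u_i / d x) over i.
rhs = [sum(J[i][j] * d[i] for i in range(2)) for j in range(3)]

# Central-difference estimate of d g(u(x)) / dx for comparison.
h = 1e-6
gp, gm = g(u(x0 + h)), g(u(x0 - h))
lhs = [(p - m) / (2.0 * h) for p, m in zip(gp, gm)]

max_err = max(abs(l - r) for l, r in zip(lhs, rhs))
print(max_err)  # expected to be tiny (finite-difference error only)
```

Without the transpose the shapes ($n \times m$ times $n \times 1$) would not even be compatible, which is a quick way to remember where it belongs.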

1.3 Derivative of a matrix with respect to a scalar

$$
\frac{\partial Y}{\partial x} = \left( \begin{matrix} \frac{\partial \vec{y}_1}{\partial x}, & \frac{\partial \vec{y}_2}{\partial x}, & \cdots, & \frac{\partial \vec{y}_q}{\partial x} \end{matrix} \right) = \left( \begin{matrix}
\frac{\partial y_{11}}{\partial x} & \frac{\partial y_{12}}{\partial x} & \frac{\partial y_{13}}{\partial x} & \cdots & \frac{\partial y_{1q}}{\partial x} \\
\frac{\partial y_{21}}{\partial x} & \frac{\partial y_{22}}{\partial x} & \frac{\partial y_{23}}{\partial x} & \cdots & \frac{\partial y_{2q}}{\partial x} \\
\frac{\partial y_{31}}{\partial x} & \frac{\partial y_{32}}{\partial x} & \frac{\partial y_{33}}{\partial x} & \cdots & \frac{\partial y_{3q}}{\partial x} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{p1}}{\partial x} & \frac{\partial y_{p2}}{\partial x} & \frac{\partial y_{p3}}{\partial x} & \cdots & \frac{\partial y_{pq}}{\partial x}
\end{matrix} \right)_{p \times q}
$$
Hence $\mathrm{d}y_{ij} = \frac{\partial y_{ij}}{\partial x} \mathrm{d}x$ for $i = 1, 2, \cdots, p$ and $j = 1, 2, \cdots, q$, that is,
$$
\mathrm{d}Y = \left( \begin{matrix} \frac{\partial \vec{y}_1}{\partial x} \mathrm{d}x, & \frac{\partial \vec{y}_2}{\partial x} \mathrm{d}x, & \cdots, & \frac{\partial \vec{y}_q}{\partial x} \mathrm{d}x \end{matrix} \right) = \left( \begin{matrix}
\frac{\partial y_{11}}{\partial x}\mathrm{d}x & \frac{\partial y_{12}}{\partial x}\mathrm{d}x & \frac{\partial y_{13}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{1q}}{\partial x}\mathrm{d}x \\
\frac{\partial y_{21}}{\partial x}\mathrm{d}x & \frac{\partial y_{22}}{\partial x}\mathrm{d}x & \frac{\partial y_{23}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{2q}}{\partial x}\mathrm{d}x \\
\frac{\partial y_{31}}{\partial x}\mathrm{d}x & \frac{\partial y_{32}}{\partial x}\mathrm{d}x & \frac{\partial y_{33}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{3q}}{\partial x}\mathrm{d}x \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_{p1}}{\partial x}\mathrm{d}x & \frac{\partial y_{p2}}{\partial x}\mathrm{d}x & \frac{\partial y_{p3}}{\partial x}\mathrm{d}x & \cdots & \frac{\partial y_{pq}}{\partial x}\mathrm{d}x
\end{matrix} \right)_{p \times q} = \frac{\partial Y}{\partial x} \otimes \mathrm{d}x
$$
Properties

  • MS1 (scalar multiple): for any $U(x) \in \mathbb{F}^{m \times n}$ and constant $a \in \mathbb{F}$, $\frac{\partial (aU)}{\partial x} = a \frac{\partial U}{\partial x}$
  • MS2 (matrix product): for any $U(x) \in \mathbb{F}^{m \times n}$ and constant matrices $A \in \mathbb{F}^{r \times m}$, $B \in \mathbb{F}^{n \times s}$, $\frac{\partial (AUB)}{\partial x} = A \frac{\partial U}{\partial x} B$
  • MS3 (linearity): for any $U(x), V(x) \in \mathbb{F}^{m \times n}$, $\frac{\partial (U + V)}{\partial x} = \frac{\partial U}{\partial x} + \frac{\partial V}{\partial x}$
  • MS4 (product rule): for any $U(x) \in \mathbb{F}^{m \times n}$ and $V(x) \in \mathbb{F}^{n \times l}$, $\frac{\partial (UV)}{\partial x} = \frac{\partial U}{\partial x} V + U \frac{\partial V}{\partial x}$

We first prove MS4 (product rule).
For brevity, write $\frac{\partial Y}{\partial x} = \left( \frac{\partial y_{ij}}{\partial x} \right)_{p \times q}$, i.e. a matrix is specified by its $(i, j)$ entry. Then
$$
\begin{aligned}
\frac{\partial (UV)}{\partial x} = & \left( \sum_{k=1}^{n} \frac{\partial \left( u_{ik} v_{kj} \right)}{\partial x} \right)_{m \times l} \\
= & \left( \sum_{k=1}^{n} \left( \frac{\partial u_{ik}}{\partial x}v_{kj} + u_{ik}\frac{\partial v_{kj}}{\partial x} \right) \right)_{m \times l} \\
= & \left( \sum_{k=1}^{n} \frac{\partial u_{ik}}{\partial x}v_{kj} \right)_{m \times l} + \left( \sum_{k=1}^{n} u_{ik}\frac{\partial v_{kj}}{\partial x} \right)_{m \times l} \\
= & \frac{\partial U}{\partial x} V + U \frac{\partial V}{\partial x}.
\end{aligned}
$$

MS2 (matrix product) then follows from MS4, because $A$ and $B$ are constant and their derivatives vanish:
$$
\begin{aligned}
\frac{\partial (AUB)}{\partial x} = & \frac{\partial A}{\partial x}UB + A \left( \frac{\partial (UB)}{\partial x} \right) \\
= & \frac{\partial A}{\partial x}UB + A \left( \frac{\partial U}{\partial x}B + U\frac{\partial B}{\partial x} \right) \\
= & 0 \, UB + A\frac{\partial U}{\partial x}B + AU \, 0 \\
= & A\frac{\partial U}{\partial x}B.
\end{aligned}
$$
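The order of the factors in MS4 matters because matrix multiplication does not commute. The following sketch checks the product rule numerically; the matrix functions `U` and `V`, their derivatives, and the point `x0` are assumed examples, not taken from the notes.

```python
import math

def U(x):
    # Hypothetical 2 x 2 matrix function of a scalar x.
    return [[x, x * x], [math.sin(x), 1.0]]

def dU(x):
    # Entrywise derivative of U.
    return [[1.0, 2.0 * x], [math.cos(x), 0.0]]

def V(x):
    # A second hypothetical 2 x 2 matrix function.
    return [[math.exp(x), x], [2.0, math.cos(x)]]

def dV(x):
    # Entrywise derivative of V.
    return [[math.exp(x), 1.0], [0.0, -math.sin(x)]]

def matmul(P, Q):
    # Plain matrix product for matrices stored as lists of rows.
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matadd(P, Q):
    # Entrywise sum of two matrices of equal shape.
    return [[a + b for a, b in zip(rp, rq)] for rp, rq in zip(P, Q)]

x0, h = 0.8, 1e-6

# Left-hand side: central difference of the product U(x) V(x).
Wp = matmul(U(x0 + h), V(x0 + h))
Wm = matmul(U(x0 - h), V(x0 - h))
lhs = [[(p - m) / (2.0 * h) for p, m in zip(rp, rm)] for rp, rm in zip(Wp, Wm)]

# Right-hand side: dU/dx V + U dV/dx, with the factor order preserved.
rhs = matadd(matmul(dU(x0), V(x0)), matmul(U(x0), dV(x0)))

max_err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_err)  # expected to be tiny (finite-difference error only)
```

Swapping the factors in either term of `rhs` would generally break the agreement, since `U` and `V` do not commute here.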

2. Derivatives with respect to a vector

2.1 Derivative of a scalar with respect to a vector

$$
\frac{\partial y}{\partial \vec{x}} = \left( \begin{matrix}
\frac{\partial y}{\partial x_1} \\
\frac{\partial y}{\partial x_2} \\
\frac{\partial y}{\partial x_3} \\
\vdots \\
\frac{\partial y}{\partial x_n}
\end{matrix} \right)_{n \times 1}
$$
This is commonly known as the gradient.
By the total differential formula,
$$
\mathrm{d}y = \sum_{i=1}^{n} \frac{\partial y}{\partial x_i} \mathrm{d}x_i = \left( \begin{matrix} \frac{\partial y}{\partial x_1}, & \frac{\partial y}{\partial x_2}, & \frac{\partial y}{\partial x_3}, & \cdots, & \frac{\partial y}{\partial x_n} \end{matrix} \right) \left( \begin{matrix} \mathrm{d}x_1 \\ \mathrm{d}x_2 \\ \mathrm{d}x_3 \\ \vdots \\ \mathrm{d}x_n \end{matrix} \right) = \left( \frac{\partial y}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}
$$
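The identity $\mathrm{d}y = \left( \frac{\partial y}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$ says that the gradient gives the first-order change in $y$. A minimal sketch, assuming a made-up function `y` and a small step `dx` (neither is from the notes):

```python
import math

def y(x):
    # Hypothetical scalar function of a 3-vector.
    x1, x2, x3 = x
    return x1 * x1 * x2 + math.sin(x3)

def grad_y(x):
    # Gradient as an n x 1 column (stored as a list), derived by hand.
    x1, x2, x3 = x
    return [2.0 * x1 * x2, x1 * x1, math.cos(x3)]

x0 = [1.0, 2.0, 0.5]
dx = [1e-4, -2e-4, 3e-4]  # small displacement (assumed)

# First-order prediction: (dy/dx)^T dx, an inner product.
dy_pred = sum(gi * di for gi, di in zip(grad_y(x0), dx))

# Actual change of y over the same displacement.
dy_true = y([a + b for a, b in zip(x0, dx)]) - y(x0)

gap = abs(dy_pred - dy_true)
print(dy_pred, dy_true, gap)
```

The remaining gap is of second order in the step size, so shrinking `dx` by a factor of 10 should shrink `gap` by roughly a factor of 100.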
Properties

  • SV1 (scalar multiple): for any $u(\vec{x}) \in \mathbb{F}$ and constant $a \in \mathbb{F}$, $\frac{\partial (au)}{\partial \vec{x}} = a \frac{\partial u}{\partial \vec{x}}$
  • SV2 (linearity): for any $u(\vec{x}), v(\vec{x}) \in \mathbb{F}$, $\frac{\partial (u + v)}{\partial \vec{x}} = \frac{\partial u}{\partial \vec{x}} + \frac{\partial v}{\partial \vec{x}}$
  • SV3 (product rule): for any $u(\vec{x}), v(\vec{x}) \in \mathbb{F}$, $\frac{\partial (uv)}{\partial \vec{x}} = \frac{\partial u}{\partial \vec{x}} v + u \frac{\partial v}{\partial \vec{x}}$
  • SV4 (chain rule): for any $u(\vec{x}), g(u) \in \mathbb{F}$, $\frac{\partial g(u)}{\partial \vec{x}} = \frac{\partial g(u)}{\partial u}\frac{\partial u}{\partial \vec{x}}$
  • SV5: for any $\vec{u}(\vec{x}), \vec{v}(\vec{x}) \in \mathbb{F}^{m}$, $\frac{\partial (\vec{u}^T \vec{v})}{\partial \vec{x}} = \frac{\partial \vec{v}}{\partial \vec{x}} \vec{u} + \frac{\partial \vec{u}}{\partial \vec{x}} \vec{v}$
  • SV6: for any constant matrix $A \in \mathbb{F}^{m \times n}$, $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$ and $\vec{v}(\vec{x}) \in \mathbb{F}^{n}$, $\frac{\partial \left( \vec{u}^T A \vec{v} \right)}{\partial \vec{x}} = \frac{\partial \vec{v}}{\partial \vec{x}} A^T \vec{u} + \frac{\partial \vec{u} }{\partial \vec{x}} A \vec{v}$

We first sketch a proof of SV5. Both sides are $n \times 1$ vectors, so it suffices to show that they agree row by row. For row $i$,
$$
\begin{aligned}
LHS_i = & \frac{\partial (\vec{u}^T \vec{v})}{\partial x_i} = \frac{\partial }{\partial x_i} \left( \sum_{j=1}^{m} u_j v_j \right) \\
= & \sum_{j=1}^{m} \left( \frac{\partial (u_j v_j)}{\partial x_i} \right) \\
= & \sum_{j=1}^{m} \left( \frac{\partial v_j}{\partial x_i}u_j + \frac{\partial u_j}{\partial x_i}v_j \right).
\end{aligned}
$$

By the definition of the vector-by-vector derivative, row $i$ of the right-hand side is
$$
\begin{aligned}
RHS_i = & \left( \begin{matrix} \frac{\partial v_1}{\partial x_i}, & \frac{\partial v_2}{\partial x_i}, & \frac{\partial v_3}{\partial x_i}, & \cdots, & \frac{\partial v_m}{\partial x_i} \end{matrix} \right) \left( \begin{matrix} u_1 \\ u_2 \\ u_3 \\ \vdots \\ u_m \end{matrix} \right) + \left( \begin{matrix} \frac{\partial u_1}{\partial x_i}, & \frac{\partial u_2}{\partial x_i}, & \frac{\partial u_3}{\partial x_i}, & \cdots, & \frac{\partial u_m}{\partial x_i} \end{matrix} \right) \left( \begin{matrix} v_1 \\ v_2 \\ v_3 \\ \vdots \\ v_m \end{matrix} \right) \\
= & \sum_{j=1}^{m} \left( \frac{\partial v_j}{\partial x_i}u_j \right) + \sum_{j=1}^{m} \left( \frac{\partial u_j}{\partial x_i}v_j \right) \\
= & LHS_i.
\end{aligned}
$$

SV6 is proved as follows, applying SV5 to $\vec{u}$ and $A\vec{v}$ and then VV3 to $A\vec{v}$:
$$
\frac{\partial \left( \vec{u}^T A \vec{v} \right)}{\partial \vec{x}} \overset{\text{SV5}}{=} \frac{\partial (A\vec{v})}{\partial \vec{x}} \vec{u} + \frac{\partial \vec{u}}{\partial \vec{x}} A\vec{v} \overset{\text{VV3}}{=} \frac{\partial \vec{v}}{\partial \vec{x}} A^T \vec{u} + \frac{\partial \vec{u} }{\partial \vec{x}} A \vec{v}.
$$
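As a worked instance of SV6, consider the quadratic form $\vec{x}^T A \vec{x}$. This is a sketch under the extra assumption, not made above, that $m = n$ and $\vec{u} = \vec{v} = \vec{x}$. By the definition in Section 2.2, $\frac{\partial \vec{x}}{\partial \vec{x}}$ has $(i, j)$ entry $\frac{\partial x_j}{\partial x_i}$, so it is the identity matrix $I_n$, and SV6 gives:

```latex
\frac{\partial \left( \vec{x}^T A \vec{x} \right)}{\partial \vec{x}}
  = \frac{\partial \vec{x}}{\partial \vec{x}} A^T \vec{x}
  + \frac{\partial \vec{x}}{\partial \vec{x}} A \vec{x}
  = I_n A^T \vec{x} + I_n A \vec{x}
  = \left( A + A^T \right) \vec{x}.
```

When $A$ is symmetric this reduces to $2A\vec{x}$, the vector analogue of $\frac{\partial (a x^2)}{\partial x} = 2ax$.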

2.2 Derivative of a vector with respect to a vector

For each component of $\vec{y}$,
$$
\frac{\partial y_i}{\partial \vec{x}} = \left( \begin{matrix}
\frac{\partial y_i}{\partial x_1} \\
\frac{\partial y_i}{\partial x_2} \\
\frac{\partial y_i}{\partial x_3} \\
\vdots \\
\frac{\partial y_i}{\partial x_n}
\end{matrix} \right)_{n \times 1}, \quad i = 1, 2, \cdots, m.
$$
Hence
$$
\frac{\partial \vec{y}}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y_1}{\partial \vec{x}}, & \frac{\partial y_2}{\partial \vec{x}}, & \frac{\partial y_3}{\partial \vec{x}}, & \cdots, & \frac{\partial y_m}{\partial \vec{x}} \end{matrix} \right) = \left( \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_3}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_3}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\
\frac{\partial y_1}{\partial x_3} & \frac{\partial y_2}{\partial x_3} & \frac{\partial y_3}{\partial x_3} & \cdots & \frac{\partial y_m}{\partial x_3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \frac{\partial y_3}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{matrix} \right)_{n \times m}
$$
From the scalar-by-vector case above, $\mathrm{d}y_i = \left( \frac{\partial y_i}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$. Hence
$$
\mathrm{d} \vec{y} = \left( \begin{matrix} \mathrm{d}y_1 \\ \mathrm{d}y_2 \\ \mathrm{d}y_3 \\ \vdots \\ \mathrm{d}y_m \end{matrix} \right) = \left[ \begin{matrix}
\left( \frac{\partial y_1}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\
\left( \frac{\partial y_2}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\
\left( \frac{\partial y_3}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x} \\
\vdots \\
\left( \frac{\partial y_m}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}
\end{matrix} \right] = \left[ \begin{matrix}
\left( \frac{\partial y_1}{\partial \vec{x}} \right)^T \\
\left( \frac{\partial y_2}{\partial \vec{x}} \right)^T \\
\left( \frac{\partial y_3}{\partial \vec{x}} \right)^T \\
\vdots \\
\left( \frac{\partial y_m}{\partial \vec{x}} \right)^T
\end{matrix} \right] \mathrm{d}\vec{x} = \left( \begin{matrix}
\frac{\partial y_1}{\partial x_1} & \frac{\partial y_2}{\partial x_1} & \frac{\partial y_3}{\partial x_1} & \cdots & \frac{\partial y_m}{\partial x_1} \\
\frac{\partial y_1}{\partial x_2} & \frac{\partial y_2}{\partial x_2} & \frac{\partial y_3}{\partial x_2} & \cdots & \frac{\partial y_m}{\partial x_2} \\
\frac{\partial y_1}{\partial x_3} & \frac{\partial y_2}{\partial x_3} & \frac{\partial y_3}{\partial x_3} & \cdots & \frac{\partial y_m}{\partial x_3} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\frac{\partial y_1}{\partial x_n} & \frac{\partial y_2}{\partial x_n} & \frac{\partial y_3}{\partial x_n} & \cdots & \frac{\partial y_m}{\partial x_n}
\end{matrix} \right)^{T} \mathrm{d}\vec{x} = \left( \frac{\partial \vec{y}}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}
$$
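The $n \times m$ matrix defined here is the transpose of the $m \times n$ Jacobian that many texts use, so it is worth checking the layout on a concrete case. The sketch below is illustrative only: the map `y`, the helper name `derivative_denominator_layout`, and the test point are assumptions of this example.

```python
import math

def y(x):
    # Hypothetical map y: R^2 -> R^3 (n = 2, m = 3).
    x1, x2 = x
    return [x1 * x2, x1 + x2 * x2, math.sin(x1)]

def derivative_denominator_layout(f, x, h=1e-6):
    # Returns the n x m matrix D with D[i][j] = d f_j / d x_i,
    # estimated by central differences.
    D = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        D.append([(a - b) / (2.0 * h) for a, b in zip(f(xp), f(xm))])
    return D

x0 = [0.3, -1.2]
D = derivative_denominator_layout(y, x0)  # shape n x m = 2 x 3

# Hand-derived matrix in the same layout, for comparison.
D_exact = [[x0[1], 1.0, math.cos(x0[0])],
           [x0[0], 2.0 * x0[1], 0.0]]
layout_err = max(abs(D[i][j] - D_exact[i][j])
                 for i in range(2) for j in range(3))

# First-order prediction dy = D^T dx versus the actual change in y.
dx = [1e-4, 2e-4]
dy_pred = [sum(D[i][j] * dx[i] for i in range(2)) for j in range(3)]
x1 = [a + b for a, b in zip(x0, dx)]
dy_true = [b - a for a, b in zip(y(x0), y(x1))]
lin_err = max(abs(p - t) for p, t in zip(dy_pred, dy_true))
print(layout_err, lin_err)
```

Row $i$ of `D` collects the derivatives with respect to $x_i$, which is why the differential needs the transpose.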

  • VV1 (scalar multiple): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$ and $a(\vec{x}) \in \mathbb{F}$, $\frac{\partial (a\vec{u})}{\partial \vec{x}} = a \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial a}{\partial \vec{x}} \vec{u}^T$
  • VV2 (linearity): for any $\vec{u}(\vec{x}), \vec{v}(\vec{x}) \in \mathbb{F}^{m}$, $\frac{\partial (\vec{u} + \vec{v})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial \vec{v}}{\partial \vec{x}}$
  • VV3 (multiplication by a matrix): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{m}$ and constant matrix $A \in \mathbb{F}^{p \times m}$, $\frac{\partial (A \vec{u})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} A^T$
  • VV4 (chain rule): for any $\vec{u}(\vec{x}) \in \mathbb{F}^{p}$ and $\vec{g}(\vec{u}) \in \mathbb{F}^{q}$, $\frac{\partial \vec{g}(\vec{u})}{\partial \vec{x}} = \frac{\partial \vec{u}}{\partial \vec{x}} \frac{\partial \vec{g}}{\partial \vec{u}}$

Proof of VV1 (scalar multiple). As before, for brevity write $\frac{\partial \vec{u}}{\partial \vec{x}} = \left( \frac{\partial u_j}{\partial x_i} \right)_{n \times m}$. Then
$$
\begin{aligned}
\frac{\partial (a\vec{u})}{\partial \vec{x}} = & \left( \frac{\partial (a u_j)}{\partial x_i} \right)_{n \times m} \\
= & \left( a \frac{\partial u_j}{\partial x_i} + \frac{\partial a}{\partial x_i} u_j \right)_{n \times m} \\
= & \left( a \frac{\partial u_j}{\partial x_i} \right)_{n \times m} + \left( \frac{\partial a}{\partial x_i} u_j \right)_{n \times m} \\
= & a \left( \frac{\partial u_j}{\partial x_i} \right)_{n \times m} + \left( \begin{matrix} \frac{\partial a}{\partial x_1} \\ \frac{\partial a}{\partial x_2} \\ \frac{\partial a}{\partial x_3} \\ \vdots \\ \frac{\partial a}{\partial x_n} \end{matrix} \right) \left( \begin{matrix} u_1, & u_2, & u_3, & \cdots, & u_m \end{matrix} \right) \\
= & a \frac{\partial \vec{u}}{\partial \vec{x}} + \frac{\partial a}{\partial \vec{x}} \vec{u}^T.
\end{aligned}
$$
Proof of VV3 (multiplication by a matrix). Write the vector $A\vec{u}$ as $\left( \sum_{k=1}^{m} a_{jk} u_k \right)_p$, i.e. its $j$-th entry is $\sum_{k=1}^{m} a_{jk} u_k$.
By the definition of the vector-by-vector derivative,
$$
\frac{\partial}{\partial \vec{x}} \left( \sum_{k=1}^{m} a_{jk} u_k \right) = \left( \begin{matrix}
\frac{\partial}{\partial x_1} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\
\frac{\partial}{\partial x_2} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\
\frac{\partial}{\partial x_3} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \\
\vdots \\
\frac{\partial}{\partial x_n} \left( \sum_{k=1}^{m} a_{jk} u_k \right)
\end{matrix} \right)
$$
Hence the left-hand side can be written as
$$
\frac{\partial (A \vec{u})}{\partial \vec{x}} = \left( \frac{\partial}{\partial x_i} \left( \sum_{k=1}^{m} a_{jk} u_k \right) \right)_{n \times p} = \left( \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right) \right)_{n \times p},
$$
$$
LHS_{i,j} = \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right).
$$
Now consider $RHS_{i,j}$. It is the product of row $i$ of $\frac{\partial \vec{u}}{\partial \vec{x}}$ with column $j$ of $A^T$:
$$
RHS_{i,j} = \left( \frac{\partial u_1}{\partial x_i}, \frac{\partial u_2}{\partial x_i}, \cdots, \frac{\partial u_m}{\partial x_i} \right) \left( \begin{matrix} a_{j1} \\ a_{j2} \\ \vdots \\ a_{jm} \end{matrix} \right) = \sum_{k=1}^{m} \left( \frac{\partial u_k}{\partial x_i} a_{jk} \right) = LHS_{i,j}.
$$
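A small concrete case makes the transpose in VV3 visible. The example below is a sketch under assumptions of its own: the $2 \times 2$ matrix $A$ is made up, and $\vec{u} = \vec{x}$, so that $\frac{\partial \vec{u}}{\partial \vec{x}} = I_2$.

```latex
A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}, \qquad
A\vec{x} = \begin{pmatrix} x_1 + 2x_2 \\ 3x_1 + 4x_2 \end{pmatrix}, \qquad
\frac{\partial (A\vec{x})}{\partial \vec{x}}
  = \begin{pmatrix}
      \frac{\partial (x_1 + 2x_2)}{\partial x_1} & \frac{\partial (3x_1 + 4x_2)}{\partial x_1} \\
      \frac{\partial (x_1 + 2x_2)}{\partial x_2} & \frac{\partial (3x_1 + 4x_2)}{\partial x_2}
    \end{pmatrix}
  = \begin{pmatrix} 1 & 3 \\ 2 & 4 \end{pmatrix}
  = I_2 A^T = A^T.
```

Each column of the result holds the gradient of one component of $A\vec{x}$, which is why the rows of $A$ end up as columns.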

Finally, we prove VV4 (chain rule). The right-hand side is
$$
\frac{\partial \vec{u}}{\partial \vec{x}} \frac{\partial \vec{g}}{\partial \vec{u}} = \left( \frac{\partial u_j}{ \partial x_i} \right)_{n \times p} \left( \frac{\partial g_k}{ \partial u_j} \right)_{p \times q} = \left( \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right) \right)_{n \times q},
$$
so $RHS_{i,k} = \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right)$.
The left-hand side is $\frac{\partial \vec{g}(\vec{u})}{\partial \vec{x}} = \left( \frac{\partial g_k}{ \partial x_i} \right)_{n \times q}$, and therefore
$$
\begin{aligned} LHS_{i,k} = & \frac{\partial g_k}{ \partial x_i} \\ \overset{SS3}{=} & \frac{\partial g_k}{ \partial u_1} \frac{\partial u_1}{ \partial x_i} + \frac{\partial g_k}{ \partial u_2} \frac{\partial u_2}{ \partial x_i} + \cdots + \frac{\partial g_k}{ \partial u_p} \frac{\partial u_p}{ \partial x_i} \\ = & \sum_{j=1}^{p} \left( \frac{\partial u_j}{ \partial x_i} \frac{\partial g_k}{ \partial u_j} \right) \\ = & RHS_{i,k}. \end{aligned}
$$
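The two identities proved above can also be checked numerically. The following is a minimal sketch, assuming NumPy and the layout used in these notes (entry $(i,j)$ of $\frac{\partial \vec{y}}{\partial \vec{x}}$ is $\frac{\partial y_j}{\partial x_i}$); the helper `jac_denominator` and the test functions `u` and `g` are hypothetical choices made only for illustration.

```python
import numpy as np

def jac_denominator(f, x, eps=1e-6):
    """Central-difference derivative; entry (i, j) approximates d f_j / d x_i."""
    rows = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        rows.append((f(x + e) - f(x - e)) / (2 * eps))
    return np.stack(rows, axis=0)  # shape (len(x), len(f(x)))

rng = np.random.default_rng(0)
n, m, p = 3, 4, 2
B = rng.standard_normal((m, n))
A = rng.standard_normal((p, m))
x = rng.standard_normal(n)

u = lambda x: np.tanh(B @ x)  # hypothetical inner map R^n -> R^m
g = lambda v: np.sin(A @ v)   # hypothetical outer map R^m -> R^p

# VV3: d(Au)/dx = (du/dx) A^T
lhs_vv3 = jac_denominator(lambda x: A @ u(x), x)
rhs_vv3 = jac_denominator(u, x) @ A.T

# VV4: dg(u(x))/dx = (du/dx) (dg/du)
lhs_vv4 = jac_denominator(lambda x: g(u(x)), x)
rhs_vv4 = jac_denominator(u, x) @ jac_denominator(g, u(x))

print(np.allclose(lhs_vv3, rhs_vv3, atol=1e-6))
print(np.allclose(lhs_vv4, rhs_vv4, atol=1e-6))
```

Note the order of the factors: in this layout the derivative of the inner function comes first, which is the transpose of the more familiar Jacobian chain rule.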

2.3 Matrix-by-vector derivatives

First vectorize the matrix $Y$ in column-major order, i.e. stack its columns $\vec{y}_1, \cdots, \vec{y}_q$ on top of each other:
$$
vec(Y_{p \times q}) = vec \left( \left( \begin{matrix} y_{11} & y_{12} & y_{13} & \cdots & y_{1q} \\ y_{21} & y_{22} & y_{23} & \cdots & y_{2q} \\ y_{31} & y_{32} & y_{33} & \cdots & y_{3q} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ y_{p1} & y_{p2} & y_{p3} & \cdots & y_{pq} \end{matrix} \right)_{p \times q} \right) = \left( \vec{y}_1^T, \vec{y}_2^T, \cdots, \vec{y}_q^T \right)^T = \left( y_{11}, y_{21}, \cdots, y_{p1}, y_{12}, y_{22}, \cdots, y_{p2}, \cdots, y_{1q}, y_{2q}, \cdots, y_{pq} \right)^T_{pq \times 1}.
$$
By the vector-by-vector derivative, each column $\vec{y}_i$ satisfies
$$
\frac{\partial \vec{y}_i}{\partial \vec{x}} = \left( \begin{matrix} \frac{\partial y_{1i}}{\partial \vec{x}}, & \frac{\partial y_{2i}}{\partial \vec{x}}, & \cdots, & \frac{\partial y_{pi}}{\partial \vec{x}} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y_{1i}}{\partial x_1} & \frac{\partial y_{2i}}{\partial x_1} & \cdots & \frac{\partial y_{pi}}{\partial x_1} \\ \frac{\partial y_{1i}}{\partial x_2} & \frac{\partial y_{2i}}{\partial x_2} & \cdots & \frac{\partial y_{pi}}{\partial x_2} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y_{1i}}{\partial x_n} & \frac{\partial y_{2i}}{\partial x_n} & \cdots & \frac{\partial y_{pi}}{\partial x_n} \end{matrix} \right)_{n \times p}
$$
Placing these $q$ blocks side by side gives
$$
\begin{aligned} \frac{\partial vec(Y)}{\partial \vec{x}} = & \left( \begin{matrix} \frac{\partial \vec{y}_1}{\partial \vec{x}}, & \frac{\partial \vec{y}_2}{\partial \vec{x}}, & \cdots, & \frac{\partial \vec{y}_q}{\partial \vec{x}} \end{matrix} \right) \\ = & \left( \begin{matrix} \frac{\partial y_{11}}{\partial x_1} & \cdots & \frac{\partial y_{p1}}{\partial x_1} & \frac{\partial y_{12}}{\partial x_1} & \cdots & \frac{\partial y_{p2}}{\partial x_1} & \cdots & \frac{\partial y_{1q}}{\partial x_1} & \cdots & \frac{\partial y_{pq}}{\partial x_1} \\ \frac{\partial y_{11}}{\partial x_2} & \cdots & \frac{\partial y_{p1}}{\partial x_2} & \frac{\partial y_{12}}{\partial x_2} & \cdots & \frac{\partial y_{p2}}{\partial x_2} & \cdots & \frac{\partial y_{1q}}{\partial x_2} & \cdots & \frac{\partial y_{pq}}{\partial x_2} \\ \vdots & & \vdots & \vdots & & \vdots & & \vdots & & \vdots \\ \frac{\partial y_{11}}{\partial x_n} & \cdots & \frac{\partial y_{p1}}{\partial x_n} & \frac{\partial y_{12}}{\partial x_n} & \cdots & \frac{\partial y_{p2}}{\partial x_n} & \cdots & \frac{\partial y_{1q}}{\partial x_n} & \cdots & \frac{\partial y_{pq}}{\partial x_n} \end{matrix} \right)_{n \times pq} \end{aligned}
$$

The differentials are then related by
$$
vec(\mathrm{d}Y) = \left( \frac{\partial vec(Y)}{\partial \vec{x}} \right)^T \mathrm{d} \vec{x}
$$
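To make the relation $vec(\mathrm{d}Y) = \left( \frac{\partial vec(Y)}{\partial \vec{x}} \right)^T \mathrm{d}\vec{x}$ concrete, here is a small numerical sketch. It assumes NumPy; the matrix function `Y` and the helper names `vec` and `dvecY_dx` are hypothetical and chosen only for illustration. Column-major vectorization corresponds to `order="F"` in NumPy.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 3, 2, 4
C = rng.standard_normal((p, n))
D = rng.standard_normal((q, n))

def Y(x):
    """Hypothetical p x q matrix-valued function of x."""
    return np.outer(np.sin(C @ x), D @ x)

def vec(M):
    """Column-major vectorization (stack the columns)."""
    return M.reshape(-1, order="F")

def dvecY_dx(x, eps=1e-6):
    """n x pq matrix whose (i, k) entry approximates d vec(Y)_k / d x_i."""
    rows = []
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        rows.append((vec(Y(x + e)) - vec(Y(x - e))) / (2 * eps))
    return np.stack(rows, axis=0)

x = rng.standard_normal(n)
dx = 1e-6 * rng.standard_normal(n)   # small perturbation standing in for dx

J = dvecY_dx(x)
lhs = vec(Y(x + dx) - Y(x))          # actual change of vec(Y)
rhs = J.T @ dx                       # first-order prediction

print(J.shape)
print(np.allclose(lhs, rhs, rtol=1e-3, atol=1e-8))
```

The two sides agree only to first order in `dx`, which is why the comparison uses a tolerance rather than exact equality.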

3. Derivatives with respect to a matrix

3.1 Scalar-by-matrix derivatives

$$
\frac{\partial y}{\partial X} = \left( \begin{matrix} \frac{\partial y}{\partial \vec{x}_1}, & \frac{\partial y}{\partial \vec{x}_2}, & \cdots, & \frac{\partial y}{\partial \vec{x}_s} \end{matrix} \right) = \left( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{12}} & \cdots & \frac{\partial y}{\partial x_{1s}} \\ \frac{\partial y}{\partial x_{21}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{2s}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{r1}} & \frac{\partial y}{\partial x_{r2}} & \cdots & \frac{\partial y}{\partial x_{rs}} \end{matrix} \right)_{r \times s}
$$
As before, the total differential formula gives $\mathrm{d}y = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij}$. On the other hand, writing out only the diagonal entries of the product,
$$
\begin{aligned} \left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X = & \left( \begin{matrix} \frac{\partial y}{\partial x_{11}} & \frac{\partial y}{\partial x_{21}} & \cdots & \frac{\partial y}{\partial x_{r1}} \\ \frac{\partial y}{\partial x_{12}} & \frac{\partial y}{\partial x_{22}} & \cdots & \frac{\partial y}{\partial x_{r2}} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial y}{\partial x_{1s}} & \frac{\partial y}{\partial x_{2s}} & \cdots & \frac{\partial y}{\partial x_{rs}} \end{matrix} \right)_{s \times r} \left( \begin{matrix} \mathrm{d}x_{11} & \mathrm{d}x_{12} & \cdots & \mathrm{d}x_{1s} \\ \mathrm{d}x_{21} & \mathrm{d}x_{22} & \cdots & \mathrm{d}x_{2s} \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{d}x_{r1} & \mathrm{d}x_{r2} & \cdots & \mathrm{d}x_{rs} \end{matrix} \right)_{r \times s} \\ = & \left( \begin{matrix} \sum_{i=1}^{r} \frac{\partial y}{\partial x_{i1}} \mathrm{d}x_{i1} & \cdots & \cdots & \cdots \\ \cdots & \sum_{i=1}^{r} \frac{\partial y}{\partial x_{i2}} \mathrm{d}x_{i2} & \cdots & \cdots \\ \vdots & \vdots & \ddots & \vdots \\ \cdots & \cdots & \cdots & \sum_{i=1}^{r} \frac{\partial y}{\partial x_{is}} \mathrm{d}x_{is} \end{matrix} \right)_{s \times s} \end{aligned}
$$
Taking the trace therefore gives
$$
tr\left( \left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X\right) = \sum_{j=1}^{s} \sum_{i=1}^{r} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij} = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{\partial y}{\partial x_{ij}} \mathrm{d}x_{ij} = \mathrm{d}y
$$
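The identity $\mathrm{d}y = tr\left( \left( \frac{\partial y}{\partial X} \right)^T \mathrm{d}X \right)$ is the basis of every later example, so a quick numerical check may help. The sketch below assumes NumPy; the scalar function `y` and the weight matrix `C` are hypothetical and chosen so that the gradient is known in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
r, s = 3, 4
C = rng.standard_normal((r, s))
X = rng.standard_normal((r, s))

def y(X):
    """Hypothetical scalar function of X: sum_ij c_ij * sin(x_ij)."""
    return np.sum(C * np.sin(X))

G = C * np.cos(X)                      # dy/dX, entry (i, j) = dy/dx_ij
dX = 1e-6 * rng.standard_normal((r, s))

trace_form = np.trace(G.T @ dX)        # tr((dy/dX)^T dX)
sum_form = np.sum(G * dX)              # double sum over i, j
actual = y(X + dX) - y(X)              # actual change, equal to dy to first order

print(np.isclose(trace_form, sum_form, rtol=1e-9, atol=1e-15))
print(abs(actual - trace_form) < 1e-9)
```

In code, `np.sum(G * dX)` is usually preferable to forming the full $s \times s$ product just to read off its diagonal.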

3.2 Vector-by-matrix derivatives

Applying the scalar-by-matrix result to each component of $\vec{y}$ gives
$$
\mathrm{d}\vec{y} = \left( \begin{matrix} \mathrm{d}y_1 \\ \mathrm{d}y_2 \\ \vdots \\ \mathrm{d}y_m \end{matrix} \right) = \left( \begin{matrix} tr\left( \left( \frac{\partial y_1}{\partial X} \right)^T \mathrm{d}X\right) \\ tr\left( \left( \frac{\partial y_2}{\partial X} \right)^T \mathrm{d}X\right) \\ \vdots \\ tr\left( \left( \frac{\partial y_m}{\partial X} \right)^T \mathrm{d}X\right) \end{matrix} \right)
$$

3.3 Matrix-by-matrix derivatives

Using the vector-by-matrix derivative column by column, we have
$$
\begin{aligned} \mathrm{d}Y = & \left( \begin{matrix} \mathrm{d}\vec{y}_1, & \mathrm{d}\vec{y}_2, & \cdots, & \mathrm{d}\vec{y}_q \end{matrix} \right) \\ = & \left( \begin{matrix} tr\left( \left( \frac{\partial y_{11}}{\partial X} \right)^T \mathrm{d}X\right) & tr\left( \left( \frac{\partial y_{12}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & tr\left( \left( \frac{\partial y_{1q}}{\partial X} \right)^T \mathrm{d}X\right) \\ tr\left( \left( \frac{\partial y_{21}}{\partial X} \right)^T \mathrm{d}X\right) & tr\left( \left( \frac{\partial y_{22}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & tr\left( \left( \frac{\partial y_{2q}}{\partial X} \right)^T \mathrm{d}X\right) \\ \vdots & \vdots & \ddots & \vdots \\ tr\left( \left( \frac{\partial y_{p1}}{\partial X} \right)^T \mathrm{d}X\right) & tr\left( \left( \frac{\partial y_{p2}}{\partial X} \right)^T \mathrm{d}X\right) & \cdots & tr\left( \left( \frac{\partial y_{pq}}{\partial X} \right)^T \mathrm{d}X\right) \end{matrix} \right)_{p \times q} \end{aligned}
$$
Alternatively, if both matrices are vectorized, then
$$
vec(\mathrm{d}Y) = \left( \frac{\partial vec(Y)}{\partial vec(X)} \right)^T vec(\mathrm{d}X)
$$

Matrix differential rules [2]

  • D1 (linearity): $\mathrm{d} \left( X \pm Y \right) = \mathrm{d}X \pm \mathrm{d}Y$
  • D2 (matrix product): $\mathrm{d} \left( X Y \right) = (\mathrm{d}X)Y + X(\mathrm{d}Y)$
  • D3 (transpose): $\mathrm{d}(X^T) = (\mathrm{d}X)^T$
  • D4 (trace): $\mathrm{d}\left( tr(X) \right) = tr(\mathrm{d}X)$
  • D5 (inverse): if $X$ is invertible, $\mathrm{d}(X^{-1}) = -X^{-1} (\mathrm{d}X) X^{-1}$ (apply D2 to $X X^{-1} = I$)
  • D6 (determinant): $\mathrm{d} |X| = tr\left( \mathrm{adj}(X) \, \mathrm{d}X \right)$, where $\mathrm{adj}(X)$ is the adjugate of $X$; if $X$ is invertible, $\mathrm{d} |X| = |X| \, tr\left( X^{-1} \mathrm{d}X \right)$
  • D7 (element-wise product): $\mathrm{d}\left( X \odot Y \right) = (\mathrm{d}X) \odot Y + X \odot (\mathrm{d}Y)$
  • D8 (element-wise function): $\mathrm{d} f(X) = f^{'}(X) \odot ( \mathrm{d} X )$
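The inverse and determinant rules are the easiest ones to get wrong, so they are worth checking against a small perturbation. The sketch below assumes NumPy and uses the standard first-order results $\mathrm{d}(X^{-1}) = -X^{-1}(\mathrm{d}X)X^{-1}$ and $\mathrm{d}|X| = |X| \, tr(X^{-1}\mathrm{d}X)$; the test matrix is an arbitrary well-conditioned choice.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
# Diagonally shifted random matrix, used here only to keep X well conditioned.
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
dX = 1e-6 * rng.standard_normal((n, n))
Xinv = np.linalg.inv(X)

# Differential of the inverse: d(X^{-1}) = -X^{-1} dX X^{-1}
d_inv_actual = np.linalg.inv(X + dX) - Xinv
d_inv_rule = -Xinv @ dX @ Xinv

# Differential of the determinant: d|X| = |X| tr(X^{-1} dX)
d_det_actual = np.linalg.det(X + dX) - np.linalg.det(X)
d_det_rule = np.linalg.det(X) * np.trace(Xinv @ dX)

print(np.allclose(d_inv_actual, d_inv_rule, atol=1e-10))
print(np.isclose(d_det_actual, d_det_rule, rtol=1e-4, atol=1e-8))
```

Both comparisons hold only up to second-order terms in `dX`, hence the tolerances.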

4. Derivative of a determinant with respect to a matrix

The derivative of a determinant with respect to a matrix is again a scalar-by-matrix derivative.

  • DM1: for any invertible $X \in \mathbb{F}^{n \times n}$ and $A, B \in \mathbb{F}^{n \times n}$ (square, so that $|AXB| = |A||X||B|$), $\frac{\partial |AXB|}{\partial X} = |AXB|(X^{-1})^T$
  • DM2: for any invertible $X \in \mathbb{F}^{n \times n}$ with $|X| > 0$, $\frac{\partial ln(|X|)}{\partial X} = (X^{-1})^T$
  • DM3: for any invertible $X(z) \in \mathbb{F}^{n \times n}$ depending on $z \in \mathbb{F}$, $\frac{\partial ln(|X(z)|)}{\partial z} = tr\left( X^{-1} \frac{\partial X}{\partial z} \right)$
  • DM4: for any $X \in \mathbb{F}^{n \times m}, A \in \mathbb{F}^{n \times n}$ with $X^T A X$ invertible, $\frac{\partial |X^T A X|}{\partial X} = |X^T A X| \left( AX \left(X^TAX\right)^{-1} + A^TX \left(X^TA^TX\right)^{-1} \right)$
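As a sanity check on DM2 and DM4, the closed forms can be compared with finite differences. This is a minimal sketch assuming NumPy; `num_grad` is a hypothetical helper, and the matrices are arbitrary choices that keep the determinants positive and the inverses well defined.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference estimate of df/dX, entry (i, j) = df/dx_ij."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, m = 4, 2

# DM2: d ln|X| / dX = (X^{-1})^T
X_sq = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))
dm2_num = num_grad(lambda Z: np.log(np.linalg.det(Z)), X_sq)
dm2_formula = np.linalg.inv(X_sq).T

# DM4: d|X^T A X| / dX with a non-symmetric A
A = 3 * np.eye(n) + 0.5 * rng.standard_normal((n, n))
X = rng.standard_normal((n, m))
f = lambda Z: np.linalg.det(Z.T @ A @ Z)
dm4_num = num_grad(f, X)
dm4_formula = f(X) * (A @ X @ np.linalg.inv(X.T @ A @ X)
                      + A.T @ X @ np.linalg.inv(X.T @ A.T @ X))

print(np.allclose(dm2_num, dm2_formula, atol=1e-6))
print(np.allclose(dm4_num, dm4_formula, rtol=1e-5, atol=1e-5))
```

Using a non-symmetric `A` matters here: with a symmetric `A` the two terms of DM4 coincide and an error in either one could go unnoticed.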

5. Derivative of a trace with respect to a matrix

The derivative of a trace with respect to a matrix is, in essence, also a scalar-by-matrix derivative.
Properties of the trace

  • TR1 (scalar): for any $a \in \mathbb{F}^1$, $a = tr(a)$
  • TR2 (transpose): for any $A \in \mathbb{F}^{n \times n}$, $tr(A) = tr(A^T)$
  • TR3 (linearity): for any $A,B \in \mathbb{F}^{n \times n}$, $tr(A \pm B) = tr(A) \pm tr(B)$
  • TR4 (cyclic property for the matrix product): for any $A \in \mathbb{F}^{m \times n}, B \in \mathbb{F}^{n \times m}$, $tr(AB) = tr(BA)$
  • TR5 (exchanging the matrix product and the element-wise product): for any $A,B,C \in \mathbb{F}^{m \times n}$, $tr\left( A^T(B \odot C) \right) = tr \left( (A \odot B)^T C \right)$
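TR4 and TR5 do most of the work in the examples later on, so here is a quick numerical illustration. It is a sketch assuming NumPy, with arbitrary random matrices; both sides of TR5 equal $\sum_{i,j} a_{ij} b_{ij} c_{ij}$, which the code also computes directly.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5

# TR4: tr(AB) = tr(BA) for A (m x n) and B (n x m)
A = rng.standard_normal((m, n))
B = rng.standard_normal((n, m))
tr4_holds = np.isclose(np.trace(A @ B), np.trace(B @ A))

# TR5: tr(P^T (Q * R)) = tr((P * Q)^T R), where * is the element-wise product
P, Q, R = (rng.standard_normal((m, n)) for _ in range(3))
lhs = np.trace(P.T @ (Q * R))
rhs = np.trace((P * Q).T @ R)
direct = np.sum(P * Q * R)   # sum_ij p_ij q_ij r_ij
tr5_holds = np.isclose(lhs, rhs) and np.isclose(lhs, direct)

print(tr4_holds, tr5_holds)
```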

Common functions [2]

| $f$ | $\mathrm{d} f$ | $\frac{\partial f}{\partial X}$ |
|---|---|---|
| $tr(X)$ | $tr(I \mathrm{d}X)$ | $I$ |
| $tr(X^T X)$ | $2tr(X^T \mathrm{d}X)$ | $2X$ |
| $tr(X^2)$ | $2tr(X \mathrm{d}X)$ | $2 X^T$ |
| $tr(A X)$ | $tr(A \mathrm{d}X)$ | $A^T$ |
| $tr(X^T A X)$ | $tr(X^T(A + A^T) \mathrm{d}X)$ | $(A + A^T)X$ |
| $tr(X A X^T)$ | $tr((A + A^T)X^T \mathrm{d}X)$ | $X(A + A^T)$ |
| $tr(X A X)$ | $tr((AX + XA) \mathrm{d}X)$ | $X^T A^T + A^T X^T$ |
| $tr(A X^{-1})$ | $-tr(X^{-1}AX^{-1} \mathrm{d}X)$ | $-\left( X^{-1} A X^{-1} \right)^T$ |
| $tr(X A X B)$ | $tr((AXB + BXA) \mathrm{d}X)$ | $\left( A X B + B X A \right)^T$ |
| $tr(X A X^T B)$ | $tr((A X^T B + A^T X^T B^T) \mathrm{d}X)$ | $B^T X A^T + B X A$ |
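Several rows of the table of common functions can be verified by finite differences. The sketch below assumes NumPy and square matrices for simplicity; `num_grad` is a hypothetical helper and `cases` is just an illustrative selection of rows, each paired with its closed-form gradient.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference estimate of df/dX, entry (i, j) = df/dx_ij."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n = 3
X = n * np.eye(n) + 0.5 * rng.standard_normal((n, n))  # kept invertible
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
Xi = np.linalg.inv(X)

# name -> (function of X, closed-form gradient)
cases = {
    "tr(X^T X)": (lambda Z: np.trace(Z.T @ Z), 2 * X),
    "tr(X^2)": (lambda Z: np.trace(Z @ Z), 2 * X.T),
    "tr(X A X)": (lambda Z: np.trace(Z @ A @ Z), X.T @ A.T + A.T @ X.T),
    "tr(A X^-1)": (lambda Z: np.trace(A @ np.linalg.inv(Z)), -(Xi @ A @ Xi).T),
    "tr(X A X B)": (lambda Z: np.trace(Z @ A @ Z @ B), (A @ X @ B + B @ X @ A).T),
    "tr(X A X^T B)": (lambda Z: np.trace(Z @ A @ Z.T @ B), B.T @ X @ A.T + B @ X @ A),
}

results = {name: bool(np.allclose(num_grad(f, X), G, atol=1e-5))
           for name, (f, G) in cases.items()}
for name, ok in results.items():
    print(name, ok)
```

The pattern in every row is the same: bring $\mathrm{d}f$ into the form $tr(M \, \mathrm{d}X)$ and read off $\frac{\partial f}{\partial X} = M^T$.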
  • TM1: for any $X \in \mathbb{F}^{n \times n}$, $\frac{\partial tr(X)}{\partial X} = I$
  • TM2: for any $X \in \mathbb{F}^{n \times m}, A \in \mathbb{F}^{m \times n}$, $\frac{\partial tr(XA)}{\partial X} = \frac{\partial tr(AX)}{\partial X} = A^T$
  • TM3: for any $X \in \mathbb{F}^{n \times m}, A \in \mathbb{F}^{n \times n}$, $\frac{\partial tr(X^T A X)}{\partial X} = (A+A^T)X$
  • TM4: for any $A \in \mathbb{F}^{n \times n}$ and invertible $X \in \mathbb{F}^{n \times n}$, $\frac{\partial tr(X^{-1} A)}{\partial X} = - \left( X^{-1} A X^{-1} \right)^T$, which reduces to $- X^{-1} A^T X^{-1}$ when $X$ is symmetric

For convenience, in the proofs below $x_{i_X,j_X}$ denotes the entry of $X$ in row $i_X$ and column $j_X$ with respect to which we differentiate, so $i_X$ and $j_X$ also index the entries of the resulting matrix.

Proof of TM1.
$$
\begin{aligned} \frac{\partial tr(X)}{\partial X} = & \left( \frac{\partial tr(X)}{\partial x_{i_X,j_X}} \right)_{n \times n} \\ = & \left( \frac{\partial }{\partial x_{i_X,j_X}} \left( \sum_{i=1}^{n}x_{i,i} \right) \right)_{n \times n} \\ = & \left( \frac{\partial x_{i_X,i_X}}{\partial x_{i_X,j_X}}\right)_{n \times n} \\ = & I. \end{aligned}
$$

Proof of TM2.
$$
\begin{aligned} \frac{\partial tr(XA)}{\partial X} = & \left( \frac{\partial tr(XA)}{\partial x_{i_X,j_X}} \right)_{n \times m} \\ = & \left( \frac{\partial }{\partial x_{i_X,j_X}} \left( \sum_{i=1}^{n} \sum_{k=1}^{m} x_{i,k}a_{k,i} \right) \right)_{n \times m} \\ = & \left( \frac{\partial \left( x_{i_X,j_X}a_{j_X,i_X} \right)}{\partial x_{i_X,j_X}}\right)_{n \times m} \\ = & \left( a_{j_X,i_X} \right)_{n \times m} \\ = & A^T. \end{aligned}
$$

Proof of TM3. Write $X=\left( x_{i,k} \right)_{n \times m}$ and $A=\left( a_{i,j} \right)_{n \times n}$, so that the $(k,i)$ entry of $X^T$ is $x_{i,k}$. Then
$$
X^T A = \left( \sum_{i=1}^{n} x_{i,k} a_{i,j} \right)_{m \times n}, \quad X^T A X = \left( \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j}x_{j,l} \right)_{m \times m},
$$
$$
A X = \left( \sum_{j=1}^{n} a_{i,j}x_{j,k} \right)_{n \times m}, \quad A^T X = \left( \sum_{j=1}^{n} a_{j,i}x_{j,k} \right)_{n \times m}.
$$
Hence $tr(X^T A X) = \sum_{k=1}^{m} \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j}x_{j,k}$. The entry $x_{i_X,j_X}$ occurs only in terms with $k = j_X$, either as the first factor ($i = i_X$) or as the second factor ($j = i_X$), so by the product rule
$$
\begin{aligned} \frac{\partial tr(X^T A X)}{\partial X} = & \left( \frac{\partial tr(X^T A X)}{\partial x_{i_X, j_X}} \right)_{n \times m} \\ = & \left( \frac{\partial }{\partial x_{i_X, j_X}} \left( \sum_{k=1}^{m} \sum_{j=1}^{n} \sum_{i=1}^{n} x_{i,k} a_{i,j}x_{j,k} \right) \right)_{n \times m} \\ = & \left( \sum_{j=1}^{n} a_{i_X,j}x_{j,j_X} + \sum_{i=1}^{n} x_{i,j_X} a_{i,i_X} \right)_{n \times m} \\ = & A X + A^T X \\ = & (A + A^T) X. \end{aligned}
$$
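The index manipulation in the proof of TM3 can be mirrored in code. The sketch below assumes NumPy; it writes the trace as the triple sum $\sum_{k}\sum_{j}\sum_{i} x_{i,k} a_{i,j} x_{j,k}$ over the entries of $X$ itself (via `np.einsum`), and then compares $(A + A^T)X$ with a finite-difference gradient computed by the hypothetical helper `num_grad`.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference estimate of df/dX, entry (i, j) = df/dx_ij."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
n, m = 4, 3
X = rng.standard_normal((n, m))
A = rng.standard_normal((n, n))   # deliberately not symmetric

# tr(X^T A X) as a triple sum over i, j (1..n) and k (1..m)
triple_sum = np.einsum("ik,ij,jk->", X, A, X)
trace_val = np.trace(X.T @ A @ X)

grad_formula = (A + A.T) @ X
grad_num = num_grad(lambda Z: np.trace(Z.T @ A @ Z), X)

print(np.isclose(triple_sum, trace_val))
print(np.allclose(grad_num, grad_formula, atol=1e-6))
```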

6. Examples

The examples here are all taken from [3].

[Example 1] Let $f = \vec{a}^T X \vec{b}$ with $\vec{a} \in \mathbb{F}^{m \times 1}, X \in \mathbb{F}^{m \times n}, \vec{b} \in \mathbb{F}^{n \times 1}$. Find $\frac{\partial f}{\partial X}$.

[Solution] Since $f$ is a scalar, $f = tr(f)$ by TR1, so
$$
\frac{\partial f}{\partial X} = \frac{\partial tr\left( \vec{a}^T X \vec{b} \right)}{\partial X} \overset{TR4}{=} \frac{\partial tr\left( \vec{b} \vec{a}^T X \right)}{\partial X} \overset{TM2}{=} \left( \vec{b} \vec{a}^T \right)^T = \vec{a} \vec{b}^T
$$
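The result $\frac{\partial (\vec{a}^T X \vec{b})}{\partial X} = \vec{a}\vec{b}^T$ is easy to confirm numerically. This is a minimal sketch assuming NumPy, with a hypothetical finite-difference helper `num_grad` and random test data.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference estimate of df/dX, entry (i, j) = df/dx_ij."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 3, 4
a = rng.standard_normal(m)
b = rng.standard_normal(n)
X = rng.standard_normal((m, n))

grad_num = num_grad(lambda Z: a @ Z @ b, X)   # f = a^T X b
grad_formula = np.outer(a, b)                 # a b^T, shape (m, n)

print(np.allclose(grad_num, grad_formula, atol=1e-7))
```

Because $f$ is linear in $X$, the gradient does not depend on $X$ at all.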

[Example 2] Let $f = \vec{a}^T exp(X \vec{b})$ with $\vec{a} \in \mathbb{F}^{m \times 1}, X \in \mathbb{F}^{m \times n}, \vec{b} \in \mathbb{F}^{n \times 1}$, where $exp$ acts element-wise. Find $\frac{\partial f}{\partial X}$.

[Solution] First apply the differential rules (element-wise function, then matrix product): $\mathrm{d}f = \vec{a}^T \left( exp(X \vec{b}) \odot (\mathrm{d}X \vec{b}) \right)$.

Take the trace of both sides and bring the result into the form $tr(M \, \mathrm{d}X)$ of TM2:
$$
\begin{aligned} \mathrm{d}f = & tr \left( \vec{a}^T \left( exp(X \vec{b}) \odot (\mathrm{d}X \vec{b}) \right) \right) \\ \overset{TR5}{=} & tr \left( \left( \vec{a} \odot exp(X \vec{b}) \right)^T (\mathrm{d}X \vec{b}) \right) \\ \overset{TR4}{=} & tr \left( \vec{b} \left( \vec{a} \odot exp(X \vec{b}) \right)^T \mathrm{d}X \right) \end{aligned}
$$
This gives
$$
\frac{\partial f}{\partial X} = \left( \vec{b} \left( \vec{a} \odot exp(X \vec{b}) \right)^T \right)^T = \left( \vec{a} \odot exp(X \vec{b}) \right) \vec{b}^T
$$
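A finite-difference check of $\frac{\partial f}{\partial X} = \left( \vec{a} \odot exp(X\vec{b}) \right)\vec{b}^T$ follows. It is a sketch assuming NumPy; `num_grad` is a hypothetical helper, and the inputs are scaled down only to keep the exponentials moderate.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference estimate of df/dX, entry (i, j) = df/dx_ij."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
m, n = 3, 4
a = rng.standard_normal(m)
b = 0.5 * rng.standard_normal(n)
X = 0.5 * rng.standard_normal((m, n))

f = lambda Z: a @ np.exp(Z @ b)               # f = a^T exp(X b)
grad_num = num_grad(f, X)
grad_formula = np.outer(a * np.exp(X @ b), b)  # (a ⊙ exp(Xb)) b^T

print(np.allclose(grad_num, grad_formula, rtol=1e-5, atol=1e-6))
```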

[Example 3] Let $f = tr\left( Y^T M Y \right)$ with $Y = \sigma \left( WX \right)$. Find $\frac{\partial f}{\partial X}$. Here $W \in \mathbb{F}^{l \times m}, X \in \mathbb{F}^{m \times n}, Y \in \mathbb{F}^{l \times n}, M \in \mathbb{F}^{l \times l}$, $\sigma$ is an element-wise function, and $f$ is a scalar.

[Solution] First compute $\frac{\partial f}{\partial Y}$; by TM3,
$$
\frac{\partial f}{\partial Y} = \left( M + M^T \right)Y.
$$
This relates $\mathrm{d}f$ to $\mathrm{d}Y$: $\mathrm{d}f = tr\left( \left( \frac{\partial f}{\partial Y} \right)^T \mathrm{d}Y \right) = tr\left( Y^T \left( M + M^T \right) \mathrm{d}Y \right)$.

Next compute $\mathrm{d}Y$ with the element-wise function rule from the list of matrix differential rules, followed by D2 with $W$ constant:
$$
\begin{aligned} \mathrm{d}Y = & \sigma^{'}(W X) \odot \mathrm{d}(WX) \\ = & \sigma^{'}(W X) \odot \left( W \mathrm{d}X \right). \end{aligned}
$$
Combining the two gives
$$
\begin{aligned} \mathrm{d}f = & tr \left( Y^T \left( M + M^T \right) \left( \sigma^{'}(W X) \odot \left( W \mathrm{d}X \right) \right) \right) \\ \overset{TR5}{=} & tr \left( \left( \left( (M + M^T)Y \right) \odot \sigma^{'}(W X) \right)^T W \mathrm{d}X \right), \end{aligned}
$$
and therefore
$$
\frac{\partial f}{\partial X}= W^T \left( \left( (M + M^T)Y \right) \odot \sigma^{'}(W X) \right)
$$
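The composite gradient $W^T \left( \left( (M + M^T)Y \right) \odot \sigma^{'}(WX) \right)$ can be checked the same way. The sketch below assumes NumPy and, purely for illustration, takes $\sigma = \tanh$ (so $\sigma^{'} = 1 - \tanh^2$); the example itself does not fix a particular $\sigma$. `num_grad` is a hypothetical helper.

```python
import numpy as np

def num_grad(f, X, eps=1e-6):
    """Central-difference estimate of df/dX, entry (i, j) = df/dx_ij."""
    G = np.zeros_like(X)
    for idx in np.ndindex(*X.shape):
        E = np.zeros_like(X)
        E[idx] = eps
        G[idx] = (f(X + E) - f(X - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
l, m, n = 3, 4, 2
W = rng.standard_normal((l, m))
X = rng.standard_normal((m, n))
M = rng.standard_normal((l, l))

sigma = np.tanh                               # assumed element-wise function
dsigma = lambda Z: 1.0 - np.tanh(Z) ** 2      # its derivative

def f(X):
    Y = sigma(W @ X)
    return np.trace(Y.T @ M @ Y)

Y = sigma(W @ X)
grad_formula = W.T @ (((M + M.T) @ Y) * dsigma(W @ X))
grad_num = num_grad(f, X)

print(np.allclose(grad_num, grad_formula, atol=1e-6))
```

Read from right to left, the formula is a small backpropagation: the gradient with respect to $Y$ is multiplied element-wise by $\sigma^{'}$ and then pulled back through $W$.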

[Example 4] Let $l = \| X \vec{w} - \vec{y} \|^2$ with $\vec{y} \in \mathbb{F}^{m \times 1}, X \in \mathbb{F}^{m \times n}, \vec{w} \in \mathbb{F}^{n \times 1}$. Find the least-squares estimate of $\vec{w}$.

[Solution]
$$
\begin{aligned} l = & \| X \vec{w} - \vec{y} \|^2 \\ = & \left( X \vec{w} - \vec{y} \right)^T \left( X \vec{w} - \vec{y} \right) \\ = & \left( \vec{w}^T X^T - \vec{y}^T \right) \left( X \vec{w} - \vec{y} \right) \\ = & \vec{w}^T X^T X \vec{w} - \vec{w}^T X^T \vec{y} - \vec{y}^T X \vec{w} + \vec{y}^T \vec{y}. \end{aligned}
$$

The two middle terms are scalars and transposes of each other, so they sum to $-2\vec{y}^T X \vec{w}$. Setting the derivative to zero,
$$
\begin{aligned} \frac{\partial l}{\partial \vec{w}} = & \frac{\partial tr(l)}{\partial \vec{w}} \\ = & \frac{\partial }{\partial \vec{w}} tr \left( \vec{w}^T X^T X \vec{w} - \vec{w}^T X^T \vec{y} - \vec{y}^T X \vec{w} + \vec{y}^T \vec{y} \right) \\ = & \frac{\partial }{\partial \vec{w}} tr \left( \vec{w}^T X^T X \vec{w} \right) - \frac{\partial }{\partial \vec{w}} tr \left( 2\vec{y}^T X \vec{w} \right) \\ = & 2(X^T X) \vec{w} - 2X^T \vec{y} \\ = & 0. \end{aligned}
$$
If $X^T X$ is invertible, this gives
$$
\vec{w} = (X^T X)^{-1} X^T \vec{y}
$$
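The closed-form estimate can be compared against a library least-squares solver. This is a sketch assuming NumPy and synthetic data (`w_true` and the noise level are arbitrary); it solves the normal equations $X^T X \vec{w} = X^T \vec{y}$ with `np.linalg.solve` rather than forming the inverse explicitly.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 20, 3
X = rng.standard_normal((m, n))
w_true = np.array([1.0, -2.0, 0.5])          # arbitrary ground truth
y = X @ w_true + 0.1 * rng.standard_normal(m)

# Normal equations: (X^T X) w = X^T y
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine
w_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

# Gradient 2 X^T X w - 2 X^T y should vanish at the estimate
grad_at_w = 2 * X.T @ X @ w_normal - 2 * X.T @ y

print(np.allclose(w_normal, w_lstsq))
print(np.allclose(grad_at_w, 0.0, atol=1e-8))
```

For ill-conditioned $X$, a solver based on QR or SVD such as `np.linalg.lstsq` is generally more robust than the normal equations.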

【例5】 样本 x ⃗ 1 , ⋯   , x ⃗ N ∼ N ( μ ⃗ , Σ ) \vec{x}_1,\cdots,\vec{x}_N \thicksim \mathcal{N}\left( \vec{\mu}, \Sigma \right) x 1,,x NN(μ ,Σ)
求方差 Σ \Sigma Σ的极大似然估计。

【解】 对数似然函数为 l = l n ∣ Σ ∣ + 1 N ∑ i = 1 N ( x ⃗ i − x ⃗ ˉ ) T Σ − 1 ( x ⃗ i − x ⃗ ˉ ) . l = ln|\Sigma| + \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T\Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right). l=lnΣ+N1i=1N(x ix ˉ)TΣ1(x ix ˉ).

Therefore
$$
\begin{aligned}
\frac{\partial l}{\partial \Sigma}
= & \frac{\partial}{\partial \Sigma} \left( \ln|\Sigma| + \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \right) \\
\overset{DM2}{=} & \left( \Sigma^{-1} \right)^T + \frac{\partial}{\partial \Sigma} \mathrm{tr} \left( \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right)^T \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \right) \\
= & \left( \Sigma^{-1} \right)^T + \frac{1}{N}\sum_{i=1}^{N} \frac{\partial}{\partial \Sigma} \mathrm{tr} \left( \left( \vec{x}_i - \bar{\vec{x}} \right)^T \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \right) \\
= & \left( \Sigma^{-1} \right)^T + \frac{1}{N}\sum_{i=1}^{N} \frac{\partial}{\partial \Sigma} \mathrm{tr} \left( \Sigma^{-1} \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \right) \\
= & \left( \Sigma^{-1} \right)^T - \frac{1}{N}\sum_{i=1}^{N} \left( \left( \Sigma^{-1} \right)^T \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \left( \Sigma^{-1} \right)^T \right) \\
= & \left( \Sigma^{-1} \right)^T - \left( \Sigma^{-1} \right)^T \left( \frac{1}{N}\sum_{i=1}^{N} \left( \vec{x}_i - \bar{\vec{x}} \right) \left( \vec{x}_i - \bar{\vec{x}} \right)^T \right) \left( \Sigma^{-1} \right)^T \\
= & \left( \Sigma^{-1} \right)^T - \left( \Sigma^{-1} \right)^T S^2 \left( \Sigma^{-1} \right)^T \\
= & \left( \Sigma^{-1} - \Sigma^{-1} S^2 \Sigma^{-1} \right)^T \\
= & 0.
\end{aligned}
$$
Multiplying $\Sigma^{-1} = \Sigma^{-1} S^2 \Sigma^{-1}$ by $\Sigma$ on both the left and the right gives the covariance estimate $\Sigma = S^2$.
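The closed form above can be checked numerically. The following is a minimal sketch, assuming NumPy and synthetic Gaussian data; the helper names `loss`, `grad` and `numeric_grad` are hypothetical and chosen only for this illustration. It compares the closed-form gradient with central finite differences (treating every entry of $\Sigma$ as an independent variable, as the derivation does) and confirms that the gradient vanishes at $\Sigma = S^2$.

```python
import numpy as np

def loss(Sigma, X):
    """l(Sigma) = ln|Sigma| + (1/N) * sum_i (x_i - xbar)^T Sigma^{-1} (x_i - xbar)."""
    D = X - X.mean(axis=0)                       # row i holds (x_i - xbar)^T
    quad = np.einsum("ij,jk,ik->", D, np.linalg.inv(Sigma), D) / len(X)
    return np.log(np.linalg.det(Sigma)) + quad

def grad(Sigma, X):
    """Closed form from the derivation: (Sigma^{-1} - Sigma^{-1} S^2 Sigma^{-1})^T."""
    D = X - X.mean(axis=0)
    S2 = D.T @ D / len(X)                        # sample covariance S^2
    P = np.linalg.inv(Sigma)
    return (P - P @ S2 @ P).T

def numeric_grad(f, M, eps=1e-6):
    """Central finite differences over every entry of M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                     # assumed data: N = 50 samples, n = 3
D = X - X.mean(axis=0)
S2 = D.T @ D / len(X)
Sigma = S2 + 0.5 * np.eye(3)                     # an arbitrary positive-definite test point

# Gap between the closed form and finite differences at the test point.
err_formula = np.max(np.abs(grad(Sigma, X) - numeric_grad(lambda M: loss(M, X), Sigma)))
# Size of the closed-form gradient at Sigma = S^2 (should be round-off only).
err_stationary = np.max(np.abs(grad(S2, X)))
print(err_formula, err_stationary)
```

Both printed numbers should be close to zero; the first is limited by finite-difference accuracy, the second only by floating-point round-off.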

【Example 6】 Let $l = - \vec{y}^T \log \mathrm{softmax}(W \vec{x})$, with $\vec{y} \in \mathbb{F}^{m \times 1}$, $W \in \mathbb{F}^{m \times n}$, $\vec{x} \in \mathbb{F}^{n \times 1}$, where exactly one entry of $\vec{y}$ is $1$ and all others are $0$. Find $\frac{\partial l}{\partial W}$.

【Solution】 First, for $\vec{u} \in \mathbb{F}^{n \times 1}$ and $c \in \mathbb{F}^{1}$, with $\log$ applied elementwise,
$$
\log\left(\frac{\vec{u}}{c}\right) = \log(\vec{u}) - \vec{1}\log(c).
$$
Therefore, using $\vec{y}^T \vec{1} = 1$ in the last step (exactly one entry of $\vec{y}$ is $1$),
$$
\begin{aligned}
l = & - \vec{y}^T \log \mathrm{softmax}(W \vec{x}) \\
= & - \vec{y}^T \log \left( \frac{\exp(W \vec{x})}{\vec{1}^T \exp(W \vec{x})} \right) \\
= & - \vec{y}^T \left( W \vec{x} - \vec{1} \log \left( \vec{1}^T \exp(W \vec{x}) \right) \right) \\
= & - \vec{y}^T W \vec{x} + \log \left( \vec{1}^T \exp(W \vec{x}) \right).
\end{aligned}
$$
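As a quick sanity check of this simplification, the sketch below evaluates both forms of $l$ on random inputs. It assumes NumPy; the sizes, the random seed and the position of the nonzero entry of $\vec{y}$ are arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 4, 3                                  # assumed sizes
W = rng.normal(size=(m, n))
x = rng.normal(size=n)
y = np.zeros(m)
y[2] = 1.0                                   # one-hot label (index chosen arbitrarily)

z = W @ x
softmax = np.exp(z) / np.sum(np.exp(z))

l_direct = -y @ np.log(softmax)                        # -y^T log softmax(Wx)
l_simplified = -y @ z + np.log(np.sum(np.exp(z)))      # -y^T W x + log(1^T exp(Wx))
gap = abs(l_direct - l_simplified)
print(gap)
```

The two values agree up to round-off, which relies on $\vec{y}$ summing to $1$.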
For the first part,
$$
\frac{\partial }{\partial W} \left( - \vec{y}^T W \vec{x} \right) = \frac{\partial }{\partial W} \mathrm{tr} \left( - \vec{x} \vec{y}^T W \right) = - \vec{y} \vec{x}^T.
$$
For the second part, take the differential and use $\vec{1}^T (\vec{u} \odot \vec{v}) = (\vec{1} \odot \vec{u})^T \vec{v} = \vec{u}^T \vec{v}$:
$$
\begin{aligned}
\mathrm{d} \left( \log \left( \vec{1}^T \exp(W \vec{x}) \right) \right)
= & \mathrm{d}\, \mathrm{tr} \left( \log \left( \vec{1}^T \exp(W \vec{x}) \right) \right) \\
= & \mathrm{tr} \left( \frac{\vec{1}^T \left( \exp(W\vec{x}) \odot \left( \mathrm{d}W\, \vec{x} \right) \right)}{\vec{1}^T \exp(W\vec{x})} \right) \\
= & \mathrm{tr} \left( \frac{\left( \vec{1} \odot \exp(W\vec{x}) \right)^T \left( \mathrm{d}W\, \vec{x} \right)}{\vec{1}^T \exp(W\vec{x})} \right) \\
= & \mathrm{tr} \left( \frac{\vec{x} \exp(W\vec{x})^T\, \mathrm{d}W}{\vec{1}^T \exp(W\vec{x})} \right),
\end{aligned}
$$
so the derivative of the second part is $\frac{\exp(W\vec{x})\, \vec{x}^T}{\vec{1}^T \exp(W\vec{x})} = \mathrm{softmax}(W\vec{x})\, \vec{x}^T$. Hence
$$
\frac{\partial l}{\partial W} = - \vec{y} \vec{x}^T + \mathrm{softmax}(W \vec{x})\vec{x}^T = \left( \mathrm{softmax}(W \vec{x}) - \vec{y} \right) \vec{x}^T.
$$
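The result $\left( \mathrm{softmax}(W\vec{x}) - \vec{y} \right)\vec{x}^T$ can be verified against finite differences. This is a minimal sketch assuming NumPy and random inputs; `softmax`, `loss`, `grad_W` and `numeric_grad` are hypothetical helper names, and the max-subtraction inside `softmax` is a standard numerical-stability trick that does not change its value.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())                  # shift for numerical stability
    return e / e.sum()

def loss(W, x, y):
    """l = -y^T log softmax(W x)."""
    return -y @ np.log(softmax(W @ x))

def grad_W(W, x, y):
    """Closed form from Example 6: (softmax(W x) - y) x^T."""
    return np.outer(softmax(W @ x) - y, x)

def numeric_grad(f, M, eps=1e-6):
    """Central finite differences over every entry of M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

rng = np.random.default_rng(2)
m, n = 4, 3                                  # assumed sizes
W = rng.normal(size=(m, n))
x = rng.normal(size=n)
y = np.zeros(m)
y[1] = 1.0                                   # one-hot label

analytic = grad_W(W, x, y)
numeric = numeric_grad(lambda M: loss(M, x, y), W)
max_err = np.max(np.abs(analytic - numeric))
print(max_err)
```

The gradient has the same $m \times n$ shape as $W$, consistent with the layout used throughout these notes.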

【Example 7】 Given samples $(\vec{x}_1, \vec{y}_1), (\vec{x}_2, \vec{y}_2), \cdots, (\vec{x}_N, \vec{y}_N)$, where $\vec{y}_i \in \mathbb{F}^{m \times 1}$ has exactly one entry equal to $1$ and all others $0$, and $\vec{x}_i \in \mathbb{F}^{n \times 1}$.
Let $W_1 \in \mathbb{F}^{p \times n}$, $W_2 \in \mathbb{F}^{m \times p}$, $\vec{b}_1 \in \mathbb{F}^{p \times 1}$, $\vec{b}_2 \in \mathbb{F}^{m \times 1}$, and define
$$
\vec{a}_{1,i} = W_1 \vec{x}_i + \vec{b}_1, \qquad \vec{h}_{1,i} = \sigma (\vec{a}_{1,i}), \qquad \vec{a}_{2,i} = W_2 \vec{h}_{1,i} + \vec{b}_2,
$$
where $\sigma$ is applied elementwise. The loss function is $l = - \sum_{i=1}^{N} \vec{y}_i^T \log \mathrm{softmax}(\vec{a}_{2,i})$. Find the derivatives of $l$ with respect to $W_2$, $\vec{b}_2$, $W_1$ and $\vec{b}_1$.

【Solution】 First compute the derivative of the loss with respect to the layer-2 output. By the same calculation as in Example 6,
$$
\frac{ \partial l }{ \partial \vec{a}_{2,i} } = \mathrm{softmax}(\vec{a}_{2,i}) - \vec{y}_i.
$$
Next compute the derivatives with respect to the layer-1 output and the weights connecting layers 1 and 2. No chain rule for matrix derivatives has been defined in these notes, so we use the relation between derivatives and differentials instead.
$$
\begin{aligned}
\mathrm{d} l = & \mathrm{tr}\left( \sum_{i=1}^{N} \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \vec{a}_{2,i} \right) \\
= & \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \left( W_2 \vec{h}_{1,i} + \vec{b}_2 \right) \right) \\
= & \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \left( \mathrm{d} W_2 \right) \vec{h}_{1,i} \right) + \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T W_2\, \mathrm{d} \vec{h}_{1,i} \right) + \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \vec{b}_2 \right) \\
= & \sum_{i=1}^{N} \mathrm{tr}\left( \vec{h}_{1,i} \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} W_2 \right) + \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T W_2\, \mathrm{d} \vec{h}_{1,i} \right) + \sum_{i=1}^{N} \mathrm{tr}\left( \left( \frac{ \partial l }{ \partial \vec{a}_{2,i} } \right)^T \mathrm{d} \vec{b}_2 \right).
\end{aligned}
$$
Reading off the coefficient of each differential gives
$$
\frac{\partial l}{\partial W_2} = \sum_{i=1}^{N} \frac{ \partial l }{ \partial \vec{a}_{2,i} } \vec{h}_{1,i}^T, \qquad
\frac{\partial l}{\partial \vec{b}_2} = \sum_{i=1}^{N} \frac{ \partial l }{ \partial \vec{a}_{2,i} }, \qquad
\frac{\partial l}{\partial \vec{h}_{1,i}} = W_2^T \frac{ \partial l }{ \partial \vec{a}_{2,i} }.
$$
Next compute the derivative of the loss with respect to the layer-1 pre-activation:
$$
\frac{\partial l}{\partial \vec{a}_{1,i}} = \frac{\partial l}{\partial \vec{h}_{1,i}} \odot \sigma'(\vec{a}_{1,i}).
$$
Finally compute the derivatives with respect to the weights and bias connecting the input layer to layer 1.
$$
\begin{aligned}
\mathrm{d} l = & \mathrm{tr} \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} \vec{a}_{1,i} \right) \\
= & \mathrm{tr} \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} \left( W_1 \vec{x}_i + \vec{b}_1 \right) \right) \\
= & \mathrm{tr} \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \left( \mathrm{d} W_1 \right) \vec{x}_i \right) + \mathrm{tr} \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} \vec{b}_1 \right) \\
= & \mathrm{tr} \left( \sum_{i=1}^{N} \vec{x}_i \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} W_1 \right) + \mathrm{tr} \left( \sum_{i=1}^{N} \left( \frac{\partial l}{ \partial \vec{a}_{1,i}} \right)^T \mathrm{d} \vec{b}_1 \right),
\end{aligned}
$$
so that
$$
\frac{\partial l}{\partial W_1} = \sum_{i=1}^{N} \frac{\partial l}{ \partial \vec{a}_{1,i}} \vec{x}_i^T, \qquad
\frac{\partial l}{\partial \vec{b}_1} = \sum_{i=1}^{N} \frac{\partial l}{ \partial \vec{a}_{1,i}}.
$$
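The per-sample formulas of Example 7 translate directly into a backward pass. The sketch below is an illustration under stated assumptions, not a reference implementation: it uses NumPy, takes $\sigma$ to be the logistic sigmoid (the notes leave $\sigma$ unspecified), uses small random data, and the names `loss`, `backprop` and `numeric_grad` are hypothetical. Each analytic gradient is compared with central finite differences.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))          # assumed choice of sigma

def softmax(z):
    e = np.exp(z - z.max())                  # shift for numerical stability
    return e / e.sum()

def loss(params, xs, ys):
    """l = -sum_i y_i^T log softmax(W2 sigma(W1 x_i + b1) + b2)."""
    W1, b1, W2, b2 = params
    total = 0.0
    for x, y in zip(xs, ys):
        h1 = sigmoid(W1 @ x + b1)
        total -= y @ np.log(softmax(W2 @ h1 + b2))
    return total

def backprop(params, xs, ys):
    """Accumulate the per-sample gradients derived in Example 7."""
    W1, b1, W2, b2 = params
    gW1, gb1, gW2, gb2 = (np.zeros_like(p) for p in params)
    for x, y in zip(xs, ys):
        a1 = W1 @ x + b1
        h1 = sigmoid(a1)
        a2 = W2 @ h1 + b2
        da2 = softmax(a2) - y                # dl/da_{2,i}
        gW2 += np.outer(da2, h1)             # dl/dW2 term: da2 h1^T
        gb2 += da2                           # dl/db2 term
        dh1 = W2.T @ da2                     # dl/dh_{1,i}
        da1 = dh1 * h1 * (1.0 - h1)          # elementwise product with sigma'(a1)
        gW1 += np.outer(da1, x)              # dl/dW1 term: da1 x^T
        gb1 += da1                           # dl/db1 term
    return [gW1, gb1, gW2, gb2]

def numeric_grad(params, k, xs, ys, eps=1e-6):
    """Central finite differences with respect to params[k]."""
    G = np.zeros_like(params[k])
    for idx in np.ndindex(*params[k].shape):
        plus = [p.copy() for p in params]
        minus = [p.copy() for p in params]
        plus[k][idx] += eps
        minus[k][idx] -= eps
        G[idx] = (loss(plus, xs, ys) - loss(minus, xs, ys)) / (2 * eps)
    return G

rng = np.random.default_rng(3)
N, n, p, m = 5, 3, 4, 2                      # assumed sizes
xs = rng.normal(size=(N, n))
ys = np.eye(m)[rng.integers(0, m, size=N)]   # one-hot labels, one per row
params = [rng.normal(size=(p, n)), rng.normal(size=p),
          rng.normal(size=(m, p)), rng.normal(size=m)]

grads = backprop(params, xs, ys)
errors = [np.max(np.abs(g - numeric_grad(params, k, xs, ys)))
          for k, g in enumerate(grads)]
print(errors)
```

The four errors correspond to $W_1$, $\vec{b}_1$, $W_2$ and $\vec{b}_2$ in that order. Swapping in a different activation only changes the `sigmoid` function and the `h1 * (1.0 - h1)` factor.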
【Example 8】 Rewrite the previous example in matrix form. Let $X = [\vec{x}_1,\cdots,\vec{x}_N]$ and
$$
\begin{aligned}
A_1 = & [\vec{a}_{1,1},\cdots, \vec{a}_{1,N}] = W_1X+\vec{b}_1 \vec{1}^T, \\
H_1 = & [\vec{h}_{1,1},\cdots, \vec{h}_{1,N}] = \sigma(A_1), \\
A_2 = & [\vec{a}_{2,1},\cdots, \vec{a}_{2,N}] = W_2H_1+\vec{b}_2 \vec{1}^T.
\end{aligned}
$$

【Solution】 First compute the derivative of the loss with respect to the layer-2 output:
$$
\frac{\partial l}{\partial A_2} = [\mathrm{softmax}(\vec{a}_{2,1}) - \vec{y}_1, \cdots, \mathrm{softmax}(\vec{a}_{2,N}) - \vec{y}_N].
$$
Next compute the derivatives with respect to the layer-1 output and the weights connecting layers 1 and 2. As before, no chain rule for matrix derivatives has been defined, so we use the relation between derivatives and differentials.
$$
\begin{aligned}
\mathrm{d} l = & \mathrm{tr} \left( \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d} A_2 \right) \\
= & \mathrm{tr} \left( \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d} \left( W_2H_1+\vec{b}_2 \vec{1}^T \right) \right) \\
= & \mathrm{tr} \left( H_1 \left( \frac{\partial l}{\partial A_2} \right)^T \mathrm{d} W_2 \right) + \mathrm{tr} \left( \left( \frac{\partial l}{\partial A_2} \right)^T W_2\, \mathrm{d} H_1 \right) + \mathrm{tr} \left( \left( \frac{\partial l}{\partial A_2} \vec{1} \right)^T \mathrm{d} \vec{b}_2 \right).
\end{aligned}
$$
Hence
$$
\frac{\partial l}{\partial W_2} = \frac{\partial l}{\partial A_2} H_1^T, \qquad
\frac{\partial l}{\partial H_1} = W_2^T \frac{\partial l}{\partial A_2}, \qquad
\frac{\partial l}{\partial \vec{b}_2} = \frac{\partial l}{\partial A_2} \vec{1}.
$$
Next compute the derivative of the loss with respect to the layer-1 pre-activation:
$$
\frac{\partial l}{\partial A_1} = \frac{\partial l}{\partial H_1} \odot \sigma'(A_1).
$$
Finally compute the derivatives with respect to the weights and bias connecting the input layer to layer 1.
$$
\begin{aligned}
\mathrm{d} l = & \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T \mathrm{d} A_1 \right) \\
= & \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T \mathrm{d} \left( W_1X+\vec{b}_1 \vec{1}^T \right) \right) \\
= & \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T (\mathrm{d} W_1) X \right) + \mathrm{tr} \left( \left( \frac{\partial l}{\partial A_1} \right)^T W_1 (\mathrm{d} X) \right) + \mathrm{tr}\left( \left( \frac{\partial l}{\partial A_1} \right)^T ( \mathrm{d} \vec{b}_1 ) \vec{1}^T \right) \\
= & \mathrm{tr}\left( X \left( \frac{\partial l}{\partial A_1} \right)^T (\mathrm{d} W_1) \right) + \mathrm{tr} \left( \left( \frac{\partial l}{\partial A_1} \right)^T W_1 (\mathrm{d} X) \right) + \mathrm{tr}\left( \vec{1}^T \left( \frac{\partial l}{\partial A_1} \right)^T ( \mathrm{d} \vec{b}_1 ) \right).
\end{aligned}
$$
Hence
$$
\frac{\partial l}{\partial W_1} = \frac{\partial l}{\partial A_1} X^T, \qquad
\frac{\partial l}{\partial \vec{b}_1} = \frac{\partial l}{\partial A_1} \vec{1}.
$$
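The matrix-form results of Example 8 can be sketched as a batched backward pass. As before, this is an illustration under assumptions: NumPy, the logistic sigmoid as $\sigma$, random data stored column-wise as in $X = [\vec{x}_1,\cdots,\vec{x}_N]$, and hypothetical helper names. The analytic gradients are compared with central finite differences.

```python
import numpy as np

def sigmoid(A):
    return 1.0 / (1.0 + np.exp(-A))                  # assumed choice of sigma

def softmax_cols(A):
    E = np.exp(A - A.max(axis=0, keepdims=True))     # column-wise, shifted for stability
    return E / E.sum(axis=0, keepdims=True)

def loss(W1, b1, W2, b2, X, Y):
    ones = np.ones(X.shape[1])
    H1 = sigmoid(W1 @ X + np.outer(b1, ones))        # A1 = W1 X + b1 1^T
    A2 = W2 @ H1 + np.outer(b2, ones)                # A2 = W2 H1 + b2 1^T
    return -np.sum(Y * np.log(softmax_cols(A2)))

def backprop(W1, b1, W2, b2, X, Y):
    ones = np.ones(X.shape[1])
    A1 = W1 @ X + np.outer(b1, ones)
    H1 = sigmoid(A1)
    A2 = W2 @ H1 + np.outer(b2, ones)
    dA2 = softmax_cols(A2) - Y                       # dl/dA2
    dW2 = dA2 @ H1.T                                 # dl/dW2 = dl/dA2 H1^T
    db2 = dA2 @ ones                                 # dl/db2 = dl/dA2 1
    dH1 = W2.T @ dA2                                 # dl/dH1 = W2^T dl/dA2
    dA1 = dH1 * H1 * (1.0 - H1)                      # dl/dA1 = dl/dH1 (.) sigma'(A1)
    dW1 = dA1 @ X.T                                  # dl/dW1 = dl/dA1 X^T
    db1 = dA1 @ ones                                 # dl/db1 = dl/dA1 1
    return dW1, db1, dW2, db2

def numeric_grad(f, M, eps=1e-6):
    """Central finite differences over every entry of M."""
    G = np.zeros_like(M)
    for idx in np.ndindex(*M.shape):
        E = np.zeros_like(M)
        E[idx] = eps
        G[idx] = (f(M + E) - f(M - E)) / (2 * eps)
    return G

rng = np.random.default_rng(4)
N, n, p, m = 6, 3, 4, 2                              # assumed sizes
X = rng.normal(size=(n, N))                          # samples are columns
Y = np.eye(m)[:, rng.integers(0, m, size=N)]         # one-hot columns
W1, b1 = rng.normal(size=(p, n)), rng.normal(size=p)
W2, b2 = rng.normal(size=(m, p)), rng.normal(size=m)

dW1, db1, dW2, db2 = backprop(W1, b1, W2, b2, X, Y)
err_W1 = np.max(np.abs(dW1 - numeric_grad(lambda M: loss(M, b1, W2, b2, X, Y), W1)))
err_b1 = np.max(np.abs(db1 - numeric_grad(lambda v: loss(W1, v, W2, b2, X, Y), b1)))
err_W2 = np.max(np.abs(dW2 - numeric_grad(lambda M: loss(W1, b1, M, b2, X, Y), W2)))
err_b2 = np.max(np.abs(db2 - numeric_grad(lambda v: loss(W1, b1, W2, v, X, Y), b2)))
print(err_W1, err_b1, err_W2, err_b2)
```

Compared with the per-sample form, the sums over $i$ become matrix products, and multiplying by $\vec{1}$ sums over the sample columns.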

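References

[1] KHENG L W. Matrix differentiation, CS5240 Theoretical Foundations in Multimedia.
[2] Zhang Xianda (张贤达). Matrix Analysis and Applications (矩阵分析与应用) [M]. Beijing: Tsinghua University Press, 2004: 255-285.
[3] 长躯鬼侠. 矩阵求导术(上) (Matrix Differentiation Techniques, Part 1).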
