Matrix Derivatives, Part 5: Common Formulas in Machine Learning (I)

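With the definition of the matrix derivative in hand, we can use it to derive several formulas that come up often in machine learning. Throughout, $\boldsymbol{x}$ denotes an $n$-dimensional column vector:

$$\boldsymbol{x}=\left[ \begin{matrix} x_1& x_2& \cdots& x_n\\ \end{matrix} \right] ^T$$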

Result 1

$$\frac{\partial a}{\partial \boldsymbol{x}}=\boldsymbol{0}$$

where $a$ is a scalar constant and $\boldsymbol{0}$ is the $n$-dimensional zero vector.

[Proof]

$$\frac{\partial a}{\partial \boldsymbol{x}}=\left[ \begin{matrix} \frac{\partial a}{\partial x_1}& \frac{\partial a}{\partial x_2}& \cdots& \frac{\partial a}{\partial x_n}\\ \end{matrix} \right] ^T=\left[ \begin{matrix} 0& 0& \cdots& 0\\ \end{matrix} \right] ^T=\boldsymbol{0}$$
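As a quick numerical sanity check, the sketch below estimates each partial derivative with a central difference $\left(f(\boldsymbol{x}+h\boldsymbol{e}_i)-f(\boldsymbol{x}-h\boldsymbol{e}_i)\right)/(2h)$ and confirms that a constant function has a zero gradient. It is a minimal pure-Python illustration only; the helper name `numerical_gradient`, the step size `h`, and the constant value 3.5 are hypothetical choices that are not part of the derivation above.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_n] at the point x (a list of floats)."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h   # step forward along coordinate i
        x_minus[i] -= h  # step backward along coordinate i
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def constant_function(x):
    # f(x) = a with a hypothetical constant a = 3.5; the value never depends on x
    return 3.5


x = [1.0, -2.0, 0.5]
grad = numerical_gradient(constant_function, x)
print(grad)  # [0.0, 0.0, 0.0]
```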

Result 2

$$\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{A}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\boldsymbol{A}$$

where $\boldsymbol{A}$ is a constant matrix.

[Proof] Write $\boldsymbol{A}$ in terms of its rows:

$$\boldsymbol{A}=\left[ \begin{matrix} {\boldsymbol{\alpha }_1}^T& {\boldsymbol{\alpha }_2}^T& \cdots& {\boldsymbol{\alpha }_n}^T\\ \end{matrix} \right] ^T$$

where:

$$\boldsymbol{\alpha }_i=\left[ \begin{matrix} a_{i1}& a_{i2}& \cdots& a_{in}\\ \end{matrix} \right] ^T$$

Expanding the product as a combination of the rows of $\boldsymbol{A}$ then gives:

$$\begin{aligned} \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{A} \right)}{\partial \boldsymbol{x}}&=\frac{\partial \left( \boldsymbol{A}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}} \\ &=\frac{\partial \left( x_1{\boldsymbol{\alpha }_1}^T+x_2{\boldsymbol{\alpha }_2}^T+\cdots +x_n{\boldsymbol{\alpha }_n}^T \right)}{\partial \boldsymbol{x}} \\ &=\left[ \begin{array}{c} \frac{\partial \left( x_1{\boldsymbol{\alpha }_1}^T+x_2{\boldsymbol{\alpha }_2}^T+\cdots +x_n{\boldsymbol{\alpha }_n}^T \right)}{\partial x_1}\\ \frac{\partial \left( x_1{\boldsymbol{\alpha }_1}^T+x_2{\boldsymbol{\alpha }_2}^T+\cdots +x_n{\boldsymbol{\alpha }_n}^T \right)}{\partial x_2}\\ \vdots\\ \frac{\partial \left( x_1{\boldsymbol{\alpha }_1}^T+x_2{\boldsymbol{\alpha }_2}^T+\cdots +x_n{\boldsymbol{\alpha }_n}^T \right)}{\partial x_n}\\ \end{array} \right] \\ &=\left[ \begin{array}{c} {\boldsymbol{\alpha }_1}^T\\ {\boldsymbol{\alpha }_2}^T\\ \vdots\\ {\boldsymbol{\alpha }_n}^T\\ \end{array} \right] =\boldsymbol{A} \end{aligned}$$
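The same finite-difference idea extends to a vector-valued function: perturbing $x_i$ and recording how every entry of $\boldsymbol{A}^T\boldsymbol{x}$ changes yields one row of the derivative, which matches the row-by-row layout used in the proof. The sketch below assumes a hypothetical non-symmetric $3\times 3$ matrix `A` (non-symmetric so that $\boldsymbol{A}$ and $\boldsymbol{A}^T$ can be told apart) and checks that the stacked rows reproduce $\boldsymbol{A}$. The helper names are illustrative only.

```python
def derivative_rows(f, x, h=1e-6):
    """Row i holds the central-difference derivative of the vector f(x) with respect to x_i."""
    rows = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        f_plus, f_minus = f(x_plus), f(x_minus)
        rows.append([(p - q) / (2 * h) for p, q in zip(f_plus, f_minus)])
    return rows


# Hypothetical non-symmetric constant matrix A (n = 3)
A = [[1.0, 2.0, 0.0],
     [3.0, -1.0, 4.0],
     [0.5, 4.0, 2.0]]


def a_transpose_x(x):
    # j-th entry of A^T x is sum_i a_ij * x_i (the same numbers as the row vector x^T A)
    n = len(A)
    return [sum(A[i][j] * x[i] for i in range(n)) for j in range(len(A[0]))]


x = [0.3, -1.2, 2.0]
D = derivative_rows(a_transpose_x, x)
max_err = max(abs(D[i][j] - A[i][j]) for i in range(3) for j in range(3))
print(max_err < 1e-6)  # True
```

Had the derivative come out as $\boldsymbol{A}^T$ instead, the off-diagonal entries (for example `A[0][1] = 2.0` versus `A[1][0] = 3.0`) would make the check fail, which is why a non-symmetric matrix is used.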

Result 3

$$\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}=2\boldsymbol{x}$$

[Proof] Let

$$\boldsymbol{x}=\left[ x_1,x_2,\cdots ,x_n \right] ^T$$

and

$$f\left( \boldsymbol{x} \right) =\boldsymbol{x}^T\boldsymbol{x}={x_1}^2+{x_2}^2+\cdots +{x_n}^2$$

Then:

$$\frac{\partial f}{\partial \boldsymbol{x}}=\left[ \begin{array}{c} \frac{\partial f}{\partial x_1}\\ \frac{\partial f}{\partial x_2}\\ \vdots\\ \frac{\partial f}{\partial x_n}\\ \end{array} \right] =\left[ \begin{array}{c} 2x_1\\ 2x_2\\ \vdots\\ 2x_n\\ \end{array} \right] =2\boldsymbol{x}$$

That is:

$$\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}=2\boldsymbol{x}$$
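A numerical spot check of this result, as a minimal sketch: the central-difference gradient of $f(\boldsymbol{x})=\boldsymbol{x}^T\boldsymbol{x}$ is compared against $2\boldsymbol{x}$ at one arbitrary point. The helper `numerical_gradient` and the test point are hypothetical choices for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_n] at the point x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def squared_norm(x):
    # f(x) = x^T x = x_1^2 + x_2^2 + ... + x_n^2
    return sum(xi * xi for xi in x)


x = [1.5, -2.0, 0.25, 3.0]
grad = numerical_gradient(squared_norm, x)
expected = [2 * xi for xi in x]  # Result 3: the gradient is 2x
print(all(abs(g - e) < 1e-6 for g, e in zip(grad, expected)))  # True
```

For a quadratic, the central difference $\left((x_i+h)^2-(x_i-h)^2\right)/(2h)=2x_i$ is exact apart from floating-point rounding, so a tight tolerance is reasonable here.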

Result 4

$$\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{Ax} \right)}{\partial \boldsymbol{x}}=\boldsymbol{Ax}+\boldsymbol{A}^T\boldsymbol{x}$$

where $\boldsymbol{A}=\left( a_{ij} \right)$ is an $n\times n$ constant matrix.

[Proof] Expanding the quadratic form:

$$\begin{aligned} \boldsymbol{x}^T\boldsymbol{Ax}&=\left[ \begin{matrix} x_1& x_2& \cdots& x_n\\ \end{matrix} \right] \cdot \left[ \begin{matrix} a_{11}& a_{12}& \cdots& a_{1n}\\ a_{21}& a_{22}& \cdots& a_{2n}\\ \vdots& \vdots& \ddots& \vdots\\ a_{n1}& a_{n2}& \cdots& a_{nn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n\\ \end{array} \right] \\ &=\left[ \begin{matrix} x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1}& x_1a_{12}+x_2a_{22}+\cdots +x_na_{n2}& \cdots& x_1a_{1n}+x_2a_{2n}+\cdots +x_na_{nn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n\\ \end{array} \right] \\ &=x_1\left( x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1} \right) +x_2\left( x_1a_{12}+x_2a_{22}+\cdots +x_na_{n2} \right) +\cdots +x_n\left( x_1a_{1n}+x_2a_{2n}+\cdots +x_na_{nn} \right) \end{aligned}$$

Let

$$f\left( \boldsymbol{x} \right) =\boldsymbol{x}^T\boldsymbol{Ax}$$

The variable $x_1$ appears both as the outer factor of the first parenthesized term and inside every parenthesized term (as $x_1a_{1k}$, multiplied by $x_k$), so:

$$\frac{\partial f\left( \boldsymbol{x} \right)}{\partial x_1}=\left( x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1} \right) +\left( x_1a_{11}+x_2a_{12}+\cdots +x_na_{1n} \right)$$

Computing every component the same way gives:

$$\begin{aligned} \frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}&=\left[ \begin{array}{c} \frac{\partial f\left( \boldsymbol{x} \right)}{\partial x_1}\\ \frac{\partial f\left( \boldsymbol{x} \right)}{\partial x_2}\\ \vdots\\ \frac{\partial f\left( \boldsymbol{x} \right)}{\partial x_n}\\ \end{array} \right] =\left[ \begin{array}{c} \left( x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1} \right) +\left( x_1a_{11}+x_2a_{12}+\cdots +x_na_{1n} \right)\\ \left( x_1a_{12}+x_2a_{22}+\cdots +x_na_{n2} \right) +\left( x_1a_{21}+x_2a_{22}+\cdots +x_na_{2n} \right)\\ \vdots\\ \left( x_1a_{1n}+x_2a_{2n}+\cdots +x_na_{nn} \right) +\left( x_1a_{n1}+x_2a_{n2}+\cdots +x_na_{nn} \right)\\ \end{array} \right] \\ &=\left[ \begin{array}{c} x_1a_{11}+x_2a_{21}+\cdots +x_na_{n1}\\ x_1a_{12}+x_2a_{22}+\cdots +x_na_{n2}\\ \vdots\\ x_1a_{1n}+x_2a_{2n}+\cdots +x_na_{nn}\\ \end{array} \right] +\left[ \begin{array}{c} x_1a_{11}+x_2a_{12}+\cdots +x_na_{1n}\\ x_1a_{21}+x_2a_{22}+\cdots +x_na_{2n}\\ \vdots\\ x_1a_{n1}+x_2a_{n2}+\cdots +x_na_{nn}\\ \end{array} \right] \\ &=\left[ \begin{matrix} a_{11}& a_{21}& \cdots& a_{n1}\\ a_{12}& a_{22}& \cdots& a_{n2}\\ \vdots& \vdots& \ddots& \vdots\\ a_{1n}& a_{2n}& \cdots& a_{nn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n\\ \end{array} \right] +\left[ \begin{matrix} a_{11}& a_{12}& \cdots& a_{1n}\\ a_{21}& a_{22}& \cdots& a_{2n}\\ \vdots& \vdots& \ddots& \vdots\\ a_{n1}& a_{n2}& \cdots& a_{nn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} x_1\\ x_2\\ \vdots\\ x_n\\ \end{array} \right] \\ &=\boldsymbol{A}^T\boldsymbol{x}+\boldsymbol{Ax} \end{aligned}$$
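The sketch below checks Result 4 numerically and also shows why the $\boldsymbol{A}^T\boldsymbol{x}$ term matters: for a non-symmetric matrix the gradient is $\boldsymbol{Ax}+\boldsymbol{A}^T\boldsymbol{x}$, not $2\boldsymbol{Ax}$. (When $\boldsymbol{A}$ is symmetric the two coincide, and Result 3 is the special case $\boldsymbol{A}=\boldsymbol{I}$.) The matrix `A`, the test point, and the helper names are hypothetical, chosen only for illustration.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_n] at the point x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def mat_vec(M, v):
    # (M v)_i = sum_j m_ij * v_j
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def transpose(M):
    return [list(col) for col in zip(*M)]


# Hypothetical non-symmetric constant matrix A
A = [[2.0, 1.0, 0.0],
     [-3.0, 1.0, 5.0],
     [0.5, 0.0, -1.0]]


def quadratic_form(x):
    # f(x) = x^T A x
    return sum(xi * yi for xi, yi in zip(x, mat_vec(A, x)))


x = [1.0, -0.5, 2.0]
grad = numerical_gradient(quadratic_form, x)
expected = [p + q for p, q in zip(mat_vec(A, x), mat_vec(transpose(A), x))]  # Ax + A^T x
naive = [2 * v for v in mat_vec(A, x)]  # 2Ax, valid only when A is symmetric

print(all(abs(g - e) < 1e-6 for g, e in zip(grad, expected)))  # True
print(all(abs(g - e) < 1e-6 for g, e in zip(grad, naive)))     # False for this A
```

Here $\boldsymbol{Ax}=[1.5,\ 6.5,\ -1.5]^T$ and $\boldsymbol{A}^T\boldsymbol{x}=[4.5,\ 0.5,\ -4.5]^T$, so the gradient is $[6,\ 7,\ -6]^T$, while $2\boldsymbol{Ax}=[3,\ 13,\ -3]^T$.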

Result 5

$$\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{a} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\boldsymbol{a}$$

where $\boldsymbol{a}$ is a constant vector:

$$\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_n\\ \end{matrix} \right] ^T$$

[Proof]

$$\begin{aligned} \frac{\partial \left( \boldsymbol{x}^T\boldsymbol{a} \right)}{\partial \boldsymbol{x}}&=\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}} \\ &=\frac{\partial \left( x_1a_1+x_2a_2+\cdots +x_na_n \right)}{\partial \boldsymbol{x}} \\ &=\left[ \begin{array}{c} \frac{\partial \left( x_1a_1+x_2a_2+\cdots +x_na_n \right)}{\partial x_1}\\ \frac{\partial \left( x_1a_1+x_2a_2+\cdots +x_na_n \right)}{\partial x_2}\\ \vdots\\ \frac{\partial \left( x_1a_1+x_2a_2+\cdots +x_na_n \right)}{\partial x_n}\\ \end{array} \right] \\ &=\left[ \begin{array}{c} a_1\\ a_2\\ \vdots\\ a_n\\ \end{array} \right] \\ &=\boldsymbol{a} \end{aligned}$$
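A minimal numerical check of Result 5, assuming a hypothetical constant vector `a`: the central-difference gradient of $\boldsymbol{a}^T\boldsymbol{x}$ should equal $\boldsymbol{a}$ regardless of the point $\boldsymbol{x}$ at which it is evaluated, so the sketch evaluates it at two different points.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_n] at the point x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


a = [4.0, -1.0, 0.5]  # hypothetical constant vector


def linear_function(x):
    # f(x) = a^T x = x^T a = a_1 x_1 + a_2 x_2 + ... + a_n x_n
    return sum(ai * xi for ai, xi in zip(a, x))


grad_p = numerical_gradient(linear_function, [2.0, 3.0, -1.0])
grad_q = numerical_gradient(linear_function, [-5.0, 0.0, 7.5])
# Both estimates should equal a, because the gradient of a linear function is constant
print(all(abs(g - ai) < 1e-6 for g, ai in zip(grad_p, a)))  # True
print(all(abs(g - ai) < 1e-6 for g, ai in zip(grad_q, a)))  # True
```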

Result 6

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{xx}^T\boldsymbol{b} \right)}{\partial \boldsymbol{x}}=\boldsymbol{ab}^T\boldsymbol{x}+\boldsymbol{ba}^T\boldsymbol{x}$$

where $\boldsymbol{a}$ and $\boldsymbol{b}$ are constant vectors:

$$\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_n\\ \end{matrix} \right] ^T,\qquad \boldsymbol{b}=\left[ \begin{matrix} b_1& b_2& \cdots& b_n\\ \end{matrix} \right] ^T$$

[Proof] Both $\boldsymbol{a}^T\boldsymbol{x}$ and $\boldsymbol{x}^T\boldsymbol{b}$ are scalars, and a scalar equals its own transpose, so $\boldsymbol{a}^T\boldsymbol{x}=\boldsymbol{x}^T\boldsymbol{a}$ and $\boldsymbol{x}^T\boldsymbol{b}=\boldsymbol{b}^T\boldsymbol{x}$. Therefore:

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{xx}^T\boldsymbol{b} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{ab}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}$$

Moreover, $\boldsymbol{ab}^T$ is an $n\times n$ constant matrix with $\left( \boldsymbol{ab}^T \right) ^T=\boldsymbol{ba}^T$, so applying Result 4 with $\boldsymbol{A}=\boldsymbol{ab}^T$ gives:

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{xx}^T\boldsymbol{b} \right)}{\partial \boldsymbol{x}}=\frac{\partial \left( \boldsymbol{x}^T\boldsymbol{ab}^T\boldsymbol{x} \right)}{\partial \boldsymbol{x}}=\boldsymbol{ab}^T\boldsymbol{x}+\boldsymbol{ba}^T\boldsymbol{x}$$
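The sketch below checks Result 6 numerically for hypothetical vectors `a`, `b` and a test point `x`. It evaluates the right-hand side without forming any $n\times n$ matrix, grouping $\boldsymbol{ab}^T\boldsymbol{x}$ as $\boldsymbol{a}\left( \boldsymbol{b}^T\boldsymbol{x} \right)$ (a vector times a scalar), which is the same quantity by associativity of matrix multiplication.

```python
def numerical_gradient(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_n] at the point x."""
    grad = []
    for i in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[i] += h
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


# Hypothetical constant vectors a and b
a = [1.0, 2.0, -1.0]
b = [0.5, -3.0, 2.0]


def f(x):
    # a^T x x^T b = (a^T x) * (x^T b), a product of two scalars
    return dot(a, x) * dot(x, b)


x = [1.0, 1.0, 2.0]
grad = numerical_gradient(f, x)
# ab^T x + ba^T x = a * (b^T x) + b * (a^T x), since b^T x and a^T x are scalars
expected = [ai * dot(b, x) + bi * dot(a, x) for ai, bi in zip(a, b)]
print(expected)  # [2.0, 0.0, 0.5]
print(all(abs(g - e) < 1e-6 for g, e in zip(grad, expected)))  # True
```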

Result 7

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{Xb} \right)}{\partial \boldsymbol{X}}=\boldsymbol{ab}^T$$

where $\boldsymbol{X}=\left( x_{ij} \right)$ is an $m\times n$ matrix variable and $\boldsymbol{a}$, $\boldsymbol{b}$ are constant vectors:

$$\boldsymbol{a}=\left[ \begin{matrix} a_1& a_2& \cdots& a_m\\ \end{matrix} \right] ^T,\qquad \boldsymbol{b}=\left[ \begin{matrix} b_1& b_2& \cdots& b_n\\ \end{matrix} \right] ^T$$

[Proof]

$$\begin{aligned} \boldsymbol{a}^T\boldsymbol{Xb}&=\left[ \begin{matrix} a_1& a_2& \cdots& a_m\\ \end{matrix} \right] \cdot \left[ \begin{matrix} x_{11}& x_{12}& \cdots& x_{1n}\\ x_{21}& x_{22}& \cdots& x_{2n}\\ \vdots& \vdots& \ddots& \vdots\\ x_{m1}& x_{m2}& \cdots& x_{mn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} b_1\\ b_2\\ \vdots\\ b_n\\ \end{array} \right] \\ &=\left[ \begin{matrix} a_1x_{11}+a_2x_{21}+\cdots +a_mx_{m1}& a_1x_{12}+a_2x_{22}+\cdots +a_mx_{m2}& \cdots& a_1x_{1n}+a_2x_{2n}+\cdots +a_mx_{mn}\\ \end{matrix} \right] \cdot \left[ \begin{array}{c} b_1\\ b_2\\ \vdots\\ b_n\\ \end{array} \right] \\ &=b_1\left( a_1x_{11}+a_2x_{21}+\cdots +a_mx_{m1} \right) +b_2\left( a_1x_{12}+a_2x_{22}+\cdots +a_mx_{m2} \right) +\cdots +b_n\left( a_1x_{1n}+a_2x_{2n}+\cdots +a_mx_{mn} \right) \end{aligned}$$

Let

$$f\left( \boldsymbol{X} \right) =\boldsymbol{a}^T\boldsymbol{Xb}$$

Each entry $x_{ij}$ appears exactly once in this sum, in the term $b_ja_ix_{ij}$, so $\frac{\partial f}{\partial x_{ij}}=a_ib_j$. Therefore:

$$\begin{aligned} \frac{\partial \left( \boldsymbol{a}^T\boldsymbol{Xb} \right)}{\partial \boldsymbol{X}}&=\frac{\partial f\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}} \\ &=\left[ \begin{matrix} \frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{12}}& \cdots& \frac{\partial f}{\partial x_{1n}}\\ \frac{\partial f}{\partial x_{21}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{2n}}\\ \vdots& \vdots& \ddots& \vdots\\ \frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{mn}}\\ \end{matrix} \right] _{m\times n} \\ &=\left[ \begin{matrix} a_1b_1& a_1b_2& \cdots& a_1b_n\\ a_2b_1& a_2b_2& \cdots& a_2b_n\\ \vdots& \vdots& \ddots& \vdots\\ a_mb_1& a_mb_2& \cdots& a_mb_n\\ \end{matrix} \right] _{m\times n} \\ &=\left[ \begin{array}{c} a_1\\ a_2\\ \vdots\\ a_m\\ \end{array} \right] \cdot \left[ \begin{matrix} b_1& b_2& \cdots& b_n\\ \end{matrix} \right] \\ &=\boldsymbol{ab}^T \end{aligned}$$
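For a matrix variable, the same finite-difference check perturbs one entry $x_{ij}$ at a time. The sketch below assumes hypothetical vectors `a` ($m=2$) and `b` ($n=3$) and a hypothetical $2\times 3$ matrix `X`, and compares the estimated $m\times n$ gradient with the outer product $\boldsymbol{ab}^T$.

```python
def numerical_matrix_gradient(f, X, h=1e-6):
    """Central-difference estimate of df/dx_ij for every entry of the m x n matrix X."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            X_plus = [row[:] for row in X]   # copy so X itself is never modified
            X_minus = [row[:] for row in X]
            X_plus[i][j] += h
            X_minus[i][j] -= h
            G[i][j] = (f(X_plus) - f(X_minus)) / (2 * h)
    return G


# Hypothetical constant vectors: a has m = 2 entries, b has n = 3 entries
a = [1.0, -2.0]
b = [3.0, 0.5, -1.0]


def bilinear(X):
    # a^T X b = sum over i, j of a_i * x_ij * b_j
    return sum(a[i] * X[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))


X = [[0.2, 1.0, -0.7],
     [2.0, -1.5, 0.3]]
G = numerical_matrix_gradient(bilinear, X)
outer = [[ai * bj for bj in b] for ai in a]  # ab^T, an m x n matrix
print(all(abs(G[i][j] - outer[i][j]) < 1e-6 for i in range(2) for j in range(3)))  # True
```

Note that the estimated gradient has the same shape as $\boldsymbol{X}$ ($m\times n$), consistent with the layout used in the proof.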
