Matrix Calculus, Part 2: Definitions (I)

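Building on the intuition from the previous article, this article gives rigorous definitions of the derivative for each type of function and variable. Because scalars and vectors can both be written as simple matrices, we use matrix notation to cover all cases uniformly. Convention: bold lowercase letters (e.g. $\boldsymbol{x}$) denote vectors, and bold uppercase letters (e.g. $\boldsymbol{X}$) denote matrices.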

1 Derivative of a Scalar with Respect to a Vector

1.1 Definition

Let $f\left( \boldsymbol{x} \right) =f\left( x_1,x_2,\cdots ,x_n \right)$ with $\boldsymbol{x}=\left[ x_1,x_2,\cdots ,x_n \right] ^T$. Then:
In numerator layout (a $1\times n$ row vector):
$$\frac{\partial f}{\partial \boldsymbol{x}^T}=\left[ \frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots ,\frac{\partial f}{\partial x_n} \right]$$
In denominator layout (an $n\times 1$ column vector):
$$\frac{\partial f}{\partial \boldsymbol{x}}=\left[ \frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots ,\frac{\partial f}{\partial x_n} \right] ^T$$
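The two layouts contain the same partial derivatives and differ only in shape. The following is a minimal sketch that estimates them numerically, assuming a central-difference approximation; the helper `grad` and the test function `f` are hypothetical names chosen for illustration and are not part of the original article.

```python
def grad(f, x, h=1e-6):
    """Central-difference estimate of [df/dx_1, ..., df/dx_n] as a flat list."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h  # perturb only the i-th component upward
        xm[i] -= h  # and downward
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def f(x):
    # Hypothetical test function: f(x) = x1^2 + 3*x1*x2
    return x[0] ** 2 + 3 * x[0] * x[1]


x0 = [1.0, 2.0]
g = grad(f, x0)

row = [g]                 # numerator layout: 1 x n row vector
col = [[gi] for gi in g]  # denominator layout: n x 1 column vector
```

For this `f`, the exact partial derivatives at `x0` are $2x_1+3x_2=8$ and $3x_1=3$, so `g` should be close to `[8.0, 3.0]`.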

1.2 Rules

1.2.1 Linearity

For scalar functions $f\left( \boldsymbol{x} \right)$ and $g\left( \boldsymbol{x} \right)$ and constants $c_1$, $c_2$:
$$\frac{\partial \left[ c_1f\left( \boldsymbol{x} \right) \pm c_2g\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}=c_1\frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\pm c_2\frac{\partial g\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}$$
[Proof]
$$
\begin{aligned}
\frac{\partial \left[ c_1f\left( \boldsymbol{x} \right) \pm c_2g\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}
&=\left[ \begin{array}{c}
\frac{\partial \left( c_1f\pm c_2g \right)}{\partial x_1}\\
\frac{\partial \left( c_1f\pm c_2g \right)}{\partial x_2}\\
\vdots\\
\frac{\partial \left( c_1f\pm c_2g \right)}{\partial x_n}\\
\end{array} \right]
=\left[ \begin{array}{c}
c_1\frac{\partial f}{\partial x_1}\pm c_2\frac{\partial g}{\partial x_1}\\
c_1\frac{\partial f}{\partial x_2}\pm c_2\frac{\partial g}{\partial x_2}\\
\vdots\\
c_1\frac{\partial f}{\partial x_n}\pm c_2\frac{\partial g}{\partial x_n}\\
\end{array} \right] \\
&=c_1\left[ \begin{array}{c}
\frac{\partial f}{\partial x_1}\\
\frac{\partial f}{\partial x_2}\\
\vdots\\
\frac{\partial f}{\partial x_n}\\
\end{array} \right]
\pm c_2\left[ \begin{array}{c}
\frac{\partial g}{\partial x_1}\\
\frac{\partial g}{\partial x_2}\\
\vdots\\
\frac{\partial g}{\partial x_n}\\
\end{array} \right] \\
&=c_1\frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}\pm c_2\frac{\partial g\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}
\end{aligned}
$$

1.2.2 Product Rule

$$\frac{\partial \left[ f\left( \boldsymbol{x} \right) g\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}=\frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}g\left( \boldsymbol{x} \right) +f\left( \boldsymbol{x} \right) \frac{\partial g\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}$$
[Proof]
$$
\begin{aligned}
\frac{\partial \left[ f\left( \boldsymbol{x} \right) g\left( \boldsymbol{x} \right) \right]}{\partial \boldsymbol{x}}
&=\left[ \begin{array}{c}
\frac{\partial \left( fg \right)}{\partial x_1}\\
\frac{\partial \left( fg \right)}{\partial x_2}\\
\vdots\\
\frac{\partial \left( fg \right)}{\partial x_n}\\
\end{array} \right]
=\left[ \begin{array}{c}
\frac{\partial f}{\partial x_1}\cdot g+f\cdot \frac{\partial g}{\partial x_1}\\
\frac{\partial f}{\partial x_2}\cdot g+f\cdot \frac{\partial g}{\partial x_2}\\
\vdots\\
\frac{\partial f}{\partial x_n}\cdot g+f\cdot \frac{\partial g}{\partial x_n}\\
\end{array} \right] \\
&=\left[ \begin{array}{c}
\frac{\partial f}{\partial x_1}\\
\frac{\partial f}{\partial x_2}\\
\vdots\\
\frac{\partial f}{\partial x_n}\\
\end{array} \right] g
+f\left[ \begin{array}{c}
\frac{\partial g}{\partial x_1}\\
\frac{\partial g}{\partial x_2}\\
\vdots\\
\frac{\partial g}{\partial x_n}\\
\end{array} \right] \\
&=\frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}g\left( \boldsymbol{x} \right) +f\left( \boldsymbol{x} \right) \frac{\partial g\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}
\end{aligned}
$$

1.2.3 Quotient Rule

$$\frac{\partial \left[ \frac{f\left( \boldsymbol{x} \right)}{g\left( \boldsymbol{x} \right)} \right]}{\partial \boldsymbol{x}}=\frac{1}{g^2\left( \boldsymbol{x} \right)}\left[ \frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}g\left( \boldsymbol{x} \right) -f\left( \boldsymbol{x} \right) \frac{\partial g\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}} \right]$$

[Proof]
$$
\begin{aligned}
\frac{\partial \left[ \frac{f\left( \boldsymbol{x} \right)}{g\left( \boldsymbol{x} \right)} \right]}{\partial \boldsymbol{x}}
&=\left[ \begin{array}{c}
\frac{\partial \left( f/g \right)}{\partial x_1}\\
\frac{\partial \left( f/g \right)}{\partial x_2}\\
\vdots\\
\frac{\partial \left( f/g \right)}{\partial x_n}\\
\end{array} \right]
=\left[ \begin{array}{c}
\frac{1}{g^2}\left( \frac{\partial f}{\partial x_1}g-f\frac{\partial g}{\partial x_1} \right)\\
\frac{1}{g^2}\left( \frac{\partial f}{\partial x_2}g-f\frac{\partial g}{\partial x_2} \right)\\
\vdots\\
\frac{1}{g^2}\left( \frac{\partial f}{\partial x_n}g-f\frac{\partial g}{\partial x_n} \right)\\
\end{array} \right] \\
&=\frac{1}{g^2}\left( \left[ \begin{array}{c}
\frac{\partial f}{\partial x_1}\\
\frac{\partial f}{\partial x_2}\\
\vdots\\
\frac{\partial f}{\partial x_n}\\
\end{array} \right] g
-f\left[ \begin{array}{c}
\frac{\partial g}{\partial x_1}\\
\frac{\partial g}{\partial x_2}\\
\vdots\\
\frac{\partial g}{\partial x_n}\\
\end{array} \right] \right) \\
&=\frac{1}{g^2\left( \boldsymbol{x} \right)}\left[ \frac{\partial f\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}}g\left( \boldsymbol{x} \right) -f\left( \boldsymbol{x} \right) \frac{\partial g\left( \boldsymbol{x} \right)}{\partial \boldsymbol{x}} \right]
\end{aligned}
$$
where
$$g\left( \boldsymbol{x} \right) \ne 0$$
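The linearity, product, and quotient rules of Section 1.2 can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it uses a central-difference gradient, and the functions `f` and `g`, the point `x0`, and the constants `c1`, `c2` are hypothetical choices (with `g` strictly positive so the quotient is defined).

```python
import math


def grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x (flat list)."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((f(xp) - f(xm)) / (2 * h))
    return out


def max_abs_diff(u, v):
    """Largest componentwise gap between two equally long lists."""
    return max(abs(a - b) for a, b in zip(u, v))


# Hypothetical test functions; g(x) >= 1, so g(x) != 0 holds everywhere.
f = lambda x: x[0] ** 2 + x[1]
g = lambda x: math.sin(x[0]) + 2.0 + x[1] ** 2

x0 = [0.5, -1.2]
c1, c2 = 3.0, -2.0
df, dg = grad(f, x0), grad(g, x0)
fv, gv = f(x0), g(x0)

# Left side: differentiate the combined function directly.
# Right side: assemble the result from the rule.
linear_lhs = grad(lambda x: c1 * f(x) + c2 * g(x), x0)
linear_rhs = [c1 * a + c2 * b for a, b in zip(df, dg)]

product_lhs = grad(lambda x: f(x) * g(x), x0)
product_rhs = [a * gv + fv * b for a, b in zip(df, dg)]

quotient_lhs = grad(lambda x: f(x) / g(x), x0)
quotient_rhs = [(a * gv - fv * b) / gv ** 2 for a, b in zip(df, dg)]
```

In each pair the two sides should agree up to finite-difference error, which `max_abs_diff` makes easy to inspect.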

1.3 Example

[Example 1.1] Let $\boldsymbol{x}=\left[ x_1,x_2,\cdots ,x_n \right] ^T$. Find the derivative of the function $f\left( \boldsymbol{x} \right) =\boldsymbol{x}^T\boldsymbol{x}={x_1}^2+{x_2}^2+\cdots +{x_n}^2$ with respect to $\boldsymbol{x}$.
[Solution]
$$\frac{\partial f}{\partial \boldsymbol{x}}=\left[ \begin{array}{c} \frac{\partial f}{\partial x_1}\\ \frac{\partial f}{\partial x_2}\\ \vdots\\ \frac{\partial f}{\partial x_n}\\ \end{array} \right] =\left[ \begin{array}{c} 2x_1\\ 2x_2\\ \vdots\\ 2x_n\\ \end{array} \right] =2\boldsymbol{x}$$
That is:
$$\frac{\partial \boldsymbol{x}^T\boldsymbol{x}}{\partial \boldsymbol{x}}=2\boldsymbol{x}$$
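The result $\partial \left( \boldsymbol{x}^T\boldsymbol{x} \right) /\partial \boldsymbol{x}=2\boldsymbol{x}$ can be compared against a numerical gradient. This is a small sketch assuming a central-difference approximation; `grad` and the sample point `x0` are hypothetical and serve only as an illustration.

```python
def grad(f, x, h=1e-6):
    """Central-difference gradient of a scalar function f at x (flat list)."""
    out = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        out.append((f(xp) - f(xm)) / (2 * h))
    return out


def f(x):
    # f(x) = x^T x = x_1^2 + ... + x_n^2
    return sum(xi * xi for xi in x)


x0 = [1.0, -2.0, 0.5]
numeric = grad(f, x0)
analytic = [2 * xi for xi in x0]  # the closed form 2x from Example 1.1
```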

2 Derivative of a Scalar with Respect to a Matrix

2.1 Definition

Let $f\left( \boldsymbol{X} \right) =f\left( x_{11},x_{12},\cdots ,x_{1n},x_{21},x_{22},\cdots ,x_{2n},\cdots ,x_{m1},x_{m2},\cdots ,x_{mn} \right)$ with
$$\boldsymbol{X}=\left[ \begin{matrix} x_{11}& x_{12}& \cdots& x_{1n}\\ x_{21}& x_{22}& \cdots& x_{2n}\\ \vdots& \vdots& \ddots& \vdots\\ x_{m1}& x_{m2}& \cdots& x_{mn}\\ \end{matrix} \right] _{m\times n}$$
Then:
$$\frac{\partial f}{\partial \boldsymbol{X}}=\left[ \begin{matrix} \frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{12}}& \cdots& \frac{\partial f}{\partial x_{1n}}\\ \frac{\partial f}{\partial x_{21}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{2n}}\\ \vdots& \vdots& \ddots& \vdots\\ \frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{mn}}\\ \end{matrix} \right] _{m\times n}$$
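In this definition the derivative has the same $m\times n$ shape as $\boldsymbol{X}$, with entry $(i,j)$ equal to $\partial f/\partial x_{ij}$. A minimal numerical sketch, assuming central differences, is shown below; `matrix_grad` and the test function (the sum of squared entries, whose derivative is $2\boldsymbol{X}$) are hypothetical names and choices made for illustration.

```python
def matrix_grad(f, X, h=1e-6):
    """Central-difference estimate of df/dX, returned with the shape of X."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h  # perturb the single entry x_ij
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G


def f(X):
    # Hypothetical test function: sum of squares of all entries of X
    return sum(v * v for row in X for v in row)


X0 = [[1.0, 2.0, 3.0],
      [4.0, 5.0, 6.0]]
G = matrix_grad(f, X0)  # 2 x 3, entrywise close to 2 * X0
```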

2.2 Rules

2.2.1 Linearity

$$\frac{\partial \left[ c_1f\left( \boldsymbol{X} \right) +c_2g\left( \boldsymbol{X} \right) \right]}{\partial \boldsymbol{X}}=c_1\frac{\partial f\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}+c_2\frac{\partial g\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}$$
where $c_1$ and $c_2$ are constants.
[Proof]
$$
\begin{aligned}
\frac{\partial \left[ c_1f\left( \boldsymbol{X} \right) +c_2g\left( \boldsymbol{X} \right) \right]}{\partial \boldsymbol{X}}
&=\left[ \begin{matrix}
\frac{\partial \left( c_1f+c_2g \right)}{\partial x_{11}}& \frac{\partial \left( c_1f+c_2g \right)}{\partial x_{12}}& \cdots& \frac{\partial \left( c_1f+c_2g \right)}{\partial x_{1n}}\\
\frac{\partial \left( c_1f+c_2g \right)}{\partial x_{21}}& \frac{\partial \left( c_1f+c_2g \right)}{\partial x_{22}}& \cdots& \frac{\partial \left( c_1f+c_2g \right)}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial \left( c_1f+c_2g \right)}{\partial x_{m1}}& \frac{\partial \left( c_1f+c_2g \right)}{\partial x_{m2}}& \cdots& \frac{\partial \left( c_1f+c_2g \right)}{\partial x_{mn}}\\
\end{matrix} \right] \\
&=\left[ \begin{matrix}
c_1\frac{\partial f}{\partial x_{11}}+c_2\frac{\partial g}{\partial x_{11}}& c_1\frac{\partial f}{\partial x_{12}}+c_2\frac{\partial g}{\partial x_{12}}& \cdots& c_1\frac{\partial f}{\partial x_{1n}}+c_2\frac{\partial g}{\partial x_{1n}}\\
c_1\frac{\partial f}{\partial x_{21}}+c_2\frac{\partial g}{\partial x_{21}}& c_1\frac{\partial f}{\partial x_{22}}+c_2\frac{\partial g}{\partial x_{22}}& \cdots& c_1\frac{\partial f}{\partial x_{2n}}+c_2\frac{\partial g}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
c_1\frac{\partial f}{\partial x_{m1}}+c_2\frac{\partial g}{\partial x_{m1}}& c_1\frac{\partial f}{\partial x_{m2}}+c_2\frac{\partial g}{\partial x_{m2}}& \cdots& c_1\frac{\partial f}{\partial x_{mn}}+c_2\frac{\partial g}{\partial x_{mn}}\\
\end{matrix} \right] \\
&=c_1\left[ \begin{matrix}
\frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{12}}& \cdots& \frac{\partial f}{\partial x_{1n}}\\
\frac{\partial f}{\partial x_{21}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{mn}}\\
\end{matrix} \right]
+c_2\left[ \begin{matrix}
\frac{\partial g}{\partial x_{11}}& \frac{\partial g}{\partial x_{12}}& \cdots& \frac{\partial g}{\partial x_{1n}}\\
\frac{\partial g}{\partial x_{21}}& \frac{\partial g}{\partial x_{22}}& \cdots& \frac{\partial g}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial g}{\partial x_{m1}}& \frac{\partial g}{\partial x_{m2}}& \cdots& \frac{\partial g}{\partial x_{mn}}\\
\end{matrix} \right] \\
&=c_1\frac{\partial f\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}+c_2\frac{\partial g\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}
\end{aligned}
$$

2.2.2 Product Rule

$$\frac{\partial \left[ f\left( \boldsymbol{X} \right) g\left( \boldsymbol{X} \right) \right]}{\partial \boldsymbol{X}}=\frac{\partial f\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}g\left( \boldsymbol{X} \right) +f\left( \boldsymbol{X} \right) \frac{\partial g\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}$$
[Proof]
$$
\begin{aligned}
\frac{\partial \left[ f\left( \boldsymbol{X} \right) g\left( \boldsymbol{X} \right) \right]}{\partial \boldsymbol{X}}
&=\left[ \begin{matrix}
\frac{\partial \left( fg \right)}{\partial x_{11}}& \frac{\partial \left( fg \right)}{\partial x_{12}}& \cdots& \frac{\partial \left( fg \right)}{\partial x_{1n}}\\
\frac{\partial \left( fg \right)}{\partial x_{21}}& \frac{\partial \left( fg \right)}{\partial x_{22}}& \cdots& \frac{\partial \left( fg \right)}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial \left( fg \right)}{\partial x_{m1}}& \frac{\partial \left( fg \right)}{\partial x_{m2}}& \cdots& \frac{\partial \left( fg \right)}{\partial x_{mn}}\\
\end{matrix} \right] \\
&=\left[ \begin{matrix}
\frac{\partial f}{\partial x_{11}}g+f\frac{\partial g}{\partial x_{11}}& \frac{\partial f}{\partial x_{12}}g+f\frac{\partial g}{\partial x_{12}}& \cdots& \frac{\partial f}{\partial x_{1n}}g+f\frac{\partial g}{\partial x_{1n}}\\
\frac{\partial f}{\partial x_{21}}g+f\frac{\partial g}{\partial x_{21}}& \frac{\partial f}{\partial x_{22}}g+f\frac{\partial g}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{2n}}g+f\frac{\partial g}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial f}{\partial x_{m1}}g+f\frac{\partial g}{\partial x_{m1}}& \frac{\partial f}{\partial x_{m2}}g+f\frac{\partial g}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{mn}}g+f\frac{\partial g}{\partial x_{mn}}\\
\end{matrix} \right] \\
&=\left[ \begin{matrix}
\frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{12}}& \cdots& \frac{\partial f}{\partial x_{1n}}\\
\frac{\partial f}{\partial x_{21}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{mn}}\\
\end{matrix} \right] g
+f\left[ \begin{matrix}
\frac{\partial g}{\partial x_{11}}& \frac{\partial g}{\partial x_{12}}& \cdots& \frac{\partial g}{\partial x_{1n}}\\
\frac{\partial g}{\partial x_{21}}& \frac{\partial g}{\partial x_{22}}& \cdots& \frac{\partial g}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial g}{\partial x_{m1}}& \frac{\partial g}{\partial x_{m2}}& \cdots& \frac{\partial g}{\partial x_{mn}}\\
\end{matrix} \right] \\
&=\frac{\partial f\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}g\left( \boldsymbol{X} \right) +f\left( \boldsymbol{X} \right) \frac{\partial g\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}
\end{aligned}
$$

2.2.3 Quotient Rule

$$\frac{\partial \left[ \frac{f\left( \boldsymbol{X} \right)}{g\left( \boldsymbol{X} \right)} \right]}{\partial \boldsymbol{X}}=\frac{1}{g^2\left( \boldsymbol{X} \right)}\left[ \frac{\partial f\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}g\left( \boldsymbol{X} \right) -f\left( \boldsymbol{X} \right) \frac{\partial g\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}} \right]$$
[Proof]
$$
\begin{aligned}
\frac{\partial \left[ \frac{f\left( \boldsymbol{X} \right)}{g\left( \boldsymbol{X} \right)} \right]}{\partial \boldsymbol{X}}
&=\left[ \begin{matrix}
\frac{\partial \left( \frac{f}{g} \right)}{\partial x_{11}}& \frac{\partial \left( \frac{f}{g} \right)}{\partial x_{12}}& \cdots& \frac{\partial \left( \frac{f}{g} \right)}{\partial x_{1n}}\\
\frac{\partial \left( \frac{f}{g} \right)}{\partial x_{21}}& \frac{\partial \left( \frac{f}{g} \right)}{\partial x_{22}}& \cdots& \frac{\partial \left( \frac{f}{g} \right)}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial \left( \frac{f}{g} \right)}{\partial x_{m1}}& \frac{\partial \left( \frac{f}{g} \right)}{\partial x_{m2}}& \cdots& \frac{\partial \left( \frac{f}{g} \right)}{\partial x_{mn}}\\
\end{matrix} \right] \\
&=\left[ \begin{matrix}
\frac{1}{g^2}\left( \frac{\partial f}{\partial x_{11}}g-f\frac{\partial g}{\partial x_{11}} \right)& \frac{1}{g^2}\left( \frac{\partial f}{\partial x_{12}}g-f\frac{\partial g}{\partial x_{12}} \right)& \cdots& \frac{1}{g^2}\left( \frac{\partial f}{\partial x_{1n}}g-f\frac{\partial g}{\partial x_{1n}} \right)\\
\frac{1}{g^2}\left( \frac{\partial f}{\partial x_{21}}g-f\frac{\partial g}{\partial x_{21}} \right)& \frac{1}{g^2}\left( \frac{\partial f}{\partial x_{22}}g-f\frac{\partial g}{\partial x_{22}} \right)& \cdots& \frac{1}{g^2}\left( \frac{\partial f}{\partial x_{2n}}g-f\frac{\partial g}{\partial x_{2n}} \right)\\
\vdots& \vdots& \ddots& \vdots\\
\frac{1}{g^2}\left( \frac{\partial f}{\partial x_{m1}}g-f\frac{\partial g}{\partial x_{m1}} \right)& \frac{1}{g^2}\left( \frac{\partial f}{\partial x_{m2}}g-f\frac{\partial g}{\partial x_{m2}} \right)& \cdots& \frac{1}{g^2}\left( \frac{\partial f}{\partial x_{mn}}g-f\frac{\partial g}{\partial x_{mn}} \right)\\
\end{matrix} \right] \\
&=\frac{1}{g^2}\left( \left[ \begin{matrix}
\frac{\partial f}{\partial x_{11}}& \frac{\partial f}{\partial x_{12}}& \cdots& \frac{\partial f}{\partial x_{1n}}\\
\frac{\partial f}{\partial x_{21}}& \frac{\partial f}{\partial x_{22}}& \cdots& \frac{\partial f}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial f}{\partial x_{m1}}& \frac{\partial f}{\partial x_{m2}}& \cdots& \frac{\partial f}{\partial x_{mn}}\\
\end{matrix} \right] g
-f\left[ \begin{matrix}
\frac{\partial g}{\partial x_{11}}& \frac{\partial g}{\partial x_{12}}& \cdots& \frac{\partial g}{\partial x_{1n}}\\
\frac{\partial g}{\partial x_{21}}& \frac{\partial g}{\partial x_{22}}& \cdots& \frac{\partial g}{\partial x_{2n}}\\
\vdots& \vdots& \ddots& \vdots\\
\frac{\partial g}{\partial x_{m1}}& \frac{\partial g}{\partial x_{m2}}& \cdots& \frac{\partial g}{\partial x_{mn}}\\
\end{matrix} \right] \right) \\
&=\frac{1}{g^2\left( \boldsymbol{X} \right)}\left[ \frac{\partial f\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}}g\left( \boldsymbol{X} \right) -f\left( \boldsymbol{X} \right) \frac{\partial g\left( \boldsymbol{X} \right)}{\partial \boldsymbol{X}} \right]
\end{aligned}
$$
where
$$g\left( \boldsymbol{X} \right) \ne 0$$

2.3 Example

[Example 2.1] Find the derivative of the function $f\left( \boldsymbol{A} \right) =\boldsymbol{x}^T\boldsymbol{Ax}$ with respect to the matrix $\boldsymbol{A}$, where $\boldsymbol{A}$ is a symmetric matrix and $\boldsymbol{x}$ is a constant three-dimensional column vector.
[Solution]
$$
\begin{aligned}
f\left( \boldsymbol{A} \right) &=\left[ \begin{matrix} x_1& x_2& x_3\\ \end{matrix} \right] \left[ \begin{matrix} a_{11}& a_{12}& a_{13}\\ a_{21}& a_{22}& a_{23}\\ a_{31}& a_{32}& a_{33}\\ \end{matrix} \right] \left[ \begin{array}{c} x_1\\ x_2\\ x_3\\ \end{array} \right] \\
&=a_{11}{x_1}^2+a_{22}{x_2}^2+a_{33}{x_3}^2+\left( a_{12}+a_{21} \right) x_1x_2+\left( a_{13}+a_{31} \right) x_1x_3+\left( a_{23}+a_{32} \right) x_2x_3
\end{aligned}
$$
Because $\boldsymbol{A}$ is symmetric, the three brackets equal $2a_{12}$, $2a_{13}$ and $2a_{23}$. They are left unmerged here because the definition in Section 2.1 differentiates with respect to each entry $a_{ij}$ separately, so that $\frac{\partial f}{\partial a_{ij}}=x_ix_j$.
By the definition:
$$
\begin{aligned}
\frac{\partial f\left( \boldsymbol{A} \right)}{\partial \boldsymbol{A}}&=\left[ \begin{matrix} \frac{\partial f}{\partial a_{11}}& \frac{\partial f}{\partial a_{12}}& \frac{\partial f}{\partial a_{13}}\\ \frac{\partial f}{\partial a_{21}}& \frac{\partial f}{\partial a_{22}}& \frac{\partial f}{\partial a_{23}}\\ \frac{\partial f}{\partial a_{31}}& \frac{\partial f}{\partial a_{32}}& \frac{\partial f}{\partial a_{33}}\\ \end{matrix} \right] =\left[ \begin{matrix} {x_1}^2& x_1x_2& x_1x_3\\ x_2x_1& {x_2}^2& x_2x_3\\ x_3x_1& x_3x_2& {x_3}^2\\ \end{matrix} \right] =\left[ \begin{array}{c} x_1\\ x_2\\ x_3\\ \end{array} \right] \left[ \begin{matrix} x_1& x_2& x_3\\ \end{matrix} \right] \\
&=\boldsymbol{xx}^T
\end{aligned}
$$
That is:
$$\frac{\partial \boldsymbol{x}^T\boldsymbol{Ax}}{\partial \boldsymbol{A}}=\boldsymbol{xx}^T$$
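The identity $\partial \left( \boldsymbol{x}^T\boldsymbol{Ax} \right) /\partial \boldsymbol{A}=\boldsymbol{xx}^T$ can be compared with an entrywise numerical derivative. The sketch below assumes central differences and perturbs each $a_{ij}$ on its own, that is, it treats the nine entries as independent variables; `matrix_grad`, the vector `x`, and the symmetric sample matrix `A0` are hypothetical values chosen for illustration.

```python
def matrix_grad(f, X, h=1e-6):
    """Central-difference estimate of df/dX, returned with the shape of X."""
    m, n = len(X), len(X[0])
    G = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            Xp = [row[:] for row in X]
            Xm = [row[:] for row in X]
            Xp[i][j] += h  # only a_ij moves; a_ji stays fixed
            Xm[i][j] -= h
            G[i][j] = (f(Xp) - f(Xm)) / (2 * h)
    return G


x = [1.0, 2.0, 3.0]  # constant vector


def f(A):
    # f(A) = x^T A x = sum_i sum_j a_ij * x_i * x_j
    return sum(x[i] * A[i][j] * x[j] for i in range(3) for j in range(3))


A0 = [[2.0, 1.0, 0.0],
      [1.0, 3.0, -1.0],
      [0.0, -1.0, 4.0]]  # symmetric sample matrix

numeric = matrix_grad(f, A0)
analytic = [[xi * xj for xj in x] for xi in x]  # the outer product x x^T
```

Since $f$ is linear in the entries of $\boldsymbol{A}$, the numerical result does not depend on the particular `A0` beyond rounding error.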

3 Derivative of a Vector with Respect to a Scalar

3.1 Definition

Let $\boldsymbol{y}=\left[ y_1,y_2,\cdots ,y_m \right] ^T$, where each $y_i$ is a function of the scalar $x$. In numerator layout (an $m\times 1$ column vector):
$$\frac{\partial \boldsymbol{y}}{\partial x}=\left[ \frac{\partial y_1}{\partial x},\frac{\partial y_2}{\partial x},\cdots ,\frac{\partial y_m}{\partial x} \right] ^T$$
In denominator layout (a $1\times m$ row vector):
$$\frac{\partial \boldsymbol{y}}{\partial x}=\left[ \frac{\partial y_1}{\partial x},\frac{\partial y_2}{\partial x},\cdots ,\frac{\partial y_m}{\partial x} \right]$$
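As with the scalar-by-vector case, the two layouts hold the same $m$ derivatives arranged differently. The following is a minimal sketch, assuming a central-difference approximation; `dydx` and the vector function `y` are hypothetical names used only for illustration.

```python
import math


def dydx(y, x, h=1e-6):
    """Central-difference estimate of [dy_1/dx, ..., dy_m/dx] as a flat list."""
    yp, ym = y(x + h), y(x - h)
    return [(a - b) / (2 * h) for a, b in zip(yp, ym)]


def y(x):
    # Hypothetical vector function y(x) = [x^2, sin(x), 3x]^T
    return [x ** 2, math.sin(x), 3 * x]


d = dydx(y, 1.0)

col = [[v] for v in d]  # numerator layout: m x 1 column vector
row = [d]               # denominator layout: 1 x m row vector
```

At $x=1$ the exact derivatives are $2$, $\cos 1$ and $3$.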

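3.2 Differentiation with Respect to a Scalar Variable

Let the matrices $\boldsymbol{A}$, $\boldsymbol{B}$, the vectors $\boldsymbol{a}$, $\boldsymbol{b}$ and the scalar $\lambda$ all be functions of the scalar $t$, with compatible dimensions. Then:

$$\frac{\partial \left( \boldsymbol{A}\pm \boldsymbol{B} \right)}{\partial t}=\frac{\partial \boldsymbol{A}}{\partial t}\pm \frac{\partial \boldsymbol{B}}{\partial t}$$

$$\frac{\partial \left( \lambda \boldsymbol{A} \right)}{\partial t}=\frac{\partial \lambda}{\partial t}\boldsymbol{A}+\lambda \frac{\partial \boldsymbol{A}}{\partial t}$$

$$\frac{\partial \left( \boldsymbol{a}^T\boldsymbol{b} \right)}{\partial t}=\frac{\partial \boldsymbol{a}^T}{\partial t}\boldsymbol{b}+\boldsymbol{a}^T\frac{\partial \boldsymbol{b}}{\partial t}$$

$$\frac{\partial \left( \boldsymbol{AB} \right)}{\partial t}=\frac{\partial \boldsymbol{A}}{\partial t}\boldsymbol{B}+\boldsymbol{A}\frac{\partial \boldsymbol{B}}{\partial t}$$

In the last rule the order of the factors must be preserved, since matrix multiplication does not commute in general.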
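The matrix product rule can be checked numerically by differentiating every entry of $\boldsymbol{A}(t)\boldsymbol{B}(t)$ with respect to $t$. This is only a sketch under stated assumptions: it uses central differences, and the helpers `matmul`, `madd`, `dM` as well as the sample matrix functions `A` and `B` are hypothetical names introduced for illustration.

```python
import math


def matmul(A, B):
    """Plain matrix product of two nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def madd(A, B):
    """Entrywise sum of two matrices of equal shape."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def dM(M, t, h=1e-6):
    """Central-difference derivative of a matrix-valued function M(t)."""
    Mp, Mm = M(t + h), M(t - h)
    return [[(a - b) / (2 * h) for a, b in zip(rp, rm)]
            for rp, rm in zip(Mp, Mm)]


# Hypothetical 2 x 2 matrix functions of the scalar t
A = lambda t: [[t, t ** 2], [1.0, math.sin(t)]]
B = lambda t: [[math.cos(t), 2.0], [t ** 3, t]]

t0 = 0.7
lhs = dM(lambda t: matmul(A(t), B(t)), t0)             # d(AB)/dt
rhs = madd(matmul(dM(A, t0), B(t0)),                   # (dA/dt) B
           matmul(A(t0), dM(B, t0)))                   # + A (dB/dt)

max_gap = max(abs(a - b) for rl, rr in zip(lhs, rhs) for a, b in zip(rl, rr))
```

`max_gap` should be on the order of the finite-difference error. Swapping the factors on the right-hand side (for example using $\boldsymbol{B}\,\partial \boldsymbol{A}/\partial t$) would in general not reproduce `lhs`.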

