Linear Algebra: Matrix Derivatives (2) Basic Rules and Formulas for Differentiating Scalar Functions

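Preface

The previous post, Matrix Derivatives (1), settled the layout question, which is the most basic part of matrix differentiation. We now move on to its core: the basic differentiation rules and the basic formulas.

Conventions

This post covers only derivatives of a scalar with respect to a vector or a matrix, and vectors are column vectors by default. The derivative with respect to $x\in\mathbb{R}^n$ is an $n\times 1$ column, and the derivative with respect to $X\in\mathbb{R}^{m\times n}$ is an $m\times n$ matrix.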

Derivatives of a scalar with respect to a vector

Basic rules

Constant:
$$\frac{\partial c_0}{\partial x}=0^{n\times 1}$$
The derivative of a constant is trivial, so no proof is given here.


Linearity:
$$\frac{\partial (c_1f(x)+c_2g(x))}{\partial x}=c_1\frac{\partial f}{\partial x}+c_2\frac{\partial g}{\partial x}$$
Proof:
$$
\begin{aligned}
\frac{\partial (c_1f(x)+c_2g(x))}{\partial x}
&=\begin{bmatrix}
\frac{\partial (c_1f(x)+c_2g(x))}{\partial x_1}\\
\frac{\partial (c_1f(x)+c_2g(x))}{\partial x_2}\\
\vdots\\
\frac{\partial (c_1f(x)+c_2g(x))}{\partial x_n}
\end{bmatrix}\\
&=\begin{bmatrix}
c_1\frac{\partial f(x)}{\partial x_1}\\
c_1\frac{\partial f(x)}{\partial x_2}\\
\vdots\\
c_1\frac{\partial f(x)}{\partial x_n}
\end{bmatrix}
+\begin{bmatrix}
c_2\frac{\partial g(x)}{\partial x_1}\\
c_2\frac{\partial g(x)}{\partial x_2}\\
\vdots\\
c_2\frac{\partial g(x)}{\partial x_n}
\end{bmatrix}\\
&=c_1\frac{\partial f}{\partial x}+c_2\frac{\partial g}{\partial x}
\end{aligned}
$$
Sums and differences need no separate treatment. They work exactly as for ordinary functions, and the proof is just as easy.
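The linearity rule can be checked numerically. The sketch below assumes NumPy and represents the column vector $x$ as a 1-D array. It uses `num_grad`, a hypothetical central-difference helper written just for this check, and applies it both to the combination and to each function separately. The functions `f`, `g` and the constants `c1`, `c2` are arbitrary illustrative choices.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

# Example scalar functions and constants, chosen only for illustration.
def f(x):
    return np.sum(np.sin(x))   # gradient: cos(x)

def g(x):
    return np.sum(x ** 3)      # gradient: 3x^2

c1, c2 = 2.0, -0.5
x = np.random.default_rng(0).standard_normal(4)  # 1-D array standing in for the column vector x

lhs = num_grad(lambda v: c1 * f(v) + c2 * g(v), x)  # derivative of the combination
rhs = c1 * num_grad(f, x) + c2 * num_grad(g, x)     # combination of the derivatives
print(np.allclose(lhs, rhs, atol=1e-6))
```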


Product:
$$\frac{\partial (f(x)g(x))}{\partial x}=\frac{\partial f(x)}{\partial x}g(x)+f(x)\frac{\partial g(x)}{\partial x}$$
Proof:
$$
\begin{aligned}
\frac{\partial (f(x)g(x))}{\partial x}
&=\begin{bmatrix}
\frac{\partial fg}{\partial x_1}\\
\frac{\partial fg}{\partial x_2}\\
\vdots\\
\frac{\partial fg}{\partial x_n}
\end{bmatrix}
=\begin{bmatrix}
\frac{\partial f}{\partial x_1}g+f\frac{\partial g}{\partial x_1}\\
\frac{\partial f}{\partial x_2}g+f\frac{\partial g}{\partial x_2}\\
\vdots\\
\frac{\partial f}{\partial x_n}g+f\frac{\partial g}{\partial x_n}
\end{bmatrix}\\
&=\begin{bmatrix}
\frac{\partial f}{\partial x_1}\\
\frac{\partial f}{\partial x_2}\\
\vdots\\
\frac{\partial f}{\partial x_n}
\end{bmatrix}g
+f\begin{bmatrix}
\frac{\partial g}{\partial x_1}\\
\frac{\partial g}{\partial x_2}\\
\vdots\\
\frac{\partial g}{\partial x_n}
\end{bmatrix}\\
&=\frac{\partial f(x)}{\partial x}g(x)+f(x)\frac{\partial g(x)}{\partial x}
\end{aligned}
$$
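A similar numerical sanity check works for the product rule. As before, this is a minimal NumPy sketch and `num_grad` is a hypothetical finite-difference helper. The example functions $f(x)=\sum_i\sin x_i$ and $g(x)=x^Tx$ are assumptions for illustration, and their analytic gradients are noted in the comments.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

def f(x):
    return np.sum(np.sin(x))   # gradient: cos(x)

def g(x):
    return x @ x               # gradient: 2x

x = np.random.default_rng(1).standard_normal(5)

lhs = num_grad(lambda v: f(v) * g(v), x)              # d(fg)/dx
rhs = num_grad(f, x) * g(x) + f(x) * num_grad(g, x)   # (df/dx) g + f (dg/dx)
print(np.allclose(lhs, rhs, atol=1e-6))
```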


Quotient (where $g(x)\neq 0$):
$$\frac{\partial}{\partial x}\left(\frac{f(x)}{g(x)}\right)=\frac{\frac{\partial f(x)}{\partial x}g(x)-f(x)\frac{\partial g(x)}{\partial x}}{g(x)^2}$$
The proof follows the same steps as for the product rule. The only difference is that each component is differentiated with the scalar quotient rule for $\partial (f/g)/\partial x_i$ instead of the product rule for $\partial (fg)/\partial x_i$, so it is omitted here.
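The omitted quotient-rule proof can at least be spot-checked numerically. This is a NumPy sketch under the same assumptions as the previous checks: `num_grad` is a hypothetical helper, and the functions are illustrative. Here $g(x)=1+x^Tx$ is chosen because it is never zero.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

def f(x):
    return np.sum(np.sin(x))   # gradient: cos(x)

def g(x):
    return 1.0 + x @ x         # gradient: 2x; g >= 1, so the division is safe

x = np.random.default_rng(2).standard_normal(5)

lhs = num_grad(lambda v: f(v) / g(v), x)
rhs = (num_grad(f, x) * g(x) - f(x) * num_grad(g, x)) / g(x) ** 2
print(np.allclose(lhs, rhs, atol=1e-6))
```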

Formulas

Formula 1
$$\frac{\partial a^Tx}{\partial x}=\frac{\partial x^Ta}{\partial x}=a$$
Proof:
$$\frac{\partial a^Tx}{\partial x}=\frac{\partial x^Ta}{\partial x}=\frac{\partial (a_1x_1+a_2x_2+\dots+a_nx_n)}{\partial x}=\begin{bmatrix}a_1\\a_2\\\vdots\\a_n\end{bmatrix}=a$$
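Formula 1 can be checked numerically by differentiating both $a^Tx$ and $x^Ta$ and comparing each result with $a$. This is a minimal NumPy sketch: `num_grad` is a hypothetical finite-difference helper, and the vectors are random test data.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

rng = np.random.default_rng(3)
a = rng.standard_normal(4)
x = rng.standard_normal(4)

grad_aTx = num_grad(lambda v: a @ v, x)   # d(a^T x)/dx
grad_xTa = num_grad(lambda v: v @ a, x)   # d(x^T a)/dx
print(np.allclose(grad_aTx, a, atol=1e-6), np.allclose(grad_xTa, a, atol=1e-6))
```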


Formula 2
$$\frac{\partial (x^Tx)}{\partial x}=2x,\qquad \frac{\partial (x^Tx)}{\partial x^T}=2x^T$$

Proof:
$$\frac{\partial (x^Tx)}{\partial x}=\frac{\partial (x_1^2+x_2^2+\dots+x_n^2)}{\partial x}=\begin{bmatrix}2x_1\\2x_2\\\vdots\\2x_n\end{bmatrix}=2x$$
The second identity is the transpose of the first: differentiating with respect to the row vector $x^T$ lays out the same partial derivatives as a row.
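Formula 2 can be checked numerically as well. In this NumPy sketch (with `num_grad` again a hypothetical helper), 1-D arrays do not distinguish rows from columns. The row form $\partial(x^Tx)/\partial x^T$ is therefore shown only by reshaping the same numbers into a $1\times n$ array.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

x = np.random.default_rng(4).standard_normal(5)

grad = num_grad(lambda v: v @ v, x)   # d(x^T x)/dx, a column in the text
row = grad.reshape(1, -1)             # same numbers as a 1 x n row, i.e. d(x^T x)/dx^T
print(np.allclose(grad, 2 * x, atol=1e-6))
```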


Formula 3
$$\frac{\partial (x^TAx)}{\partial x}=Ax+A^Tx$$

Proof:
Write $x^TAx=(A^Tx)^Tx$, where the $j$-th entry of $A^Tx$ is $a_{1j}x_1+a_{2j}x_2+\dots+a_{nj}x_n$. Then
$$
\begin{aligned}
\frac{\partial (x^TAx)}{\partial x}
&=\frac{\partial}{\partial x}\left(\begin{bmatrix}
a_{11}x_1+a_{21}x_2+\dots+a_{n1}x_n\\
a_{12}x_1+a_{22}x_2+\dots+a_{n2}x_n\\
\vdots\\
a_{1n}x_1+a_{2n}x_2+\dots+a_{nn}x_n
\end{bmatrix}^T x\right)\\
&=\frac{\partial}{\partial x}\big(a_{11}x_1^2+a_{21}x_2x_1+\dots+a_{n1}x_nx_1\\
&\qquad+a_{12}x_1x_2+a_{22}x_2^2+\dots+a_{n2}x_nx_2\\
&\qquad+\dots\\
&\qquad+a_{1n}x_1x_n+a_{2n}x_2x_n+\dots+a_{nn}x_nx_n\big)\\
&=\begin{bmatrix}
a_{11}x_1+a_{12}x_2+\dots+a_{1n}x_n+a_{11}x_1+a_{21}x_2+\dots+a_{n1}x_n\\
a_{21}x_1+a_{22}x_2+\dots+a_{2n}x_n+a_{12}x_1+a_{22}x_2+\dots+a_{n2}x_n\\
\vdots\\
a_{n1}x_1+a_{n2}x_2+\dots+a_{nn}x_n+a_{1n}x_1+a_{2n}x_2+\dots+a_{nn}x_n
\end{bmatrix}\\
&=\begin{bmatrix}
a_{11}x_1+a_{12}x_2+\dots+a_{1n}x_n\\
a_{21}x_1+a_{22}x_2+\dots+a_{2n}x_n\\
\vdots\\
a_{n1}x_1+a_{n2}x_2+\dots+a_{nn}x_n
\end{bmatrix}
+\begin{bmatrix}
a_{11}x_1+a_{21}x_2+\dots+a_{n1}x_n\\
a_{12}x_1+a_{22}x_2+\dots+a_{n2}x_n\\
\vdots\\
a_{1n}x_1+a_{2n}x_2+\dots+a_{nn}x_n
\end{bmatrix}\\
&=Ax+A^Tx
\end{aligned}
$$
The $k$-th component collects every term that contains $x_k$. The terms $a_{kj}x_kx_j$ contribute $(Ax)_k$, and the terms $a_{ik}x_ix_k$ contribute $(A^Tx)_k$. The square term $a_{kk}x_k^2$ is counted in both sums, which matches its derivative $2a_{kk}x_k$. With $A=I$ this recovers Formula 2.
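The proof differentiates the expanded double sum term by term, and the same bookkeeping can be sketched in code. The function `grad_by_expansion` below is a hypothetical illustration, assuming NumPy. It accumulates the derivative of every term $a_{ij}x_ix_j$ and then compares the result with $Ax+A^Tx$. The matrix is deliberately non-symmetric, so that $Ax$ and $A^Tx$ really differ.

```python
import numpy as np

def grad_by_expansion(A, x):
    """Differentiate x^T A x = sum_{i,j} a_ij x_i x_j term by term, as in the proof."""
    n = x.size
    g = np.zeros(n)
    for i in range(n):
        for j in range(n):
            g[i] += A[i, j] * x[j]  # derivative of a_ij x_i x_j w.r.t. x_i (the x_j factor stays)
            g[j] += A[i, j] * x[i]  # derivative w.r.t. x_j; when i == j both lines add, giving 2 a_ii x_i
    return g

rng = np.random.default_rng(5)
n = 4
A = rng.standard_normal((n, n))  # deliberately non-symmetric
x = rng.standard_normal(n)

g_expand = grad_by_expansion(A, x)
g_formula = A @ x + A.T @ x
print(np.allclose(g_expand, g_formula))
```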

Formula 4:
$$\frac{\partial (a^Txx^Tb)}{\partial x}=ab^Tx+ba^Tx$$
Proof:
Since $a^Tx=x^Ta$ and $x^Tb=b^Tx$, we have $a^Txx^Tb=x^T(ab^T)x$. Applying Formula 3 with $A=ab^T$ gives
$$\frac{\partial (a^Txx^Tb)}{\partial x}=\frac{\partial (x^Tab^Tx)}{\partial x}=ab^Tx+(ab^T)^Tx=ab^Tx+ba^Tx$$
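Formula 4 can be checked numerically in the same way. This is a NumPy sketch with random test vectors, and `num_grad` is a hypothetical finite-difference helper. The matrices $ab^T$ and $ba^T$ are built with `np.outer`.

```python
import numpy as np

def num_grad(f, x, h=1e-6):
    """Central-difference approximation of the gradient of a scalar function f at x."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)
    return grad

rng = np.random.default_rng(6)
a, b, x = rng.standard_normal((3, 5))  # three random vectors in R^5

numeric = num_grad(lambda v: (a @ v) * (v @ b), x)   # a^T x x^T b is a product of two scalars
formula = np.outer(a, b) @ x + np.outer(b, a) @ x    # ab^T x + ba^T x
print(np.allclose(numeric, formula, atol=1e-6))
```

Note that $ab^Tx=a\,(b^Tx)$, so in practice the formula can be evaluated as `a * (b @ x) + b * (a @ x)` without forming any $n\times n$ matrix.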

Derivatives of a scalar with respect to a matrix

Basic rules

Constant:
$$\frac{\partial c_0}{\partial X}=0^{m\times n}$$
The derivative of a constant is trivial, so no proof is given here.


Linearity:
$$\frac{\partial (c_1f(X)+c_2g(X))}{\partial X}=c_1\frac{\partial f(X)}{\partial X}+c_2\frac{\partial g(X)}{\partial X}$$
The proof is the same as for the linearity rule with respect to a vector, applied to each entry $x_{ij}$.


Product:
$$\frac{\partial (f(X)g(X))}{\partial X}=\frac{\partial f(X)}{\partial X}g(X)+f(X)\frac{\partial g(X)}{\partial X}$$
The proof is the same as for the product rule with respect to a vector.


Quotient (where $g(X)\neq 0$):
$$\frac{\partial}{\partial X}\left(\frac{f(X)}{g(X)}\right)=\frac{\frac{\partial f(X)}{\partial X}g(X)-f(X)\frac{\partial g(X)}{\partial X}}{g(X)^2}$$
The proof is the same as for the quotient rule with respect to a vector.
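The product and quotient rules with respect to a matrix can be spot-checked with a finite-difference helper that perturbs every entry $x_{ij}$. This is a minimal NumPy sketch: `num_grad` is hypothetical, and the functions and the $2\times 3$ shape are arbitrary choices. The result has the same $m\times n$ shape as $X$.

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

def f(X):
    return np.sum(np.sin(X))      # gradient: cos(X), elementwise

def g(X):
    return 1.0 + np.sum(X ** 2)   # gradient: 2X; g >= 1, so the division is safe

X = np.random.default_rng(7).standard_normal((2, 3))  # an m x n matrix, m = 2, n = 3

prod_lhs = num_grad(lambda Y: f(Y) * g(Y), X)
prod_rhs = num_grad(f, X) * g(X) + f(X) * num_grad(g, X)
quot_lhs = num_grad(lambda Y: f(Y) / g(Y), X)
quot_rhs = (num_grad(f, X) * g(X) - f(X) * num_grad(g, X)) / g(X) ** 2
print(prod_lhs.shape, np.allclose(prod_lhs, prod_rhs, atol=1e-6), np.allclose(quot_lhs, quot_rhs, atol=1e-6))
```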

Formulas

Formula 1:
$$\frac{\partial a^TXb}{\partial X}=ab^T$$
Proof:
Since $a^TXb=\sum_i\sum_j a_ix_{ij}b_j$, the coefficient of $x_{ij}$ is $a_ib_j$. Writing $X$ as $n\times n$ (the same argument works for any $m\times n$ matrix), the expansion is
$$
\begin{aligned}
a^TXb&=a_1b_1x_{11}+a_2b_1x_{21}+\dots+a_nb_1x_{n1}\\
&\quad+a_1b_2x_{12}+a_2b_2x_{22}+\dots+a_nb_2x_{n2}\\
&\quad+\dots\\
&\quad+a_1b_nx_{1n}+a_2b_nx_{2n}+\dots+a_nb_nx_{nn}
\end{aligned}
$$
so
$$\frac{\partial a^TXb}{\partial X}=\begin{bmatrix}
a_1b_1 & a_1b_2 & \cdots & a_1b_n\\
a_2b_1 & a_2b_2 & \cdots & a_2b_n\\
\vdots & \vdots & \ddots & \vdots\\
a_nb_1 & a_nb_2 & \cdots & a_nb_n
\end{bmatrix}=ab^T$$
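Here is a numerical check of Formula 1. It is a NumPy sketch in which `num_grad` is a hypothetical entry-wise finite-difference helper. The test matrix is taken to be non-square ($3\times 2$) to illustrate that the expansion argument does not rely on $X$ being square. In that case $a\in\mathbb{R}^3$ and $b\in\mathbb{R}^2$.

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(8)
m, n = 3, 2
X = rng.standard_normal((m, n))
a = rng.standard_normal(m)
b = rng.standard_normal(n)

numeric = num_grad(lambda Y: a @ Y @ b, X)   # d(a^T X b)/dX; entry (i, j) should be a_i b_j
print(np.allclose(numeric, np.outer(a, b), atol=1e-6))
```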

Formula 2:
$$\frac{\partial a^TX^Tb}{\partial X}=ba^T$$
Proof:
Since $a^TX^Tb=b^TXa=\sum_i\sum_j b_ix_{ij}a_j$, the coefficient of $x_{ij}$ is $b_ia_j$. Expanding with $X$ written as $n\times n$:
$$
\begin{aligned}
a^TX^Tb&=a_1b_1x_{11}+a_2b_1x_{12}+\dots+a_nb_1x_{1n}\\
&\quad+a_1b_2x_{21}+a_2b_2x_{22}+\dots+a_nb_2x_{2n}\\
&\quad+\dots\\
&\quad+a_1b_nx_{n1}+a_2b_nx_{n2}+\dots+a_nb_nx_{nn}
\end{aligned}
$$
so
$$\frac{\partial a^TX^Tb}{\partial X}=\begin{bmatrix}
a_1b_1 & a_2b_1 & \cdots & a_nb_1\\
a_1b_2 & a_2b_2 & \cdots & a_nb_2\\
\vdots & \vdots & \ddots & \vdots\\
a_1b_n & a_2b_n & \cdots & a_nb_n
\end{bmatrix}=ba^T$$
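The same kind of check applies to Formula 2. In this NumPy sketch, `num_grad` is hypothetical and $X$ is a random $m\times n$ test matrix. Here $a$ must have length $n$ and $b$ length $m$, so the gradient $ba^T$ again has the shape of $X$.

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(9)
m, n = 3, 2
X = rng.standard_normal((m, n))
a = rng.standard_normal(n)   # a^T X^T is defined when a has length n
b = rng.standard_normal(m)

numeric = num_grad(lambda Y: a @ Y.T @ b, X)   # d(a^T X^T b)/dX; entry (i, j) should be b_i a_j
print(np.allclose(numeric, np.outer(b, a), atol=1e-6))
```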

Formula 3:
$$\frac{\partial a^TXX^Tb}{\partial X}=ab^TX+ba^TX$$
The proof follows the same pattern as Formula 3 for derivatives with respect to a vector, but expanding $a^TXX^Tb$ by hand is very tedious, so it is omitted here.
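Since the proof is omitted, a numerical check gives at least some confidence in the formula. The sketch below assumes NumPy, a random $m\times n$ matrix $X$, and $a,b\in\mathbb{R}^m$. `num_grad` is a hypothetical finite-difference helper.

```python
import numpy as np

def num_grad(f, X, h=1e-6):
    """Central-difference gradient of a scalar function f w.r.t. every entry of X."""
    G = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        E = np.zeros_like(X)
        E[idx] = h
        G[idx] = (f(X + E) - f(X - E)) / (2 * h)
    return G

rng = np.random.default_rng(10)
m, n = 3, 2
X = rng.standard_normal((m, n))
a = rng.standard_normal(m)
b = rng.standard_normal(m)

numeric = num_grad(lambda Y: a @ Y @ Y.T @ b, X)       # d(a^T X X^T b)/dX
formula = np.outer(a, b) @ X + np.outer(b, a) @ X      # ab^T X + ba^T X
print(np.allclose(numeric, formula, atol=1e-6))
```

With $n=1$, $X$ is a single column $x$, and the formula reduces to Formula 4 for vectors, $ab^Tx+ba^Tx$.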

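Afterword

This post was a real pain to write: typesetting the proofs in KaTeX was pure torture. The next post will cover the properties of the matrix trace.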
