Common Matrix Differentiation Formulas

1. Derivative of a Matrix

If every element $a_{ij}(t)$ of the matrix $\boldsymbol A(t)=[a_{ij}(t)]_{m\times n}$ is a differentiable function of the variable $t$, the matrix $\boldsymbol A(t)$ is said to be differentiable, and its derivative is defined as:

$$
\frac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\frac{\mathrm{d}a_{ij}(t)}{\mathrm{d}t}\right]_{m\times n}=\left[\begin{matrix}
\dfrac{\mathrm{d}a_{11}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{12}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{1n}(t)}{\mathrm{d}t} \\
\dfrac{\mathrm{d}a_{21}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{22}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{2n}(t)}{\mathrm{d}t} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\mathrm{d}a_{m1}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{m2}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{mn}(t)}{\mathrm{d}t}
\end{matrix}\right]
$$

  • When $m=1$, the matrix $\boldsymbol A(t)=[a_1(t),a_2(t),\cdots,a_n(t)]$ is a (row) vector-valued function, and

    $$\frac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\frac{\mathrm{d}a_{j}(t)}{\mathrm{d}t}\right]_{1\times n}=\left[\begin{matrix}\dfrac{\mathrm{d}a_{1}(t)}{\mathrm{d}t} & \dfrac{\mathrm{d}a_{2}(t)}{\mathrm{d}t} & \cdots & \dfrac{\mathrm{d}a_{n}(t)}{\mathrm{d}t}\end{matrix}\right]_{1\times n}$$

  • When $n=1$, the matrix $\boldsymbol A(t)=[a_1(t),a_2(t),\cdots,a_m(t)]^T$ is a (column) vector-valued function, and

    $$\frac{\mathrm{d}\boldsymbol A(t)}{\mathrm{d}t}=\left[\frac{\mathrm{d}a_{i}(t)}{\mathrm{d}t}\right]_{m\times 1}=\left[\begin{matrix}\dfrac{\mathrm{d}a_{1}(t)}{\mathrm{d}t}\\ \dfrac{\mathrm{d}a_{2}(t)}{\mathrm{d}t}\\ \vdots\\ \dfrac{\mathrm{d}a_{m}(t)}{\mathrm{d}t}\end{matrix}\right]_{m\times 1}$$
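As a numerical sanity check of the element-wise definition, the sketch below differentiates a small matrix function entry by entry with a central difference and compares the result with the hand-derived derivative. The matrix function `A(t)`, the step `h=1e-6`, and the helper `matrix_derivative` are illustrative assumptions, not part of the original text; plain Python lists stand in for matrices so the example needs no third-party library. Row and column vector-valued functions are simply the $1\times n$ and $m\times 1$ cases of the same helper.

```python
import math

def A(t):
    """Hypothetical 2x3 matrix function A(t), chosen only for illustration."""
    return [[t ** 2, math.sin(t), 1.0],
            [math.exp(t), t ** 3, math.cos(t)]]

def dA_exact(t):
    """Element-wise derivative of A(t), computed by hand."""
    return [[2 * t, math.cos(t), 0.0],
            [math.exp(t), 3 * t ** 2, -math.sin(t)]]

def matrix_derivative(F, t, h=1e-6):
    """Approximate dF/dt entry by entry with a central difference."""
    Fp, Fm = F(t + h), F(t - h)
    return [[(p - m) / (2 * h) for p, m in zip(row_p, row_m)]
            for row_p, row_m in zip(Fp, Fm)]

t0 = 0.7
approx = matrix_derivative(A, t0)
exact = dA_exact(t0)
# Largest entry-wise discrepancy between the numeric and exact derivative
max_err = max(abs(a - e) for ra, rx in zip(approx, exact) for a, e in zip(ra, rx))
print(f"max |numeric - exact| = {max_err:.2e}")
```

The result has the same $m\times n$ shape as $\boldsymbol A(t)$, as the definition requires.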

2. Derivative of a Multivariate Function with Respect to a Matrix

Let $\boldsymbol X=[x_{ij}]_{m\times n}$ be a matrix and consider the function of its $mn$ entries $f(\boldsymbol X)=f(x_{11},x_{12},\cdots,x_{1n},\cdots,x_{m1},x_{m2},\cdots,x_{mn})$. The derivative of $f(\boldsymbol X)$ with respect to the matrix $\boldsymbol X$ is defined as:

$$
\frac{\mathrm{d}f(\boldsymbol X)}{\mathrm{d}\boldsymbol X}=\left[\frac{\partial f}{\partial x_{ij}}\right]_{m\times n}=\left[\begin{matrix}
\dfrac{\partial f}{\partial x_{11}} & \dfrac{\partial f}{\partial x_{12}} & \cdots & \dfrac{\partial f}{\partial x_{1n}} \\
\dfrac{\partial f}{\partial x_{21}} & \dfrac{\partial f}{\partial x_{22}} & \cdots & \dfrac{\partial f}{\partial x_{2n}} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial f}{\partial x_{m1}} & \dfrac{\partial f}{\partial x_{m2}} & \cdots & \dfrac{\partial f}{\partial x_{mn}}
\end{matrix}\right]
$$
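A minimal sketch of this definition, assuming the hypothetical test function $f(\boldsymbol X)=\sum_{i,j}x_{ij}^2$, whose derivative by the definition above is $2\boldsymbol X$. Each entry $\partial f/\partial x_{ij}$ is estimated by perturbing only $x_{ij}$, so the result has the same $m\times n$ shape as $\boldsymbol X$. The helper name `grad_wrt_matrix` and the step size are illustrative choices.

```python
def f(X):
    """Hypothetical example: f(X) = sum of squares of all entries of X."""
    return sum(x * x for row in X for x in row)

def grad_wrt_matrix(f, X, h=1e-6):
    """Return the m x n matrix [df/dx_ij] using central differences."""
    G = []
    for i, row in enumerate(X):
        G_row = []
        for j in range(len(row)):
            Xp = [r[:] for r in X]   # copies, so only x_ij is perturbed
            Xm = [r[:] for r in X]
            Xp[i][j] += h
            Xm[i][j] -= h
            G_row.append((f(Xp) - f(Xm)) / (2 * h))
        G.append(G_row)
    return G

X = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.5]]
G = grad_wrt_matrix(f, X)
expected = [[2 * x for x in row] for row in X]   # analytic result: df/dX = 2X
print([[round(g, 6) for g in row] for row in G])
```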

3. Derivative of a Multivariate Function with Respect to a (Column) Vector

Let $\boldsymbol x=[x_1,x_2,\cdots,x_n]^T$ be an $n$-dimensional (column) vector and consider the $n$-variable function $f(\boldsymbol x)=f(x_{1},x_{2},\cdots,x_{n})$. Then:

$$
\frac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right]^T=\left[\begin{matrix}\dfrac{\partial f}{\partial x_1}\\ \dfrac{\partial f}{\partial x_2}\\ \vdots\\ \dfrac{\partial f}{\partial x_n}\end{matrix}\right]
$$

This column vector is the gradient of $f(\boldsymbol x)$: $\nabla f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}$.

$$
\frac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}=\left[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right]
$$

This row vector is the transpose of the gradient of $f(\boldsymbol x)$: $\nabla^T f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}$.

Therefore

$$
\nabla f(\boldsymbol x)=\frac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\frac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}\right]^T
$$
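The sketch below, assuming the hypothetical function $f(\boldsymbol x)=x_1^2+3x_1x_2+\sin x_3$, estimates the partial derivatives numerically and stores them once as an $n\times1$ column (the gradient $\nabla f$) and once as a $1\times n$ row ($\nabla^T f$), then checks that one is the transpose of the other. Nested lists are used to make the column/row distinction explicit; the helpers `partials` and `transpose` are illustrative.

```python
import math

def f(x):
    """Hypothetical example: f(x) = x1^2 + 3*x1*x2 + sin(x3)."""
    x1, x2, x3 = x
    return x1 ** 2 + 3 * x1 * x2 + math.sin(x3)

def partials(f, x, h=1e-6):
    """Central-difference estimates of [df/dx_1, ..., df/dx_n]."""
    out = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        out.append((f(xp) - f(xm)) / (2 * h))
    return out

def transpose(M):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*M)]

x = [1.0, 2.0, 0.5]
p = partials(f, x)
grad_col = [[g] for g in p]   # df/dx   : n x 1 column vector (the gradient)
grad_row = [p]                # df/dx^T : 1 x n row vector
exact = [2 * x[0] + 3 * x[1], 3 * x[0], math.cos(x[2])]
print(grad_col, grad_row)
```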

Common Formulas

(1) The Hessian matrix:

$$
\nabla^T\{\nabla f(\boldsymbol x)\}=\frac{\mathrm{d}}{\mathrm{d}\boldsymbol x^T}\left(\frac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}\right)\quad\text{or}\quad\nabla\{\nabla^T f(\boldsymbol x)\}=\frac{\mathrm{d}}{\mathrm{d}\boldsymbol x}\left(\frac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}\right)
$$

The two expressions agree whenever the second-order partial derivatives of $f$ are continuous, because the mixed partials are then equal and the Hessian is symmetric. Written out entry by entry:

$$
\frac{\mathrm{d}}{\mathrm{d}\boldsymbol x^T}\left(\frac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right)=\left[\begin{matrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1\partial x_n} \\
\dfrac{\partial^2 f}{\partial x_2\partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2\partial x_n} \\
\vdots & \vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n\partial x_1} & \dfrac{\partial^2 f}{\partial x_n\partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{matrix}\right]
$$
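To see the layout $\dfrac{\mathrm{d}}{\mathrm{d}\boldsymbol x^T}\left(\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right)$ concretely, the sketch below builds the Hessian one column at a time: perturbing $x_j$ and differencing the gradient vector gives column $j$, whose $i$-th entry is $\partial^2 f/\partial x_i\partial x_j$. The test function $f(\boldsymbol x)=x_1^2x_2+3x_2^2$ (with a hand-derived gradient) is a hypothetical example; its exact Hessian is $\begin{bmatrix}2x_2&2x_1\\2x_1&6\end{bmatrix}$.

```python
def grad(x):
    """Analytic gradient of the hypothetical f(x) = x1^2 * x2 + 3 * x2^2."""
    x1, x2 = x
    return [2 * x1 * x2, x1 ** 2 + 6 * x2]

def hessian_from_grad(grad, x, h=1e-6):
    """H[i][j] = d(df/dx_i)/dx_j, i.e. differentiate the gradient w.r.t. x^T."""
    n = len(x)
    H = [[0.0] * n for _ in range(n)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        gp, gm = grad(xp), grad(xm)
        for i in range(n):
            H[i][j] = (gp[i] - gm[i]) / (2 * h)
    return H

x = [1.5, -0.5]
H = hessian_from_grad(grad, x)
exact = [[2 * x[1], 2 * x[0]],
         [2 * x[0], 6.0]]
print(H)
```

Because $f$ here is smooth, the computed matrix is symmetric up to rounding, matching the remark that both Hessian expressions coincide.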

(2) The quadratic function $f(\boldsymbol x)=\boldsymbol x^T\boldsymbol A\boldsymbol x$ has the derivative $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=(\boldsymbol A+\boldsymbol A^T)\boldsymbol x$.

If $\boldsymbol A=[a_{ij}]_{n\times n}$ is a symmetric matrix, then $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=2\boldsymbol A\boldsymbol x$.

Proof:
$$
\begin{aligned}
f(\boldsymbol x)&=\boldsymbol x^T\boldsymbol A\boldsymbol x=\sum_{i=1}^{n}\sum_{j=1}^{n}a_{ij}x_ix_j\\
&=x_1\sum_{j=1}^{n}a_{1j}x_j+x_2\sum_{j=1}^{n}a_{2j}x_j+\cdots+x_k\sum_{j=1}^{n}a_{kj}x_j+\cdots+x_n\sum_{j=1}^{n}a_{nj}x_j
\end{aligned}
$$

Differentiating with respect to $x_k$: each term $x_i\sum_j a_{ij}x_j$ with $i\neq k$ contributes $x_ia_{ik}$, while the $k$-th term $x_k\sum_j a_{kj}x_j$ contributes $\sum_j a_{kj}x_j+x_ka_{kk}$ by the product rule. Hence

$$
\begin{aligned}
\frac{\partial f}{\partial x_k}&=x_1a_{1k}+x_2a_{2k}+\cdots+\left(\sum_{j=1}^{n}a_{kj}x_j+x_ka_{kk}\right)+\cdots+x_na_{nk}\\
&=(x_1a_{1k}+x_2a_{2k}+\cdots+x_ka_{kk}+\cdots+x_na_{nk})+\sum_{j=1}^{n}a_{kj}x_j\\
&=\sum_{i=1}^{n}a_{ik}x_i+\sum_{j=1}^{n}a_{kj}x_j
\end{aligned}
$$

$$
\begin{aligned}
\frac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}&=\left[\begin{matrix}\dfrac{\partial f}{\partial x_1}\\ \vdots\\ \dfrac{\partial f}{\partial x_k}\\ \vdots\\ \dfrac{\partial f}{\partial x_n}\end{matrix}\right]
=\left[\begin{matrix}\displaystyle\sum_{i=1}^{n}a_{i1}x_i+\sum_{j=1}^{n}a_{1j}x_j\\ \vdots\\ \displaystyle\sum_{i=1}^{n}a_{ik}x_i+\sum_{j=1}^{n}a_{kj}x_j\\ \vdots\\ \displaystyle\sum_{i=1}^{n}a_{in}x_i+\sum_{j=1}^{n}a_{nj}x_j\end{matrix}\right]
=\left[\begin{matrix}\displaystyle\sum_{i=1}^{n}a_{i1}x_i\\ \vdots\\ \displaystyle\sum_{i=1}^{n}a_{ik}x_i\\ \vdots\\ \displaystyle\sum_{i=1}^{n}a_{in}x_i\end{matrix}\right]
+\left[\begin{matrix}\displaystyle\sum_{j=1}^{n}a_{1j}x_j\\ \vdots\\ \displaystyle\sum_{j=1}^{n}a_{kj}x_j\\ \vdots\\ \displaystyle\sum_{j=1}^{n}a_{nj}x_j\end{matrix}\right]\\
&=\boldsymbol A^T\boldsymbol x+\boldsymbol A\boldsymbol x\\
&=(\boldsymbol A+\boldsymbol A^T)\boldsymbol x
\end{aligned}
$$

The first vector is $\boldsymbol A^T\boldsymbol x$ because its $k$-th entry $\sum_i a_{ik}x_i$ pairs $\boldsymbol x$ with the $k$-th column of $\boldsymbol A$; the second is $\boldsymbol A\boldsymbol x$ because its $k$-th entry $\sum_j a_{kj}x_j$ pairs $\boldsymbol x$ with the $k$-th row.
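A quick numerical check of formula (2), assuming a deliberately non-symmetric $3\times3$ matrix $\boldsymbol A$ and a vector $\boldsymbol x$ chosen only for illustration: the finite-difference gradient of $\boldsymbol x^T\boldsymbol A\boldsymbol x$ matches $(\boldsymbol A+\boldsymbol A^T)\boldsymbol x$, while the shortcut $2\boldsymbol A\boldsymbol x$ (valid only for symmetric $\boldsymbol A$) does not. The helper functions are plain-list stand-ins for a linear algebra library.

```python
def matvec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    return [list(col) for col in zip(*M)]

def quad(A, x):
    """f(x) = x^T A x"""
    return sum(xi * yi for xi, yi in zip(x, matvec(A, x)))

def num_grad(f, x, h=1e-6):
    """Central-difference estimate of the column gradient df/dx."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

A = [[1.0, 2.0, 0.0],
     [-1.0, 3.0, 4.0],
     [0.5, 0.0, -2.0]]          # deliberately non-symmetric
x = [1.0, -2.0, 3.0]

g_num = num_grad(lambda v: quad(A, v), x)
g_formula = [a + b for a, b in zip(matvec(A, x), matvec(transpose(A), x))]  # (A + A^T)x
g_naive = [2 * v for v in matvec(A, x)]   # 2Ax, only correct for symmetric A
print(g_num, g_formula, g_naive)
```

For this $\boldsymbol A$ and $\boldsymbol x$, $(\boldsymbol A+\boldsymbol A^T)\boldsymbol x=[1.5,1,-19.5]^T$, whereas $2\boldsymbol A\boldsymbol x=[-6,10,-11]^T$.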

(3) The linear function $f(\boldsymbol x)=\boldsymbol b^T\boldsymbol x$ has the derivative $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\boldsymbol b$, or equivalently $\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}=\boldsymbol b^T$.

If $\boldsymbol b$ is instead treated as the variable, then since $\boldsymbol b^T\boldsymbol x=\boldsymbol x^T\boldsymbol b$, we have $\dfrac{\mathrm{d}f(\boldsymbol b)}{\mathrm{d}\boldsymbol b}=\boldsymbol x$.

Proof: $f(\boldsymbol x)=\boldsymbol b^T\boldsymbol x=\sum_{i=1}^{n}b_ix_i$, so $\dfrac{\partial f}{\partial x_k}=b_k$ for every $k$, and therefore

$$
\frac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x}=\left[\begin{matrix}\dfrac{\partial f}{\partial x_1}\\ \vdots\\ \dfrac{\partial f}{\partial x_k}\\ \vdots\\ \dfrac{\partial f}{\partial x_n}\end{matrix}\right]=\left[\begin{matrix}b_1\\ \vdots\\ b_k\\ \vdots\\ b_n\end{matrix}\right]=\boldsymbol b
$$
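Formula (3) can be checked the same way. In this sketch, with arbitrary illustrative vectors `b` and `x`, differentiating $\boldsymbol b^T\boldsymbol x$ numerically with respect to $\boldsymbol x$ recovers $\boldsymbol b$, and treating $\boldsymbol b$ as the variable recovers $\boldsymbol x$.

```python
def dot(u, v):
    """u^T v for plain Python lists."""
    return sum(ui * vi for ui, vi in zip(u, v))

def num_grad(f, x, h=1e-6):
    """Central-difference estimate of the column gradient df/dx."""
    g = []
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

b = [2.0, -1.0, 0.5]
x = [3.0, 4.0, -2.0]
g_x = num_grad(lambda v: dot(b, v), x)   # x is the variable: expect b
g_b = num_grad(lambda v: dot(v, x), b)   # b is the variable: expect x
print(g_x, g_b)
```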

4. Chain Rule for a Function of One Variable Through a Vector

Given a vector-valued function $\boldsymbol x(t)=[x_1(t),x_2(t),\cdots,x_n(t)]^T$, consider the composite function $f(\boldsymbol x(t))=f(x_1(t),x_2(t),\cdots,x_n(t))$, which depends on the single variable $t$. Then:

$$
\frac{\mathrm{d}f}{\mathrm{d}t}=\left[\frac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right]^T\frac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}=\frac{\mathrm{d}f}{\mathrm{d}\boldsymbol x^T}\frac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}
$$

Moreover, since $\nabla^T f(\boldsymbol x)=\dfrac{\mathrm{d}f(\boldsymbol x)}{\mathrm{d}\boldsymbol x^T}$, it follows that $\dfrac{\mathrm{d}f}{\mathrm{d}t}=\dfrac{\mathrm{d}f}{\mathrm{d}\boldsymbol x^T}\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}=\nabla^T f(\boldsymbol x)\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}$.

Proof (by the multivariate chain rule):

$$
\begin{aligned}
\frac{\mathrm{d}f}{\mathrm{d}t}&=\frac{\partial f}{\partial x_1}\frac{\mathrm{d}x_1}{\mathrm{d}t}+\frac{\partial f}{\partial x_2}\frac{\mathrm{d}x_2}{\mathrm{d}t}+\cdots+\frac{\partial f}{\partial x_n}\frac{\mathrm{d}x_n}{\mathrm{d}t}\\
&=\left[\frac{\partial f}{\partial x_1},\frac{\partial f}{\partial x_2},\cdots,\frac{\partial f}{\partial x_n}\right]\left[\begin{matrix}\dfrac{\mathrm{d}x_1}{\mathrm{d}t}\\ \dfrac{\mathrm{d}x_2}{\mathrm{d}t}\\ \vdots\\ \dfrac{\mathrm{d}x_n}{\mathrm{d}t}\end{matrix}\right]=\left[\frac{\mathrm{d}f}{\mathrm{d}\boldsymbol x}\right]^T\frac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}=\frac{\mathrm{d}f}{\mathrm{d}\boldsymbol x^T}\frac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}
\end{aligned}
$$
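The following sketch checks the chain rule on a hypothetical curve $\boldsymbol x(t)=[\cos t,\sin t,t^2]^T$ and function $f(\boldsymbol x)=x_1x_2+x_3^2$ (both chosen for illustration). The derivative of the composite $f(\boldsymbol x(t))$, estimated directly by a central difference in $t$, is compared with the product $\nabla^T f(\boldsymbol x)\,\dfrac{\mathrm{d}\boldsymbol x}{\mathrm{d}t}$ of a row vector and a column vector.

```python
import math

def x_of_t(t):
    """Hypothetical curve x(t) = [cos t, sin t, t^2]^T."""
    return [math.cos(t), math.sin(t), t ** 2]

def dx_dt(t):
    return [-math.sin(t), math.cos(t), 2 * t]

def f(x):
    """Hypothetical f(x) = x1 * x2 + x3^2."""
    return x[0] * x[1] + x[2] ** 2

def grad_f(x):
    return [x[1], x[0], 2 * x[2]]

t0, h = 0.8, 1e-6
# Left side: differentiate the composite g(t) = f(x(t)) directly
lhs = (f(x_of_t(t0 + h)) - f(x_of_t(t0 - h))) / (2 * h)
# Right side: row vector df/dx^T times column vector dx/dt
rhs = sum(g * v for g, v in zip(grad_f(x_of_t(t0)), dx_dt(t0)))
print(lhs, rhs)
```

Here $f(\boldsymbol x(t))=\cos t\sin t+t^4$, so both sides should also agree with the closed form $\cos 2t+4t^3$.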

5. Taylor Series

First consider the two-dimensional case $\boldsymbol x=[x_1,x_2]^T$. Then

$$
\begin{aligned}
f(x_1+\delta_1,x_2+\delta_2)&=f(x_1,x_2)+\frac{\partial f}{\partial x_1}\delta_1+\frac{\partial f}{\partial x_2}\delta_2\\
&\quad+\frac{1}{2}\left(\frac{\partial^2 f}{\partial x_1^2}\delta_1^2+2\frac{\partial^2 f}{\partial x_1\partial x_2}\delta_1\delta_2+\frac{\partial^2 f}{\partial x_2^2}\delta_2^2\right)\\
&\quad+o\left(\Vert\boldsymbol\delta\Vert^2\right)
\end{aligned}
$$

where all partial derivatives are evaluated at $(x_1,x_2)$ and $\boldsymbol\delta=[\delta_1,\delta_2]^T$. The factor 2 on the mixed term collects the two equal contributions $\dfrac{\partial^2 f}{\partial x_1\partial x_2}$ and $\dfrac{\partial^2 f}{\partial x_2\partial x_1}$.

Extending to the $n$-dimensional case $\boldsymbol x=[x_1,x_2,\cdots,x_n]^T$:

$$
\begin{aligned}
f(x_1+\delta_1,x_2+\delta_2,\cdots,x_n+\delta_n)&=f(x_1,x_2,\cdots,x_n)+\sum_{i=1}^n\frac{\partial f}{\partial x_i}\delta_i\\
&\quad+\frac{1}{2}\sum_{i=1}^n\sum_{j=1}^n\frac{\partial^2 f}{\partial x_i\partial x_j}\delta_i\delta_j\\
&\quad+o\left(\Vert\boldsymbol\delta\Vert^2\right)
\end{aligned}
$$

In matrix form:

$$
f(\boldsymbol x+\boldsymbol\delta)=f(\boldsymbol x)+\nabla f(\boldsymbol x)^T\boldsymbol\delta+\frac{1}{2}\boldsymbol\delta^T\nabla^2 f(\boldsymbol x)\boldsymbol\delta+o\left(\Vert\boldsymbol\delta\Vert^2\right)
$$

where $\boldsymbol\delta=[\delta_1,\delta_2,\cdots,\delta_n]^T$.

Equivalently, expanding the function $f(\boldsymbol x)$ about the point $\bar{\boldsymbol x}$:

$$
f(\boldsymbol x)=f(\bar{\boldsymbol x})+\nabla f(\bar{\boldsymbol x})^T(\boldsymbol x-\bar{\boldsymbol x})+\frac{1}{2}(\boldsymbol x-\bar{\boldsymbol x})^T\nabla^2 f(\bar{\boldsymbol x})(\boldsymbol x-\bar{\boldsymbol x})+o\left(\Vert\boldsymbol x-\bar{\boldsymbol x}\Vert^2\right)
$$

[Note] Here $\nabla f(\boldsymbol x)$ denotes the gradient and $\nabla^2 f(\boldsymbol x)$ denotes the Hessian matrix (not the Laplacian operator used in PDEs).
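A minimal sketch of what the $o(\Vert\boldsymbol\delta\Vert^2)$ remainder means in practice, assuming the hypothetical function $f(\boldsymbol x)=e^{x_1}+x_1x_2^2$ with a hand-derived gradient and Hessian. The error of the second-order model $f(\boldsymbol x)+\nabla f(\boldsymbol x)^T\boldsymbol\delta+\frac12\boldsymbol\delta^T\nabla^2 f(\boldsymbol x)\boldsymbol\delta$ is measured for two step lengths $s=\Vert\boldsymbol\delta\Vert$ along a fixed unit direction; the point, direction, and step lengths are illustrative choices.

```python
import math

def f(x):
    """Hypothetical smooth test function f(x) = exp(x1) + x1 * x2^2."""
    return math.exp(x[0]) + x[0] * x[1] ** 2

def grad(x):
    return [math.exp(x[0]) + x[1] ** 2, 2 * x[0] * x[1]]

def hess(x):
    return [[math.exp(x[0]), 2 * x[1]],
            [2 * x[1], 2 * x[0]]]

def taylor2(x, d):
    """Second-order model f(x) + grad^T d + 0.5 * d^T H d."""
    g, H = grad(x), hess(x)
    lin = sum(gi * di for gi, di in zip(g, d))
    quad = sum(d[i] * H[i][j] * d[j] for i in range(len(d)) for j in range(len(d)))
    return f(x) + lin + 0.5 * quad

x = [0.3, -0.7]
direction = [0.6, 0.8]          # unit vector, so ||delta|| = s
errors = {}
for s in (1e-1, 1e-2):
    delta = [s * c for c in direction]
    x_new = [xi + di for xi, di in zip(x, delta)]
    errors[s] = abs(f(x_new) - taylor2(x, delta))
    print(f"s={s:g}  error={errors[s]:.3e}  error/s^2={errors[s] / s ** 2:.3e}")
```

For this smooth $f$ the error shrinks roughly like $s^3$, so `error/s^2` drops by about a factor of 10 when $s$ does, consistent with the remainder being $o(\Vert\boldsymbol\delta\Vert^2)$.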
