Matrix Differentiation

1. Definitions

Let
$$A=(a_{ij}(\theta))_{p\times q}=\begin{bmatrix} a_{11}(\theta)&a_{12}(\theta)&\cdots&a_{1q}(\theta)\\ a_{21}(\theta)&a_{22}(\theta)&\cdots&a_{2q}(\theta)\\ \vdots&\vdots&\ddots&\vdots\\ a_{p1}(\theta)&a_{p2}(\theta)&\cdots&a_{pq}(\theta) \end{bmatrix},\quad\theta\in\mathbb{R}.$$
The derivative of $A$ with respect to $\theta$ is defined entry by entry:
$$\frac{\partial A}{\partial \theta}=\left(\frac{\partial a_{ij}(\theta)}{\partial\theta}\right)_{p\times q}=\begin{bmatrix} \frac{\partial a_{11}(\theta)}{\partial\theta}&\frac{\partial a_{12}(\theta)}{\partial\theta}&\cdots&\frac{\partial a_{1q}(\theta)}{\partial\theta}\\ \frac{\partial a_{21}(\theta)}{\partial\theta}&\frac{\partial a_{22}(\theta)}{\partial\theta}&\cdots&\frac{\partial a_{2q}(\theta)}{\partial\theta}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial a_{p1}(\theta)}{\partial\theta}&\frac{\partial a_{p2}(\theta)}{\partial\theta}&\cdots&\frac{\partial a_{pq}(\theta)}{\partial\theta} \end{bmatrix}.$$
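The element-wise definition can be checked numerically. The sketch below assumes NumPy. The 3×2 matrix function `A` and its hand-derived derivative `dA_exact` are hypothetical examples, not taken from the original. It compares `dA_exact` with a central finite difference (step `h = 1e-6`, an arbitrary choice) applied to all entries at once.

```python
import numpy as np

def A(theta):
    # Hypothetical 3x2 matrix-valued function of a scalar theta
    return np.array([[np.sin(theta), theta**2],
                     [np.exp(theta), 1.0],
                     [theta, np.cos(theta)]])

def dA_exact(theta):
    # Element-wise derivative: entry (i, j) is d a_ij(theta) / d theta
    return np.array([[np.cos(theta), 2 * theta],
                     [np.exp(theta), 0.0],
                     [1.0, -np.sin(theta)]])

def dA_numeric(f, theta, h=1e-6):
    # Central difference; NumPy subtracts the two matrices entry by entry
    return (f(theta + h) - f(theta - h)) / (2 * h)

theta0 = 0.7
numeric = dA_numeric(A, theta0)
print(numeric.shape)                                       # (3, 2)
print(np.allclose(numeric, dA_exact(theta0), atol=1e-6))  # True
```

The derivative keeps the $p\times q$ shape of $A$ because every entry is differentiated separately.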

Let $a=(a_1(\theta),\dots,a_k(\theta))^T$ with $\theta=(\theta_1,\dots,\theta_l)^T\in\mathbb{R}^l$. The derivative of $a$ with respect to $\theta$ is defined as the $k\times l$ matrix
$$\frac{\partial a}{\partial \theta'}=\begin{bmatrix} \frac{\partial a_1(\theta)}{\partial\theta_1}&\frac{\partial a_1(\theta)}{\partial\theta_2}&\cdots&\frac{\partial a_1(\theta)}{\partial\theta_l}\\ \frac{\partial a_2(\theta)}{\partial\theta_1}&\frac{\partial a_2(\theta)}{\partial\theta_2}&\cdots&\frac{\partial a_2(\theta)}{\partial\theta_l}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial a_k(\theta)}{\partial\theta_1}&\frac{\partial a_k(\theta)}{\partial\theta_2}&\cdots&\frac{\partial a_k(\theta)}{\partial\theta_l} \end{bmatrix}_{k\times l},$$
Its transpose is the $l\times k$ matrix below; here and throughout, a prime $'$ denotes transposition, like $^T$:
$$\frac{\partial a'}{\partial \theta}=\left(\frac{\partial a}{\partial \theta'}\right)'=\begin{bmatrix} \frac{\partial a_1(\theta)}{\partial\theta_1}&\frac{\partial a_2(\theta)}{\partial\theta_1}&\cdots&\frac{\partial a_k(\theta)}{\partial\theta_1}\\ \frac{\partial a_1(\theta)}{\partial\theta_2}&\frac{\partial a_2(\theta)}{\partial\theta_2}&\cdots&\frac{\partial a_k(\theta)}{\partial\theta_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial a_1(\theta)}{\partial\theta_l}&\frac{\partial a_2(\theta)}{\partial\theta_l}&\cdots&\frac{\partial a_k(\theta)}{\partial\theta_l} \end{bmatrix}_{l\times k}.$$
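The two layouts can be made concrete with a small numerical Jacobian. This is a minimal sketch that assumes NumPy and central differences. The helper `jacobian` and the example map `a` (with $k=3$, $l=2$) are hypothetical illustrations. Row $i$ of the result corresponds to $a_i$ and column $m$ to $\theta_m$, which matches $\partial a/\partial\theta'$. Its transpose gives $\partial a'/\partial\theta$.

```python
import numpy as np

def jacobian(f, theta, h=1e-6):
    """Numerical d a / d theta' (k x l): row i is a_i, column m is theta_m."""
    theta = np.asarray(theta, dtype=float)
    k, l = np.asarray(f(theta)).size, theta.size
    J = np.zeros((k, l))
    for m in range(l):
        e = np.zeros(l)
        e[m] = h  # perturb only theta_m
        J[:, m] = (f(theta + e) - f(theta - e)) / (2 * h)
    return J

def a(theta):
    # Hypothetical example with k = 3 components and l = 2 parameters
    t1, t2 = theta
    return np.array([t1 * t2, np.sin(t1), t2**2])

theta0 = np.array([0.3, -1.2])
J = jacobian(a, theta0)   # d a / d theta'  -> k x l
Jt = J.T                  # d a' / d theta  -> l x k
exact = np.array([[theta0[1], theta0[0]],
                  [np.cos(theta0[0]), 0.0],
                  [0.0, 2 * theta0[1]]])
print(J.shape, Jt.shape)                 # (3, 2) (2, 3)
print(np.allclose(J, exact, atol=1e-6))  # True
```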

2. Properties

(i) (Inner product) Let $a=(a_1(\theta),\dots,a_k(\theta))^T$, $b=(b_1(\theta),\dots,b_k(\theta))^T$ and $\theta=(\theta_1,\dots,\theta_l)^T\in\mathbb{R}^l$. Then
$$\frac{\partial (a'b)}{\partial \theta}=\left(\frac{\partial a'}{\partial\theta}\right)b+\left(\frac{\partial b'}{\partial\theta}\right)a.$$
Proof: For each $j=1,\dots,l$,
$$\begin{aligned} \frac{\partial\, a'(\theta)b(\theta)}{\partial \theta_{j}} &=\frac{\partial \sum_{i=1}^{k} a_{i}(\theta) b_{i}(\theta)}{\partial \theta_{j}}=\sum_{i=1}^{k}\frac{\partial\, a_{i}(\theta) b_{i}(\theta)}{\partial \theta_{j}} \\ &=\sum_{i=1}^{k}\left[\frac{\partial a_{i}(\theta)}{\partial \theta_{j}} b_{i}(\theta)+a_{i}(\theta) \frac{\partial b_{i}(\theta)}{\partial \theta_{j}}\right] \\ &=\sum_{i=1}^{k}\left(\frac{\partial a_{i}(\theta)}{\partial \theta_{j}} b_{i}(\theta)\right)+\sum_{i=1}^{k}\left(a_{i}(\theta) \frac{\partial b_{i}(\theta)}{\partial \theta_{j}}\right) \\ &=\frac{\partial a'(\theta)}{\partial \theta_{j}} b(\theta)+\frac{\partial b'(\theta)}{\partial \theta_{j}} a(\theta), \end{aligned}$$
Here $\frac{\partial a'(\theta)}{\partial\theta_j}=\left(\frac{\partial a_1(\theta)}{\partial\theta_j},\dots,\frac{\partial a_k(\theta)}{\partial\theta_j}\right)$ is the $j$-th row of $\frac{\partial a'}{\partial\theta}$, and the same holds for $b$. Stacking the $l$ equations into a column vector gives
$$\begin{aligned} \frac{\partial (a'b)}{\partial \theta}&=\begin{bmatrix}\frac{\partial a'(\theta)b(\theta)}{\partial\theta_1}\\ \vdots\\ \frac{\partial a'(\theta)b(\theta)}{\partial\theta_l}\end{bmatrix} =\begin{bmatrix}\frac{\partial a'(\theta)}{\partial\theta_1}\\ \vdots\\ \frac{\partial a'(\theta)}{\partial\theta_l}\end{bmatrix}b(\theta)+\begin{bmatrix}\frac{\partial b'(\theta)}{\partial\theta_1}\\ \vdots\\ \frac{\partial b'(\theta)}{\partial\theta_l}\end{bmatrix}a(\theta) \\ &=\left(\frac{\partial a'}{\partial\theta}\right)b+\left(\frac{\partial b'}{\partial\theta}\right)a. \end{aligned}$$
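Property (i) can be checked numerically. The sketch below assumes NumPy and central differences. The helpers `jacobian` and `grad` and the example maps `a`, `b` from $\mathbb{R}^2$ to $\mathbb{R}^3$ are hypothetical. The left side is the numerical gradient of the scalar $a'b$. The right side combines the transposed Jacobians, i.e. $\partial a'/\partial\theta$ and $\partial b'/\partial\theta$.

```python
import numpy as np

def jacobian(f, theta, h=1e-6):
    # Numerical d f / d theta' (k x l) by central differences
    theta = np.asarray(theta, dtype=float)
    J = np.zeros((np.asarray(f(theta)).size, theta.size))
    for m in range(theta.size):
        e = np.zeros(theta.size)
        e[m] = h
        J[:, m] = (f(theta + e) - f(theta - e)) / (2 * h)
    return J

def grad(g, theta, h=1e-6):
    # Numerical d g / d theta (an l-vector) for a scalar function g
    theta = np.asarray(theta, dtype=float)
    out = np.zeros(theta.size)
    for m in range(theta.size):
        e = np.zeros(theta.size)
        e[m] = h
        out[m] = (g(theta + e) - g(theta - e)) / (2 * h)
    return out

# Hypothetical vector functions R^2 -> R^3 (k = 3, l = 2)
def a(t):
    return np.array([t[0]**2, t[0] * t[1], np.sin(t[1])])

def b(t):
    return np.array([np.exp(t[1]), t[0], t[0] + t[1]])

theta0 = np.array([0.8, -0.5])
lhs = grad(lambda t: a(t) @ b(t), theta0)      # d(a'b) / d theta
rhs = (jacobian(a, theta0).T @ b(theta0)       # (d a'/d theta) b
       + jacobian(b, theta0).T @ a(theta0))    # (d b'/d theta) a
print(np.allclose(lhs, rhs, atol=1e-6))        # True
```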

(ii) (Quadratic form) Let $x=(x_1,\dots,x_k)^T$, and let $A=(a_{ij})_{k\times k}$ not depend on $x$. Then
$$\frac{\partial\, x'Ax}{\partial x}=Ax+A'x.$$
Proof: First, $\frac{\partial x'}{\partial x}=I$. Next, the $l$-th component of $Ax$ is $\sum_{i=1}^{k}a_{li}x_i$. So the $(j,l)$ entry of $\frac{\partial (Ax)'}{\partial x}$ is
$$\left(\frac{\partial (Ax)'}{\partial x}\right)_{jl}=\frac{\partial\sum_{i=1}^{k}a_{li}x_i}{\partial x_j}=a_{lj},\qquad\text{so}\qquad \frac{\partial (Ax)'}{\partial x}=A'.$$
Applying (i) with $a=x$ and $b=Ax$:
$$\frac{\partial x'Ax}{\partial x}=\frac{\partial x'}{\partial x}Ax+\frac{\partial(Ax)'}{\partial x}x=Ax+A'x.$$
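A quick numerical check of (ii) follows. It assumes NumPy, a random non-symmetric $4\times 4$ matrix, and a central-difference gradient helper `grad` (hypothetical, for illustration). Because the matrix is non-symmetric, $Ax$ and $A'x$ really differ. The last lines use the symmetric matrix $S=A+A'$, for which the formula reduces to $2Sx$.

```python
import numpy as np

def grad(g, x, h=1e-6):
    # Numerical gradient of a scalar function g at x, by central differences
    x = np.asarray(x, dtype=float)
    out = np.zeros(x.size)
    for j in range(x.size):
        e = np.zeros(x.size)
        e[j] = h
        out[j] = (g(x + e) - g(x - e)) / (2 * h)
    return out

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))   # random, deliberately non-symmetric
x0 = rng.standard_normal(4)

lhs = grad(lambda x: x @ A @ x, x0)   # d(x'Ax) / dx
rhs = A @ x0 + A.T @ x0               # Ax + A'x
print(np.allclose(lhs, rhs, atol=1e-6))           # True

S = A + A.T                           # a symmetric matrix
lhs_s = grad(lambda x: x @ S @ x, x0)
print(np.allclose(lhs_s, 2 * S @ x0, atol=1e-6))  # True: Sx + S'x = 2Sx
```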

(iii) (Inverse) Let $A=(a_{ij}(\theta))_{k\times k}$ be nonsingular and $\theta=(\theta_1,\dots,\theta_l)^T\in\mathbb{R}^l$. Then for every $\theta_m$, $1\leq m\leq l$,
$$\frac{\partial A^{-1}}{\partial \theta_m}=-A^{-1}\left(\frac{\partial A}{\partial \theta_m}\right)A^{-1}.$$
Proof: First we show a product rule. For $A=(a_{ij}(\omega))_{k\times k}$, $B=(b_{ij}(\omega))_{k\times k}$ and $\omega\in\mathbb{R}$,
$$\frac{\partial (AB)}{\partial \omega}=\frac{\partial A}{\partial \omega}B+A\frac{\partial B}{\partial \omega}.$$

The $(i,j)$ entry of $AB$ is $\sum_{n=1}^{k}a_{in}b_{nj}$, and
$$\begin{aligned} \frac{\partial\sum_{n=1}^{k}a_{in}b_{nj}}{\partial \omega}&=\sum_{n=1}^{k}\frac{\partial\, a_{in}b_{nj}}{\partial \omega}=\sum_{n=1}^{k}\left[ \frac{\partial a_{in}}{\partial \omega}b_{nj}+a_{in}\frac{\partial b_{nj}}{\partial \omega}\right] \\&=\sum_{n=1}^{k}\left[\frac{\partial a_{in}}{\partial\omega}b_{nj}\right]+\sum_{n=1}^{k}\left[a_{in}\frac{\partial b_{nj}}{\partial\omega}\right]. \end{aligned}$$
Collecting these entries into matrices,
$$\frac{\partial (AB)}{\partial \omega}=\left(\sum_{n=1}^{k}\frac{\partial a_{in}}{\partial\omega}b_{nj}\right)_{k\times k}+\left(\sum_{n=1}^{k}a_{in}\frac{\partial b_{nj}}{\partial\omega}\right)_{k\times k}=\frac{\partial A}{\partial \omega}B+A\frac{\partial B}{\partial \omega}.$$
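The matrix product rule can be checked numerically before it is used. This sketch assumes NumPy. The $2\times 2$ matrix functions `A`, `B` and their hand-derived derivatives `dA`, `dB` are hypothetical examples. A central difference of $A(\omega)B(\omega)$ is compared with $\frac{\partial A}{\partial\omega}B+A\frac{\partial B}{\partial\omega}$.

```python
import numpy as np

# Hypothetical 2x2 matrix functions of a scalar omega, with hand-derived derivatives
def A(w):
    return np.array([[w, np.sin(w)], [1.0, w**2]])

def dA(w):
    return np.array([[1.0, np.cos(w)], [0.0, 2 * w]])

def B(w):
    return np.array([[np.cos(w), 0.0], [w, np.exp(w)]])

def dB(w):
    return np.array([[-np.sin(w), 0.0], [1.0, np.exp(w)]])

w0, h = 0.4, 1e-6
# Central difference of the product A(w) B(w)
numeric = (A(w0 + h) @ B(w0 + h) - A(w0 - h) @ B(w0 - h)) / (2 * h)
# Product rule, keeping A on the left and B on the right in both terms
product_rule = dA(w0) @ B(w0) + A(w0) @ dB(w0)
print(np.allclose(numeric, product_rule, atol=1e-6))  # True
```

The order of the factors matters. Matrix multiplication is not commutative, so $B\frac{\partial A}{\partial\omega}+\frac{\partial B}{\partial\omega}A$ is in general a different matrix.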
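Now differentiate both sides of $A^{-1}A=I$ with respect to $\theta_m$. Apply the product rule with $\omega=\theta_m$, holding the other components of $\theta$ fixed. Since $I$ is constant,
$$\frac{\partial A^{-1}}{\partial \theta_m}A+A^{-1}\frac{\partial A}{\partial \theta_m}=0.$$
Multiplying on the right by $A^{-1}$ gives $\frac{\partial A^{-1}}{\partial \theta_m}=-A^{-1}\left(\frac{\partial A}{\partial \theta_m}\right)A^{-1}$.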
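Property (iii) can be checked numerically for each $\theta_m$. The sketch assumes NumPy. The $2\times 2$ matrix function `A` of $\theta=(\theta_1,\theta_2)$ and the helper `partial` are hypothetical; `A` is nonsingular at the chosen point, where its determinant is 7.82. Both sides use central differences for the partial derivatives.

```python
import numpy as np

def A(theta):
    # Hypothetical 2x2 matrix function of theta = (theta_1, theta_2),
    # nonsingular near the evaluation point used below
    t1, t2 = theta
    return np.array([[2.0 + t1, t2],
                     [t1 * t2, 3.0 + t2**2]])

def partial(F, theta, m, h=1e-6):
    # Central-difference partial derivative of F with respect to theta_m
    e = np.zeros(theta.size)
    e[m] = h
    return (F(theta + e) - F(theta - e)) / (2 * h)

theta0 = np.array([0.5, -0.4])
A_inv = np.linalg.inv(A(theta0))
checks = []
for m in range(theta0.size):
    lhs = partial(lambda t: np.linalg.inv(A(t)), theta0, m)  # d A^{-1} / d theta_m
    rhs = -A_inv @ partial(A, theta0, m) @ A_inv             # -A^{-1} (dA/d theta_m) A^{-1}
    checks.append(np.allclose(lhs, rhs, atol=1e-6))
print(checks)   # [True, True]
```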

(iv) (Log-determinant) Let $A=(a_{ij}(\theta))_{k\times k}$ be positive definite, so that $|A|>0$ and $\log|A|$ is defined. Let $\theta=(\theta_1,\dots,\theta_l)^T\in\mathbb{R}^l$. Then for every $\theta_m$, $1\leq m\leq l$,
$$\frac{\partial \log|A|}{\partial \theta_m}=\operatorname{tr}\left(A^{-1}\frac{\partial A}{\partial\theta_m}\right).$$
Proof: Let $A_{ij}$ denote the cofactor of the entry $a_{ij}$. Then
$$\begin{aligned} \frac{\partial \log|A|}{\partial \theta_m}&=\frac{1}{|A|}\frac{\partial|A|}{\partial\theta_m}=\frac{1}{|A|}\sum_{i}\sum_{j}\frac{\partial|A|}{\partial a_{ij}}\frac{\partial a_{ij}}{\partial\theta_m} \\&=\frac{1}{|A|}\sum_{i}\sum_{j}A_{ij}\frac{\partial a_{ij}}{\partial\theta_m}=\sum_{j}\left(\sum_{i}\frac{A_{ij}}{|A|}\frac{\partial a_{ij}}{\partial\theta_m}\right) \\&=\sum_{j}\left(\sum_{i}(A^{-1})_{ji}\frac{\partial a_{ij}}{\partial\theta_m}\right) \\&=\sum_{j}\left(A^{-1}\frac{\partial A}{\partial\theta_m}\right)_{jj} \\&=\operatorname{tr}\left(A^{-1}\frac{\partial A}{\partial\theta_m}\right). \end{aligned}$$
The second equality is the chain rule. The third follows from the cofactor expansion along row $i$, $|A|=\sum_{n}a_{in}A_{in}$. None of the cofactors $A_{in}$ involves an entry of row $i$, so $\frac{\partial|A|}{\partial a_{ij}}=A_{ij}$. The fifth holds because $A^{-1}=\frac{A^*}{|A|}$, where the adjugate $A^*=(A_{ji})_{k\times k}$ has $(i,j)$ entry $A_{ji}$; hence $(A^{-1})_{ji}=\frac{A_{ij}}{|A|}$.
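Property (iv) can also be checked numerically. The sketch below assumes NumPy. The matrix function `A(theta)` $=2I+uu'$ with $u=(\theta_1,\theta_2,\theta_1\theta_2)^T$ is a hypothetical example, positive definite for every $\theta$. The helpers `partial` and `logdet` are likewise illustrative. $\log|A|$ is computed with `np.linalg.slogdet`, which returns the sign and the log of the absolute determinant.

```python
import numpy as np

def A(theta):
    # Hypothetical positive-definite 3x3 matrix: 2I + u u' is PD for any theta
    t1, t2 = theta
    u = np.array([t1, t2, t1 * t2])
    return 2.0 * np.eye(3) + np.outer(u, u)

def partial(F, theta, m, h=1e-6):
    # Central-difference partial derivative with respect to theta_m
    e = np.zeros(theta.size)
    e[m] = h
    return (F(theta + e) - F(theta - e)) / (2 * h)

def logdet(M):
    # slogdet returns (sign, log|det M|); the sign is +1 for a PD matrix
    sign, value = np.linalg.slogdet(M)
    return value

theta0 = np.array([0.7, -1.1])
checks = []
for m in range(theta0.size):
    lhs = partial(lambda t: logdet(A(t)), theta0, m)       # d log|A| / d theta_m
    # tr(A^{-1} dA/d theta_m), with A^{-1} dA/d theta_m computed by a linear solve
    rhs = np.trace(np.linalg.solve(A(theta0), partial(A, theta0, m)))
    checks.append(bool(np.isclose(lhs, rhs, atol=1e-6)))
print(checks)   # [True, True]
```

`np.linalg.solve(A, B)` computes $A^{-1}B$ without forming the inverse explicitly, which is usually the numerically preferred choice.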
