1. Definitions
Consider a matrix whose entries are functions of a scalar parameter $\theta$:

$$A=(a_{ij}(\theta))_{p\times q}=\left[\begin{matrix} a_{11}(\theta)&a_{12}(\theta)&\cdots&a_{1q}(\theta)\\ a_{21}(\theta)&a_{22}(\theta)&\cdots&a_{2q}(\theta)\\ \vdots&\vdots&\ddots&\vdots\\ a_{p1}(\theta)&a_{p2}(\theta)&\cdots&a_{pq}(\theta) \end{matrix} \right],\quad\theta\in\mathbb{R}$$
The derivative of the matrix $A$ with respect to $\theta$ is defined entrywise:

$$\frac{\partial A}{\partial \theta}=\left(\frac{\partial a_{ij}(\theta)}{\partial\theta}\right)_{p\times q}=\left[\begin{matrix} \frac{\partial a_{11}(\theta)}{\partial\theta}&\frac{\partial a_{12}(\theta)}{\partial\theta}&\cdots&\frac{\partial a_{1q}(\theta)}{\partial\theta}\\ \frac{\partial a_{21}(\theta)}{\partial\theta}&\frac{\partial a_{22}(\theta)}{\partial\theta}&\cdots&\frac{\partial a_{2q}(\theta)}{\partial\theta}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial a_{p1}(\theta)}{\partial\theta}&\frac{\partial a_{p2}(\theta)}{\partial\theta}&\cdots&\frac{\partial a_{pq}(\theta)}{\partial\theta} \end{matrix} \right]$$
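The entrywise definition can be checked numerically. The following is a minimal sketch, assuming a hypothetical example matrix (a 2×2 rotation matrix, not taken from the text): it compares the hand-written entrywise derivative with a central-difference approximation.

```python
import math

def A(theta):
    # Hypothetical example: 2x2 rotation matrix with entries depending on a scalar theta.
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def dA_exact(theta):
    # Entrywise derivative of A(theta), written out by hand.
    return [[-math.sin(theta), -math.cos(theta)],
            [ math.cos(theta), -math.sin(theta)]]

def dA_numeric(f, theta, h=1e-6):
    # Central-difference approximation of the derivative of every entry.
    up, down = f(theta + h), f(theta - h)
    return [[(up[i][j] - down[i][j]) / (2 * h) for j in range(len(up[0]))]
            for i in range(len(up))]

def max_abs_diff(X, Y):
    # Largest absolute entrywise difference between two matrices.
    return max(abs(x - y) for rx, ry in zip(X, Y) for x, y in zip(rx, ry))

theta0 = 0.7
err_matrix_derivative = max_abs_diff(dA_exact(theta0), dA_numeric(A, theta0))
print(err_matrix_derivative)
```

The printed discrepancy should be tiny, limited only by the finite-difference step and floating-point rounding.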
Consider a column vector $a$ whose components are functions of a vector parameter $\theta$:

$$a=(a_i(\theta))_{1\leq i\leq k}=(a_1(\theta),\ldots,a_k(\theta))^T,\quad\theta=(\theta_1,\ldots,\theta_l)^T\in\mathbb{R}^l$$
The derivative of $a$ with respect to $\theta$ is defined as the following $k\times l$ matrix (here and below, a prime denotes transposition, so $\theta'=\theta^T$):

$$\frac{\partial a}{\partial \theta'}=\left[\begin{matrix} \frac{\partial a_1(\theta)}{\partial\theta_1}&\frac{\partial a_1(\theta)}{\partial\theta_2}&\cdots&\frac{\partial a_1(\theta)}{\partial\theta_l}\\ \frac{\partial a_2(\theta)}{\partial\theta_1}&\frac{\partial a_2(\theta)}{\partial\theta_2}&\cdots&\frac{\partial a_2(\theta)}{\partial\theta_l}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial a_k(\theta)}{\partial\theta_1}&\frac{\partial a_k(\theta)}{\partial\theta_2}&\cdots&\frac{\partial a_k(\theta)}{\partial\theta_l} \end{matrix} \right]_{k\times l}$$
Its transpose is the $l\times k$ matrix:

$$\frac{\partial a'}{\partial \theta}=\left(\frac{\partial a}{\partial \theta'}\right)'=\left[\begin{matrix} \frac{\partial a_1(\theta)}{\partial\theta_1}&\frac{\partial a_2(\theta)}{\partial\theta_1}&\cdots&\frac{\partial a_k(\theta)}{\partial\theta_1}\\ \frac{\partial a_1(\theta)}{\partial\theta_2}&\frac{\partial a_2(\theta)}{\partial\theta_2}&\cdots&\frac{\partial a_k(\theta)}{\partial\theta_2}\\ \vdots&\vdots&\ddots&\vdots\\ \frac{\partial a_1(\theta)}{\partial\theta_l}&\frac{\partial a_2(\theta)}{\partial\theta_l}&\cdots&\frac{\partial a_k(\theta)}{\partial\theta_l} \end{matrix} \right]_{l\times k}$$
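As a small worked illustration of the two layouts (the vector here is a hypothetical example, not taken from the text), take $k=3$, $l=2$ and $a(\theta)=(\theta_1\theta_2,\ \theta_1+\theta_2,\ \theta_2^2)^T$. Each row of $\partial a/\partial\theta'$ holds the partial derivatives of one component, and each row of $\partial a'/\partial\theta$ holds the derivatives with respect to one parameter:

$$\frac{\partial a}{\partial \theta'}=\left[\begin{matrix} \theta_2&\theta_1\\ 1&1\\ 0&2\theta_2 \end{matrix}\right]_{3\times 2},\qquad \frac{\partial a'}{\partial \theta}=\left[\begin{matrix} \theta_2&1&0\\ \theta_1&1&2\theta_2 \end{matrix}\right]_{2\times 3}$$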
2. Properties
(i) (Inner product) Let $a=(a_1(\theta),\ldots,a_k(\theta))^T$, $b=(b_1(\theta),\ldots,b_k(\theta))^T$ and $\theta=(\theta_1,\ldots,\theta_l)^T\in\mathbb{R}^l$. Then

$$\frac{\partial (a'b)}{\partial \theta}=\left(\frac{\partial a'}{\partial\theta}\right)b+\left(\frac{\partial b'}{\partial\theta}\right)a$$
Proof: For each $j$, $1\leq j\leq l$, the scalar product rule gives

$$\begin{aligned} \frac{\partial a^{\prime}(\theta)b(\theta)}{\partial \theta_{j}} &=\frac{\partial \sum_{i=1}^{k} a_{i} (\theta) b_{i}(\theta)}{\partial \theta_{j}}=\sum_{i=1}^{k}\frac{\partial a_{i}(\theta) b_i (\theta)}{\partial \theta_{j}} \\ &=\sum_{i=1}^{k}\left[\frac{\partial a_{i}(\theta)}{\partial \theta_{j}} b_{i}(\theta)+a_{i}(\theta) \frac{\partial b_{i}(\theta)}{\partial \theta_{j}}\right] \\ &=\sum_{i=1}^{k}\left(\frac{\partial a_{i} (\theta)}{\partial \theta_{j}} b_{i}(\theta)\right)+\sum_{i=1}^{k}\left(a_{i}(\theta) \frac{\partial b_{i}(\theta)}{\partial \theta_{j}}\right) \\ &=\frac{\partial a^{\prime}(\theta)}{\partial \theta_{j}} b(\theta)+\frac{\partial b^{\prime}(\theta)}{\partial \theta_{j}} a(\theta) \end{aligned}$$
Stacking these components for $j=1,\ldots,l$ into a column vector gives

$$\begin{aligned} \frac{\partial (a'b)}{\partial \theta}&=\left(\frac{\partial a^{\prime}(\theta)b(\theta)}{\partial \theta_{j}}\right)_{1\leq j\leq l}=\left(\frac{\partial a^{\prime}(\theta)}{\partial \theta_{j}} b(\theta)+\frac{\partial b^{\prime}(\theta)}{\partial \theta_{j}} a(\theta)\right)_{1\leq j\leq l}\\&=\left(\frac{\partial a^{\prime}(\theta)}{\partial \theta_{j}}\right)_{1\leq j\leq l}b(\theta)+\left(\frac{\partial b^{\prime}(\theta)}{\partial \theta_{j}}\right)_{1\leq j\leq l}a(\theta) \\&=\left(\frac{\partial a'}{\partial\theta}\right)b+\left(\frac{\partial b'}{\partial\theta}\right)a \end{aligned}$$
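The inner product rule can be sanity-checked numerically. This is a minimal sketch under stated assumptions: the vectors `a` and `b` and their hand-written derivative matrices `da_T` and `db_T` are hypothetical examples with $k=3$, $l=2$, and the left-hand side is approximated by central differences.

```python
import math

def a(t):
    # Hypothetical example vector a(theta), k = 3 components, l = 2 parameters.
    return [t[0] * t[1], math.sin(t[0]), t[1] ** 2]

def b(t):
    # Hypothetical example vector b(theta), same shape as a.
    return [t[0] + t[1], math.exp(t[1]), math.cos(t[0])]

def da_T(t):
    # d a'/d theta: l x k matrix; row j holds derivatives with respect to theta_j.
    return [[t[1], math.cos(t[0]), 0.0],
            [t[0], 0.0, 2.0 * t[1]]]

def db_T(t):
    # d b'/d theta in the same l x k layout.
    return [[1.0, 0.0, -math.sin(t[0])],
            [1.0, math.exp(t[1]), 0.0]]

def matvec(M, v):
    # Matrix-vector product for list-of-lists matrices.
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def numeric_grad(f, t, h=1e-6):
    # Central-difference gradient of a scalar function f at the point t.
    g = []
    for j in range(len(t)):
        up, down = list(t), list(t)
        up[j] += h
        down[j] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g

def inner(t):
    # The scalar a'(theta) b(theta).
    return sum(x * y for x, y in zip(a(t), b(t)))

theta0 = [0.3, -0.8]
lhs = numeric_grad(inner, theta0)
rhs = [u + v for u, v in zip(matvec(da_T(theta0), b(theta0)),
                             matvec(db_T(theta0), a(theta0)))]
err_inner = max(abs(x - y) for x, y in zip(lhs, rhs))
print(err_inner)
```

Both sides are vectors of length $l$, which matches the $l\times k$ times $k\times 1$ shapes in the formula.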
(ii) (Quadratic form) Let $x=(x_1,\ldots,x_k)^T$, and let $A=(a_{ij})_{k\times k}$ be independent of $x$. Then

$$\frac{\partial x'Ax}{\partial x}=Ax+A'x$$
Proof: First,

$$\frac{\partial x'}{\partial x}=I$$

Next, the $(j,l)$ entry of $\partial (Ax)'/\partial x$ is the derivative of the $l$-th component of $Ax$ with respect to $x_j$, so

$$\frac{\partial (Ax)'}{\partial x}=\left(\frac{\partial\sum_{i=1}^{k}a_{li}x_i}{\partial x_j}\right)_{j,l}=\left(a_{lj}\right)_{j,l}=A'$$

Applying property (i) with $a=x$ and $b=Ax$ then gives

$$\frac{\partial x'Ax}{\partial x}=\frac{\partial x'}{\partial x}Ax+\frac{\partial(Ax)'}{\partial x}x=Ax+A'x$$
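A quick numerical check of the quadratic form rule, as a sketch only: the matrix `A` and point `x0` are hypothetical values, with `A` deliberately non-symmetric so that $Ax+A'x$ differs from $2Ax$.

```python
# Hypothetical non-symmetric matrix and evaluation point (not from the text).
A = [[2.0, -1.0, 0.5],
     [3.0, 0.0, 1.0],
     [-2.0, 4.0, 1.5]]
x0 = [1.0, -2.0, 0.5]

def quad(x):
    # The quadratic form x' A x.
    k = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(k) for j in range(k))

def numeric_grad(f, x, h=1e-6):
    # Central-difference gradient of a scalar function f at the point x.
    g = []
    for j in range(len(x)):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        g.append((f(up) - f(down)) / (2 * h))
    return g

k = len(x0)
Ax = [sum(A[i][j] * x0[j] for j in range(k)) for i in range(k)]
Atx = [sum(A[j][i] * x0[j] for j in range(k)) for i in range(k)]  # A' x
rhs = [u + v for u, v in zip(Ax, Atx)]
lhs = numeric_grad(quad, x0)
err_quad = max(abs(p - q) for p, q in zip(lhs, rhs))
print(err_quad)
```

When `A` is symmetric, the right-hand side reduces to $2Ax$, the form most often quoted.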
(iii) (Inverse) Let $A=(a_{ij}(\theta))_{k\times k}$ be nonsingular, with $\theta=(\theta_1,\ldots,\theta_l)^T\in\mathbb{R}^l$. Then for every $\theta_m$, $1\leq m\leq l$:

$$\frac{\partial A^{-1}}{\partial \theta_m}=-A^{-1}\left(\frac{\partial A}{\partial \theta_m}\right)A^{-1}$$
Proof: We first show that for $A=(a_{ij}(\omega))_{k\times k}$, $B=(b_{ij}(\omega))_{k\times k}$ and $\omega\in\mathbb{R}$, the product rule holds:

$$\frac{\partial AB}{\partial \omega}=\frac{\partial A}{\partial \omega}B+A\frac{\partial B}{\partial \omega}$$
Indeed, the $(i,j)$ entry of $AB$ satisfies

$$\begin{aligned} \frac{\partial\sum_{n=1}^{k}a_{in}b_{nj}}{\partial \omega}&=\sum_{n=1}^{k}\frac{\partial a_{in}b_{nj}}{\partial \omega}=\sum_{n=1}^{k}\left[ \frac{\partial a_{in}}{\partial \omega}b_{nj}+a_{in}\frac{\partial b_{nj}}{\partial \omega}\right] \\&=\sum_{n=1}^{k}\left[\frac{\partial a_{in}}{\partial\omega}b_{nj}\right]+\sum_{n=1}^{k}\left[a_{in}\frac{\partial b_{nj}}{\partial\omega}\right] \end{aligned}$$
Hence, collecting the entries into matrices,

$$\frac{\partial AB}{\partial \omega}=\left(\sum_{n=1}^{k}\left[\frac{\partial a_{in}}{\partial\omega}b_{nj}\right]\right)_{i,j}+\left(\sum_{n=1}^{k}\left[a_{in}\frac{\partial b_{nj}}{\partial\omega}\right]\right)_{i,j} =\frac{\partial A}{\partial \omega}B+A\frac{\partial B}{\partial \omega}$$
Differentiating both sides of $A^{-1}A=I$ with respect to $\theta_m$ gives

$$\frac{\partial A^{-1}}{\partial \theta_m}A+A^{-1}\frac{\partial A}{\partial \theta_m}=0$$

Right-multiplying by $A^{-1}$ and rearranging yields the result.
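The inverse rule can also be checked numerically. The sketch below assumes a hypothetical nonsingular 2×2 matrix depending on a single scalar `t` (playing the role of $\theta_m$) and uses the closed-form 2×2 inverse. It compares $-A^{-1}(\partial A/\partial t)A^{-1}$ with a central-difference derivative of $A^{-1}$.

```python
import math

def A(t):
    # Hypothetical matrix; its determinant is nonzero near t = 0.5.
    return [[2.0 + t * t, t],
            [math.sin(t), 3.0]]

def dA(t):
    # Entrywise derivative of A(t), written out by hand.
    return [[2.0 * t, 1.0],
            [math.cos(t), 0.0]]

def inv2(M):
    # Closed-form inverse of a 2x2 matrix.
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matmul(X, Y):
    # Product of two 2x2 matrices.
    return [[sum(X[i][n] * Y[n][j] for n in range(2)) for j in range(2)]
            for i in range(2)]

t0, h = 0.5, 1e-6
Ainv = inv2(A(t0))
# Right-hand side of the identity: -A^{-1} (dA/dt) A^{-1}.
formula = [[-v for v in row] for row in matmul(matmul(Ainv, dA(t0)), Ainv)]
# Left-hand side approximated by central differences on the inverse.
up, down = inv2(A(t0 + h)), inv2(A(t0 - h))
numeric = [[(up[i][j] - down[i][j]) / (2 * h) for j in range(2)]
           for i in range(2)]
err_inverse = max(abs(formula[i][j] - numeric[i][j])
                  for i in range(2) for j in range(2))
print(err_inverse)
```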
(iv) (Log-determinant) Let $A=(a_{ij}(\theta))_{k\times k}$ be positive definite, with $\theta=(\theta_1,\ldots,\theta_l)^T\in\mathbb{R}^l$. Then for every $\theta_m$, $1\leq m\leq l$:

$$\frac{\partial \log|A|}{\partial \theta_m}=\operatorname{tr}\left(A^{-1}\frac{\partial A}{\partial\theta_m}\right)$$
Proof: Let $A_{ij}$ denote the cofactor of the entry $a_{ij}$. Then

$$\begin{aligned} \frac{\partial \log|A|}{\partial \theta_m}&=\frac{1}{|A|}\frac{\partial|A|}{\partial\theta_m}=\frac{1}{|A|}\sum_{i}\sum_{j}\frac{\partial|A|}{\partial a_{ij}}\frac{\partial a_{ij}}{\partial\theta_m} \\&=\frac{1}{|A|}\sum_{i}\sum_{j}A_{ij}\frac{\partial a_{ij}}{\partial\theta_m}=\sum_{j}\left(\sum_{i}\frac{A_{ij}}{|A|}\frac{\partial a_{ij}}{\partial\theta_m}\right) \\&=\sum_{j}\left(\sum_{i}(A^{-1})_{ji}\frac{\partial a_{ij}}{\partial\theta_m}\right) \\&=\sum_{j}\left(A^{-1}\frac{\partial A}{\partial\theta_m}\right)_{jj}\\&=\operatorname{tr}\left(A^{-1}\frac{\partial A}{\partial\theta_m}\right) \end{aligned}$$
Here the third equality follows from the cofactor expansion $|A|=\sum_{n}a_{in}A_{in}$ along row $i$, in which no cofactor $A_{in}$ depends on $a_{ij}$. The fifth equality holds because $A^{-1}=\frac{A^*}{|A|}$, where $A^*=(A_{ji})_{i,j}$ is the adjugate of $A$.
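To close, a numerical check of the log-determinant rule, as a minimal sketch: the symmetric matrix below is a hypothetical example that is positive definite for every real `t` (its leading entry is positive and its determinant equals $2+2t^2+t^4$), and `t` plays the role of $\theta_m$.

```python
import math

def A(t):
    # Hypothetical symmetric positive definite 2x2 matrix.
    return [[2.0 + t * t, t],
            [t, 1.0 + t * t]]

def dA(t):
    # Entrywise derivative of A(t), written out by hand.
    return [[2.0 * t, 1.0],
            [1.0, 2.0 * t]]

def det2(M):
    # Determinant of a 2x2 matrix.
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def inv2(M):
    # Closed-form inverse of a 2x2 matrix.
    det = det2(M)
    return [[M[1][1] / det, -M[0][1] / det],
            [-M[1][0] / det, M[0][0] / det]]

def matmul(X, Y):
    # Product of two 2x2 matrices.
    return [[sum(X[i][n] * Y[n][j] for n in range(2)) for j in range(2)]
            for i in range(2)]

t0, h = 0.5, 1e-6
P = matmul(inv2(A(t0)), dA(t0))
trace_formula = P[0][0] + P[1][1]  # tr(A^{-1} dA/dt)
# Central-difference derivative of log|A(t)|.
numeric = (math.log(det2(A(t0 + h))) - math.log(det2(A(t0 - h)))) / (2 * h)
err_logdet = abs(trace_formula - numeric)
print(err_logdet)
```

For this example the derivative can also be written in closed form as $(4t+4t^3)/(2+2t^2+t^4)$, which agrees with the trace expression.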