1📖 Introduction to the Schur Complement
1-1🔖 Definition of the Schur Complement
Given an arbitrary block matrix $\mathbf{M}$:

$$
\mathbf{M}=\left[\begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{array}\right]
$$
- If the block $\mathbf{D}$ is invertible, $\mathbf{A}-\mathbf{B}\mathbf{D}^{-1}\mathbf{C}$ is called the Schur complement of $\mathbf{D}$ in $\mathbf{M}$.
- If the block $\mathbf{A}$ is invertible, $\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B}$ is called the Schur complement of $\mathbf{A}$ in $\mathbf{M}$.
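Here is a minimal NumPy sketch of the two definitions above. It assumes the split point `k` (the size of the leading block $\mathbf{A}$) is known. The helper names `schur_complement_of_A` and `schur_complement_of_D` are hypothetical and only for illustration. The code calls `np.linalg.solve` rather than forming an explicit inverse.

```python
import numpy as np


def schur_complement_of_A(M: np.ndarray, k: int) -> np.ndarray:
    """Return D - C A^{-1} B, where A = M[:k, :k] must be invertible."""
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    # Solve A X = B instead of computing A^{-1} explicitly.
    return D - C @ np.linalg.solve(A, B)


def schur_complement_of_D(M: np.ndarray, k: int) -> np.ndarray:
    """Return A - B D^{-1} C, where D = M[k:, k:] must be invertible."""
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    return A - B @ np.linalg.solve(D, C)


M = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 0.0],
              [2.0, 0.0, 5.0]])
print(schur_complement_of_A(M, 2))  # 1x1 block equal to 43/11
print(schur_complement_of_D(M, 2))  # 2x2 block
```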
1-2🔖 Derivation of the Schur Complement Identities
Assume $\mathbf{A}$ is invertible. Left- or right-multiplying $\mathbf{M}$ by a block unit-triangular matrix turns it into block upper-triangular or block lower-triangular form:

$$
\begin{array}{l}
\left[\begin{array}{cc} \mathbf{I} & \mathbf{0} \\ -\mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{array}\right]\left[\begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{0} & \Delta_{\mathbf{A}} \end{array}\right] \\
\left[\begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{array}\right]\left[\begin{array}{cc} \mathbf{I} & -\mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A} & \mathbf{0} \\ \mathbf{C} & \Delta_{\mathbf{A}} \end{array}\right]
\end{array}
$$
Here $\Delta_{\mathbf{A}}=\mathbf{D}-\mathbf{C}\mathbf{A}^{-1}\mathbf{B}$ is the Schur complement of $\mathbf{A}$ in $\mathbf{M}$. Applying both transformations at once turns $\mathbf{M}$ into block-diagonal form:
$$
\left[\begin{array}{cc} \mathbf{I} & \mathbf{0} \\ -\mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{array}\right]\left[\begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{array}\right]\left[\begin{array}{cc} \mathbf{I} & -\mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \Delta_{\mathbf{A}} \end{array}\right]
$$
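The triangular and block-diagonal forms above can be checked numerically. This NumPy sketch uses a random test matrix, shifted by a multiple of the identity so that $\mathbf{A}$ is safely invertible. The sizes `n` and `m` and the seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 2  # A is n x n, D is m x m (arbitrary sizes for illustration)
# Random test matrix, shifted by a multiple of I so that A is safely invertible.
M = rng.standard_normal((n + m, n + m)) + (n + m) * np.eye(n + m)
A, B = M[:n, :n], M[:n, n:]
C, D = M[n:, :n], M[n:, n:]
A_inv = np.linalg.inv(A)
delta_A = D - C @ A_inv @ B  # Schur complement of A in M

I_n, I_m = np.eye(n), np.eye(m)
# Block lower-triangular factor: left-multiplying eliminates C.
E_left = np.block([[I_n, np.zeros((n, m))], [-C @ A_inv, I_m]])
# Block upper-triangular factor: right-multiplying eliminates B.
E_right = np.block([[I_n, -A_inv @ B], [np.zeros((m, n)), I_m]])

upper = E_left @ M           # expected [[A, B], [0, delta_A]]
lower = M @ E_right          # expected [[A, 0], [C, delta_A]]
diag = E_left @ M @ E_right  # expected [[A, 0], [0, delta_A]]

print(np.allclose(upper[n:, :n], 0.0), np.allclose(lower[:n, n:], 0.0))
print(np.allclose(diag[:n, n:], 0.0), np.allclose(diag[n:, :n], 0.0))
```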
Conversely, we can recover $\mathbf{M}$ from the block-diagonal form:
$$
\left[\begin{array}{cc} \mathbf{I} & \mathbf{0} \\ \mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{array}\right]\left[\begin{array}{cc} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \Delta_{\mathbf{A}} \end{array}\right]\left[\begin{array}{cc} \mathbf{I} & \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{array}\right]=\left[\begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{array}\right]
$$
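The reconstruction above is a block LDU factorization, $\mathbf{M}=\mathbf{L}\,\mathrm{diag}(\mathbf{A},\Delta_{\mathbf{A}})\,\mathbf{U}$. This sketch builds the three factors, reconstructs $\mathbf{M}$, and checks that each unit-triangular factor is inverted by flipping the sign of its off-diagonal block. It assumes NumPy and a random test matrix, and `block_ldu` is a hypothetical helper name.

```python
import numpy as np


def block_ldu(M: np.ndarray, k: int):
    """Factor M = L @ Dg @ U around the leading k x k block A (assumed invertible)."""
    r = M.shape[0] - k
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    A_inv = np.linalg.inv(A)
    delta_A = D - C @ A_inv @ B  # Schur complement of A
    L = np.block([[np.eye(k), np.zeros((k, r))], [C @ A_inv, np.eye(r)]])
    Dg = np.block([[A, np.zeros((k, r))], [np.zeros((r, k)), delta_A]])
    U = np.block([[np.eye(k), A_inv @ B], [np.zeros((r, k)), np.eye(r)]])
    return L, Dg, U


rng = np.random.default_rng(1)
M = rng.standard_normal((5, 5)) + 5.0 * np.eye(5)
L, Dg, U = block_ldu(M, 3)
# Recover M from the block-diagonal form.
print(np.allclose(L @ Dg @ U, M))
# Inverse of each unit-triangular factor: negate its off-diagonal block.
L_inv = np.block([[np.eye(3), np.zeros((3, 2))], [-L[3:, :3], np.eye(2)]])
U_inv = np.block([[np.eye(3), -U[:3, 3:]], [np.zeros((2, 3)), np.eye(2)]])
print(np.allclose(L_inv @ M @ U_inv, Dg))
```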
1-3 🔖 Application: Fast Matrix Inversion
The matrix $\mathbf{M}$ can therefore be written as:

$$
\mathbf{M}=\left[\begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{array}\right]=\left[\begin{array}{cc} \mathbf{I} & \mathbf{0} \\ \mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{array}\right]\left[\begin{array}{cc} \mathbf{A} & \mathbf{0} \\ \mathbf{0} & \Delta_{\mathbf{A}} \end{array}\right]\left[\begin{array}{cc} \mathbf{I} & \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{array}\right]
$$
Inverting each factor and reversing their order gives:

$$
\mathbf{M}^{-1}=\left[\begin{array}{cc} \mathbf{A} & \mathbf{B} \\ \mathbf{C} & \mathbf{D} \end{array}\right]^{-1}=\left[\begin{array}{cc} \mathbf{I} & -\mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{array}\right]\left[\begin{array}{cc} \mathbf{A}^{-1} & \mathbf{0} \\ \mathbf{0} & \Delta_{\mathbf{A}}^{-1} \end{array}\right]\left[\begin{array}{cc} \mathbf{I} & \mathbf{0} \\ -\mathbf{C}\mathbf{A}^{-1} & \mathbf{I} \end{array}\right]
$$
This step uses the fact that a block unit-triangular matrix is inverted by negating its off-diagonal block. For example:

$$
\left[\begin{array}{cc} \mathbf{I} & -\mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{array}\right]\left[\begin{array}{cc} \mathbf{I} & \mathbf{A}^{-1}\mathbf{B} \\ \mathbf{0} & \mathbf{I} \end{array}\right]=\mathbf{I}
$$
Multiplying out the three factors gives the final result:

$$
\mathbf{M}^{-1}=\left[\begin{array}{cc} \mathbf{A}^{-1}+\mathbf{A}^{-1}\mathbf{B}\Delta_{\mathbf{A}}^{-1}\mathbf{C}\mathbf{A}^{-1} & -\mathbf{A}^{-1}\mathbf{B}\Delta_{\mathbf{A}}^{-1} \\ -\Delta_{\mathbf{A}}^{-1}\mathbf{C}\mathbf{A}^{-1} & \Delta_{\mathbf{A}}^{-1} \end{array}\right]
$$
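This is a sketch of the block-inverse formula above, compared against `np.linalg.inv`. It assumes NumPy, and `block_inverse` is a hypothetical helper. The formula only needs the inverses of the two smaller blocks $\mathbf{A}$ and $\Delta_{\mathbf{A}}$. That is where any saving comes from, for example when $\mathbf{A}$ is block-diagonal and cheap to invert. For a small dense matrix like this one there is little to gain over a direct inverse.

```python
import numpy as np


def block_inverse(M: np.ndarray, k: int) -> np.ndarray:
    """Invert M via the Schur complement of its leading k x k block A."""
    A, B = M[:k, :k], M[:k, k:]
    C, D = M[k:, :k], M[k:, k:]
    A_inv = np.linalg.inv(A)
    delta_inv = np.linalg.inv(D - C @ A_inv @ B)  # inverse of the Schur complement of A
    top_left = A_inv + A_inv @ B @ delta_inv @ C @ A_inv
    top_right = -A_inv @ B @ delta_inv
    bottom_left = -delta_inv @ C @ A_inv
    return np.block([[top_left, top_right], [bottom_left, delta_inv]])


rng = np.random.default_rng(2)
M = rng.standard_normal((6, 6)) + 6.0 * np.eye(6)
M_inv = block_inverse(M, 4)
print(np.allclose(M_inv, np.linalg.inv(M)))
```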
1-4🔖Application: Using the Schur Complement with Information Matrices
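Suppose the information matrix (the inverse of the covariance matrix) is known. By the Schur-complement inversion formula, the blocks of the covariance matrix are related to the blocks of the information matrix as follows.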
Covariance matrix:

$$
\boldsymbol{\Sigma}=\left[\begin{array}{cc} A & C^{\top} \\ C & D \end{array}\right]
$$
The corresponding information matrix:

$$
\boldsymbol{\Sigma}^{-1}=\left[\begin{array}{cc} A & C^{\top} \\ C & D \end{array}\right]^{-1}=\left[\begin{array}{cc} A^{-1}+A^{-1}C^{\top}\Delta_{A}^{-1}CA^{-1} & -A^{-1}C^{\top}\Delta_{A}^{-1} \\ -\Delta_{A}^{-1}CA^{-1} & \Delta_{A}^{-1} \end{array}\right] \triangleq\left[\begin{array}{cc} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{array}\right]
$$
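Note: the middle step is the Schur-complement inversion. It substitutes the result of the previous subsection directly (with $B=C^{\top}$); see that subsection for the full derivation.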
Here $\Delta_{A}=D-CA^{-1}C^{\top}$.
Matching the blocks gives the following two identities. To verify the second, substitute the blocks into the right-hand side; the two correction terms cancel.

$$
\begin{array}{l}
\Delta_{A}^{-1}=\Lambda_{bb} \\
A^{-1}=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba}
\end{array}
$$
Alternatively, taking the Schur complement of $\Lambda_{aa}$:

$$
D^{-1}=\Lambda_{bb}-\Lambda_{ba}\Lambda_{aa}^{-1}\Lambda_{ab}
$$
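Here $A^{-1}$ (or $D^{-1}$) is the prior information matrix used in the next optimization, also called the information matrix of the marginal distribution. $A^{-1}$ is the information matrix of $a$ after $b$ has been marginalized out; $D^{-1}$ is that of $b$ after $a$ has been marginalized out.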
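Marginalization in information form relies on these identities. The NumPy sketch below checks $\Delta_{A}^{-1}=\Lambda_{bb}$, $A^{-1}=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba}$ and $D^{-1}=\Lambda_{bb}-\Lambda_{ba}\Lambda_{aa}^{-1}\Lambda_{ab}$. It assumes a random symmetric positive-definite covariance $\boldsymbol{\Sigma}$; the sizes and seed are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 3, 2  # dim(a), dim(b)
G = rng.standard_normal((n + m, n + m))
Sigma = G @ G.T + np.eye(n + m)  # symmetric positive-definite covariance
A, C, D = Sigma[:n, :n], Sigma[n:, :n], Sigma[n:, n:]
delta_A = D - C @ np.linalg.solve(A, C.T)  # Schur complement of A

Lam = np.linalg.inv(Sigma)  # information matrix
L_aa, L_ab = Lam[:n, :n], Lam[:n, n:]
L_ba, L_bb = Lam[n:, :n], Lam[n:, n:]

# Schur complement of Lambda_bb: information matrix of the marginal over a.
info_a = L_aa - L_ab @ np.linalg.solve(L_bb, L_ba)
# Schur complement of Lambda_aa: information matrix of the marginal over b.
info_b = L_bb - L_ba @ np.linalg.solve(L_aa, L_ab)

print(np.allclose(np.linalg.inv(delta_A), L_bb))
print(np.allclose(info_a, np.linalg.inv(A)), np.allclose(info_b, np.linalg.inv(D)))
```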
Further notes
For how marginal and conditional distributions are used, see the next subsection. Its conclusions are stated here first:
$$
P(\boldsymbol{a}, \boldsymbol{b})=\mathcal{N}\left(\left[\begin{array}{l} \boldsymbol{\mu}_{a} \\ \boldsymbol{\mu}_{b} \end{array}\right],\left[\begin{array}{cc} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{array}\right]\right)=\mathcal{N}^{-1}\left(\left[\begin{array}{l} \boldsymbol{\eta}_{a} \\ \boldsymbol{\eta}_{b} \end{array}\right],\left[\begin{array}{cc} \boldsymbol{\Lambda}_{aa} & \boldsymbol{\Lambda}_{ab} \\ \boldsymbol{\Lambda}_{ba} & \boldsymbol{\Lambda}_{bb} \end{array}\right]\right)
$$

Here $\mathcal{N}^{-1}$ denotes the information (canonical) form, with $\boldsymbol{\Lambda}=\boldsymbol{\Sigma}^{-1}$ and $\boldsymbol{\eta}=\boldsymbol{\Lambda}\boldsymbol{\mu}$.
The marginal and conditional distributions are then:

$$
\begin{array}{|c|c|c|}
\hline
 & \text{Marginal} & \text{Conditional} \\
 & p(\boldsymbol{a})=\int p(\boldsymbol{a}, \boldsymbol{b})\, d\boldsymbol{b} & p(\boldsymbol{a} \mid \boldsymbol{b})=p(\boldsymbol{a}, \boldsymbol{b}) / p(\boldsymbol{b}) \\
\hline
\text{Covariance form} & \begin{array}{c} \boldsymbol{\mu}=\boldsymbol{\mu}_{a} \\ \Sigma=\Sigma_{aa} \end{array} & \begin{array}{c} \boldsymbol{\mu}^{\prime}=\boldsymbol{\mu}_{a}+\Sigma_{ab}\Sigma_{bb}^{-1}\left(\boldsymbol{b}-\boldsymbol{\mu}_{b}\right) \\ \Sigma^{\prime}=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} \end{array} \\
\hline
\text{Information form} & \begin{array}{c} \boldsymbol{\eta}=\boldsymbol{\eta}_{a}-\Lambda_{ab}\Lambda_{bb}^{-1}\boldsymbol{\eta}_{b} \\ \Lambda=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba} \end{array} & \begin{array}{c} \boldsymbol{\eta}^{\prime}=\boldsymbol{\eta}_{a}-\Lambda_{ab}\boldsymbol{b} \\ \Lambda^{\prime}=\Lambda_{aa} \end{array} \\
\hline
\end{array}
$$
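The table can be checked numerically by building a random Gaussian in both parameterizations. The sketch assumes the standard convention $\boldsymbol{\Lambda}=\boldsymbol{\Sigma}^{-1}$, $\boldsymbol{\eta}=\boldsymbol{\Lambda}\boldsymbol{\mu}$. It uses NumPy, and the sizes, seed and observed value of $\boldsymbol{b}$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(4)
n, m = 2, 3  # dim(a), dim(b)
G = rng.standard_normal((n + m, n + m))
Sigma = G @ G.T + np.eye(n + m)  # covariance form
mu = rng.standard_normal(n + m)
Lam = np.linalg.inv(Sigma)       # information form
eta = Lam @ mu

mu_a, mu_b = mu[:n], mu[n:]
S_aa, S_ab = Sigma[:n, :n], Sigma[:n, n:]
S_ba, S_bb = Sigma[n:, :n], Sigma[n:, n:]
eta_a, eta_b = eta[:n], eta[n:]
L_aa, L_ab = Lam[:n, :n], Lam[:n, n:]
L_ba, L_bb = Lam[n:, :n], Lam[n:, n:]

# Marginal p(a) in information form: Schur complement of Lambda_bb.
marg_Lam = L_aa - L_ab @ np.linalg.solve(L_bb, L_ba)
marg_eta = eta_a - L_ab @ np.linalg.solve(L_bb, eta_b)

# Conditional p(a | b) for one observed value of b, in both forms.
b = rng.standard_normal(m)
cond_Lam = L_aa
cond_eta = eta_a - L_ab @ b
cond_mu = mu_a + S_ab @ np.linalg.solve(S_bb, b - mu_b)
cond_Sigma = S_aa - S_ab @ np.linalg.solve(S_bb, S_ba)

# Each pair should describe the same distribution.
print(np.allclose(np.linalg.inv(marg_Lam), S_aa), np.allclose(np.linalg.solve(marg_Lam, marg_eta), mu_a))
print(np.allclose(np.linalg.inv(cond_Lam), cond_Sigma), np.allclose(np.linalg.solve(cond_Lam, cond_eta), cond_mu))
```

Marginalizing is trivial in covariance form (take the block) but needs a Schur complement in information form. Conditioning is the other way round.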
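1-5🔖Application: The Schur Complement for Multivariate Gaussian Distributions
Decomposing a multivariate Gaussian with the Schur complement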
Suppose a multivariate variable $\mathbf{x}$ follows a Gaussian distribution (zero-mean, as the derivation below assumes) and consists of two parts, $\mathbf{x}=\left[\begin{array}{c} a \\ b \end{array}\right]$. The covariance matrix of the variables is:

$$
\mathbf{K}=\left[\begin{array}{cc} A & C^{\top} \\ C & D \end{array}\right]
$$
Here $A=\operatorname{cov}(a, a)$, $D=\operatorname{cov}(b, b)$ and $C=\operatorname{cov}(b, a)$, so that $C^{\top}=\operatorname{cov}(a, b)$. The probability distribution of $\mathbf{x}$ is then:
$$
P(a, b)=P(a) P(b \mid a) \propto \exp\left(-\frac{1}{2}\left[\begin{array}{l} a \\ b \end{array}\right]^{\top}\left[\begin{array}{cc} A & C^{\top} \\ C & D \end{array}\right]^{-1}\left[\begin{array}{l} a \\ b \end{array}\right]\right)
$$
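We apply the formulas from the Schur complement subsections, with $\Delta_{A}=D-CA^{-1}C^{\top}$ the Schur complement of $A$ in $\mathbf{K}$. The Gaussian then factors as: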
$$
\begin{array}{l}
P(a, b) \\
\propto \exp\left(-\frac{1}{2}\left[\begin{array}{l} a \\ b \end{array}\right]^{\top}\left[\begin{array}{cc} A & C^{\top} \\ C & D \end{array}\right]^{-1}\left[\begin{array}{l} a \\ b \end{array}\right]\right) \\
\propto \exp\left(-\frac{1}{2}\left[\begin{array}{l} a \\ b \end{array}\right]^{\top}\left[\begin{array}{cc} I & -A^{-1}C^{\top} \\ 0 & I \end{array}\right]\left[\begin{array}{cc} A^{-1} & 0 \\ 0 & \Delta_{A}^{-1} \end{array}\right]\left[\begin{array}{cc} I & 0 \\ -CA^{-1} & I \end{array}\right]\left[\begin{array}{l} a \\ b \end{array}\right]\right) \\
\propto \exp\left(-\frac{1}{2}\left[\begin{array}{cc} a^{\top} & \left(b-CA^{-1}a\right)^{\top} \end{array}\right]\left[\begin{array}{cc} A^{-1} & 0 \\ 0 & \Delta_{A}^{-1} \end{array}\right]\left[\begin{array}{c} a \\ b-CA^{-1}a \end{array}\right]\right) \\
\propto \exp\left(-\frac{1}{2}\left[a^{\top}A^{-1}a+\left(b-CA^{-1}a\right)^{\top}\Delta_{A}^{-1}\left(b-CA^{-1}a\right)\right]\right) \\
\propto \underbrace{\exp\left(-\frac{1}{2}a^{\top}A^{-1}a\right)}_{p(a)} \underbrace{\exp\left(-\frac{1}{2}\left(b-CA^{-1}a\right)^{\top}\Delta_{A}^{-1}\left(b-CA^{-1}a\right)\right)}_{p(b \mid a)}
\end{array}
$$
This means the joint Gaussian $P(a, b)$ factors into the marginal $p(a)$ and the conditional $p(b \mid a)$.
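The factorization still holds once the normalizing constants are included, because $\det\mathbf{K}=\det A\,\det\Delta_{A}$. This NumPy sketch checks $\log P(a,b)=\log p(a)+\log p(b \mid a)$ at a random point, assuming a zero-mean joint as in the derivation. `gauss_logpdf` is a small hand-written helper, not a library function.

```python
import numpy as np


def gauss_logpdf(x: np.ndarray, mean: np.ndarray, cov: np.ndarray) -> float:
    """Log-density of the multivariate normal N(mean, cov) at x."""
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)  # squared Mahalanobis distance
    return -0.5 * (maha + logdet + x.shape[0] * np.log(2.0 * np.pi))


rng = np.random.default_rng(5)
n, m = 2, 3  # dim(a), dim(b)
G = rng.standard_normal((n + m, n + m))
K = G @ G.T + np.eye(n + m)  # zero-mean joint covariance of x = [a; b]
A, C, D = K[:n, :n], K[n:, :n], K[n:, n:]
delta_A = D - C @ np.linalg.solve(A, C.T)  # Schur complement of A in K

x = rng.standard_normal(n + m)
a, b = x[:n], x[n:]
log_joint = gauss_logpdf(x, np.zeros(n + m), K)
log_marginal = gauss_logpdf(a, np.zeros(n), A)  # p(a) = N(0, A)
# p(b | a) = N(C A^{-1} a, delta_A)
log_conditional = gauss_logpdf(b, C @ np.linalg.solve(A, a), delta_A)
print(np.isclose(log_joint, log_marginal + log_conditional))
```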
Information matrices of the marginal and conditional distributions
Suppose the information matrix is known:

$$
\left[\begin{array}{cc} A & C^{\top} \\ C & D \end{array}\right]^{-1}=\left[\begin{array}{cc} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{array}\right]
$$
By the Schur-complement inversion formula, the blocks of the covariance matrix and those of the information matrix are also related by:

$$
\left[\begin{array}{cc} A & C^{\top} \\ C & D \end{array}\right]^{-1}=\left[\begin{array}{cc} A^{-1}+A^{-1}C^{\top}\Delta_{A}^{-1}CA^{-1} & -A^{-1}C^{\top}\Delta_{A}^{-1} \\ -\Delta_{A}^{-1}CA^{-1} & \Delta_{A}^{-1} \end{array}\right] \triangleq\left[\begin{array}{cc} \Lambda_{aa} & \Lambda_{ab} \\ \Lambda_{ba} & \Lambda_{bb} \end{array}\right]
$$
The covariance of the conditional $P(b \mid a)$ is $\Delta_{A}$, so the formula above gives its information matrix directly:

$$
\Delta_{A}^{-1}=\Lambda_{bb}
$$
The covariance of the marginal $P(a)$ is $A$, so the formula above likewise gives its information matrix:

$$
A^{-1}=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba}
$$
Summary
About $\mathbf{P(a)}$:

$$
\begin{array}{l}
P(a)=\int_{b} P(a, b)\, db \\
P(a) \propto \exp\left(-\frac{1}{2}a^{\top}A^{-1}a\right) \sim \mathcal{N}(0, A)
\end{array}
$$
Takeaway: the covariance of the marginal distribution is simply the corresponding block of the joint covariance matrix.
About $\mathbf{P(b \mid a)}$:

$$
P(b \mid a) \propto \exp\left(-\frac{1}{2}\left(b-CA^{-1}a\right)^{\top}\Delta_{A}^{-1}\left(b-CA^{-1}a\right)\right)
$$
Takeaway: $P(b \mid a) \sim \mathcal{N}\left(CA^{-1}a, \Delta_{A}\right)$. The covariance becomes the Schur complement of $A$ (the block belonging to $a$), and the mean shifts to $CA^{-1}a$.
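Both takeaways can be checked by sampling. The NumPy sketch below assumes a hand-picked zero-mean joint covariance $\mathbf{K}$ (symmetric and strictly diagonally dominant, hence positive definite) and an arbitrary sample size. Three things should hold:

- The sample covariance of $a$ should approach $A$.
- The residual $b-CA^{-1}a$ should be uncorrelated with $a$.
- The residual's covariance should approach $\Delta_{A}$.

This is what $P(b \mid a) \sim \mathcal{N}(CA^{-1}a, \Delta_{A})$ predicts for a Gaussian. The tolerance is a loose Monte Carlo bound, not an exact check.

```python
import numpy as np

K = np.array([[2.0, 0.5, 0.8, 0.1],
              [0.5, 1.5, 0.2, 0.3],
              [0.8, 0.2, 1.5, 0.4],
              [0.1, 0.3, 0.4, 1.2]])  # joint covariance of x = [a; b]
n = 2  # dim(a)
A, C, D = K[:n, :n], K[n:, :n], K[n:, n:]
CA_inv = C @ np.linalg.inv(A)
delta_A = D - CA_inv @ C.T  # Schur complement of A in K

rng = np.random.default_rng(6)
x = rng.multivariate_normal(np.zeros(4), K, size=200_000)
a, b = x[:, :n], x[:, n:]
r = b - a @ CA_inv.T  # row-wise residual b - C A^{-1} a

cov_a = np.cov(a, rowvar=False)                          # close to A
cov_r = np.cov(r, rowvar=False)                          # close to delta_A
cross = np.cov(np.hstack([a, r]), rowvar=False)[:n, n:]  # cov(a, r), close to 0
print(np.round(cov_a, 2), np.round(cov_r, 2), np.round(cross, 2), sep="\n")
```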
Finally:

$$
P(\boldsymbol{a}, \boldsymbol{b})=\mathcal{N}\left(\left[\begin{array}{l} \boldsymbol{\mu}_{a} \\ \boldsymbol{\mu}_{b} \end{array}\right],\left[\begin{array}{cc} \boldsymbol{\Sigma}_{aa} & \boldsymbol{\Sigma}_{ab} \\ \boldsymbol{\Sigma}_{ba} & \boldsymbol{\Sigma}_{bb} \end{array}\right]\right)=\mathcal{N}^{-1}\left(\left[\begin{array}{l} \boldsymbol{\eta}_{a} \\ \boldsymbol{\eta}_{b} \end{array}\right],\left[\begin{array}{cc} \boldsymbol{\Lambda}_{aa} & \boldsymbol{\Lambda}_{ab} \\ \boldsymbol{\Lambda}_{ba} & \boldsymbol{\Lambda}_{bb} \end{array}\right]\right)
$$
The marginal and conditional distributions are then:

$$
\begin{array}{|c|c|c|}
\hline
 & \text{Marginal} & \text{Conditional} \\
 & p(\boldsymbol{a})=\int p(\boldsymbol{a}, \boldsymbol{b})\, d\boldsymbol{b} & p(\boldsymbol{a} \mid \boldsymbol{b})=p(\boldsymbol{a}, \boldsymbol{b}) / p(\boldsymbol{b}) \\
\hline
\text{Covariance form} & \begin{array}{c} \boldsymbol{\mu}=\boldsymbol{\mu}_{a} \\ \Sigma=\Sigma_{aa} \end{array} & \begin{array}{c} \boldsymbol{\mu}^{\prime}=\boldsymbol{\mu}_{a}+\Sigma_{ab}\Sigma_{bb}^{-1}\left(\boldsymbol{b}-\boldsymbol{\mu}_{b}\right) \\ \Sigma^{\prime}=\Sigma_{aa}-\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} \end{array} \\
\hline
\text{Information form} & \begin{array}{c} \boldsymbol{\eta}=\boldsymbol{\eta}_{a}-\Lambda_{ab}\Lambda_{bb}^{-1}\boldsymbol{\eta}_{b} \\ \Lambda=\Lambda_{aa}-\Lambda_{ab}\Lambda_{bb}^{-1}\Lambda_{ba} \end{array} & \begin{array}{c} \boldsymbol{\eta}^{\prime}=\boldsymbol{\eta}_{a}-\Lambda_{ab}\boldsymbol{b} \\ \Lambda^{\prime}=\Lambda_{aa} \end{array} \\
\hline
\end{array}
$$