Mixture of Probabilistic Principal Component Analysis (MPPCA)
- Introduction
- Latent variable models and PCA
- Probabilistic PCA
- Mixture of probabilistic PCA
- Determining the number of local models
Introduction
As someone who has stuck with principal component analysis (PCA) despite being no expert, I am familiar with many of its extensions, yet I still find plenty in PCA worth studying. This post gives a brief introduction to an improved mixture of probabilistic PCA, which extends probabilistic PCA to nonlinear anomaly monitoring. The method is also well suited to multimode anomaly monitoring, with operating modes identified automatically.
This post is based on: Zhang, J., Chen, H., Chen, S., & Hong, X. (2019). An Improved Mixture of Probabilistic PCA for Nonlinear Data-Driven Process Monitoring. IEEE Transactions on Systems, Man, and Cybernetics, 49(1), 198–210.
Link to the published version
Link to the preprint
Latent Variable Models and PCA
Consider the following model:
$${\boldsymbol t} = {\boldsymbol y}({\boldsymbol x};{\boldsymbol w}) + {\boldsymbol \xi},$$
where ${\boldsymbol t}\in \Re^{d}$ is the observed data, ${\boldsymbol x}\in \Re^{q}$ is the latent variable, ${\boldsymbol w}$ denotes the model parameters, and ${\boldsymbol \xi}$ is independent noise. The linearized model is
$${\boldsymbol t} = {\boldsymbol W}{\boldsymbol x} + {\boldsymbol \mu} + {\boldsymbol \xi}.$$
Assume ${\boldsymbol x} \sim {\rm N}({\boldsymbol 0},{\boldsymbol I})$ and ${\boldsymbol \xi} \sim {\rm N}({\boldsymbol 0},{\boldsymbol \Psi})$, where ${\boldsymbol \Psi} \in \Re^{d \times d}$ is a diagonal matrix, ${\boldsymbol \mu} \in \Re^{d}$ is the output mean vector, and ${\boldsymbol W} \in \Re^{d \times q}$ is the loading matrix. Under these Gaussian assumptions, the observations generated by the linear model are themselves Gaussian:
$${\boldsymbol t} \sim {\rm N}({\boldsymbol \mu},{\boldsymbol C}), \qquad {\boldsymbol C} = {\boldsymbol \Psi} + {\boldsymbol W}{\boldsymbol W}^{\rm T} \in \Re^{d \times d}.$$
For an observed data set $\{{\boldsymbol t}_n\}_{n=1}^{N}$, the linear model in matrix form is
$${\boldsymbol T} = {\boldsymbol W}{\boldsymbol X} + {\boldsymbol u}{\boldsymbol 1}^{\rm T} + {\boldsymbol \Xi},$$
where ${\boldsymbol T} = [{\boldsymbol t}_1, \ldots, {\boldsymbol t}_N] \in \Re^{d \times N}$, ${\boldsymbol X} = [{\boldsymbol x}_1, \ldots, {\boldsymbol x}_N] \in \Re^{q \times N}$, and ${\boldsymbol \Xi} = [{\boldsymbol \xi}_1, \ldots, {\boldsymbol \xi}_N] \in \Re^{d \times N}$. The sample mean is ${\boldsymbol u} = \frac{1}{N}\sum_{n=1}^{N}{\boldsymbol t}_n$ and the sample covariance is ${\boldsymbol S} = \frac{1}{N}({\boldsymbol T}-{\boldsymbol u}{\boldsymbol 1}^{\rm T})({\boldsymbol T}-{\boldsymbol u}{\boldsymbol 1}^{\rm T})^{\rm T}$.
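As a quick numerical sketch of the matrix notation above, the sample mean ${\boldsymbol u}$ and covariance ${\boldsymbol S}$ can be computed in NumPy (the data here are synthetic and purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N = 5, 200
T = rng.normal(size=(d, N))        # T: one observation t_n per column

u = T.mean(axis=1, keepdims=True)  # u = (1/N) sum_n t_n
Tc = T - u                         # centered data, T - u 1^T
S = Tc @ Tc.T / N                  # S = (1/N)(T - u 1^T)(T - u 1^T)^T

print(S.shape)  # (5, 5)
```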
Probabilistic PCA
The main goal of PCA is to find projection directions onto a low-dimensional space while keeping the reconstruction error minimal.
Given isotropic noise ${\boldsymbol \xi} \sim {\rm N}({\boldsymbol 0},\sigma^2{\boldsymbol I})$, the distribution over ${\boldsymbol t}$-space induced by a particular ${\boldsymbol x}$ is
$$p({\boldsymbol t}|{\boldsymbol x}) = (2\pi\sigma^2)^{-d/2}\exp\left\{-\frac{1}{2\sigma^2}\left\|{\boldsymbol t} - {\boldsymbol W}{\boldsymbol x} - {\boldsymbol \mu}\right\|^2\right\}.$$
The Gaussian prior over ${\boldsymbol x}$ is defined as
$$p({\boldsymbol x}) = (2\pi)^{-q/2}\exp\left\{-\frac{1}{2}{\boldsymbol x}^{\rm T}{\boldsymbol x}\right\},$$
and the marginal distribution of ${\boldsymbol t}$ is
$$p({\boldsymbol t}) = \int p({\boldsymbol t}|{\boldsymbol x})\,p({\boldsymbol x})\,d{\boldsymbol x} = (2\pi)^{-d/2}\left|{\boldsymbol C}\right|^{-1/2}\exp\left\{-\frac{1}{2}({\boldsymbol t}-{\boldsymbol \mu})^{\rm T}{\boldsymbol C}^{-1}({\boldsymbol t}-{\boldsymbol \mu})\right\},$$
where the model covariance is
$${\boldsymbol C} = \sigma^2{\boldsymbol I} + {\boldsymbol W}{\boldsymbol W}^{\rm T}.$$
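To see the marginal in action, one can sample from the generative model and check that the empirical covariance approaches ${\boldsymbol C} = \sigma^2{\boldsymbol I} + {\boldsymbol W}{\boldsymbol W}^{\rm T}$. A minimal sketch with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(1)
d, q, N = 4, 2, 100_000
W = rng.normal(size=(d, q))
mu = rng.normal(size=(d, 1))
sigma2 = 0.1

# generative model: t = W x + mu + xi, with x ~ N(0, I) and xi ~ N(0, sigma2 I)
X = rng.normal(size=(q, N))
Xi = np.sqrt(sigma2) * rng.normal(size=(d, N))
T = W @ X + mu + Xi

C = sigma2 * np.eye(d) + W @ W.T   # model covariance
C_hat = np.cov(T, bias=True)       # empirical covariance of the samples
print(np.abs(C - C_hat).max())     # shrinks toward 0 as N grows
```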
By Bayes' rule, the posterior distribution of ${\boldsymbol x}$ given ${\boldsymbol t}$ is
$$p({\boldsymbol x}|{\boldsymbol t}) = (2\pi)^{-q/2}\left|\sigma^{-2}{\boldsymbol M}\right|^{1/2}\exp\left\{-\frac{1}{2}\left({\boldsymbol x}-{\boldsymbol M}^{-1}{\boldsymbol W}^{\rm T}({\boldsymbol t}-{\boldsymbol \mu})\right)^{\rm T}(\sigma^{-2}{\boldsymbol M})\left({\boldsymbol x}-{\boldsymbol M}^{-1}{\boldsymbol W}^{\rm T}({\boldsymbol t}-{\boldsymbol \mu})\right)\right\},$$
with posterior covariance
$$\sigma^2{\boldsymbol M}^{-1} = \sigma^2(\sigma^2{\boldsymbol I} + {\boldsymbol W}^{\rm T}{\boldsymbol W})^{-1}.$$
Note that ${\boldsymbol M} \in \Re^{q \times q}$, whereas ${\boldsymbol C} \in \Re^{d \times d}$.
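In code, the posterior moments only require solving with the small $q \times q$ matrix ${\boldsymbol M} = \sigma^2{\boldsymbol I} + {\boldsymbol W}^{\rm T}{\boldsymbol W}$ rather than the $d \times d$ matrix ${\boldsymbol C}$. A sketch with illustrative values:

```python
import numpy as np

rng = np.random.default_rng(2)
d, q = 6, 2
W = rng.normal(size=(d, q))
mu = np.zeros(d)
sigma2 = 0.5
t = rng.normal(size=d)

M = sigma2 * np.eye(q) + W.T @ W             # M = sigma^2 I + W^T W  (q x q)
x_mean = np.linalg.solve(M, W.T @ (t - mu))  # posterior mean M^{-1} W^T (t - mu)
x_cov = sigma2 * np.linalg.inv(M)            # posterior covariance sigma^2 M^{-1}

print(x_mean.shape, x_cov.shape)
```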
The log-likelihood is
$$L = \sum_{n=1}^{N}\ln\left\{p({\boldsymbol t}_n)\right\} = -\frac{N}{2}\left\{d\ln(2\pi) + \ln\left|{\boldsymbol C}\right| + {\rm tr}\left({\boldsymbol C}^{-1}{\boldsymbol S}\right)\right\}$$
The log-likelihood is maximized when the columns of ${\boldsymbol W}$ span the principal subspace of the data. An analytic solution can be obtained via the eigendecomposition of ${\boldsymbol S}$, with the noise variance $\sigma^2$ estimated from the smallest eigenvalues of ${\boldsymbol S}$. Alternatively, an iterative EM algorithm can be used, based on the following complete-data log-likelihood:
$$L_c = \sum_{n=1}^{N}\ln\left\{p({\boldsymbol t}_n,{\boldsymbol x}_n)\right\} = \sum_{n=1}^{N}\ln\left\{(2\pi\sigma^2)^{-d/2}\exp\left\{-\frac{1}{2\sigma^2}\left\|{\boldsymbol t}_n - {\boldsymbol W}{\boldsymbol x}_n - {\boldsymbol \mu}\right\|^2\right\}(2\pi)^{-q/2}\exp\left\{-\frac{1}{2}{\boldsymbol x}_n^{\rm T}{\boldsymbol x}_n\right\}\right\}$$
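The analytic route mentioned above follows Tipping and Bishop's classical result: $\sigma^2$ is the average of the $d-q$ discarded eigenvalues of ${\boldsymbol S}$, and ${\boldsymbol W} = {\boldsymbol U}_q({\boldsymbol \Lambda}_q - \sigma^2{\boldsymbol I})^{1/2}$ up to an arbitrary rotation. A compact sketch:

```python
import numpy as np

def ppca_ml(S, q):
    """Closed-form ML estimates for PPCA from a sample covariance S."""
    d = S.shape[0]
    lam, U = np.linalg.eigh(S)                # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]            # reorder to descending
    sigma2 = lam[q:].mean()                   # noise: mean of discarded eigenvalues
    W = U[:, :q] * np.sqrt(lam[:q] - sigma2)  # W = U_q (Lambda_q - sigma^2 I)^{1/2}
    return W, sigma2

# sanity check on a random positive-definite S:
# C = sigma2 I + W W^T reproduces the top-q eigenvalues of S
rng = np.random.default_rng(3)
A = rng.normal(size=(5, 5))
S = A @ A.T
W, sigma2 = ppca_ml(S, q=2)
C = sigma2 * np.eye(5) + W @ W.T
```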
In the EM algorithm, the posterior mean and covariance of the latent variables are
$$\left\langle{\boldsymbol x}_n\right\rangle = {\boldsymbol M}^{-1}{\boldsymbol W}^{\rm T}({\boldsymbol t}_n - {\boldsymbol \mu})$$
$$\left\langle{\boldsymbol x}_n{\boldsymbol x}_n^{\rm T}\right\rangle = \sigma^2{\boldsymbol M}^{-1} + \left\langle{\boldsymbol x}_n\right\rangle\left\langle{\boldsymbol x}_n\right\rangle^{\rm T}$$
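Combining these E-step moments with the standard M-step gives the EM iteration for a single PPCA model. A didactic sketch (initialization and convergence checks are deliberately simplified):

```python
import numpy as np

def ppca_em(T, q, n_iter=200, seed=0):
    """EM for a single PPCA model. T is d x N; returns W, mu, sigma2."""
    rng = np.random.default_rng(seed)
    d, N = T.shape
    mu = T.mean(axis=1, keepdims=True)
    Tc = T - mu                                 # center the data
    W = rng.normal(size=(d, q))
    sigma2 = 1.0
    for _ in range(n_iter):
        # E-step: posterior moments of the latent variables
        M = sigma2 * np.eye(q) + W.T @ W
        Minv = np.linalg.inv(M)
        Ex = Minv @ W.T @ Tc                    # <x_n>, one column per sample
        Exx = N * sigma2 * Minv + Ex @ Ex.T     # sum_n <x_n x_n^T>
        # M-step: maximize the expected complete-data log-likelihood
        W_new = (Tc @ Ex.T) @ np.linalg.inv(Exx)
        sigma2 = (np.sum(Tc**2)
                  - 2.0 * np.sum((W_new.T @ Tc) * Ex)
                  + np.trace(Exx @ W_new.T @ W_new)) / (N * d)
        W = W_new
    return W, mu, sigma2

# recover the noise level of synthetic data generated with sigma^2 = 0.1
rng = np.random.default_rng(4)
d, q, N = 5, 2, 2000
W_true = rng.normal(size=(d, q))
T = W_true @ rng.normal(size=(q, N)) + np.sqrt(0.1) * rng.normal(size=(d, N))
W_hat, mu_hat, sigma2_hat = ppca_em(T, q=2)
```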
Mixture of Probabilistic PCA
To model more complex data, the mixture of probabilistic PCA (MPPCA) is introduced. By combining the outputs of $K$ local PCA models, it provides a more powerful basis for handling nonlinearity and missing data. Following the rules of probability, the system is represented as a mixture of $K$ local PCA models rather than by a single model:
$$p({\boldsymbol t}) = \sum_{i=1}^{K}p(i)\,p({\boldsymbol t}|i) = \sum_{i=1}^{K}\pi_i\,p({\boldsymbol t}|i)$$
subject to $\pi_i \ge 0$ and $\sum_i \pi_i = 1$. Here $p(i)$ is the probability of selecting the $i$-th model, and each conditional density $p({\boldsymbol t}|i)$ corresponds to a local PCA model
$${\boldsymbol t} = {\boldsymbol W}_i{\boldsymbol x} + {\boldsymbol \mu}_i + {\boldsymbol \xi}_i, \qquad i = 1,\ldots,K.$$
The posterior mean and covariance of each local PCA model are
$$\left\langle{\boldsymbol x}_n^{(i)}\right\rangle = {\boldsymbol M}_i^{-1}{\boldsymbol W}_i^{\rm T}({\boldsymbol t}_n - {\boldsymbol \mu}_i) \in \Re^{q}$$
$$\left\langle{\boldsymbol x}_n^{(i)}{\boldsymbol x}_n^{(i){\rm T}}\right\rangle = \sigma_i^2{\boldsymbol M}_i^{-1} + \left\langle{\boldsymbol x}_n^{(i)}\right\rangle\left\langle{\boldsymbol x}_n^{(i)}\right\rangle^{\rm T} \in \Re^{q \times q}$$
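For monitoring and mode identification, one also needs the posterior model probabilities $p(i|{\boldsymbol t}_n)$ (the responsibilities). A sketch, assuming each local density is $p({\boldsymbol t}|i) = {\rm N}({\boldsymbol \mu}_i, \sigma_i^2{\boldsymbol I} + {\boldsymbol W}_i{\boldsymbol W}_i^{\rm T})$ and using illustrative parameter shapes:

```python
import numpy as np

def mppca_responsibilities(T, pis, Ws, mus, sigma2s):
    """Posterior p(i | t_n) for each column t_n of T under K local PPCA models."""
    d, N = T.shape
    K = len(pis)
    logp = np.empty((K, N))
    for i in range(K):
        C = sigma2s[i] * np.eye(d) + Ws[i] @ Ws[i].T      # local covariance C_i
        diff = T - mus[i][:, None]
        _, logdet = np.linalg.slogdet(C)
        maha = np.sum(diff * np.linalg.solve(C, diff), axis=0)
        logp[i] = np.log(pis[i]) - 0.5 * (d * np.log(2 * np.pi) + logdet + maha)
    logp -= logp.max(axis=0)       # stabilize before exponentiating
    R = np.exp(logp)
    return R / R.sum(axis=0)       # normalize: each column sums to one

rng = np.random.default_rng(5)
d, q, K, N = 4, 2, 3, 10
T = rng.normal(size=(d, N))
pis = np.full(K, 1.0 / K)
Ws = [rng.normal(size=(d, q)) for _ in range(K)]
mus = [rng.normal(size=d) for _ in range(K)]
sigma2s = np.full(K, 0.5)
R = mppca_responsibilities(T, pis, Ws, mus, sigma2s)
```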
Determining the Number of Local Models
The parameters of the local models depend on the number of local models $K$. The paper uses the Bayesian Ying-Yang (BYY) criterion to determine the optimal $K$, as follows:
$$K^{\circ} = \arg\min_{i} H(i)$$
$$H(i) \equiv -\frac{1}{N}\sum_{n=1}^{N}\sum_{i=1}^{K}p(i|{\boldsymbol t}_n,{\boldsymbol \theta})\ln\left(p({\boldsymbol t}_n|i)\right) - \sum_{i=1}^{K}\pi_i\ln\pi_i$$
where ${\boldsymbol \theta}$ contains all model parameters, i.e., ${\boldsymbol \theta} = \left\{{\boldsymbol W}_i,{\boldsymbol \mu}_i,\sigma_i^2,\pi_i\right\}_{i=1,\ldots,K}$.
In MPPCA, $K$ and ${\boldsymbol \theta}$ are optimized jointly, with the parameters estimated by the expectation-maximization (EM) method.
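Given a fitted model for each candidate $K$, the criterion $H$ can be evaluated directly from the per-model log-densities $\ln p({\boldsymbol t}_n|i)$, the responsibilities $p(i|{\boldsymbol t}_n,{\boldsymbol \theta})$, and the mixing weights $\pi_i$. A small sketch (the array layouts are assumptions, not from the paper):

```python
import numpy as np

def byy_criterion(log_p, R, pis):
    """H = -(1/N) sum_n sum_i p(i|t_n) ln p(t_n|i) - sum_i pi_i ln pi_i.
    log_p, R: K x N arrays of ln p(t_n | i) and responsibilities; pis: (K,)."""
    N = R.shape[1]
    data_term = -np.sum(R * log_p) / N
    entropy_term = -np.sum(pis * np.log(pis))
    return data_term + entropy_term

# with a single model (K = 1) the entropy term vanishes
log_p = np.array([[-1.0, -2.0, -3.0]])
R = np.ones((1, 3))
H = byy_criterion(log_p, R, np.array([1.0]))
print(H)  # 2.0
```

In practice one would evaluate $H$ for each candidate $K$ and keep the minimizer as $K^{\circ}$.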