Data Learning (10) · Expectation-Maximization Algorithm · Factor Analysis Model (Part 2)

Excerpted from the author's class notes. For questions, contact humminwang@163.com

1 Factor Analysis

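Reference: http://blog.csdn.net/stdcoutzyx/article/details/37559995
In a Gaussian mixture model, when the number of training samples is smaller than the sample dimension, the estimated covariance matrix is singular, so the probability density function cannot be evaluated. For other models, having fewer samples than dimensions also tends to cause overfitting.
There are two remedies. The first is to strengthen the model assumptions, for example by restricting the covariance matrix. The second is to reduce model complexity by using a model with fewer parameters, such as factor analysis.
Ways to restrict the covariance matrix: assume it is diagonal, or, more strongly, assume it is diagonal with all diagonal entries equal. Estimating a full covariance matrix requires more samples than dimensions, but under the diagonal assumption more than one sample is already enough to estimate the restricted covariance matrix.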
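As a small numerical illustration of the point above, the following sketch compares the full sample covariance with its diagonal and isotropic restrictions when there are fewer samples than dimensions. It uses NumPy; the sizes m = 3 and n = 5 and all variable names are chosen here for illustration and do not come from the original notes.

```python
import numpy as np

rng = np.random.default_rng(0)
m, n = 3, 5                      # fewer samples than dimensions (illustrative sizes)
X = rng.normal(size=(m, n))      # rows are samples

# Full covariance estimate: its rank is at most m - 1, so it is singular when m <= n.
full_cov = np.cov(X, rowvar=False)
rank = np.linalg.matrix_rank(full_cov)

# Diagonal restriction: one variance per coordinate.
diag_var = X.var(axis=0)
diag_cov = np.diag(diag_var)

# Stronger restriction: diagonal with a single shared variance.
iso_cov = diag_var.mean() * np.eye(n)

print("rank of full covariance:", rank, "of", n)
print("diagonal covariance invertible:", bool(np.all(diag_var > 0)))
```

The full estimate cannot be inverted here, while both restricted estimates can, which is why the restricted models still define a valid density.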

Matrix representation of the Gaussian distribution:

Consider three variables $x_1\in R^r$, $x_2\in R^s$, $x\in R^{r+s}$, with
$$x=\begin{bmatrix}x_1\\x_2\end{bmatrix}$$
Assume $x\sim N(\mu,\Sigma)$, where
$$\mu=\begin{bmatrix}\mu_1\\\mu_2\end{bmatrix},\quad \Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix}$$
The marginal distribution of $x_1$ then satisfies
$$E[x_1]=\mu_1,\quad Cov(x_1)=E[(x_1-\mu_1)(x_1-\mu_1)^T]=\Sigma_{11}$$
For $x$ as a whole we have
$$Cov(x)=\Sigma=\begin{bmatrix}\Sigma_{11}&\Sigma_{12}\\\Sigma_{21}&\Sigma_{22}\end{bmatrix}=E[(x-\mu)(x-\mu)^T]$$
$$=E\left[\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\end{bmatrix}\begin{bmatrix}x_1-\mu_1\\x_2-\mu_2\end{bmatrix}^T\right]=E\begin{bmatrix}(x_1-\mu_1)(x_1-\mu_1)^T&(x_1-\mu_1)(x_2-\mu_2)^T\\(x_2-\mu_2)(x_1-\mu_1)^T&(x_2-\mu_2)(x_2-\mu_2)^T\end{bmatrix}$$
Given $x_2$, the conditional density of $x_1$ is
$$p(x_1|x_2)=\frac{p(x_1,x_2)}{p(x_2)}=\frac{p(x)}{p(x_2)}$$
which is again Gaussian:
$$x_1|x_2\sim N(\mu_{1|2},\Sigma_{1|2})$$
$$\mu_{1|2}=\mu_1+\Sigma_{12}\Sigma_{22}^{-1}(x_2-\mu_2)$$
$$\Sigma_{1|2}=\Sigma_{11}-\Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$$
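The conditional mean and covariance formulas can be sketched in NumPy as follows. The function name `conditional_gaussian` and its argument layout are assumptions made for this example; `r` is the dimension of $x_1$, as in the text.

```python
import numpy as np

def conditional_gaussian(mu, Sigma, r, x2):
    """Return (mu_1|2, Sigma_1|2) for x = [x1; x2] ~ N(mu, Sigma), with x1 in R^r."""
    mu = np.asarray(mu, dtype=float)
    Sigma = np.asarray(Sigma, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    mu1, mu2 = mu[:r], mu[r:]
    S11 = Sigma[:r, :r]
    S21, S22 = Sigma[r:, :r], Sigma[r:, r:]
    # K = S12 @ inv(S22); obtained by solving S22 @ K^T = S21 (Sigma is symmetric).
    K = np.linalg.solve(S22, S21).T
    mu_cond = mu1 + K @ (x2 - mu2)
    Sigma_cond = S11 - K @ S21
    return mu_cond, Sigma_cond

mu_c, Sigma_c = conditional_gaussian([0.0, 0.0], [[2.0, 1.0], [1.0, 1.0]], 1, [1.0])
print(mu_c, Sigma_c)
```

For this two-dimensional example the formulas give a conditional mean of $0+1\cdot1^{-1}\cdot(1-0)=1$ and a conditional variance of $2-1\cdot1^{-1}\cdot1=1$.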


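Factor analysis model

The factor analysis model is defined as follows.
Assume a latent variable $z\sim N(0,I)$ with $z\in R^d$ $(d<n)$. Further assume that each training sample $x$ is generated from the latent variable $z$, i.e. $x=\mu+\Lambda z+\varepsilon$, where $\varepsilon\sim N(0,\Psi)$. When $z$ is known, the distribution of $x$ is $x|z\sim N(\mu+\Lambda z,\Psi)$,
where $\Psi$ is a diagonal matrix.
The factor analysis model can be understood through the process that generates the training data:

  • <1> In a low-dimensional space, draw m latent variables $z^{(i)}$ from a multivariate Gaussian with mean 0 and identity covariance. Each $z^{(i)}$ is a d-dimensional vector, and m is the number of samples.
  • <2> Use the transformation matrix $\Lambda$ (of size $n\times d$) to map $z$ into n-dimensional space as $\Lambda z$. Since the factor $z$ has mean 0, the mapped vector also has mean $0$.
  • <3> Add a mean $\mu$ to the n-dimensional vector $\Lambda z$. This shifts the mean of the transformed $z$ in n-dimensional space.
  • <4> Because real samples x contain error, add the noise $\varepsilon \sim N(0,\Psi)$ on top of this transformation.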
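The four generation steps above can be sketched directly in NumPy. The function name `sample_factor_model`, the `psi_diag` argument (the diagonal of $\Psi$ stored as a vector), and the example parameter values are assumptions made for illustration.

```python
import numpy as np

def sample_factor_model(mu, Lam, psi_diag, m, seed=0):
    """Draw m samples x = mu + Lam z + eps, following steps <1> to <4>."""
    rng = np.random.default_rng(seed)
    n, d = Lam.shape
    Z = rng.normal(size=(m, d))                        # <1> z ~ N(0, I), one row per sample
    mapped = Z @ Lam.T                                 # <2> Lam z, now n-dimensional
    shifted = mapped + mu                              # <3> shift by the mean mu
    eps = rng.normal(size=(m, n)) * np.sqrt(psi_diag)  # <4> eps ~ N(0, diag(psi_diag))
    return shifted + eps, Z

# Illustrative parameters: n = 3 observed dimensions, d = 1 latent factor.
mu = np.array([1.0, 2.0, 3.0])
Lam = np.array([[1.0], [0.5], [-1.0]])
psi_diag = np.array([0.1, 0.2, 0.3])
X, Z = sample_factor_model(mu, Lam, psi_diag, m=1000)
print(X.shape, Z.shape)
```

Because $\Psi$ is diagonal, the noise can be drawn independently per coordinate, which is what the elementwise scaling by `np.sqrt(psi_diag)` does.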

Derivation of the factor analysis model

Model:
$$z\sim N(0,I)$$
$$\varepsilon \sim N(0,\Psi)$$
$$x=\mu+\Lambda z+\varepsilon$$
where $\varepsilon$ and $z$ are independent.
We analyze the model with the Gaussian matrix representation, treating $z$ and $x$ as jointly multivariate Gaussian:
$$\begin{bmatrix}z\\x\end{bmatrix}\sim N(\mu_{zx},\Sigma)$$
We need to find $\mu_{zx}$ and $\Sigma$. Since $E[z]=0$ and $E[x]=\mu+\Lambda E[z]+E[\varepsilon]=\mu$, the mean is $\mu_{zx}=\begin{bmatrix}0\\\mu\end{bmatrix}$.
Finding $\Sigma$ requires $\Sigma_{zz},\Sigma_{zx},\Sigma_{xz},\Sigma_{xx}$.
$$\Sigma_{zz}=E[(z-E[z])(z-E[z])^T]$$
By definition, $\Sigma_{zz}=Cov(z)=I$. Because $z$ and $\varepsilon$ are independent with zero mean, $E[z\varepsilon^T]=0$, so
$$\Sigma_{zx}=E[(z-E[z])(x-E[x])^T]=E[z(\mu+\Lambda z+\varepsilon-\mu)^T]=E[zz^T]\Lambda^T+E[z\varepsilon^T]=\Lambda^T,\quad \Sigma_{xz}=\Sigma_{zx}^T=\Lambda$$
$$\Sigma_{xx}=E[(x-E[x])(x-E[x])^T]=E[(\Lambda z+\varepsilon)(\Lambda z+\varepsilon)^T]=E[\Lambda zz^T\Lambda^T+\varepsilon z^T\Lambda^T+\Lambda z\varepsilon^T+\varepsilon\varepsilon^T]=\Lambda E[zz^T]\Lambda^T+E[\varepsilon\varepsilon^T]=\Lambda\Lambda ^T+\Psi$$
This gives
$$\begin{bmatrix}z\\x\end{bmatrix}\sim N\left(\begin{bmatrix}0\\\mu\end{bmatrix},\begin{bmatrix}I&\Lambda^T\\\Lambda&\Lambda\Lambda^T+\Psi\end{bmatrix}\right)$$
so the marginal distribution of $x$ is
$$x\sim N(\mu,\Lambda\Lambda^T+\Psi)$$
For a training set $\{x^{(1)},\dots,x^{(m)}\}$ we can write down the likelihood function, but maximizing it directly to obtain the parameters is complicated because of the latent variable, so we use the EM algorithm.
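Although the likelihood is hard to maximize directly, it is easy to evaluate from the marginal $x\sim N(\mu,\Lambda\Lambda^T+\Psi)$. The sketch below computes the total log-likelihood of a data matrix; the function name `fa_log_likelihood` and the `psi_diag` vector holding the diagonal of $\Psi$ are assumptions made for this example.

```python
import numpy as np

def fa_log_likelihood(X, mu, Lam, psi_diag):
    """Sum over the rows x_i of X of log N(x_i; mu, Lam Lam^T + Psi)."""
    m, n = X.shape
    C = Lam @ Lam.T + np.diag(psi_diag)      # marginal covariance of x
    diff = X - mu
    _, logdet = np.linalg.slogdet(C)         # log|C| without overflow
    # Quadratic forms diff_i^T C^{-1} diff_i, computed with one linear solve.
    quad = np.sum(diff * np.linalg.solve(C, diff.T).T, axis=1)
    return -0.5 * (m * n * np.log(2 * np.pi) + m * logdet + quad.sum())

# One-dimensional check: C = 1*1 + 1 = 2, x = mu, so the value is -0.5 * log(4 * pi).
value = fa_log_likelihood(np.array([[0.0]]), np.array([0.0]),
                          np.array([[1.0]]), np.array([1.0]))
print(value)
```

A quantity like this is useful for monitoring EM: the log-likelihood should not decrease from one iteration to the next.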

Solving the factor analysis model with the EM algorithm

E-step: compute $Q_i(z^{(i)}|x^{(i)};\mu,\Lambda,\Psi)$.
Using the Gaussian matrix representation above, we can compute the mean and covariance of the conditional distribution:
$$\mu_{z^{(i)}|x^{(i)}}=\Lambda^T(\Lambda\Lambda^T+\Psi)^{-1}(x^{(i)}-\mu)$$
$$\Sigma_{z^{(i)}|x^{(i)}}=I-\Lambda^T(\Lambda\Lambda^T+\Psi)^{-1}\Lambda$$
Substituting these into the Gaussian density gives the probability density function of $Q_i(z^{(i)}|x^{(i)})$. Note that it is a density over the d-dimensional $z^{(i)}$:
$$Q_i(z^{(i)}|x^{(i)})=\frac{1}{(2\pi)^{d/2}|\Sigma_{z^{(i)}|x^{(i)}}|^{1/2}}\exp\left(-\frac{1}{2}(z^{(i)}-\mu_{z^{(i)}|x^{(i)}})^T\Sigma_{z^{(i)}|x^{(i)}}^{-1}(z^{(i)}-\mu_{z^{(i)}|x^{(i)}})\right)$$
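A minimal NumPy sketch of the E-step follows. The function name `e_step` and the convention of storing samples as rows of `X` and the diagonal of $\Psi$ as the vector `psi_diag` are assumptions made here. The posterior covariance does not depend on $x^{(i)}$, so it is computed once for all samples.

```python
import numpy as np

def e_step(X, mu, Lam, psi_diag):
    """Posterior mean of z for each sample (rows) and the shared posterior covariance."""
    d = Lam.shape[1]
    C = Lam @ Lam.T + np.diag(psi_diag)      # Lam Lam^T + Psi
    G = np.linalg.solve(C, Lam).T            # G = Lam^T C^{-1}, shape (d, n)
    Ez = (X - mu) @ G.T                      # row i is mu_{z|x} for sample i
    Sigma_z = np.eye(d) - G @ Lam            # Sigma_{z|x}, identical for every sample
    return Ez, Sigma_z

# One-dimensional check: Lam = 1, Psi = 1, mu = 0, x = 2.
Ez, Sigma_z = e_step(np.array([[2.0]]), np.array([0.0]),
                     np.array([[1.0]]), np.array([1.0]))
print(Ez, Sigma_z)
```

In the one-dimensional check, $\Lambda\Lambda^T+\Psi=2$, so the posterior mean is $1\cdot\frac{1}{2}\cdot2=1$ and the posterior variance is $1-\frac{1}{2}=0.5$.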
M-step: maximize the following expression with respect to the parameters $\mu,\Lambda,\Psi$:
$$\sum_{i=1}^m \int Q_i(z^{(i)})\log\frac{p(z^{(i)},x^{(i)};\mu,\Lambda,\Psi)}{Q_i(z^{(i)})}dz^{(i)}$$
$$=\sum_{i=1}^m \int Q_i(z^{(i)})[\log p(x^{(i)}|z^{(i)};\mu,\Lambda,\Psi)+\log p(z^{(i)})-\log Q_i(z^{(i)})]dz^{(i)}$$
$$=\sum_{i=1}^mE_{z^{(i)}\sim Q_i}[\log p(x^{(i)}|z^{(i)};\mu,\Lambda,\Psi)+\log p(z^{(i)})-\log Q_i(z^{(i)})]$$
The first step uses the factorization $p(z^{(i)},x^{(i)})=p(x^{(i)}|z^{(i)})p(z^{(i)})$ to split the logarithm. The second step rewrites the integral as the expectation of $\log p(x^{(i)}|z^{(i)};\mu,\Lambda,\Psi)+\log p(z^{(i)})-\log Q_i(z^{(i)})$ when $z^{(i)}$ follows the distribution $Q_i$.
Solving for $\Lambda$:
$$\nabla_\Lambda \sum_{i=1}^mE[\log p(x^{(i)}|z^{(i)};\mu,\Lambda,\Psi)+\log p(z^{(i)})-\log Q_i(z^{(i)})]$$
$$=\nabla_\Lambda \sum_{i=1}^mE[\log p(x^{(i)}|z^{(i)};\mu,\Lambda,\Psi)]$$
This drops the terms that do not depend on the parameter $\Lambda$.
$$=\nabla_\Lambda\sum_{i=1}^mE\left[\log\left(\frac{1}{(2\pi)^{n/2}|\Psi|^{1/2}}\exp\left(-\frac{1}{2}(x^{(i)}-\mu-\Lambda z^{(i)})^T\Psi^{-1}(x^{(i)}-\mu-\Lambda z^{(i)})\right)\right)\right]$$
Here $x^{(i)}|z^{(i)}$ is Gaussian with mean $\mu+\Lambda z^{(i)}$ and covariance $\Psi$.
$$=\nabla_\Lambda\sum_{i=1}^mE\left[-\frac{1}{2}\log|\Psi|-\frac{n}{2}\log(2\pi)-\frac{1}{2}(x^{(i)}-\mu-\Lambda z^{(i)})^T\Psi^{-1}(x^{(i)}-\mu-\Lambda z^{(i)})\right]$$
$$=\nabla_\Lambda\sum_{i=1}^m-E\left[\frac{1}{2}(x^{(i)}-\mu-\Lambda z^{(i)})^T\Psi^{-1}(x^{(i)}-\mu-\Lambda z^{(i)})\right]$$
$$=\sum_{i=1}^m\nabla_\Lambda E\left[-tr\left(\frac{1}{2}{z^{(i)}}^T\Lambda^T\Psi^{-1}\Lambda z^{(i)}\right)+tr\left({z^{(i)}}^T\Lambda^T\Psi^{-1}(x^{(i)}-\mu)\right)\right]$$
This expands the quadratic form, drops the term that does not involve $\Lambda$, and uses the trace property $tr(a)=a$ for a scalar $a$.
$$=\sum_{i=1}^m\nabla_\Lambda E\left[-tr\left(\frac{1}{2}\Lambda^T\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right)+tr\left(\Lambda^T\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)\right]$$
This uses the trace property $tr(AB)=tr(BA)$.
$$=\sum_{i=1}^m\left(\nabla_\Lambda E\left[-tr\left(\frac{1}{2}\Lambda^T\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right)\right]+\nabla_\Lambda E\left[tr\left(\Lambda^T\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)\right]\right)$$
$$=\sum_{i=1}^m\left(E\left[-\nabla_\Lambda tr\left(\frac{1}{2}\Lambda^T\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right)\right]+E\left[\nabla_\Lambda tr\left(\Lambda^T\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)\right]\right)$$
The gradient and the expectation are interchanged.
$$=\sum_{i=1}^m\left(E\left[-\left(\nabla_{\Lambda^T} tr\left(\frac{1}{2}\Lambda^T\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right)\right)^T\right]+E\left[\left(\nabla_{\Lambda^T} tr\left(\Lambda^T\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)\right)^T\right]\right)$$
This uses the property $\nabla_{A^T} f(A)=(\nabla_A f(A))^T$.
$$=\sum_{i=1}^m\left(E\left[-\frac{1}{2}\left(2 z^{(i)}{z^{(i)}}^T\Lambda^T\Psi^{-1}\right)^T\right]+E\left[\left(\left(\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right)^T\right)^T\right]\right)$$
The first term uses $\nabla_A tr(ABA^TC)=CAB+C^TAB^T$.
The second term uses $\nabla_A tr(AB)=B^T$.
$$=\sum_{i=1}^m\left(E\left[-\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T\right]+E\left[\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right]\right)$$
$$=\sum_{i=1}^mE\left[-\Psi^{-1}\Lambda z^{(i)}{z^{(i)}}^T+\Psi^{-1}(x^{(i)}-\mu){z^{(i)}}^T\right]$$
Expand the expectation, set the result to 0, and simplify by left-multiplying with $\Psi$:
$$\sum_{i=1}^m\Lambda E_{z^{(i)}\sim Q_i}[ z^{(i)}{z^{(i)}}^T]=\sum_{i=1}^m(x^{(i)}-\mu)E_{z^{(i)}\sim Q_i}[{z^{(i)}}^T]$$
$$\Lambda=\left(\sum_{i=1}^m(x^{(i)}-\mu)E_{z^{(i)}\sim Q_i}[{z^{(i)}}^T]\right)\left(\sum_{i=1}^mE_{z^{(i)}\sim Q_i}[ z^{(i)}{z^{(i)}}^T]\right)^{-1}$$
The two expectations follow from the E-step:
$$E_{z^{(i)}\sim Q_i}[{z^{(i)}}^T]=\mu^T_{z^{(i)}|x^{(i)}}$$
$$E_{z^{(i)}\sim Q_i}[ z^{(i)}{z^{(i)}}^T]=\mu_{z^{(i)}|x^{(i)}}\mu^T_{z^{(i)}|x^{(i)}}+\Sigma_{z^{(i)}|x^{(i)}}$$
where the second line uses the property $Cov(X)=E[XX^T]-E[X]E[X^T]$.
Finally,
$$\Lambda=\left(\sum_{i=1}^m(x^{(i)}-\mu)\mu^T_{z^{(i)}|x^{(i)}}\right)\left(\sum_{i=1}^m\left(\mu_{z^{(i)}|x^{(i)}}\mu^T_{z^{(i)}|x^{(i)}}+\Sigma_{z^{(i)}|x^{(i)}}\right)\right)^{-1}$$
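The closed-form update for $\Lambda$ can be sketched in NumPy as below. The function name `update_lambda` is an assumption; `Ez` holds the posterior means $\mu_{z^{(i)}|x^{(i)}}$ as rows, and `Sigma_z` is the posterior covariance, which is the same for every sample in this model.

```python
import numpy as np

def update_lambda(X, mu, Ez, Sigma_z):
    """Lambda update from posterior means Ez (m, d) and posterior covariance Sigma_z (d, d)."""
    m = X.shape[0]
    diff = X - mu
    numer = diff.T @ Ez                  # sum_i (x_i - mu) mu_{z|x}^T, shape (n, d)
    denom = Ez.T @ Ez + m * Sigma_z      # sum_i (mu_{z|x} mu_{z|x}^T + Sigma_{z|x}), shape (d, d)
    # Lam = numer @ inv(denom); denom is symmetric, so solve denom @ Lam^T = numer^T.
    return np.linalg.solve(denom, numer.T).T

# Noise-free check: if z is known exactly (Sigma_z = 0), the true loading is recovered.
Z = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Lam_true = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
mu = np.array([1.0, 1.0, 1.0])
X = mu + Z @ Lam_true.T
Lam_hat = update_lambda(X, mu, Z, np.zeros((2, 2)))
print(Lam_hat)
```

The update has the same form as a least-squares regression of $x-\mu$ on $z$, with the extra $\Sigma_{z|x}$ term accounting for the uncertainty about $z$.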
$\mu$ and $\Psi$ are obtained in the same way:
$$\mu=\frac{1}{m}\sum_{i=1}^mx^{(i)}$$
$$\Psi=\frac{1}{m}\sum_{i=1}^m\left((x^{(i)}-\mu)(x^{(i)}-\mu)^T-(x^{(i)}-\mu)\mu^T_{z^{(i)}|x^{(i)}}\Lambda^T-\Lambda\mu_{z^{(i)}|x^{(i)}}(x^{(i)}-\mu)^T+\Lambda\left(\mu_{z^{(i)}|x^{(i)}}\mu^T_{z^{(i)}|x^{(i)}}+\Sigma_{z^{(i)}|x^{(i)}}\right)\Lambda^T\right)$$
Only the diagonal entries of this matrix are kept, since $\Psi$ is diagonal. If the data have already been centered so that $\mu=0$, each $x^{(i)}-\mu$ reduces to $x^{(i)}$.
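Putting the E-step and the three M-step updates together gives the following sketch of the whole EM loop. It is one possible arrangement under stated assumptions, not a reference implementation: the function name `fit_factor_analysis`, the random initialization of $\Lambda$, the fixed iteration count, and the synthetic data are all choices made here.

```python
import numpy as np

def fit_factor_analysis(X, d, n_iter=50, seed=0):
    """EM for factor analysis; returns mu, Lam, psi_diag and the log-likelihood trace."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    mu = X.mean(axis=0)                       # closed form, does not change across iterations
    diff = X - mu
    S_diag = np.mean(diff ** 2, axis=0)       # per-coordinate sample variance
    Lam = rng.normal(scale=0.1, size=(n, d))  # assumed initialization
    psi_diag = S_diag.copy()
    trace = []
    for _ in range(n_iter):
        # Log-likelihood of the current parameters under x ~ N(mu, Lam Lam^T + Psi).
        C = Lam @ Lam.T + np.diag(psi_diag)
        _, logdet = np.linalg.slogdet(C)
        quad = np.sum(diff * np.linalg.solve(C, diff.T).T)
        trace.append(-0.5 * (m * n * np.log(2 * np.pi) + m * logdet + quad))
        # E-step: posterior mean per sample and shared posterior covariance.
        G = np.linalg.solve(C, Lam).T         # Lam^T C^{-1}
        Ez = diff @ G.T
        Sigma_z = np.eye(d) - G @ Lam
        # M-step: Lambda update.
        Ezz = Ez.T @ Ez + m * Sigma_z         # sum_i E[z z^T]
        numer = diff.T @ Ez                   # sum_i (x_i - mu) E[z]^T
        Lam = np.linalg.solve(Ezz, numer.T).T
        # M-step: diagonal of (1/m) sum_i E[(x - mu - Lam z)(x - mu - Lam z)^T].
        cross = np.mean(diff * (Ez @ Lam.T), axis=0)
        quad_z = np.einsum("ij,jk,ik->i", Lam, Ezz / m, Lam)
        psi_diag = S_diag - 2 * cross + quad_z
    return mu, Lam, psi_diag, np.array(trace)

# Synthetic data from a 2-factor model in 5 dimensions (illustrative values).
rng = np.random.default_rng(1)
Z = rng.normal(size=(500, 2))
L_true = rng.normal(size=(5, 2))
X = 1.0 + Z @ L_true.T + 0.5 * rng.normal(size=(500, 5))
mu, Lam, psi_diag, trace = fit_factor_analysis(X, d=2, n_iter=30)
print(trace[0], trace[-1])
```

The recorded log-likelihood should be non-decreasing across iterations, which is a convenient sanity check when modifying the updates.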
