Lecture 8: The EM algorithm

In the previous sets of notes, we already talked about the EM algorithm as applied to fitting a mixture of Gaussians. In this set of notes, we give a broader view of the EM algorithm, and show how it can be applied to a large family of estimation problems with latent variables. We begin our discussion with a very useful result called Jensen's inequality.

1. Jensen's inequality

Let $f$ be a function whose domain is the set of real numbers. Recall that $f$ is a convex function if $f''(x) \geq 0$ for all $x \in \mathbb{R}$. In the case of $f$ taking vector-valued inputs, this generalizes to the condition that its Hessian $H$ is positive semi-definite ($H \geq 0$). If $f''(x) > 0$ for all $x$, then we say $f$ is strictly convex (in the vector-valued case, the corresponding condition is that $H$ must be positive definite, written $H > 0$). Jensen's inequality can then be stated as follows:

Theorem. Let $f$ be a convex function, and let $X$ be a random variable. Then:

$$E[f(X)] \geq f(EX)$$

(Translator's note: the expectation of a convex function is at least the function of the expectation.)

Moreover, if $f$ is strictly convex, then $E[f(X)] = f(EX)$ holds true if and only if $X = E[X]$ with probability 1 (i.e., if $X$ is a constant).

Recall our convention of occasionally dropping the parentheses when writing expectations, so in the theorem above $f(EX) = f(E[X])$.
For an interpretation of the theorem, consider the figure below:

In the figure, $f$ is a convex function, shown by the solid line. Also, $X$ is a random variable that has a 0.5 chance of taking the value $a$, and a 0.5 chance of taking the value $b$ (indicated on the $x$-axis). Thus, the expected value $E[X]$ of $X$ lies at the midpoint between $a$ and $b$.

The values $f(a)$, $f(b)$, and $f(E[X])$ are also indicated on the $y$-axis. The expected value $E[f(X)]$ then lies at the midpoint on the $y$-axis between $f(a)$ and $f(b)$. As shown in this example, because $f$ is convex, it must be the case that $E[f(X)] \geq f(EX)$.

Incidentally, quite a lot of people have trouble remembering which way the inequality goes, and remembering a picture like this is a good way to quickly figure out the answer.

Recall that $f$ is [strictly] concave if and only if $-f$ is [strictly] convex (i.e., $f''(x) \leq 0$ or $H \leq 0$). Jensen's inequality also holds for concave functions $f$, but with the direction of the inequality reversed: for concave $f$, $E[f(X)] \leq f(EX)$.
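
To make the direction of the inequality concrete, here is a minimal numerical check; it is only a sketch, and the convex function $f(x) = x^2$, the concave $g(x) = \log x$, and the two-point distribution are illustrative choices of ours, not from the notes:

```python
import numpy as np

# A two-point random variable: X = a or X = b, each with probability 0.5,
# mirroring the figure described above.
a, b = 1.0, 5.0
probs = np.array([0.5, 0.5])
values = np.array([a, b])

f = lambda x: x ** 2     # convex:  f''(x) = 2 > 0
g = np.log               # concave: g''(x) = -1/x^2 < 0

EX = np.dot(probs, values)                      # E[X] = 3.0

# Convex case: E[f(X)] >= f(E[X])
print(np.dot(probs, f(values)), ">=", f(EX))    # 13.0 >= 9.0

# Concave case: the inequality flips, E[g(X)] <= g(E[X])
print(np.dot(probs, g(values)), "<=", g(EX))    # 0.805 <= 1.099
```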

2. The EM algorithm

Suppose we have an estimation problem in which we have a training set $\{x^{(1)}, \dots, x^{(m)}\}$ consisting of $m$ independent examples. We wish to fit the parameters of a model $p(x, z)$ to the data, where the log likelihood is given by:

$$
\begin{aligned}
\ell(\theta) &= \sum_{i=1}^m \log p(x^{(i)};\theta) \\
&= \sum_{i=1}^m \log \sum_{z^{(i)}=1}^k p(x^{(i)}, z^{(i)};\theta).
\end{aligned}
$$

However, explicitly finding the maximum likelihood estimates of the parameters $\theta$ may be hard. Here, the $z^{(i)}$'s are latent random variables; and it is often the case that if the $z^{(i)}$'s were observed, then maximum likelihood estimation would be easy.
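
To see why the maximization is awkward, it helps to write out $\ell(\theta)$ for a concrete model: the log of a sum over $z^{(i)}$ sits inside each term, so the parameters are coupled and in general there is no closed-form maximizer. A minimal sketch for a mixture of two 1-D Gaussians (the data and the parameter values $\phi$, $\mu$, $\sigma$ below are hypothetical, chosen only to show the computation):

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.2, 0.3, 2.5, 3.1])    # toy data, m = 4

# Hypothetical theta = (mixing weights, means, std devs) for k = 2 components
phi   = np.array([0.4, 0.6])
mu    = np.array([0.0, 3.0])
sigma = np.array([1.0, 0.5])

# p(x, z; theta) = p(z; theta) p(x | z; theta); marginalize z inside the log:
# ell(theta) = sum_i log sum_z p(x_i, z; theta)
joint = phi * norm.pdf(x[:, None], mu, sigma)   # shape (m, k)
ell = np.sum(np.log(joint.sum(axis=1)))
print(ell)
```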

In such a setting, the EM algorithm gives an efficient method for maximum likelihood estimation. Maximizing $\ell(\theta)$ explicitly might be difficult, so our strategy will instead be to repeatedly construct a lower bound on $\ell$ (the E-step), and then optimize that lower bound (the M-step).
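
For the mixture-of-Gaussians model from the earlier lecture, both steps turn out to have simple closed forms. Below is a minimal 1-D sketch of that special case, assuming the parameters have been initialized elsewhere; the function name, the fixed iteration count, and the lack of a convergence test are simplifications of ours:

```python
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, phi, mu, sigma, n_iters=50):
    """EM for a 1-D mixture of Gaussians (sketch; no convergence check)."""
    for _ in range(n_iters):
        # E-step: set Q_i(z) to the posterior p(z | x_i; theta),
        # which makes the lower bound on ell(theta) tight at the current theta.
        joint = phi * norm.pdf(x[:, None], mu, sigma)   # p(x_i, z; theta)
        Q = joint / joint.sum(axis=1, keepdims=True)    # responsibilities, (m, k)
        # M-step: maximize the lower bound over theta; closed form here.
        Nk = Q.sum(axis=0)
        phi = Nk / len(x)
        mu = (Q * x[:, None]).sum(axis=0) / Nk
        sigma = np.sqrt((Q * (x[:, None] - mu) ** 2).sum(axis=0) / Nk)
    return phi, mu, sigma
```

Each such iteration can never decrease $\ell(\theta)$; the lower-bound construction that follows is what makes this guarantee precise.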

For each $i$, let $Q_i$ be some distribution over the $z$'s ($\sum_z Q_i(z) = 1$, $Q_i(z) \geq 0$). Consider the following:

$$
\begin{aligned}
\sum_i \log p(x^{(i)};\theta) &= \sum_i \log \sum_{z^{(i)}} p(x^{(i)}, z^{(i)};\theta) \\
&= \sum_i \log \sum_{z^{(i)}} Q_i(z^{(i)}) \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})} \\
&\geq \sum_i \sum_{z^{(i)}} Q_i(z^{(i)}) \log \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})}
\end{aligned}
\tag{1}
$$

The last step of this derivation used Jensen's inequality. Specifically, $f(x) = \log x$ is a concave function, since $f''(x) = -1/x^2 < 0$ over its domain $x \in \mathbb{R}^+$.

Also, the term

$$\sum_{z^{(i)}} Q_i(z^{(i)}) \left[ \frac{p(x^{(i)}, z^{(i)};\theta)}{Q_i(z^{(i)})} \right]$$

in the summation is just an expectation of the quantity $\left[p(x^{(i)}, z^{(i)};\theta)/Q_i(z^{(i)})\right]$ with respect to $z^{(i)}$ drawn according to the distribution given by $Q_i$; this is what lets Jensen's inequality, applied to the concave function $\log$, justify the last step.
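
The bound in (1) can also be checked numerically. For the toy mixture from the earlier snippet, any valid $Q_i$ keeps the right-hand side of (1) at or below $\ell(\theta)$, and choosing $Q_i$ to be the posterior $p(z^{(i)} \mid x^{(i)};\theta)$ (exactly the E-step choice) makes the bound tight. A sketch, reusing the hypothetical parameters from before:

```python
import numpy as np
from scipy.stats import norm

x = np.array([-1.2, 0.3, 2.5, 3.1])
phi, mu, sigma = np.array([0.4, 0.6]), np.array([0.0, 3.0]), np.array([1.0, 0.5])

joint = phi * norm.pdf(x[:, None], mu, sigma)   # p(x_i, z; theta), shape (m, k)
ell = np.sum(np.log(joint.sum(axis=1)))         # left-hand side of (1)

def lower_bound(Q):
    # Right-hand side of (1): sum_i sum_z Q_i(z) log[p(x_i, z; theta) / Q_i(z)]
    return np.sum(Q * np.log(joint / Q))

Q_uniform = np.full_like(joint, 0.5)                 # an arbitrary valid Q_i
Q_post = joint / joint.sum(axis=1, keepdims=True)    # posterior p(z | x_i; theta)

print(lower_bound(Q_uniform), "<=", ell)   # strict inequality in general
print(lower_bound(Q_post), "==", ell)      # equal up to float error: bound is tight
```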
