EM algorithm

If you major in machine learning, you will encounter the EM algorithm, which is divided into two steps:

  1. E step (Expectation)
  2. M step (Maximization)

Following only the steps of the algorithm can be confusing, so let's explore the theory behind it.

Goal

The first thing is to state our aim. For example, given input and output data, we want to build a model that fits the data; our aim is to estimate the parameters of that model.

We denote the parameter by $\theta$, the input by $X$, the output by $Y$, and the latent variable by $Z$.

So we use the conditional model $P(Y|\theta)$ and estimate $\theta$ by maximum likelihood:

$$L(\theta) = \log P(Y|\theta)$$
If there is no latent variable in the model, we can use maximum likelihood directly.

Example: toss a coin five times and observe three heads and two tails. What is the probability of heads?

We solve this with maximum likelihood. First define the probability of heads as $\theta$ and the observed outcomes as $Y$. The likelihood function is

$$L(\theta) = P(Y|\theta)$$

For a single toss $y_i \in \{0,1\}$, the Bernoulli probability is

$$P(y_i|\theta) = \theta^{y_i}(1-\theta)^{1-y_i}$$

Over all $n$ independent tosses,

$$L(\theta) = \prod_{i=1}^{n}\theta^{y_i}(1-\theta)^{1-y_i}$$

Taking the log and maximizing $L(\theta)$ by setting the derivative to zero gives $\theta = \frac{3}{5}$.
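
As a quick sanity check, here is a minimal Python sketch (the data array and function name are my own illustrative choices, not from the text) that maximizes this Bernoulli log-likelihood numerically and recovers the same answer:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Five tosses: three heads (1) and two tails (0), as in the example.
y = np.array([1, 1, 1, 0, 0])

def neg_log_likelihood(theta):
    # -log L(theta) = -sum_i [ y_i*log(theta) + (1 - y_i)*log(1 - theta) ]
    return -np.sum(y * np.log(theta) + (1 - y) * np.log(1 - theta))

res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x)  # ~0.6, i.e. theta = 3/5
```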

BUT in the EM setting there is a latent variable in the model.

For example, the complete-data likelihood factors as

$$L(\theta) = P(Y,Z|\theta) = P(Y|Z,\theta)P(Z|\theta)$$

Marginalizing out the latent variable then gives the observed-data likelihood. For instance, in a two-coin mixture where a hidden coin with bias $\pi$ decides whether each toss comes from a coin with bias $p$ or one with bias $q$, so that $\theta = (\pi, p, q)$,

$$P(Y|\theta) = \prod_{i=1}^{n}\left[\pi p^{y_i}(1-p)^{1-y_i} + (1-\pi)q^{y_i}(1-q)^{1-y_i}\right]$$

Our aim is to find

$$\hat\theta = \arg\max_{\theta} \log P(Y|\theta)$$
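
To make the two steps concrete, here is a minimal Python sketch of EM for this two-coin mixture (the observations, initial values, and function name are my own illustrative choices, not from the text). The E step computes each observation's posterior responsibility for the hidden coin, and the M step re-estimates $\pi$, $p$, $q$ from those responsibilities:

```python
import numpy as np

def em_coin_mixture(y, pi, p, q, n_iter=100):
    """EM for the mixture y_i ~ pi*Bernoulli(p) + (1 - pi)*Bernoulli(q)."""
    y = np.asarray(y, dtype=float)
    for _ in range(n_iter):
        # E step: responsibility mu_i = P(Z_i = 1 | y_i, theta)
        a = pi * p**y * (1 - p)**(1 - y)
        b = (1 - pi) * q**y * (1 - q)**(1 - y)
        mu = a / (a + b)
        # M step: re-estimate parameters from the responsibilities
        pi = mu.mean()
        p = np.sum(mu * y) / np.sum(mu)
        q = np.sum((1 - mu) * y) / np.sum(1 - mu)
    return pi, p, q

# Hypothetical observations (1 = heads, 0 = tails) and initial guesses.
y = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]
print(em_coin_mixture(y, pi=0.4, p=0.6, q=0.5))
```

As usual with EM, different initial values of $(\pi, p, q)$ can converge to different local maxima of the likelihood.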

Proof

We already have

$$L(\theta) = \log P(Y|\theta) = \log \sum_{Z} P(Y,Z|\theta) = \log \sum_{Z} P(Y|Z,\theta)P(Z|\theta)$$

and the sum inside the log is what makes direct maximization hard.

To maximize $L(\theta)$, we proceed iteratively, step by step, and we want each step to increase $L(\theta)$.


Jensen's Inequality
Let $f$ be a convex function and let $X$ be a random variable. Then:

$$E(f(X)) \ge f(E(X))$$
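
For completeness, here is the standard lower-bound argument (the usual EM derivation, with $\theta^{(t)}$ denoting the current estimate; this notation is mine, not from the text) that connects Jensen's inequality to the monotonicity we want. Since $\log$ is concave, the inequality applies in the reversed form $\log E(X) \ge E(\log X)$:

```latex
\begin{aligned}
L(\theta) &= \log \sum_{Z} P(Y|Z,\theta)\,P(Z|\theta) \\
          &= \log \sum_{Z} P(Z|Y,\theta^{(t)})\,
             \frac{P(Y|Z,\theta)\,P(Z|\theta)}{P(Z|Y,\theta^{(t)})} \\
          &\ge \sum_{Z} P(Z|Y,\theta^{(t)})\,
             \log \frac{P(Y|Z,\theta)\,P(Z|\theta)}{P(Z|Y,\theta^{(t)})}
             \qquad \text{(Jensen, since $\log$ is concave)} \\
          &=: B(\theta,\theta^{(t)}).
\end{aligned}
```

The E step builds this lower bound $B(\theta,\theta^{(t)})$ (equivalently, the $Q$ function up to a constant in $\theta$), and the M step maximizes it over $\theta$. Because $B(\theta^{(t)},\theta^{(t)}) = L(\theta^{(t)})$, the update satisfies $L(\theta^{(t+1)}) \ge B(\theta^{(t+1)},\theta^{(t)}) \ge B(\theta^{(t)},\theta^{(t)}) = L(\theta^{(t)})$, so each iteration never decreases $L(\theta)$.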


