Key points
- EM: An iterative technique to estimate probability models for data with missing components or information
- By iteratively “completing” the data and reestimating parameters
- PCA: Is actually a generative model for Gaussian data
- Data lie close to a linear manifold, with orthogonal noise
- A linear autoencoder!
- Factor Analysis: Also a generative model for Gaussian data
- Data lie close to a linear manifold
- Like PCA, but without directional constraints on the noise (not necessarily orthogonal)
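The PCA-as-linear-autoencoder view above can be sketched numerically: project centered data onto the top principal directions (the "encoder") and reconstruct linearly (the "decoder"). This is a minimal illustration, not code from the notes; the synthetic data, dimensions, and noise level are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative data: points near a 2-D linear manifold in 5-D,
# plus small isotropic noise (the PCA generative assumption)
W = rng.standard_normal((5, 2))
Z = rng.standard_normal((200, 2))
X = Z @ W.T + 0.01 * rng.standard_normal((200, 5))

# PCA via SVD of the centered data
mu = X.mean(axis=0)
Xc = X - mu
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

k = 2
codes = Xc @ Vt[:k].T        # "encoder": projection onto principal directions
recon = codes @ Vt[:k] + mu  # "decoder": linear reconstruction

# Because the data lie close to a linear manifold, the
# reconstruction error is tiny
err = np.mean((X - recon) ** 2)
```

Encoding and decoding are both linear maps, which is exactly why PCA can be read as a linear autoencoder.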
Generative models
Learning a generative model
- You are given some set of observed data $X = \{x\}$
- You choose a model $P(x; \theta)$ for the distribution of $x$
- $\theta$ are the parameters of the model
- Estimate $\theta$ such that $P(x; \theta)$ best "fits" the observations $X = \{x\}$
- How to define “best fits”?
- Maximum likelihood!
- Assumption: The data you have observed are very typical of the process
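For a Gaussian model the maximum-likelihood fit has a closed form: the maximizers of the log-likelihood are the sample mean and the (biased) sample variance. A minimal sketch, with an illustrative made-up dataset:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative observed data X = {x}: draws from N(3, 2^2)
x = rng.normal(loc=3.0, scale=2.0, size=10_000)

# Model P(x; theta) = N(x; mu, sigma^2); theta = (mu, sigma^2).
# The maximum-likelihood estimates are the sample mean and the
# (biased, 1/N) sample variance.
mu_hat = x.mean()
var_hat = ((x - mu_hat) ** 2).mean()
```

Because the observed data are assumed typical of the process, these estimates land close to the true parameters (3 and 4) as the sample grows.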
EM algorithm
- Tackles the problem of missing data or information in model estimation
- Let $o$ be the observed data, $h$ the missing (hidden) data, and $Q(h)$ any distribution over $h$
$$\log P(o) = \log \sum_{h} P(h, o) = \log \sum_{h} Q(h) \frac{P(h, o)}{Q(h)}$$
- The logarithm is a concave function, therefore
$$\log \sum_{h} Q(h) \frac{P(h, o)}{Q(h)} \geq \sum_{h} Q(h) \log \frac{P(h, o)}{Q(h)}$$

- This lower bound (Jensen's inequality) is what EM works with: the E-step chooses $Q(h) = P(h \mid o)$, which makes the bound tight, and the M-step maximizes the bound over $\theta$
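The consequence of this bound is that alternating E- and M-steps can never decrease the observed-data log-likelihood. A small sketch for a two-component 1-D Gaussian mixture, where the hidden variable $h$ is which component generated each point; the data, initialization, and iteration count are illustrative assumptions, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
# Observed data o: a mixture of two 1-D Gaussians; the component
# label h that generated each point is the missing information
x = np.concatenate([rng.normal(-2, 1, 300), rng.normal(3, 1, 700)])

# Hypothetical initial guesses for theta = (pi, mu, sigma)
pi = np.array([0.5, 0.5])
mu = np.array([-1.0, 1.0])
sig = np.array([1.0, 1.0])

def components(x, pi, mu, sig):
    # pi_k * N(x; mu_k, sig_k^2) for each component k
    return pi * np.exp(-0.5 * ((x[:, None] - mu) / sig) ** 2) / (
        sig * np.sqrt(2 * np.pi))

lls = []
for _ in range(50):
    # E-step: Q(h) = P(h | o) under the current parameters
    comp = components(x, pi, mu, sig)
    q = comp / comp.sum(axis=1, keepdims=True)
    # M-step: re-estimate parameters from the "completed" data
    n_k = q.sum(axis=0)
    pi = n_k / len(x)
    mu = (q * x[:, None]).sum(axis=0) / n_k
    sig = np.sqrt((q * (x[:, None] - mu) ** 2).sum(axis=0) / n_k)
    lls.append(np.log(components(x, pi, mu, sig).sum(axis=1)).sum())
```

Tracking `lls` shows the guarantee from the bound in action: the log-likelihood is monotonically non-decreasing across iterations, and the component means migrate toward the true cluster centers.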