Expectation-Maximization
The EM (Expectation-Maximization) algorithm estimates the parameters of the multivariate probability density function in a form of the Gaussian mixture distribution with a specified number of mixtures.
Consider the set of the feature vectors {x1, x2,..., xN}: N
vectors from d
-dimensional Euclidean space drawn from a Gaussian mixture:
m
is the number of mixtures, p
k is the normal distribution density with the mean
ak
and covariance matrix
Sk
,
πk
is the weight of k-th mixture. Given the number of mixtures
m
and the samples
{xi, i=1..N}
the algorithm finds the
maximum-likelihood estimates (MLE) of the all the mixture parameters, i.e.
ak
,
Sk
and
πk
:
EM algorithm is an iterative procedure. Each iteration of it includes two steps. At the first step (Expectation-step, or E-step), we find a probability
pi,k
(denoted
αi,k
in the formula below) of sample
#i
to belong to mixture
#k
using the currently available mixture parameter estimates:
At the second step (Maximization-step, or M-step) the mixture parameter estimates are refined using the computed probabilities:
Alternatively, the algorithm may start with M-step when initial values for
pi,k
can be provided. Another alternative, when
pi,k
are unknown, is to use a simpler clustering algorithm to pre-cluster the input samples and thus obtain initial
pi,k
. Often (and in ML)
k-means algorithm is used for that purpose.
One of the main that EM algorithm should deal with is the large number of parameters to estimate. The majority of the parameters sits in covariation matrices, which are d×d
elements each (where d
is the feature space dimensionality). However, in many practical problems the covariation matrices are close to diagonal, or even to μk*I
, where I
is identity matrix and μk
is mixture-dependent "scale" parameter. So a robust computation scheme could be to start with the harder constraints on the covariation matrices and then use the estimated parameters as an input for a less constrained optimization problem (often a diagonal covariation matrix is already a good enough approximation).
References:
CvEMParams
Parameters of EM algorithm
struct CvEMParams { CvEMParams() : nclusters(10), cov_mat_type(CvEM::COV_MAT_DIAGONAL), start_step(CvEM::START_AUTO_STEP), probs(0), weights(0), means(0), covs(0) { term_crit=cvTermCriteria( CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, FLT_EPSILON ); } CvEMParams( int _nclusters, int _cov_mat_type=1/*CvEM::COV_MAT_DIAGONAL*/, int _start_step=0/*CvEM::START_AUTO_STEP*/, CvTermCriteria _term_crit=cvTermCriteria(CV_TERMCRIT_ITER+CV_TERMCRIT_EPS, 100, FLT_EPSILON), CvMat* _probs=0, CvMat* _weights=0, CvMat* _means=0, CvMat** _covs=0 ) : nclusters(_nclusters), cov_mat_type(_cov_mat_type), start_step(_start_step), probs(_probs), weights(_weights), means(_means), covs(_covs), term_crit(_term_crit) {} int nclusters; int cov_mat_type; int start_step; const CvMat* probs; const CvMat* weights; const CvMat* means; const CvMat** covs; CvTermCriteria term_crit; };
-
nclusters
- The number of mixtures. Some of EM implementation could determine the optimal number of mixtures within a specified value range, but that is not the case in ML yet. cov_mat_type
-
The type of the mixture covariation matrices; should be one of the following:
CvEM::COV_MAT_GENERIC
- a covariation matrix of each mixture may be arbitrary symmetrical positively defined matrix, so the number of free parameters in each matrix is aboutd
2/2. It is not recommended to use this option, unless there is pretty accurate initial estimation of the parameters and/or a huge number of training samples.
CvEM::COV_MAT_DIAGONAL
- a covariation matrix of each mixture may be arbitrary diagonal matrix with positive diagonal elements, that is, non-diagonal elements are forced to be 0's, so the number of free parameters isd
for each matrix. This is most commonly used option yielding good estimation results.
CvEM::COV_MAT_SPHERICAL
- a covariation matrix of each mixture is a scaled identity matrix,μk*I
, so the only parameter to be estimated isμk
. The option may be used in special cases, when the constraint is relevant, or as a first step in the optimization (e.g. in case when the data is preprocessed with PCA). The results of such preliminary estimation may be passed again to the optimization procedure, this time withcov_mat_type=CvEM::COV_MAT_DIAGONAL
.
start_step
-
The initial step the algorithm starts from; should be one of the following:
CvEM::START_E_STEP
- the algorithm starts with E-step. At least, the initial values of mean vectors,CvEMParams::means
must be passed. Optionally, user may also provide initial values for weights (CvEMParams::weights
) and/or covariation matrices (CvEMParams::covs
).
CvEM::START_M_STEP
- the algorithm starts with M-step. The initial probabilitiespi,k
must be provided.
CvEM::START_AUTO_STEP
- No values are required from user, k-means algorithm is used to estimate initial mixtures parameters.
term_crit
-
Termination criteria of the procedure. EM algorithm stops either after a certain number of iterations (
term_crit.num_iter
), or when the parameters change too little (no more thanterm_crit.epsilon
) from iteration to iteration.
probs
-
Initial probabilities
pi,k
; used (and must be not NULL) only whenstart_step=CvEM::START_M_STEP
.
weights
-
Initial mixture weights
πk
; used (if not NULL) only whenstart_step=CvEM::START_E_STEP
.
covs
-
Initial mixture covariation matrices
Sk
; used (if not NULL) only whenstart_step=CvEM::START_E_STEP
.
means
-
Initial mixture means
ak
; used (and must be not NULL) only whenstart_step=CvEM::START_E_STEP
.
The structure has 2 constructors, the default one represents a rough rule-of-thumb, with another one it is possible to override a variety of parameters, from a single number of mixtures (the only essential problem-dependent parameter), to the initial values for the mixture parameters.
CvEM
EM model
class CV_EXPORTS CvEM : public CvStatModel { public: // Type of covariation matrices enum { COV_MAT_SPHERICAL=0, COV_MAT_DIAGONAL=1, COV_MAT_GENERIC=2 }; // The initial step enum { START_E_STEP=1, START_M_STEP=2, START_AUTO_STEP=0 }; CvEM(); CvEM( const CvMat* samples, const CvMat* sample_idx=0, CvEMParams params=CvEMParams(), CvMat* labels=0 ); virtual ~CvEM(); virtual bool train( const CvMat* samples, const CvMat* sample_idx=0, CvEMParams params=CvEMParams(), CvMat* labels=0 ); virtual float predict( const CvMat* sample, CvMat* probs ) const; virtual void clear(); int get_nclusters() const { return params.nclusters; } const CvMat* get_means() const { return means; } const CvMat** get_covs() const { return covs; } const CvMat* get_weights() const { return weights; } const CvMat* get_probs() const { return probs; } protected: virtual void set_params( const CvEMParams& params, const CvVectors& train_data ); virtual void init_em( const CvVectors& train_data ); virtual double run_em( const CvVectors& train_data ); virtual void init_auto( const CvVectors& samples ); virtual void kmeans( const CvVectors& train_data, int nclusters, CvMat* labels, CvTermCriteria criteria, const CvMat* means ); CvEMParams params; double log_likelihood; CvMat* means; CvMat** covs; CvMat* weights; CvMat* probs; CvMat* log_weight_div_det; CvMat* inv_eigen_values; CvMat** cov_rotate_mats; };
CvEM::train
Estimates Gaussian mixture parameters from the sample set
void CvEM::train( const CvMat* samples, const CvMat* sample_idx=0, CvEMParams params=CvEMParams(), CvMat* labels=0 );
Unlike many of ML models, EM is an unsupervised learning algorithm and it does not take responses (class labels or the function values) on input. Instead, it computes MLE of Gaussian mixture parameters from the input sample set, stores all the parameters inside the stucture: pi,k
in probs
, ak
in means
Sk
in covs[k]
, πk
in weights
and optionally computes the output "class label" for each sample: labelsi=arg maxk(pi,k), i=1..N
(i.e. indices of the most-probable mixture for each sample).
The trained model can be used further for prediction, just like any other classifier. The model trained is similar to the normal bayes classifier.
Example. Clustering random samples of multi-gaussian distribution using EM
#include "ml.h" #include "highgui.h" int main( int argc, char** argv ) { const int N = 4; const int N1 = (int)sqrt((double)N); const CvScalar colors[] = {{{0,0,255}},{{0,255,0}},{{0,255,255}},{{255,255,0}}}; int i, j; int nsamples = 100; CvRNG rng_state = cvRNG(-1); CvMat* samples = cvCreateMat( nsamples, 2, CV_32FC1 ); CvMat* labels = cvCreateMat( nsamples, 1, CV_32SC1 ); IplImage* img = cvCreateImage( cvSize( 500, 500 ), 8, 3 ); float _sample[2]; CvMat sample = cvMat( 1, 2, CV_32FC1, _sample ); CvEM em_model; CvEMParams params; CvMat samples_part; cvReshape( samples, samples, 2, 0 ); for( i = 0; i < N; i++ ) { CvScalar mean, sigma; // form the training samples cvGetRows( samples, &samples_part, i*nsamples/N, (i+1)*nsamples/N ); mean = cvScalar(((i%N1)+1.)*img->width/(N1+1), ((i/N1)+1.)*img->height/(N1+1)); sigma = cvScalar(30,30); cvRandArr( &rng_state, &samples_part, CV_RAND_NORMAL, mean, sigma ); } cvReshape( samples, samples, 1, 0 ); // initialize model's parameters params.covs = NULL; params.means = NULL; params.weights = NULL; params.probs = NULL; params.nclusters = N; params.cov_mat_type = CvEM::COV_MAT_SPHERICAL; params.start_step = CvEM::START_AUTO_STEP; params.term_crit.max_iter = 10; params.term_crit.epsilon = 0.1; params.term_crit.type = CV_TERMCRIT_ITER|CV_TERMCRIT_EPS; // cluster the data em_model.train( samples, 0, params, labels ); #if 0 // the piece of code shows how to repeatedly optimize the model // with less-constrained parameters (COV_MAT_DIAGONAL instead of COV_MAT_SPHERICAL) // when the output of the first stage is used as input for the second. CvEM em_model2; params.cov_mat_type = CvEM::COV_MAT_DIAGONAL; params.start_step = CvEM::START_E_STEP; params.means = em_model.get_means(); params.covs = (const CvMat**)em_model.get_covs(); params.weights = em_model.get_weights(); em_model2.train( samples, 0, params, labels ); // to use em_model2, replace em_model.predict() with em_model2.predict() below #endif // classify every image pixel cvZero( img ); for( i = 0; i < img->height; i++ ) { for( j = 0; j < img->width; j++ ) { CvPoint pt = cvPoint(j, i); sample.data.fl[0] = (float)j; sample.data.fl[1] = (float)i; int response = cvRound(em_model.predict( &sample, NULL )); CvScalar c = colors[response]; cvCircle( img, pt, 1, cvScalar(c.val[0]*0.75,c.val[1]*0.75,c.val[2]*0.75), CV_FILLED ); } } //draw the clustered samples for( i = 0; i < nsamples; i++ ) { CvPoint pt; pt.x = cvRound(samples->data.fl[i*2]); pt.y = cvRound(samples->data.fl[i*2+1]); cvCircle( img, pt, 1, colors[labels->data.i[i]], CV_FILLED ); } cvNamedWindow( "EM-clustering result", 1 ); cvShowImage( "EM-clustering result", img ); cvWaitKey(0); cvReleaseMat( &samples ); cvReleaseMat( &labels ); return 0; }