The well-known HMM is a generative model. Depending on the scenario and the assumptions, there are several methods to infer its parameters.
Case 1: all parameters are known
When we know the prior (initial state) probability P(y), the transition probabilities P(y(i) | y(i-1)), and the emission probabilities P(x|y),
the joint probability P(x1, x2, ..., xn, y1, y2, ..., yn) is just a product of these factors, and the forward/backward algorithm efficiently computes the marginal likelihood P(x1, x2, ..., xn) by summing over all hidden sequences.
Moreover, for a first-order HMM the most probable hidden state sequence can be decoded efficiently by the Viterbi algorithm.
The same applies to 2nd-order, 3rd-order HMMs, etc.
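To make Case 1 concrete, here is a minimal Python sketch of both recursions; the toy parameters pi, A, B and the observation sequence are made-up assumptions for illustration:

```python
# Forward pass and Viterbi decoding for a first-order HMM with known
# parameters. All numbers are toy values for illustration only.
import numpy as np

pi = np.array([0.6, 0.4])            # P(y1): initial state distribution
A = np.array([[0.7, 0.3],            # A[i, j] = P(y_t = j | y_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],       # B[i, k] = P(x_t = k | y_t = i)
              [0.1, 0.3, 0.6]])
x = [0, 2, 1, 2]                     # observed symbol indices

# Forward: alpha[t, i] = P(x_1..x_t, y_t = i); summing the last row
# over states gives the likelihood P(x_1..x_n).
alpha = np.zeros((len(x), len(pi)))
alpha[0] = pi * B[:, x[0]]
for t in range(1, len(x)):
    alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
print("P(x) =", alpha[-1].sum())

# Viterbi: the same recursion with max instead of sum, plus backpointers,
# yields the single most probable hidden sequence.
delta = np.zeros((len(x), len(pi)))
back = np.zeros((len(x), len(pi)), dtype=int)
delta[0] = pi * B[:, x[0]]
for t in range(1, len(x)):
    scores = delta[t - 1][:, None] * A      # scores[i, j] for i -> j
    back[t] = scores.argmax(axis=0)
    delta[t] = scores.max(axis=0) * B[:, x[t]]
path = [int(delta[-1].argmax())]
for t in range(len(x) - 1, 0, -1):
    path.append(int(back[t, path[-1]]))
print("best hidden path:", path[::-1])
```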
Case 2: X is a discrete random variable
When we can only observe the x states but not the hidden states y, the EM algorithm can be applied.
The parameters to be estimated are p(y), p(y(i) | y(i-1)), and p(x|y).
The Viterbi algorithm can then be used to decode the hidden states.
Unsupervised Learning 101: the EM for the HMM, Karl Stratos
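Below is a hedged sketch of one Baum-Welch (EM) iteration for this discrete-emission case; the (pi, A, B) parameterization and the toy data are assumptions for illustration, and a real implementation would iterate to convergence and work in log space for numerical stability:

```python
# One Baum-Welch (EM) step for a discrete-emission HMM; toy sketch only.
import numpy as np

def baum_welch_step(pi, A, B, x):
    n, S = len(x), len(pi)
    # E-step: forward and backward passes sum over all hidden paths.
    alpha = np.zeros((n, S)); beta = np.zeros((n, S))
    alpha[0] = pi * B[:, x[0]]
    for t in range(1, n):
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
    beta[-1] = 1.0
    for t in range(n - 2, -1, -1):
        beta[t] = A @ (B[:, x[t + 1]] * beta[t + 1])
    Z = alpha[-1].sum()                       # likelihood P(x)
    gamma = alpha * beta / Z                  # gamma[t, i] = P(y_t = i | x)
    xi = np.zeros((S, S))                     # expected transition counts
    for t in range(n - 1):
        xi += (alpha[t][:, None] * A
               * (B[:, x[t + 1]] * beta[t + 1])[None, :]) / Z
    # M-step: re-estimate parameters from the expected counts.
    new_pi = gamma[0]
    new_A = xi / xi.sum(axis=1, keepdims=True)
    new_B = np.zeros_like(B)
    for k in range(B.shape[1]):
        new_B[:, k] = gamma[np.array(x) == k].sum(axis=0)
    new_B /= new_B.sum(axis=1, keepdims=True)
    return new_pi, new_A, new_B

# Toy usage: iterate EM from a (deliberately asymmetric) initialization.
pi0 = np.array([0.5, 0.5])
A0 = np.array([[0.6, 0.4], [0.3, 0.7]])
B0 = np.array([[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]])
for _ in range(20):
    pi0, A0, B0 = baum_welch_step(pi0, A0, B0, [0, 2, 1, 2, 2, 0])
```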
Case 3: X is a continuous random variable
When we can only observe x but not the hidden state y, but we assume that the emission density p(x|y) is a mixture of Gaussians (because x is continuous, p(x|y) is a density, and a Gaussian mixture is a flexible way to approximate it),
EM can again be used to estimate the parameters.
This kind of model is called a GMM-HMM.
Simplest assumption: p(x|y) is a single Gaussian per hidden state (a one-component mixture).
Hidden Markov Models and Gaussian Mixture Models, Steve Renals and Peter Bell
Why use a GMM to estimate p(x|y)? A single Gaussian is unimodal, while a mixture with enough components can approximate multimodal emission densities, which are common in practice (e.g. speech features).
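As a small illustration, the following sketch (the weights, means, and standard deviations are made-up toy values) evaluates a two-component 1-D Gaussian mixture; two well-separated components give a bimodal density that no single Gaussian can match:

```python
# Log-density of a 1-D Gaussian mixture, showing multimodal emissions.
import numpy as np

def gmm_logpdf(x, weights, means, stds):
    """Log-density of a K-component 1-D Gaussian mixture at points x."""
    x = np.asarray(x)[:, None]                          # shape (N, 1)
    comp = (-0.5 * ((x - means) / stds) ** 2
            - np.log(stds) - 0.5 * np.log(2 * np.pi))   # (N, K) per component
    # log sum_k w_k N(x; mu_k, sigma_k), via log-sum-exp for stability
    a = np.log(weights) + comp
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True))).ravel()

# Density peaks near -2 and +2, with a dip at 0: bimodal.
pts = np.array([-2.0, 0.0, 2.0])
print(gmm_logpdf(pts, weights=np.array([0.5, 0.5]),
                 means=np.array([-2.0, 2.0]), stds=np.array([0.5, 0.5])))
```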
Some notes:
1. Differences between the Baum-Welch algorithm and the Viterbi algorithm:
Baum-Welch is an EM procedure that estimates the parameters by summing over all hidden paths (the forward-backward recursions), while Viterbi decodes the single best path by replacing the sum with a max. In both, dynamic programming avoids repeated summation over the exponentially many hidden sequences (see the sketch after these notes).
2.
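To make note 1 concrete, here is a sketch (reusing the toy parameters from Case 1) showing that the forward recursion used inside Baum-Welch and the Viterbi recursion differ only in whether predecessor states are combined by log-sum-exp or by max:

```python
# The forward and Viterbi recursions share one template; only the
# combine operation differs (sum over paths vs. best path). Dynamic
# programming keeps both at O(n * S^2) instead of enumerating S^n paths.
import numpy as np
from scipy.special import logsumexp

def recursion(log_pi, log_A, log_B, x, combine):
    v = log_pi + log_B[:, x[0]]
    for t in range(1, len(x)):
        v = combine(v[:, None] + log_A, axis=0) + log_B[:, x[t]]
    return combine(v, axis=None)

log_pi = np.log([0.6, 0.4])
log_A = np.log([[0.7, 0.3], [0.4, 0.6]])
log_B = np.log([[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]])
x = [0, 2, 1, 2]

print("log P(x)          =", recursion(log_pi, log_A, log_B, x, logsumexp))  # sum
print("log max-path prob =", recursion(log_pi, log_A, log_B, x, np.max))     # max
```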