1. Maximum likelihood
Suppose there is a sample x_1, x_2, …, x_n of n independent and identically distributed observations, coming from a distribution with an unknown pdf ƒ_0(·). It is however surmised that the function ƒ_0 belongs to a certain family of distributions { ƒ(·|θ), θ ∈ Θ }, called the parametric model, so that ƒ_0 = ƒ(·|θ_0). The value θ_0 is unknown and is referred to as the "true value" of the parameter. It is desirable to find an estimator θ̂ that is as close to the true value θ_0 as possible. Both the observed variables x_i and the parameter θ can be vectors.
To use the method of maximum likelihood, one first specifies the joint density function for all observations. For an iid sample, this joint density function is
\[ f(x_1, x_2, \ldots, x_n \;|\; \theta) = f(x_1 | \theta) \cdot f(x_2 | \theta) \cdots f(x_n | \theta). \]
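As a minimal sketch of the factorization above, assume the parametric model is a univariate Gaussian with θ = (μ, σ). The Gaussian choice, the function names, and the sample values are illustrative assumptions, not part of the notes. The log of the joint density is then a sum of per-observation log-densities, and its maximizer has a closed form.

```python
import math

def gaussian_log_likelihood(xs, mu, sigma):
    """Log of the iid joint density: sum over i of log f(x_i | mu, sigma)."""
    n = len(xs)
    squared_error = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)

def gaussian_mle(xs):
    """Closed-form maximizers: sample mean and the (biased, divide-by-n) standard deviation."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in xs) / n)
    return mu_hat, sigma_hat

# Hypothetical sample, used only for illustration.
sample = [2.1, 1.9, 2.4, 2.0, 1.6]
mu_hat, sigma_hat = gaussian_mle(sample)
best = gaussian_log_likelihood(sample, mu_hat, sigma_hat)
```

Evaluating `gaussian_log_likelihood` at any other (μ, σ) should give a value no larger than `best`, which is what "maximum likelihood" means for this model.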
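Two weaknesses of maximum likelihood:
1. It assumes that the samples are iid.
2. Once the log is taken, the product becomes a sum, and it becomes clear that the likelihood falls off exponentially with the degree of mismatch: a single badly fitting observation contributes a large negative term to the log-likelihood. This makes MLE sensitive to outliers and noise (and therefore ill-suited to short texts, since it requires every word to match well).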
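The second weakness can be illustrated with a small sketch, assuming a unit Gaussian model and made-up data (both are assumptions for illustration; the short-text setting mentioned above is not modeled here). One outlier contributes a log-likelihood term far more negative than all the well-matched points combined, and it drags the estimated mean away from the bulk of the data.

```python
import math

def log_density(x, mu=0.0, sigma=1.0):
    """Log-density of a single observation under N(mu, sigma^2)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)

# Hypothetical data: five well-matched points and one outlier.
clean = [0.1, -0.2, 0.3, 0.0, -0.1]
outlier = 8.0

# Total log-likelihood of the clean points versus the single outlier term.
ll_clean = sum(log_density(x) for x in clean)
ll_outlier_term = log_density(outlier)

# The MLE of the mean is the sample mean, so one outlier shifts it noticeably.
mean_clean = sum(clean) / len(clean)
mean_with_outlier = (sum(clean) + outlier) / (len(clean) + 1)

print(f"clean points total: {ll_clean:.2f}, outlier alone: {ll_outlier_term:.2f}")
print(f"mean without outlier: {mean_clean:.2f}, with outlier: {mean_with_outlier:.2f}")
```

The quadratic penalty in the log-density corresponds to an exponential drop in the likelihood itself, which is why the one mismatched point dominates the sum.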
2. Conditional MLE
\[ f(Y | X, \theta) = f(y_1 | x_1, \theta) \cdot f(y_2 | x_2, \theta) \cdots f(y_n | x_n, \theta). \]
Only the y_i given x_i need to be assumed independent; there is no need to assume that the x_i are independent as well.
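A minimal sketch of conditional MLE, assuming the conditional model y_i | x_i ~ N(a + b·x_i, σ²). This linear-Gaussian choice, the function names, and the data are illustrative assumptions. The x values below are a fixed grid, clearly not an iid draw, yet the conditional likelihood is still well defined because only the distribution of y given x is modeled.

```python
import math

def conditional_log_likelihood(xs, ys, a, b, sigma):
    """Sum over i of log f(y_i | x_i, theta) with y_i | x_i ~ N(a + b*x_i, sigma^2)."""
    n = len(xs)
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - rss / (2 * sigma ** 2)

def fit_conditional_mle(xs, ys):
    """Closed-form maximizer: least-squares line plus the divide-by-n residual std."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_bar - b * x_bar
    sigma = math.sqrt(sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n)
    return a, b, sigma

# Hypothetical data: xs is a deterministic grid, ys are noisy responses.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.9, 5.1, 7.0, 9.2, 10.8]
a_hat, b_hat, sigma_hat = fit_conditional_mle(xs, ys)
best = conditional_log_likelihood(xs, ys, a_hat, b_hat, sigma_hat)
```

Nothing in the fit uses a distribution for `xs`; the estimates depend on X only through the conditional density of each y_i.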