EM of GMM Appendix: M-Step Full Derivations

This article details how the M-Step of the EM algorithm, applied to Gaussian Mixture Models (GMMs), yields closed-form expressions for the parameter updates. It works through the derivations for the mixture weights, the mean vectors and the covariance matrices, showing that after the M-Step the mixture weight of each class is the normalized sum of the individual weights for that class, the mean vector is a weighted average of the data points, and the covariance matrix is a weighted empirical covariance of the data.

This article is an extension of “Gaussian Mixture Models and Expectation-Maximization (A full explanation)”. If you haven’t read it, this article might not be very useful.


The goal here is to derive the closed-form expressions necessary for the update of the parameters during the Maximization step of the EM algorithm applied to GMMs. This material was written as a separate article in order not to overload the main one.


Ok so recall that during the M-Step, we want to maximize the following lower bound with respect to Θ:

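Writing N for the number of observations, K for the number of Gaussians and q(t_i = k) for the variational distribution computed during the E-Step, the lower bound should read something like:

\mathcal{L}(\Theta, q) = \sum_{i=1}^{N} \sum_{k=1}^{K} q(t_i = k) \, \log \frac{p(x_i, t_i = k \mid \Theta)}{q(t_i = k)}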

The lower bound is defined to be a concave function that is easy to optimize. So we are going to perform a direct optimization procedure, that is, find the parameters for which the partial derivatives are zero. Also, as we already said in the main article, we have to fulfill two constraints. The first one is that the mixture weights must sum to one, and the second is that the covariance matrix must be positive semidefinite.


Now recall that we initially defined the probability model using a mixture of Gaussian densities like so:

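In symbols, this is the usual mixture density (reconstructed here in the notation above):

p(x_i \mid \Theta) = \sum_{k=1}^{K} \alpha_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k), \qquad \Theta = \{\alpha_k, \mu_k, \Sigma_k\}_{k=1}^{K}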

This initial definition was given prior to the introduction of the latent variable t that allows us to define the probability of a specific observation belonging to a specific Gaussian. With the latent variable introduced, we now write:

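That is, something of the form:

p(t_i = k \mid \Theta) = \alpha_k, \qquad p(x_i \mid t_i = k, \Theta) = \mathcal{N}(x_i \mid \mu_k, \Sigma_k), \qquad p(x_i, t_i = k \mid \Theta) = \alpha_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)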

So all in all, we have to resolve the following optimization problem:

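Namely, in the notation above it should be:

\max_{\Theta} \; \sum_{i=1}^{N} \sum_{k=1}^{K} q(t_i = k) \, \log \frac{\alpha_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{q(t_i = k)} \quad \text{s.t.} \quad \sum_{k=1}^{K} \alpha_k = 1, \quad \Sigma_k \succeq 0 \;\; \forall k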

Now, the second constraint, that the covariance matrix must be positive semidefinite, is not really a constraint when performing direct optimization: we are not updating the parameters in an iterative fashion. The covariance matrix will be found using a closed-form expression, so there is no need to worry about this second constraint.


The first constraint, regarding the mixture weights, is handled by introducing a Lagrange multiplier. So the optimization objective becomes:

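With λ denoting the multiplier, the objective can be written as something like:

\mathcal{L}(\Theta, \lambda) = \sum_{i=1}^{N} \sum_{k=1}^{K} q(t_i = k) \, \log \frac{\alpha_k \, \mathcal{N}(x_i \mid \mu_k, \Sigma_k)}{q(t_i = k)} \; + \; \lambda \left( \sum_{k=1}^{K} \alpha_k - 1 \right)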

In order to solve this problem, we have to find the parameters for which the partial derivatives are zero:

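That is, a system of stationarity conditions of the form:

\frac{\partial \mathcal{L}}{\partial \lambda} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \alpha_j} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \mu_j} = 0, \qquad \frac{\partial \mathcal{L}}{\partial \Sigma_j} = 0 \qquad \text{for } j = 1, \dots, K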

First, let’s start with the equation with respect to λ:

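Which simply gives:

\frac{\partial \mathcal{L}}{\partial \lambda} = \sum_{k=1}^{K} \alpha_k - 1 = 0 \;\; \Longleftrightarrow \;\; \sum_{k=1}^{K} \alpha_k = 1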

Ok, so we are back to the definition of the constraint. No surprise here but not very useful.


Mixture weights derivation

Now let’s move on to the equation with respect to α_j. Note that in the following derivations, we use the fact that the terms involving α_k with k different from j are constant with respect to α_j (this basically means that the summation over the K clusters can be ignored):

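Only the term in α_j survives the differentiation, so the condition reduces to:

\frac{\partial \mathcal{L}}{\partial \alpha_j} = \sum_{i=1}^{N} \frac{q(t_i = j)}{\alpha_j} + \lambda = 0 \;\; \Longrightarrow \;\; \alpha_j = -\frac{1}{\lambda} \sum_{i=1}^{N} q(t_i = j)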

Now, in order to get rid of λ, we use the definition of the constraint on the mixture weights:

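Summing the previous expression over the K clusters and using the fact that q(t_i = k) sums to one over k for each observation:

1 = \sum_{k=1}^{K} \alpha_k = -\frac{1}{\lambda} \sum_{k=1}^{K} \sum_{i=1}^{N} q(t_i = k) = -\frac{N}{\lambda} \;\; \Longrightarrow \;\; \lambda = -N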

And this gives us the final closed-form expressions for the mixture weights:

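That is:

\alpha_j = \frac{1}{N} \sum_{i=1}^{N} q(t_i = j) = \frac{\sum_{i=1}^{N} q(t_i = j)}{\sum_{k=1}^{K} \sum_{i=1}^{N} q(t_i = k)}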

This tells us that for each class, after the M-Step, the mixture weight will be the sum of all the individual weights for that class, normalized by the sum of all the individual weights across all the classes. And this makes perfect sense: if all the observations put little weight on a specific class compared to the other classes, then this class will have a small overall weight, and vice-versa.


Mean vector derivation

Now let’s move on to the derivation of the mean vector μ. This one is a little trickier because, this time, we are computing partial derivatives with respect to vectors and matrices.

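Only one part of the lower bound depends on μ_j; dropping everything else, the condition to solve should look like:

\frac{\partial}{\partial \mu_j} \left[ -\frac{1}{2} \sum_{i=1}^{N} q(t_i = j) \, (x_i - \mu_j)^{\top} \Sigma_j^{-1} (x_i - \mu_j) \right] = 0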

Ok so now we have simplified this expression as much as possible by extracting every part that is constant with respect to μ_j. In order to go through this derivation, we are going to use a specific identity that can be found in Chapter 2 of the essential Matrix Cookbook. It states that for any symmetric matrix W, any vector x and any scalar s:

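The identity in question is most likely the derivative of a quadratic form with respect to the point it is centered on (the Matrix Cookbook states it for a vector s, which here plays the role of μ_j):

\frac{\partial}{\partial s} \, (x - s)^{\top} W (x - s) = -2 \, W (x - s)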

The covariance matrix Σ is symmetric, so its inverse is symmetric as well, and we can write:

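Applying the identity with W = Σ_j^{-1} and s = μ_j, setting the derivative to zero (the invertible factor Σ_j^{-1} cancels out) and solving for μ_j:

\sum_{i=1}^{N} q(t_i = j) \, \Sigma_j^{-1} (x_i - \mu_j) = 0 \;\; \Longrightarrow \;\; \mu_j = \frac{\sum_{i=1}^{N} q(t_i = j) \, x_i}{\sum_{i=1}^{N} q(t_i = j)}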

This looks perfectly right. Out of the M-Step, the mean value for a specific Gaussian distribution will be the average of the data points weighted by the variational distribution q and normalized by the sum of all those weights.


Covariance matrix derivation

Finally, let’s derive the analytical update expression for the covariance matrix Σ:

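Dropping every term that does not depend on Σ_j, the quantity to maximize should be:

\sum_{i=1}^{N} q(t_i = j) \left[ -\frac{1}{2} \log \left| \Sigma_j \right| - \frac{1}{2} (x_i - \mu_j)^{\top} \Sigma_j^{-1} (x_i - \mu_j) \right] + \text{const}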

Once again, the expression has been simplified as much as possible. Now, as you can see, we would have to differentiate an expression that involves the inverse of the covariance matrix, which can be quite tricky. So instead, we are going to make a change of variable in order to differentiate with respect to the inverse of the matrix directly, and then invert the final expression. So we state that:

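That is, we substitute Λ_j for the inverse covariance and rewrite the objective in terms of it:

\Lambda_j \equiv \Sigma_j^{-1}, \qquad \sum_{i=1}^{N} q(t_i = j) \left[ \frac{1}{2} \log \left| \Lambda_j \right| - \frac{1}{2} (x_i - \mu_j)^{\top} \Lambda_j (x_i - \mu_j) \right] + \text{const}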

Also, we make use of the following two identities taken from the Matrix Cookbook:

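These are presumably the derivatives of a log-determinant and of a quadratic form with respect to the matrix itself (the first one reduces to X^{-1} for symmetric X):

\frac{\partial \log \left| \det X \right|}{\partial X} = X^{-\top}, \qquad \frac{\partial \, a^{\top} X a}{\partial X} = a \, a^{\top}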

Finally, we have:

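Setting the derivative with respect to Λ_j to zero and inverting:

\sum_{i=1}^{N} q(t_i = j) \left[ \frac{1}{2} \Lambda_j^{-1} - \frac{1}{2} (x_i - \mu_j)(x_i - \mu_j)^{\top} \right] = 0 \;\; \Longrightarrow \;\; \Sigma_j = \frac{\sum_{i=1}^{N} q(t_i = j) \, (x_i - \mu_j)(x_i - \mu_j)^{\top}}{\sum_{i=1}^{N} q(t_i = j)}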

And this looks pretty damn right! The covariance matrix coming out of the M-Step is the weighted empirical covariance of the data around the updated mean, with weights given by the variational distribution q.

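To make the three updates concrete, here is a minimal NumPy sketch of the M-Step. The names are mine, not from the original article: X is assumed to be an (N, D) array of observations and q an (N, K) array of responsibilities q(t_i = k) produced by a preceding E-Step.

import numpy as np

def m_step(X, q):
    """M-Step updates: X is (N, D) data, q is (N, K) responsibilities."""
    N, D = X.shape
    K = q.shape[1]

    Nk = q.sum(axis=0)                      # sum_i q(t_i = k), one value per cluster
    alpha = Nk / N                          # mixture weights
    mu = (q.T @ X) / Nk[:, None]            # weighted means, shape (K, D)

    sigma = np.zeros((K, D, D))             # weighted covariances around the new means
    for k in range(K):
        diff = X - mu[k]                    # (N, D)
        sigma[k] = (q[:, k, None] * diff).T @ diff / Nk[k]
    return alpha, mu, sigma

Each line mirrors one of the closed-form expressions derived above.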

End of the appendix


Translated from: https://medium.com/@biarnes.adrien/em-of-gmm-appendix-m-step-full-derivations-4ae95cdd40c9
