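This is the most complete Gaussian mixture implementation I have come across so far, although I do not yet know how to use it. My annotations may contain mistakes or be imprecise, so I am parking it here for now. The code calls other functions from the audiotool toolbox, so it cannot be used on its own; you need to download the complete library.
voicebox (wav file tools):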
http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.zip
function [m,v,w,g,f,pp,gg]=gaussmix_note(x,c,l,m0,v0,w0,wx)
%GAUSSMIX fits a gaussian mixture pdf to a set of data observations [m,v,w,g,f,pp,gg]=gaussmix(x,c,l,m0,v0,w0,wx)
% Annotated by Xu__Jiayu (this annotated copy is named gaussmix_note)
%
% Usage:
% (1) [m,v,w]=gaussmix(x,[],[],k); % create GMM with k mixtures and diagonal covariances
% (2) [m,v,w]=gaussmix(x,[],[],k,'v'); % create GMM with k mixtures and full covariances
%
% Inputs: n data values, k mixtures, p parameters, l loops
%%
% X(n,p) Input data vectors, one per row (n observations, each with p features).
%%
% c(1) Minimum variance of normalized data (Use [] to take default value of 1/n^2)
%%
% L The integer portion of l gives a maximum loop count. The fractional portion gives
% an optional stopping threshold. Iteration will cease if the increase in
% log likelihood density per data point is less than this value. Thus l=10.001 will
% stop after 10 iterations or when the increase in log likelihood falls below
% 0.001.
% As a special case, if L=0, then the first three outputs are omitted.
% Use [] to take default value of 100.0001
%%
% M0 Number of mixtures required, i.e. the k of the usage examples (or initial mixture means - see below)
%%
% V0 Initialization mode (a string combining the options below):
% ******************** choose one of 'm'|'f'|'p' (default 'f')
% 'm' M0 contains the initial centres
% 'f' Initialize with K randomly selected data points [default]
% 'p' Initialize with centroids and variances of random partitions
% ******************** choose one of 'k'|'h' (default 'h')
% 'k' k-means algorithm ('kf' and 'kp' determine initialization)
% 'h' k-harmonic means algorithm ('hf' and 'hp' determine initialization) [default]
% ******************** 's' do not scale data during initialization to have equal variances
% ******************** 'v' full covariance matrices (diagonal covariances are used if 'v' is omitted)
% ******************** if V0 is not a string it is taken as the initial variances (see below)
% Mode 'hf' [the default] generally gives the best results but 'f' is faster and often OK
%%
% W0(k,1) Initial mixture weights, one per mixture. The weights should sum to unity.
%%
% WX(n,1) Data point weights
%%
% Alternatively, initial values for M0, V0 and W0 can be given explicitly:
%
% M0(k,p) Initial mixture means, one row per mixture.
% V0(k,p) Initial mixture variances (diagonal covariance), one row per mixture.
% or V0(p,p,k) one full-covariance matrix per mixture
% W0(k,1) Initial mixture weights, one per mixture. The weights should sum to unity.
% WX(n,1) Data point weights
%%
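% Example (an illustrative sketch added to these notes, not part of the VOICEBOX
% help text; the values are arbitrary and x is assumed to be an n-by-2 data matrix):
%   m0=[0 0; 3 3];   % k=2 initial means for p=2 features, one row per mixture
%   v0=ones(2,2);    % diagonal variances, one row per mixture
%   w0=[0.5; 0.5];   % initial weights summing to unity
%   [m,v,w]=gaussmix(x,[],[],m0,v0,w0);
% Because v0 is numeric rather than a mode string, the routine skips the
% clustering step and starts EM from these values.
%%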
% Outputs: (Note that M, V and W are omitted if L==0)
%
% M(k,p) Mixture means, one row per mixture. (omitted if L==0)
% V(k,p) Mixture variances, one row per mixture. (omitted if L==0)
% or V(p,p,k) if full covariance matrices in use (i.e. either 'v' option or V0(p,p,k) specified)
% W(k,1) Mixture weights, one per mixture. The weights will sum to unity. (omitted if L==0)
% G Average log probability of the input data points.
% F Fisher's Discriminant measures how well the data divides into classes.
% It is the ratio of the between-mixture variance to the average mixture variance: a
% high value means the classes (mixtures) are well separated.
% PP(n,1) Log probability of each data point
% GG(l+1,1) Average log probabilities at the beginning of each iteration and at the end
%%
% The fitting procedure uses one of several initialization methods to create an initial guess
% for the mixture centres and then uses the EM (expectation-maximization) algorithm to refine
% the guess. Although the EM algorithm is deterministic, the initialization procedures use
% random numbers and so the routine will not give identical answers if you call it multiple
% times with the same input data.
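% Example (annotation added to these notes; it assumes the initialization helpers
% draw from MATLAB's global random stream, as the rand call in this file does):
%   rng(0);                          % fix the seed before the first call
%   [ma,va,wa]=gaussmix(x,[],[],3);
%   rng(0);                          % reset to the same seed
%   [mb,vb,wb]=gaussmix(x,[],[],3);  % expected to reproduce ma, va, wa under that assumption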
% Bugs/Suggestions
% (1) Allow processing in chunks by outputting/reinputting an array of sufficient statistics
% (2) Other initialization options:
% 'l' LBG algorithm
% 'm' Move-means (dog-rabbit) algorithm
% (3) Allow updating of weights-only, not means/variances
% Copyright (C) Mike Brookes 2000-2009
% Version: $Id: gaussmix.m 7784 2016-04-15 11:09:50Z dmb $
%
% VOICEBOX is a MATLAB toolbox for speech processing.
% Home page: http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html
%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% This program is free software; you can redistribute it and/or modify
% it under the terms of the GNU General Public License as published by
% the Free Software Foundation; either version 2 of the License, or
% (at your option) any later version.
%
% This program is distributed in the hope that it will be useful,
% but WITHOUT ANY WARRANTY; without even the implied warranty of
% MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
% GNU General Public License for more details.
%
% You can obtain a copy of the GNU General Public License from
% http://www.gnu.org/copyleft/gpl.html or by writing to
% Free Software Foundation, Inc.,675 Mass Ave, Cambridge, MA 02139, USA.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[n,p]=size(x); % n data points (observations), each with p features
wn=ones(n,1); % column of n ones: default data point weights, also used as a row-replication index
mx0=sum(x,1)/n; % calculate mean and variance of input data in each dimension (1 x p row vectors)
vx0=sum(x.^2,1)/n-mx0.^2; % variance of each dimension
sx0=sqrt(vx0); % standard deviation of each dimension
sx0(sx0==0)=1; % do not divide by zero when scaling
scaled=0; % data is not yet scaled
memsize=voicebox('memsize'); % set memory size to use
%%
if isempty(c) % minimum variance of the normalized data
c=1/n^2;
else
c=c(1); % just to prevent legacy code failing
end
fulliv=0; % initial variance is not full
%%
if isempty(l) % maximum loop count and stopping threshold
l=100+1e-4; % max loop count + stopping threshold
end
%%
% Either no initial values were given for m0, v0 and w0, or v0 is a mode string rather than a variance array
if nargin<5 || isempty(v0) || ischar(v0) % no initial values specified for m0, v0, w0
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% No initialvalues given, so we must use k-means or equivalent
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
if nargin<6
if nargin<5 || isempty(v0)
v0='hf'; % default initialization mode: k-harmonic means started from k randomly selected data points
end
wx=wn; % no data point weights: default to ones(n,1)
else
wx=w0(:); % data point weights (with a mode string, the sixth argument holds the data weights)
end
%%%%%%
if any(v0=='m')
k=size(m0,1); % m0 holds the initial centres, so the number of mixtures is its row count
else
k=m0; % m0 is a scalar giving the number of mixtures
end
%%%%%%
fv=any(v0=='v'); % full covariance matrices requested
%%
% Case 1: no more data points than mixtures, so no iteration is needed
if n<=k % each data point can have its own mixture
xs=(x-mx0(wn,:))./sx0(wn,:); % scale the data (indexing with wn replicates the 1 x p mean and std rows to n rows)
m=xs(mod((1:k)-1,n)+1,:); % just include all points several times
v=zeros(k,p); % will be set to floor later
w=zeros(k,1); % mixture weights start at zero
w(1:n)=1/n; % the first n mixtures (one per data point) get weight 1/n
if l>0
l=0.1; % no point in iterating
end
% End of case 1
else
% Case 2: more points than mixtures, so iteration is needed
% Optionally scale the data before clustering
if any(v0=='s')
xs=x; % do not scale data during initialization
else
xs=(x-mx0(wn,:))./sx0(wn,:); % else scale now
if any(v0=='m')
m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:); % scale specified means as well
end
end
% End of optional scaling
w=repmat(1/k,k,1); % all mixtures equally likely (k x 1 column)
% Clustering step: m(k,p) holds the centres and j(n,1) the cluster index (1..k) of each data point
if any(v0=='k') % k-means initialization
if any(v0=='m') % k-means started from the supplied centres
[m,e,j]=v_kmeans(xs,k,m);
elseif any(v0=='p') % k-means started from a random partition
[m,e,j]=v_kmeans(xs,k,'p');
else
[m,e,j]=v_kmeans(xs,k,'f'); % k-means started from randomly selected data points
end
elseif any(v0=='h') % k-harmonic means initialization
if any(v0=='m') % k-harmonic means started from the supplied centres
[m,e,j]=kmeanhar(xs,k,[],4,m);
else
if any(v0=='p') % k-harmonic means started from a random partition
[m,e,j]=kmeanhar(xs,k,[],4,'p');
else
[m,e,j]=kmeanhar(xs,k,[],4,'f'); % k-harmonic means started from randomly selected data points
end
end
elseif any(v0=='p') % Initialize using a random partition
j=ceil(rand(n,1)*k); % allocate to random clusters: rand(n,1) is uniform on (0,1) and ceil rounds up to 1..k
j(rnsubset(k,n))=1:k; % but force at least one point per cluster (rnsubset picks k distinct indices from 1..n)
for i=1:k
m(i,:)=mean(xs(j==i,:),1); % the centre of each cluster is the mean of its points
end
else % no clustering algorithm requested
if any(v0=='m')
m=m0; % use specified centres
else
m=xs(rnsubset(k,n),:); % Forgy initialization: sample k centres without replacement [default]
end
[e,j]=v_kmeans(xs,k,m,0); % find out the cluster allocation
end
% End of clustering step
if any(v0=='s')
xs=(x-mx0(wn,:))./sx0(wn,:); % scale data now if not done previously
end
v=zeros(k,p); % diagonal covariances
w=zeros(k,1); % mixture weights (column vector)
for i=1:k % accumulate the weight and variance of each mixture (cluster)
ni=sum(j==i); % number assigned to this centre
w(i)=(ni+1)/(n+k); % weight of this mixture (adding 1 to each count keeps every weight non-zero)
if ni % variance of the points assigned to this mixture
v(i,:)=sum((xs(j==i,:)-repmat(m(i,:),ni,1)).^2,1)/ni;
else
v(i,:)=zeros(1,p);
end
end
end
else % initial values supplied for the means m0, variances v0 and weights w0
%%%%%%%%%%%%%%%%%%%%%%%%
% use initial values given as input parameters
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
[k,p]=size(m0); % the number of mixtures k and of features p are taken from the supplied means
xs=(x-mx0(wn,:))./sx0(wn,:); % scale the data
m=(m0-mx0(ones(k,1),:))./sx0(ones(k,1),:); % and the means
v=v0; % use the supplied variances
w=w0; % use the supplied weights
fv=ndims(v)>2 || size(v,1)>k; % full covariance matrix is supplied
if fv % full covariance matrices
mk=eye(p)==0; % off-diagonal elements (mask is true off the diagonal)
fulliv=any(v(repmat(mk,[1 1 k]))~=0); % check if any are non-zero
if ~fulliv
v=reshape(v(repmat(~mk,[1 1 k])),p,k)'./repmat(sx0.^2,k,1); % all off-diagonal elements are zero: just pick out and scale the diagonal elements for now (v becomes k x p)
else
v=v./repmat(sx0'*sx0,[1 1 k]); % some off-diagonal element is non-zero: scale the full covariance matrix
end
end
if nargin<7
wx=wn; % no data point weights: default to ones(n,1)
end
end
%%
% The initial means m, weights w and variances v are now available; next fit the mixture with EM
if length(wx)~=n % the number of data weights must match the number of data points
error('%d datapoints but %d weights',n,length(wx));
end
lsx=sum(log(sx0)); % sum of the log standard deviations (used to undo the scaling in the log probabilities)
xsw=xs.*repmat(wx,1,p); % weighted data points
nwt=sum(wx); % number of data points counting duplicates
%%
% Diagonal covariances: v is k x p
if ~fulliv % initializing with diagonal covariance
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Diagonal Covariance matrices %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
v=max(v,c); % apply the lower bound
xs2=xs.^2.*repmat(wx,1,p); % square and weight the data for variance calculations
% If data size is large then do calculations in chunks
% (chunking keeps the working arrays within the memory budget)
nb=min(n,max(1,floor(memsize/(8*p*k)))); % chunk size for testing data points
nl=ceil(n/nb); % number of chunks
jx0=n-(nl-1)*nb; % size of first chunk (all later chunks hold nb points)
im=repmat(1:k,1,nb); im=im(:); % mixture index for every (mixture, data point) pair in a full chunk, as a column
th=(l-floor(l))*n; % stopping threshold: fractional part of l scaled by n
sd=(nargout > 3*(l~=0)); % = 1 if we are outputting log likelihood values
lp=floor(l)+sd; % extra loop needed to calculate final G value
lpx=zeros(1,n); % log probability of each data point
wk=ones(k,1); % vectors of ones used as indices to replicate rows or columns
wp=ones(1,p);
wnb=ones(1,nb);
wnj=ones(1,jx0);
% EM loop
g=0; % dummy initial value for comparison
gg=zeros(lp+1,1);
ss=sd; % initialize stopping count (0 or 1)
for j=1:lp
g1=g; % save previous log likelihood (2*pi factor omitted)
m1=m; % save previous means, variances and weights
v1=v;
w1=w;
% A one-dimensional normal density is f(x)=(2*pi*v)^(-1/2)*exp(-(x-u)^2/(2*v)).
% With diagonal covariances, the weighted log density of mixture i at point x is
% log(w(i))-0.5*sum(log(v(i,:)))-sum((x-m(i,:)).^2./(2*v(i,:)))-0.5*p*log(2*pi)
vi=-0.5*v.^(-1); % data-independent scale factor in exponent
lvm=log(w)-0.5*sum(log(v),2); % log of external scale factor (excluding -0.5*p*log(2pi) term)
% first do partial chunk (of length jx0)
jx=jx0; % number of data points in this chunk
ii=1:jx; % indices of data points in this chunk
kk=repmat(ii,k,1); % kk(k,jx): data point index, one row per mixture, one column per data point
km=repmat(1:k,1,jx); % km(1,k*jx): mixture index, cycling through 1..k once per data point
py=reshape(sum((xs(kk(:),:)-m(km(:),:)).^2.*vi(km(:),:),2),k,jx)+lvm(:,wnj); % py(k,jx) log pdf of each point with each mixture
% py is the exponent -sum((x-m).^2./(2*v)) plus the log scale factor log(w)-0.5*sum(log(v))
mx=max(py,[],1); % mx(1,jx) find normalizing factor for each data point to prevent underflow when using exp()
px=exp(py-mx(wk,:)); % the largest entry in each column becomes exp(0)=1; mx is added back in lpx below
% find normalized probability of each mixture for each datapoint
ps=sum(px,1); % total normalized likelihood of each data point
px=px./ps(wk,:); % relative mixture probabilities for each data point (columns sum to 1)
lpx(ii)=log(ps)+mx; % log likelihood of each data point with the normalizing factor restored
%-------
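% Annotation (a sketch added to these notes, not in the original source): the
% max/exp/log steps above form the log-sum-exp trick. For one data point with
% log terms py=[-1000;-1001], exp(py) underflows to 0 in double precision, but
%   mx =max(py)             is -1000
%   ps =sum(exp(py-mx))     is 1+exp(-1)
%   lpx=log(ps)+mx          is -1000+log(1+exp(-1))
% which equals log(sum(exp(py))) in exact arithmetic without any underflow.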
pk=px*wx(ii); % pk(k,1) effective number of data points for each mixture (could be zero due to underflow)
sx=px*xsw(ii,:); % accumulator for mean calculation
sx2=px*xs2(ii,:); % accumulator for variance calculation
for il=2:nl % process the data points in chunks
ix=jx+1; % first index of this chunk
jx=jx+nb; % increment upper limit
ii=ix:jx; % indices of data points in this chunk
kk=repmat(ii,k,1);
py=reshape(sum((xs(kk(:),:)-m(im,:)).^2.*vi(im,:),2),k,nb)+lvm(:,wnb);
mx=max(py,[],1); % find normalizing factor for each data point to prevent underflow when using exp()
px=exp(py-mx(wk,:)); % find normalized probability of each mixture for each datapoint
ps=sum(px,1); % total normalized likelihood of each data point
px=px./ps(wk,:); % relative mixture probabilities for each data point (columns sum to 1)
lpx(ii)=log(ps)+mx;
%-------
pk=pk+px*wx(ii); % pk(k,1) effective number of data points for each mixture (could be zero due to underflow)
sx=sx+px*xsw(ii,:);
sx2=sx2+px*xs2(ii,:);
end
g=lpx*wx; % total log probability summed over all data points
gg(j)=g; % save log prob at each iteration
w=pk/nwt; % normalize to get the weights
if pk % if all elements of pk are non-zero
m=sx./pk(:,wp); % calculate mixture means
v=sx2./pk(:,wp); % and mean squares (the squared means are subtracted below)
else
wm=pk==0; % mask indicating mixtures with zero weights
nz=sum(wm); % number of zero-weight mixtures
[vv,mk]=sort(lpx); % find the lowest probability data points
m=zeros(k,p); % initialize means and variances to zero (variances are floored later)
v=m;
m(wm,:)=xs(mk(1:nz),:); % set zero-weight mixture means to worst-fitted data points
w(wm)=1/n; % set these weights non-zero
w=w*n/(n+nz); % normalize so the weights sum to unity
wm=~wm; % mask for non-zero weights
m(wm,:)=sx(wm,:)./pk(wm,wp); % recalculate means and variances for mixtures with a non-zero weight
v(wm,:)=sx2(wm,:)./pk(wm,wp);
end
v=max(v-m.^2,c); % subtract the squared means and apply floor to variances
if g-g1<=th && j>1
if ~ss, break; end % stop
ss=ss-1; % stop next time
end
end % end of EM loop
if sd && ~fv % we need to calculate the final probabilities
pp=lpx'-0.5*p*log(2*pi)-lsx; % log of total probability of each data point
gg=gg(1:j)/n-0.5*p*log(2*pi)-lsx; % average log prob at each iteration
g=gg(end);
% gg' % *** DEBUG ***
m=m1; % back up to previous iteration
v=v1;
w=w1;
mm=sum(m,1)/k;
f=(m(:)'*m(:)-k*mm(:)'*mm(:))/sum(v(:));
end
if ~fv % diagonal covariances requested: return v as k x p
m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:); % unscale means
v=v.*repmat(sx0.^2,k,1); % and variances
else % full covariances requested: expand the diagonal variances into p x p x k matrices with zero off-diagonal elements
v1=v;
v=zeros(p,p,k);
mk=eye(p)==1; % mask for diagonal elements
v(repmat(mk,[1 1 k]))=v1'; % set from v1
end
end
%%
% Full covariances: v is p x p x k
if fv % check if full covariance matrices were requested
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Full Covariance matrices %
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
pl=p*(p+1)/2;
lix=1:p^2;
cix=repmat(1:p,p,1);
rix=cix';
lix(cix>rix)=[]; % index of lower triangular elements
cix=cix(lix); % index of lower triangular columns
rix=rix(lix); % index of lower triangular rows
dix=find(rix==cix);
lixi=zeros(p,p);
lixi(lix)=1:pl;
lixi=lixi';
lixi(lix)=1:pl; % reverse index to build full matrices
v=reshape(v,p^2,k);
v=v(lix,:)'; % lower triangular in rows
% If data size is large then do calculations in chunks
nb=min(n,max(1,floor(memsize/(24*p*k)))); % chunk size for testing data points
nl=ceil(n/nb); % number of chunks
jx0=n-(nl-1)*nb; % size of first chunk
%
th=(l-floor(l))*n;
sd=(nargout > 3*(l~=0)); % = 1 if we are outputting log likelihood values
lp=floor(l)+sd; % extra loop needed to calculate final G value
%
lpx=zeros(1,n); % log probability of each data point
wk=ones(k,1);
wp=ones(1,p);
wpl=ones(1,pl); % 1 index for lower triangular matrix
wnb=ones(1,nb);
wnj=ones(1,jx0);
% EM loop
g=0; % dummy initial value for comparison
gg=zeros(lp+1,1);
ss=sd; % initialize stopping count (0 or 1)
vi=zeros(p*k,p); % stack of k inverse cov matrices each size p*p
vim=zeros(p*k,1); % stack of k vectors of the form inv(v)*m
mtk=vim; % stack of k vectors of the form m
lvm=zeros(k,1);
wpk=repmat((1:p)',k,1);
for j=1:lp
g1=g; % save previous log likelihood (2*pi factor omitted)
m1=m; % save previous means, variances and weights
v1=v;
w1=w;
for ik=1:k
% these lines added for debugging only
% vk=reshape(v(k,lixi),p,p);
% condk(ik)=cond(vk);
%%%%%%%%%%%%%%%%%%%%
[uvk,dvk]=eig(reshape(v(ik,lixi),p,p)); % convert lower triangular to full and find eigenvalues
dvk=max(diag(dvk),c); % apply variance floor to eigenvalues
vik=-0.5*uvk*diag(dvk.^(-1))*uvk'; % calculate -0.5 times the inverse
vi((ik-1)*p+(1:p),:)=vik; % vi contains all mixture inverses stacked on top of each other
vim((ik-1)*p+(1:p))=vik*m(ik,:)'; % vim contains vi*m for all mixtures stacked on top of each other
mtk((ik-1)*p+(1:p))=m(ik,:)'; % mtk contains all mixture means stacked on top of each other
lvm(ik)=log(w(ik))-0.5*sum(log(dvk)); % lvm contains the log of the weight divided by sqrt(det(v)) for each mixture
end
%
% % first do partial chunk
%
jx=jx0;
ii=1:jx;
xii=xs(ii,:).';
py=reshape(sum(reshape((vi*xii-vim(:,wnj)).*(xii(wpk,:)-mtk(:,wnj)),p,jx*k),1),k,jx)+lvm(:,wnj);
mx=max(py,[],1); % find normalizing factor for each data point to prevent underflow when using exp()
px=exp(py-mx(wk,:)); % find normalized probability of each mixture for each datapoint
ps=sum(px,1); % total normalized likelihood of each data point
px=px./ps(wk,:); % relative mixture probabilities for each data point (columns sum to 1)
lpx(ii)=log(ps)+mx;
pk=px*wx(ii); % effective number of data points for each mixture (could be zero due to underflow)
sx=px*xsw(ii,:);
sx2=px*(xsw(ii,rix).*xs(ii,cix)); % accumulator for variance calculation (lower tri cov matrix as a row)
for il=2:nl
ix=jx+1;
jx=jx+nb; % increment upper limit
ii=ix:jx;
xii=xs(ii,:).';
py=reshape(sum(reshape((vi*xii-vim(:,wnb)).*(xii(wpk,:)-mtk(:,wnb)),p,nb*k),1),k,nb)+lvm(:,wnb);
mx=max(py,[],1); % find normalizing factor for each data point to prevent underflow when using exp()
px=exp(py-mx(wk,:)); % find normalized probability of each mixture for each datapoint
ps=sum(px,1); % total normalized likelihood of each data point
px=px./ps(wk,:); % relative mixture probabilities for each data point (columns sum to 1)
lpx(ii)=log(ps)+mx;
pk=pk+px*wx(ii); % effective number of data points for each mixture (could be zero due to underflow)
sx=sx+px*xsw(ii,:); % accumulator for mean calculation
sx2=sx2+px*(xsw(ii,rix).*xs(ii,cix)); % accumulator for variance calculation
end
g=lpx*wx; % total log probability summed over all data points
gg(j)=g; % save convergence history
w=pk/nwt; % w(k,1) normalize to get the column of weights
if pk % if all elements of pk are non-zero
m=sx./pk(:,wp); % find mean and mean square
v=sx2./pk(:,wpl);
else
wm=pk==0; % mask indicating mixtures with zero weights
nz=sum(wm); % number of zero-weight mixtures
[vv,mk]=sort(lpx); % find the lowest probability data points
m=zeros(k,p); % initialize means and variances to zero (variances are floored later)
v=zeros(k,pl);
m(wm,:)=xs(mk(1:nz),:); % set zero-weight mixture means to worst-fitted data points
w(wm)=1/n; % set these weights non-zero
w=w*n/(n+nz); % normalize so the weights sum to unity
wm=~wm; % mask for non-zero weights
m(wm,:)=sx(wm,:)./pk(wm,wp); % recalculate means and variances for mixtures with a non-zero weight
v(wm,:)=sx2(wm,:)./pk(wm,wpl);
end
v=v-m(:,cix).*m(:,rix); % subtract off mean squared
if g-g1<=th && j>1
if ~ss, break; end % stop
ss=ss-1; % stop next time
end
end
if sd % we need to calculate the final probabilities
pp=lpx'-0.5*p*log(2*pi)-lsx; % log of total probability of each data point
gg=gg(1:j)/nwt-0.5*p*log(2*pi)-lsx; % average log prob at each iteration
g=gg(end);
% gg' % *** DEBUG ONLY ***
m=m1; % back up to previous iteration
v=zeros(p,p,k); % reserve space for k full covariance matrices
trv=0; % sum of variance matrix traces
for ik=1:k % loop for each mixture to apply variance floor
[uvk,dvk]=eig(reshape(v1(ik,lixi),p,p)); % convert lower triangular to full and find eigenvectors
dvk=max(diag(dvk),c); % apply variance floor to eigenvalues
v(:,:,ik)=uvk*diag(dvk)*uvk'; % reconstitute full matrix
trv=trv+sum(dvk); % add trace to the sum
end
w=w1;
mm=sum(m,1)/k;
f=(m(:)'*m(:)-k*mm(:)'*mm(:))/trv;
else
v1=v; % lower triangular form
v=zeros(p,p,k); % reserve space for k full covariance matrices
for ik=1:k % loop for each mixture to apply variance floor
[uvk,dvk]=eig(reshape(v1(ik,lixi),p,p)); % convert lower triangular to full and find eigenvectors
dvk=max(diag(dvk),c); % apply variance floor
v(:,:,ik)=uvk*diag(dvk)*uvk'; % reconstitute full matrix
end
end
m=m.*sx0(ones(k,1),:)+mx0(ones(k,1),:); % unscale means
v=v.*repmat(sx0'*sx0,[1 1 k]);
end
if l==0 % suppress the first three output arguments if l==0
m=g;
v=f;
w=pp;
end