一 问题的引出
我们都知道极大似然估计是一种很有效的估计分布参数的方法,例如,我们先验性地假设某班级男性同学的身高是服从于某一个一维高斯分布,并且通过简单随机抽样,我们获取到了 N N N个样本.现在我们想通过这些样本估计出该高斯分布的分布参数(即均值和方差),于是我们可以建立似然函数并对该两个参量求其导数令其为 0 0 0,可以求得其参数.但是现在考虑一种新的问题,假设我们所采集到的样本数据不是单独的男性身高数据,而是男性和女性样本数据的混合,并且假设女性样本数据也服从于某一参数的高斯分布,现在的问题是估计出这两个高斯分布的参数以及男女样本各占的比例.
对于这个问题,显然想要估计这两个高斯分布的参数就必须要先知道男女在总样本所占比例,而要想知道男女在总样本所占比例,就必须要知道这两个高斯分布的参数,就好比鸡生蛋,蛋生鸡的问题.因此单纯的极大似然估计是没办法解决这种高斯混合模型( G M M GMM GMM),因此下一节我们引入EM算法,这是一种能够解决含有隐变量的参数估计问题.
二 EM算法原理
对于极大似然估计,即式(2-1-1)将其改写成式(2-1-2),我们的目标是求解出参数,使得似然函数
l
(
θ
)
l(\theta)
l(θ)达到最大,但是我们无法对式(2-1-2)进行解析求解,因此我们先给
θ
\theta
θ一个初值
θ
0
\theta_0
θ0,然后构造出一个
l
(
θ
)
l(\theta)
l(θ)的下界函数
r
(
x
∣
θ
)
r(x|\theta)
r(x∣θ),找到这个下界函数的最大值,再找到其对应的
θ
\theta
θ,再继续迭代.至于该方法的收敛性,这里不做证明,感兴趣的读者可以查阅相关资料.
l
(
θ
)
=
∑
i
=
1
N
l
o
g
∑
z
p
(
x
(
i
)
,
z
;
θ
)
(2-1-1)
l(\theta)=\sum_{i=1}^Nlog\sum_zp(x^{(i)},z;\theta) \tag{2-1-1}
l(θ)=i=1∑Nlogz∑p(x(i),z;θ)(2-1-1)
式(2-1-3)给出了其基本思路,其中
Q
i
(
z
(
i
)
)
Q_i(z^(i))
Qi(z(i))是
z
z
z的某个分布,并且使用了
J
e
n
s
e
n
Jensen
Jensen不等式.
l
(
θ
)
=
∑
i
=
1
N
l
o
g
∑
z
p
(
x
(
i
)
,
z
;
θ
)
=
∑
i
=
1
N
l
o
g
∑
z
(
j
)
p
(
x
(
i
)
,
z
(
j
)
;
θ
)
=
∑
i
=
1
N
l
o
g
∑
z
(
j
)
Q
j
(
z
(
j
)
)
l
o
g
p
(
x
(
i
)
,
z
(
j
)
;
θ
)
Q
j
(
z
(
j
)
)
≥
∑
i
=
1
N
∑
z
(
j
)
Q
j
(
z
(
j
)
)
l
o
g
p
(
x
(
i
)
,
z
(
j
)
;
θ
)
Q
j
(
z
(
j
)
)
(2-1-3)
l(\theta)=\sum_{i=1}^Nlog\sum_zp(x^{(i)},z;\theta)=\sum_{i=1}^Nlog\sum_{z(j)}p(x^{(i)},z^{(j)};\theta)= \sum_{i=1}^Nlog\sum_{z^{(j)}}Q_j(z^{(j)})log\frac{p(x^{(i)},z^{(j)};\theta)}{Q_j(z^{(j)})}\\ {\geq}\sum_{i=1}^N\sum_{z^{(j)}}Q_j(z^{(j)})log\frac{p(x^{(i)},z^{(j)};\theta)}{Q_j(z^{(j)})} \tag{2-1-3}
l(θ)=i=1∑Nlogz∑p(x(i),z;θ)=i=1∑Nlogz(j)∑p(x(i),z(j);θ)=i=1∑Nlogz(j)∑Qj(z(j))logQj(z(j))p(x(i),z(j);θ)≥i=1∑Nz(j)∑Qj(z(j))logQj(z(j))p(x(i),z(j);θ)(2-1-3)
当
p
(
x
(
i
)
,
z
(
j
)
;
θ
)
Q
j
(
z
(
j
)
)
\frac{p(x^{(i)},z^{(j)};\theta)}{Q_j(z^{(j)})}
Qj(z(j))p(x(i),z(j);θ)等于常数
c
c
c时,等号成立,并且由于
Q
i
(
z
(
i
)
)
Q_i(z^(i))
Qi(z(i))是
z
z
z的某个分布,因此有
∑
z
(
j
)
Q
j
(
z
(
j
)
)
=
1
\sum_{z^{(j)}}{Q_j(z^{(j)})}=1
∑z(j)Qj(z(j))=1
于是
Q
j
(
z
(
j
)
)
{Q_j(z^{(j)})}
Qj(z(j))可以取式(2-1-4),于是求解似然函数的极大值的过程为式(2-1-5)
Q
j
(
z
(
j
)
)
=
p
(
x
(
i
)
,
z
(
j
)
;
θ
)
∑
z
(
j
)
p
(
x
(
i
)
,
z
(
j
)
;
θ
)
=
p
(
x
(
i
)
,
z
(
j
)
;
θ
)
p
(
x
(
i
)
;
θ
)
=
p
(
z
(
j
)
∣
x
(
i
)
;
θ
)
(2-1-4)
{Q_j(z^{(j)})}=\frac{p(x^{(i)},z^{(j)};\theta)}{\sum_{z^{(j)}}p(x^{(i)},z^{(j)};\theta)}= \frac{p(x^{(i)},z^{(j)};\theta)}{p(x^{(i)};\theta)}=p(z^{(j)}|x^{(i)};\theta) \tag{2-1-4}
Qj(z(j))=∑z(j)p(x(i),z(j);θ)p(x(i),z(j);θ)=p(x(i);θ)p(x(i),z(j);θ)=p(z(j)∣x(i);θ)(2-1-4)
1. θ ∗ = a r g m a x ∑ i = 1 N ∑ z ( j ) Q j ( z ( j ) ) l o g p ( x ( i ) , z ( j ) ; θ ) Q j ( z ( j ) ) 2. Q j ( z ( j ) ) = p ( z ( j ) ∣ x ( i ) ; θ ∗ ) (2-1-5) 1.\theta^{*}=argmax\sum_{i=1}^N\sum_{z^{(j)}}Q_j(z^{(j)})log\frac{p(x^{(i)},z^{(j)};\theta)}{Q_j(z^{(j)})}\\ 2.{Q_j(z^{(j)})}=p(z^{(j)}|x^{(i)};\theta^{*}) \tag{2-1-5} 1.θ∗=argmaxi=1∑Nz(j)∑Qj(z(j))logQj(z(j))p(x(i),z(j);θ)2.Qj(z(j))=p(z(j)∣x(i);θ∗)(2-1-5)
三 G M M GMM GMM模型的求解
对于
G
M
M
GMM
GMM模型,我们需要估计的参数有:每一种高维高斯分布的均值向量和协方差矩阵,以及每一种类别的先验概率.
记
μ
i
\mu_i
μi代表第
i
i
i个类别服从高斯分布的均值向量,
∑
(
i
)
\sum_{}^{(i)}
∑(i)代表第
i
i
i个类别服从的高斯分布的协方差矩阵,
ϕ
i
\phi_i
ϕi代表第
i
i
i种类别的先验概率,下面对上述参数进行求解,使得其能够让似然函数达到极大值.记
W
j
(
i
)
=
Q
j
(
z
(
i
)
=
j
)
=
p
(
z
(
i
)
=
j
∣
x
(
i
)
;
ϕ
,
∑
,
μ
)
W_j^{(i)}=Q_j(z^{(i)=j})=p(z^{(i)}=j|x^{(i)};\phi,\sum_{},\mu)
Wj(i)=Qj(z(i)=j)=p(z(i)=j∣x(i);ϕ,∑,μ).
于是式(2-1-5)的第一步可以写为式(3-1-1)
(
ϕ
∗
,
∑
∗
,
μ
∗
)
=
a
r
g
m
a
x
∑
i
=
1
N
∑
z
(
j
)
Q
j
(
z
(
j
)
)
l
o
g
p
(
x
(
i
)
,
z
(
j
)
;
ϕ
,
∑
,
μ
)
Q
j
(
z
(
j
)
)
=
a
r
g
m
a
x
∑
i
=
1
N
∑
z
Q
j
(
z
=
j
)
)
l
o
g
p
(
x
(
i
)
∣
z
=
j
;
ϕ
,
∑
,
μ
)
p
(
z
(
j
)
=
j
;
ϕ
,
∑
,
μ
)
Q
j
(
z
=
j
)
=
a
r
g
m
a
x
∑
i
=
1
N
∑
j
=
1
k
W
j
(
i
)
l
o
g
1
(
2
π
)
n
2
∣
∑
∣
1
2
e
x
p
(
−
1
2
(
x
(
i
)
−
μ
j
)
T
∑
(
j
)
−
1
(
x
(
i
)
−
μ
j
)
)
ϕ
j
W
j
(
i
)
(3-1-1)
(\phi^{*},\sum_{}^{*},\mu^{*})=argmax\sum_{i=1}^N\sum_{z^{(j)}}Q_j(z^{(j)})log\frac{p(x^{(i)},z^{(j)};\phi,\sum_{},\mu)}{Q_j(z^{(j)})}\\ =argmax\sum_{i=1}^N\sum_{z}Q_j(z=j))log\frac{p(x^{(i)}|z={j};\phi,\sum_{},\mu)p(z^{(j)}=j;\phi,\sum_{},\mu)}{Q_j(z={j})}\\ =argmax\sum_{i=1}^N\sum_{j=1}^{k}W_j^{(i)}log\frac{\frac{1}{{(2\pi)}^\frac{n}{2}|\sum_{}|^\frac{1}{2}}exp(-\frac{1}{2}(x^{(i)}-\mu_j)^{T}\sum_{}^{(j)^{-1}}(x^{(i)}-\mu_j))\phi_j}{W_j^{(i)}} \tag{3-1-1}
(ϕ∗,∑∗,μ∗)=argmaxi=1∑Nz(j)∑Qj(z(j))logQj(z(j))p(x(i),z(j);ϕ,∑,μ)=argmaxi=1∑Nz∑Qj(z=j))logQj(z=j)p(x(i)∣z=j;ϕ,∑,μ)p(z(j)=j;ϕ,∑,μ)=argmaxi=1∑Nj=1∑kWj(i)logWj(i)(2π)2n∣∑∣211exp(−21(x(i)−μj)T∑(j)−1(x(i)−μj))ϕj(3-1-1)
然后对式分别对
ϕ
,
∑
,
μ
\phi,\sum_{},\mu
ϕ,∑,μ求偏导令其为
0
0
0可以得到式(3-1-2),这里以求解
ϕ
\phi
ϕ为例(式3-1-3),其余请读者自己推导.
μ
j
=
∑
i
=
0
N
W
j
(
i
)
x
(
i
)
∑
i
=
0
N
W
j
(
i
)
ϕ
j
=
∑
i
=
0
N
W
j
(
i
)
N
∑
(
j
)
=
∑
i
=
0
N
W
j
(
i
)
(
x
(
i
)
−
μ
j
)
(
x
(
i
)
−
μ
j
)
T
∑
i
=
0
N
W
j
(
i
)
(3-1-2)
\mu_j=\frac{\sum_{i=0}^NW_j^{(i)}x^{(i)}}{\sum_{i=0}^NW_j^{(i)}}\\ \phi_j=\frac{\sum_{i=0}^NW_j^{(i)}}{N}\\ \sum_{}^{(j)}=\frac{\sum_{i=0}^NW_j^{(i)}(x^{(i)}-\mu_j)(x^{(i)}-\mu_j)^{T}}{\sum_{i=0}^NW_j^{(i)}} \tag{3-1-2}
μj=∑i=0NWj(i)∑i=0NWj(i)x(i)ϕj=N∑i=0NWj(i)∑(j)=∑i=0NWj(i)∑i=0NWj(i)(x(i)−μj)(x(i)−μj)T(3-1-2)
对于
ϕ
\phi
ϕ的目标函数为
∑
i
=
1
N
∑
j
=
1
k
W
j
(
i
)
l
o
g
ϕ
j
且
∑
j
=
1
k
ϕ
j
=
1
\sum_{i=1}^N\sum_{j=1}^kW_j^{(i)}log\phi_j且\sum_{j=1}^{k}\phi_j=1
∑i=1N∑j=1kWj(i)logϕj且∑j=1kϕj=1
令
L
=
∑
i
=
1
N
∑
j
=
1
k
W
j
(
i
)
l
o
g
ϕ
j
+
λ
(
∑
j
=
1
k
ϕ
j
−
1
)
∂
L
∂
ϕ
j
=
∑
i
=
1
N
W
j
(
i
)
1
ϕ
j
+
λ
=
0
⇒
∑
i
=
1
N
W
j
(
i
)
+
λ
ϕ
j
=
0
⇒
∑
j
=
1
k
∑
i
=
1
N
W
j
(
i
)
+
λ
∑
j
=
1
k
ϕ
j
=
0
λ
=
−
N
⇒
ϕ
j
=
∑
i
=
0
N
W
j
(
i
)
N
(3-1-3)
令L=\sum_{i=1}^N\sum_{j=1}^kW_j^{(i)}log\phi_j+\lambda(\sum_{j=1}^{k}\phi_j-1)\\ \frac{\partial{L}}{\partial\phi_j}=\sum_{i=1}^NW_j^{(i)}\frac{1}{\phi_j}+\lambda=0\\ \Rightarrow{\sum_{i=1}^{N}W_j^{(i)}+\lambda\phi_j}=0\Rightarrow{\sum_{j=1}^k\sum_{i=1}^{N}W_j^{(i)}+\lambda\sum_{j=1}^{k}\phi_j}=0\\ \lambda=-N\Rightarrow\phi_j=\frac{\sum_{i=0}^NW_j^{(i)}}{N} \tag{3-1-3}
令L=i=1∑Nj=1∑kWj(i)logϕj+λ(j=1∑kϕj−1)∂ϕj∂L=i=1∑NWj(i)ϕj1+λ=0⇒i=1∑NWj(i)+λϕj=0⇒j=1∑ki=1∑NWj(i)+λj=1∑kϕj=0λ=−N⇒ϕj=N∑i=0NWj(i)(3-1-3)
四 C语言实现
注:模型所使用的数据集第一列代表观测样本的身高,第二列代表观测样体重,但是性别未知,因此我们要计算出每一个样本所属的性别以及其对应的高斯分布的参数.
#include<stdio.h>
#include<stdlib.h>
#include<math.h>
#define N 114
#define p 2
#define PI 3.1415926
float Gausian_Pdf(float *mul, int mul_size, float **stdev, float *data);//计算高维高斯分布概率密度
float Matrix_Det(float **array, int n);//计算方阵的行列式
float **Matrix_Inv(float **array1, int n);//计算矩阵的逆矩阵
void Vector_Sub(float *src, float *dst, float value, int n);//向量相减并存储
int Matrix_Free(float **tmp, int m, int n);//释放申请的矩阵内存
float Vector_Sum(float *array, int n);//对向量求和
void Vector_Dot_Vector(float **matrix, float **array1, int n, float *array2, int m, float *w);
//用于更新高纬高斯分布的协方差矩阵
void swap(float *src, float *dst, int n);//交换两向量内容
void Test_Accuary(float **Data, int n, float **mul, float ***stdev);//测试测试集的准确率
float **Read_Data(char *filename);//读取数据
int main(void)
{
char *filename = "F:\\C PROGRAM\\BP_C\\BP\\data.txt";
float **X_Data = Read_Data(filename);
float **mul = (float **)malloc(p * sizeof(float *));
float *phi = (float *)malloc(p * sizeof(float));
float ***stdev = (float ***)malloc(p * sizeof(float**));
float *w1 = (float *)malloc(N * sizeof(float));
float *w2 = (float *)malloc(N * sizeof(float));
float **w = (float **)malloc(p * sizeof(float *));
w[0] = w1; w[1] = w2;
for (int i = 0; i < p; i++)
{
mul[i] = (float *)malloc(p * sizeof(float));
stdev[i] = (float **)malloc(p * sizeof(float *));
}
for (int i = 0; i < p; i++)
{
for (int j = 0; j < p; j++)
{
stdev[i][j] = (float *)malloc(p * sizeof(float));
}
}
float error = 0.001;
float temp = 0;
float sum = 0;
int max_epoch = 50;
int epoch=0;
int i, j;
//初始化参数
phi[0] = 0.4; phi[1] = 1 - phi[0];
mul[0][0] = 175;mul[0][1] = 75;mul[1][0] = 160;mul[1][1] = 60;
stdev[0][0][0] = 30; stdev[0][1][0] = 7; stdev[0][0][1] = 7; stdev[0][1][1] = 30;
stdev[1][0][0] = 30; stdev[1][0][1] = 5; stdev[1][1][0] = 5; stdev[1][1][1] = 30;
while (1)
{
if (epoch > max_epoch)
{
printf("迭代结束\n");
break;
}
//E step
for (i = 0; i < N; i++)
{
temp = 0;
for (j = 0; j < p; j++)
{
temp=temp+phi[0]*Gausian_Pdf(mul[j], p, stdev[j], X_Data[i]);
}
w1[i] = phi[0]*(Gausian_Pdf(mul[0], p, stdev[0], X_Data[i]))/temp;
w2[i] = 1 - w1[i];
}
//break;
//M step
float temp_1 = 0;
float temp_2 = 0;
//更新均值
for (int k1 = 0; k1 < p; k1++)
{
if (k1 == 0)
{
for (j = 0; j < p; j++)
{
for (i = 0; i < N; i++)
{
temp_1 = temp_1 + w1[i] * X_Data[i][j];
}
mul[k1][j] = temp_1 / Vector_Sum(w1, N);
temp_1 = 0;
}
}
else
{
for (j = 0; j < p; j++)
{
for (i = 0; i < N; i++)
{
temp_2 = temp_2 + w2[i] * X_Data[i][j];
}
mul[k1][j] = temp_2 / Vector_Sum(w2, N);
temp_2 = 0;
}
}
}
//更新协方差矩阵,更新phi
for (i = 0; i < p; i++)
{
Vector_Dot_Vector(stdev[i], X_Data, p, mul[i], p, w[i]);
phi[i] = Vector_Sum(w[i], N)/N;
}
epoch++;
}
Test_Accuary(X_Data, p, mul, stdev);
for (i = 0; i < p; i++)
{
printf("样本%d均值:\n", i + 1);
for (j = 0; j < p; j++)
{
printf("%f ", mul[i][j]);
}
printf("\n");
}
for (i = 0; i < p; i++)
{
printf("样本%d的方差为:\n",i+1);
for (j = 0; j < p; j++)
{
for (int k = 0; k < p; k++)
{
printf("%f ", stdev[i][j][k]);
}
printf("\n");
}
}
return 0;
}
float **Read_Data(char *filename)
{
FILE *fp=NULL;
int i = 0;
int j = 0;
float data1;
float **Data = (float **)malloc(N * sizeof(float *));
for (i = 0; i < N; i++)
{
*(Data + i) = (float *)malloc(p * sizeof(float));
}
if ((fp = fopen(filename, "r")) == NULL)
{
printf("error: cannot open file: Data.txt\n\n\n");
return NULL;
}
i = 0;
while ((fscanf(fp, "%f", &data1) != -1))
{
Data[i][j] = data1;
j++;
if (j % 2 == 0)
{
i++;
j = 0;
}
}
if (fclose(fp))
{
printf("error: cannot close file: Data.txt\n");
return NULL;
}
return Data;
}
float Gausian_Pdf(float *mul, int mul_size, float **stdev, float *data)
{
//返回高纬高斯分布概率密度
//这个函数疑似有问题
float result, det; float sum = 0;
float **inv = Matrix_Inv(stdev, p);
float x[p];
int i, j, k;
//printf("well done\n");
det = Matrix_Det(stdev, p);
if (det == 0)
{
printf("NULL\n");
}
//printf("det=%f\n");
for (i = 0; i < p; i++)
{
x[i] = data[i] - mul[i];
}
for (i = 0; i < p; i++)
{
for (j = 0; j < p; j++)
{
sum = sum + inv[i][j] * x[i] * x[j];
}
}
//printf("sum=%f\n", sum);
sum = exp(-0.5*sum)/((2*PI)*sqrt(det));
//printf("sum1=%f\n", sum);
Matrix_Free(inv, p, p);
return sum;
}
float Matrix_Det(float **array, int n)
{
int i, flag_det, flag_zero, j;
flag_det = 0;
flag_zero = 0;
float temp, sum;
sum = 1.0;
//复制一份array
float **array_temp = (float **)malloc(n * sizeof(float *));
for (i = 0; i < n; i++)
{
array_temp[i] = (float *)malloc(n * sizeof(float));
}
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
array_temp[i][j] = array[i][j];
}
}
for (i = 0; i < n - 1; i++)
{
for (j = i; j < n; j++)
{
if (array[j][i] != 0)
{
if (j == i)
{
break;
}
else
{
swap(array[i], array[j], n);
flag_det++;
break;
}
}
else
{
flag_zero++;
if (flag_zero == n - i)
{
return 0;
}
}
}
flag_zero = 0;
for (j = i + 1; j < n; j++)
{
if (array[j][i] == 0)
{
continue;
}
else
{
Vector_Sub(array[j], array[i], array[j][i] / array[i][i], n);
}
}
}
for (i = 0; i < n; i++)
{
sum = sum*array[i][i] * pow(-1, flag_det);
}
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
array[i][j] = array_temp[i][j];
}
}
Matrix_Free(array_temp, n,n);
return sum;
}
float **Matrix_Inv(float **array1, int n)
{
int i, flag_det, flag_zero, j;
flag_det = 0;
flag_zero = 0;
float **array = (float **)malloc(n * sizeof(float*));
float **result = (float **)malloc(n * sizeof(float*));
float **array1_temp = (float **)malloc(n * sizeof(float*));
if (array == NULL)
{
printf("error in Matrix_Inv:申请空间失败\n");
return NULL;
}
for (i = 0; i < n; i++)
{
array[i] = (float *)malloc(2 * n * sizeof(float));
array1_temp[i] = (float *)malloc(n * sizeof(float));
result[i] = (float *)malloc(n * sizeof(float));
if (array[i] == NULL)
{
printf("error in Matrix_Inv:申请子空间失败\n");
return NULL;
}
}
//copy
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
array1_temp[i][j] = array1[i][j];
}
}
for (i = 0; i < n; i++)
{
for (j = n; j < 2 * n; j++)
{
if (i + n == j)
{
array[i][j] = 1;
}
else
{
array[i][j] = 0;
}
}
for (j = 0; j < n; j++)
{
array[i][j] = array1[i][j];
}
}
float temp = Matrix_Det(array1, n);
if (temp == 0)
{
printf("error in Matrix_Inv: 矩阵不可逆\n");
return NULL;
}
else
{
for (i = 0; i < n - 1; i++)
{
for (j = i; j < n; j++)
{
if (array[j][i] != 0)
{
if (j == i)
{
break;
}
else
{
swap(array[i], array[j], n);
flag_det++;
break;
}
}
else
{
flag_zero++;
if (flag_zero == n - i)
{
return 0;
}
}
}
flag_zero = 0;
for (j = i + 1; j < n; j++)
{
if (array[j][i] == 0)
{
continue;
}
else
{
Vector_Sub(array[j], array[i], array[j][i] / array[i][i], 2 * n);
}
}
}
}
for (i = 0; i < n; i++)
{
temp = array[i][i];
for (j = i; j < 2 * n; j++)
{
array[i][j] = array[i][j] / temp;
}
}
for (i = 0; i < n - 1; i++)
{
for (j = i + 1; j < n; j++)
{
Vector_Sub(array[i], array[j], array[i][j], 2 * n);
}
}
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
result[i][j] = array[i][j + n];
}
}
Matrix_Free(array, n, n);
for (i = 0; i < n; i++)
{
for (j = 0; j < n; j++)
{
array1[i][j] = array1_temp[i][j];
}
}
Matrix_Free(array1_temp, n, n);
return result;
}
void Vector_Sub(float *src, float *dst, float value, int n)
{
int i;
for (i = 0; i < n; i++)
{
src[i] = src[i] - dst[i] * value;
}
}
int Matrix_Free(float **tmp, int m, int n)
{
int i, j;
if (tmp == NULL)
{
return(1);
}
for (i = 0; i < m; i++)
{
if (tmp[i] != NULL)
{
free(tmp[i]);
tmp[i] = NULL;
}
}
if (tmp != NULL)
{
free(tmp);
tmp = NULL;
}
return(0);
}
float Vector_Sum(float *array, int n)
{
float sum = 0;
for (int i = 0; i < n; i++)
{
sum = sum + array[i];
}
return sum;
}
void Vector_Dot_Vector(float **matrix,float **array1, int n, float *array2, int m,float *w)
{
if (n != m)
return NULL;
float array_temp[p][p];
for (int i = 0; i < p; i++)
{
for (int j = 0; j < p; j++)
{
array_temp[i][j] = 0;
}
}
for (int i = 0; i < n; i++)
{
for (int j = 0; j < m; j++)
{
for (int k = 0; k < N; k++)
{
array_temp[i][j] = array_temp[i][j] + w[k] *
(array1[k][i] - array2[i])*(array1[k][j] - array2[j]);
}
}
}
for (int i = 0; i < p; i++)
{
for (int j = 0; j < p; j++)
{
matrix[i][j] = array_temp[i][j];
matrix[i][j] = matrix[i][j] / Vector_Sum(w, N);
}
}
//printf("sym=%f\n", Vector_Sum(w, N));
}
void swap(float *src, float *dst, int n)
{
int i;
float temp;
for (i = 0; i < n; i++)
{
temp = src[i];
src[i] = dst[i];
dst[i] = temp;
}
}
void Test_Accuary(float **Data, int n, float **mul, float ***stdev)
{
float temp[N][p];
float class1 = 0; float class2 = 0;
float a;
temp[0][0] = Gausian_Pdf(mul[0], n, stdev[0], Data[0]);
temp[0][1] = Gausian_Pdf(mul[1], n, stdev[1], Data[0]);
if (temp[0][0] < temp[0][1])
{
//顺序反了,交换顺序
swap(mul[0], mul[1], p);
swap(stdev[0][0], stdev[1][0],p);
swap(stdev[0][1], stdev[1][1], p);
}
for (int i = 0; i < N; i++)
{
for (int k1 = 0; k1 < n; k1++)
{
temp[i][k1] = Gausian_Pdf(mul[k1], n, stdev[k1], Data[i]);
}
if (temp[i][0] > temp[i][1] && i<=N/2)
{
class1++;
}
else if (temp[i][0] < temp[i][1] && i > N / 2)
{
class2++;
}
}
printf("accury=%f\n", (class1 + class2) / N);
}
五 结果
附数据
前57个是女生的数据,后57个是男生的数据
156 50
160 60
162 54
162 55
160.5 56
160 53
158 55
164 60
165 50
166 55
158 47.5
161 49
169 55
161 46
160 45
167 44
155 49
154 57
172 52
155 56
157 55
165 65
156 52
155 50
156 56
160 55
158 55
162 70
162 65
155 57
163 70
160 60
162 55
165 65
159 60
147 47
163 53
157 54
160 55
162 48
158 60
155 48
165 60
161 58
159 45
163 50
158 49
155 50
162 55
157 63
159 49
152 47
156 51
165 49
154 47
156 52
162 48
162 60
164 62
168 86
187 75
167 75
174 64
175 62
170 65
176 73
169 58
178 54
165 66
183 68
171 61
179 64
172 60
173 59
172 58
175 62
160 60
160 58
160 60
175 75
163 60
181 77
172 80
175 73
175 60
167 65
172 60
169 75
172 65
175 72
172 60
170 65
158 59
167 63
164 61
176 65
182 95
173 75
176 67
163 58
166 67
162 59
169 56
163 59
163 56
176 62
169 57
173 61
163 59
167 57
176 63
168 61
167 60
170 69