Experiment 4: Bayesian Decision Classification Algorithm (class-labeled training tuples from the AllElectronics customer database)

Article link: https://blog.csdn.net/Da1JueYv/article/details/136323857

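I. Experiment Objectives:

(1) Become familiar with the VC++ programming tool and the naive Bayes decision algorithm.
(2) Query the AllElectronics customer database to obtain the prior probabilities and the class-conditional probabilities.
(3) Using VC++, write a program that classifies the sample set with the naive Bayes algorithm, run it on the task-relevant data, and debug the experiment.
(4) Write the experiment report.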

II. Experiment Principles:

1. Prior probability and class-conditional probability

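Prior probability: the prior probability is the ratio of the number of training samples (tuples) $N_i$ that belong to class $C_i$ to the total number of samples $N$, written $P(C_i) = N_i / N$.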
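The ratio of class count to total count can be computed directly from a list of class labels. The following is a minimal sketch, not the post's own code; the function name `priorProbabilities` and the `labels` input (one class label per training tuple) are assumptions made for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Estimate P(Ci) = Ni / N from class labels.
// `labels` is a hypothetical input: one class label per training tuple.
std::map<std::string, double> priorProbabilities(const std::vector<std::string>& labels)
{
    assert(!labels.empty());  // N must be positive

    std::map<std::string, int> counts;  // Ni for each class Ci
    for (const std::string& label : labels) {
        ++counts[label];
    }

    std::map<std::string, double> priors;
    for (const auto& entry : counts) {
        priors[entry.first] = static_cast<double>(entry.second) / labels.size();
    }
    return priors;
}
```

With the labels `{"yes", "no", "yes", "yes"}`, the map holds 0.75 for `yes` and 0.25 for `no`.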

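Class-conditional probability: the class-conditional probability is the ratio of the number of samples (tuples) $n_i$ in class $C_i$ that have feature $X$ to the number of samples (tuples) $N_i$ in class $C_i$, written $P(X \mid C_i) = n_i / N_i$.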
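The program in this post does not count whole-vector matches; it multiplies four per-attribute ratios (`px1=p3*p5*p7*p9`). That corresponds to the naive assumption that attributes are independent given the class. Written out, with $x_k$ the value of the $k$-th attribute of $X$ and $n_{i,k}$ (a symbol introduced here for illustration) the number of class-$C_i$ tuples whose $k$-th attribute equals $x_k$:

```latex
P(X \mid C_i) = \prod_{k=1}^{n} P(x_k \mid C_i),
\qquad
P(x_k \mid C_i) = \frac{n_{i,k}}{N_i}
```

In this experiment $n = 4$: the attributes are age, income, student, and credit.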

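2. Bayesian decision

The Bayesian decision (classification) rule assigns a sample (tuple) to class $C_i$ if and only if

$P(X \mid C_i)\,P(C_i) > P(X \mid C_j)\,P(C_j)$ for all $1 \le j \le m$, $j \ne i$,

where the samples (tuples) in the training set can be divided into $m$ classes.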
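For $m$ classes, the decision reduces to picking the class with the largest product of class-conditional probability and prior. A minimal sketch, assuming both quantities are already available as parallel vectors (the name `bayesDecision` and its parameters are hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Return the index i that maximizes P(X|Ci) * P(Ci).
// likelihoods[i] stands for P(X|Ci), priors[i] for P(Ci); both are hypothetical inputs.
std::size_t bayesDecision(const std::vector<double>& likelihoods,
                          const std::vector<double>& priors)
{
    assert(!priors.empty() && likelihoods.size() == priors.size());

    std::size_t best = 0;
    double bestScore = likelihoods[0] * priors[0];
    for (std::size_t i = 1; i < priors.size(); ++i) {
        double score = likelihoods[i] * priors[i];
        if (score > bestScore) {  // strict '>' keeps the earlier class on ties
            bestScore = score;
            best = i;
        }
    }
    return best;
}
```

Resolving ties in favor of the lower index is a choice made for this sketch; the decision rule itself does not say what to do when two scores are equal.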

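III. Experiment Content:

1. Experiment content

Use a Bayesian classifier to classify a known feature vector X:

1) From the class-labeled training sample set (tuples) of the AllElectronics customer database, write a program that computes the prior probability P(Ci) and the class-conditional probability P(X|Ci), and describe in the experiment report what the key code does and how it is implemented;

2) Write a program that applies Bayesian classification to the feature vector X, and describe in the experiment report what the key program fragments do and how they are implemented;

3) Estimate the classification error rate with test samples;

4) Draw the flowchart of the program or routine in the experiment report.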
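Item 3) asks for an error rate estimated on test samples, which the program listed in this post does not compute. A minimal sketch, assuming the predicted and true labels have already been collected in two parallel vectors (the function name `errorRate` is hypothetical):

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Fraction of test tuples whose predicted label differs from the true label.
double errorRate(const std::vector<std::string>& predicted,
                 const std::vector<std::string>& actual)
{
    assert(!actual.empty() && predicted.size() == actual.size());

    std::size_t wrong = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (predicted[i] != actual[i]) {
            ++wrong;
        }
    }
    return static_cast<double>(wrong) / actual.size();
}
```

The test tuples should be kept out of the training file; otherwise the estimate tends to be optimistic.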

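2. Experiment steps

The task is to decide whether a customer is inclined to buy a computer. C1 corresponds to buys_computer=yes and C2 to buys_computer=no, so this is a two-class classification problem.

The steps are as follows:

1) Determine the feature attributes and their partition: browse the given database and identify the attributes used for classification;

2) Obtain the training samples, that is, the given class-labeled training sample set (tuples) of the AllElectronics customer database;

3) Compute the prior probability of each class in the training samples: P(Ci), i = 1, 2;

4) Compute the class-conditional probabilities in the training samples: let the feature (attribute) vector be X, and write a program that computes P(X|Ci), i = 1, 2;

5) Classify with the classifier.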
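Steps 3) to 5) can be sketched end to end as follows. This is an illustration under stated assumptions, not the post's program: the training set is the 14-tuple AllElectronics table as commonly printed in data mining textbooks (the post's own `date.txt` is not shown and may use different attribute labels), and the names `Tuple`, `textbookTrainingSet`, `classScore`, and `classify` are hypothetical.

```cpp
#include <string>
#include <vector>

// One training tuple; field names mirror the columns used in this experiment.
struct Tuple {
    std::string age, income, student, credit, buy;
};

// ASSUMPTION: the 14-tuple table from the textbook example, not the post's date.txt.
std::vector<Tuple> textbookTrainingSet()
{
    return {
        {"youth", "high", "no", "fair", "no"},
        {"youth", "high", "no", "excellent", "no"},
        {"middle_aged", "high", "no", "fair", "yes"},
        {"senior", "medium", "no", "fair", "yes"},
        {"senior", "low", "yes", "fair", "yes"},
        {"senior", "low", "yes", "excellent", "no"},
        {"middle_aged", "low", "yes", "excellent", "yes"},
        {"youth", "medium", "no", "fair", "no"},
        {"youth", "low", "yes", "fair", "yes"},
        {"senior", "medium", "yes", "fair", "yes"},
        {"youth", "medium", "yes", "excellent", "yes"},
        {"middle_aged", "medium", "no", "excellent", "yes"},
        {"middle_aged", "high", "yes", "fair", "yes"},
        {"senior", "medium", "no", "excellent", "no"},
    };
}

// Steps 3 and 4: return P(X|C) * P(C) for class label `cls`.
// Only the attribute fields of `x` are used; its `buy` field is ignored.
double classScore(const std::vector<Tuple>& data, const Tuple& x, const std::string& cls)
{
    int classCount = 0, age = 0, income = 0, student = 0, credit = 0;
    for (const Tuple& t : data) {
        if (t.buy != cls) {
            continue;
        }
        ++classCount;
        age += (t.age == x.age);
        income += (t.income == x.income);
        student += (t.student == x.student);
        credit += (t.credit == x.credit);
    }
    if (classCount == 0) {
        return 0.0;  // class absent from the training set
    }
    double n = classCount;
    double prior = n / data.size();
    return prior * (age / n) * (income / n) * (student / n) * (credit / n);
}

// Step 5: pick the class with the larger score; ties go to "no".
std::string classify(const std::vector<Tuple>& data, const Tuple& x)
{
    return classScore(data, x, "yes") > classScore(data, x, "no") ? "yes" : "no";
}
```

For X = (youth, medium, yes, fair), this table gives P(X|yes)P(yes) ≈ 0.028 and P(X|no)P(no) ≈ 0.007, so the sketch predicts `yes`.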

3. Program flowchart

Key code

#include <cstdlib>   // exit, system
#include <fstream>
#include <iostream>
#include <string>
using namespace std;

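// Storage structure for one tuple of the customer table
class Date
{
public:
    string age;
    string income;
    string student;
    string credit;
    string buy;

    // Print the tuple on one line, fields separated by spaces
    void print()
    {
        cout << age << " " << income << " " << student << " " << credit << " " << buy << endl;
    }
};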

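// If a tuple's attribute value `date` equals the queried value `indate`,
// increment the counter for that tuple's class: tempy for "yes", tempn for "no".
void compare_date_plus(const string& date, const string& indate, const string& buy, int& tempy, int& tempn)
{
    if (date == indate && buy == "yes")
    {
        tempy++;
    }
    if (date == indate && buy == "no")
    {
        tempn++;
    }
}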

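int main()
{
    char name1[50] = "date.txt"; // training data file to read and store
    ifstream infile;
    cout << "File to open: date.txt" << endl;
    infile.open(name1, ios::in);
    if (infile.fail())
    {
        cout << "error open!" << endl;
        exit(1);
    }

    Date date[100]; // holds at most 100 tuples
    int datesize = 0;
    string iage, iincome, istudent, icredit, ibuy; // attribute values of the tuple to classify
    int y = 0, n = 0, agey = 0, agen = 0;
    int incomey = 0, incomen = 0, studenty = 0, studentn = 0, credity = 0, creditn = 0; // per-class counts of tuples matching each queried attribute value
    float p1, p2, p3, p4, p5, p6, p7, p8, p9, p10, px1, px2, px3, px4;

    cout << "the data is as follows:" << endl;
    cout << "age " << "income " << "student " << "credit " << "buy " << endl;
    // Test the extraction itself: looping on !infile.eof() would store one extra,
    // empty tuple after the last line. The bound keeps date[] from overflowing.
    while (datesize < 100 &&
           infile >> date[datesize].age >> date[datesize].income >> date[datesize].student
                  >> date[datesize].credit >> date[datesize].buy)
    {
        date[datesize].print(); // echo the tuple just read
        datesize++;
    }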

    // Prior probabilities P(buys_computer = yes) and P(buys_computer = no)
    for (int j = 0; j < datesize; j++)
    {
        if (date[j].buy == "yes")
            y++;
        else
            n++;
    }
    p1 = (float)y / (float)datesize;
    p2 = (float)n / (float)datesize;
    cout << "P(buys_computer = yes) = " << y << "/" << datesize << "=" << p1 << endl;
    cout << "P(buys_computer = no) = " << n << "/" << datesize << "=" << p2 << endl;

    cout << "Input data:" << endl;
    cout << "age" << '\t' << "income" << '\t' << "student" << '\t' << "credit" << endl;
    cin >> iage >> iincome >> istudent >> icredit;

    // Class-conditional counts: for each attribute, count the tuples of each class
    // whose value matches the input
    for (int k = 0; k < datesize; k++)
    {
        compare_date_plus(date[k].age, iage, date[k].buy, agey, agen);
        compare_date_plus(date[k].income, iincome, date[k].buy, incomey, incomen);
        compare_date_plus(date[k].student, istudent, date[k].buy, studenty, studentn);
        compare_date_plus(date[k].credit, icredit, date[k].buy, credity, creditn);
    }

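    // The ratios below divide by y and n, so both classes must occur in the data
    if (y == 0 || n == 0)
    {
        cout << "training data must contain both classes" << endl;
        return 1;
    }
    // Per-attribute class-conditional probabilities, e.g. p3 = P(age = iage | buy = yes)
    p3 = (float)agey / (float)y;
    p4 = (float)agen / (float)n;
    p5 = (float)incomey / (float)y;
    p6 = (float)incomen / (float)n;
    p7 = (float)studenty / (float)y;
    p8 = (float)studentn / (float)n;
    p9 = (float)credity / (float)y;
    p10 = (float)creditn / (float)n;
    px1 = p3 * p5 * p7 * p9;   // P(X | buy = yes), attributes treated as independent
    px2 = p4 * p6 * p8 * p10;  // P(X | buy = no)
    px3 = px1 * p1;            // P(X | buy = yes) P(buy = yes)
    px4 = px2 * p2;            // P(X | buy = no) P(buy = no)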

    // Report every intermediate probability
    cout << "P(age = " << iage << "|buy = yes) = " << agey << "/" << y << "=" << p3 << endl;
    cout << "P(age = " << iage << "|buy = no) = " << agen << "/" << n << "=" << p4 << endl;
    cout << "P(income = " << iincome << "|buy = yes) = " << incomey << "/" << y << "=" << p5 << endl;
    cout << "P(income = " << iincome << "|buy = no) = " << incomen << "/" << n << "=" << p6 << endl;
    cout << "P(student = " << istudent << "|buy = yes) = " << studenty << "/" << y << "=" << p7 << endl;
    cout << "P(student = " << istudent << "|buy = no) = " << studentn << "/" << n << "=" << p8 << endl;
    cout << "P(credit = " << icredit << "|buy = yes) = " << credity << "/" << y << "=" << p9 << endl;
    cout << "P(credit = " << icredit << "|buy = no) = " << creditn << "/" << n << "=" << p10 << endl;
    cout << "P(X|buy = yes) = " << px1 << endl;
    cout << "P(X|buy = no) = " << px2 << endl;
    cout << "P(X|buy = yes)P(buy = yes) = " << px3 << endl;
    cout << "P(X|buy = no)P(buy = no) = " << px4 << endl;

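    if (px3 > px4) // final decision: the class with the larger P(X|Ci)P(Ci) wins
        cout << "Naive Bayes prediction: buy = yes" << endl;
    else
        cout << "Naive Bayes prediction: buy = no" << endl;

    infile.close(); // close the file
    system("PAUSE"); // Windows only: keeps the console window open
    return 0;
}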

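IV. Experiment Results:

1. Experiment data

2. Processing results

(The input data is analyzed against the data file.)
Given the input data, the program uses the Bayesian decision classification algorithm to decide whether the customer is inclined to buy a computer.

3. Conclusion
Bayesian decision classification applies when the required probabilities are known or can be computed from the data. For a given dataset it obtains the result by comparing the joint probabilities P(X|Ci)P(Ci) across classes. It is fast, the algorithm is simple, and its classification results are fairly stable.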