Dataset
Data import
Code:
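# Import packages
import pandas as pd
import numpy as np
# Load the data
path = 'C:/Users/詹鹏程/Desktop/python/机器学习/bsy.csv'
data = pd.read_csv(path)
# Preview the first rows (only displayed in an interactive session)
data.head()
# Assign the features to x (every column except the label y)
x = data.drop(['y'], axis=1)
print(x)
# Assign the label column to y
y = data.y
print(y)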
Output:
    Gender  Age  Status  City  Cost  Device
0        0    0       0     1     1       0
1        1    1       1     2     1       0
2        0    1       2     1     0       0
3        1    2       1     0     0       1
4        1    1       0     1     0       0
5        1    2       1     2     1       0
6        0    1       0     0     0       0
7        1    1       0     0     1       0
8        0    1       1     0     0       0
9        0    1       0     0     0       0
10       0    0       0     2     1       0
0     1
1     1
2     1
3     1
4     1
5     0
6     0
7     1
8     0
9     0
10    0
Name: y, dtype: int64
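The file bsy.csv itself is not included here, so the following is a minimal sketch that rebuilds the same 11-row table from the printed output above, for readers who want to run the later steps. The values are copied from that output; the column order and the assumption that the file holds nothing beyond what was printed are guesses, not facts about the original CSV.

```python
import pandas as pd

# Values copied from the printed output above. The real bsy.csv may be
# laid out differently or contain more than was printed (assumption).
data = pd.DataFrame({
    "Gender": [0, 1, 0, 1, 1, 1, 0, 1, 0, 0, 0],
    "Age":    [0, 1, 1, 2, 1, 2, 1, 1, 1, 1, 0],
    "Status": [0, 1, 2, 1, 0, 1, 0, 0, 1, 0, 0],
    "City":   [1, 2, 1, 0, 1, 2, 0, 0, 0, 0, 2],
    "Cost":   [1, 1, 0, 0, 0, 1, 0, 1, 0, 0, 1],
    "Device": [0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0],
    "y":      [1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0],
})

# Same split into features and label as in the code above
x = data.drop(["y"], axis=1)
y = data.y
print(x.shape)
print(y.value_counts().to_dict())
```

With this stand-in for `pd.read_csv(path)`, the rest of the walkthrough can be run without access to the original file.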
Basic model and toolkit usage
Code:
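# Import the model class
from sklearn.naive_bayes import CategoricalNB
# Create a model instance
model = CategoricalNB()
# Train the model
model.fit(x, y)
# Predicted class probabilities for each row (columns: class 0, class 1)
y_pred_proa = model.predict_proba(x)
print(y_pred_proa)
# Output the predicted y
y_pred = model.predict(x)
print(y_pred)
# Compute the model accuracy (measured on the training data itself)
from sklearn.metrics import accuracy_score
Accuracy = accuracy_score(y, y_pred)
print(Accuracy)
# Test sample
x_test = np.array([[0,1,1,0,1,0]])
print(x_test)
# Predict the class of the test sample
y_test_pred = model.predict(x_test)
print(y_test_pred)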
Output:
[[0.35628409 0.64371591]
[0.38935375 0.61064625]
[0.22791221 0.77208779]
[0.32079208 0.67920792]
[0.12410623 0.87589377]
[0.44352058 0.55647942]
[0.75895994 0.24104006]
[0.36174172 0.63825828]
[0.75895994 0.24104006]
[0.75895994 0.24104006]
[0.76856577 0.23143423]]
[1 1 1 1 1 1 0 1 0 0 0]
0.9090909090909091
[[0 1 1 0 1 0]]
[0]
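To make the prediction for the test sample less of a black box, the sketch below recomputes the class probabilities by hand and compares them with `predict_proba`. It assumes scikit-learn's defaults for `CategoricalNB` (Laplace smoothing with `alpha=1.0`, class priors taken from the label frequencies, and the number of categories per feature inferred as the largest code plus one). The helper `manual_posterior` and the inline copy of the data are illustrative and not part of the original code.

```python
import numpy as np
import pandas as pd
from sklearn.naive_bayes import CategoricalNB

# Rows copied from the printed output above (last column is the label y)
rows = [
    [0, 0, 0, 1, 1, 0, 1], [1, 1, 1, 2, 1, 0, 1], [0, 1, 2, 1, 0, 0, 1],
    [1, 2, 1, 0, 0, 1, 1], [1, 1, 0, 1, 0, 0, 1], [1, 2, 1, 2, 1, 0, 0],
    [0, 1, 0, 0, 0, 0, 0], [1, 1, 0, 0, 1, 0, 1], [0, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0, 0], [0, 0, 0, 2, 1, 0, 0],
]
cols = ["Gender", "Age", "Status", "City", "Cost", "Device", "y"]
data = pd.DataFrame(rows, columns=cols)
x = data.drop(["y"], axis=1)
y = data.y


def manual_posterior(x, y, sample, alpha=1.0):
    """Hypothetical helper: naive Bayes posterior with Laplace smoothing."""
    classes = np.sort(y.unique())
    scores = []
    for c in classes:
        x_c = x[y == c]
        score = len(x_c) / len(x)  # class prior P(y = c)
        for col, value in zip(x.columns, sample):
            # Assumes categories are encoded as 0..max within each column
            n_categories = x[col].max() + 1
            count = (x_c[col] == value).sum()
            # Smoothed estimate of P(feature = value | y = c)
            score *= (count + alpha) / (len(x_c) + alpha * n_categories)
        scores.append(score)
    scores = np.array(scores)
    return scores / scores.sum()  # normalise so the two classes sum to 1


sample = [0, 1, 1, 0, 1, 0]  # the same test sample as above
manual = manual_posterior(x, y, sample)

model = CategoricalNB().fit(x, y)
# Passing a DataFrame with matching column names avoids a feature-name warning
sk = model.predict_proba(pd.DataFrame([sample], columns=x.columns))[0]

print(manual)
print(sk)
```

If the two printed vectors agree, the hand calculation matches the library for this sample, and the larger of the two entries corresponds to the predicted class `[0]` shown above. Note that the accuracy reported earlier is measured on the same 11 rows used for training, so it says little about how the model would behave on unseen data.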