I. Preparation
1. Import packages
In [8]:
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.mlab as mlab
import warnings
import numpy as np
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn import metrics
from sklearn import tree
plt.rcParams['font.sans-serif'] = ['SimHei'] # allow Chinese characters in plots
plt.rcParams['axes.unicode_minus'] = False # render minus signs correctly
warnings.filterwarnings("ignore") # suppress warnings
2. Read the data
In [9]:
cancer = pd.read_excel('C:\\Users\\91333\\Documents\\semester6\\data science\\week3\\Week3_CancerDataset.xlsx')
3. A first look at the data
In [10]:
cancer.head(5)
Out[10]:
feature1 | feature2 | feature3 | feature4 | feature5 | feature6 | feature7 | feature8 | feature9 | feature10 | ... | feature22 | feature23 | feature24 | feature25 | feature26 | feature27 | feature28 | feature29 | feature30 | Label | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 17.99 | 10.38 | 122.80 | 1001.0 | 0.11840 | 0.27760 | 0.3001 | 0.14710 | 0.2419 | 0.07871 | ... | 17.33 | 184.60 | 2019.0 | 0.1622 | 0.6656 | 0.7119 | 0.2654 | 0.4601 | 0.11890 | 0 |
1 | 20.57 | 17.77 | 132.90 | 1326.0 | 0.08474 | 0.07864 | 0.0869 | 0.07017 | 0.1812 | 0.05667 | ... | 23.41 | 158.80 | 1956.0 | 0.1238 | 0.1866 | 0.2416 | 0.1860 | 0.2750 | 0.08902 | 0 |
2 | 19.69 | 21.25 | 130.00 | 1203.0 | 0.10960 | 0.15990 | 0.1974 | 0.12790 | 0.2069 | 0.05999 | ... | 25.53 | 152.50 | 1709.0 | 0.1444 | 0.4245 | 0.4504 | 0.2430 | 0.3613 | 0.08758 | 0 |
3 | 11.42 | 20.38 | 77.58 | 386.1 | 0.14250 | 0.28390 | 0.2414 | 0.10520 | 0.2597 | 0.09744 | ... | 26.50 | 98.87 | 567.7 | 0.2098 | 0.8663 | 0.6869 | 0.2575 | 0.6638 | 0.17300 | 0 |
4 | 20.29 | 14.34 | 135.10 | 1297.0 | 0.10030 | 0.13280 | 0.1980 | 0.10430 | 0.1809 | 0.05883 | ... | 16.67 | 152.20 | 1575.0 | 0.1374 | 0.2050 | 0.4000 | 0.1625 | 0.2364 | 0.07678 | 0 |
5 rows × 31 columns
In [11]:
cancer.shape
Out[11]:
(569, 31)
569 observations; 30 features, all continuous; 1 label, a 0/1 variable.
In [12]:
cancer.isnull().sum().sum() # no missing values
Out[12]:
0
II. Naive Bayes
1. Plotting the feature distributions
In [13]:
cancer_scaled = cancer.apply(lambda x: (x - np.min(x)) / (np.max(x) - np.min(x))) # min-max scale each column to [0, 1]
plt.figure()
for i in range(cancer.shape[1]-1): # skip the label column
    sns.kdeplot(cancer_scaled.iloc[:,i], alpha=.7, label="")
plt.title("kdeplot of 30 features")
plt.show()
Each feature's distribution looks roughly normal, so Gaussian naive Bayes is a natural choice.
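As a rough numerical complement to the plot, per-feature skewness can quantify how symmetric each scaled distribution is (values near 0 suggest normal-like shapes). A sketch, assuming sklearn's built-in `load_breast_cancer` copy of the Wisconsin dataset matches the Excel file used above:

```python
import numpy as np
from scipy.stats import skew
from sklearn.datasets import load_breast_cancer

# assumption: load_breast_cancer ships the same 569 x 30 data as the Excel file
X, y = load_breast_cancer(return_X_y=True)

# min-max scale each column to [0, 1], as in the plot above
X_scaled = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

# skewness near 0 indicates a roughly symmetric, normal-like distribution
skews = skew(X_scaled, axis=0)
print("features with |skew| < 1:", int((np.abs(skews) < 1).sum()), "of", X.shape[1])
```

Scaling does not change skewness; it only puts all 30 curves on a common axis for the KDE plot.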
2. Train/test split
In [14]:
x_train, x_test, y_train, y_test = train_test_split(cancer.iloc[:,0:-1], cancer.iloc[:,-1], test_size=0.3, random_state=1)
3. Training the model on the training set
In [15]:
gnb = GaussianNB()
gnb.fit(x_train, y_train)
Out[15]:
GaussianNB(priors=None, var_smoothing=1e-09)
4. Making predictions on the test set
In [16]:
y_pred = gnb.predict(x_test)
print("Gaussian Naive Bayes model accuracy(in %):", metrics.accuracy_score(y_test, y_pred)*100)
Gaussian Naive Bayes model accuracy(in %): 94.73684210526315
III. Decision Tree
The criterion parameter of tree.DecisionTreeClassifier has two options: 'entropy' (information gain, the measure used by ID3) and 'gini' (the CART impurity measure); we try both. Note that sklearn always grows binary CART-style trees, so the criterion only changes the split-quality measure. max_depth takes an integer; with only 30 features the data is small, so we leave max_depth at its default of None.
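The two criteria measure node impurity differently. A small standalone sketch of both measures (helper functions for illustration, not sklearn API):

```python
import numpy as np

def entropy(p):
    """Shannon entropy of a class-probability vector, in bits."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]  # by convention, 0 * log(0) contributes nothing
    return float(-(p * np.log2(p)).sum())

def gini(p):
    """Gini impurity: probability that two independent draws disagree."""
    p = np.asarray(p, dtype=float)
    return float(1.0 - (p ** 2).sum())

# a 50/50 node is maximally impure for two classes
print(entropy([0.5, 0.5]), gini([0.5, 0.5]))  # → 1.0 0.5
```

Both functions hit 0 on a pure node and their maximum on a uniform class mix, so in practice the two criteria usually pick similar splits.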
1. ID3
In [17]:
clf = tree.DecisionTreeClassifier(criterion='entropy')
clf.fit(x_train, y_train)
y_pred_clf = clf.predict(x_test)
print("ID3 model accuracy(in %):", metrics.accuracy_score(y_test, y_pred_clf)*100)
ID3 model accuracy(in %): 90.05847953216374
2. CART
In [18]:
CART = tree.DecisionTreeClassifier(criterion='gini')
CART.fit(x_train, y_train)
y_pred_CART = CART.predict(x_test)
print("CART model accuracy(in %):", metrics.accuracy_score(y_test, y_pred_CART)*100)
CART model accuracy(in %): 93.56725146198829
Of the three models, CART and Gaussian naive Bayes achieve the highest accuracy; below we look at CART's results in more detail.
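Accuracy from a single 70/30 split depends on the random seed, so a more robust comparison would cross-validate all three models. A hedged sketch, again assuming sklearn's built-in copy of the dataset:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

models = {
    "GaussianNB": GaussianNB(),
    "entropy tree": DecisionTreeClassifier(criterion="entropy", random_state=1),
    "gini tree (CART)": DecisionTreeClassifier(criterion="gini", random_state=1),
}
results = {}
for name, model in models.items():
    scores = cross_val_score(model, X, y, cv=5)  # 5-fold CV accuracy
    results[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} (+/- {scores.std():.3f})")
```

Averaging over 5 folds smooths out split-to-split variance and gives a fairer ranking than one held-out set.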
In [19]:
print(metrics.classification_report(y_test, y_pred_CART, target_names=['died','survived'])) # classification_report expects (y_true, y_pred)
              precision    recall  f1-score   support

        died       0.95      0.87      0.91        63
    survived       0.93      0.97      0.95       108

    accuracy                           0.94       171
   macro avg       0.94      0.92      0.93       171
weighted avg       0.94      0.94      0.94       171
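Every number in the report follows directly from the confusion matrix. A toy sketch of how precision and recall are derived:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# toy labels: 1 = positive class
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
precision = tp / (tp + fp)  # of the predicted positives, how many are right
recall = tp / (tp + fn)     # of the actual positives, how many were found

assert precision == precision_score(y_true, y_pred)
assert recall == recall_score(y_true, y_pred)
print(precision, recall)  # → 0.75 0.75
```

The f1-score is the harmonic mean of the two, and `support` counts how many true examples of each class appear in the test set.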