Naive Bayes
0. Load the required modules
from sklearn.datasets import load_iris
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
# Enable inline plotting in Jupyter
%matplotlib inline
1. Load the data
iris = load_iris()
1.1 Data preview
print('Feature names:', iris.feature_names)
Feature names: ['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)']
print('Classes:', iris.target_names)
Classes: ['setosa' 'versicolor' 'virginica']
1.2 Data preparation
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/4, random_state=0)
print('Total samples: {}, training samples: {}, test samples: {}'.format(len(X), len(X_train), len(X_test)))
Total samples: 150, training samples: 112, test samples: 38
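The 112/38 split follows from how a fractional test_size is turned into a sample count. The sketch below assumes the test share is rounded up and the training set takes the remainder, which matches the counts printed above. The helper name split_sizes is made up for illustration and is not part of scikit-learn.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumption: the test share is rounded up and the training set
    gets whatever is left. split_sizes is a hypothetical helper.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test


# 150 * 1/4 = 37.5, which rounds up to 38 test samples.
print(split_sizes(150, 1 / 4))  # (112, 38)
```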
2. Build the models
knn_model = KNeighborsClassifier(n_neighbors=5)
dt_model = DecisionTreeClassifier(max_depth=5)
gnb_model = GaussianNB()
models = {
    'kNN': knn_model,
    'DT': dt_model,
    'GNB': gnb_model,
}
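To make the idea behind GaussianNB concrete, here is a minimal pure-Python sketch of Gaussian naive Bayes. It estimates a prior, a mean and a variance per class and feature, then picks the class with the highest log posterior. It is an illustration under simplifying assumptions, not scikit-learn's implementation: the function names, the eps smoothing constant, and the toy data are all hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate per-class priors, feature means, and feature variances.

    eps is a hypothetical constant added to each variance so that a
    feature with zero spread does not cause a division by zero.
    """
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)

    params = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [
            sum((v - m) ** 2 for v in col) / n + eps
            for col, m in zip(columns, means)
        ]
        params[label] = (n / len(X), means, variances)
    return params


def predict_gaussian_nb(params, x):
    """Return the class with the highest log posterior for one sample x."""
    def log_posterior(prior, means, variances):
        # Naive assumption: features are independent given the class,
        # so the per-feature Gaussian log densities simply add up.
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (xi - m) ** 2 / (2 * var)
            for xi, m, var in zip(x, means, variances)
        )
        return math.log(prior) + log_likelihood

    return max(params, key=lambda label: log_posterior(*params[label]))


# Hypothetical toy data: two features, two well-separated classes.
X_toy = [[1.0, 2.1], [1.2, 1.9], [0.8, 2.0], [3.0, 4.1], [3.2, 3.9], [2.8, 4.0]]
y_toy = [0, 0, 0, 1, 1, 1]

toy_params = fit_gaussian_nb(X_toy, y_toy)
print(predict_gaussian_nb(toy_params, [1.1, 2.0]))  # sample near class 0
print(predict_gaussian_nb(toy_params, [3.1, 4.0]))  # sample near class 1
```

Working in log space avoids numerical underflow when many small densities are multiplied together.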
3. Train and test the models
for model_name, model in models.items():
    model.fit(X_train, y_train)
    acc = model.score(X_test, y_test)
    print('{} test accuracy: {:.3f}'.format(model_name, acc))
GNB test accuracy: 1.000
DT test accuracy: 0.974
kNN test accuracy: 0.974
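For a classifier, score reports mean accuracy, that is, the fraction of test samples predicted correctly. With only 38 test samples, 0.974 corresponds to 37 correct predictions and 1.000 to 38, so the models above differ by a single sample. The sketch below reproduces that arithmetic; the accuracy helper and the label lists are hypothetical and are not the actual predictions from this notebook.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


# Hypothetical labels: 38 test samples with exactly one mistake.
y_true_demo = [0] * 13 + [1] * 16 + [2] * 9
y_pred_demo = list(y_true_demo)
y_pred_demo[20] = 2  # flip one class-1 prediction to class 2

print('{:.3f}'.format(accuracy(y_true_demo, y_pred_demo)))  # 0.974
```

Because one sample moves the score by about 0.026 here, a gap this small is weak evidence that one model is better than another.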