The effect of different k values on KNN prediction accuracy
This post applies the KNN algorithm to the iris classification dataset, trying a range of k values to examine the resulting prediction accuracy and error rate.
from __future__ import print_function
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import learning_curve
from sklearn.model_selection import cross_val_score
import matplotlib.pyplot as plt
iris = load_iris()
X = iris.data
y = iris.target
# Set the range of candidate k values to 1-30
k_range = range(1, 31)
# Lists to store the accuracy and loss observed for each k
k_scores = []
k_loss = []
# Evaluate each value of n_neighbors with 10-fold cross-validation
for k in k_range:
    knn = KNeighborsClassifier(n_neighbors=k)
    # 'neg_mean_squared_error' returns negative MSE, so negate it (regression-style metric)
    loss = -cross_val_score(knn, X, y, cv=10, scoring='neg_mean_squared_error')
    # Classification metric: cross-validated accuracy
    scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
    k_scores.append(scores.mean())
    k_loss.append(loss.mean())
# Plot accuracy as a function of k
plt.figure(figsize=(10, 6))
plt.plot(k_range, k_scores)
plt.xticks(k_range)
plt.xlabel('Value of K for KNN')
plt.ylabel('Accuracy')
plt.grid()
plt.show()
# Plot loss as a function of k
plt.figure(figsize=(10, 6))
plt.plot(k_range, k_loss)
plt.xticks(k_range)
plt.xlabel('Value of K for KNN')
plt.ylabel('Loss')
plt.grid()
plt.show()
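Rather than reading the best k off the plot by eye, it can also be extracted programmatically. A minimal sketch, re-running the same 10-fold cross-validation loop as above and taking the k with the highest mean accuracy:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

k_range = range(1, 31)
# Mean 10-fold cross-validation accuracy for each candidate k
k_scores = [cross_val_score(KNeighborsClassifier(n_neighbors=k),
                            X, y, cv=10, scoring='accuracy').mean()
            for k in k_range]

# argmax returns the first index with the maximum score;
# several k values may tie at the same accuracy
best_k = k_range[int(np.argmax(k_scores))]
print('best k:', best_k, 'accuracy:', max(k_scores))
```

Note that `np.argmax` reports only the first maximizer, so when several k values tie (as in the plots above), this prints just one of them.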
Summary:
As the plots show, prediction accuracy peaks when k is 13, 18, or 20.
Note: this post only considers the effect of the choice of k on KNN's predictions. The choice of distance function (e.g. Euclidean or Manhattan distance) also affects the results.
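The distance-function effect mentioned above can be probed the same way. A minimal sketch using `KNeighborsClassifier`'s `p` parameter (p=2 is Euclidean distance, p=1 is Manhattan distance); k=13 here is taken from the summary above as one of the best-performing values:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X, y = iris.data, iris.target

results = {}
# p=1 -> Manhattan distance, p=2 -> Euclidean distance (Minkowski family)
for p, name in [(1, 'Manhattan'), (2, 'Euclidean')]:
    knn = KNeighborsClassifier(n_neighbors=13, p=p)
    results[name] = cross_val_score(knn, X, y, cv=10,
                                    scoring='accuracy').mean()
    print(f'{name} (p={p}): {results[name]:.4f}')
```

The same comparison could be extended to other metrics via the `metric` parameter; on a small, well-separated dataset like iris the difference between the two is typically small.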