Python Machine Learning: KNN Algorithm 04f, Classification Accuracy


范德彪陕西分彪


Category: Python Machine Learning

Article link: https://blog.csdn.net/weixin_46815330/article/details/110404559

Import the required packages

import numpy as np
import matplotlib.pyplot as plt
import matplotlib
from sklearn import datasets

This time we use the handwritten digits dataset, digits.
First, load the dataset.

digits = datasets.load_digits()

digits.keys()

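Here data holds the samples with shape (1797, 64), target holds the labels with shape (1797,), and images holds the same samples as 8x8 arrays with shape (1797, 8, 8).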

dict_keys(['data', 'target', 'frame', 'feature_names', 'target_names', 'images', 'DESCR'])

print(digits.DESCR)

.. _digits_dataset:

Optical recognition of handwritten digits dataset
--------------------------------------------------

**Data Set Characteristics:**

    :Number of Instances: 5620
    :Number of Attributes: 64
    :Attribute Information: 8x8 image of integer pixels in the range 0..16.
    :Missing Attribute Values: None
    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)
    :Date: July; 1998

This is a copy of the test set of the UCI ML hand-written digits datasets
https://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits

The data set contains images of hand-written digits: 10 classes where
each class refers to a digit.

Preprocessing programs made available by NIST were used to extract
normalized bitmaps of handwritten digits from a preprinted form. From a
total of 43 people, 30 contributed to the training set and different 13
to the test set. 32x32 bitmaps are divided into nonoverlapping blocks of
4x4 and the number of on pixels are counted in each block. This generates
an input matrix of 8x8 where each element is an integer in the range
0..16. This reduces dimensionality and gives invariance to small
distortions.

For info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.
T. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.
L. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,
1994.

.. topic:: References

  - C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their
    Applications to Handwritten Digit Recognition, MSc Thesis, Institute of
    Graduate Studies in Science and Engineering, Bogazici University.
  - E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.
  - Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.
    Linear dimensionalityreduction using relevance weighted LDA. School of
    Electrical and Electronic Engineering Nanyang Technological University.
    2005.
  - Claudio Gentile. A New Approximate Maximal Margin Classification
    Algorithm. NIPS. 2000.
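
The preprocessing described in DESCR (split a 32x32 bitmap into nonoverlapping 4x4 blocks and count the on pixels in each block) can be sketched in a few lines. This is only an illustration under stated assumptions: the function name block_counts and the synthetic bitmap are made up here, and this is not the NIST routine itself.

```python
def block_counts(bitmap, block=4):
    """Count the on pixels (value 1) in each nonoverlapping block x block tile."""
    size = len(bitmap)
    assert size % block == 0, "bitmap side must be a multiple of block"
    n = size // block
    counts = [[0] * n for _ in range(n)]
    for r in range(size):
        for c in range(size):
            counts[r // block][c // block] += bitmap[r][c]
    return counts


# Synthetic 32x32 bitmap (an assumption for illustration): a filled square
# covering rows 4..11 and columns 6..13.
bitmap = [[1 if 4 <= r < 12 and 6 <= c < 14 else 0 for c in range(32)]
          for r in range(32)]

features = block_counts(bitmap)
print(len(features), len(features[0]))  # 8 8
print(features[1])                      # [0, 8, 16, 8, 0, 0, 0, 0]
```

Each of the 64 resulting values lies between 0 and 16, which is why the dataset's pixel range is 0..16.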

Take a look at the data X

X = digits.data
print(X.shape)
print(X[:10])

(1797, 64)
[[ 0.  0.  5. 13.  9.  1.  0.  0.  0.  0. 13. 15. 10. 15.  5.  0.  0.  3.
  15.  2.  0. 11.  8.  0.  0.  4. 12.  0.  0.  8.  8.  0.  0.  5.  8.  0.
   0.  9.  8.  0.  0.  4. 11.  0.  1. 12.  7.  0.  0.  2. 14.  5. 10. 12.
   0.  0.  0.  0.  6. 13. 10.  0.  0.  0.]
 [ 0.  0.  0. 12. 13.  5.  0.  0.  0.  0.  0. 11. 16.  9.  0.  0.  0.  0.
   3. 15. 16.  6.  0.  0.  0.  7. 15. 16. 16.  2.  0.  0.  0.  0.  1. 16.
  16.  3.  0.  0.  0.  0.  1. 16. 16.  6.  0.  0.  0.  0.  1. 16. 16.  6.
   0.  0.  0.  0.  0. 11. 16. 10.  0.  0.]
 [ 0.  0.  0.  4. 15. 12.  0.  0.  0.  0.  3. 16. 15. 14.  0.  0.  0.  0.
   8. 13.  8. 16.  0.  0.  0.  0.  1.  6. 15. 11.  0.  0.  0.  1.  8. 13.
  15.  1.  0.  0.  0.  9. 16. 16.  5.  0.  0.  0.  0.  3. 13. 16. 16. 11.
   5.  0.  0.  0.  0.  3. 11. 16.  9.  0.]
 [ 0.  0.  7. 15. 13.  1.  0.  0.  0.  8. 13.  6. 15.  4.  0.  0.  0.  2.
   1. 13. 13.  0.  0.  0.  0.  0.  2. 15. 11.  1.  0.  0.  0.  0.  0.  1.
  12. 12.  1.  0.  0.  0.  0.  0.  1. 10.  8.  0.  0.  0.  8.  4.  5. 14.
   9.  0.  0.  0.  7. 13. 13.  9.  0.  0.]
 [ 0.  0.  0.  1. 11.  0.  0.  0.  0.  0.  0.  7.  8.  0.  0.  0.  0.  0.
   1. 13.  6.  2.  2.  0.  0.  0.  7. 15.  0.  9.  8.  0.  0.  5. 16. 10.
   0. 16.  6.  0.  0.  4. 15. 16. 13. 16.  1.  0.  0.  0.  0.  3. 15. 10.
   0.  0.  0.  0.  0.  2. 16.  4.  0.  0.]
 [ 0.  0. 12. 10.  0.  0.  0.  0.  0.  0. 14. 16. 16. 14.  0.  0.  0.  0.
  13. 16. 15. 10.  1.  0.  0.  0. 11. 16. 16.  7.  0.  0.  0.  0.  0.  4.
   7. 16.  7.  0.  0.  0.  0.  0.  4. 16.  9.  0.  0.  0.  5.  4. 12. 16.
   4.  0.  0.  0.  9. 16. 16. 10.  0.  0.]
 [ 0.  0.  0. 12. 13.  0.  0.  0.  0.  0.  5. 16.  8.  0.  0.  0.  0.  0.
  13. 16.  3.  0.  0.  0.  0.  0. 14. 13.  0.  0.  0.  0.  0.  0. 15. 12.
   7.  2.  0.  0.  0.  0. 13. 16. 13. 16.  3.  0.  0.  0.  7. 16. 11. 15.
   8.  0.  0.  0.  1.  9. 15. 11.  3.  0.]
 [ 0.  0.  7.  8. 13. 16. 15.  1.  0.  0.  7.  7.  4. 11. 12.  0.  0.  0.
   0.  0.  8. 13.  1.  0.  0.  4.  8.  8. 15. 15.  6.  0.  0.  2. 11. 15.
  15.  4.  0.  0.  0.  0.  0. 16.  5.  0.  0.  0.  0.  0.  9. 15.  1.  0.
   0.  0.  0.  0. 13.  5.  0.  0.  0.  0.]
 [ 0.  0.  9. 14.  8.  1.  0.  0.  0.  0. 12. 14. 14. 12.  0.  0.  0.  0.
   9. 10.  0. 15.  4.  0.  0.  0.  3. 16. 12. 14.  2.  0.  0.  0.  4. 16.
  16.  2.  0.  0.  0.  3. 16.  8. 10. 13.  2.  0.  0.  1. 15.  1.  3. 16.
   8.  0.  0.  0. 11. 16. 15. 11.  1.  0.]
 [ 0.  0. 11. 12.  0.  0.  0.  0.  0.  2. 16. 16. 16. 13.  0.  0.  0.  3.
  16. 12. 10. 14.  0.  0.  0.  1. 16.  1. 12. 15.  0.  0.  0.  0. 13. 16.
   9. 15.  2.  0.  0.  0.  0.  3.  0.  9. 11.  0.  0.  0.  0.  0.  9. 15.
   4.  0.  0.  0.  9. 12. 13.  3.  0.  0.]]

Take a look at the labels y

y = digits.target
print(y.shape)
print(y[:100])

(1797,)
[0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 9 5 5 6 5 0
 9 8 9 8 4 1 7 7 3 5 1 0 0 2 2 7 8 2 0 1 2 6 3 3 7 3 3 4 6 6 6 4 9 1 5 0 9
 5 2 8 2 0 0 1 7 6 3 2 1 7 4 6 3 1 3 9 1 7 6 8 4 3 1]

Visualize an image

print(digits.images.shape)  # images are already 8x8, so no reshape is needed
plt.imshow(digits.images[100], cmap="gist_gray")
plt.show()


some_digit = X[666]
print(y[666])

# Visualize some_digit: reshape the 64-element vector back into an 8x8 image
some_digit_img = some_digit.reshape(8, 8)
plt.imshow(some_digit_img, cmap="hot")  # other cmap options: binary, gray, gist_yarg
plt.show()
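
By default, reshape(8, 8) regroups the 64 values in row-major order: the first 8 values become the top row, the next 8 the second row, and so on. The following dependency-free sketch shows the same regrouping with plain lists; the values 0..63 are synthetic stand-ins for one row of X.

```python
# A flat vector of 64 values standing in for one row of X (synthetic values).
flat = list(range(64))

# Row-major regrouping: flat index i goes to row i // 8, column i % 8.
image = [flat[row * 8:(row + 1) * 8] for row in range(8)]

print(image[0])  # [0, 1, 2, 3, 4, 5, 6, 7]
print(image[7])  # [56, 57, 58, 59, 60, 61, 62, 63]

# Flattening the rows again recovers the original vector.
restored = [pixel for row in image for pixel in row]
print(restored == flat)  # True
```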

Split the dataset with our own train_test_split

from knn.model_selection import train_test_split
X_train,X_test,y_train,y_test = train_test_split(X,y)
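
The source of knn.model_selection is not shown in this post, so the following is only a guess at what such a split could look like. The function name split_indices, the default test_ratio of 0.2 and the truncation with int() are assumptions, chosen because they turn 1797 samples into the 359 test samples seen in the output further down.

```python
import random


def split_indices(n_samples, test_ratio=0.2, seed=None):
    """Return (train_idx, test_idx) after shuffling 0..n_samples-1."""
    assert 0.0 <= test_ratio <= 1.0, "test_ratio must be in [0, 1]"
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)     # seeded shuffle is repeatable
    test_size = int(n_samples * test_ratio)  # truncates: 1797 * 0.2 -> 359
    return indices[test_size:], indices[:test_size]


train_idx, test_idx = split_indices(1797, test_ratio=0.2, seed=666)
print(len(train_idx), len(test_idx))  # 1438 359
```

With NumPy arrays, X[train_idx] and X[test_idx] then select the corresponding rows, and the same index lists applied to y keep samples and labels aligned.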

Our custom KNN classifier

from knn.KNN import KNNClassifier
my_knn_clf = KNNClassifier(k = 3)

Fit the model and predict

my_knn_clf.fit(X_train,y_train)

y_predict = my_knn_clf.predict(X_test)
print(y_predict)
print(y_predict.shape)


[8 7 4 5 1 2 1 4 5 4 2 1 9 8 5 1 5 4 1 5 6 9 6 5 9 0 6 5 2 3 1 0 7 8 1 3 3
 5 0 4 2 8 0 0 3 3 9 7 5 6 3 7 9 0 5 8 3 9 6 3 4 0 8 1 3 8 1 2 8 5 3 1 5 5
 2 7 6 7 4 6 8 7 1 5 9 2 3 4 6 2 0 1 1 0 2 2 5 3 1 0 9 6 1 4 7 4 4 6 0 3 2
 4 3 3 9 2 7 6 8 0 7 1 5 8 0 6 7 1 8 7 7 6 8 2 8 0 3 3 1 0 1 5 2 3 1 4 3 8
 8 8 2 1 3 9 0 7 7 0 5 4 0 8 1 7 8 7 1 3 9 5 2 1 8 7 7 6 5 8 8 3 4 9 0 0 6
 0 0 4 9 7 6 0 6 2 6 1 8 9 8 9 7 0 2 3 2 7 1 4 1 2 7 8 4 6 0 1 9 0 5 6 2 0
 9 2 7 2 3 5 0 8 1 0 7 9 4 4 8 5 6 5 2 5 2 6 9 5 4 7 4 6 6 8 4 7 7 7 9 2 9
 9 7 8 5 7 1 9 2 1 2 1 0 0 9 6 5 5 9 8 5 2 5 5 3 0 5 9 3 5 7 9 3 3 5 8 3 8
 8 3 9 4 4 0 4 9 6 3 5 8 6 9 9 8 1 6 7 1 6 6 9 9 9 3 7 6 4 5 8 2 9 4 0 0 7
 7 6 3 8 4 9 1 8 4 5 7 5 1 5 8 6 9 1 5 6 3 0 1 6 6 3]
(359,)

Accuracy

sum(y_predict == y_test) / len(y_test)

0.9832869080779945
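
Accuracy is simply the fraction of predictions that match the true labels. A tiny worked example with made-up labels, written without NumPy so that every step is visible:

```python
y_true = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y_pred = [0, 1, 2, 3, 4, 5, 6, 7, 3, 9]  # one mistake: an 8 predicted as 3

# Count the positions where prediction and truth agree.
correct = sum(p == t for p, t in zip(y_pred, y_true))
accuracy = correct / len(y_true)
print(correct, accuracy)  # 9 0.9

# The value reported above is consistent with 353 correct out of 359.
print(353 / 359)
```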

We can also compute the accuracy with our own wrapped function.
metrics.py

import numpy as np

def accuracy_score(y_true, y_predict):
    """Compute the accuracy between y_true and y_predict."""
    assert y_true.shape[0] == y_predict.shape[0], \
        "the size of y_true must be equal to the size of y_predict"
    return np.sum(y_predict == y_true) / len(y_true)

Compute it

from knn.metrics import accuracy_score
# Store the result under a different name so the function is not shadowed
acc = accuracy_score(y_test, y_predict)
print(acc)

0.9832869080779945

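If we do not care about the values of y_predict themselves, we can also add a score method to the KNN class (the full KNN code is attached at the end of this post).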

    def score(self, X_test, y_test):
        """Compute the model's accuracy on the test set X_test, y_test."""
        y_predict = self.predict(X_test)
        return accuracy_score(y_test, y_predict)

Compute the accuracy with the classifier's own score method

my_knn_clf.score(X_test,y_test)

The result is the same

0.9832869080779945

Next, we repeat the computation with the tools built into scikit-learn

# Use accuracy_score from scikit-learn
from sklearn.model_selection import train_test_split

# Pass a random seed so that the experiment is reproducible
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=666)

from sklearn.neighbors import KNeighborsClassifier

my_knn_clf = KNeighborsClassifier(n_neighbors = 3)
my_knn_clf.fit(X_train,y_train)
y_predict = my_knn_clf.predict(X_test)

from sklearn.metrics import accuracy_score

accuracy_score(y_test,y_predict)

0.9888888888888889

If we do not need y_predict

my_knn_clf.score(X_test,y_test)

0.9888888888888889

`Appendix: full KNN code`

# -*- encoding: utf-8 -*-
"""
@File    : KNN.py
@Time    : 2020/11/30 11:14
@Author  : XD
@Email   : gudianpai@qq.com
@Software: PyCharm
"""
# Reorganize our KNN code so that its structure resembles scikit-learn's
import numpy as np
from collections import Counter
from knn.metrics import accuracy_score

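class KNNClassifier:
    def __init__(self, k):
        """Initialize the KNN classifier; the value of k must be passed in."""
        assert k >= 1, "k must be valid"
        self.k = k
        self._X_train = None  # leading underscore marks the training data as private
        self._y_train = None  # private as well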

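    def fit(self, X_train, y_train):
        """Train the classifier on the training set X_train and y_train."""

        # shape[0] is the number of rows, i.e. the number of samples in X_train;
        # k cannot exceed the total number of samples
        assert 1 <= self.k <= X_train.shape[0], \
            "k must be between 1 and the number of training samples"

        # the number of samples must equal the number of labels
        assert X_train.shape[0] == y_train.shape[0], \
            "the size of X_train must be equal to the size of y_train"

        self._X_train = X_train
        self._y_train = y_train
        return self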

    def predict(self, X_predict):
        """Given the samples X_predict, return the vector of predicted labels."""

        # the model must have been fitted first
        assert self._X_train is not None and self._y_train is not None, \
            "must fit before predict!"

        # the number of features must match the training data
        assert X_predict.shape[1] == self._X_train.shape[1], \
            "the feature number of X_predict must be equal to X_train"

        y_predict = [self._predict(x) for x in X_predict]

        return np.array(y_predict)

    def _predict(self, x):
        """Given a single sample x, return its predicted label."""

        # x must have the same number of features as the rows of X_train
        assert x.shape[0] == self._X_train.shape[1], \
            "the feature number of x must be equal to X_train"

        distances = [np.sqrt(np.sum((x_train - x) ** 2)) for x_train in self._X_train]
        nearest = np.argsort(distances)

        topK_y = [self._y_train[i] for i in nearest[:self.k]]
        votes = Counter(topK_y)
        return votes.most_common(1)[0][0]

    def score(self, X_test, y_test):
        """Compute the model's accuracy on the test set X_test, y_test."""
        y_predict = self.predict(X_test)
        return accuracy_score(y_test, y_predict)

    def __repr__(self):
        """Show the classifier's parameters."""
        return "KNN(k=%d)" % self.k
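
To see the distance-and-vote logic of _predict in isolation, here is a dependency-free sketch on a toy 2-D dataset. The data points and the function name predict_one are made up for illustration, and math.dist (Python 3.8+) stands in for the NumPy Euclidean distance used in the class above.

```python
import math
from collections import Counter

# Toy training set (made up for illustration): two clusters in 2-D.
X_train = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0),   # class 0
           (6.0, 6.0), (7.0, 6.5), (6.5, 7.0)]   # class 1
y_train = [0, 0, 0, 1, 1, 1]


def predict_one(x, k=3):
    """Classify x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point.
    distances = [math.dist(x, x_train) for x_train in X_train]
    # Indices of the training points, sorted from nearest to farthest.
    nearest = sorted(range(len(distances)), key=distances.__getitem__)
    # Labels of the k nearest points, then the most common label wins.
    top_k = [y_train[i] for i in nearest[:k]]
    return Counter(top_k).most_common(1)[0][0]


print(predict_one((2.0, 2.0)))  # 0
print(predict_one((6.0, 6.5)))  # 1
```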
