Multi-class classification with LinearSVC, a support vector classifier based on a linear assumption (Review 2)

Article link: https://blog.csdn.net/cymy001/article/details/79051943

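These are personal study notes. They mainly cover using an SVC (Support Vector Classifier) to perform linear multi-class classification on the handwritten digits dataset built into sklearn.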

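A support vector classifier (Support Vector Classifier) searches, based on the distribution of the training samples, for the best of all possible linear classifiers. The position of the decision boundary is not determined by all of the training data, but only by the data points of different classes that lie closest to the boundary between the two class regions, the so-called "support vectors". This lets the model pick out, even from massive or high-dimensional data, the small number of training samples that matter most for the prediction task. (By contrast, a LogisticRegression model takes the influence of every training sample on the parameters into account during training.)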
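
To make the idea of "only a few samples decide the boundary" concrete, here is a minimal sketch. It assumes a two-class subset of digits (0 versus 1) and uses `SVC(kernel="linear")` instead of `LinearSVC`, because `SVC` stores the indices of its support vectors in the `support_` attribute while `LinearSVC` does not. The variable names are chosen for illustration only.

```python
from sklearn.datasets import load_digits
from sklearn.svm import SVC

digits = load_digits()

# Keep only the digits 0 and 1 so that there is a single decision boundary.
mask = digits.target < 2
X, y = digits.data[mask], digits.target[mask]

# Assumption for this sketch: SVC with a linear kernel stands in for LinearSVC,
# since it exposes the support vectors after fitting.
clf = SVC(kernel="linear")
clf.fit(X, y)

n_samples = X.shape[0]
n_support = clf.support_.shape[0]
print(n_support, "of", n_samples, "training samples are support vectors")
```

The count printed here is normally far smaller than the number of training samples, which is the point of the paragraph above.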

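Precision, recall, and the $F1$ score were originally defined for binary classification. For a multi-class task, the strategy is to evaluate precision, recall, and $F1$ for one class at a time, treating all other classes as negative samples. For the handwritten digits problem, this creates 10 binary classification tasks.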
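
The one-class-against-the-rest evaluation can be sketched on a small made-up label set. The arrays `y_true` and `y_pred` and the helper `one_vs_rest_scores` are hypothetical names for this illustration; the result is compared with `precision_recall_fscore_support` from sklearn.

```python
import numpy as np
from sklearn.metrics import precision_recall_fscore_support

# Toy labels for three classes (illustrative data, not from the digits set).
y_true = np.array([0, 1, 2, 2, 1, 0, 2, 1])
y_pred = np.array([0, 2, 2, 2, 1, 0, 1, 1])

def one_vs_rest_scores(y_true, y_pred, positive_class):
    """Treat positive_class as positive and every other class as negative."""
    true_pos = y_true == positive_class
    pred_pos = y_pred == positive_class
    tp = np.sum(true_pos & pred_pos)
    fp = np.sum(~true_pos & pred_pos)
    fn = np.sum(true_pos & ~pred_pos)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# One binary task per class.
manual = np.array([one_vs_rest_scores(y_true, y_pred, c)
                   for c in np.unique(y_true)])

# sklearn's per-class values (average=None is the default).
precision, recall, f1, support = precision_recall_fscore_support(y_true, y_pred)
print(manual)
```

Each row of `manual` holds the precision, recall, and $F1$ of one binary task, in the same class order that sklearn uses.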

from sklearn.datasets import load_digits
digits=load_digits()
# The handwritten digit image data in sklearn.datasets has 1797 samples; each image is represented by an 8*8=64 pixel matrix
digits.data.shape   
#Output:(1797, 64)
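
As a quick check of the 8*8=64 layout, the sketch below reshapes one flattened row of `digits.data` and compares it with the matching entry of `digits.images`. The names `first_image` and `same` are illustrative.

```python
import numpy as np
from sklearn.datasets import load_digits

digits = load_digits()

# Each row of digits.data is an 8x8 image flattened into 64 values.
first_image = digits.data[0].reshape(8, 8)
same = np.array_equal(first_image, digits.images[0])

print(digits.images.shape)
print(same)
```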

digits
#Output:{'DESCR': "Optical Recognition of Handwritten Digits Data Set\n===================================================\n\nNotes\n-----\nData Set Characteristics:\n    :Number of Instances: 5620\n    :Number of Attributes: 64\n    :Attribute Information: 8x8 image of integer pixels in the range 0..16.\n    :Missing Attribute Values: None\n    :Creator: E. Alpaydin (alpaydin '@' boun.edu.tr)\n    :Date: July; 1998\n\nThis is a copy of the test set of the UCI ML hand-written digits datasets\nhttp://archive.ics.uci.edu/ml/datasets/Optical+Recognition+of+Handwritten+Digits\n\nThe data set contains images of hand-written digits: 10 classes where\neach class refers to a digit.\n\nPreprocessing programs made available by NIST were used to extract\nnormalized bitmaps of handwritten digits from a preprinted form. From a\ntotal of 43 people, 30 contributed to the training set and different 13\nto the test set. 32x32 bitmaps are divided into nonoverlapping blocks of\n4x4 and the number of on pixels are counted in each block. This generates\nan input matrix of 8x8 where each element is an integer in the range\n0..16. This reduces dimensionality and gives invariance to small\ndistortions.\n\nFor info on NIST preprocessing routines, see M. D. Garris, J. L. Blue, G.\nT. Candela, D. L. Dimmick, J. Geist, P. J. Grother, S. A. Janet, and C.\nL. Wilson, NIST Form-Based Handprint Recognition System, NISTIR 5469,\n1994.\n\nReferences\n----------\n  - C. Kaynak (1995) Methods of Combining Multiple Classifiers and Their\n    Applications to Handwritten Digit Recognition, MSc Thesis, Institute of\n    Graduate Studies in Science and Engineering, Bogazici University.\n  - E. Alpaydin, C. Kaynak (1998) Cascading Classifiers, Kybernetika.\n  - Ken Tang and Ponnuthurai N. Suganthan and Xi Yao and A. Kai Qin.\n    Linear dimensionalityreduction using relevance weighted LDA. 
School of\n    Electrical and Electronic Engineering Nanyang Technological University.\n    2005.\n  - Claudio Gentile. A New Approximate Maximal Margin Classification\n    Algorithm. NIPS. 2000.\n",
# 'data': array([[  0.,   0.,   5., ...,   0.,   0.,   0.],
        [  0.,   0.,   0., ...,  10.,   0.,   0.],
        [  0.,   0.,   0., ...,  16.,   9.,   0.],
        ..., 
        [  0.,   0.,   1., ...,   6.,   0.,   0.],
        [  0.,   0.,   2., ...,  12.,   0.,   0.],
        [  0.,   0.,  10., ...,  12.,   1.,   0.]]),
# 'images': array([[[  0.,   0.,   5., ...,   1.,   0.,   0.],
         [  0.,   0.,  13., ...,  15.,   5.,   0.],
         [  0.,   3.,  15., ...,  11.,   8.,   0.],
         ..., 
         [  0.,   4.,  11., ...,  12.,   7.,   0.],
         [  0.,   2.,  14., ...,  12.,   0.,   0.],
         [  0.,   0.,   6., ...,   0.,   0.,   0.]],

        [[  0.,   0.,   0., ...,   5.,   0.,   0.],
         [  0.,   0.,   0., ...,   9.,   0.,   0.],
         [  0.,   0.,   3., ...,   6.,   0.,   0.],
         ..., 
         [  0.,   0.,   1., ...,   6.,   0.,   0.],
         [  0.,   0.,   1., ...,   6.,   0.,   0.],
         [  0.,   0.,   0., ...,  10.,   0.,   0.]],

        [[  0.,   0.,   0., ...,  12.,   0.,   0.],
         [  0.,   0.,   3., ...,  14.,   0.,   0.],
         [  0.,   0.,   8., ...,  16.,   0.,   0.],
         ..., 
         [  0.,   9.,  16., ...,   0.,   0.,   0.],
         [  0.,   3.,  13., ...,  11.,   5.,   0.],
         [  0.,   0.,   0., ...,  16.,   9.,   0.]],

        ..., 
        [[  0.,   0.,   1., ...,   1.,   0.,   0.],
         [  0.,   0.,  13., ...,   2.,   1.,   0.],
         [  0.,   0.,  16., ...,  16.,   5.,   0.],
         ..., 
         [  0.,   0.,  16., ...,  15.,   0.,   0.],
         [  0.,   0.,  15., ...,  16.,   0.,   0.],
         [  0.,   0.,   2., ...,   6.,   0.,   0.]],

        [[  0.,   0.,   2., ...,   0.,   0.,   0.],
         [  0.,   0.,  14., ...,  15.,   1.,   0.],
         [  0.,   4.,  16., ...,  16.,   7.,   0.],
         ..., 
         [  0.,   0.,   0., ...,  16.,   2.,   0.],
         [  0.,   0.,   4., ...,  16.,   2.,   0.],
         [  0.,   0.,   5., ...,  12.,   0.,   0.]],

        [[  0.,   0.,  10., ...,   1.,   0.,   0.],
         [  0.,   2.,  16., ...,   1.,   0.,   0.],
         [  0.,   0.,  15., ...,  15.,   0.,   0.],
         ..., 
         [  0.,   4.,  16., ...,  16.,   6.,   0.],
         [  0.,   8.,  16., ...,  16.,   8.,   0.],
         [  0.,   1.,   8., ...,  12.,   1.,   0.]]]),
# 'target': array([0, 1, 2, ..., 8, 9, 8]),
# 'target_names': array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])}

# train_test_split lives in sklearn.model_selection since scikit-learn 0.18.
# distutils is no longer available in recent Python versions, so fall back
# to the old module with try/except instead of a version comparison.
try:
    from sklearn.model_selection import train_test_split
except ImportError:
    from sklearn.cross_validation import train_test_split
X_train,X_test,y_train,y_test = train_test_split(digits.data, digits.target, test_size=0.25, random_state=33)
y_train.shape
#Output:(1347,)
y_test.shape
#Output:(450,)
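
The sizes 1347 and 450 follow from the 25% test fraction. The sketch below redoes the arithmetic under the assumption that the test count is rounded up, and checks it against an actual split of 1797 index values. The names `idx_train` and `idx_test` are illustrative.

```python
import math
import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 1797
test_size = 0.25

# 0.25 * 1797 = 449.25; assumed here to be rounded up to a whole sample count.
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test

# Split plain indices with the same settings as in the notes to compare sizes.
idx_train, idx_test = train_test_split(
    np.arange(n_samples), test_size=test_size, random_state=33)

print(n_train, n_test)
print(len(idx_train), len(idx_test))
```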

from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC   # LinearSVC: support vector classifier based on a linear assumption
ss=StandardScaler()
X_train=ss.fit_transform(X_train)
X_test=ss.transform(X_test)
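
The scaler is fitted on the training set only and then reused on the test set. A minimal sketch of what `transform` does with the learned statistics, using the fitted attributes `mean_` and `scale_` (the names `X_test_manual` and `X_test_scaled` are illustrative):

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=33)

ss = StandardScaler()
ss.fit(X_train)   # learns per-feature mean_ and scale_ from the training set only

# Apply the training-set statistics to the test set by hand.
X_test_manual = (X_test - ss.mean_) / ss.scale_
X_test_scaled = ss.transform(X_test)

print(np.allclose(X_test_manual, X_test_scaled))
```

Reusing the training statistics keeps information about the test set out of the preprocessing step.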

lsvc=LinearSVC()
lsvc.fit(X_train,y_train)
y_predict=lsvc.predict(X_test)

print('The Accuracy of Linear SVC is',lsvc.score(X_test,y_test))
#Output:The Accuracy of Linear SVC is 0.953333333333
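
The value returned by `score` is the fraction of test samples predicted correctly. The self-contained sketch below recomputes it by hand. Note that `random_state=0` and `max_iter=10000` are choices made for this sketch only and are not part of the original notes, so the exact accuracy may differ slightly from the value above.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=33)

ss = StandardScaler()
X_train = ss.fit_transform(X_train)
X_test = ss.transform(X_test)

# Assumption for this sketch: fixed random_state and a larger max_iter
# to make the run repeatable and help the solver converge.
lsvc = LinearSVC(random_state=0, max_iter=10000)
lsvc.fit(X_train, y_train)
y_predict = lsvc.predict(X_test)

# Accuracy by hand: share of predictions that match the true labels.
manual_accuracy = np.mean(y_predict == y_test)
print(manual_accuracy, lsvc.score(X_test, y_test))
```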

from sklearn.metrics import classification_report
print(classification_report(y_test,y_predict,target_names=digits.target_names.astype(str)))
