SVM支持向量机_案例:手写数字分类

本文介绍了使用Python进行手绘数字图像识别的过程,包括数据导入、特征值和目标值提取、数据归一化、数据集分割、PCA特征降维、SVM模型训练以及通过PCA优化模型以提高准确率。最终确定了最优模型参数并展示了模型性能。
摘要由CSDN通过智能技术生成

数据介绍

数据⽂件train.csv和test.csv包含从0到9的⼿绘数字的灰度图像。

  • 每个图像的⾼度为28个像素,宽度为28个像素,总共为784个像素。
  • 每个像素具有与其相关联的单个像素值,指示该像素的亮度或暗度,较⾼的数字意味着较暗。该像素值是0到255之间的 整数,包括0和255。
  • 训练数据集(train.csv)有785列。第⼀列称为“标签”,是⽤户绘制的数字。其余列包含关联图像的像素值。

导入模块

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm
from sklearn.model_selection import train_test_split

import warnings
warnings.filterwarnings('ignore')

获取数据

train = pd.read_csv("./data/train.csv")
train.head()
labelpixel0pixel1pixel2pixel3pixel4pixel5pixel6pixel7pixel8...pixel774pixel775pixel776pixel777pixel778pixel779pixel780pixel781pixel782pixel783
01000000000...0000000000
10000000000...0000000000
21000000000...0000000000
34000000000...0000000000
40000000000...0000000000

5 rows × 785 columns

train.shape

(42000, 785)

确定特征值\目标值

train_image = train.loc[:, 'pixel0':]
train_image.head()
pixel0pixel1pixel2pixel3pixel4pixel5pixel6pixel7pixel8pixel9...pixel774pixel775pixel776pixel777pixel778pixel779pixel780pixel781pixel782pixel783
00000000000...0000000000
10000000000...0000000000
20000000000...0000000000
30000000000...0000000000
40000000000...0000000000

5 rows × 784 columns

train_label = train.loc[:, 'label']
train_label.head()

0 1
1 0
2 1
3 4
4 0
Name: label, dtype: int64

查看具体图像

num = train_image.loc[0,].values.reshape(28, 28) # 取第一个图像
num
array([[  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0, 188, 255,  94,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0, 191, 250, 253,  93,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0, 123, 248, 253, 167,  10,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,  80, 247, 253, 208,  13,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,  29, 207, 253, 235,  77,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,  54, 209, 253, 253,  88,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,  93, 254, 253, 238, 170,  17,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
         23, 210, 254, 253, 159,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  16,
        209, 253, 254, 240,  81,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  27,
        253, 253, 254,  13,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  20, 206,
        254, 254, 198,   7,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0, 168, 253,
        253, 196,   7,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,  20, 203, 253,
        248,  76,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,  22, 188, 253, 245,
         93,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0, 103, 253, 253, 191,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,  89, 240, 253, 195,  25,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  15, 220, 253, 253,  80,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  94, 253, 253, 253,  94,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,  89, 251, 253, 250, 131,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0, 214, 218,  95,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0],
       [  0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,   0,
          0,   0]])
plt.imshow(num)
plt.axis("off")
plt.show()

在这里插入图片描述

def to_plot(n):
    num = train_image.loc[n,].values.reshape(28, 28)
    
    plt.imshow(num)
    plt.axis("off")
    plt.show()
to_plot(n=40)

在这里插入图片描述

数据基本处理

train_image.head()
pixel0pixel1pixel2pixel3pixel4pixel5pixel6pixel7pixel8pixel9...pixel774pixel775pixel776pixel777pixel778pixel779pixel780pixel781pixel782pixel783
00000000000...0000000000
10000000000...0000000000
20000000000...0000000000
30000000000...0000000000
40000000000...0000000000

5 rows × 784 columns

数据归一化处理

# 对数据特征值归一化处理
train_image = train_image.values / 255
train_label = train_label.values

数据集分割

x_train, x_val, y_train, y_val = train_test_split(train_image, train_label, train_size = 0.8, random_state=0)
print(x_train.shape, x_val.shape)

(33600, 784) (8400, 784)

特征降维和模型训练

import time
from sklearn.decomposition import PCA

# 多次使用pca,确定最后的最优模型

def n_components_analysis(n, x_train, y_train, x_val, y_val):
    # 记录开始时间
    start = time.time()
    
    # pca降维实现
    pca = PCA(n_components=n)
    print("特征降维,传递的参数为:{}".format(n))
    pca.fit(x_train)
    
    # 在训练集和测试集进行降维
    x_train_pca = pca.transform(x_train)
    x_val_pca = pca.transform(x_val)
    
    # 利用svc进行训练
    print("开始使用svc进行训练")
    ss = svm.SVC()
    ss.fit(x_train_pca, y_train)
    
    # 获取accuracy结果
    accuracy = ss.score(x_val_pca, y_val)
    
    # 记录结束时间
    end = time.time()
    print("准确率是:{}, 消耗时间是:{}s".format(accuracy, int(end-start)))
    
    return accuracy 
# 传递多个n_components,寻找合理的n_components:

n_s = np.linspace(0.70, 0.85, num=5)
accuracy = []

for n in n_s:
    tmp = n_components_analysis(n, x_train, y_train, x_val, y_val)
    accuracy.append(tmp)

特征降维,传递的参数为:0.7
开始使用svc进行训练
准确率是:0.9761904761904762, 消耗时间是:8s
特征降维,传递的参数为:0.7374999999999999
开始使用svc进行训练
准确率是:0.9779761904761904, 消耗时间是:8s
特征降维,传递的参数为:0.7749999999999999
开始使用svc进行训练
准确率是:0.9783333333333334, 消耗时间是:10s
特征降维,传递的参数为:0.8125
开始使用svc进行训练
准确率是:0.9798809523809524, 消耗时间是:12s
特征降维,传递的参数为:0.85
开始使用svc进行训练
准确率是:0.9803571428571428, 消耗时间是:14s

# 准确率可视化展示
plt.plot(n_s, np.array(accuracy), "r")
plt.show()

在这里插入图片描述

经过图形展示,选择合理的n_components, 最后综合考虑确定结果为:0.80

确定最优模型

pca = PCA(n_components=0.80) # 降维

pca.fit(x_train)
pca.n_components_

43

x_train_pca = pca.transform(x_train)
x_val_pca = pca.transform(x_val)
print(x_train_pca.shape, x_val_pca.shape)

(33600, 43) (8400, 43)

# 训练比较优的模型,计算accuracy

ss1 = svm.SVC()

ss1.fit(x_train_pca, y_train)

ss1.score(x_val_pca, y_val)

0.979047619047619

评论
添加红包

请填写红包祝福语或标题

红包个数最小为10个

红包金额最低5元

当前余额3.43前往充值 >
需支付:10.00
成就一亿技术人!
领取后你会自动成为博主和红包主的粉丝 规则
hope_wisdom
发出的红包

打赏作者

¥骁勇善战¥

你的鼓励将是我创作的最大动力

¥1 ¥2 ¥4 ¥6 ¥10 ¥20
扫码支付:¥1
获取中
扫码支付

您的余额不足,请更换扫码支付或充值

打赏作者

实付
使用余额支付
点击重新获取
扫码支付
钱包余额 0

抵扣说明:

1.余额是钱包充值的虚拟货币,按照1:1的比例进行支付金额的抵扣。
2.余额无法直接购买下载,可以购买VIP、付费专栏及课程。

余额充值