Using PCA to speed up training on the MNIST dataset (with an SVM)

First, train an SVM model without PCA dimensionality reduction:

from keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
import datetime

(X_train,Y_train),(X_test,Y_test) = mnist.load_data()
X_train_1 = X_train.reshape(60000,784)
Y_train_1 = Y_train.reshape(-1,1)

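starttime = datetime.datetime.now()  # start the timer; this baseline run times the SVM alone (no PCA)

### Train a support vector machine
svc = SVC()  # the default parameters are fine here; in my experiments they came very close to hand-tuned ones
x_train,x_test,y_train,y_test = train_test_split(X_train_1, Y_train_1, test_size = 0.25, random_state = 1)
y_train = y_train.ravel()  # flatten the labels to 1-D; otherwise scikit-learn complains about a column-vector y in Jupyter Notebook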
svc.fit(x_train,y_train)
accuracy = svc.score(x_test,y_test)
print("accuracy is ",accuracy)

endtime = datetime.datetime.now()
time = (endtime - starttime).seconds
print("time is ",time) 
# accuracy is  0.9766
# time is  189

Next, train the SVM model with PCA dimensionality reduction:

from keras.datasets import mnist
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.decomposition import PCA
from sklearn.svm import SVC
import datetime

(X_train,Y_train),(X_test,Y_test) = mnist.load_data()
X_train_1 = X_train.reshape(60000,784)
Y_train_1 = Y_train.reshape(-1,1)
# Make a real copy of the training set (plain assignment would only create an alias); PCA is applied to it below
X_train_copy = X_train_1.copy()

starttime = datetime.datetime.now()  

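### Find the number of components that retains 95% of the variance
pca_1=PCA(n_components=0.95, copy = False)  # copy=False asks PCA to avoid copying the input where it can
                                            # PCA centers the data (subtracts the per-feature mean) automatically
X_reduce = pca_1.fit_transform(X_train_1)
n_x = pca_1.n_components_  # fitted number of components; n_components (no underscore) would just return 0.95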

### Reduce the dimensionality using the n_components found above
pca=PCA(n_components=n_x, copy = False)
X_reduce_final = pca.fit_transform(X_train_copy)
print("X_reduce_final :",X_reduce_final.shape)
# X_reduce_final : (60000, 154); with 95% of the variance retained, the dimension drops from 784 to 154. Nice!

### Train a support vector machine
svc = SVC()
x_train,x_test,y_train,y_test = train_test_split\
    (X_reduce_final, Y_train_1, test_size = 0.25, random_state = 1)
y_train = y_train.ravel()  # flatten the labels to 1-D, as in the first script
svc.fit(x_train,y_train)
accuracy = svc.score(x_test,y_test)
print("accuracy is ",accuracy)

endtime = datetime.datetime.now()
time = (endtime - starttime).seconds
print("time is ",time)
# accuracy is  0.9806
# time is  67

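As the results show, using PCA cuts the total time by about 65% (from 189 s to 67 s), and the accuracy even improves slightly (from 0.9766 to 0.9806).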
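A variant worth trying is to fit PCA on the training split only, so that the held-out images do not influence the projection. The sketch below is a minimal version of that idea, not the experiment above: it assumes scikit-learn's small built-in `load_digits` dataset (8x8 images) as a stand-in for MNIST so that it runs in seconds, and the helper name `fit_and_score` is made up for illustration. Its timings and accuracies will differ from the numbers reported here.

```python
import time

from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Assumption: load_digits (1797 images, 64 features) stands in for MNIST.
X, y = load_digits(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=1)


def fit_and_score(model):
    """Hypothetical helper: fit on the training split, return (accuracy, seconds)."""
    start = time.perf_counter()
    model.fit(x_train, y_train)
    accuracy = model.score(x_test, y_test)
    return accuracy, time.perf_counter() - start


# Baseline: SVM on the raw pixels.
acc_raw, t_raw = fit_and_score(SVC())

# PCA + SVM: the pipeline fits PCA on x_train only and reuses that
# projection when scoring x_test.
pca_svm = make_pipeline(PCA(n_components=0.95), SVC())
acc_pca, t_pca = fit_and_score(pca_svm)
n_kept = pca_svm.named_steps["pca"].n_components_

print("raw pixels: accuracy", acc_raw, "time", t_raw)
print("PCA + SVM : accuracy", acc_pca, "time", t_pca, "components", n_kept)
```

Wrapping both steps in one pipeline also means the timing covers PCA and the SVM together, which matches how the 67 s figure was measured.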

Analysis:

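  • In MNIST, the pixels near the border of every image almost always hold the same background value, so they carry almost no variance and contribute little to recognition. PCA can therefore cut the dimensionality sharply while still retaining 95% of the variance. With far fewer features the learning algorithm naturally runs faster, which is why the time drops so much (note that the 67 s above already includes the time spent fitting PCA).
  • The accuracy actually goes up a little, which I suspect is because PCA discards some noise. As an analogy: when a photo is slightly blurry, adjusting the "sharpen" setting often makes it look clearer.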
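The first point can be checked numerically. The sketch below assumes scikit-learn's small `load_digits` dataset as a quick stand-in for MNIST (an assumption, as are the variable names). It counts the pixels that barely vary across images, then reads the number of components needed for 95% of the variance off the cumulative explained-variance ratio, which is the criterion `PCA(n_components=0.95)` applies.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# Assumption: load_digits (1797 images, 64 features) stands in for MNIST.
X, _ = load_digits(return_X_y=True)

# Per-pixel variance across all images; border pixels are close to constant.
pixel_var = X.var(axis=0)
share = pixel_var / pixel_var.sum()
n_flat = int((share < 0.001).sum())
print("pixels carrying <0.1% of the variance each:", n_flat)

# Fit a full PCA and accumulate the explained-variance ratios.
pca_full = PCA().fit(X)
cumulative = np.cumsum(pca_full.explained_variance_ratio_)
# Smallest k whose first k components explain at least 95% of the variance.
k = int(np.searchsorted(cumulative, 0.95) + 1)
print("components needed for 95% variance:", k)

# Cross-check against the float form of n_components.
k_auto = PCA(n_components=0.95).fit(X).n_components_
print("PCA(n_components=0.95) keeps:", k_auto)
```

Running the same steps on the reshaped MNIST array should reproduce the 154 components reported above, though that run takes noticeably longer.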

 

 
