Machine learning exercise: build a classifier for the MNIST dataset that achieves over 97% accuracy on the test set

Reference code:

from sklearn.datasets import fetch_mldata
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier
import numpy as np

import time

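# Load MNIST (70000 samples of 28x28 pixels).
# Note: fetch_mldata was removed in scikit-learn 0.22; on newer versions the
# replacement is fetch_openml('mnist_784', version=1, as_frame=False).
mnist = fetch_mldata('MNIST original', data_home='./mnist1')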
X, y = mnist["data"], mnist["target"]
# The first 60000 samples form the training set.
X_train, y_train = X[:60000], y[:60000]
# Shuffle the training set, then keep 50000 samples to shorten the search.
shuffle_index = np.random.permutation(60000)
X_traina, y_traina = X_train[shuffle_index], y_train[shuffle_index]
X_train, y_train = X_traina[:50000], y_traina[:50000]

t1 = time.time()
X_test, y_test = X[60000:], y[60000:]


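# Grid of 6 candidates: k in {3, 4, 5}, with uniform or distance-weighted voting.
param_grid = [{'n_neighbors':[3,4,5], 'weights':['uniform', 'distance']}]
knn_clf = KNeighborsClassifier()
# Alternative: a single model without a search.
#knn_clf = KNeighborsClassifier(n_neighbors=3,weights='distance')
# 3-fold cross-validation over the grid, scored by accuracy, on 4 parallel jobs.
grid_search = GridSearchCV(knn_clf, param_grid, cv=3, scoring='accuracy', n_jobs=4, verbose=2)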



# Count the training samples in each digit class.
ret_dict1={0:0, 1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0}
for i in range(len(y_train)):
    ret_dict1[int(y_train[i])] += 1

print(11, ret_dict1)

t2 = time.time()
#knn_clf.fit(X_train, y_train)
grid_search.fit(X_train, y_train)
t3 = time.time()
print('train over.')

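# Predict with the best estimator, which GridSearchCV refits on the whole training set.
#y_pred = knn_clf.predict(X_test)
y_pred = grid_search.predict(X_test)
ret_dict2={0:0, 1:0, 2:0, 3:0, 4:0, 5:0, 6:0, 7:0, 8:0, 9:0}
t4 = time.time()

# Count the predictions in each digit class.
for i in range(len(y_pred)):
    ret_dict2[int(y_pred[i])] += 1

print(22, ret_dict2)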

from sklearn.metrics import precision_score, recall_score, confusion_matrix
# Rows of the confusion matrix are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred)
print('\ncm=', cm)
# average=None returns one score per class; np.average then gives the unweighted mean.
ps = precision_score(y_test, y_pred, average=None)
print('\nps=', ps, np.average(ps))
rs = recall_score(y_test, y_pred, average=None)
print('\nrs=', rs, np.average(rs))
para = grid_search.best_params_
print('\nBest para:', para)
t5 = time.time()

print('Time cost:')
print('    Preprocess:', t2-t1)
print('    Train:', t3-t2)
print('    Predict:', t4-t3)
print('    Calc accu:', t5-t4)
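The grid above searches over two voting schemes, `uniform` and `distance`. As a rough illustration of the difference, here is a minimal pure-Python sketch of a k-nearest-neighbors vote. The function `knn_predict` and the toy data are hypothetical and are not part of the original code, and the tie-breaking rule (smallest label wins) is an assumption of this sketch rather than a statement about scikit-learn internals.

```python
import math
from collections import defaultdict


def knn_predict(train_points, train_labels, query, k=4, weights="distance"):
    """Classify `query` by a vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    dists = []
    for point, label in zip(train_points, train_labels):
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query)))
        dists.append((d, label))
    nearest = sorted(dists, key=lambda pair: pair[0])[:k]

    votes = defaultdict(float)
    if weights == "distance":
        # Neighbors at distance zero decide the vote on their own;
        # otherwise each neighbor votes with weight 1 / distance.
        exact = [label for d, label in nearest if d == 0.0]
        if exact:
            for label in exact:
                votes[label] += 1.0
        else:
            for d, label in nearest:
                votes[label] += 1.0 / d
    else:
        # Uniform voting: every neighbor counts equally.
        for _, label in nearest:
            votes[label] += 1.0

    # Highest total wins; ties go to the smallest label (assumption of this sketch).
    return min(votes, key=lambda label: (-votes[label], label))


# Toy 1-D data: two points of class 0 on the left, two of class 1 on the right.
points = [(0.0,), (1.0,), (4.0,), (5.0,)]
labels = [0, 0, 1, 1]
query = (3.5,)

uniform_pred = knn_predict(points, labels, query, k=4, weights="uniform")
distance_pred = knn_predict(points, labels, query, k=4, weights="distance")
print(uniform_pred, distance_pred)
```

With an even k such as 4, uniform voting can tie (two votes each in the toy data), whereas distance weighting lets the closer neighbors settle the vote. This may be part of why an even `n_neighbors` can still do well when combined with `weights='distance'`.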

Output:
(11, {0: 4923, 1: 5647, 2: 4923, 3: 5109, 4: 4927, 5: 4505, 6: 4958, 7: 5189, 8: 4863, 9: 4956})
Fitting 3 folds for each of 6 candidates, totalling 18 fits
[Parallel(n_jobs=4)]: Done 18 out of 18 | elapsed: 181.2min finished
train over.
(22, {0: 998, 1: 1173, 2: 1011, 3: 1008, 4: 973, 5: 901, 6: 968, 7: 1034, 8: 928, 9: 1006})
('\ncm=', array([[ 972,    1,    1,    0,    0,    1,    4,    1,    0,    0],
       [   0, 1131,    2,    0,    0,    0,    2,    0,    0,    0],
       [   9,    5,  994,    3,    2,    0,    0,   16,    3,    0],
       [   0,    1,    3,  976,    1,   15,    1,    6,    2,    5],
       [   1,    5,    0,    0,  946,    0,    4,    3,    0,   23],
       [   3,    0,    0,    8,    2,  866,    6,    1,    3,    3],
       [   6,    2,    0,    0,    3,    2,  945,    0,    0,    0],
       [   0,   19,    4,    0,    3,    0,    0,  992,    0,   10],
       [   3,    2,    5,   14,    5,   13,    5,    4,  918,    5],
       [   4,    7,    2,    7,   11,    4,    1,   11,    2,  960]]))
('\nps=', array([ 0.9739479 , 0.96419437, 0.98318497, 0.96825397, 0.97225077,
       0.96115427, 0.97623967, 0.95938104, 0.98922414, 0.95427435]), 0.97021054523864747)
('\nrs=', array([ 0.99183673, 0.99647577, 0.96317829, 0.96633663, 0.96334012,
       0.97085202, 0.98643006, 0.96498054, 0.94250513, 0.95143707]), 0.96973723812429768)
('\nBest para:', {'n_neighbors': 4, 'weights': 'distance'})
Time cost:
('    Preprocess:', 0.009664058685302734)
('    Train:', 10876.40351510048)
('    Predict:', 478.908567905426)
('    Calc accu:', 0.1060020923614502)

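The mean per-class precision is 0.9702. The overall accuracy, read off the diagonal of the confusion matrix, is 9700/10000 = 0.9700.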
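The summary figures can be re-derived from the printed confusion matrix alone. The sketch below copies the matrix from the output above and recomputes accuracy, per-class precision, and per-class recall in plain Python. The variable names are chosen here for illustration only.

```python
# Confusion matrix from the 50000-sample run (rows: true class, columns: predicted class).
cm = [
    [972, 1, 1, 0, 0, 1, 4, 1, 0, 0],
    [0, 1131, 2, 0, 0, 0, 2, 0, 0, 0],
    [9, 5, 994, 3, 2, 0, 0, 16, 3, 0],
    [0, 1, 3, 976, 1, 15, 1, 6, 2, 5],
    [1, 5, 0, 0, 946, 0, 4, 3, 0, 23],
    [3, 0, 0, 8, 2, 866, 6, 1, 3, 3],
    [6, 2, 0, 0, 3, 2, 945, 0, 0, 0],
    [0, 19, 4, 0, 3, 0, 0, 992, 0, 10],
    [3, 2, 5, 14, 5, 13, 5, 4, 918, 5],
    [4, 7, 2, 7, 11, 4, 1, 11, 2, 960],
]
n_classes = len(cm)

# Accuracy: correctly classified samples (the diagonal) over all samples.
total = sum(sum(row) for row in cm)
correct = sum(cm[i][i] for i in range(n_classes))
accuracy = correct / total

# Precision of class j: diagonal entry over the column sum (everything predicted as j).
precision = [cm[j][j] / sum(cm[i][j] for i in range(n_classes)) for j in range(n_classes)]
# Recall of class i: diagonal entry over the row sum (everything truly in class i).
recall = [cm[i][i] / sum(cm[i]) for i in range(n_classes)]

# Unweighted (macro) averages, matching np.average(ps) and np.average(rs).
macro_precision = sum(precision) / n_classes
macro_recall = sum(recall) / n_classes

print("accuracy:", accuracy)
print("macro precision:", macro_precision)
print("macro recall:", macro_recall)
```

The macro-averaged precision and recall weight every digit equally, so they can differ slightly from the overall accuracy when the classes are not the same size.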

Results with the training set enlarged to the full 60,000 samples:
(11, {0: 5923, 1: 6742, 2: 5958, 3: 6131, 4: 5842, 5: 5421, 6: 5918, 7: 6265, 8: 5851, 9: 5949})
Fitting 3 folds for each of 6 candidates, totalling 18 fits
[Parallel(n_jobs=4)]: Done 18 out of 18 | elapsed: 243.7min finished
train over.
(22, {0: 1000, 1: 1169, 2: 1011, 3: 1006, 4: 974, 5: 895, 6: 967, 7: 1036, 8: 931, 9: 1011})
('\ncm=', array([[ 973,    1,    1,    0,    0,    1,    3,    1,    0,    0],
       [   0, 1132,    2,    0,    0,    0,    1,    0,    0,    0],
       [  10,    5,  995,    2,    1,    0,    0,   16,    3,    0],
       [   0,    1,    3,  974,    1,   14,    1,    7,    4,    5],
       [   1,    5,    0,    0,  950,    0,    4,    3,    0,   19],
       [   4,    0,    0,    9,    2,  862,    7,    1,    3,    4],
       [   4,    2,    0,    0,    3,    3,  946,    0,    0,    0],
       [   0,   17,    4,    0,    3,    0,    0,  994,    0,   10],
       [   5,    2,    4,   14,    5,   11,    4,    4,  920,    5],
       [   3,    4,    2,    7,    9,    4,    1,   10,    1,  968]]))
('\nps=', array([ 0.973 , 0.96834902, 0.98417409, 0.96819085, 0.97535934,
       0.96312849, 0.97828335, 0.95945946, 0.98818475, 0.95746785]), 0.97155972019459591)
('\nrs=', array([ 0.99285714, 0.99735683, 0.96414729, 0.96435644, 0.96741344,
       0.96636771, 0.9874739 , 0.96692607, 0.94455852, 0.95936571]), 0.97108230526644035)
('\nBest para:', {'n_neighbors': 4, 'weights': 'distance'})
Time cost:
('    Preprocess:', 0.012269020080566406)
('    Train:', 14639.71526813507)
('    Predict:', 552.7666699886322)
('    Calc accu:', 1.414395809173584)

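Precision improves slightly: the mean per-class precision rises from 0.9702 to 0.9716, and the overall accuracy from 0.9700 to 0.9714.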
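To see which digits account for the gain, the diagonals of the two confusion matrices can be compared class by class. The sketch below copies both diagonals from the outputs above. The variable names are illustrative, and a difference of a few samples per class may be within run-to-run noise, since the 50000-sample subset was chosen by a random shuffle.

```python
# Correctly classified test samples per digit (confusion-matrix diagonals).
diag_50k = [972, 1131, 994, 976, 946, 866, 945, 992, 918, 960]
diag_60k = [973, 1132, 995, 974, 950, 862, 946, 994, 920, 968]
n_test = 10000  # size of the MNIST test set used in both runs

# Change in correct predictions per digit when training on 60000 samples.
delta = [b - a for a, b in zip(diag_50k, diag_60k)]

acc_50k = sum(diag_50k) / n_test
acc_60k = sum(diag_60k) / n_test

for digit, change in enumerate(delta):
    print("digit", digit, "change in correct predictions:", change)
print("accuracy:", acc_50k, "->", acc_60k)
```

Most of the net gain of 14 samples comes from digits 9 and 4, while digits 3 and 5 lose a few correct predictions.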
