Digit Recognizer (Kaggle)

Classify handwritten digits using the famous MNIST data 

This competition is the first in a series of tutorial competitions designed to introduce people to Machine Learning.

The goal in this competition is to take an image of a handwritten single digit, and determine what that digit is.  As the competition progresses, we will release tutorials which explain different machine learning algorithms and help you to get started.


The data for this competition were taken from the MNIST dataset. The MNIST ("Modified National Institute of Standards and Technology") dataset is a classic within the Machine Learning community that has been extensively studied.  More detail about the dataset, including Machine Learning algorithms that have been tried on it and their levels of success, can be found at http://yann.lecun.com/exdb/mnist/index.html.


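Handwritten digit recognition is a fairly simple problem. The main difficulty is the large number of features (784 pixels per image), so I first reduce the dimensionality with PCA; running kNN on the reduced features then gives quite good accuracy.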
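To make the two steps concrete, here is a minimal NumPy-only sketch of PCA followed by kNN. It is an illustration, not the code used for the submission: the helper names (`pca_fit`, `pca_transform`, `knn_predict`) are hypothetical, and the data is a small synthetic set of three clusters standing in for the pixel features.

```python
import numpy as np


def pca_fit(X, n_components):
    """Return the column means and the top principal axes of X."""
    mean = X.mean(axis=0)
    # Rows of vt are the principal axes, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the principal axes found by pca_fit."""
    return (X - mean) @ components.T


def knn_predict(train_X, train_y, query_X, k=5):
    """Label each query row by majority vote among its k nearest training rows."""
    # Squared Euclidean distances, shape (n_query, n_train).
    d2 = ((query_X[:, None, :] - train_X[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    votes = train_y[nearest]
    return np.array([np.bincount(row).argmax() for row in votes])


# Synthetic stand-in for the pixel data (an assumption for this sketch):
# three well-separated clusters in 50 dimensions.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 50))
labels = np.repeat(np.arange(3), 40)
points = centers[labels] + rng.normal(size=(120, 50))

# Hold out every fourth sample for evaluation.
test_mask = np.arange(120) % 4 == 0
mean, components = pca_fit(points[~test_mask], n_components=5)
train_Z = pca_transform(points[~test_mask], mean, components)
test_Z = pca_transform(points[test_mask], mean, components)

predicted = knn_predict(train_Z, labels[~test_mask], test_Z, k=5)
accuracy = (predicted == labels[test_mask]).mean()
print(f"held-out accuracy: {accuracy:.2f}")
```

The brute-force distance matrix is fine at this size, but for the full MNIST data a library implementation such as scikit-learn's `KNeighborsClassifier` is the more practical choice.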

Code to render a digit image from the test data in an IPython notebook:

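%matplotlib inline
import matplotlib.pyplot as plt
import pandas as pd

img = pd.read_csv('test.csv')

# Each row of test.csv holds one 28x28 image flattened to 784 pixel values.
# Index 1 selects the second image in the file.
p1 = img.values[1]
pix = p1.reshape(28, 28)

plt.imshow(pix)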
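The pixel values of each image are stored row by row, so pixel (i, j) sits at flat index i*28+j. As a small check of that mapping, the sketch below compares an explicit nested-loop construction with NumPy's `reshape` on a made-up array (the values are placeholders, not real pixel data):

```python
import numpy as np

SIDE = 28  # MNIST images are 28x28 pixels

# Made-up stand-in for one CSV row: 784 distinct values so positions are easy to track.
flat = np.arange(SIDE * SIDE)

# Nested-loop construction: pixel (i, j) is read from flat index i*28 + j.
looped = [[flat[i * SIDE + j] for j in range(SIDE)] for i in range(SIDE)]

# reshape uses the same row-major (C-order) layout by default.
grid = flat.reshape(SIDE, SIDE)

same = np.array_equal(grid, np.array(looped))
print(same, grid[2, 3])
```

Both constructions give the same 28x28 grid, so `reshape` can replace the loops without changing the picture.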

PCA + kNN code:

import csv
import pandas as pd
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score
from sklearn.decomposition import PCA

input_df = pd.read_csv('train.csv', header=0)
submit_df = pd.read_csv('test.csv', header=0)

# Column 0 of train.csv is the label; the remaining columns are pixels.
features = input_df.values[:, 1:]
labels = input_df.values[:, 0]

# Fit PCA on the pixels of both files (labels are not used), then reduce
# each set to 64 components.
all_pixels = pd.concat([input_df.iloc[:, 1:], submit_df], ignore_index=True)
pca = PCA(n_components=64)
pca.fit(all_pixels.values)
features = pca.transform(features)
pred_data = pca.transform(submit_df.values)

clf = KNeighborsClassifier().fit(features, labels)
# print(cross_val_score(clf, features, labels))
output = clf.predict(pred_data).astype(int)
ids = range(1, len(output) + 1)

# write to csv file
with open("KNN.csv", "w", newline="") as predictions_file:
    writer = csv.writer(predictions_file)
    writer.writerow(["ImageId", "Label"])
    writer.writerows(zip(ids, output))

print("done.")

