Handwritten Digit Recognition with a Support Vector Machine
I have been sitting on the MNIST data set for a while now. The MNIST database is a large collection of handwritten digits, and a version of it is provided in the Kaggle knowledge competition Digit Recognizer. In fact, I have been sitting on it for so long that the last thing I wrote for it dates back to last August: a Python script that takes the training data and writes each data point out as a BMP image file. The result is a folder of 42,000 images, each 28 by 28 pixels (about 74.5 MB on disk). I have uploaded it as a Gist here for those interested.
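For readers who want to try the same conversion, here is a minimal sketch of such a script. It is not the code from the Gist. It assumes the Kaggle `train.csv` layout (a header row, then a `label` column followed by 784 pixel columns with values from 0 to 255) and uses Pillow to write the files. The function names and the file naming scheme are made up for illustration.

```python
import csv
from pathlib import Path

from PIL import Image  # Pillow; an assumption, the original script may differ

SIDE = 28  # each digit is a 28 x 28 grayscale image


def row_to_image(pixels):
    """Turn 784 grayscale values (0-255) into a 28 x 28 Pillow image."""
    if len(pixels) != SIDE * SIDE:
        raise ValueError(f"expected {SIDE * SIDE} pixels, got {len(pixels)}")
    return Image.frombytes("L", (SIDE, SIDE), bytes(pixels))


def export_images(csv_path, out_dir):
    """Write one BMP per CSV row and return the number of files written."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)
        next(reader)  # skip the header row: label, pixel0 ... pixel783
        for index, row in enumerate(reader):
            label, pixels = row[0], [int(value) for value in row[1:]]
            # Hypothetical naming scheme: row index plus the digit label.
            row_to_image(pixels).save(out_dir / f"{index:05d}_{label}.bmp")
            count += 1
    return count
```

Calling `export_images("train.csv", "digits")` would then fill a `digits` folder with one image per training row.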
![Digits from MNIST data set](http://ratherreadblog.com/wp-content/uploads/2016/02/number_banner_vectorized.png)
Digits from MNIST data set
What I did this weekend was use the linear support vector classification implemented in scikit-learn to build a simple model that predicts the digit from the given pixel data. It reaches an accuracy of 84% on the test data in the Kaggle competition. My implementation is based on this example of using an SVM to recognize handwritten digits.
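To give a feel for the approach, here is a rough sketch of training a linear support vector classifier on digit pixels with scikit-learn's `LinearSVC`. To stay self-contained it uses the small 8 x 8 digits data set bundled with scikit-learn as a stand-in for the Kaggle CSV, so its accuracy is not comparable to the Kaggle score. The split and the hyperparameters are illustrative guesses, not the settings from my model.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

# Stand-in data: scikit-learn's bundled 8 x 8 digits, not the Kaggle CSV.
digits = load_digits()
X = digits.data / 16.0  # pixel values range from 0 to 16; scale to [0, 1]
y = digits.target

# Hold out a quarter of the samples to estimate accuracy on unseen digits.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hyperparameters are illustrative guesses, not the author's settings.
model = LinearSVC(C=1.0, max_iter=10000)
model.fit(X_train, y_train)

val_accuracy = accuracy_score(y_val, model.predict(X_val))
print(f"validation accuracy: {val_accuracy:.3f}")
```

With the Kaggle data, the features would instead be the 784 pixel columns of `train.csv` (for example divided by 255 to bring them into the same range), and the predictions for `test.csv` would be submitted for scoring.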