Environment: Python 2.7 + OpenCV 2.4 + NumPy.
To install the OpenCV 2.4 Python bindings, just copy \opencv\build\python\2.7\cv2.pyd into \Python27\Lib\site-packages.
The handwritten digit dataset (MNIST) is available here:
http://yann.lecun.com/exdb/mnist/
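It contains 60k training images and 10k test images, each 28x28 pixels. First trim the blank margins on all four sides of each image, keeping the square region in the middle, then scale it down to 8x8 to speed things up.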
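The crop-and-shrink step could be sketched roughly as follows. This is not the code used for the results in this post: the function name `crop_and_shrink`, the "any non-zero pixel is ink" rule, and the area-averaging resize are all assumptions made for illustration. With OpenCV installed, `cv2.resize` would be the more natural way to do the scaling; plain NumPy is used here so the sketch runs on its own.

```python
import numpy

def crop_and_shrink(img, out_size=8):
    """Hypothetical helper: crop a 2-D grayscale digit to a square region
    around its non-blank pixels, then shrink it to out_size x out_size."""
    img = numpy.asarray(img, dtype=numpy.float32)
    # Rows and columns that contain at least one non-zero (ink) pixel.
    rows = numpy.flatnonzero(img.max(axis=1) > 0)
    cols = numpy.flatnonzero(img.max(axis=0) > 0)
    if rows.size == 0:
        # Completely blank image: nothing to crop.
        return numpy.zeros((out_size, out_size), numpy.float32)
    top, bottom = int(rows[0]), int(rows[-1]) + 1
    left, right = int(cols[0]), int(cols[-1]) + 1
    height, width = bottom - top, right - left
    # Pad the shorter side with zeros so the crop is square and centred.
    side = max(height, width)
    square = numpy.zeros((side, side), numpy.float32)
    r0 = (side - height) // 2
    c0 = (side - width) // 2
    square[r0:r0 + height, c0:c0 + width] = img[top:bottom, left:right]
    # Area-average resize: repeat every pixel out_size times along each
    # axis, then average side x side blocks of the enlarged image.
    big = numpy.repeat(numpy.repeat(square, out_size, axis=0), out_size, axis=1)
    return big.reshape(out_size, side, out_size, side).mean(axis=(1, 3))
```

Each 28x28 image then becomes a 64-element feature vector after flattening, for example with `crop_and_shrink(img).reshape(-1)`.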
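The SVM, kNN, neural network (NN), boosting, and RTrees (random trees) implementations all ship with the OpenCV Python bindings, so you can call them directly.
Note that cv2 requires the inputs to all of these to be NumPy arrays.
When reading the images, each pixel byte has to be converted like this: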
[numpy.float32(struct.unpack('B', item)[0])/numpy.float32(255) for item in byte]
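That is, convert explicitly to numpy.float32. Otherwise the values end up as float64, which the SVM and the other classifiers above do not accept.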
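As a rough illustration of the same conversion done on a whole file at once, the sketch below decodes the bytes of an idx3-ubyte image file into a float32 matrix. The function name `parse_idx_images` is hypothetical. The header layout (four big-endian 32-bit integers: magic number 2051, image count, rows, columns) follows the format description on the MNIST page linked above.

```python
import struct
import numpy

def parse_idx_images(raw):
    """Hypothetical helper: turn the raw bytes of an MNIST idx3-ubyte file
    into a float32 array with one flattened image per row, scaled to [0, 1]."""
    # Header: magic number, image count, rows, columns (big-endian uint32).
    magic, count, rows, cols = struct.unpack('>IIII', raw[:16])
    if magic != 2051:
        raise ValueError('not an idx3-ubyte image file')
    # The pixel bytes follow the 16-byte header, one unsigned byte each.
    pixels = numpy.frombuffer(raw, dtype=numpy.uint8,
                              count=count * rows * cols, offset=16)
    # Convert explicitly to float32 before scaling; dividing by a float32
    # scalar keeps the result float32 instead of promoting it to float64.
    samples = pixels.astype(numpy.float32) / numpy.float32(255)
    return samples.reshape(count, rows * cols)
```

It would be called on the full file contents, for example `parse_idx_images(open(trainimagepath, 'rb').read())`. An array that is already float64 can be converted with `.astype(numpy.float32)` before it is handed to a classifier.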
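Boosting failed with an error when training on all 60k items. The other classifiers had no problems.
The SVM parameters were not tuned. In fact, with tuned parameters the SVM can get its error rate below 5% with only 10k training samples.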
from cv2.cv import *
import cv2
import os
import struct
import numpy
class_n = 10
number_of_training_set = 2000 #0 for all, 60,000 max
number_of_test_set = 0 #0 for all, 10,000 max
trainimagepath = r'.\data\train-images.idx3-ubyte'
trainlabelpath = r'.\data\train-labels.idx1-ubyte'
testimagepath = r'.\data\t10k-images.idx3-ubyte'
testlabelpath = r'.\data\t10k-labels.idx1-ubyte'
def evalfun(method, y_val, test_labels, test_number_of_images):
    count = 0
    for item in range(test_number_of_images):
        if y_val[item] == test_labels[item]:
            count += 1
    print method + ':' + str(float(count)/test_number_of_images)
def unroll_samples(samples):
    sample_n, var_n = samples.shape
    new_samples = numpy.zeros((sample_n * class_n, var_n+1), numpy.float32)
    new_samples[:,:-1] = numpy.repeat(samples, class_n, axis=0)
    new_samples[:,-1] = numpy.tile(numpy.arange(class_n), sample_n)
    return new_samples
def unroll_responses(responses):
    sample_n = len(responses