Problem summary: recognize handwritten digits. Each digit is represented by a 28×28 pixel matrix, i.e. 784 pixels, and each pixel value is between 0 and 255.
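To make the layout concrete, here is a small NumPy sketch of how one flat 784-value row relates to the 28×28 image. It assumes row-major (row-by-row) pixel order and uses synthetic pixel values rather than real data from train.csv.

```python
import numpy as np

# Synthetic stand-in for one row of train.csv: 784 pixel values in 0..255.
row = np.arange(784) % 256

# Reshape the flat vector into the 28x28 image. Assuming row-major order,
# pixel (i, j) sits at flat index 28 * i + j.
image = row.reshape(28, 28)

# Flatten it back: the classifier works on one 784-long feature vector per digit.
flat = image.reshape(-1)

assert image[1, 0] == row[28]   # first pixel of the second image row
```

The classifier never sees the 2-D shape; it only compares 784-dimensional vectors, which is why the feature dimension matters for the choice of search tree.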
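Approach: KNN performs fairly well on digit recognition. Because the feature dimension is high (784), kd_tree was relatively slow, so I used KNN backed by a ball_tree. Every pixel value is normalized by binarizing it: any nonzero value becomes 1, and zeros stay 0.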
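A minimal sketch of this approach, assuming scikit-learn's `KNeighborsClassifier` with `algorithm='ball_tree'`. The data is synthetic, and the helper `binarize` (a vectorized version of the nonzero-to-1 step) and the choice `n_neighbors=3` are illustrative assumptions, not the author's settings.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def binarize(pixels):
    """Hypothetical helper: map every nonzero pixel to 1, keep zeros as 0 (returns a new array)."""
    return (np.asarray(pixels) != 0).astype(np.int64)

# Tiny synthetic stand-in for the real data: 6 "images" of 784 pixels each.
rng = np.random.RandomState(0)
X_train = rng.randint(0, 256, size=(6, 784))
X_train[:3, 392:] = 0   # class 0: ink only in the top half
X_train[3:, :392] = 0   # class 1: ink only in the bottom half
y_train = np.array([0, 0, 0, 1, 1, 1])

# KNN whose neighbor search uses a ball tree instead of a kd-tree.
knn = KNeighborsClassifier(n_neighbors=3, algorithm='ball_tree')
knn.fit(binarize(X_train), y_train)

X_test = np.zeros((2, 784), dtype=np.int64)
X_test[0, :392] = 200   # top-half ink, should look like class 0
X_test[1, 392:] = 17    # faint bottom-half ink, still 1 after binarizing
pred = knn.predict(binarize(X_test))
```

Note that after binarizing, the faint stroke in the second test image counts exactly as much as a dark one, which is the point of mapping every nonzero value to 1.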
Tools: Python 2.7, scikit-learn, PyCharm
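# -*- coding: utf-8 -*-
import csv
import numpy as ny
from sklearn.neighbors import KNeighborsClassifier

# Convert every element of a list of strings (as read from the CSV) to int, in place
def to_int(values):
    for i in range(len(values)):
        values[i] = int(values[i])
    return values

# Normalize by binarizing: every nonzero pixel becomes 1, zeros stay 0 (modifies the array in place)
def normalize(array):
    n, m = array.shape
    for i in range(n):
        for j in range(m):
            if array[i, j] != 0:
                array[i, j] = 1
    return array

# Load the training set
def load_train_data():
    train_data = []
    train_label = []
    # Binary mode ('rb') is what the csv module expects under Python 2.7
    with open('E:\\data\\kaggle\\digit recognizer\\train.csv', 'rb') as file:
        lines = csv.reader(file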