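Development platform: Google Colab + Python 3.6
Packages: pandas, sklearn
pandas is used to process the CSV files (tutorial link)
sklearn is a widely used third-party machine learning module for Python (tutorial link)
KNN explanation and usage: sklearn tutorial link
As before, start by mounting the Colab folder from Google Drive.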
from google.colab import drive
drive.mount('/content/drive/')
import os
os.chdir("/content/drive/My Drive/kaggle")
Next, import the packages to be used (matplotlib and seaborn are not actually used here; I import them out of habit).
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
# Render matplotlib plots inline in the notebook instead of in a pop-up window
%matplotlib inline
np.random.seed(2)
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
import itertools
Load the data
train_data = pd.read_csv("mnist/train.csv")
test_data = pd.read_csv("mnist/test.csv")
x_train = train_data.values[:,1:]
y_train = train_data.values[:,0]
test_value = test_data.values
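The slicing above relies on the label sitting in the first column of train.csv, with the pixel values in the remaining columns. A minimal sketch of the same split on a tiny made-up DataFrame (the column values and the `toy_` names are illustrative stand-ins, not the real file):

```python
import pandas as pd

# Hypothetical miniature stand-in for mnist/train.csv:
# a label column followed by pixel columns.
toy_train = pd.DataFrame({
    "label":  [3, 7],
    "pixel0": [0, 0],
    "pixel1": [128, 255],
    "pixel2": [64, 0],
})

toy_x = toy_train.values[:, 1:]  # every column except the first: the pixel features
toy_y = toy_train.values[:, 0]   # the first column: the digit labels

print(toy_x.shape, toy_y.shape)
```

Printing the shapes like this on the real `x_train` and `y_train` is a cheap way to confirm the split before starting a long training run.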
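Define the KNN classifier
def knnClassfiyer(value,lable):
    # Build a KNN classifier with sklearn's default settings and fit it on the training data
    knnclf = KNeighborsClassifier()
    # np.ravel flattens the labels to the 1-D array that fit expects
    knnclf.fit(value,np.ravel(lable))
    return knnclf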
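KNeighborsClassifier() is called here with its defaults, which in scikit-learn means 5 neighbours, uniform weights and Euclidean distance. As a rough illustration of the idea only, and not of scikit-learn's actual implementation, a brute-force version might look like the sketch below. The helper `knn_predict_one` and the toy points are hypothetical.

```python
from collections import Counter

import numpy as np


def knn_predict_one(train_x, train_y, query, k=5):
    """Hypothetical helper: majority vote among the k training points nearest to `query`."""
    # Euclidean distance from the query to every training point
    distances = np.sqrt(((train_x - query) ** 2).sum(axis=1))
    # Indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    # Most common label among those neighbours
    votes = Counter(train_y[nearest].tolist())
    return votes.most_common(1)[0][0]


# Two made-up clusters: label 0 near the origin, label 1 near (10, 10)
points = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
labels = np.array([0, 0, 0, 1, 1, 1])

near_origin = knn_predict_one(points, labels, np.array([0.5, 0.5]), k=3)
near_far = knn_predict_one(points, labels, np.array([9.5, 10.5]), k=3)
print(near_origin, near_far)
```

Every prediction has to be compared against the whole training set, which is why KNN is cheap to "train" but expensive to query on something the size of MNIST.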
Train and predict
knnclf = knnClassfiyer(x_train,y_train)
test_label = knnclf.predict(test_value)
This yields test_label (the predict call takes a long time).
Save the predictions (Kaggle requires a CSV file containing ImageId and Label)
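Because predicting on the full test set is slow and Kaggle only reports a score after submission, it can help to estimate accuracy locally first. train_test_split and confusion_matrix are already imported at the top but otherwise unused; one way they could be applied is sketched below. As an assumption made so the example runs anywhere, scikit-learn's bundled 8x8 digits dataset stands in for the Kaggle CSV; in the notebook, `x_train` and `y_train` would take the place of `X` and `y`.

```python
from sklearn.datasets import load_digits
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the small built-in digits dataset replaces mnist/train.csv here.
X, y = load_digits(return_X_y=True)

# Hold out 20% of the labelled data for validation
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=2)

clf = KNeighborsClassifier()  # same default settings as in the post
clf.fit(X_tr, y_tr)

val_pred = clf.predict(X_val)
accuracy = (val_pred == y_val).mean()

# Rows are true digits, columns are predicted digits
cm = confusion_matrix(y_val, val_pred, labels=list(range(10)))

print("validation accuracy:", accuracy)
print(cm)
```

Off-diagonal entries in the confusion matrix show which digits get mistaken for which, something the single Kaggle score does not reveal.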
test_label = pd.Series(test_label,name="Label")
submission = pd.concat([pd.Series(range(1,28001),name = "ImageId"),test_label],axis = 1)
submission.to_csv("mnist/Result_sklearn_KNN.csv",index=False)
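The range(1,28001) above hard-codes the 28,000 rows of the competition's test set. A slightly more defensive sketch derives the ids from the number of predictions instead; the `predictions` array below is a made-up stand-in for the output of knnclf.predict(test_value).

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the array returned by knnclf.predict(test_value)
predictions = np.array([2, 0, 9, 4])

submission = pd.DataFrame({
    "ImageId": np.arange(1, len(predictions) + 1),  # ids start at 1, one per prediction
    "Label": predictions,
})

# With no path given, to_csv returns the CSV text, which is handy for a quick look
csv_text = submission.to_csv(index=False)
print(csv_text)
```

Passing a file path to to_csv, as the post does, writes the same content to disk.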
Submit with the Kaggle API (kaggle.json must first be placed under /root/.kaggle; see the post on using Colab with Kaggle)
!mkdir -p /root/.kaggle
!cp /content/drive/'My Drive'/kaggle/kaggle.json /root/.kaggle/
!kaggle competitions submit -c digit-recognizer -f mnist/Result_sklearn_KNN.csv -m "forth submit"
When the command finishes it prints "Successfully submitted to Digit Recognizer"; the result can then be checked under My Submissions on Kaggle.