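1. Complete the Kaggle handwritten digit recognition competition: https://www.kaggle.com/c/digit-recognizer
2. Environment: Google Colab with files stored on Google Drive (it provides GPU compute, which is many times faster than my own computer)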
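3. Handwritten digit files
In train.csv, the first column is the image label and the remaining 784 (28*28) columns are pixel values.
test.csv contains only the 784 pixel columns.
sample_submission.csv is the template for the file to submit.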
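As a small illustration of the layout described above, the sketch below builds a made-up two-row array shaped like train.csv (label first, then 784 pixels) and splits it into labels and 28x28 images. The values are synthetic and are not taken from the real file.

```python
import numpy as np

# Synthetic stand-in for train.csv: 2 rows, 1 label column + 784 pixel columns
rows = np.zeros((2, 785), dtype=np.int64)
rows[0, 0] = 7                   # label of the first image
rows[1, 0] = 3                   # label of the second image
rows[0, 1 + 28 * 5 + 2] = 255    # pixel at image row 5, column 2 of image 0

labels = rows[:, 0]                        # first column: labels
images = rows[:, 1:].reshape(-1, 28, 28)   # remaining 784 columns: 28x28 images

print(labels)            # [7 3]
print(images.shape)      # (2, 28, 28)
print(images[0, 5, 2])   # 255
```

Because NumPy reshapes in row-major order, pixel number k in a row ends up at image position (k // 28, k % 28).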
4. Mounting Google Drive in Colab
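# Mount Google Drive so the notebook can read files stored there
from google.colab import drive
drive.mount('/content/drive/')
!ls "/content/drive/My Drive/"
# Check that a GPU is available (returns an empty string if there is none)
import tensorflow as tf
tf.test.gpu_device_name()
!ls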
# Change the working directory to the project folder
import os
os.chdir("/content/drive/My Drive/home")
!ls
5. Preprocessing the images
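# Preprocess the data
import keras
import pandas as pd
import numpy as np
from keras.utils import to_categorical
dataset = pd.read_csv('train.csv')
dataset = np.array(dataset)
# Columns 1..784 are pixels; scale them from [0, 255] to [0, 1]
x = dataset[:,1:]
x = x/255
# Reshape each row into a 28x28 single-channel image
x = x.reshape(-1,28,28,1)
# Column 0 is the digit label; one-hot encode it
y = dataset[:,0]
y = to_categorical(y)
print(x.shape)
print(y.shape)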
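To make the two preprocessing steps concrete, here is a minimal NumPy-only sketch of pixel scaling and one-hot encoding. The helper `one_hot` is a hypothetical stand-in written for illustration; it mimics what the Keras utility does for integer labels but is not the library's implementation.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Return a (len(labels), num_classes) array with 1.0 at each label's index."""
    labels = np.asarray(labels, dtype=np.int64)
    encoded = np.zeros((labels.size, num_classes), dtype=np.float32)
    encoded[np.arange(labels.size), labels] = 1.0
    return encoded

pixels = np.array([0, 128, 255])
scaled = pixels / 255             # maps [0, 255] onto [0, 1]
encoded = one_hot([3, 0, 9])      # illustrative labels, not real data

print(scaled.min(), scaled.max())   # 0.0 1.0
print(encoded.shape)                # (3, 10)
print(encoded[0])
```

Each encoded row contains a single 1.0, which is the target format that the `categorical_crossentropy` loss expects.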
6. Building and training the model
from keras.layers import Conv2D, MaxPooling2D, Dropout, Flatten, Dense
from keras.models import Sequential
def create_model():
    model = Sequential()
    # Two 5x5 convolutions; 'same' padding keeps the 28x28 spatial size
    model.add(Conv2D(32,(5,5), padding='same',input_shape=(28,28,1),activation='relu'))
    model.add(Conv2D(64,(5,5), padding='same',activation='relu'))
    # 2x2 max pooling halves the feature maps to 14x14
    model.add(MaxPooling2D((2,2)))
    model.add(Dropout(0.3))
    model.add(Flatten())
    model.add(Dense(64,activation= 'relu'))
    model.add(Dropout(0.3))
    model.add(Dense(128,activation= 'relu'))
    model.add(Dropout(0.3))
    # One softmax output per digit class
    model.add(Dense(10,activation= 'softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='sgd', metrics=['accuracy'])
    return model
model = create_model()
model.summary()
model.fit(x=x,y=y,epochs=10,batch_size=20)
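As a sanity check on the architecture above, the following sketch works out by hand the flattened feature size and the number of trainable parameters per layer. The helper functions are hypothetical and assume the default Keras behavior of one bias per filter or unit; the totals should match what `model.summary()` reports.

```python
def conv2d_params(kernel_h, kernel_w, in_channels, filters):
    # One weight per kernel cell per input channel, plus one bias per filter
    return (kernel_h * kernel_w * in_channels + 1) * filters

def dense_params(inputs, units):
    # One weight per input per unit, plus one bias per unit
    return (inputs + 1) * units

# 'same' padding keeps 28x28; the 2x2 max pool halves it to 14x14 with 64 channels
flat = 14 * 14 * 64

counts = [
    conv2d_params(5, 5, 1, 32),    # first convolution
    conv2d_params(5, 5, 32, 64),   # second convolution
    dense_params(flat, 64),        # first dense layer after Flatten
    dense_params(64, 128),
    dense_params(128, 10),         # softmax output layer
]
total = sum(counts)
print(flat)     # 12544
print(counts)   # [832, 51264, 802880, 8320, 1290]
print(total)    # 864586
```

Most of the parameters sit in the first dense layer, because it connects all 12544 flattened features to 64 units.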
7. Predicting on the test data
dataset2 = pd.read_csv('test.csv')
dataset2 = np.array(dataset2)
# Apply the same scaling that was used for the training pixels
dataset2 = dataset2/255
dataset2 = dataset2.reshape(-1,28,28,1)
print(dataset2.shape)
y = model.predict(dataset2)
print(y.shape)
# Pick the class with the highest predicted probability
y = np.argmax(y,axis=1)
# Fill the Label column of the submission template with the predictions
test = pd.read_csv('sample_submission.csv')
test['Label'] = y
print(test)
test.to_csv('sample_submission.csv',index=False)
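The prediction step can be illustrated in isolation with a small sketch that turns softmax outputs into labels and lays them out in the two-column submission format. The probabilities are made up, and the 1-based `ImageId` numbering is an assumption based on the sample submission file.

```python
import numpy as np
import pandas as pd

# Made-up softmax outputs for 3 images over 10 classes (not real model output)
probs = np.full((3, 10), 0.01)
probs[0, 4] = 0.91
probs[1, 0] = 0.91
probs[2, 9] = 0.91

labels = np.argmax(probs, axis=1)   # index of the most probable class per row

submission = pd.DataFrame({
    'ImageId': np.arange(1, len(labels) + 1),   # assumed to start at 1
    'Label': labels,
})
print(submission.to_csv(index=False))
```

Building the frame from scratch like this avoids depending on the contents of sample_submission.csv, at the cost of having to get the column names and numbering right yourself.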
8. Competition result
The accuracy was 98.428%; training the model for more epochs should probably push it higher.
9. Project screenshots