Kaggle Handwritten Digit Recognition: An Open-Source Write-up of a Two-Layer Neural Network Reaching 99% Accuracy

By: Zhang Yiji (张一极)

More articles on artificial intelligence: http://siligence.ai

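On Kaggle, one of the classic competitions with a large number of participants is handwritten digit recognition. As the "hello world" of deep learning, it has attracted more than two thousand teams. Today we share an approach that does not train on Kaggle's bundled dataset. Instead, we build and train the model on the MNIST handwritten digit dataset, the one most of us met when first starting out, and then predict labels for the 28,000 samples in the Kaggle competition to make a submission.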

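First, download the dataset:

Download Kaggle's test.csv test set yourself.

All code in this article builds the neural network with Python + Keras.

Put the dataset and the source code in the same directory.

First, import the Keras layers and models modules and load the data: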

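    from keras.datasets import mnist
    from keras import models
    from keras import layers
    import numpy

    # Load Kaggle's test.csv, skipping the header row
    test = numpy.loadtxt(open("test.csv", "rb"), delimiter=",", skiprows=1)
    # 28000 rows of 28*28 = 784 pixel values, scaled to [0, 1]
    test = test.reshape((28000, 28*28))
    test = test.astype('float32') / 255
    # MNIST: 60000 training images and 10000 test images
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()

The last line also loads the MNIST dataset. It ships with Keras, so it can be pulled in directly.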
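As a quick sanity check on the loading step, the sketch below runs the same `numpy.loadtxt` call on a tiny in-memory CSV instead of the real test.csv. The three-row sample, the four-pixel "images", and the helper name `load_pixels` are all made up for illustration; only the header skipping and the division by 255 mirror the code above.

```python
import io
import numpy

def load_pixels(csv_file, n_pixels):
    """Load a header + pixel-rows CSV and scale values to [0, 1]."""
    data = numpy.loadtxt(csv_file, delimiter=",", skiprows=1)
    # -1 lets numpy infer the number of rows instead of hard-coding 28000.
    data = data.reshape((-1, n_pixels))
    return data.astype("float32") / 255

# Hypothetical stand-in for test.csv: a header line plus three 4-pixel rows.
sample_csv = io.StringIO(
    "pixel0,pixel1,pixel2,pixel3\n"
    "0,255,128,0\n"
    "255,255,0,0\n"
    "0,0,0,51\n"
)
pixels = load_pixels(sample_csv, n_pixels=4)
print(pixels.shape)
print(pixels.min(), pixels.max())
```

Letting `reshape` infer the row count is a small design choice: the same helper then works for any number of rows, not only the 28,000 in the competition file.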

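    import matplotlib.pyplot as plt

    # Show the first image of the MNIST test set
    digit = test_images[0]
    plt.imshow(digit, cmap=plt.cm.binary)
    plt.show()

The code above displays the first image of the MNIST test set.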

Every sample has this form: a 28*28-pixel image of a handwritten digit.

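Usually we train on Kaggle's own dataset, and the results are not quite satisfactory. The reason is essentially that the dataset is too small, so some of the weights are not learned well. So today we try training on MNIST's 60,000-image training set instead.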

The network structure is as follows:

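    network = models.Sequential()
    # Hidden layer: 512 ReLU units fed by the 784 flattened pixels
    network.add(layers.Dense(512, activation='relu', input_shape=(28*28,)))
    # Output layer: one softmax probability per digit class
    network.add(layers.Dense(10, activation='softmax'))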

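This is a textbook setup with two fully connected layers: the first uses 512 neurons with the ReLU activation, and the second is a 10-way softmax layer, one output per digit class.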
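To get a feel for the size of this network, the sketch below counts its trainable parameters by hand. The helper name `dense_params` is made up for illustration; the arithmetic assumes a standard fully connected layer with one weight per input-unit pair and one bias per unit.

```python
def dense_params(n_inputs, n_units):
    """Weights plus one bias per unit for a fully connected layer."""
    return n_inputs * n_units + n_units

hidden = dense_params(28 * 28, 512)   # 784 inputs -> 512 ReLU units
output = dense_params(512, 10)        # 512 hidden units -> 10 classes
total = hidden + output
print(hidden, output, total)  # 401920 5130 407050
```

Almost all of the parameters sit in the first layer, which is why changing the hidden width of 512 has a much larger effect on model size than changing anything in the output layer.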

    network.compile(optimizer='rmsprop',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

This compiles the network.
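To make the loss choice concrete, here is a small pure-Python sketch of what categorical cross-entropy computes for a single sample: the negative log of the probability assigned to the true class. The function name, the clipping constant `eps`, and the example probabilities are illustrative and are not taken from Keras internals.

```python
import math

def categorical_crossentropy(one_hot_label, predicted_probs, eps=1e-7):
    """Cross-entropy for one sample: -sum(y_i * log(p_i))."""
    return -sum(
        y * math.log(max(p, eps))  # clip to avoid log(0)
        for y, p in zip(one_hot_label, predicted_probs)
    )

label = [0, 0, 1]                 # true class is index 2
confident = [0.05, 0.05, 0.90]    # most mass on the true class
unsure = [0.40, 0.40, 0.20]       # little mass on the true class

loss_confident = categorical_crossentropy(label, confident)
loss_unsure = categorical_crossentropy(label, unsure)
print(round(loss_confident, 4), round(loss_unsure, 4))
```

The loss shrinks as the predicted probability of the correct digit approaches 1, which is the quantity the training log reports as `loss`.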

    # Flatten each 28*28 image to 784 values and scale to [0, 1]
    train_images = train_images.reshape((60000, 28*28))
    train_images = train_images.astype('float32') / 255
    test_images = test_images.reshape((10000, 28*28))
    test_images = test_images.astype('float32') / 255
    digit = test_images[0]

    from keras.utils import to_categorical
    print(test_labels)
    # One-hot encode the labels to match the 10-way softmax output
    train_labels = to_categorical(train_labels, num_classes=10)
    test_labels = to_categorical(test_labels, num_classes=10)

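This formats the dataset. First, the data is reshaped into 60000 separate arrays, one per 28*28-pixel image. It is then normalized: dividing by 255 keeps the values in the range [0, 1], which helps make the results more accurate.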
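The preprocessing can be sketched with plain NumPy on a made-up mini-batch. The helpers `flatten_and_scale` and `one_hot` are hypothetical names; `one_hot` is a minimal stand-in showing the kind of output one-hot encoding produces, not the Keras implementation of `to_categorical`.

```python
import numpy

def flatten_and_scale(images):
    """Turn (n, h, w) uint8 images into (n, h*w) float32 values in [0, 1]."""
    n = images.shape[0]
    return images.reshape((n, -1)).astype("float32") / 255

def one_hot(labels, num_classes):
    """Minimal one-hot encoder: row i has a 1 in column labels[i]."""
    encoded = numpy.zeros((len(labels), num_classes), dtype="float32")
    encoded[numpy.arange(len(labels)), labels] = 1.0
    return encoded

# Hypothetical mini-batch: two 28x28 images, one all black, one all white.
images = numpy.stack([
    numpy.zeros((28, 28), dtype="uint8"),
    numpy.full((28, 28), 255, dtype="uint8"),
])
labels = numpy.array([3, 7])

x = flatten_and_scale(images)
y = one_hot(labels, num_classes=10)
print(x.shape, y.shape)
print(y[0])
```

Each row of `y` has the same length as the softmax output, which is what the categorical cross-entropy loss expects.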

Model training and classification:

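    import pandas as pd  # needed to write the submission file

    network.fit(train_images, train_labels, epochs=10, batch_size=128)
    # Pick the most probable class per image; argmax over predict() replaces
    # predict_classes, which newer Keras releases no longer provide
    results = numpy.argmax(network.predict(test, batch_size=128), axis=1)
    # Accuracy on the MNIST test set that ships with Keras
    test_loss, test_acc = network.evaluate(test_images, test_labels)
    print('test_acc:', test_acc)
    pd.DataFrame(
        {"ImageId": range(1, len(results) + 1), "Label": results}
    ).to_csv('result.csv', index=False, header=True)
    print('ok,saved')

Done: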

    Epoch 1/10
    60000/60000 [==============================] - 2s 35us/step - loss: 0.1215 - acc: 0.9646
    Epoch 2/10
    60000/60000 [==============================] - 2s 35us/step - loss: 0.0767 - acc: 0.9769
    Epoch 3/10
    60000/60000 [==============================] - 2s 34us/step - loss: 0.0554 - acc: 0.9832
    Epoch 4/10
    60000/60000 [==============================] - 2s 34us/step - loss: 0.0410 - acc: 0.9877
    Epoch 5/10
    60000/60000 [==============================] - 2s 35us/step - loss: 0.0309 - acc: 0.9910
    Epoch 6/10
    60000/60000 [==============================] - 2s 35us/step - loss: 0.0233 - acc: 0.9935
    Epoch 7/10
    60000/60000 [==============================] - 2s 34us/step - loss: 0.0178 - acc: 0.9948
    Epoch 8/10
    60000/60000 [==============================] - 2s 34us/step - loss: 0.0140 - acc: 0.9958
    Epoch 9/10
    60000/60000 [==============================] - 2s 35us/step - loss: 0.0107 - acc: 0.9969
    Epoch 10/10
    60000/60000 [==============================] - 2s 34us/step - loss: 0.0084 - acc: 0.9977
    10000/10000 [==============================] - 1s 65us/step
    test_acc: 0.9829
    ok,saved

The results file has been saved.
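The last step, turning per-class probabilities into a results file with an ImageId column and a Label column, can be sketched without Keras or pandas. The probabilities below are invented, and `to_labels` and `write_submission` are hypothetical helper names; the sketch writes to an in-memory buffer rather than result.csv.

```python
import csv
import io

def to_labels(prob_rows):
    """Pick the index of the largest probability in each row (argmax)."""
    return [max(range(len(row)), key=row.__getitem__) for row in prob_rows]

def write_submission(labels, out_file):
    """Write ImageId,Label rows; ImageId starts at 1 as in the code above."""
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(["ImageId", "Label"])
    for image_id, label in enumerate(labels, start=1):
        writer.writerow([image_id, label])

# Hypothetical softmax outputs for three images over three classes.
probs = [
    [0.1, 0.7, 0.2],
    [0.8, 0.1, 0.1],
    [0.2, 0.3, 0.5],
]
labels = to_labels(probs)
buffer = io.StringIO()
write_submission(labels, buffer)
print(buffer.getvalue())
```

For the real submission, passing an open file handle for result.csv in place of `buffer` would produce the same layout with one row per test image.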

Submit the result:

The score comes out at roughly 0.99. If you have better parameters or results, share them in the comments and we can discuss.

The complete source code is as follows:

    from keras.datasets import mnist
    from keras import models
    from keras import layers
    from keras.utils import to_categorical
    import matplotlib.pyplot as plt
    import numpy
    import pandas as pd  # needed to write the submission file

    # Load Kaggle's test.csv (header row skipped) and scale pixels to [0, 1]
    test = numpy.loadtxt(open("test.csv", "rb"), delimiter=",", skiprows=1)
    test = test.reshape((28000, 28*28))
    test = test.astype('float32') / 255

    # Load MNIST and show the first test image
    (train_images, train_labels), (test_images, test_labels) = mnist.load_data()
    digit = test_images[0]
    plt.imshow(digit, cmap=plt.cm.binary)
    plt.show()

    # Two fully connected layers: 512 ReLU units, then a 10-way softmax
    network = models.Sequential()
    network.add(layers.Dense(512, activation='relu', input_shape=(28*28,)))
    network.add(layers.Dense(10, activation='softmax'))
    network.compile(optimizer='rmsprop',
                    loss='categorical_crossentropy',
                    metrics=['accuracy'])

    # Flatten and normalize the images, one-hot encode the labels
    train_images = train_images.reshape((60000, 28*28))
    train_images = train_images.astype('float32') / 255
    test_images = test_images.reshape((10000, 28*28))
    test_images = test_images.astype('float32') / 255
    train_labels = to_categorical(train_labels, num_classes=10)
    test_labels = to_categorical(test_labels, num_classes=10)

    # Train, predict the Kaggle samples, and evaluate on the MNIST test set
    network.fit(train_images, train_labels, epochs=10, batch_size=128)
    # argmax over predict() replaces predict_classes, which newer Keras
    # releases no longer provide
    results = numpy.argmax(network.predict(test, batch_size=128), axis=1)
    test_loss, test_acc = network.evaluate(test_images, test_labels)
    print('test_acc:', test_acc)

    # Write the submission file
    pd.DataFrame(
        {"ImageId": range(1, len(results) + 1), "Label": results}
    ).to_csv('result.csv', index=False, header=True)
    print('ok,saved')

Reposted from: https://my.oschina.net/u/3999598/blog/3054889
