python数字识别kaggle论文_[Kaggle] Digit Recognizer 手写数字识别（卷积神经网络）

最新推荐文章于 2022-09-10 14:37:01 发布

weixin_39926193

最新推荐文章于 2022-09-10 14:37:01 发布

阅读量220

点赞数

文章标签： python数字识别kaggle论文

上一篇的简单神经网络，将28*28的图片展平了，每个像素在空间上的位置关系是没有考虑的，空间的信息丢失。

1. 使用 LeNet 预测

1.1 导入包

from keras import backend as K # 兼容不同后端的代码

from keras.models import Sequential

from keras.layers.convolutional import Conv2D

from keras.layers.convolutional import MaxPooling2D

from keras.layers.core import Activation

from keras.layers.core import Dense

from keras.layers.core import Flatten

from keras.utils import np_utils

from keras.optimizers import SGD, Adam, RMSprop

import numpy as np

%matplotlib inline

import matplotlib.pyplot as plt

import pandas as pd

1.2 建立 LeNet 模型

# 图片格式问题

# K.image_data_format() == 'channels_last'

# 默认是last是通道 K.set_image_dim_ordering("tf")

# K.image_data_format() == 'channels_first' # K.set_image_dim_ordering("th")

class LeNet:

@staticmethod

def build(input_shape, classes):

model = Sequential()

model.add(Conv2D(20,kernel_size=5,padding='same',

input_shape=input_shape,activation='relu'))

model.add(MaxPooling2D(pool_size=(2,2),strides=(2,2)))

model.add(Conv2D(50,kernel_size=5,padding='same',activation='relu'))

model.add(MaxPooling2D(pool_size=(2,2),strides=(2,2)))

model.add(Flatten())

model.add(Dense(500, activation='relu'))

model.add(Dense(classes,activation='softmax'))

return model

1.3 读入数据

train = pd.read_csv('train.csv')

y_train_full = train['label']

X_train_full = train.drop(['label'], axis=1)

X_test_full = pd.read_csv('test.csv')

X_train_full.shape

输出：

(42000, 784)

数据格式转换，增加一个通道维度

X_train = np.array(X_train_full).reshape(-1,28,28) / 255.0

X_test = np.array(X_test_full).reshape(-1,28,28)/255.0

y_train = np_utils.to_categorical(y_train_full, 10) # 转成oh编码

X_train = X_train[:, :, :, np.newaxis]

# m,28,28 --> m, 28, 28, 1(单通道)

X_test = X_test[:, :, :, np.newaxis]

1.4 定义模型

model = LeNet.build(input_shape=(28, 28, 1), classes=10)

定义优化器，配置模型

opt = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, decay=0.01)

model.compile(loss="categorical_crossentropy",

optimizer=opt, metrics=["accuracy"])

注意：标签不采用 one-hot 编码的话，这里使用 loss="sparse_categorical_crossentropy"

1.5 训练

history = model.fit(X_train, y_train, epochs=20, batch_size=128,

validation_split=0.2)

Epoch 1/20

263/263 [==============================] - 26s 98ms/step -

loss: 0.2554 - accuracy: 0.9235 -

val_loss: 0.0983 - val_accuracy: 0.9699

Epoch 2/20

263/263 [==============================] - 27s 103ms/step -

loss: 0.0806 - accuracy: 0.9761 -

val_loss: 0.0664 - val_accuracy: 0.9787

...

Epoch 20/20

263/263 [==============================] - 25s 97ms/step -

loss: 0.0182 - accuracy: 0.9953 -

val_loss: 0.0405 - val_accuracy: 0.9868

可以看见第2轮迭代结束，训练集准确率就 97.6%了，效果比之前的简单神经网络好很多

模型总结

model.summary()

Model: "sequential"

_________________________________________________________________

Layer (type) Output Shape Param #

=================================================================

conv2d (Conv2D) (None, 28, 28, 20) 520

_________________________________________________________________

max_pooling2d (MaxPooling2D) (None, 14, 14, 20) 0

_________________________________________________________________

conv2d_1 (Conv2D) (None, 14, 14, 50) 25050

_________________________________________________________________

max_pooling2d_1 (MaxPooling2 (None, 7, 7, 50) 0

_________________________________________________________________

flatten (Flatten) (None, 2450) 0

_________________________________________________________________

dense (Dense) (None, 500) 1225500

_________________________________________________________________

dense_1 (Dense) (None, 10) 5010

=================================================================

Total params: 1,256,080

Trainable params: 1,256,080

Non-trainable params: 0

_________________________________________________________________

绘制模型结构图

from keras.utils import plot_model

plot_model(model, './model.png', show_shapes=True)

1.6 绘制训练曲线

pd.DataFrame(history.history).plot(figsize=(8, 5))

plt.grid(True)

plt.gca().set_ylim(0, 1) # set the vertical range to [0-1]

plt.show()

1.7 预测提交

y_pred = model.predict(X_test)

pred = y_pred.argmax(axis=1).reshape(-1)

print(pred.shape)

image_id = pd.Series(range(1,len(pred)+1))

output = pd.DataFrame({'ImageId':image_id, 'Label':pred})

output.to_csv("submission_NN.csv", index=False)

LeNet 模型得分 0.98607，比上一篇的简单NN模型(得分 0.97546)，好了 1.061%

2. 使用 VGG16 迁移学习

VGG16 help 文档：

Help on function VGG16 in module tensorflow.python.keras.applications.vgg16:

VGG16(include_top=True, weights='imagenet', input_tensor=None, input_shape=None, pooling=None, classes=1000, classifier_activation='softmax')

Instantiates the VGG16 model.

Reference paper:

- [Very Deep Convolutional Networks for Large-Scale Image Recognition](

https://arxiv.org/abs/1409.1556) (ICLR 2015)

By default, it loads weights pre-trained on ImageNet. Check 'weights' for

other options.

This model can be built both with 'channels_first' data format

(channels, height, width) or 'channels_last' data format

(height, width, channels).

The default input size for this model is 224x224.

Caution: Be sure to properly pre-process your inputs to the application.

Please see `applications.vgg16.preprocess_input` for an example.

Arguments:

include_top: whether to include the 3 fully-connected

layers at the top of the network.

weights: one of `None` (random initialization),

'imagenet' (pre-training on ImageNet),

or the path to the weights file to be loaded.

input_tensor: optional Keras tensor

(i.e. output of `layers.Input()`)

to use as image input for the model.

input_shape: optional shape tuple, only to be specified

if `include_top` is False (otherwise the input shape

has to be `(224, 224, 3)`

(with `channels_last` data format)

or `(3, 224, 224)` (with `channels_first` data format).

It should have exactly 3 input channels,

and width and height should be no smaller than 32.

E.g. `(200, 200, 3)` would be one valid value.

pooling: Optional pooling mode for feature extraction

when `include_top` is `False`.

- `None` means that the output of the model will be

the 4D tensor output of the

last convolutional block.

- `avg` means that global average pooling

will be applied to the output of the

last convolutional block, and thus

the output of the model will be a 2D tensor.

- `max` means that global max pooling will

be applied.

classes: optional number of classes to classify images

into, only to be specified if `include_top` is True, and

if no `weights` argument is specified.

classifier_activation: A `str` or callable. The activation function to use

on the "top" layer. Ignored unless `include_top=True`. Set

`classifier_activation=None` to return the logits of the "top" layer.

Returns:

A `keras.Model` instance.

Raises:

ValueError: in case of invalid argument for `weights`,

or invalid input shape.

ValueError: if `classifier_activation` is not `softmax` or `None` when

using a pretrained top layer.

2.1 导入包

import numpy as np

%matplotlib inline

import matplotlib.pyplot as plt

import pandas as pd

import cv2

from keras.optimizers import Adam

from keras.models import Model

from keras.utils import np_utils

from keras.models import Sequential

from keras.layers import Flatten

from keras.layers import Dense

from keras.layers import Input

from keras.layers import Dropout

from keras.applications.vgg16 import VGG16

2.2 定义模型

vgg16 = VGG16(weights='imagenet',include_top=False,

input_shape=(32, 32, 3))

# VGG16 模型在include_top=False时，可以自定义输入大小，至少32x32，通道必须是3

mylayer = vgg16.output

mylayer = Flatten()(mylayer)

mylayer = Dense(128, activation='relu')(mylayer)

mylayer = Dropout(0.3)(mylayer)

mylayer = Dense(10, activation='softmax')(mylayer)

model = Model(inputs=vgg16.inputs, outputs=mylayer)

for layer in vgg16.layers:

layer.trainable = False # vgg16的各个层不训练

2.3 数据处理

train = pd.read_csv('train.csv')

y_train_full = train['label']

X_train_full = train.drop(['label'], axis=1)

X_test_full = pd.read_csv('test.csv')

将单通道的数据，复制成3通道的(vgg16要求3通道的)，再resize成 32*32的，vgg16 要求图片最低分辨率是 32*32

def process(data):

data = np.array(data).reshape(-1,28,28)

output = np.zeros((data.shape[0], 32, 32, 3))

for i in range(data.shape[0]):

img = data[i]

rgb_array = np.zeros((img.shape[0], img.shape[1], 3), "uint8")

rgb_array[:, :, 0], rgb_array[:, :, 1], rgb_array[:, :, 2] = img, img, img

pic = cv2.resize(rgb_array, (32, 32), interpolation=cv2.INTER_LINEAR)

output[i] = pic

output = output.astype('float32')/255.0

return output

y_train = np_utils.to_categorical(y_train_full, 10)

X_train = process(X_train_full)

X_test = process(X_test_full)

print(X_train.shape)

print(X_test.shape)

输出：

(42000, 32, 32, 3)

(28000, 32, 32, 3)

看一看处理后的图片

img = X_train[0]

plt.imshow(img)

np.set_printoptions(threshold=np.inf)# 全部显示矩阵

# print(X_train[0])

2.4 配置模型、训练

opt = Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999, decay=0.01)

model.compile(loss="categorical_crossentropy",

optimizer=opt, metrics=["accuracy"])

history = model.fit(X_train, y_train, epochs=50, batch_size=128,

validation_split=0.2)

输出：

Epoch 1/50

263/263 [==============================] - 101s 384ms/step -

loss: 0.9543 - accuracy: 0.7212 -

val_loss: 0.5429 - val_accuracy: 0.8601

...

Epoch 10/50

263/263 [==============================] - 110s 417ms/step -

loss: 0.3284 - accuracy: 0.9063 -

val_loss: 0.2698 - val_accuracy: 0.9263

...

Epoch 40/50

263/263 [==============================] - 114s 433ms/step -

loss: 0.2556 - accuracy: 0.9254 -

val_loss: 0.2121 - val_accuracy: 0.9389

...

Epoch 50/50

263/263 [==============================] - 110s 420ms/step -

loss: 0.2466 - accuracy: 0.9272 -

val_loss: 0.2058 - val_accuracy: 0.9406

model.summary()

输出：

Model: "functional_15"

_________________________________________________________________

Layer (type) Output Shape Param #

=================================================================

input_23 (InputLayer) [(None, 32, 32, 3)] 0

_________________________________________________________________

block1_conv1 (Conv2D) (None, 32, 32, 64) 1792

_________________________________________________________________

block1_conv2 (Conv2D) (None, 32, 32, 64) 36928

_________________________________________________________________

block1_pool (MaxPooling2D) (None, 16, 16, 64) 0

_________________________________________________________________

block2_conv1 (Conv2D) (None, 16, 16, 128) 73856

_________________________________________________________________

block2_conv2 (Conv2D) (None, 16, 16, 128) 147584

_________________________________________________________________

block2_pool (MaxPooling2D) (None, 8, 8, 128) 0

_________________________________________________________________

block3_conv1 (Conv2D) (None, 8, 8, 256) 295168

_________________________________________________________________

block3_conv2 (Conv2D) (None, 8, 8, 256) 590080

_________________________________________________________________

block3_conv3 (Conv2D) (None, 8, 8, 256) 590080

_________________________________________________________________

block3_pool (MaxPooling2D) (None, 4, 4, 256) 0

_________________________________________________________________

block4_conv1 (Conv2D) (None, 4, 4, 512) 1180160

_________________________________________________________________

block4_conv2 (Conv2D) (None, 4, 4, 512) 2359808

_________________________________________________________________

block4_conv3 (Conv2D) (None, 4, 4, 512) 2359808

_________________________________________________________________

block4_pool (MaxPooling2D) (None, 2, 2, 512) 0

_________________________________________________________________

block5_conv1 (Conv2D) (None, 2, 2, 512) 2359808

_________________________________________________________________

block5_conv2 (Conv2D) (None, 2, 2, 512) 2359808

_________________________________________________________________

block5_conv3 (Conv2D) (None, 2, 2, 512) 2359808

_________________________________________________________________

block5_pool (MaxPooling2D) (None, 1, 1, 512) 0

_________________________________________________________________

flatten_19 (Flatten) (None, 512) 0

_________________________________________________________________

dense_28 (Dense) (None, 128) 65664

_________________________________________________________________

dropout_9 (Dropout) (None, 128) 0

_________________________________________________________________

dense_29 (Dense) (None, 10) 1290

=================================================================

Total params: 14,781,642

Trainable params: 66,954

Non-trainable params: 14,714,688

_________________________________________________________________

绘制模型结构

from keras.utils import plot_model

plot_model(model, './model.png', show_shapes=True)

2.5 预测提交

y_pred = model.predict(X_test)

pred = y_pred.argmax(axis=1).reshape(-1)

print(pred.shape)

print(pred)

image_id = pd.Series(range(1,len(pred)+1))

output = pd.DataFrame({'ImageId':image_id, 'Label':pred})

output.to_csv("submission_NN.csv", index=False)

预测得分：0.93696

可能是由于 VGG16模型是用 224*224 的图片训练的权重，我们使用的是 28*28 的图片，可能不能很好的使用VGG16已经训练好的权重

长按或扫码关注我的公众号(Michael阿明)，一起加油、一起学习进步！

weixin_39926193

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
python数字识别kaggle论文_[Kaggle] Digit Recognizer 手写数字识别（卷积神经网络）

上一篇的简单神经网络，将28*28的图片展平了，每个像素在空间上的位置关系是没有考虑的，空间的信息丢失。1. 使用 LeNet 预测1.1 导入包from keras import backend as K # 兼容不同后端的代码from keras.models import Sequentialfrom keras.layers.convolutional import Conv2Dfrom ...
复制链接

扫一扫

python数字识别kaggle论文_[Kaggle] Digit Recognizer 手写数字识别（卷积神经网络）

“相关推荐”对你有帮助么？