I. Background
The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each containing 10000 images. The test batch contains exactly 1000 randomly selected images from each class. The training batches contain the remaining images in random order, so an individual training batch may contain more images from one class than from another; taken together, the training batches contain exactly 5000 images from each class. The classes in the dataset, together with 10 random images from each class, are shown below:
The archive contains the files data_batch_1, data_batch_2, ..., data_batch_5, as well as test_batch. Each of these files is a Python "pickled" object produced with cPickle.
The dataset contains one more file, called batches.meta. It too holds a Python dictionary object, with the following entry: label_names, a 10-element list that gives meaningful names to the numeric labels in the label arrays described above. For example, label_names[0] == "airplane", label_names[1] == "automobile", and so on.
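For reference, each batch can be read back with Python's pickle module. The sketch below is a minimal example, assuming the archive has been extracted to ./cifar-10-batches-py/ (the default folder name):

import pickle
import numpy as np

def unpickle(path):
    # Each file is a pickled dict whose keys are byte strings (b'data', b'labels', ...).
    with open(path, 'rb') as f:
        return pickle.load(f, encoding='bytes')

batch = unpickle('./cifar-10-batches-py/data_batch_1')   # path is an assumption
# b'data' is a 10000 x 3072 uint8 array; reshape to channels-first, then to HWC.
images = batch[b'data'].reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
labels = np.array(batch[b'labels'])

meta = unpickle('./cifar-10-batches-py/batches.meta')
label_names = [name.decode() for name in meta[b'label_names']]
print(images.shape, labels.shape, label_names[labels[0]])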
The file structure of the downloaded cifar10 dataset is shown in the figure below:
II. Experimental Procedure
1. Custom neural network
The structure of the custom neural network is as follows: three convolutional layers and two pooling layers, followed by a flatten layer and two fully connected layers; the final output is a 1x10 array corresponding to the 10 object classes. As the summary shows, both the total and the trainable parameter counts are 122,570.
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 30, 30, 32) 896
max_pooling2d (MaxPooling2D) (None, 15, 15, 32) 0
conv2d_1 (Conv2D) (None, 13, 13, 64) 18496
max_pooling2d_1 (MaxPooling2D) (None, 6, 6, 64) 0
conv2d_2 (Conv2D) (None, 4, 4, 64) 36928
flatten (Flatten) (None, 1024) 0
dense (Dense) (None, 64) 65600
dense_1 (Dense) (None, 10) 650
Total params: 122,570
Trainable params: 122,570
Non-trainable params: 0
The figure below shows the code used to build the custom network; all three convolutional layers use the ReLU activation function.
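The code itself appears only as a figure in the report; the following minimal Keras sketch is consistent with the summary above (layer hyperparameters are inferred from the output shapes and parameter counts):

from tensorflow.keras import layers, models

model = models.Sequential([
    # 32x32x3 input -> 30x30x32, 896 params (3*3*3*32 + 32)
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)),
    layers.MaxPooling2D((2, 2)),                   # -> 15x15x32
    layers.Conv2D(64, (3, 3), activation='relu'),  # -> 13x13x64, 18496 params
    layers.MaxPooling2D((2, 2)),                   # -> 6x6x64
    layers.Conv2D(64, (3, 3), activation='relu'),  # -> 4x4x64, 36928 params
    layers.Flatten(),                              # -> 1024
    layers.Dense(64, activation='relu'),           # 65600 params
    layers.Dense(10),                              # 650 params, logits for the 10 classes
])
model.summary()   # Total params: 122,570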
The figure below shows the ReLU function. For x > 0 the derivative is exactly 1, so there is no gradient attenuation. Although ReLU alleviates the vanishing-gradient problem, it introduces another one: the dying-ReLU problem. For x < 0 the function is hard-saturated and the derivative is exactly 0; once an input falls into this region the neuron stops updating its weights, a phenomenon known as neuron death. A small consolation is that good initialization and a well-chosen learning rate keep the probability of dead neurons fairly low. Another advantage of ReLU is that it is very cheap to compute: only a threshold comparison is needed, and the derivative requires almost no computation. For these two reasons, ReLU converges much faster than sigmoid and tanh. A third advantage is sparsity: values below 0 are set directly to 0, so the intermediate outputs of the network are sparse, which acts somewhat like Dropout and helps prevent overfitting to some extent.
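A quick check of the two gradient regimes, written here for illustration only:

import tensorflow as tf

x = tf.Variable([-2.0, -0.5, 0.5, 2.0])
with tf.GradientTape() as tape:
    y = tf.nn.relu(x)            # max(0, x)
grad = tape.gradient(y, x)
print(y.numpy())     # [0.  0.  0.5 2. ]
print(grad.numpy())  # [0. 0. 1. 1.]  -> 0 in the saturated region, 1 elsewhere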
2. CIFAR-10 image classification based on VGG16
The VGG16 model works well for both classification and localization tasks. VGG16 is built from 13 convolutional layers arranged in 5 blocks, 3 fully connected layers, and a softmax output layer; the blocks are separated by max-pooling layers, and all hidden units use the ReLU activation function. The basic structure of VGG-16 is shown in the figure below.
The structure and parameter counts of the VGG16-based model used here are shown below: the total number of parameters is 8,954,186, of which 8,951,882 are trainable.
Model: "sequential"
Layer (type) Output Shape Param #
conv2d (Conv2D) (None, 32, 32, 64) 1792
activation (Activation) (None, 32, 32, 64) 0
batch_normalization (BatchNormalization) (None, 32, 32, 64) 256
dropout (Dropout) (None, 32, 32, 64) 0
conv2d_1 (Conv2D) (None, 32, 32, 64) 36928
activation_1 (Activation) (None, 32, 32, 64) 0
batch_normalization_1 (BatchNormalization) (None, 32, 32, 64) 256
max_pooling2d (MaxPooling2D) (None, 16, 16, 64) 0
conv2d_2 (Conv2D) (None, 16, 16, 128) 73856
activation_2 (Activation) (None, 16, 16, 128) 0
batch_normalization_2 (BatchNormalization) (None, 16, 16, 128) 512
dropout_1 (Dropout) (None, 16, 16, 128) 0
conv2d_3 (Conv2D) (None, 16, 16, 128) 147584
activation_3 (Activation) (None, 16, 16, 128) 0
batch_normalization_3 (BatchNormalization) (None, 16, 16, 128) 512
max_pooling2d_1 (MaxPooling2D) (None, 8, 8, 128) 0
conv2d_4 (Conv2D) (None, 8, 8, 256) 295168
activation_4 (Activation) (None, 8, 8, 256) 0
batch_normalization_4 (BatchNormalization) (None, 8, 8, 256) 1024
dropout_2 (Dropout) (None, 8, 8, 256) 0
flatten (Flatten) (None, 16384) 0
dense (Dense) (None, 512) 8389120
activation_5 (Activation) (None, 512) 0
batch_normalization_5 (BatchNormalization) (None, 512) 2048
dropout_3 (Dropout) (None, 512) 0
dense_1 (Dense) (None, 10) 5130
activation_6 (Activation) (None, 10) 0
Total params: 8,954,186
Trainable params: 8,951,882
Non-trainable params: 2,304
Part of the VGG16 model definition in Python is shown below:
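Since the definition appears only as a figure, the sketch below reconstructs the layer sequence exactly as listed in the summary above; the dropout rates and the softmax in the final Activation layer are assumptions, as they are not visible in the summary:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(64, (3, 3), padding='same', input_shape=(32, 32, 3)),  # 1792 params
    layers.Activation('relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.3),                  # rate assumed; not shown in the summary
    layers.Conv2D(64, (3, 3), padding='same'),     # 36928 params
    layers.Activation('relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),          # 32x32 -> 16x16

    layers.Conv2D(128, (3, 3), padding='same'),    # 73856 params
    layers.Activation('relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.4),                  # rate assumed
    layers.Conv2D(128, (3, 3), padding='same'),    # 147584 params
    layers.Activation('relu'),
    layers.BatchNormalization(),
    layers.MaxPooling2D((2, 2)),          # 16x16 -> 8x8

    layers.Conv2D(256, (3, 3), padding='same'),    # 295168 params
    layers.Activation('relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.4),                  # rate assumed

    layers.Flatten(),                     # 8*8*256 = 16384
    layers.Dense(512),                    # 8389120 params
    layers.Activation('relu'),
    layers.BatchNormalization(),
    layers.Dropout(0.5),                  # rate assumed
    layers.Dense(10),                     # 5130 params
    layers.Activation('softmax'),         # output activation assumed to be softmax
])
model.summary()   # Total params: 8,954,186 / Trainable params: 8,951,882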
III. Results and Evaluation
1. Custom neural network
The figure below shows the result of visualizing the cifar10 images.
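The figure itself is not reproduced here; a sketch of the kind of code that produces such a grid (the class-name list and the 5x5 grid size are my choices):

import matplotlib.pyplot as plt
from tensorflow.keras import datasets

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
class_names = ['airplane', 'automobile', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck']

plt.figure(figsize=(10, 10))
for i in range(25):
    plt.subplot(5, 5, i + 1)
    plt.xticks([]); plt.yticks([])
    plt.imshow(train_images[i])
    plt.xlabel(class_names[train_labels[i][0]])  # labels are stored with shape (N, 1)
plt.show()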
The figure below shows the training process:
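The training code is likewise shown as a figure; the sketch below is one compile-and-fit setup consistent with the log that follows (the Adam optimizer and the from_logits loss are assumptions; 1563 steps per epoch matches the default batch size of 32 on 50000 training images):

import tensorflow as tf
from tensorflow.keras import datasets

(train_images, train_labels), (test_images, test_labels) = datasets.cifar10.load_data()
train_images, test_images = train_images / 255.0, test_images / 255.0  # scale pixels to [0, 1]

model.compile(optimizer='adam',   # assumed
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])

history = model.fit(train_images, train_labels, epochs=10,
                    validation_data=(test_images, test_labels))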
The training output below shows that the per-step loss decreases and the accuracy increases with every epoch; the final val_accuracy is 0.7021.
Epoch 1/10 1563/1563 - 47s 22ms/step - loss: 1.4941 - accuracy: 0.4534 - val_loss: 1.2436 - val_accuracy: 0.5529
Epoch 2/10 1563/1563 - 31s 20ms/step - loss: 1.1299 - accuracy: 0.5983 - val_loss: 1.0409 - val_accuracy: 0.6295
Epoch 3/10 1563/1563 - 30s 19ms/step - loss: 0.9870 - accuracy: 0.6520 - val_loss: 0.9618 - val_accuracy: 0.6647
Epoch 4/10 1563/1563 - 29s 19ms/step - loss: 0.8997 - accuracy: 0.6838 - val_loss: 0.9349 - val_accuracy: 0.6730
Epoch 5/10 1563/1563 - 29s 18ms/step - loss: 0.8251 - accuracy: 0.7110 - val_loss: 0.8920 - val_accuracy: 0.6907
Epoch 6/10 1563/1563 - 29s 19ms/step - loss: 0.7651 - accuracy: 0.7314 - val_loss: 0.9284 - val_accuracy: 0.6786
Epoch 7/10 1563/1563 - 29s 18ms/step - loss: 0.7124 - accuracy: 0.7511 - val_loss: 0.8382 - val_accuracy: 0.7118
Epoch 8/10 1563/1563 - 29s 19ms/step - loss: 0.6739 - accuracy: 0.7612 - val_loss: 0.8627 - val_accuracy: 0.7104
Epoch 9/10 1563/1563 - 28s 18ms/step - loss: 0.6336 - accuracy: 0.7772 - val_loss: 0.8597 - val_accuracy: 0.7125
Epoch 10/10 1563/1563 - 27s 17ms/step - loss: 0.5928 - accuracy: 0.7903 - val_loss: 0.8885 - val_accuracy: 0.7021
The figure below shows the training accuracy and val_accuracy over the 10 epochs: accuracy rises steadily, while the rise in val_accuracy slows down in the later epochs.
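The curves can be reproduced from the fit history returned above; a sketch:

import matplotlib.pyplot as plt

plt.plot(history.history['accuracy'], label='accuracy')
plt.plot(history.history['val_accuracy'], label='val_accuracy')
plt.xlabel('Epoch')
plt.ylabel('Accuracy')
plt.legend(loc='lower right')
plt.show()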
2. CIFAR-10 image classification based on VGG16
The figure below shows the training process.
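As before, the training code appears only as a figure; the sketch below is one setup consistent with the log (the sparse_categorical_accuracy metric comes from the log itself, while the optimizer, the batch size of 256, and the 10% validation split are assumptions inferred from the 176 steps per epoch):

# Continuing from the VGG16-based model and the CIFAR-10 arrays loaded earlier.
model.compile(optimizer='adam',                        # assumed
              loss='sparse_categorical_crossentropy',  # matches the softmax output
              metrics=['sparse_categorical_accuracy'])

history = model.fit(train_images, train_labels,
                    batch_size=256,        # assumed: 45000 / 256 ~ 176 steps per epoch
                    epochs=30,
                    validation_split=0.1)  # assumed 45000/5000 train/validation split

model.evaluate(test_images, test_labels, verbose=2)    # produces the 313/313 line further below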
The training output is listed below. On my machine (GTX 1050 Ti) each epoch took roughly 6 minutes, and the whole training run lasted about 3 hours. The final val_sparse_categorical_accuracy is 0.8416, a considerable improvement over the custom network.
Epoch 1/30 176/176 [==============================] - 405s 2s/step - loss: 1.7683 -sparse_categorical_accuracy: 0.4373 - val_loss: 6.6153 - val_sparse_categorical_accuracy: 0.0990
Epoch 2/30 176/176 [==============================] - 376s 2s/step - loss: 1.1255 -sparse_categorical_accuracy: 0.6113 - val_loss: 3.4102 - val_sparse_categorical_accuracy: 0.2654
Epoch 3/30 176/176 [==============================] - 372s 2s/step - loss: 0.8594 -sparse_categorical_accuracy: 0.7013 - val_loss: 1.6747 - val_sparse_categorical_accuracy: 0.4878
Epoch 4/30 176/176 [==============================] - 363s 2s/step - loss: 0.7047 -sparse_categorical_accuracy: 0.7525 - val_loss: 0.8144 - val_sparse_categorical_accuracy: 0.7320
Epoch 5/30 176/176 [==============================] - 342s 2s/step - loss: 0.5976 -sparse_categorical_accuracy: 0.7910 - val_loss: 0.7348 - val_sparse_categorical_accuracy: 0.7636
Epoch 6/30 176/176 [==============================] - 341s 2s/step - loss: 0.5096 -sparse_categorical_accuracy: 0.8200 - val_loss: 0.7354 - val_sparse_categorical_accuracy: 0.7760
Epoch 7/30 176/176 [==============================] - 340s 2s/step - loss: 0.4427 -sparse_categorical_accuracy: 0.8454 - val_loss: 0.7187 - val_sparse_categorical_accuracy: 0.7764
Epoch 8/30 176/176 [==============================] - 342s 2s/step - loss: 0.3774 -sparse_categorical_accuracy: 0.8694 - val_loss: 0.7170 - val_sparse_categorical_accuracy: 0.7892
Epoch 9/30 176/176 [==============================] - 340s 2s/step - loss: 0.3332 -sparse_categorical_accuracy: 0.8835 - val_loss: 0.6022 - val_sparse_categorical_accuracy: 0.8180
Epoch 10/30 176/176 [==============================] - 340s 2s/step - loss: 0.2839 -sparse_categorical_accuracy: 0.9001 - val_loss: 0.7328 - val_sparse_categorical_accuracy: 0.7820
Epoch 11/30 176/176 [==============================] - 341s 2s/step - loss: 0.2487 -sparse_categorical_accuracy: 0.9130 - val_loss: 0.6756 - val_sparse_categorical_accuracy: 0.8160
Epoch 12/30 176/176 [==============================] - 341s 2s/step - loss: 0.2211 -sparse_categorical_accuracy: 0.9223 - val_loss: 0.7470 - val_sparse_categorical_accuracy: 0.8072
Epoch 13/30 176/176 [==============================] - 339s 2s/step - loss: 0.1993 -sparse_categorical_accuracy: 0.9290 - val_loss: 0.7039 - val_sparse_categorical_accuracy: 0.8140
Epoch 14/30 176/176 [==============================] - 341s 2s/step - loss: 0.1797 -sparse_categorical_accuracy: 0.9364 - val_loss: 0.6957 - val_sparse_categorical_accuracy: 0.8180
Epoch 15/30 176/176 [==============================] - 340s 2s/step - loss: 0.1686 -sparse_categorical_accuracy: 0.9399 - val_loss: 0.6386 - val_sparse_categorical_accuracy: 0.8244
Epoch 16/30 176/176 [==============================] - 338s 2s/step - loss: 0.1500 -sparse_categorical_accuracy: 0.9458 - val_loss: 0.7909 - val_sparse_categorical_accuracy: 0.8088
Epoch 17/30 176/176 [==============================] - 338s 2s/step - loss: 0.1378 -sparse_categorical_accuracy: 0.9528 - val_loss: 0.6029 - val_sparse_categorical_accuracy: 0.8410
Epoch 18/30 176/176 [==============================] - 339s 2s/step - loss: 0.1312 -sparse_categorical_accuracy: 0.9531 - val_loss: 0.7029 - val_sparse_categorical_accuracy: 0.8226
Epoch 19/30 176/176 [==============================] - 340s 2s/step - loss: 0.1221 -sparse_categorical_accuracy: 0.9576 - val_loss: 0.6307 - val_sparse_categorical_accuracy: 0.8416
Epoch 20/30 176/176 [==============================] - 340s 2s/step - loss: 0.1207 -sparse_categorical_accuracy: 0.9564 - val_loss: 0.7292 - val_sparse_categorical_accuracy: 0.8246
Epoch 21/30 176/176 [==============================] - 339s 2s/step - loss: 0.1134 -sparse_categorical_accuracy: 0.9605 - val_loss: 0.6793 - val_sparse_categorical_accuracy: 0.8324
Epoch 22/30 176/176 [==============================] - 339s 2s/step - loss: 0.1037 -sparse_categorical_accuracy: 0.9633 - val_loss: 0.7182 - val_sparse_categorical_accuracy: 0.8314
Epoch 23/30 176/176 [==============================] - 339s 2s/step - loss: 0.1004 -sparse_categorical_accuracy: 0.9652 - val_loss: 0.7064 - val_sparse_categorical_accuracy: 0.8294
Epoch 24/30 176/176 [==============================] - 345s 2s/step - loss: 0.0989 -sparse_categorical_accuracy: 0.9658 - val_loss: 0.6996 - val_sparse_categorical_accuracy: 0.8380
Epoch 25/30 176/176 [==============================] - 340s 2s/step - loss: 0.0978 -sparse_categorical_accuracy: 0.9657 - val_loss: 0.7072 - val_sparse_categorical_accuracy: 0.8386
Epoch 26/30 176/176 [==============================] - 339s 2s/step - loss: 0.0859 -sparse_categorical_accuracy: 0.9705 - val_loss: 0.7050 - val_sparse_categorical_accuracy: 0.8354
Epoch 27/30 176/176 [==============================] - 338s 2s/step - loss: 0.0880 -sparse_categorical_accuracy: 0.9693 - val_loss: 0.7319 - val_sparse_categorical_accuracy: 0.8368
Epoch 28/30 176/176 [==============================] - 339s 2s/step - loss: 0.0859 -sparse_categorical_accuracy: 0.9693 - val_loss: 0.7067 - val_sparse_categorical_accuracy: 0.8394
Epoch 29/30 176/176 [==============================] - 339s 2s/step - loss: 0.0830 -sparse_categorical_accuracy: 0.9714 - val_loss: 0.7131 - val_sparse_categorical_accuracy: 0.8366
Epoch 30/30 176/176 [==============================] - 338s 2s/step - loss: 0.0758 -sparse_categorical_accuracy: 0.9740 - val_loss: 0.7441 - val_sparse_categorical_accuracy: 0.8416
313/313 - 15s - loss: 0.8027 - sparse_categorical_accuracy: 0.8243 - 15s/epoch - 47ms/step
313/313 [==============================] - 15s 47ms/step
The figure below shows the training loss and test loss: the training loss flattens out towards the end, while the test loss, after its initial drop, fluctuates around 0.8.
The figure below shows the training accuracy and test accuracy: the training accuracy keeps rising, though only slowly after about epoch 5, while the test accuracy stabilizes after its initial rise and fluctuates around 0.82.
The figure below visualizes the prediction results. I used the model to predict 100 images from cifar10; images captioned in red were predicted incorrectly and images captioned in black were predicted correctly. 85 of the 100 images are predicted correctly, an accuracy of 85%.
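A sketch of how such a visualization can be produced (the 10x10 grid and the red/black colour convention follow the description above; variable names continue from the earlier sketches):

import numpy as np
import matplotlib.pyplot as plt

# Predict the first 100 test images and compare with the ground truth.
probs = model.predict(test_images[:100])
pred_labels = np.argmax(probs, axis=1)
true_labels = test_labels[:100].flatten()

plt.figure(figsize=(12, 12))
for i in range(100):
    plt.subplot(10, 10, i + 1)
    plt.xticks([]); plt.yticks([])
    plt.imshow(test_images[i])
    colour = 'black' if pred_labels[i] == true_labels[i] else 'red'
    plt.title(class_names[pred_labels[i]], color=colour, fontsize=7)
plt.tight_layout()
plt.show()

print('correct:', np.sum(pred_labels == true_labels), '/ 100')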