This post is a study-log entry from the [365天深度学习训练营](https://mp.weixin.qq.com/s/0dvHCa0oFnW8SCp3JpzKxg)
Original author: [K同学啊](https://mtyjkh.blog.csdn.net/)
I. Preliminary Work
1. Set up the GPU
import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")
if gpus:
tf.config.experimental.set_memory_growth(gpus[0], True)
tf.config.set_visible_devices(gpus[0],"GPU")
2. Import the data
import matplotlib.pyplot as plt
plt.rcParams['axes.unicode_minus'] = False
import os, PIL, pathlib
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers, models
data_dir = './bird_photos'
data_dir = pathlib.Path(data_dir)
image_count = len(list(data_dir.glob('*/*')))
print("The number of images: ", image_count)
II. Data Processing
1. Load the data
Use the image_dataset_from_directory method to load the images from disk into a tf.data.Dataset.
batch_size = 8
img_width = 224
img_height = 224
train_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="training",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split=0.2,
subset="validation",
seed=123,
image_size=(img_height, img_width),
batch_size=batch_size
)
class_names = train_ds.class_names
print(class_names)
2. Visualize the data
plt.figure(figsize=(10,5))
plt.suptitle("data showing")
for images, labels in train_ds.take(1):
for i in range(8):
ax = plt.subplot(2, 4, i+1)
plt.imshow(images[i].numpy().astype("uint8"))
plt.title(class_names[labels[i]])
plt.axis("off")
3. Re-check the data
for image_batch, labels_batch in train_ds:
print(image_batch.shape)
print(labels_batch.shape)
break
4. Configure the dataset
AUTOTUNE = tf.data.AUTOTUNE
train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
III. Introduction to Residual Networks (ResNet)
1. What problem do residual networks solve?
Residual networks were introduced to address the degradation problem that appears when a network has many hidden layers. Degradation means that as depth grows, the network's accuracy first saturates and then drops sharply, and this drop is not caused by overfitting.
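The intuition behind the residual connection can be sketched without any framework (a minimal illustration, not part of the original code): instead of learning the target mapping H(x) directly, each block learns the residual F(x) = H(x) - x and outputs F(x) + x. If the optimal mapping is close to the identity, the block only has to push F toward zero.

```python
def plain_block(x, f):
    # A plain stack must learn the full mapping H(x) = f(x).
    return f(x)

def residual_block(x, f):
    # A residual block learns only the correction f(x);
    # the identity shortcut carries x through unchanged.
    return f(x) + x

# If the learned weights are pushed toward zero, the residual block
# degenerates to the identity, so adding layers cannot make the
# network worse than its shallower counterpart.
f_near_zero = lambda x: 0.0 * x
print(plain_block(5.0, f_near_zero))     # 0.0 -- plain block collapses the signal
print(residual_block(5.0, f_near_zero))  # 5.0 -- residual block preserves it
```

This is why very deep ResNets remain trainable: the shortcut gives gradients a direct path back to earlier layers.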
2. ResNet-50 overview
ResNet-50 is built from two basic blocks, the Conv Block and the Identity Block. In an Identity Block the input and output have the same shape, so the shortcut is a plain identity connection and the blocks can be stacked directly. A Conv Block changes the feature-map shape (its first convolution is strided), so its shortcut branch needs its own 1x1 convolution to match dimensions before the two paths are added.
IV. Building the ResNet-50 Model
from keras import layers
from keras.models import Model
from keras.layers import Input, Activation, BatchNormalization, Flatten
from keras.layers import Dense, Conv2D, MaxPooling2D, ZeroPadding2D, AveragePooling2D
def identity_block(input_tensor, kernel_size, filters, stage, block):
filters1, filters2, filters3 = filters
name_base = str(stage) + block + '_identity_block_block_'
x = Conv2D(filters1, (1,1), name=name_base + 'conv1')(input_tensor)
x = BatchNormalization(name=name_base + 'bn1')(x)
x = Activation('relu', name=name_base + 'relu1')(x)
x = Conv2D(filters2, kernel_size, padding='same', name=name_base + 'conv2')(x)
x = BatchNormalization(name=name_base + 'bn2')(x)
x = Activation('relu', name=name_base + 'relu2')(x)
x = Conv2D(filters3, (1,1), name=name_base + 'conv3')(x)
x = BatchNormalization(name=name_base + 'bn3')(x)
x = layers.add([x, input_tensor], name=name_base + 'add')
x = Activation('relu', name=name_base + 'relu4')(x)
return x
def conv_block(input_tensor, kernel_size, filters, stage, block, strides=(2, 2)):
filters1, filters2, filters3 = filters
res_name_base = str(stage) + block + '_conv_block_res_'
name_base = str(stage) + block + '_conv_block_'
x = Conv2D(filters1, (1,1), strides=strides, name=name_base + 'conv1')(input_tensor)
x = BatchNormalization(name=name_base + 'bn1')(x)
x = Activation('relu', name=name_base + 'relu1')(x)
x = Conv2D(filters2, kernel_size, padding='same', name=name_base + 'conv2')(x)
x = BatchNormalization(name=name_base + 'bn2')(x)
x = Activation('relu', name=name_base + 'relu2')(x)
x = Conv2D(filters3, (1,1), name=name_base + 'conv3')(x)
x = BatchNormalization(name=name_base + 'bn3')(x)
shortcut = Conv2D(filters3, (1,1), strides=strides, name=res_name_base + 'conv')(input_tensor)
shortcut = BatchNormalization(name=name_base + 'bn')(shortcut)
x = layers.add([x, shortcut], name=name_base + 'add')
x = Activation('relu', name=name_base + 'relu4')(x)
return x
def ResNet50(input_shape=[224,224,3], classes=1000):
img_input = Input(shape=input_shape)
x = ZeroPadding2D((3, 3))(img_input)
x = Conv2D(64, (7,7), strides=(2,2), name='conv1')(x)
x = BatchNormalization(name='bn_conv1')(x)
x = Activation('relu')(x)
x = MaxPooling2D((3, 3), strides=(2,2))(x)
x = conv_block(x, 3, [64, 64, 256], stage=2, block='a', strides=(1, 1))
x = identity_block(x, 3, [64, 64, 256], stage=2, block='b')
x = identity_block(x, 3, [64, 64, 256], stage=2, block='c')
x = conv_block(x, 3, [128, 128, 512], stage=3, block='a')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='b')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='c')
x = identity_block(x, 3, [128, 128, 512], stage=3, block='d')
x = conv_block(x, 3, [256, 256, 1024], stage=4, block='a')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='b')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='c')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='d')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='e')
x = identity_block(x, 3, [256, 256, 1024], stage=4, block='f')
x = conv_block(x, 3, [512, 512, 2048], stage=5, block='a')
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='b')
x = identity_block(x, 3, [512, 512, 2048], stage=5, block='c')
x = AveragePooling2D((7,7), name='avg_pool')(x)
x = Flatten()(x)
x = Dense(classes, activation='softmax', name='fc1000')(x)
model = Model(img_input, x, name='resnet50')
model.load_weights("/content/drive/MyDrive/Colab Notebooks/第8天/resnet50_weights_tf_dim_ordering_tf_kernels.h5")
return model
model = ResNet50()
model.summary()
Model: "resnet50"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0 []
zero_padding2d_1 (ZeroPadd (None, 230, 230, 3) 0 ['input_2[0][0]']
ing2D)
conv1 (Conv2D) (None, 112, 112, 64) 9472 ['zero_padding2d_1[0][0]']
bn_conv1 (BatchNormalizati (None, 112, 112, 64) 256 ['conv1[0][0]']
on)
activation_1 (Activation) (None, 112, 112, 64) 0 ['bn_conv1[0][0]']
max_pooling2d_1 (MaxPoolin (None, 55, 55, 64) 0 ['activation_1[0][0]']
g2D)
2a_conv_block_conv1 (Conv2 (None, 55, 55, 64) 4160 ['max_pooling2d_1[0][0]']
D)
2a_conv_block_bn1 (BatchNo (None, 55, 55, 64) 256 ['2a_conv_block_conv1[0][0]']
rmalization)
2a_conv_block_relu1 (Activ (None, 55, 55, 64) 0 ['2a_conv_block_bn1[0][0]']
ation)
2a_conv_block_conv2 (Conv2 (None, 55, 55, 64) 36928 ['2a_conv_block_relu1[0][0]']
D)
2a_conv_block_bn2 (BatchNo (None, 55, 55, 64) 256 ['2a_conv_block_conv2[0][0]']
rmalization)
2a_conv_block_relu2 (Activ (None, 55, 55, 64) 0 ['2a_conv_block_bn2[0][0]']
ation)
2a_conv_block_conv3 (Conv2 (None, 55, 55, 256) 16640 ['2a_conv_block_relu2[0][0]']
D)
2a_conv_block_res_conv (Co (None, 55, 55, 256) 16640 ['max_pooling2d_1[0][0]']
nv2D)
2a_conv_block_bn3 (BatchNo (None, 55, 55, 256) 1024 ['2a_conv_block_conv3[0][0]']
rmalization)
2a_conv_block_bn (BatchNor (None, 55, 55, 256) 1024 ['2a_conv_block_res_conv[0][0]
malization) ']
2a_conv_block_add (Add) (None, 55, 55, 256) 0 ['2a_conv_block_bn3[0][0]',
'2a_conv_block_bn[0][0]']
2a_conv_block_relu4 (Activ (None, 55, 55, 256) 0 ['2a_conv_block_add[0][0]']
ation)
2b_identity_block_block_co (None, 55, 55, 64) 16448 ['2a_conv_block_relu4[0][0]']
nv1 (Conv2D)
2b_identity_block_block_bn (None, 55, 55, 64) 256 ['2b_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
2b_identity_block_block_re (None, 55, 55, 64) 0 ['2b_identity_block_block_bn1[
lu1 (Activation) 0][0]']
2b_identity_block_block_co (None, 55, 55, 64) 36928 ['2b_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
2b_identity_block_block_bn (None, 55, 55, 64) 256 ['2b_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
2b_identity_block_block_re (None, 55, 55, 64) 0 ['2b_identity_block_block_bn2[
lu2 (Activation) 0][0]']
2b_identity_block_block_co (None, 55, 55, 256) 16640 ['2b_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
2b_identity_block_block_bn (None, 55, 55, 256) 1024 ['2b_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
2b_identity_block_block_ad (None, 55, 55, 256) 0 ['2b_identity_block_block_bn3[
d (Add) 0][0]',
'2a_conv_block_relu4[0][0]']
2b_identity_block_block_re (None, 55, 55, 256) 0 ['2b_identity_block_block_add[
lu4 (Activation) 0][0]']
2c_identity_block_block_co (None, 55, 55, 64) 16448 ['2b_identity_block_block_relu
nv1 (Conv2D) 4[0][0]']
2c_identity_block_block_bn (None, 55, 55, 64) 256 ['2c_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
2c_identity_block_block_re (None, 55, 55, 64) 0 ['2c_identity_block_block_bn1[
lu1 (Activation) 0][0]']
2c_identity_block_block_co (None, 55, 55, 64) 36928 ['2c_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
2c_identity_block_block_bn (None, 55, 55, 64) 256 ['2c_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
2c_identity_block_block_re (None, 55, 55, 64) 0 ['2c_identity_block_block_bn2[
lu2 (Activation) 0][0]']
2c_identity_block_block_co (None, 55, 55, 256) 16640 ['2c_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
2c_identity_block_block_bn (None, 55, 55, 256) 1024 ['2c_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
2c_identity_block_block_ad (None, 55, 55, 256) 0 ['2c_identity_block_block_bn3[
d (Add) 0][0]',
'2b_identity_block_block_relu
4[0][0]']
2c_identity_block_block_re (None, 55, 55, 256) 0 ['2c_identity_block_block_add[
lu4 (Activation) 0][0]']
3a_conv_block_conv1 (Conv2 (None, 28, 28, 128) 32896 ['2c_identity_block_block_relu
D) 4[0][0]']
3a_conv_block_bn1 (BatchNo (None, 28, 28, 128) 512 ['3a_conv_block_conv1[0][0]']
rmalization)
3a_conv_block_relu1 (Activ (None, 28, 28, 128) 0 ['3a_conv_block_bn1[0][0]']
ation)
3a_conv_block_conv2 (Conv2 (None, 28, 28, 128) 147584 ['3a_conv_block_relu1[0][0]']
D)
3a_conv_block_bn2 (BatchNo (None, 28, 28, 128) 512 ['3a_conv_block_conv2[0][0]']
rmalization)
3a_conv_block_relu2 (Activ (None, 28, 28, 128) 0 ['3a_conv_block_bn2[0][0]']
ation)
3a_conv_block_conv3 (Conv2 (None, 28, 28, 512) 66048 ['3a_conv_block_relu2[0][0]']
D)
3a_conv_block_res_conv (Co (None, 28, 28, 512) 131584 ['2c_identity_block_block_relu
nv2D) 4[0][0]']
3a_conv_block_bn3 (BatchNo (None, 28, 28, 512) 2048 ['3a_conv_block_conv3[0][0]']
rmalization)
3a_conv_block_bn (BatchNor (None, 28, 28, 512) 2048 ['3a_conv_block_res_conv[0][0]
malization) ']
3a_conv_block_add (Add) (None, 28, 28, 512) 0 ['3a_conv_block_bn3[0][0]',
'3a_conv_block_bn[0][0]']
3a_conv_block_relu4 (Activ (None, 28, 28, 512) 0 ['3a_conv_block_add[0][0]']
ation)
3b_identity_block_block_co (None, 28, 28, 128) 65664 ['3a_conv_block_relu4[0][0]']
nv1 (Conv2D)
3b_identity_block_block_bn (None, 28, 28, 128) 512 ['3b_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
3b_identity_block_block_re (None, 28, 28, 128) 0 ['3b_identity_block_block_bn1[
lu1 (Activation) 0][0]']
3b_identity_block_block_co (None, 28, 28, 128) 147584 ['3b_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
3b_identity_block_block_bn (None, 28, 28, 128) 512 ['3b_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
3b_identity_block_block_re (None, 28, 28, 128) 0 ['3b_identity_block_block_bn2[
lu2 (Activation) 0][0]']
3b_identity_block_block_co (None, 28, 28, 512) 66048 ['3b_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
3b_identity_block_block_bn (None, 28, 28, 512) 2048 ['3b_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
3b_identity_block_block_ad (None, 28, 28, 512) 0 ['3b_identity_block_block_bn3[
d (Add) 0][0]',
'3a_conv_block_relu4[0][0]']
3b_identity_block_block_re (None, 28, 28, 512) 0 ['3b_identity_block_block_add[
lu4 (Activation) 0][0]']
3c_identity_block_block_co (None, 28, 28, 128) 65664 ['3b_identity_block_block_relu
nv1 (Conv2D) 4[0][0]']
3c_identity_block_block_bn (None, 28, 28, 128) 512 ['3c_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
3c_identity_block_block_re (None, 28, 28, 128) 0 ['3c_identity_block_block_bn1[
lu1 (Activation) 0][0]']
3c_identity_block_block_co (None, 28, 28, 128) 147584 ['3c_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
3c_identity_block_block_bn (None, 28, 28, 128) 512 ['3c_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
3c_identity_block_block_re (None, 28, 28, 128) 0 ['3c_identity_block_block_bn2[
lu2 (Activation) 0][0]']
3c_identity_block_block_co (None, 28, 28, 512) 66048 ['3c_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
3c_identity_block_block_bn (None, 28, 28, 512) 2048 ['3c_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
3c_identity_block_block_ad (None, 28, 28, 512) 0 ['3c_identity_block_block_bn3[
d (Add) 0][0]',
'3b_identity_block_block_relu
4[0][0]']
3c_identity_block_block_re (None, 28, 28, 512) 0 ['3c_identity_block_block_add[
lu4 (Activation) 0][0]']
3d_identity_block_block_co (None, 28, 28, 128) 65664 ['3c_identity_block_block_relu
nv1 (Conv2D) 4[0][0]']
3d_identity_block_block_bn (None, 28, 28, 128) 512 ['3d_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
3d_identity_block_block_re (None, 28, 28, 128) 0 ['3d_identity_block_block_bn1[
lu1 (Activation) 0][0]']
3d_identity_block_block_co (None, 28, 28, 128) 147584 ['3d_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
3d_identity_block_block_bn (None, 28, 28, 128) 512 ['3d_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
3d_identity_block_block_re (None, 28, 28, 128) 0 ['3d_identity_block_block_bn2[
lu2 (Activation) 0][0]']
3d_identity_block_block_co (None, 28, 28, 512) 66048 ['3d_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
3d_identity_block_block_bn (None, 28, 28, 512) 2048 ['3d_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
3d_identity_block_block_ad (None, 28, 28, 512) 0 ['3d_identity_block_block_bn3[
d (Add) 0][0]',
'3c_identity_block_block_relu
4[0][0]']
3d_identity_block_block_re (None, 28, 28, 512) 0 ['3d_identity_block_block_add[
lu4 (Activation) 0][0]']
4a_conv_block_conv1 (Conv2 (None, 14, 14, 256) 131328 ['3d_identity_block_block_relu
D) 4[0][0]']
4a_conv_block_bn1 (BatchNo (None, 14, 14, 256) 1024 ['4a_conv_block_conv1[0][0]']
rmalization)
4a_conv_block_relu1 (Activ (None, 14, 14, 256) 0 ['4a_conv_block_bn1[0][0]']
ation)
4a_conv_block_conv2 (Conv2 (None, 14, 14, 256) 590080 ['4a_conv_block_relu1[0][0]']
D)
4a_conv_block_bn2 (BatchNo (None, 14, 14, 256) 1024 ['4a_conv_block_conv2[0][0]']
rmalization)
4a_conv_block_relu2 (Activ (None, 14, 14, 256) 0 ['4a_conv_block_bn2[0][0]']
ation)
4a_conv_block_conv3 (Conv2 (None, 14, 14, 1024) 263168 ['4a_conv_block_relu2[0][0]']
D)
4a_conv_block_res_conv (Co (None, 14, 14, 1024) 525312 ['3d_identity_block_block_relu
nv2D) 4[0][0]']
4a_conv_block_bn3 (BatchNo (None, 14, 14, 1024) 4096 ['4a_conv_block_conv3[0][0]']
rmalization)
4a_conv_block_bn (BatchNor (None, 14, 14, 1024) 4096 ['4a_conv_block_res_conv[0][0]
malization) ']
4a_conv_block_add (Add) (None, 14, 14, 1024) 0 ['4a_conv_block_bn3[0][0]',
'4a_conv_block_bn[0][0]']
4a_conv_block_relu4 (Activ (None, 14, 14, 1024) 0 ['4a_conv_block_add[0][0]']
ation)
4b_identity_block_block_co (None, 14, 14, 256) 262400 ['4a_conv_block_relu4[0][0]']
nv1 (Conv2D)
4b_identity_block_block_bn (None, 14, 14, 256) 1024 ['4b_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
4b_identity_block_block_re (None, 14, 14, 256) 0 ['4b_identity_block_block_bn1[
lu1 (Activation) 0][0]']
4b_identity_block_block_co (None, 14, 14, 256) 590080 ['4b_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
4b_identity_block_block_bn (None, 14, 14, 256) 1024 ['4b_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
4b_identity_block_block_re (None, 14, 14, 256) 0 ['4b_identity_block_block_bn2[
lu2 (Activation) 0][0]']
4b_identity_block_block_co (None, 14, 14, 1024) 263168 ['4b_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
4b_identity_block_block_bn (None, 14, 14, 1024) 4096 ['4b_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
4b_identity_block_block_ad (None, 14, 14, 1024) 0 ['4b_identity_block_block_bn3[
d (Add) 0][0]',
'4a_conv_block_relu4[0][0]']
4b_identity_block_block_re (None, 14, 14, 1024) 0 ['4b_identity_block_block_add[
lu4 (Activation) 0][0]']
4c_identity_block_block_co (None, 14, 14, 256) 262400 ['4b_identity_block_block_relu
nv1 (Conv2D) 4[0][0]']
4c_identity_block_block_bn (None, 14, 14, 256) 1024 ['4c_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
4c_identity_block_block_re (None, 14, 14, 256) 0 ['4c_identity_block_block_bn1[
lu1 (Activation) 0][0]']
4c_identity_block_block_co (None, 14, 14, 256) 590080 ['4c_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
4c_identity_block_block_bn (None, 14, 14, 256) 1024 ['4c_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
4c_identity_block_block_re (None, 14, 14, 256) 0 ['4c_identity_block_block_bn2[
lu2 (Activation) 0][0]']
4c_identity_block_block_co (None, 14, 14, 1024) 263168 ['4c_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
4c_identity_block_block_bn (None, 14, 14, 1024) 4096 ['4c_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
4c_identity_block_block_ad (None, 14, 14, 1024) 0 ['4c_identity_block_block_bn3[
d (Add) 0][0]',
'4b_identity_block_block_relu
4[0][0]']
4c_identity_block_block_re (None, 14, 14, 1024) 0 ['4c_identity_block_block_add[
lu4 (Activation) 0][0]']
4d_identity_block_block_co (None, 14, 14, 256) 262400 ['4c_identity_block_block_relu
nv1 (Conv2D) 4[0][0]']
4d_identity_block_block_bn (None, 14, 14, 256) 1024 ['4d_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
4d_identity_block_block_re (None, 14, 14, 256) 0 ['4d_identity_block_block_bn1[
lu1 (Activation) 0][0]']
4d_identity_block_block_co (None, 14, 14, 256) 590080 ['4d_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
4d_identity_block_block_bn (None, 14, 14, 256) 1024 ['4d_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
4d_identity_block_block_re (None, 14, 14, 256) 0 ['4d_identity_block_block_bn2[
lu2 (Activation) 0][0]']
4d_identity_block_block_co (None, 14, 14, 1024) 263168 ['4d_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
4d_identity_block_block_bn (None, 14, 14, 1024) 4096 ['4d_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
4d_identity_block_block_ad (None, 14, 14, 1024) 0 ['4d_identity_block_block_bn3[
d (Add) 0][0]',
'4c_identity_block_block_relu
4[0][0]']
4d_identity_block_block_re (None, 14, 14, 1024) 0 ['4d_identity_block_block_add[
lu4 (Activation) 0][0]']
4e_identity_block_block_co (None, 14, 14, 256) 262400 ['4d_identity_block_block_relu
nv1 (Conv2D) 4[0][0]']
4e_identity_block_block_bn (None, 14, 14, 256) 1024 ['4e_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
4e_identity_block_block_re (None, 14, 14, 256) 0 ['4e_identity_block_block_bn1[
lu1 (Activation) 0][0]']
4e_identity_block_block_co (None, 14, 14, 256) 590080 ['4e_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
4e_identity_block_block_bn (None, 14, 14, 256) 1024 ['4e_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
4e_identity_block_block_re (None, 14, 14, 256) 0 ['4e_identity_block_block_bn2[
lu2 (Activation) 0][0]']
4e_identity_block_block_co (None, 14, 14, 1024) 263168 ['4e_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
4e_identity_block_block_bn (None, 14, 14, 1024) 4096 ['4e_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
4e_identity_block_block_ad (None, 14, 14, 1024) 0 ['4e_identity_block_block_bn3[
d (Add) 0][0]',
'4d_identity_block_block_relu
4[0][0]']
4e_identity_block_block_re (None, 14, 14, 1024) 0 ['4e_identity_block_block_add[
lu4 (Activation) 0][0]']
4f_identity_block_block_co (None, 14, 14, 256) 262400 ['4e_identity_block_block_relu
nv1 (Conv2D) 4[0][0]']
4f_identity_block_block_bn (None, 14, 14, 256) 1024 ['4f_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
4f_identity_block_block_re (None, 14, 14, 256) 0 ['4f_identity_block_block_bn1[
lu1 (Activation) 0][0]']
4f_identity_block_block_co (None, 14, 14, 256) 590080 ['4f_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
4f_identity_block_block_bn (None, 14, 14, 256) 1024 ['4f_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
4f_identity_block_block_re (None, 14, 14, 256) 0 ['4f_identity_block_block_bn2[
lu2 (Activation) 0][0]']
4f_identity_block_block_co (None, 14, 14, 1024) 263168 ['4f_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
4f_identity_block_block_bn (None, 14, 14, 1024) 4096 ['4f_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
4f_identity_block_block_ad (None, 14, 14, 1024) 0 ['4f_identity_block_block_bn3[
d (Add) 0][0]',
'4e_identity_block_block_relu
4[0][0]']
4f_identity_block_block_re (None, 14, 14, 1024) 0 ['4f_identity_block_block_add[
lu4 (Activation) 0][0]']
5a_conv_block_conv1 (Conv2 (None, 7, 7, 512) 524800 ['4f_identity_block_block_relu
D) 4[0][0]']
5a_conv_block_bn1 (BatchNo (None, 7, 7, 512) 2048 ['5a_conv_block_conv1[0][0]']
rmalization)
5a_conv_block_relu1 (Activ (None, 7, 7, 512) 0 ['5a_conv_block_bn1[0][0]']
ation)
5a_conv_block_conv2 (Conv2 (None, 7, 7, 512) 2359808 ['5a_conv_block_relu1[0][0]']
D)
5a_conv_block_bn2 (BatchNo (None, 7, 7, 512) 2048 ['5a_conv_block_conv2[0][0]']
rmalization)
5a_conv_block_relu2 (Activ (None, 7, 7, 512) 0 ['5a_conv_block_bn2[0][0]']
ation)
5a_conv_block_conv3 (Conv2 (None, 7, 7, 2048) 1050624 ['5a_conv_block_relu2[0][0]']
D)
5a_conv_block_res_conv (Co (None, 7, 7, 2048) 2099200 ['4f_identity_block_block_relu
nv2D) 4[0][0]']
5a_conv_block_bn3 (BatchNo (None, 7, 7, 2048) 8192 ['5a_conv_block_conv3[0][0]']
rmalization)
5a_conv_block_bn (BatchNor (None, 7, 7, 2048) 8192 ['5a_conv_block_res_conv[0][0]
malization) ']
5a_conv_block_add (Add) (None, 7, 7, 2048) 0 ['5a_conv_block_bn3[0][0]',
'5a_conv_block_bn[0][0]']
5a_conv_block_relu4 (Activ (None, 7, 7, 2048) 0 ['5a_conv_block_add[0][0]']
ation)
5b_identity_block_block_co (None, 7, 7, 512) 1049088 ['5a_conv_block_relu4[0][0]']
nv1 (Conv2D)
5b_identity_block_block_bn (None, 7, 7, 512) 2048 ['5b_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
5b_identity_block_block_re (None, 7, 7, 512) 0 ['5b_identity_block_block_bn1[
lu1 (Activation) 0][0]']
5b_identity_block_block_co (None, 7, 7, 512) 2359808 ['5b_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
5b_identity_block_block_bn (None, 7, 7, 512) 2048 ['5b_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
5b_identity_block_block_re (None, 7, 7, 512) 0 ['5b_identity_block_block_bn2[
lu2 (Activation) 0][0]']
5b_identity_block_block_co (None, 7, 7, 2048) 1050624 ['5b_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
5b_identity_block_block_bn (None, 7, 7, 2048) 8192 ['5b_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
5b_identity_block_block_ad (None, 7, 7, 2048) 0 ['5b_identity_block_block_bn3[
d (Add) 0][0]',
'5a_conv_block_relu4[0][0]']
5b_identity_block_block_re (None, 7, 7, 2048) 0 ['5b_identity_block_block_add[
lu4 (Activation) 0][0]']
5c_identity_block_block_co (None, 7, 7, 512) 1049088 ['5b_identity_block_block_relu
nv1 (Conv2D) 4[0][0]']
5c_identity_block_block_bn (None, 7, 7, 512) 2048 ['5c_identity_block_block_conv
1 (BatchNormalization) 1[0][0]']
5c_identity_block_block_re (None, 7, 7, 512) 0 ['5c_identity_block_block_bn1[
lu1 (Activation) 0][0]']
5c_identity_block_block_co (None, 7, 7, 512) 2359808 ['5c_identity_block_block_relu
nv2 (Conv2D) 1[0][0]']
5c_identity_block_block_bn (None, 7, 7, 512) 2048 ['5c_identity_block_block_conv
2 (BatchNormalization) 2[0][0]']
5c_identity_block_block_re (None, 7, 7, 512) 0 ['5c_identity_block_block_bn2[
lu2 (Activation) 0][0]']
5c_identity_block_block_co (None, 7, 7, 2048) 1050624 ['5c_identity_block_block_relu
nv3 (Conv2D) 2[0][0]']
5c_identity_block_block_bn (None, 7, 7, 2048) 8192 ['5c_identity_block_block_conv
3 (BatchNormalization) 3[0][0]']
5c_identity_block_block_ad (None, 7, 7, 2048) 0 ['5c_identity_block_block_bn3[
d (Add) 0][0]',
'5b_identity_block_block_relu
4[0][0]']
5c_identity_block_block_re (None, 7, 7, 2048) 0 ['5c_identity_block_block_add[
lu4 (Activation) 0][0]']
avg_pool (AveragePooling2D (None, 1, 1, 2048) 0 ['5c_identity_block_block_relu
) 4[0][0]']
flatten_1 (Flatten) (None, 2048) 0 ['avg_pool[0][0]']
fc1000 (Dense) (None, 1000) 2049000 ['flatten_1[0][0]']
==================================================================================================
Total params: 25636712 (97.80 MB)
Trainable params: 25583592 (97.59 MB)
Non-trainable params: 53120 (207.50 KB)
__________________________________________________________________________________________________
V. Compilation
Before the model can be trained, it needs a few more settings, all added in the compile step:
·Loss function (loss): measures the model's accuracy during training.
·Optimizer (optimizer): determines how the model is updated based on the data it sees and its own loss function.
·Metrics (metrics): used to monitor the training and testing steps. The example below uses accuracy, the fraction of images that are correctly classified.
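As a framework-free illustration of why sparse_categorical_crossentropy fits this pipeline (image_dataset_from_directory yields integer labels, not one-hot vectors), the per-sample loss is just the negative log of the probability the model assigns to the true class. The helper name and sample values below are illustrative, not from the original code:

```python
import math

def sparse_ce(probs, label):
    # The integer label indexes the predicted probability vector
    # directly, so no one-hot encoding of the targets is needed.
    return -math.log(probs[label])

# A 4-class prediction (e.g. the 4 bird classes) scored against two labels:
probs = [0.7, 0.1, 0.1, 0.1]
print(round(sparse_ce(probs, 0), 4))  # 0.3567 -- confident and correct: low loss
print(round(sparse_ce(probs, 2), 4))  # 2.3026 -- wrong class: high loss
```

Keras computes the same quantity batch-wise from the softmax outputs and the integer label tensor.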
opt = tf.keras.optimizers.Adam(learning_rate=1e-7)
model.compile(optimizer=opt,  # use the small-learning-rate Adam defined above (the weights are pretrained)
loss='sparse_categorical_crossentropy',
metrics=['accuracy'])
VI. Training the Model
epochs = 10
history = model.fit(
train_ds,
validation_data=val_ds,
epochs=epochs
)
VII. Model Evaluation
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(epochs)
plt.figure(figsize=(12,4))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
VIII. Prediction
plt.figure(figsize=(10, 5))
for images, labels in val_ds.take(1):
for i in range(8):
ax = plt.subplot(2, 4, i+1)
plt.imshow(images[i].numpy().astype("uint8"))
img_array = tf.expand_dims(images[i], 0)
predictions = model.predict(img_array)
plt.title(class_names[np.argmax(predictions)])
plt.axis('off')
IX. PyTorch Implementation
import torch
import torchvision
import torch.nn as nn
import os, PIL, pathlib, warnings
from torchvision import transforms, datasets
warnings.filterwarnings("ignore")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
device
data_dir = '/content/drive/MyDrive/Colab Notebooks/第8天/bird_photos'
data_dir = pathlib.Path(data_dir)
data_paths = list(data_dir.glob("*"))
classeNames = [str(path).split("/")[-1] for path in data_paths]
train_transforms = transforms.Compose([
transforms.Resize([224, 224]),
transforms.RandomHorizontalFlip(),
transforms.ToTensor(),
transforms.Normalize(
mean=[0.485, 0.456, 0.406],
std=[0.229, 0.224, 0.225])
])
total_data = datasets.ImageFolder('/content/drive/MyDrive/Colab Notebooks/第8天/bird_photos', transform=train_transforms)
total_data.class_to_idx
train_size = int(len(total_data)*0.8)
test_size = len(total_data) - train_size
train_dataset, test_dataset = torch.utils.data.random_split(total_data, [train_size, test_size])
train_dataset, test_dataset
batch_size = 8
train_dl = torch.utils.data.DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=1)
test_dl = torch.utils.data.DataLoader(test_dataset, batch_size=batch_size, shuffle=True, num_workers=1)
for X, y in test_dl:
print("Shape of X [N, C, H, W]: ", X.shape)
print("Shape of y: ", y.shape, y.dtype)
break
import torch.nn.functional as F
def autopad(k, p=None):
if p is None:
p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
return p
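A quick sanity check of autopad (repeated here so the snippet is self-contained): for stride 1, padding k // 2 per side keeps the spatial size unchanged for odd kernels, mirroring Keras's padding='same'.

```python
def autopad(k, p=None):
    # Default to 'same' padding: k // 2 per side for an odd kernel k;
    # for a tuple/list kernel, compute the padding per dimension.
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]
    return p

print(autopad(3))       # 1 -- a 3x3 conv keeps H and W at stride 1
print(autopad(7))       # 3 -- likewise for the 7x7 stem conv
print(autopad((3, 5)))  # [1, 2] -- per-dimension padding for a tuple kernel
print(autopad(3, 0))    # 0 -- an explicit padding overrides the default
```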
class IdentityBlock(nn.Module):
def __init__(self,in_channels,kernel_size,out_channels):
super(IdentityBlock, self).__init__()
out_channels1, out_channels2, out_channels3 = out_channels
self.conv1 = nn.Sequential(
nn.Conv2d(in_channels, out_channels1, kernel_size=1),
nn.BatchNorm2d(out_channels1),
nn.ReLU(),
)
self.conv2 = nn.Sequential(
nn.Conv2d(out_channels1, out_channels2, kernel_size=kernel_size, padding=autopad(kernel_size)),
nn.BatchNorm2d(out_channels2),
nn.ReLU()
)
self.conv3 = nn.Sequential(
nn.Conv2d(out_channels2, out_channels3, kernel_size=1),
nn.BatchNorm2d(out_channels3)
)
def forward(self,x):
x1 = self.conv1(x)
x1 = self.conv2(x1)
x1 = self.conv3(x1)
x = x1 + x
x = nn.ReLU()(x)
return x
class ConvBlock(nn.Module):
def __init__(self, in_channels, kernel_size, out_channels, stride=2):
super(ConvBlock, self).__init__()
out_channels1, out_channels2, out_channels3 = out_channels
self.conv1 = nn.Sequential(
nn.Conv2d(in_channels, out_channels1, kernel_size=1, stride=stride),
nn.BatchNorm2d(out_channels1),
nn.ReLU()
)
self.conv2 = nn.Sequential(
nn.Conv2d(out_channels1, out_channels2, kernel_size=kernel_size, padding=autopad(kernel_size)),
nn.BatchNorm2d(out_channels2),
nn.ReLU()
)
self.conv3 = nn.Sequential(
nn.Conv2d(out_channels2, out_channels3, kernel_size=1),
nn.BatchNorm2d(out_channels3),
nn.ReLU()
)
self.shortcut = nn.Sequential(
nn.Conv2d(in_channels, out_channels3, kernel_size=1, stride=stride),
nn.BatchNorm2d(out_channels3)
)
def forward(self, x):
x1 = self.conv1(x)
x1 = self.conv2(x1)
x1 = self.conv3(x1)
x2 = self.shortcut(x)
x = x1 + x2
x = nn.ReLU()(x)
return x
class ResNet50(nn.Module):
def __init__(self, classes=1000):
super(ResNet50, self).__init__()
self.conv1 = nn.Sequential(
nn.Conv2d(3, 64, 7, stride=2, padding=3, padding_mode="zeros"),
nn.BatchNorm2d(64),
nn.ReLU(),
nn.MaxPool2d(kernel_size=3, stride=2)
)
self.conv2 = nn.Sequential(
ConvBlock(64, 3, [64,64,256], stride=1),
IdentityBlock(256, 3, [64,64,256]),
IdentityBlock(256, 3, [64,64,256])
)
self.conv3 = nn.Sequential(
ConvBlock(256, 3, [128, 128, 512]),
IdentityBlock(512, 3, [128, 128, 512]),
IdentityBlock(512, 3, [128, 128, 512]),
IdentityBlock(512, 3, [128, 128, 512])
)
self.conv4 = nn.Sequential(
ConvBlock(512, 3, [256, 256, 1024]),
IdentityBlock(1024, 3, [256, 256, 1024]),
IdentityBlock(1024, 3, [256, 256, 1024]),
IdentityBlock(1024, 3, [256, 256, 1024]),
IdentityBlock(1024, 3, [256, 256, 1024]),
IdentityBlock(1024, 3, [256, 256, 1024])
)
self.conv5 = nn.Sequential(
ConvBlock(1024, 3, [512, 512, 2048]),
IdentityBlock(2048, 3, [512, 512, 2048]),
IdentityBlock(2048, 3, [512, 512, 2048])
)
self.pool = nn.AvgPool2d((7,7))
self.fc = nn.Linear(2048, classes)  # output size follows the classes argument
def forward(self,x):
x = self.conv1(x)
x = self.conv2(x)
x = self.conv3(x)
x = self.conv4(x)
x = self.conv5(x)
x = self.pool(x)
x = torch.flatten(x, start_dim=1)
x = self.fc(x)
return x
model = ResNet50(classes=4).to(device)  # 4 bird classes in this dataset
model
import torchsummary as summary
summary.summary(model, (3, 224, 224))
def train(dataloader, model, loss_fn, optimizer):
size = len(dataloader.dataset)
num_batches = len(dataloader)
train_loss, train_acc = 0,0
for X, y in dataloader:
X, y = X.to(device), y.to(device)
pred = model(X)
loss = loss_fn(pred, y)
optimizer.zero_grad()
loss.backward()
optimizer.step()
train_acc += (pred.argmax(1) == y ).type(torch.float).sum().item()
train_loss += loss.item()
train_acc /= size
train_loss /= num_batches
return train_acc, train_loss
def test(dataloader, model, loss_fn):
size = len(dataloader.dataset)
num_batches = len(dataloader)
test_loss, test_acc = 0, 0
with torch.no_grad():
for imgs, target in dataloader:
imgs, target = imgs.to(device), target.to(device)
target_pred = model(imgs)
loss = loss_fn(target_pred, target)
test_loss += loss.item()
test_acc += (target_pred.argmax(1) == target).type(torch.float).sum().item()
test_acc /= size
test_loss /= num_batches
return test_loss, test_acc
import copy
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.CrossEntropyLoss()
epochs = 10
train_loss = []
train_acc = []
test_loss = []
test_acc = []
best_acc = 0  # best accuracy seen so far, used to select the best model
for epoch in range(epochs):
model.train()
epoch_train_acc, epoch_train_loss = train(train_dl, model, loss_fn, optimizer)
model.eval()
epoch_test_acc, epoch_test_loss = test(test_dl, model, loss_fn)
# keep a copy of the best model in best_model
if epoch_test_acc > best_acc:
best_acc = epoch_test_acc
best_model = copy.deepcopy(model)
train_acc.append(epoch_train_acc)
train_loss.append(epoch_train_loss)
test_acc.append(epoch_test_acc)
test_loss.append(epoch_test_loss)
# get the current learning rate
lr = optimizer.state_dict()['param_groups'][0]['lr']
template = ('Epoch:{:2d}, Train_acc:{:.1f}%, Train_loss:{:.3f}, Test_acc:{:.1f}%, Test_loss:{:.3f}, Lr:{:.2E}')
print(template.format(epoch + 1, epoch_train_acc * 100, epoch_train_loss,epoch_test_acc * 100, epoch_test_loss, lr))
# 保存最佳模型到文件中
PATH = './best_model.pth' # 保存的参数文件名
torch.save(model.state_dict(), PATH)
print('Done')
import matplotlib.pyplot as plt
import warnings

warnings.filterwarnings("ignore")             # suppress warnings
plt.rcParams['font.sans-serif'] = ['SimHei']  # display Chinese labels correctly
plt.rcParams['axes.unicode_minus'] = False    # display minus signs correctly
plt.rcParams['figure.dpi'] = 100              # figure resolution

epochs_range = range(epochs)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, train_acc, label='Training Accuracy')
plt.plot(epochs_range, test_acc, label='Test Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, train_loss, label='Training Loss')
plt.plot(epochs_range, test_loss, label='Test Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
from PIL import Image

classes = list(total_data.class_to_idx)

def predict_one_image(image_path, model, transform, classes):
    test_img = Image.open(image_path).convert('RGB')
    plt.imshow(test_img)  # show the image being predicted
    test_img = transform(test_img)
    img = test_img.to(device).unsqueeze(0)
    model.eval()
    with torch.no_grad():
        output = model(img)
    _, pred = torch.max(output, 1)
    pred_class = classes[pred.item()]
    print(f'Predicted class: {pred_class}')

predict_one_image(image_path='/content/drive/MyDrive/Colab Notebooks/第8天/bird_photos/Bananaquit/007.jpg',
                  model=model,
                  transform=train_transforms,
                  classes=classes)
best_model.eval()
epoch_test_acc, epoch_test_loss = test(test_dl, best_model, loss_fn)
print(epoch_test_acc, epoch_test_loss)
Output
device(type='cuda')
total_data.class_to_idx
{'Bananaquit': 0,
'Black Skimmer': 1,
'Black Throated Bushtiti': 2,
'Cockatoo': 3}
Shape of X [N, C, H, W]: torch.Size([8, 3, 224, 224])
Shape of y: torch.Size([8]) torch.int64
ResNet50(
(conv1): Sequential(
(0): Conv2d(3, 64, kernel_size=(7, 7), stride=(2, 2), padding=(3, 3))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
)
(conv2): Sequential(
(0): ConvBlock(
(conv1): Sequential(
(0): Conv2d(64, 64, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(shortcut): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(256, 64, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(64, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv3): Sequential(
(0): ConvBlock(
(conv1): Sequential(
(0): Conv2d(256, 128, kernel_size=(1, 1), stride=(2, 2))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(shortcut): Sequential(
(0): Conv2d(256, 512, kernel_size=(1, 1), stride=(2, 2))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(3): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(512, 128, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(128, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv4): Sequential(
(0): ConvBlock(
(conv1): Sequential(
(0): Conv2d(512, 256, kernel_size=(1, 1), stride=(2, 2))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(shortcut): Sequential(
(0): Conv2d(512, 1024, kernel_size=(1, 1), stride=(2, 2))
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(3): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(4): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(5): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(1024, 256, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(256, 1024, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(1024, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(conv5): Sequential(
(0): ConvBlock(
(conv1): Sequential(
(0): Conv2d(1024, 512, kernel_size=(1, 1), stride=(2, 2))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(shortcut): Sequential(
(0): Conv2d(1024, 2048, kernel_size=(1, 1), stride=(2, 2))
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(1): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
(2): IdentityBlock(
(conv1): Sequential(
(0): Conv2d(2048, 512, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv2): Sequential(
(0): Conv2d(512, 512, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
(1): BatchNorm2d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(conv3): Sequential(
(0): Conv2d(512, 2048, kernel_size=(1, 1), stride=(1, 1))
(1): BatchNorm2d(2048, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
)
)
(pool): AvgPool2d(kernel_size=(7, 7), stride=(7, 7), padding=0)
(fc): Linear(in_features=2048, out_features=4, bias=True)
)
----------------------------------------------------------------
Layer (type) Output Shape Param #
================================================================
Conv2d-1 [-1, 64, 112, 112] 9,472
BatchNorm2d-2 [-1, 64, 112, 112] 128
ReLU-3 [-1, 64, 112, 112] 0
MaxPool2d-4 [-1, 64, 55, 55] 0
Conv2d-5 [-1, 64, 55, 55] 4,160
BatchNorm2d-6 [-1, 64, 55, 55] 128
ReLU-7 [-1, 64, 55, 55] 0
Conv2d-8 [-1, 64, 55, 55] 36,928
BatchNorm2d-9 [-1, 64, 55, 55] 128
ReLU-10 [-1, 64, 55, 55] 0
Conv2d-11 [-1, 256, 55, 55] 16,640
BatchNorm2d-12 [-1, 256, 55, 55] 512
ReLU-13 [-1, 256, 55, 55] 0
Conv2d-14 [-1, 256, 55, 55] 16,640
BatchNorm2d-15 [-1, 256, 55, 55] 512
ConvBlock-16 [-1, 256, 55, 55] 0
Conv2d-17 [-1, 64, 55, 55] 16,448
BatchNorm2d-18 [-1, 64, 55, 55] 128
ReLU-19 [-1, 64, 55, 55] 0
Conv2d-20 [-1, 64, 55, 55] 36,928
BatchNorm2d-21 [-1, 64, 55, 55] 128
ReLU-22 [-1, 64, 55, 55] 0
Conv2d-23 [-1, 256, 55, 55] 16,640
BatchNorm2d-24 [-1, 256, 55, 55] 512
IdentityBlock-25 [-1, 256, 55, 55] 0
Conv2d-26 [-1, 64, 55, 55] 16,448
BatchNorm2d-27 [-1, 64, 55, 55] 128
ReLU-28 [-1, 64, 55, 55] 0
Conv2d-29 [-1, 64, 55, 55] 36,928
BatchNorm2d-30 [-1, 64, 55, 55] 128
ReLU-31 [-1, 64, 55, 55] 0
Conv2d-32 [-1, 256, 55, 55] 16,640
BatchNorm2d-33 [-1, 256, 55, 55] 512
IdentityBlock-34 [-1, 256, 55, 55] 0
Conv2d-35 [-1, 128, 28, 28] 32,896
BatchNorm2d-36 [-1, 128, 28, 28] 256
ReLU-37 [-1, 128, 28, 28] 0
Conv2d-38 [-1, 128, 28, 28] 147,584
BatchNorm2d-39 [-1, 128, 28, 28] 256
ReLU-40 [-1, 128, 28, 28] 0
Conv2d-41 [-1, 512, 28, 28] 66,048
BatchNorm2d-42 [-1, 512, 28, 28] 1,024
ReLU-43 [-1, 512, 28, 28] 0
Conv2d-44 [-1, 512, 28, 28] 131,584
BatchNorm2d-45 [-1, 512, 28, 28] 1,024
ConvBlock-46 [-1, 512, 28, 28] 0
Conv2d-47 [-1, 128, 28, 28] 65,664
BatchNorm2d-48 [-1, 128, 28, 28] 256
ReLU-49 [-1, 128, 28, 28] 0
Conv2d-50 [-1, 128, 28, 28] 147,584
BatchNorm2d-51 [-1, 128, 28, 28] 256
ReLU-52 [-1, 128, 28, 28] 0
Conv2d-53 [-1, 512, 28, 28] 66,048
BatchNorm2d-54 [-1, 512, 28, 28] 1,024
IdentityBlock-55 [-1, 512, 28, 28] 0
Conv2d-56 [-1, 128, 28, 28] 65,664
BatchNorm2d-57 [-1, 128, 28, 28] 256
ReLU-58 [-1, 128, 28, 28] 0
Conv2d-59 [-1, 128, 28, 28] 147,584
BatchNorm2d-60 [-1, 128, 28, 28] 256
ReLU-61 [-1, 128, 28, 28] 0
Conv2d-62 [-1, 512, 28, 28] 66,048
BatchNorm2d-63 [-1, 512, 28, 28] 1,024
IdentityBlock-64 [-1, 512, 28, 28] 0
Conv2d-65 [-1, 128, 28, 28] 65,664
BatchNorm2d-66 [-1, 128, 28, 28] 256
ReLU-67 [-1, 128, 28, 28] 0
Conv2d-68 [-1, 128, 28, 28] 147,584
BatchNorm2d-69 [-1, 128, 28, 28] 256
ReLU-70 [-1, 128, 28, 28] 0
Conv2d-71 [-1, 512, 28, 28] 66,048
BatchNorm2d-72 [-1, 512, 28, 28] 1,024
IdentityBlock-73 [-1, 512, 28, 28] 0
Conv2d-74 [-1, 256, 14, 14] 131,328
BatchNorm2d-75 [-1, 256, 14, 14] 512
ReLU-76 [-1, 256, 14, 14] 0
Conv2d-77 [-1, 256, 14, 14] 590,080
BatchNorm2d-78 [-1, 256, 14, 14] 512
ReLU-79 [-1, 256, 14, 14] 0
Conv2d-80 [-1, 1024, 14, 14] 263,168
BatchNorm2d-81 [-1, 1024, 14, 14] 2,048
ReLU-82 [-1, 1024, 14, 14] 0
Conv2d-83 [-1, 1024, 14, 14] 525,312
BatchNorm2d-84 [-1, 1024, 14, 14] 2,048
ConvBlock-85 [-1, 1024, 14, 14] 0
Conv2d-86 [-1, 256, 14, 14] 262,400
BatchNorm2d-87 [-1, 256, 14, 14] 512
ReLU-88 [-1, 256, 14, 14] 0
Conv2d-89 [-1, 256, 14, 14] 590,080
BatchNorm2d-90 [-1, 256, 14, 14] 512
ReLU-91 [-1, 256, 14, 14] 0
Conv2d-92 [-1, 1024, 14, 14] 263,168
BatchNorm2d-93 [-1, 1024, 14, 14] 2,048
IdentityBlock-94 [-1, 1024, 14, 14] 0
Conv2d-95 [-1, 256, 14, 14] 262,400
BatchNorm2d-96 [-1, 256, 14, 14] 512
ReLU-97 [-1, 256, 14, 14] 0
Conv2d-98 [-1, 256, 14, 14] 590,080
BatchNorm2d-99 [-1, 256, 14, 14] 512
ReLU-100 [-1, 256, 14, 14] 0
Conv2d-101 [-1, 1024, 14, 14] 263,168
BatchNorm2d-102 [-1, 1024, 14, 14] 2,048
IdentityBlock-103 [-1, 1024, 14, 14] 0
Conv2d-104 [-1, 256, 14, 14] 262,400
BatchNorm2d-105 [-1, 256, 14, 14] 512
ReLU-106 [-1, 256, 14, 14] 0
Conv2d-107 [-1, 256, 14, 14] 590,080
BatchNorm2d-108 [-1, 256, 14, 14] 512
ReLU-109 [-1, 256, 14, 14] 0
Conv2d-110 [-1, 1024, 14, 14] 263,168
BatchNorm2d-111 [-1, 1024, 14, 14] 2,048
IdentityBlock-112 [-1, 1024, 14, 14] 0
Conv2d-113 [-1, 256, 14, 14] 262,400
BatchNorm2d-114 [-1, 256, 14, 14] 512
ReLU-115 [-1, 256, 14, 14] 0
Conv2d-116 [-1, 256, 14, 14] 590,080
BatchNorm2d-117 [-1, 256, 14, 14] 512
ReLU-118 [-1, 256, 14, 14] 0
Conv2d-119 [-1, 1024, 14, 14] 263,168
BatchNorm2d-120 [-1, 1024, 14, 14] 2,048
IdentityBlock-121 [-1, 1024, 14, 14] 0
Conv2d-122 [-1, 256, 14, 14] 262,400
BatchNorm2d-123 [-1, 256, 14, 14] 512
ReLU-124 [-1, 256, 14, 14] 0
Conv2d-125 [-1, 256, 14, 14] 590,080
BatchNorm2d-126 [-1, 256, 14, 14] 512
ReLU-127 [-1, 256, 14, 14] 0
Conv2d-128 [-1, 1024, 14, 14] 263,168
BatchNorm2d-129 [-1, 1024, 14, 14] 2,048
IdentityBlock-130 [-1, 1024, 14, 14] 0
Conv2d-131 [-1, 512, 7, 7] 524,800
BatchNorm2d-132 [-1, 512, 7, 7] 1,024
ReLU-133 [-1, 512, 7, 7] 0
Conv2d-134 [-1, 512, 7, 7] 2,359,808
BatchNorm2d-135 [-1, 512, 7, 7] 1,024
ReLU-136 [-1, 512, 7, 7] 0
Conv2d-137 [-1, 2048, 7, 7] 1,050,624
BatchNorm2d-138 [-1, 2048, 7, 7] 4,096
ReLU-139 [-1, 2048, 7, 7] 0
Conv2d-140 [-1, 2048, 7, 7] 2,099,200
BatchNorm2d-141 [-1, 2048, 7, 7] 4,096
ConvBlock-142 [-1, 2048, 7, 7] 0
Conv2d-143 [-1, 512, 7, 7] 1,049,088
BatchNorm2d-144 [-1, 512, 7, 7] 1,024
ReLU-145 [-1, 512, 7, 7] 0
Conv2d-146 [-1, 512, 7, 7] 2,359,808
BatchNorm2d-147 [-1, 512, 7, 7] 1,024
ReLU-148 [-1, 512, 7, 7] 0
Conv2d-149 [-1, 2048, 7, 7] 1,050,624
BatchNorm2d-150 [-1, 2048, 7, 7] 4,096
IdentityBlock-151 [-1, 2048, 7, 7] 0
Conv2d-152 [-1, 512, 7, 7] 1,049,088
BatchNorm2d-153 [-1, 512, 7, 7] 1,024
ReLU-154 [-1, 512, 7, 7] 0
Conv2d-155 [-1, 512, 7, 7] 2,359,808
BatchNorm2d-156 [-1, 512, 7, 7] 1,024
ReLU-157 [-1, 512, 7, 7] 0
Conv2d-158 [-1, 2048, 7, 7] 1,050,624
BatchNorm2d-159 [-1, 2048, 7, 7] 4,096
IdentityBlock-160 [-1, 2048, 7, 7] 0
AvgPool2d-161 [-1, 2048, 1, 1] 0
Linear-162 [-1, 4] 8,196
================================================================
Total params: 23,542,788
Trainable params: 23,542,788
Non-trainable params: 0
----------------------------------------------------------------
Input size (MB): 0.57
Forward/backward pass size (MB): 240.24
Params size (MB): 89.81
Estimated Total Size (MB): 330.62
----------------------------------------------------------------
Epoch: 1, Train_acc:61.7%, Train_loss:1.009, Test_acc:64.0%, Test_loss:0.735, Lr:1.00E-04
Epoch: 2, Train_acc:76.1%, Train_loss:0.678, Test_acc:65.3%, Test_loss:0.796, Lr:1.00E-04
Epoch: 3, Train_acc:76.5%, Train_loss:0.607, Test_acc:113.1%, Test_loss:0.699, Lr:1.00E-04
Epoch: 4, Train_acc:81.6%, Train_loss:0.487, Test_acc:139.4%, Test_loss:0.681, Lr:1.00E-04
Epoch: 5, Train_acc:88.3%, Train_loss:0.341, Test_acc:131.8%, Test_loss:0.779, Lr:1.00E-04
Epoch: 6, Train_acc:86.9%, Train_loss:0.389, Test_acc:66.9%, Test_loss:0.788, Lr:1.00E-04
Epoch: 7, Train_acc:87.6%, Train_loss:0.388, Test_acc:106.5%, Test_loss:0.708, Lr:1.00E-04
Epoch: 8, Train_acc:93.1%, Train_loss:0.195, Test_acc:24.2%, Test_loss:0.920, Lr:1.00E-04
Epoch: 9, Train_acc:92.3%, Train_loss:0.257, Test_acc:23.7%, Test_loss:0.903, Lr:1.00E-04
Epoch:10, Train_acc:91.8%, Train_loss:0.239, Test_acc:75.7%, Test_loss:0.832, Lr:1.00E-04
Done
Predicted class: Black Throated Bushtiti
十、Personal Summary
ResNet-50 (Residual Network-50) is a 50-layer deep neural network used mainly for image recognition and classification. It was proposed by Kaiming He et al. at Microsoft Research in 2015 and introduced the concept of residual learning, which greatly improved both the effectiveness and the efficiency of training very deep networks. A summary of ResNet-50 follows.
Basic architecture
- Depth: as the name suggests, ResNet-50 has 50 layers: 48 convolutional layers, 1 global average pooling layer, and 1 fully connected layer.
- Input: the standard input is a 224x224 image with 3 color channels (RGB).
Residual blocks
- Key innovation: each residual block contains several convolutional layers plus a "shortcut connection" (also called a "skip connection") that links the block's input directly to its output. The shortcut lets the input skip past the stacked layers and be added to the block's convolutional output. This helps counter vanishing and exploding gradients, making much deeper networks trainable.
- Types: as introduced earlier, ResNet-50 uses two kinds of residual block, each containing three convolutional layers: the Conv Block, whose shortcut holds a 1x1 convolution to match changed input/output dimensions, and the Identity Block, whose shortcut passes the input through unchanged.
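The residual block described above can be sketched in a few lines of PyTorch. This is a minimal illustration only; `Bottleneck` and its channel arguments are hypothetical names for this sketch, not the exact classes defined earlier in the post:

```python
import torch
import torch.nn as nn

# Bottleneck residual block sketch: 1x1 -> 3x3 -> 1x1 convolutions,
# with the input added back through the shortcut before the final ReLU.
class Bottleneck(nn.Module):
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Sequential(nn.Conv2d(in_ch, mid_ch, 1),
                                   nn.BatchNorm2d(mid_ch), nn.ReLU())
        self.conv2 = nn.Sequential(nn.Conv2d(mid_ch, mid_ch, 3, padding=1),
                                   nn.BatchNorm2d(mid_ch), nn.ReLU())
        self.conv3 = nn.Sequential(nn.Conv2d(mid_ch, out_ch, 1),
                                   nn.BatchNorm2d(out_ch))
        # 1x1 projection shortcut when channel counts differ (Conv Block);
        # otherwise the shortcut is the identity (Identity Block).
        if in_ch != out_ch:
            self.shortcut = nn.Sequential(nn.Conv2d(in_ch, out_ch, 1),
                                          nn.BatchNorm2d(out_ch))
        else:
            self.shortcut = nn.Identity()

    def forward(self, x):
        return torch.relu(self.conv3(self.conv2(self.conv1(x))) + self.shortcut(x))

block = Bottleneck(64, 64, 256)
y = block(torch.randn(1, 64, 56, 56))
print(y.shape)  # torch.Size([1, 256, 56, 56])
```

Because the shortcut carries the input straight to the addition, the stacked layers only need to learn the residual between input and output, which is what makes very deep stacks of these blocks trainable.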
Network structure
- Initial convolution and pooling: the network starts with a 7x7 convolution with stride 2, followed by a 3x3 max pooling layer, also with stride 2.
- Residual stages: four groups of residual blocks follow, each group containing a different number of blocks. The convolutional layers inside each block use different filter counts, strides, and padding settings.
- Head: after all the residual blocks, global average pooling reduces each feature map to 1x1. Finally, a fully connected layer with softmax produces the classification output.
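The head described in the last bullet can be shown in isolation. This is a sketch under the assumption that `AdaptiveAvgPool2d(1)` stands in for the 7x7 average pool used above (they coincide on 7x7 feature maps), with 4 output classes as in this post's bird dataset:

```python
import torch
import torch.nn as nn

# ResNet-50 head: global average pooling collapses each 7x7 feature map
# to 1x1, then a fully connected layer maps 2048 features to class logits.
pool = nn.AdaptiveAvgPool2d(1)
fc = nn.Linear(2048, 4)

x = torch.randn(1, 2048, 7, 7)        # feature maps after conv5
out = fc(torch.flatten(pool(x), 1))   # flatten (1, 2048, 1, 1) -> (1, 2048)
print(out.shape)                      # torch.Size([1, 4])
```

Note that with `nn.CrossEntropyLoss`, as used in the training code above, the softmax is folded into the loss, so the model itself outputs raw logits.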
Training and optimization
- Batch normalization: ResNet-50 applies batch normalization after each convolutional layer to speed up training and help mitigate internal covariate shift.
- Optimizer: SGD (stochastic gradient descent) or Adam is typically used, combined with momentum and a learning-rate decay schedule.
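As a sketch of the optimizer setup described above (the lr/momentum/decay values here are illustrative, not the ones used in this post, which trained with Adam at a fixed 1e-4):

```python
import torch
import torch.nn as nn

# SGD with momentum plus a step learning-rate decay schedule.
model = nn.Linear(10, 4)  # stand-in for the network
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=30, gamma=0.1)

for _ in range(30):       # after 30 scheduler steps, lr drops to 0.1 * 0.1
    optimizer.step()
    scheduler.step()
print(optimizer.param_groups[0]['lr'])
```

In a real training loop, `scheduler.step()` would be called once per epoch after `optimizer.step()`, and the current rate can be read back via `optimizer.param_groups[0]['lr']`, exactly as the logging line in the training loop above does.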