Week T6: Hollywood Star Recognition
- 🍨 This post is a learning-record blog for the 🔗 365-day deep learning training camp
- 🍖 Original author: K同学啊
🍺 Requirements:
- Use categorical_crossentropy (the multi-class log loss) for this task ✅
- Explore the use cases and code implementations of different loss functions ✅
🍻 Stretch goals (optional):
- Build the VGG-16 network from scratch ✅
- Call the official VGG-16 implementation ✅
- Train the model with VGG-16 ✅
🔎 Exploration (fairly hard):
- Reach 60% accuracy ✅
⛽ My environment:
- Language: Python 3.10.12
- Editor: Google Colab
- Deep learning framework: TensorFlow 2.15.0
⛽ Reference blogs:
- Deep Learning Day 6: T6 Hollywood Star Recognition
- Building/calling VGG16 in TensorFlow 2.x and loading pretrained weights: https://www.heywhale.com/mw/project/630f725c34a1cfd3575ddd7d
- A closer look at TensorFlow loss functions: Categorical Cross-Entropy Loss, Binary Cross-Entropy Loss, etc.
- Loss function: binary_crossentropy
- Understanding the VGG16 model and its implementation
- Keras: fine-tuning the VGG16 model
- Hollywood star recognition based on VGG16
I. Preliminary Work
1. Set up the GPU and import libraries
# os provides operating-system interactions such as file and directory operations
import os
# PIL provides image processing: opening, displaying, saving, cropping, etc.
import PIL
from PIL import Image
# pathlib offers an object-oriented interface to filesystem paths:
# paths are Path objects with methods for file and directory operations
import pathlib
import tensorflow as tf
# matplotlib is used for plotting and data visualization
import matplotlib.pyplot as plt
# numpy is a numerical computing library supporting multi-dimensional arrays and matrix operations
import numpy as np
# keras, the high-level neural-network API, is bundled with tensorflow and makes training convenient
from tensorflow import keras
# layers provides the basic building blocks (dense, convolution, pooling, ...);
# models provides model construction and training (the Sequential and Functional APIs)
from tensorflow.keras import layers, models
# Two important callbacks: ModelCheckpoint saves the best model version during training;
# EarlyStopping stops training when performance stops improving, to avoid overfitting
from tensorflow.keras.callbacks import ModelCheckpoint, EarlyStopping
tf.__version__
'2.15.0'
# Get the list of all available GPU devices and store it in gpus
gpus = tf.config.list_physical_devices("GPU")
# If a GPU is available (the list is not empty)
if gpus:
    # Take the first GPU device
    gpu0 = gpus[0]
    # Enable memory growth: TF allocates GPU memory on demand
    # instead of grabbing all available memory up front
    tf.config.experimental.set_memory_growth(gpu0, True)
    # Make TF use only the specified GPU (gpus[0])
    tf.config.set_visible_devices([gpu0], "GPU")
gpus
[PhysicalDevice(name='/physical_device:GPU:0', device_type='GPU')]
2. Import the data
from google.colab import drive
drive.mount("/content/drive/")
%cd "/content/drive/My Drive/Colab Notebooks/jupyter notebook/data/"
Mounted at /content/drive/
/content/drive/My Drive/Colab Notebooks/jupyter notebook/data
data_dir = "./T6"
data_dir = pathlib.Path(data_dir)
3. Inspect the data
# Use the glob method to collect every file ending in '.jpg'
# inside the subdirectories of data_dir.
# '*/*.jpg' is a wildcard pattern:
# the first asterisk matches the class subdirectories,
# the second matches the .jpg filenames inside them
image_count = len(list(data_dir.glob("*/*.jpg")))
print("Total images:", image_count)
Total images: 1800
ex = list(data_dir.glob("Natalie Portman/*.jpg"))
image = PIL.Image.open(str(ex[5]))
# Inspect the image attributes
print(image.format, image.size, image.mode)
plt.axis("off")
plt.imshow(image)
plt.show()
JPEG (160, 188) RGB
II. Data Preprocessing
1. Load the data
# Batch size: the number of images fed to the model per training step;
# each iteration processes 32 images
batch_size = 32
# All images are resized to the same height when loaded
img_height = 224
# and to the same width
img_width = 224
"""
For a detailed introduction to image_dataset_from_directory(), see:
https://mtyjkh.blog.csdn.net/article/details/117018789
"""
tr_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.1,
    # fraction of the data held out for validation; 0.1 means 10% becomes the validation set
    subset="training",
    # which subset this call returns, training or validation; here: training
    label_mode="categorical",
    # encode labels as one-hot categorical vectors,
    # which pairs with the categorical_crossentropy loss
    seed=123,
    # random seed, so the train/validation split is reproducible and consistent
    image_size=(img_height, img_width),
    batch_size=batch_size)
Found 1800 files belonging to 17 classes.
Using 1620 files for training.
val_ds = tf.keras.preprocessing.image_dataset_from_directory(
data_dir,
validation_split = 0.1,
subset = "validation",
label_mode="categorical",
seed = 123,
image_size=(img_height,img_width),
batch_size=batch_size
)
Found 1800 files belonging to 17 classes.
Using 180 files for validation.
class_names = tr_ds.class_names
# class_names holds the dataset labels, in alphabetical order of the directory names
class_names
['Angelina Jolie',
'Brad Pitt',
'Denzel Washington',
'Hugh Jackman',
'Jennifer Lawrence',
'Johnny Depp',
'Kate Winslet',
'Leonardo DiCaprio',
'Megan Fox',
'Natalie Portman',
'Nicole Kidman',
'Robert Downey Jr',
'Sandra Bullock',
'Scarlett Johansson',
'Tom Cruise',
'Tom Hanks',
'Will Smith']
# Data augmentation --- reference: https://blog.csdn.net/afive54/article/details/135004174
def augment_images(image, label):
    image = tf.image.random_flip_up_down(image)     # random vertical flip
    image = tf.image.random_flip_left_right(image)  # random horizontal flip
    image = tf.image.random_contrast(image, lower=0.1, upper=1.2)    # random contrast
    image = tf.image.random_brightness(image, max_delta=0.2)         # random brightness
    image = tf.image.random_saturation(image, lower=0.1, upper=1.2)  # random saturation
    # noise = tf.random.normal(tf.shape(image), mean=0.0, stddev=0.1)
    # image = tf.clip_by_value(image + noise, 0.0, 255.0)  # add Gaussian noise, then clip pixels to the valid range
    return image, label
# Apply augmentation to the training set
augmented_tr_ds = tr_ds.map(augment_images)
For more on random seeds, see: https://blog.csdn.net/weixin_51390582/article/details/124246873
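A minimal sketch (using the tr_ds and augment_images defined above) of wiring the augmentation into the input pipeline with parallel map calls and prefetching; note that subsection 3 below keeps the augmented pipeline commented out and trains on the plain tr_ds:
AUTOTUNE = tf.data.AUTOTUNE
# Run the augmentation in parallel while the model consumes earlier batches
augmented_tr_ds = (tr_ds
                   .map(augment_images, num_parallel_calls=AUTOTUNE)
                   .prefetch(buffer_size=AUTOTUNE))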
2. Visualize the data
plt.figure(figsize=(20, 10))
for images, labels in tr_ds.take(1):
    for i in range(20):
        ax = plt.subplot(5, 10, i + 1)
        plt.imshow(images[i].numpy().astype("uint8"))
        plt.title(class_names[np.argmax(labels[i])], fontsize=10)
        plt.axis("off")
# Show the figure
plt.show()
for image_batch, labels_batch in tr_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break
# `(32, 224, 224, 3)` -- the last dimension is the RGB color channels
# `labels_batch` has shape (32, 17): one-hot labels for 32 images over 17 classes
(32, 224, 224, 3)
(32, 17)
3. Configure the dataset
# Let tf.data tune the pipeline performance automatically
AUTOTUNE = tf.data.AUTOTUNE
# Concrete benefits of tf.data.AUTOTUNE:
# - automatic parallelism: picks the number of worker threads that maximizes data throughput
# - less waiting: optimized loading and preprocessing cut the time the model spends waiting for data
# - better performance: every stage of the pipeline is tuned, making training more efficient
# - simpler code: no manual parameter tuning, so the code stays concise and maintainable
# cache() keeps the training set in memory to speed up loading:
# repeated epochs reuse the cached data instead of re-reading it from disk
# shuffle() randomizes the order of the training samples;
# the argument 1000 is the buffer size: the pool from which each sample is drawn at random
# prefetch() pre-loads data, hiding the data-loading time during training
#augmented_tr_ds = augmented_tr_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
tr_ds = tr_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)
III. Build the CNN Model
The input to a convolutional neural network (CNN) is a tensor of shape (image_height, image_width, color_channels), carrying the image's height, width and color information; the batch size is not part of this shape. color_channels corresponds to the (R, G, B) color channels. In this example the CNN input shape is (224, 224, 3), which is passed to the input_shape argument of the first layer.
The input tensor encodes the structure and color of the image: every pixel is a vector of color_channels values. During training, stacks of convolution, pooling and fully connected layers extract and process these image features.
# Create a Sequential model: a linear stack of layers, applied in the order they are added
"""
If the arithmetic of convolution kernels is unclear, see:
https://blog.csdn.net/qq_38251616/article/details/114278995
layers.Dropout(0.4) guards against overfitting and improves generalization.
For more on the Dropout layer, see: https://mtyjkh.blog.csdn.net/article/details/115826689
"""
model = models.Sequential([
    layers.experimental.preprocessing.Rescaling(1./255, input_shape=(img_height, img_width, 3)),
    layers.Conv2D(16, (3, 3), activation='relu'),   # convolution layer 1, 3x3 kernels
    layers.AveragePooling2D((2, 2)),                # pooling layer 1, 2x2 windows
    layers.Conv2D(32, (3, 3), activation='relu'),   # convolution layer 2, 3x3 kernels
    layers.AveragePooling2D((2, 2)),                # pooling layer 2, 2x2 windows
    layers.Dropout(0.5),
    layers.Conv2D(64, (3, 3), activation='relu'),   # convolution layer 3, 3x3 kernels
    layers.AveragePooling2D((2, 2)),
    layers.Dropout(0.5),
    layers.Conv2D(128, (3, 3), activation='relu'),  # convolution layer 4, 3x3 kernels
    layers.Dropout(0.5),
    layers.Flatten(),                      # Flatten layer, bridges the conv stack and the dense head
    layers.Dense(128, activation='relu'),  # fully connected layer for further feature extraction
    layers.Dense(len(class_names))         # output layer: raw logits, no softmax
])
model.summary()  # print the network structure
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling (Rescaling) (None, 224, 224, 3) 0
_________________________________________________________________
conv2d (Conv2D) (None, 222, 222, 16) 448
_________________________________________________________________
average_pooling2d (AveragePo (None, 111, 111, 16) 0
_________________________________________________________________
conv2d_1 (Conv2D) (None, 109, 109, 32) 4640
_________________________________________________________________
average_pooling2d_1 (Average (None, 54, 54, 32) 0
_________________________________________________________________
dropout (Dropout) (None, 54, 54, 32) 0
_________________________________________________________________
conv2d_2 (Conv2D) (None, 52, 52, 64) 18496
_________________________________________________________________
average_pooling2d_2 (Average (None, 26, 26, 64) 0
_________________________________________________________________
dropout_1 (Dropout) (None, 26, 26, 64) 0
_________________________________________________________________
conv2d_3 (Conv2D) (None, 24, 24, 128) 73856
_________________________________________________________________
dropout_2 (Dropout) (None, 24, 24, 128) 0
_________________________________________________________________
flatten (Flatten) (None, 73728) 0
_________________________________________________________________
dense (Dense) (None, 128) 9437312
_________________________________________________________________
dense_1 (Dense) (None, 17) 2193
=================================================================
Total params: 9,536,945
Trainable params: 9,536,945
Non-trainable params: 0
_________________________________________________________________
IV. Compile the Model
Before the model is ready for training, a few more settings are needed. These are added in the compile step:
- Loss function (loss): measures the model's accuracy during training.
- Optimizer (optimizer): determines how the model is updated based on the data it sees and its loss function.
- Metrics (metrics): used to monitor the training and testing steps. The example below uses accuracy, the fraction of images that are correctly classified.
# Code used for this run
# Set the initial learning rate
initial_learning_rate = 1e-4
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate,
    decay_steps=50,   # originally 60; the best range seems to be 46-60
    decay_rate=0.96,  # anything in 0.94-0.98 works
    staircase=True)
# Feed the exponentially decaying learning rate to the optimizer
optimizer = tf.keras.optimizers.Adam(learning_rate=lr_schedule)
model.compile(optimizer=optimizer,
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=True),
              # from_logits=True because the output layer above has no softmax and emits logits
              metrics=['accuracy'])
# Adam is a widely used gradient-descent optimizer; it updates the weights to minimize the loss
# Categorical crossentropy is used because this is a multi-class problem
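As a sanity check on the schedule above, a minimal sketch (step values chosen for illustration) of what the staircase decay computes, namely lr(step) = initial_learning_rate * decay_rate ** (step // decay_steps):
import tensorflow as tf

schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    1e-4, decay_steps=50, decay_rate=0.96, staircase=True)
for step in [0, 49, 50, 100, 500]:
    # with staircase=True the rate drops in discrete jumps every 50 steps,
    # e.g. step 100 -> 1e-4 * 0.96 ** 2
    print(step, float(schedule(step)))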
V. Train the Model
epochs = 150
# Save the best model weights
checkpointer = ModelCheckpoint(
    "/content/drive/My Drive/Colab Notebooks/jupyter notebook/xunlianying/vgg16_shou_final.weights.h5",
    monitor='val_accuracy',
    verbose=1,
    mode="max",
    save_best_only=True,
    save_weights_only=True)
# Set up early stopping
earlystopper = EarlyStopping(
    monitor='val_accuracy',
    min_delta=0.0001,  # originally 0.001
    patience=20,
    mode="max",
    verbose=1)
history = model.fit(
    tr_ds,
    validation_data=val_ds,
    epochs=epochs,
    callbacks=[checkpointer, earlystopper])
... (truncated; only the tail of the last training run is kept)
Epoch 61/150
51/51 [==============================] - ETA: 0s - loss: 0.0076 - accuracy: 0.9994
Epoch 61: val_accuracy did not improve from 0.85000
51/51 [==============================] - 14s 267ms/step - loss: 0.0076 - accuracy: 0.9994 - val_loss: 0.7455 - val_accuracy: 0.8222
Epoch 62/150
51/51 [==============================] - ETA: 0s - loss: 0.0071 - accuracy: 0.9988
Epoch 62: val_accuracy did not improve from 0.85000
51/51 [==============================] - 14s 266ms/step - loss: 0.0071 - accuracy: 0.9988 - val_loss: 0.7164 - val_accuracy: 0.8222
Epoch 63/150
51/51 [==============================] - ETA: 0s - loss: 0.0083 - accuracy: 0.9988
Epoch 63: val_accuracy did not improve from 0.85000
51/51 [==============================] - 14s 267ms/step - loss: 0.0083 - accuracy: 0.9988 - val_loss: 0.7085 - val_accuracy: 0.8389
Epoch 64/150
51/51 [==============================] - ETA: 0s - loss: 0.0093 - accuracy: 0.9981
Epoch 64: val_accuracy did not improve from 0.85000
51/51 [==============================] - 14s 266ms/step - loss: 0.0093 - accuracy: 0.9981 - val_loss: 0.6983 - val_accuracy: 0.8389
Epoch 64: early stopping
VI. Model Evaluation
acc = history.history['accuracy']
val_acc = history.history['val_accuracy']
loss = history.history['loss']
val_loss = history.history['val_loss']
epochs_range = range(len(loss))
plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, acc, label='Training Accuracy')
plt.plot(epochs_range, val_acc, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')
plt.subplot(1, 2, 2)
plt.plot(epochs_range, loss, label='Training Loss')
plt.plot(epochs_range, val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
The figure above shows the final run after tuning the hand-built model.
VII. Prediction
'''Predict a specified image'''
# Load the best model weights
model.load_weights("/content/drive/My Drive/Colab Notebooks/jupyter notebook/xunlianying/vgg16_shou_final.weights.h5")
from PIL import Image
import numpy as np
img = Image.open("/content/drive/My Drive/Colab Notebooks/jupyter notebook/data/T6/Scarlett Johansson/100_0bc6635b.jpg")  # pick the image you want to predict
image = tf.image.resize(img, [img_height, img_width])
img_array = tf.expand_dims(image, 0)
predictions = model.predict(img_array)  # use the model you trained
print("Prediction:", class_names[np.argmax(predictions)])
1/1 [==============================] - 1s 1s/step
Prediction: Scarlett Johansson
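Since the final Dense layer outputs raw logits, a short sketch (using the names from the cells above) of turning the prediction into per-class probabilities:
# Softmax turns the logits into probabilities; show the top-3 candidates
probs = tf.nn.softmax(predictions[0]).numpy()
for name, p in sorted(zip(class_names, probs), key=lambda t: -t[1])[:3]:
    print(f"{name}: {p:.2%}")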
VIII. Summary
- Studied the principles and applicable scenarios of different loss functions;
- Tuned some of the exponential-decay learning-rate parameters, bringing val_acc to 40%;
- Called and hand-built VGG16 in the TF framework (code in section IX);
- Tweaked and modified the VGG16 model to reach 80% validation accuracy.
Notes on calling the official VGG16 directly (see 9.2):
- Decide whether to keep the top classifier; I recommend dropping it and using dropout instead.
- After loading, part of the base can be unfrozen (only the last few conv layers).
- If the final classifier already uses a softmax activation, from_logits in the original code must be set to False.
- I did not try regularization, because I do not yet know how to add it directly when calling the official model; I also did not experiment further with the classifier and dense layers.
(Screenshot: my best training run.)
Notes on hand-building VGG16 (see 9.3):
- Focus on tuning the final classifier; dropout and BN can be added to the top few layers.
- Built as-is, without any changes, val_acc is only about 40% and overfitting is severe.
- Drop the stock top classifier and use the best one found by experiment; adding dropout between the dense layers is recommended.
- Adding BN and dropout inside the VGG16 body helps somewhat, but drop_rate must not be too high; the position and number of these layers can also be tuned (possibly an issue because I added quite a few of them).
- Adding regularization did not work well (especially inside the VGG16 body).
- When loading pretrained weights, check that each layer actually receives its matching weights; name the layers while building and load with by_name=True (loading with False and then unfreezing also seemed to work reasonably in an earlier attempt).
- Try unfreezing more layers than in the official-model run.
My better training runs were:
1. Reference: https://www.heywhale.com/mw/project/630f725c34a1cfd3575ddd7d
2. Probably the run with the highest val_acc (screenshot).
3. The one kept in this post: BN + drop_rate 0.1 after every conv block; top classifier Dense 1024 (L1=0.0001) + Dense 128 (L1=0.0001) + Dropout 0.5; pretrained weights loaded and the first 18 layers frozen.
IX. Appendix
1. A quick overview of three cross-entropy loss functions
Classification tasks usually use a cross-entropy loss. Binary classification uses binary_crossentropy; multi-class tasks commonly use sparse_categorical_crossentropy or categorical_crossentropy.
🍅 binary_crossentropy (binary cross-entropy)
In binary classification the labels are 0 or 1, and the loss is

$$\mathrm{BCE} = -\frac{1}{N}\sum_{i=1}^{N}\bigl[y_i\log(\hat{y}_i) + (1-y_i)\log(1-\hat{y}_i)\bigr]$$

This loss is usually combined with a sigmoid activation. It also works for multi-label classification: the multi-class problem is split into several independent binary predictions, one per class.
keras.losses.BinaryCrossentropy(
    from_logits=False,
    label_smoothing=0.0,
    axis=-1,
    reduction="sum_over_batch_size",
    name="binary_crossentropy",
    dtype=None,
)
Parameters:
- from_logits: Whether to interpret y_pred as a tensor of logit values. By default, we assume that y_pred is probabilities (i.e., values in [0, 1]).
- label_smoothing: Float in range [0, 1]. When 0, no smoothing occurs. When > 0, we compute the loss between the predicted labels and a smoothed version of the true labels, where the smoothing squeezes the labels towards 0.5. Larger values of label_smoothing correspond to heavier smoothing.
- axis: The axis along which to compute crossentropy (the features axis). Defaults to -1.
- reduction: Type of reduction to apply to the loss. In almost all cases this should be “sum_over_batch_size”. Supported options are “sum”, “sum_over_batch_size” or None.
  - "sum_over_batch_size": returns the mean of the per-sample losses in the batch
  - "sum": returns the sum of the per-sample losses in the batch
  - None: returns the full array of per-sample losses
- name: Optional name for the loss instance.
- dtype: The dtype of the loss's computations. Defaults to None, which means using keras.backend.floatx(). keras.backend.floatx() is float32 unless set to a different value (via keras.backend.set_floatx()). If a keras.DTypePolicy is provided, then the compute_dtype will be utilized.
Expected inputs: y_true is 0 or 1; y_pred is the model's prediction, a single float representing either a logit (in [-inf, inf] when from_logits=True) or a probability (in [0, 1] when from_logits=False).
import tensorflow as tf
y_true = tf.constant([[0., 1.], [0.2, 0.8], [0.3, 0.7], [0.4, 0.6]])
y_pred = tf.constant([[0.6, 0.4], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]])
bce = tf.keras.losses.BinaryCrossentropy(reduction='sum_over_batch_size')
print(bce(y_true, y_pred).numpy()) # 0.839445
bce = tf.keras.losses.BinaryCrossentropy(reduction='sum')
print(bce(y_true, y_pred).numpy()) # 3.35778
bce = tf.keras.losses.BinaryCrossentropy(reduction='none')
print(bce(y_true, y_pred).numpy()) # [0.9162905 0.5919184 0.79465103 1.0549198]
0.8394452
3.3577807
[0.91629076 0.5919186 0.79465115 1.0549202 ]
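Since binary cross-entropy also covers multi-label classification, a minimal sketch (targets and logits are hypothetical) treating each of three labels as an independent binary prediction:
import tensorflow as tf

y_true = tf.constant([[1., 0., 1.],
                      [0., 1., 0.]])        # multi-hot targets: a sample may carry several labels
logits = tf.constant([[2.0, -1.0, 0.5],
                      [-0.5, 1.5, -2.0]])   # raw model outputs, one logit per label
bce = tf.keras.losses.BinaryCrossentropy(from_logits=True)  # sigmoid applied internally
print(bce(y_true, logits).numpy())          # mean loss over all samples and labels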
🍅 categorical_crossentropy
Available in TensorFlow as tf.keras.losses.CategoricalCrossentropy; usable for binary or multi-class classification. When used as the model's loss, note that the labels must be one-hot encoded, e.g. y_true = [[1, 0, 0], [0, 1, 0], [0, 0, 1]].
Source: https://blog.csdn.net/qq_28955669/article/details/135265782
# Usage:
model.compile(optimizer="adam",
              loss=tf.keras.losses.CategoricalCrossentropy(),
              # loss='categorical_crossentropy',
              metrics=['accuracy'])
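A minimal numeric sketch (values chosen to mirror the sparse example further below) of CategoricalCrossentropy on one-hot labels:
import tensorflow as tf

y_true = tf.constant([[0., 1., 0.], [0., 0., 1.]])  # one-hot labels for classes 1 and 2
y_pred = tf.constant([[0.05, 0.95, 0.], [0.1, 0.8, 0.1]])
cce = tf.keras.losses.CategoricalCrossentropy()
print(cce(y_true, y_pred).numpy())  # ≈ 1.177, the same value the sparse example yields on this data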
🍅 sparse_categorical_crossentropy
tf.keras.losses.SparseCategoricalCrossentropy is also for multi-class classification. The difference from CategoricalCrossentropy is that it takes integer class labels (label_mode="int"), e.g. y_true = [1, 2, 3]. The choice between the two therefore depends on the label format of the dataset at hand.
# Function signature
tf.keras.losses.SparseCategoricalCrossentropy(
    from_logits=False,
    ignore_class=None,
    reduction='sum_over_batch_size',
    name='sparse_categorical_crossentropy'
)
# Usage:
model.compile(optimizer="adam",
              loss='sparse_categorical_crossentropy',
              # loss=tf.keras.losses.SparseCategoricalCrossentropy(),
              metrics=['accuracy'])
# Example:
y_true = tf.constant([1, 2])
y_pred = tf.constant([[0.05, 0.95, 0], [0.1, 0.8, 0.1]])
# Using 'auto'/'sum_over_batch_size' reduction type.
scce = keras.losses.SparseCategoricalCrossentropy()
scce(y_true, y_pred)
<tf.Tensor: shape=(), dtype=float32, numpy=1.1769392>
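Since from_logits comes up repeatedly in this post, a minimal sketch (hypothetical logits) showing that passing raw logits with from_logits=True is equivalent to passing softmax probabilities with from_logits=False:
import tensorflow as tf

logits = tf.constant([[1.0, 3.0, 0.5], [0.2, 2.0, 2.5]])
y_true = tf.constant([1, 2])
loss_from_logits = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss_from_probs = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)
print(loss_from_logits(y_true, logits).numpy())                # softmax applied internally
print(loss_from_probs(y_true, tf.nn.softmax(logits)).numpy())  # same value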
2. Calling and tuning the official VGG16 model
# Call VGG16 without the top dense layers, load ImageNet weights,
# freeze the conv base, and add BN and dropout on top
from tensorflow.keras.applications import VGG16
# Load VGG16 without the fully connected top, with ImageNet weights
base_model = VGG16(weights="imagenet",
                   include_top=False,
                   input_shape=(img_height, img_width, 3),
                   pooling="max")
# Freeze the whole VGG16 conv base (not used here)
# base_model.trainable = False
# Partial unfreezing instead:
# freeze everything up to a given layer and
# fine-tune only the last two or three conv layers
base_model.trainable = True
set_trainable = False
for layer in base_model.layers[:-2]:
    if layer.name == 'block5_conv1':
        set_trainable = True
    if set_trainable:
        layer.trainable = True
        print(layer)
    else:
        layer.trainable = False
print(base_model.summary(), end="\n")
# Add custom fully connected layers on top of VGG16
model = models.Sequential([
    base_model,
    #layers.GlobalAveragePooling2D(),
    #layers.GlobalMaxPooling2D(),
    layers.Flatten(),
    layers.Dense(1024, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.4),
    layers.Dense(128, activation="relu"),
    layers.BatchNormalization(),
    layers.Dropout(0.4),
    layers.Dense(len(class_names), activation="softmax")
])
# Print the network structure
model.summary()
# model.load_weights("/content/drive/Othercomputers/My laptop/jupyter notebook/xunlianying/vgg16_1_final.weights.h5")
Downloading data from https://storage.googleapis.com/tensorflow/keras-applications/vgg16/vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5
58889256/58889256 [==============================] - 2s 0us/step
<keras.src.layers.convolutional.conv2d.Conv2D object at 0x7f6003f76bc0>
<keras.src.layers.convolutional.conv2d.Conv2D object at 0x7f6003f759f0>
<keras.src.layers.convolutional.conv2d.Conv2D object at 0x7f6003f77fa0>
Model: "vgg16"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
input_2 (InputLayer) [(None, 224, 224, 3)] 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
block1_pool (MaxPooling2D) (None, 112, 112, 64) 0
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
block2_pool (MaxPooling2D) (None, 56, 56, 128) 0
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
block3_pool (MaxPooling2D) (None, 28, 28, 256) 0
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
block4_pool (MaxPooling2D) (None, 14, 14, 512) 0
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
block5_pool (MaxPooling2D) (None, 7, 7, 512) 0
global_max_pooling2d (Glob (None, 512) 0
alMaxPooling2D)
=================================================================
Total params: 14714688 (56.13 MB)
Trainable params: 7079424 (27.01 MB)
Non-trainable params: 7635264 (29.13 MB)
_________________________________________________________________
None
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 512) 14714688
flatten_1 (Flatten) (None, 512) 0
dense_2 (Dense) (None, 1024) 525312
batch_normalization_2 (Bat (None, 1024) 4096
chNormalization)
dropout_3 (Dropout) (None, 1024) 0
dense_3 (Dense) (None, 128) 131200
batch_normalization_3 (Bat (None, 128) 512
chNormalization)
dropout_4 (Dropout) (None, 128) 0
dense_4 (Dense) (None, 17) 2193
=================================================================
Total params: 15378001 (58.66 MB)
Trainable params: 7740433 (29.53 MB)
Non-trainable params: 7637568 (29.14 MB)
_________________________________________________________________
# Load the best model weights
model.load_weights("/content/drive/Othercomputers/My laptop/jupyter notebook/xunlianying/vgg16_1_final.weights.h5")
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
vgg16 (Functional) (None, 512) 14714688
flatten (Flatten) (None, 512) 0
dense (Dense) (None, 1024) 525312
batch_normalization (Batch (None, 1024) 4096
Normalization)
dropout (Dropout) (None, 1024) 0
dense_1 (Dense) (None, 128) 131200
batch_normalization_1 (Bat (None, 128) 512
chNormalization)
dropout_1 (Dropout) (None, 128) 0
dense_2 (Dense) (None, 17) 2193
=================================================================
Total params: 15378001 (58.66 MB)
Trainable params: 15375697 (58.65 MB)
Non-trainable params: 2304 (9.00 KB)
_________________________________________________________________
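When part of the base is unfrozen as above, it is common to recompile with a much smaller learning rate so the pretrained weights are not destroyed; a minimal sketch (the 1e-5 rate is a hypothetical choice, not taken from the original run):
model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=1e-5),  # hypothetical small fine-tuning rate
    loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),  # the top layer here uses softmax
    metrics=['accuracy'])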
A brief introduction to VGG16:
The design idea behind VGG16 is to stack many small convolution and pooling layers to build a deep network with strong representational power. Concretely, VGG16 consists of 13 convolutional layers and 3 fully connected layers: the convolutional layers extract features from the input image, and the fully connected layers map those features to class probabilities.
The convolutional part uses small 3x3 kernels with stride 1, which lets the network go deeper and improves feature expressiveness. Between convolution blocks, VGG16 inserts 2x2 max-pooling layers to shrink the feature maps while keeping the most salient features. After the last convolutional layer come three fully connected layers, the first two with 4096 hidden units each; the last one outputs the model's predictions.
(Figure: a walkthrough of the VGG16 architecture; note that it contains no BN or Dropout layers.)
# Official documentation (comments translated)
tf.keras.applications.VGG16(
    include_top=True,  # whether to include the three fully connected layers at the top
    weights="imagenet",
    # one of None (random initialization), "imagenet" (pre-training on ImageNet),
    # or the path to a weights file to load
    input_tensor=None,  # optional Keras tensor (output of layers.Input()) to use as the image input
    input_shape=None,
    # optional shape tuple, only to be specified if include_top is False
    # (otherwise the input shape must be (224, 224, 3) with channels_last,
    # or (3, 224, 224) with channels_first).
    # It must have exactly 3 input channels, and width and height no smaller than 32;
    # e.g. (200, 200, 3) is a valid value.
    pooling=None,
    # optional pooling mode for feature extraction when include_top is False:
    # None means the output is the 4D tensor of the last convolutional block;
    # "avg" applies global average pooling to that output, giving a 2D tensor;
    # "max" applies global max pooling.
    classes=1000,
    # optional number of classes; only specifiable when include_top is True and weights is unspecified
    classifier_activation="softmax"
    # string or callable: the activation of the "top" layer; ignored unless include_top=True.
    # Setting classifier_activation=None returns the logits of the top layer.
    # When loading pretrained weights, classifier_activation can only be None or "softmax".
)
# Call VGG16 with a 17-class top
from tensorflow.keras.applications.vgg16 import VGG16
vgg16_model = VGG16(
    include_top=True,
    weights=None,
    input_tensor=None,
    pooling=None,  # ignored when include_top=True (the original passed the string "none")
    input_shape=(img_height, img_width, 3),
    classes=len(class_names),
    classifier_activation='softmax'
)
vgg16_model.summary()  # print the network structure
Model: "vgg16"
┏━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━┓
┃ Layer (type)                 ┃ Output Shape          ┃      Param # ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━┩
│ input_layer_15 (InputLayer)  │ (None, 224, 224, 3)   │            0 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block1_conv1 (Conv2D)        │ (None, 224, 224, 64)  │        1,792 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block1_conv2 (Conv2D)        │ (None, 224, 224, 64)  │       36,928 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block1_pool (MaxPooling2D)   │ (None, 112, 112, 64)  │            0 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block2_conv1 (Conv2D)        │ (None, 112, 112, 128) │       73,856 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block2_conv2 (Conv2D)        │ (None, 112, 112, 128) │      147,584 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block2_pool (MaxPooling2D)   │ (None, 56, 56, 128)   │            0 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block3_conv1 (Conv2D)        │ (None, 56, 56, 256)   │      295,168 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block3_conv2 (Conv2D)        │ (None, 56, 56, 256)   │      590,080 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block3_conv3 (Conv2D)        │ (None, 56, 56, 256)   │      590,080 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block3_pool (MaxPooling2D)   │ (None, 28, 28, 256)   │            0 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block4_conv1 (Conv2D)        │ (None, 28, 28, 512)   │    1,180,160 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block4_conv2 (Conv2D)        │ (None, 28, 28, 512)   │    2,359,808 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block4_conv3 (Conv2D)        │ (None, 28, 28, 512)   │    2,359,808 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block4_pool (MaxPooling2D)   │ (None, 14, 14, 512)   │            0 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block5_conv1 (Conv2D)        │ (None, 14, 14, 512)   │    2,359,808 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block5_conv2 (Conv2D)        │ (None, 14, 14, 512)   │    2,359,808 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block5_conv3 (Conv2D)        │ (None, 14, 14, 512)   │    2,359,808 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ block5_pool (MaxPooling2D)   │ (None, 7, 7, 512)     │            0 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ flatten (Flatten)            │ (None, 25088)         │            0 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ fc1 (Dense)                  │ (None, 4096)          │  102,764,544 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ fc2 (Dense)                  │ (None, 4096)          │   16,781,312 │
├──────────────────────────────┼───────────────────────┼──────────────┤
│ predictions (Dense)          │ (None, 17)            │       69,649 │
└──────────────────────────────┴───────────────────────┴──────────────┘
Total params: 134,330,193 (512.43 MB)
Trainable params: 134,330,193 (512.43 MB)
Non-trainable params: 0 (0.00 B)
3. Building VGG16 by hand
# Reference: https://www.heywhale.com/mw/project/630f725c34a1cfd3575ddd7d
drop_rate = 0.1   # a higher drop_rate here hurt performance
weight_decay = 0  # enabling regularization here made things worse
vgg16_model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.Rescaling(1./255),
    # two 64-filter 3x3 convolutions; after pooling the feature map is (112, 112, 64)
    layers.Conv2D(64, (3, 3), padding='same', activation='relu', name="block1_conv1"),
    #layers.BatchNormalization(),
    #layers.Dropout(drop_rate),
    layers.Conv2D(64, (3, 3), padding='same', activation='relu', name="block1_conv2"),
    layers.BatchNormalization(),
    layers.Dropout(drop_rate),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    # two 128-filter 3x3 convolutions; after pooling the feature map is (56, 56, 128)
    layers.Conv2D(128, (3, 3), padding='same', activation='relu', name="block2_conv1"),
    #layers.BatchNormalization(),
    layers.Dropout(drop_rate),
    layers.Conv2D(128, (3, 3), padding='same', activation='relu', name="block2_conv2"),
    layers.BatchNormalization(),
    layers.Dropout(drop_rate),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    # three 256-filter 3x3 convolutions; after pooling the feature map is (28, 28, 256)
    layers.Conv2D(256, (3, 3), padding='same', activation='relu', name="block3_conv1"),
    #layers.BatchNormalization(),
    #layers.Dropout(drop_rate),
    layers.Conv2D(256, (3, 3), padding='same', activation='relu', name="block3_conv2"),
    #layers.BatchNormalization(),
    #layers.Dropout(drop_rate),
    layers.Conv2D(256, (3, 3), padding='same', activation='relu', name="block3_conv3"),
    layers.BatchNormalization(),
    layers.Dropout(drop_rate),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    # three 512-filter 3x3 convolutions; after pooling the feature map is (14, 14, 512)
    layers.Conv2D(512, (3, 3), padding='same', activation='relu', name="block4_conv1"),
    #layers.BatchNormalization(),
    #layers.Dropout(drop_rate),
    layers.Conv2D(512, (3, 3), padding='same', activation='relu', name="block4_conv2"),
    #layers.BatchNormalization(),
    #layers.Dropout(drop_rate),
    layers.Conv2D(512, (3, 3), padding='same', activation='relu', name="block4_conv3"),
    layers.BatchNormalization(),
    layers.Dropout(drop_rate),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
    # three more 512-filter 3x3 convolutions; after pooling the feature map is (7, 7, 512)
    layers.Conv2D(512, (3, 3), padding='same', activation='relu', name="block5_conv1",
                  kernel_regularizer=keras.regularizers.L1(weight_decay)),
    #layers.BatchNormalization(),
    #layers.Dropout(drop_rate),
    layers.Conv2D(512, (3, 3), padding='same', activation='relu', name="block5_conv2",
                  kernel_regularizer=keras.regularizers.L1(weight_decay)),
    #layers.BatchNormalization(),
    #layers.Dropout(drop_rate),
    layers.Conv2D(512, (3, 3), padding='same', activation='relu', name="block5_conv3",
                  kernel_regularizer=keras.regularizers.L1(weight_decay)),
    layers.BatchNormalization(),
    layers.Dropout(drop_rate),
    layers.MaxPooling2D(pool_size=(2, 2), strides=(2, 2)),
])
vgg16_model.summary()  # print the network structure
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling (Rescaling) (None, 224, 224, 3) 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
batch_normalization (Batch (None, 224, 224, 64) 256
Normalization)
dropout (Dropout) (None, 224, 224, 64) 0
max_pooling2d (MaxPooling2 (None, 112, 112, 64) 0
D)
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
dropout_1 (Dropout) (None, 112, 112, 128) 0
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
batch_normalization_1 (Bat (None, 112, 112, 128) 512
chNormalization)
dropout_2 (Dropout) (None, 112, 112, 128) 0
max_pooling2d_1 (MaxPoolin (None, 56, 56, 128) 0
g2D)
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
batch_normalization_2 (Bat (None, 56, 56, 256) 1024
chNormalization)
dropout_3 (Dropout) (None, 56, 56, 256) 0
max_pooling2d_2 (MaxPoolin (None, 28, 28, 256) 0
g2D)
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
batch_normalization_3 (Bat (None, 28, 28, 512) 2048
chNormalization)
dropout_4 (Dropout) (None, 28, 28, 512) 0
max_pooling2d_3 (MaxPoolin (None, 14, 14, 512) 0
g2D)
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
batch_normalization_4 (Bat (None, 14, 14, 512) 2048
chNormalization)
dropout_5 (Dropout) (None, 14, 14, 512) 0
max_pooling2d_4 (MaxPoolin (None, 7, 7, 512) 0
g2D)
=================================================================
Total params: 14720576 (56.15 MB)
Trainable params: 14717632 (56.14 MB)
Non-trainable params: 2944 (11.50 KB)
_________________________________________________________________
vgg16_model.load_weights("./vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5", by_name=True)
# Note: by_name=True requires the layer names set above to match those in the weights file
# Inspect the layer names, weight names and shapes stored in the weights file:
# import h5py
# # open the .h5 file
# with h5py.File("./vgg16_weights_tf_dim_ordering_tf_kernels_notop.h5", 'r') as f:
#     # iterate over all layers
#     for layer_name in f.attrs['layer_names']:
#         g = f[layer_name]
#         # iterate over all weights of this layer
#         for weight_name in g.attrs['weight_names']:
#             weight = g[weight_name]
#             # print layer name, weight name and shape
#             print(f"layer: {layer_name}, weight: {weight_name}, shape: {weight.shape}")
for layer in vgg16_model.layers[:18]:
    print(layer.name)  # prints which layers get frozen
    layer.trainable = False
vgg16_model.summary()
rescaling
block1_conv1
block1_conv2
batch_normalization
dropout
max_pooling2d
block2_conv1
dropout_1
block2_conv2
batch_normalization_1
dropout_2
max_pooling2d_1
block3_conv1
block3_conv2
block3_conv3
batch_normalization_2
dropout_3
max_pooling2d_2
Model: "sequential"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
rescaling (Rescaling) (None, 224, 224, 3) 0
block1_conv1 (Conv2D) (None, 224, 224, 64) 1792
block1_conv2 (Conv2D) (None, 224, 224, 64) 36928
batch_normalization (Batch (None, 224, 224, 64) 256
Normalization)
dropout (Dropout) (None, 224, 224, 64) 0
max_pooling2d (MaxPooling2 (None, 112, 112, 64) 0
D)
block2_conv1 (Conv2D) (None, 112, 112, 128) 73856
dropout_1 (Dropout) (None, 112, 112, 128) 0
block2_conv2 (Conv2D) (None, 112, 112, 128) 147584
batch_normalization_1 (Bat (None, 112, 112, 128) 512
chNormalization)
dropout_2 (Dropout) (None, 112, 112, 128) 0
max_pooling2d_1 (MaxPoolin (None, 56, 56, 128) 0
g2D)
block3_conv1 (Conv2D) (None, 56, 56, 256) 295168
block3_conv2 (Conv2D) (None, 56, 56, 256) 590080
block3_conv3 (Conv2D) (None, 56, 56, 256) 590080
batch_normalization_2 (Bat (None, 56, 56, 256) 1024
chNormalization)
dropout_3 (Dropout) (None, 56, 56, 256) 0
max_pooling2d_2 (MaxPoolin (None, 28, 28, 256) 0
g2D)
block4_conv1 (Conv2D) (None, 28, 28, 512) 1180160
block4_conv2 (Conv2D) (None, 28, 28, 512) 2359808
block4_conv3 (Conv2D) (None, 28, 28, 512) 2359808
batch_normalization_3 (Bat (None, 28, 28, 512) 2048
chNormalization)
dropout_4 (Dropout) (None, 28, 28, 512) 0
max_pooling2d_3 (MaxPoolin (None, 14, 14, 512) 0
g2D)
block5_conv1 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv2 (Conv2D) (None, 14, 14, 512) 2359808
block5_conv3 (Conv2D) (None, 14, 14, 512) 2359808
batch_normalization_4 (Bat (None, 14, 14, 512) 2048
chNormalization)
dropout_5 (Dropout) (None, 14, 14, 512) 0
max_pooling2d_4 (MaxPoolin (None, 7, 7, 512) 0
g2D)
=================================================================
Total params: 14720576 (56.15 MB)
Trainable params: 12981248 (49.52 MB)
Non-trainable params: 1739328 (6.64 MB)
_________________________________________________________________
from keras import regularizers
model = models.Sequential([
    vgg16_model,
    layers.Flatten(),
    layers.Dense(1024, activation='relu'),  # kernel_regularizer=regularizers.l1(0.0001)
    #layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu"),   # kernel_regularizer=regularizers.l1(0.0001)
    #layers.BatchNormalization(),
    layers.Dropout(0.5),
    layers.Dense(len(class_names), activation="softmax")
])
model.build(input_shape=(batch_size, img_height, img_width, 3))
model.summary()
Model: "sequential_1"
_________________________________________________________________
Layer (type) Output Shape Param #
=================================================================
sequential (Sequential) (None, 7, 7, 512) 14720576
flatten (Flatten) (32, 25088) 0
dense (Dense) (32, 1024) 25691136
dropout_6 (Dropout) (32, 1024) 0
dense_1 (Dense) (32, 128) 131200
dropout_7 (Dropout) (32, 128) 0
dense_2 (Dense) (32, 17) 2193
=================================================================
Total params: 40545105 (154.67 MB)
Trainable params: 38805777 (148.03 MB)
Non-trainable params: 1739328 (6.64 MB)
_________________________________________________________________
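To train this hand-built model, the compile/fit recipe from sections IV and V can be reused; a minimal sketch (lr_schedule, checkpointer and earlystopper as defined earlier; the checkpoint path should be changed to a fresh file so the earlier weights are not overwritten):
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=lr_schedule),
              loss=tf.keras.losses.CategoricalCrossentropy(from_logits=False),  # softmax output above
              metrics=['accuracy'])
history = model.fit(tr_ds,
                    validation_data=val_ds,
                    epochs=epochs,
                    callbacks=[checkpointer, earlystopper])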