Deep Learning T9: Cat and Dog Recognition 2

我也不太懂

Tags: deep learning, artificial intelligence

Article link: https://blog.csdn.net/qq_66033623/article/details/131117077

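This article walks through building a VGG16 network with TensorFlow 2.5 in Python. It first sets up the GPU and imports the dataset, then preprocesses the data (loading, normalization, and the training/validation split). Next it builds the VGG16 model and prints its structure. Finally it compiles and trains the model while tracking loss and accuracy on the training and validation sets.

Summary generated by CSDN using AI.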

This post is a learning log from the 🔗365-Day Deep Learning Training Camp
Original author: K同学啊 | tutoring and custom projects available

My environment:

1. Language: Python 3.7

2. IDE: PyCharm

3. Deep learning framework: TensorFlow 2.5

I. Preparation

1. Set up the GPU

Skip this step if you are training on the CPU.

import tensorflow as tf
gpus = tf.config.list_physical_devices("GPU")

if gpus:
    gpu0 = gpus[0]
    tf.config.experimental.set_memory_growth(gpu0, True)
    tf.config.set_visible_devices([gpu0],"GPU")

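To train on the CPU instead, set this before TensorFlow is imported:

import os
# "-1" hides every CUDA device, so TensorFlow falls back to the CPU
os.environ["CUDA_VISIBLE_DEVICES"] = "-1"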

2. Import the dataset

import matplotlib.pyplot as plt
import pathlib, PIL, warnings
import tensorflow as tf
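# Use the SimHei font so that Chinese characters render in plots
plt.rcParams['font.sans-serif'] = ['SimHei']
# Render minus signs correctly
plt.rcParams['axes.unicode_minus'] = False
# Suppress warnings
warnings.filterwarnings('ignore')

# Point to the dataset
path = 'E:/TF环境/T8_data'
data_dir = pathlib.Path(path)

# Count the images (every file one level below the class folders)
image_count = len(list(data_dir.glob('*/*')))
print('Total number of images:', image_count)

As in the earlier tasks, the pathlib module is used here: the path string is wrapped in a pathlib.Path object, data_dir, which makes file-path operations more convenient.

Total number of images: 3400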
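The glob pattern '*/*' counts every file that sits one level below the root, that is, inside a class folder. The sketch below reproduces that count per class on a throwaway directory. The helper names count_images_per_class and build_demo_dataset are illustrative and not part of the original post, and the files are empty placeholders rather than real images.

```python
import pathlib
import tempfile


def count_images_per_class(root):
    """Return {class_name: file_count} for a root folder with one subfolder per class."""
    root = pathlib.Path(root)
    counts = {}
    for class_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        counts[class_dir.name] = sum(1 for f in class_dir.iterdir() if f.is_file())
    return counts


def build_demo_dataset(root, layout):
    """Create empty placeholder files, e.g. layout={'cat': 3, 'dog': 2}."""
    root = pathlib.Path(root)
    for class_name, n_files in layout.items():
        class_dir = root / class_name
        class_dir.mkdir(parents=True, exist_ok=True)
        for i in range(n_files):
            (class_dir / f"{class_name}_{i}.jpg").touch()


with tempfile.TemporaryDirectory() as tmp:
    build_demo_dataset(tmp, {"cat": 3, "dog": 2})
    demo_counts = count_images_per_class(tmp)
    # Same pattern as in the post: all files one level below the root
    demo_total = len(list(pathlib.Path(tmp).glob("*/*")))

print(demo_counts)  # {'cat': 3, 'dog': 2}
print(demo_total)   # 5
```

A per-class count like this is a quick way to spot an imbalanced dataset before training.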

II. Data preprocessing

1. Load the data

Image format settings:

batch_size = 64
img_height = 224
img_width = 224

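Create the training split:

tf.keras.preprocessing.image_dataset_from_directory is a convenience function in TensorFlow for building image datasets. It loads the images from a directory that contains one subdirectory per class and returns a tf.data.Dataset that can be fed to the model. Its main parameters are:

directory: string, path of the root directory that contains the class subdirectories.
labels: "inferred" by default, in which case labels are derived from the subdirectory names; an explicit list of integer labels can be passed instead.
label_mode: how the labels are encoded; supported values include int, categorical, and binary.
class_names: optional, explicit list of class names. If omitted, the function takes the class names from the subdirectory names in alphanumeric order.
color_mode: image color mode, usually rgb or grayscale.
batch_size: size of each mini-batch.
image_size: tuple (height, width) that the images are resized to.
shuffle: whether to shuffle the data; True by default.
validation_split, subset, seed: the fraction of the data reserved for validation, which part to return (training or validation), and the random seed. Use the same seed in both calls so that the two subsets do not overlap.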

train_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="training",
    label_mode = "categorical",
    seed = 123,
    image_size = (img_height, img_width),
    batch_size = batch_size
)

Found 3400 files belonging to 2 classes.
Using 2720 files for training.

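Create the validation split:

The validation set does not take part in training directly, but it gives us a checkpoint for manual tuning. Based on the model's performance on the validation set after each epoch, we can decide whether to stop training early, and we can use the same signal to adjust the hyperparameters.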

val_ds = tf.keras.preprocessing.image_dataset_from_directory(
    data_dir,
    validation_split=0.2,
    subset="validation",
    # use the same one-hot label encoding as the training set
    label_mode = "categorical",
    seed = 123,
    image_size = (img_height, img_width),
    batch_size = batch_size
)

Found 3400 files belonging to 2 classes.
Using 680 files for validation.
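The file counts reported above follow from simple arithmetic. The sketch below assumes that the validation share is truncated to an integer and that the last, smaller batch is kept; the helper names are illustrative, not TensorFlow APIs.

```python
import math


def split_sizes(n_files, validation_split):
    """Return (train, val) sizes, truncating the validation share to an integer."""
    n_val = int(n_files * validation_split)
    return n_files - n_val, n_val


def batches_per_epoch(n_samples, batch_size):
    """Number of batches when the last, smaller batch is kept."""
    return math.ceil(n_samples / batch_size)


n_train, n_val = split_sizes(3400, 0.2)
print(n_train, n_val)                    # 2720 680
print(batches_per_epoch(n_train, 64))    # 43
print(batches_per_epoch(n_val, 64))      # 11
print(n_val - 10 * 64)                   # 40 images in the last validation batch
```

With a batch size of 64, the last validation batch holds only 40 images, which matters when per-batch metrics are compared.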

Check the labels:

class_names = train_ds.class_names
print(class_names)

['cat', 'dog']

2. Double-check the data

# Inspect the shape of one batch
for image_batch, labels_batch in train_ds:
    print(image_batch.shape)
    print(labels_batch.shape)
    break

(64, 224, 224, 3)
(64, 2)

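image_batch is a tensor of shape (64, 224, 224, 3): a batch of 64 images, each 224x224 with 3 color channels.
labels_batch is a tensor of shape (64, 2): one one-hot label per image, because label_mode is "categorical".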
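With label_mode = "categorical", each label is a one-hot vector with one entry per class, which is why the label batch printed above has shape (64, 2). A minimal pure-Python sketch of the encoding and its inverse (to_one_hot and from_one_hot are hypothetical helpers, not TensorFlow functions):

```python
def to_one_hot(index, num_classes):
    """Encode an integer class index as a one-hot list."""
    vec = [0.0] * num_classes
    vec[index] = 1.0
    return vec


def from_one_hot(vec):
    """Recover the class index: the position of the largest entry."""
    return max(range(len(vec)), key=lambda i: vec[i])


class_names = ['cat', 'dog']  # same order as train_ds.class_names
dog_label = to_one_hot(class_names.index('dog'), len(class_names))
print(dog_label)                              # [0.0, 1.0]
print(class_names[from_one_hot(dog_label)])   # dog
```

The same argmax step is what turns a one-hot label, or a vector of predicted probabilities, back into a class name.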

3. Configure the dataset

AUTOTUNE = tf.data.AUTOTUNE

def preprocess_image(image,label):
    return(image/255.0,label)

# Normalize pixel values to the range [0, 1]

train_ds = train_ds.map(preprocess_image,num_parallel_calls=AUTOTUNE)
val_ds = val_ds.map(preprocess_image,num_parallel_calls=AUTOTUNE)

train_ds = train_ds.cache().shuffle(1000).prefetch(buffer_size=AUTOTUNE)
val_ds = val_ds.cache().prefetch(buffer_size=AUTOTUNE)

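AUTOTUNE is a TensorFlow constant. Passing it for a parameter such as num_parallel_calls or buffer_size lets tf.data choose the value dynamically at run time, so the effective setting can differ from one hardware configuration to another.

map() is a method of tf.data.Dataset that applies a function to every element of the dataset and returns a new dataset. Its first argument is the function to apply; the optional num_parallel_calls argument controls how many elements are processed in parallel. train_ds.map(preprocess_image) applies preprocess_image to every element of train_ds and yields a new dataset, and val_ds.map(preprocess_image) does the same for val_ds.

train_ds.cache() and val_ds.cache() are dataset transformations that cache the elements in memory (or in a file, if a filename is given) so that later epochs can read them faster. Caching avoids repeating disk I/O and preprocessing on every epoch, which speeds up training and evaluation.

train_ds.shuffle(1000) is a dataset transformation that randomly reorders the elements of the dataset.
Shuffling keeps the model from seeing the data in the same order every epoch, which can help generalization. The argument 1000 is the size of the shuffle buffer and can be adjusted to the dataset size. Because the dataset is already batched at this point, it is whole batches that are shuffled.

shuffle(): shuffles the data.

prefetch(): prepares upcoming batches while the current one is being processed, which speeds up training.

cache(): caches the dataset in memory, which speeds up later epochs.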
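The buffer size passed to shuffle() limits how far an element can move from its original position. The following pure-Python sketch mimics the documented behavior of a shuffle buffer (fill the buffer, emit a random element, replace it with the next incoming one). It illustrates the idea only and is not TensorFlow's implementation; buffered_shuffle is a hypothetical name.

```python
import random


def buffered_shuffle(items, buffer_size, seed=None):
    """Yield items in randomized order using a fixed-size shuffle buffer."""
    if buffer_size < 1:
        raise ValueError("buffer_size must be at least 1")
    rng = random.Random(seed)
    buffer = []
    for item in items:
        if len(buffer) < buffer_size:
            buffer.append(item)        # still filling the buffer
            continue
        idx = rng.randrange(buffer_size)
        yield buffer[idx]              # emit a random buffered element
        buffer[idx] = item             # the new element takes its place
    rng.shuffle(buffer)                # drain what is left at the end
    yield from buffer


print(list(buffered_shuffle(range(5), buffer_size=1)))            # [0, 1, 2, 3, 4]
print(sorted(buffered_shuffle(range(10), buffer_size=4, seed=0)))  # all ten items, each once
```

A buffer of size 1 leaves the order unchanged, while a buffer at least as large as the dataset gives a full shuffle. With 43 training batches, a buffer of 1000 is more than enough.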

4. Visualize the data

import numpy as np

plt.figure(figsize = (15, 10))
for images, labels in train_ds.take(1):
  for i in range(8):
    ax = plt.subplot(5, 8, i + 1)
    plt.imshow(images[i])
    # labels are one-hot, so take the argmax to get the class index
    plt.title(class_names[np.argmax(labels[i])])
    plt.axis("off")
plt.show()

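III. Build the VGG-16 network

Strengths and weaknesses of VGG:

Strengths

The VGG architecture is very simple: the whole network uses the same convolution kernel size (3x3) and the same max-pooling size (2x2).

Weaknesses

1. Training takes a long time, and tuning is difficult.

2. It needs a lot of storage, which makes deployment harder.

VGG16 is a convolutional neural network (CNN) model proposed by the Visual Geometry Group (VGG) at the University of Oxford. It is a widely used pretrained model for computer vision tasks such as image classification and object detection.

The VGG16 network consists of an input layer, thirteen convolutional layers, and three fully connected layers; the sixteen layers with weights give the network its name.

Input layer: the input size of VGG16 is 224x224x3, that is, a 224x224 color image with three channels.

Convolutional layers: the thirteen convolutional layers are grouped into five blocks, each followed by a max-pooling layer. In order:

a. Block 1 convolutions: 64 output channels, 3x3 kernels, stride 1x1. These layers extract low-level features. The first convolution takes the 3 RGB channels as input and outputs 64 channels.

b. Max pooling: the channel count is unchanged, with a 2x2 pooling window and stride 2x2. It downsamples the output of block 1, halving height and width to reduce computation; the output still has 64 channels.

c. Block 2 convolutions: 128 output channels, 3x3 kernels, stride 1x1. These layers extract higher-level features. The first convolution of the block takes 64 input channels and outputs 128.

d. Max pooling: the channel count is unchanged, with a 2x2 pooling window and stride 2x2. It works like the previous pooling layer; the output still has 128 channels.

e. Block 3 convolutions: 256 output channels, 3x3 kernels, stride 1x1. These layers extract still higher-level features. The first convolution of the block takes 128 input channels.

Blocks 4 and 5 follow the same pattern with 512 output channels each, as in the VGG16 function defined in this section.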
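The effect of the pooling layers on the feature-map size can be traced by hand. The sketch below assumes the standard VGG16 layout: five blocks with 64, 128, 256, 512 and 512 channels, 3x3 convolutions with 'same' padding, and a 2x2 max pool with stride 2 at the end of each block. The function name is illustrative.

```python
def vgg16_feature_shapes(height=224, width=224):
    """Return (height, width, channels) after each of the five VGG16 blocks."""
    block_channels = [64, 128, 256, 512, 512]
    shapes = []
    for channels in block_channels:
        # 3x3 convolutions with 'same' padding and stride 1 keep height and width;
        # the 2x2 max pool with stride 2 halves both.
        height, width = height // 2, width // 2
        shapes.append((height, width, channels))
    return shapes


shapes = vgg16_feature_shapes()
for block, shape in enumerate(shapes, start=1):
    print(f"after block {block}: {shape}")

h, w, c = shapes[-1]
flattened = h * w * c
print(flattened)  # 25088 inputs to the first fully connected layer
```

The final 7x7x512 feature map flattens to 25088 values, which is the input width of the first fully connected layer.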

from tensorflow.keras import layers,models,Input
from tensorflow.keras.models import Model
from tensorflow.keras.layers import Conv2D,MaxPooling2D,Dense,Flatten,Dropout
def VGG16(nb_classes, input_shape):
    input_tensor = Input(shape=input_shape)
    x = Conv2D(64,(3,3),activation='relu',padding='same',name='block1_conv1')(input_tensor)
    x = Conv2D(64,(3,3),activation='relu',padding='same',name='block1_conv2')(x)
    x = MaxPooling2D((2,2),strides=(2,2),name='block1_pool')(x)
    x = Conv2D(128,(3,3),activation='relu',padding='same',name='block2_conv1')(x)
    x = Conv2D(128,(3,3),activation='relu',padding='same',name='block2_conv2')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block2_pool')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv1')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv2')(x)
    x = Conv2D(256, (3, 3), activation='relu', padding='same', name='block3_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block3_pool')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block4_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block4_pool')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv1')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv2')(x)
    x = Conv2D(512, (3, 3), activation='relu', padding='same', name='block5_conv3')(x)
    x = MaxPooling2D((2, 2), strides=(2, 2), name='block5_pool')(x)
    x = Flatten()(x)
    x = Dense(4096,activation='relu',name='fc1')(x)
    x = Dense(4096, activation='relu', name='fc2')(x)
    output_tensor = Dense(nb_classes,activation='softmax',name='predictions')(x)
    model = Model(input_tensor,output_tensor)
    return model
 
# Two output classes (cat, dog); the input shape is (height, width, channels)
model = VGG16(len(class_names),(img_height,img_width,3))
model.summary()

Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_1 (InputLayer)         [(None, 224, 224, 3)]     0         
_________________________________________________________________
block1_conv1 (Conv2D)        (None, 224, 224, 64)      1792      
_________________________________________________________________
block1_conv2 (Conv2D)        (None, 224, 224, 64)      36928     
_________________________________________________________________
block1_pool (MaxPooling2D)   (None, 112, 112, 64)      0         
_________________________________________________________________
block2_conv1 (Conv2D)        (None, 112, 112, 128)     73856     
_________________________________________________________________
block2_conv2 (Conv2D)        (None, 112, 112, 128)     147584    
_________________________________________________________________
block2_pool (MaxPooling2D)   (None, 56, 56, 128)       0         
_________________________________________________________________
block3_conv1 (Conv2D)        (None, 56, 56, 256)       295168    
_________________________________________________________________
block3_conv2 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_conv3 (Conv2D)        (None, 56, 56, 256)       590080    
_________________________________________________________________
block3_pool (MaxPooling2D)   (None, 28, 28, 256)       0         
_________________________________________________________________
block4_conv1 (Conv2D)        (None, 28, 28, 512)       1180160   
_________________________________________________________________
block4_conv2 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_conv3 (Conv2D)        (None, 28, 28, 512)       2359808   
_________________________________________________________________
block4_pool (MaxPooling2D)   (None, 14, 14, 512)       0         
_________________________________________________________________
block5_conv1 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv2 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_conv3 (Conv2D)        (None, 14, 14, 512)       2359808   
_________________________________________________________________
block5_pool (MaxPooling2D)   (None, 7, 7, 512)         0         
_________________________________________________________________
flatten (Flatten)            (None, 25088)             0         
_________________________________________________________________
fc1 (Dense)                  (None, 4096)              102764544 
_________________________________________________________________
fc2 (Dense)                  (None, 4096)              16781312  
_________________________________________________________________
predictions (Dense)          (None, 2)                 8194      
=================================================================
Total params: 134,268,738
Trainable params: 134,268,738
Non-trainable params: 0
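The parameter counts in a summary like this can be checked by hand: a convolution has (kernel height x kernel width x input channels + 1) x output channels parameters, and a dense layer has (inputs + 1) x outputs, where the +1 is the bias. A small sketch, assuming the layer layout of the VGG16 function above (helper names are illustrative):

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights plus one bias per output channel for a square-kernel convolution."""
    return (kernel * kernel * in_channels + 1) * out_channels


def dense_params(in_units, out_units):
    """Weights plus one bias per output unit for a fully connected layer."""
    return (in_units + 1) * out_units


def vgg16_param_count(nb_classes, input_channels=3):
    """Total trainable parameters of VGG16 with an nb_classes output layer."""
    blocks = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]  # (channels, conv layers)
    total = 0
    in_ch = input_channels
    for out_ch, n_convs in blocks:
        for _ in range(n_convs):
            total += conv_params(3, in_ch, out_ch)
            in_ch = out_ch
    total += dense_params(7 * 7 * 512, 4096)   # fc1
    total += dense_params(4096, 4096)          # fc2
    total += dense_params(4096, nb_classes)    # predictions
    return total


print(conv_params(3, 3, 64))     # 1792, matches block1_conv1
print(vgg16_param_count(1000))   # 138357544 with a 1000-class output layer
print(vgg16_param_count(2))      # 134268738 with a 2-class output layer
```

Nearly three quarters of the parameters sit in fc1 alone, which is the main reason the model needs so much storage.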

IV. Compile

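model.compile(
    optimizer='adam',
    # Labels are one-hot and the model already ends in softmax,
    # so use the non-sparse loss on probabilities (from_logits defaults to False)
    loss=tf.keras.losses.CategoricalCrossentropy(),
    metrics=['accuracy']
)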
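The model's last layer already applies softmax, so its outputs are probabilities rather than logits. The pure-Python sketch below shows the risk of treating such outputs as logits, which is what from_logits=True asks for. Depending on the Keras version, this may produce a warning or effectively apply softmax a second time. The helpers are illustrative and not Keras functions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of numbers."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Cross-entropy between a one-hot label and predicted probabilities."""
    return -sum(t * math.log(max(p, eps)) for t, p in zip(one_hot, probs))


label = [1.0, 0.0]   # one-hot label for 'cat'
probs = [0.9, 0.1]   # a confident, correct prediction (already softmax output)

loss_on_probs = categorical_cross_entropy(label, probs)
loss_double_softmax = categorical_cross_entropy(label, softmax(probs))

print(round(loss_on_probs, 4))        # 0.1054
print(round(loss_double_softmax, 4))  # 0.3711
```

Applying softmax twice squashes the probabilities toward each other, so even a confident correct prediction is charged a noticeably larger loss.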

V. Train the model

from tqdm import tqdm
import tensorflow.keras.backend as k

epochs = 10
lr = 1e-4

history_train_loss = []
history_train_accuracy = []
history_val_loss = []
history_val_accuracy = []

for epoch in range(epochs):
    train_total = len(train_ds)
    val_total = len(val_ds)
    with tqdm(total=train_total, desc=f'Epoch {epoch + 1}/{epochs}', mininterval=1, ncols=100) as pbar:
        # Decay the learning rate by 8% at the start of every epoch
        lr = lr * 0.92
        k.set_value(model.optimizer.lr, lr)

        for image, label in train_ds:
            # Returns [loss, accuracy] for this batch
            history = model.train_on_batch(image, label)

            train_loss = history[0]
            train_accuracy = history[1]
            pbar.set_postfix({'loss': '%.4f' % train_loss,
                              'accuracy': '%.4f' % train_accuracy,
                              'lr': k.get_value(model.optimizer.lr)}
                             )
            pbar.update(1)
        # Record the metrics of the last batch of the epoch
        history_train_loss.append(train_loss)
        history_train_accuracy.append(train_accuracy)
    print('Starting validation!')
    with tqdm(total=val_total, desc=f'Epoch {epoch + 1}/{epochs}', mininterval=0.3, ncols=100) as pbar:
        for image, label in val_ds:
            history = model.test_on_batch(image, label)

            val_loss = history[0]
            val_accuracy = history[1]
            pbar.set_postfix({'loss': '%.4f' % val_loss,
                              'accuracy': '%.4f' % val_accuracy}
                             )
            pbar.update(1)
        history_val_loss.append(val_loss)
        history_val_accuracy.append(val_accuracy)
    print('Validation finished')
    print('Validation loss: %.4f' % val_loss)
    print('Validation accuracy: %.4f' % val_accuracy)

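The tqdm class from the tqdm package is used to display a progress bar for a loop in a simple way. Its total parameter is the number of iterations in the loop and is used to compute the percentage; desc is the text shown in front of the bar; mininterval is the minimum interval between display updates, in seconds; ncols sets the overall width of the bar (if omitted, tqdm tries to fit the terminal width). Here each tqdm instance is bound to the name pbar and used in both the training and the validation loop.

set_postfix() is a tqdm method that appends extra information after the progress bar. update(1) advances the progress bar; the argument 1 is the step, that is, how much the progress increases on each call.

The model is trained with model.train_on_batch(), which is more flexible than fit(). train_on_batch() runs one training step by hand: you pass in a single batch of training data, and it computes the loss and updates the model parameters. It is typically used when finer control over the training process is needed, for example when the learning rate is adjusted by hand between steps. model.fit(), by contrast, handles batching automatically: it iterates over the training data batch by batch, computing the loss and updating the parameters after each batch. It suits ordinary training scenarios with standard loss functions and optimizers.

The set_value() function from tensorflow.keras.backend is used to change the value of a variable; here it updates the optimizer's learning rate.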
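Multiplying the learning rate by 0.92 at the start of every epoch gives an exponential decay. A minimal sketch of the resulting schedule, using the values from the training loop (the function name is illustrative):

```python
import math


def decayed_learning_rates(initial_lr, decay, epochs):
    """Learning rate used in each epoch when it is multiplied by decay before training."""
    rates = []
    lr = initial_lr
    for _ in range(epochs):
        lr = lr * decay
        rates.append(lr)
    return rates


rates = decayed_learning_rates(1e-4, 0.92, 10)
for epoch, rate in enumerate(rates, start=1):
    print(f"epoch {epoch:2d}: lr = {rate:.3e}")

# Closed form: the rate in epoch n is initial_lr * decay ** n
assert math.isclose(rates[-1], 1e-4 * 0.92 ** 10)
```

Because the decay is applied before the first batch, the first epoch already trains at 9.2e-5 rather than 1e-4, and by the tenth epoch the rate has dropped to roughly 4.3e-5.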

VI. Evaluate the model

epochs_range = range(epochs)

plt.figure(figsize=(12, 4))
plt.subplot(1, 2, 1)
plt.plot(epochs_range, history_train_accuracy, label='Training Accuracy')
plt.plot(epochs_range, history_val_accuracy, label='Validation Accuracy')
plt.legend(loc='lower right')
plt.title('Training and Validation Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, history_train_loss, label='Training Loss')
plt.plot(epochs_range, history_val_loss, label='Validation Loss')
plt.legend(loc='upper right')
plt.title('Training and Validation Loss')
plt.show()
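The history lists hold the metrics of the last batch of each epoch, since train_on_batch and test_on_batch report per-batch values by default. The last validation batch is small, so those points can be noisy. One alternative is a size-weighted mean over all batches, sketched below. The accuracy values are made up for illustration and are not results from this run, and epoch_average is a hypothetical helper.

```python
def epoch_average(batch_values, batch_sizes):
    """Mean of per-batch metrics, weighted by the number of samples in each batch."""
    total = sum(batch_sizes)
    return sum(v * n for v, n in zip(batch_values, batch_sizes)) / total


# Validation set of 680 images with batch size 64: ten full batches and one of 40
batch_sizes = [64] * 10 + [40]
# Made-up per-batch accuracies, for illustration only
batch_accuracy = [0.75] * 10 + [0.50]

last_batch_only = batch_accuracy[-1]
weighted_mean = epoch_average(batch_accuracy, batch_sizes)

print(last_batch_only)           # 0.5
print(round(weighted_mean, 4))   # 0.7353
```

In the loops above, this would mean collecting each batch's loss and accuracy in a list and appending the weighted mean to the history at the end of the epoch.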

VII. Predict

import numpy as np

plt.figure(figsize=(18, 3))
for images, labels in val_ds.take(1):
     for i in range(8):
         ax = plt.subplot(1,8,i + 1)
         # Show the image
         plt.imshow(images[i].numpy())
         # Add a batch dimension
         img_array = tf.expand_dims(images[i],0)
         # Predict which animal is in the image
         predictions = model.predict(img_array)
         plt.title(class_names[np.argmax(predictions)])

         plt.axis("off")
plt.show()