Image Classification: Four-Label Image Classification

辰龙_

Last modified: 2024-05-10 22:05:53

Tags: artificial intelligence, machine learning, computer vision

First published: 2024-05-06 15:24:12

Article link: https://blog.csdn.net/qq_26817675/article/details/138479037

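1. Introduction

We train a model to predict the probability that an image belongs to each of four classes: Food, Attire, Decoration and signage (Decorationandsignage), and miscellaneous (misc).

Project link: https://www.kaggle.com/code/peachwuhu/image-classification
From that page you can copy the notebook to your own Kaggle account with one click and experiment further.

This project shows how to:
1. Read image files (stored in separate folders) whose names are listed in a .csv file
2. Correct class imbalance
3. Apply simple image preprocessing
4. Train and test with a pre-built model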

2. Dataset

The dataset consists of four items: two folders, Test Images and Train Images, and two CSV files, test.csv and train.csv.

Train Images contains 5983 images.
Test Images contains 3219 images.

train.csv has 5983 rows: the first column is the image file name and the second is the image class.
test.csv has 3219 rows and a single column, the image file name.
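
Before loading anything, it can be worth checking that every file name listed in a CSV actually exists in the matching image folder. The sketch below is one way to do that with the standard library only. `find_missing_images` is a hypothetical helper that is not part of the original notebook, and it assumes the folder layout described above.

```python
import os


def find_missing_images(image_names, image_dir):
    """Return the names in image_names that have no file in image_dir.

    Hypothetical helper for illustration; it only compares file names
    and does not try to open or decode the images.
    """
    present = set(os.listdir(image_dir))
    return [name for name in image_names if name not in present]
```

With the data frames used later in this post, a call could look like `find_missing_images(train_df["Image"], path + "Train Images")`; an empty list means every listed image is present.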

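3. Code Walkthrough

3.1 Import the required libraries

os: provides functions for interacting with the operating system, such as file operations.
numpy: provides a multidimensional array object and many functions for working with arrays.
pandas: a data analysis library with data structures and tools that are well suited to structured data.
matplotlib.pyplot: a plotting library commonly used for data visualization.
matplotlib.image: provides functions for reading and manipulating image files.
seaborn: provides higher-level statistical plots.
cv2: the Python interface to OpenCV, a popular computer vision library that covers image processing and computer vision algorithms.
sklearn.model_selection: provides functions for model selection and evaluation, such as splitting a dataset.
keras.applications: provides pretrained deep learning models.
keras.models: used to define and train neural network models.
keras.callbacks: contains commonly used callbacks, such as early stopping (EarlyStopping) and learning rate reduction (ReduceLROnPlateau).
keras.layers: provides the layers needed to build a neural network model.
keras.utils: contains utility functions, such as to_categorical for one-hot encoding labels.
keras.preprocessing.image: provides image preprocessing utilities, such as augmentation and data generators.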

import os
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
import seaborn as sns
import cv2
from sklearn.model_selection import train_test_split
from keras.applications import MobileNet, MobileNetV2
from keras.models import Sequential
from keras.layers import Dropout, Dense, BatchNormalization
from keras.utils import to_categorical
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import EarlyStopping, ReduceLROnPlateau

path = '../input/images/dataset/'

3.2 List the files under the dataset path

os.listdir(path)

Output: ['test.csv', 'train.csv', 'Test Images', 'Train Images']

3.3 Read the training and test CSV files and show the first 5 rows of the training data

train_df = pd.read_csv(path + 'train.csv')
test_df = pd.read_csv(path + 'test.csv')
train_df.head()

Output:

3.4 Look at some training images

plt.figure(figsize=(10, 10))  # create a new figure of 10x10 inches

for i in range(9):  # i runs from 0 to 8
    ax = plt.subplot(3, 3, i + 1)  # 3x3 grid of subplots; subplot numbering starts at 1
    img = mpimg.imread(path + '/Train Images/' + train_df["Image"][i])  # train_df["Image"][i] is the file name in row i, e.g. train_df["Image"][2] is image10335.jpg
    img = cv2.resize(img, (224, 224))
    plt.imshow(img)
    plt.title(train_df["Class"][i])  # use the "Class" value of row i as the subplot title
    plt.axis("off")  # hide the axes and tick labels

Output:

3.5 Check which classes are present

train_df['Class'].unique()

Output: array(['Food', 'misc', 'Attire', 'Decorationandsignage'], dtype=object)

3.6 Create dictionaries that map class names to numbers and back

class_map = {
    'Food': 0,
    'Attire': 1,
    'Decorationandsignage': 2,
    'misc': 3
}

inverse_class_map = {
    0: 'Food',
    1: 'Attire',
    2: 'Decorationandsignage',
    3: 'misc'
}
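
The two dictionaries have to stay in sync by hand. As a small sketch of an alternative, the inverse mapping can be derived from `class_map` with a dict comprehension, so a change to one mapping cannot silently disagree with the other. This is an illustration, not the notebook's original code.

```python
class_map = {
    "Food": 0,
    "Attire": 1,
    "Decorationandsignage": 2,
    "misc": 3,
}

# Swap keys and values; this is valid because every class index is unique.
inverse_class_map = {index: name for name, index in class_map.items()}
```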

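3.7 Check whether the classes are balanced

sns.countplot(x=train_df['Class'])  # pass the column as x=; recent seaborn versions treat the first positional argument as `data`

Output:

train_df["Class"].value_counts()

Output:
Food 2278
Attire 1691
misc 1271
Decorationandsignage 743
Name: Class, dtype: int64

Class 0 (Food) has clearly more images than the other classes, so the imbalance needs to be corrected.
We do this by adding duplicate samples to the other classes (oversampling) until they match Food's original count. Food also receives extra copies, so it remains the largest class.

3.8 Correct the class imbalance

Food grows by 1000 images, from 2278 to 3278; each of the other three classes is topped up to 2278.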

balance_attire = (2278 - 1691)
balance_decoration = (2278 - 743) 
balance_misc = (2278 - 1271) 
balance_food = 1000

recover_balance = { 'Image': [], 'Class': [] }

while balance_food != 0:
    for i in range(train_df.shape[0]):  # train_df.shape[0] is 5983; shape[0] is the row count, shape[1] the column count
        if balance_food == 0:
            break
        # Copy the first 1000 Food rows of train_df into the recover_balance dict
        if train_df.iloc[i]["Class"] == 'Food':
            recover_balance["Image"].append(train_df.iloc[i]["Image"])
            recover_balance["Class"].append(train_df.iloc[i]["Class"])
            balance_food -= 1

# Same procedure for Attire
while balance_attire != 0:
    for i in range(train_df.shape[0]):
        if balance_attire == 0:
            break
        if train_df.iloc[i]["Class"] == 'Attire':
            recover_balance["Image"].append(train_df.iloc[i]["Image"])
            recover_balance["Class"].append(train_df.iloc[i]["Class"])
            balance_attire -= 1

# Same procedure for Decorationandsignage; 1535 copies are needed but only 743 rows exist,
# so the outer while loop makes the for loop pass over train_df more than once
while balance_decoration != 0:
    for i in range(train_df.shape[0]):
        if balance_decoration == 0:
            break
        if train_df.iloc[i]["Class"] == 'Decorationandsignage':
            recover_balance["Image"].append(train_df.iloc[i]["Image"])
            recover_balance["Class"].append(train_df.iloc[i]["Class"])
            balance_decoration -= 1

# Same procedure for misc
while balance_misc != 0:
    for i in range(train_df.shape[0]):
        if balance_misc == 0:
            break
        if train_df.iloc[i]["Class"] == 'misc':
            recover_balance["Image"].append(train_df.iloc[i]["Image"])
            recover_balance["Class"].append(train_df.iloc[i]["Class"])
            balance_misc -= 1

balance_df = pd.DataFrame(recover_balance)
balance_df = balance_df.sample(frac = 1)  # shuffle the rows
balance_df.head()  # show the first 5 rows

Output:
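
The four loops in this step differ only in the class name and the number of copies. A more compact way to express the same idea, sketched here with the standard library, is a single helper that cycles through the rows of one class until enough copies have been collected. `oversample` is a hypothetical name, and the rows are assumed to be `(image_name, class_name)` tuples rather than a DataFrame.

```python
from itertools import cycle, islice


def oversample(rows, class_name, n_extra):
    """Return n_extra rows of class_name, repeating them in order as needed.

    rows is assumed to be a sequence of (image_name, class_name) tuples.
    """
    matching = [row for row in rows if row[1] == class_name]
    if not matching:
        raise ValueError(f"no rows found for class {class_name!r}")
    # cycle() restarts from the first matching row once the list is exhausted,
    # which mirrors the while/for combination in the loops of this step.
    return list(islice(cycle(matching), n_extra))
```

With the notebook's data, the rows could be built with `list(zip(train_df["Image"], train_df["Class"]))`, and the extra Decorationandsignage samples would then be `oversample(rows, "Decorationandsignage", 2278 - 743)`.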

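3.9 Check that the imbalance has been corrected

temp_df = pd.concat([balance_df, train_df])

sns.countplot(x=temp_df['Class'])  # pass the column as x=; recent seaborn versions treat the first positional argument as `data`

Output:

As expected, Food now has 3278 images and each of the other classes has 2278.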
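
The expected bar heights can also be worked out directly from the class counts reported in step 3.7. The sketch below redoes that arithmetic with the standard library; the variable names are illustrative and do not come from the notebook.

```python
from collections import Counter

# Class counts reported by value_counts() in step 3.7.
original_counts = Counter({
    "Food": 2278,
    "Attire": 1691,
    "misc": 1271,
    "Decorationandsignage": 743,
})

extra_food = 1000                  # Food receives 1000 additional copies
target = original_counts["Food"]   # every other class is topped up to 2278

# Number of duplicated rows added per class.
extra = {name: target - count for name, count in original_counts.items()}
extra["Food"] = extra_food

# Counts after concatenating the duplicates with the original rows.
expected = {name: original_counts[name] + extra[name] for name in original_counts}
total_images = sum(expected.values())
```

Here `extra` comes out as 587 for Attire, 1007 for misc and 1535 for Decorationandsignage, and `total_images` is 10112.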

3.10 Convert the class names to numbers

train_df['Class'] = train_df['Class'].map(class_map).astype(np.uint8)
balance_df['Class'] = balance_df['Class'].map(class_map).astype(np.uint8)
train_df.head()

Output:

In the Class column, Food is now 0.

3.11 Load the image data into NumPy arrays

h, w = 224, 224
batch_size = 64
epochs = 100

train_path = path + '/Train Images/'
test_path = path + '/Test Images/'

train_images, train_labels = [], []

for i in range(train_df.shape[0]):
    train_image = cv2.imread(train_path + str(train_df.Image[i]))
    train_image = cv2.cvtColor(train_image, cv2.COLOR_BGR2RGB)  # OpenCV loads images as BGR; convert to RGB
    train_image = cv2.resize(train_image, (w, h))  # cv2.resize expects (width, height)
    train_images.append(train_image)
    train_labels.append(train_df.Class[i])

# Append the oversampled rows from balance_df to the images loaded from train_df.
# balance_df was shuffled, so Image[i] looks a row up by its index label rather than
# its position; every row is still visited exactly once.
for i in range(balance_df.shape[0]):
    train_image = cv2.imread(train_path + str(balance_df.Image[i]))
    train_image = cv2.cvtColor(train_image, cv2.COLOR_BGR2RGB)
    train_image = cv2.resize(train_image, (w, h))
    train_images.append(train_image)
    train_labels.append(balance_df.Class[i])

test_images = []

for i in range(test_df.shape[0]):
    test_image = cv2.imread(test_path + str(test_df.Image[i]))
    test_image = cv2.cvtColor(test_image, cv2.COLOR_BGR2RGB)
    test_image = cv2.resize(test_image, (w, h))
    test_images.append(test_image)

train_images = np.array(train_images)
test_images = np.array(test_images)

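test_images has no labels and is reserved for the final predictions, so it cannot serve as a validation set during training.
Instead, we split train_images into a training set and a held-out set used for validation.

3.12 Create the training and validation sets

X_train, X_test, y_train, y_test = train_test_split(train_images, to_categorical(train_labels), test_size=0.3, random_state=42)

After oversampling there are 10112 labelled images in total.
X_train holds 7078 images (70%) and X_test holds 3034 images (30%).
y_train holds the 7078 matching labels and y_test the 3034 matching labels.
X denotes the image data and y the one-hot encoded labels.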
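
The 7078 / 3034 split follows from how a fractional test_size is rounded. The sketch below reproduces the arithmetic under the assumption that the held-out size is rounded up and the remainder goes to training, which matches the numbers reported here; `split_sizes` is a hypothetical helper, not a scikit-learn function.

```python
import math


def split_sizes(n_samples, test_size):
    """Return (n_train, n_test) for a fractional test_size.

    Assumes the held-out part is rounded up and the rest is used for training.
    """
    n_test = math.ceil(test_size * n_samples)
    n_train = n_samples - n_test
    return n_train, n_test
```

For the 10112 images in this post, `split_sizes(10112, 0.3)` gives 7078 training images and 3034 held-out images.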

We will train on our data with the pre-built MobileNet model.
[Figure: MobileNet architecture]

You can read more about it here: https://towardsdatascience.com/review-mobilenetv1-depthwise-separable-convolution-light-weight-model-a382df364b69

3.13 Create the base model

base_model = MobileNet(
    input_shape=(h, w, 3), 
    weights='imagenet',
    include_top=False, 
    pooling='avg'
)

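The base model is now in place.
The last fully connected layer of our CNN uses the softmax activation function, which turns the layer's raw outputs into probabilities that sum to 1. Its number of output classes is set to 4.

You can learn more about the softmax layer here: https://deepai.org/machine-learning-glossary-and-terms/softmax-layer#:~:text=The%20softmax%20function%20can%20be,be%20difficult%20to%20work%20with.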
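
To make the softmax step concrete, here is a minimal pure-Python sketch of the function applied to one vector of raw scores. It is an illustration only: Keras computes the same quantity on tensors, and the example scores are made up.

```python
import math


def softmax(logits):
    """Map a list of raw scores to probabilities that sum to 1."""
    # Subtracting the maximum leaves the result unchanged but avoids overflow
    # in exp() when the scores are large.
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Four made-up scores, one per class (Food, Attire, Decorationandsignage, misc).
probabilities = softmax([1.0, 2.0, 3.0, 4.0])
```

The largest score always receives the largest probability, so taking the arg max of the probabilities selects the same class as taking the arg max of the raw scores.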

3.14 Build the full model

base_model.trainable = False

output_class = 4

model = Sequential([
  base_model,
  Dense(output_class, activation='softmax')
])

model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.summary()

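Output:

The full model is now built.
Next we define two callbacks. One stops training early when the validation loss stops improving, which is a common sign of overfitting; the other lowers the learning rate when validation accuracy stops improving.

3.15 Define the callbacks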

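earlystop = EarlyStopping(monitor='val_loss', patience=5)

# With metrics=['accuracy'], Keras 2.3+ and tf.keras 2.x log the validation metric as
# 'val_accuracy'; older Keras releases call it 'val_acc'.
learning_rate_reduction = ReduceLROnPlateau(monitor='val_accuracy',
                                            patience=2,
                                            verbose=1,
                                            factor=0.5,
                                            min_lr=0.00001)

callbacks = [earlystop, learning_rate_reduction]

EarlyStopping is a Keras callback that monitors a chosen metric during training (here the validation loss) and stops training when that metric stops improving. monitor='val_loss' selects the validation loss. patience=5 means training stops once the loss has failed to improve for 5 consecutive epochs. This helps limit overfitting and saves compute by ending training early.

ReduceLROnPlateau is another Keras callback. It monitors a chosen metric (here the validation accuracy) and lowers the learning rate when that metric plateaus. monitor='val_accuracy' selects the validation accuracy. patience=2 means the learning rate is reduced after 2 consecutive epochs without improvement. verbose=1 prints a message each time the learning rate is reduced. factor=0.5 is the reduction factor: the learning rate is multiplied by 0.5. min_lr=0.00001 is the lower bound below which the learning rate is never reduced. Lowering the learning rate step by step in this way helps training settle and can improve the final model.

Finally, both callbacks are placed in a list that is passed to the training call. Whenever the condition of either callback is met during training, training is stopped or the learning rate is adjusted accordingly.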
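
The patience logic of both callbacks can be illustrated without Keras. The sketch below is a simplified simulation, not the library's implementation: it ignores options such as cooldown and restore_best_weights, the function names are hypothetical, and the starting learning rate of 0.001 is an assumption (it is the default for Keras' Adam optimizer).

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            # Improvement: remember the new best value and reset the counter.
            best = loss
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


def plateau_learning_rates(val_accs, lr=0.001, patience=2, factor=0.5,
                           min_lr=0.00001, min_delta=1e-4):
    """Return the learning rate in effect after each epoch."""
    best = float("-inf")
    wait = 0
    history = []
    for acc in val_accs:
        if acc > best + min_delta:
            # Accuracy improved by more than min_delta: reset the counter.
            best = acc
            wait = 0
        else:
            wait += 1
            if wait >= patience:
                # Plateau detected: shrink the rate, but never below min_lr.
                lr = max(lr * factor, min_lr)
                wait = 0
        history.append(lr)
    return history
```

For example, with validation losses `[1.0, 0.9, 0.8, 0.85, 0.81, 0.82, 0.9, 0.95]` the best value is reached at epoch 3, so `early_stopping_epoch` reports a stop at epoch 8, after five epochs without improvement.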

3.16 Simple image preprocessing and augmentation

datagen = ImageDataGenerator(
        rescale=1./255,
        shear_range=0.2,
        zoom_range=0.2,
        horizontal_flip=True)

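rescale=1./255: multiplies every pixel value by 1/255, which maps the values into the range [0, 1]. This is a common preprocessing step that tends to make training more stable and helps it converge faster.

shear_range=0.2: the shear intensity. A shear angle is drawn at random from [-0.2, 0.2] for each image; current Keras releases interpret this value in degrees, so 0.2 is a very mild shear.

zoom_range=0.2: the random zoom range. Each image is zoomed by a random factor in [0.8, 1.2], where 1 means no zoom.

horizontal_flip=True: randomly flips images horizontally. Flipping is a common augmentation technique that increases the variety of the training data and so helps the model generalize.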
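
To show what three of these options do to the data, here is a small pure-Python sketch on a tiny grayscale "image" stored as nested lists. It only illustrates the ideas; it is not how ImageDataGenerator is implemented, and the function names are hypothetical.

```python
import random


def rescale(image, factor=1.0 / 255):
    """Multiply every pixel by factor, mapping 0..255 to roughly 0..1."""
    return [[pixel * factor for pixel in row] for row in image]


def horizontal_flip(image):
    """Mirror the image left to right by reversing every row."""
    return [row[::-1] for row in image]


def sample_zoom(zoom_range=0.2, rng=random):
    """Draw a random zoom factor from [1 - zoom_range, 1 + zoom_range]."""
    return rng.uniform(1 - zoom_range, 1 + zoom_range)


tiny_image = [[0, 128, 255],
              [255, 128, 0]]
scaled = rescale(tiny_image)
flipped = horizontal_flip(tiny_image)
zoom = sample_zoom()
```

In the real generator each transformation is applied with fresh random parameters for every image in every batch, so the model rarely sees exactly the same picture twice.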

3.17 Train the model

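# model.fit accepts generators directly; fit_generator is deprecated in TensorFlow 2.1 and later.
# The generator rescales training pixels to [0, 1], so the validation images are scaled the same way.
model.fit(datagen.flow(X_train, y_train, batch_size=batch_size),
          validation_data=(X_test.astype(np.float32) / 255.0, y_test),
          steps_per_epoch=int(np.ceil(len(X_train) / batch_size)),
          epochs=epochs, callbacks=callbacks)

datagen.flow(X_train, y_train, batch_size=batch_size): the flow() method of the ImageDataGenerator object datagen yields batches of augmented training data. X_train holds the training images, y_train the matching labels, and batch_size is the batch size.

validation_data=(X_test.astype(np.float32) / 255.0, y_test): the data used to evaluate the model at the end of each epoch. X_test holds the validation images, scaled to [0, 1] to match what the generator feeds the model during training, and y_test holds the matching labels.

steps_per_epoch=int(np.ceil(len(X_train) / batch_size)): the number of training steps (batches) per epoch. It is normally the number of training samples divided by the batch size, rounded up to a whole number.

epochs=epochs: the maximum number of training epochs; early stopping may end training sooner.

callbacks=callbacks: the list of callbacks to use, here the earlystop and learning_rate_reduction callbacks defined earlier. They monitor their metrics during training and act when their conditions are met, by stopping training or lowering the learning rate.

With this code, the model is trained on augmented training data and evaluated on the validation set at the end of every epoch. The callbacks automate early termination and learning rate adjustment, which helps the model's performance and stability.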
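
The number of steps per epoch has to be a whole number, so the division is usually rounded up to make sure the final, smaller batch is included. The following sketch shows the arithmetic for the sizes used in this post; the helper names are illustrative.

```python
import math


def steps_per_epoch(n_samples, batch_size):
    """Number of batches needed to see every sample once."""
    return math.ceil(n_samples / batch_size)


def last_batch_size(n_samples, batch_size):
    """Size of the final batch, which is smaller if the division is uneven."""
    remainder = n_samples % batch_size
    return remainder if remainder else batch_size


steps = steps_per_epoch(7078, 64)
tail = last_batch_size(7078, 64)
```

With 7078 training images and a batch size of 64, this gives 111 steps per epoch: 110 full batches and a final batch of 38 images.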

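3.18 Predict with the model

# Scale the test images to [0, 1], the same range the model saw during training
labels = model.predict(test_images.astype(np.float32) / 255.0)
print(labels[:4])

Output: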
[[0.00311459 0.37252888 0.5049415 0.11941501]
[0.06037426 0.03678949 0.0866313 0.8162049 ]
[0.00663909 0.06747349 0.8179326 0.10795477]
[0.02819716 0.41583702 0.368282 0.18768382]]

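The trained model predicts a class distribution for every image in the test set; the predictions for the first four images are printed above.
The four numbers in the first row are the probabilities that the first test image is Food, Attire, Decoration and signage (Decorationandsignage), or misc, in that order.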

label = [np.argmax(i) for i in labels]
print(label[:4])

Output: [2, 3, 2, 1]

For each image, pick the class with the highest probability.

class_label = [inverse_class_map[x] for x in label]
print(class_label[:4])

Output: ['Decorationandsignage', 'misc', 'Decorationandsignage', 'Attire']

Convert the class indices back to class names.
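
The two decoding steps can be checked by hand on the four probability rows printed in this section. The sketch below uses plain Python lists instead of NumPy so that it runs without any dependencies; `decode` is a hypothetical helper.

```python
inverse_class_map = {0: "Food", 1: "Attire", 2: "Decorationandsignage", 3: "misc"}

# The first four prediction rows shown in this section.
predictions = [
    [0.00311459, 0.37252888, 0.5049415, 0.11941501],
    [0.06037426, 0.03678949, 0.0866313, 0.8162049],
    [0.00663909, 0.06747349, 0.8179326, 0.10795477],
    [0.02819716, 0.41583702, 0.368282, 0.18768382],
]


def decode(rows):
    """Return (class indices, class names) for a list of probability rows."""
    # The index of the largest probability is the predicted class.
    indices = [max(range(len(row)), key=row.__getitem__) for row in rows]
    names = [inverse_class_map[i] for i in indices]
    return indices, names


indices, names = decode(predictions)
```

The last row is a close call: Attire wins with about 0.416 against about 0.368 for Decorationandsignage, which is a reminder that the arg max hides how confident the model actually is.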

3.19 Create the submission

submission = pd.DataFrame({ 'Image': test_df.Image, 'Class': class_label })
submission.head()

Output:
