A Multi-Label Classification Experiment Based on PaddlePaddle

L的知识库

Last modified: 2022-02-14 08:40:05

Category: AI Practice
Tags: paddlepaddle, deep learning, python, image recognition

First published: 2022-02-14 08:32:42

Article link: https://blog.csdn.net/weixin_43273742/article/details/122917478

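This article shares a multi-label image classification experiment based on PaddlePaddle and walks through the whole process: data preparation, preprocessing, model definition, loss function, training configuration, model saving, and evaluation. It covers the ResNet50_vd model and the evaluation metrics for multi-label classification.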

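I have recently done a series of experiments and projects with PaddlePaddle. Here I would like to share a simple multi-label classification experiment, and I hope you find it helpful.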

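Modeled on the smart classification features of cloud-drive and photo-album apps, this experiment builds a simplified smart album classifier. Unlike single-label image classification, a photo may belong to several target categories at once, so the classifier has to find every category the photo belongs to. This kind of image classification task is known as multi-label classification.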
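To make the difference from single-label classification concrete, here is a minimal NumPy sketch. The four class names, the logits, and the 0.5 decision threshold are all made up for illustration and are not taken from the experiment. Each class gets its own sigmoid score, every class above the threshold is reported, and the Hamming loss is the fraction of label slots predicted incorrectly.

```python
import numpy as np

# Hypothetical subset of the album categories, used only for illustration
CLASSES = ["Vehicle", "Sky", "Food", "Person"]


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


# Made-up raw model outputs (logits) for two photos
logits = np.array([[2.0, 1.5, -3.0, -1.0],
                   [-2.0, 0.5, -1.0, 3.0]])
# Ground-truth multi-hot labels: one photo may carry several 1s
targets = np.array([[1, 1, 0, 0],
                    [0, 0, 0, 1]])

probs = sigmoid(logits)
preds = (probs >= 0.5).astype(int)  # the 0.5 threshold is an assumption

# Every class above the threshold is reported, not just the best one
predicted_names = [[c for c, p in zip(CLASSES, row) if p == 1] for row in preds]
# Hamming loss: share of (photo, class) slots that are wrong
hamming = float(np.mean(preds != targets))

print(predicted_names)
print(hamming)
```

A softmax with argmax would return exactly one class per photo, which is why the per-class sigmoid is the usual choice when labels are not mutually exclusive.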

If any of the images infringe your rights, please contact me and I will remove them. Thank you!

Figure 1: Illustration of smart album classification

Now let's move on to the experiment itself. First, import all the dependencies:

# coding=utf-8
# Import dependencies
import os
import math
import random
import matplotlib.pyplot as plt
# In a notebook, this magic command is needed so that matplotlib.pyplot figures are displayed inline
%matplotlib inline
import numpy as np
from PIL import Image
import cv2
from sklearn.metrics import hamming_loss, multilabel_confusion_matrix
from sklearn.preprocessing import binarize
from collections import OrderedDict
import paddle
from paddle.io import Dataset
import paddle.nn as nn
from paddle.nn import Conv2D, MaxPool2D, Linear, Dropout, BatchNorm, AdaptiveAvgPool2D, AvgPool2D
from paddle.nn.initializer import Uniform
import paddle.nn.functional as F
from paddle.optimizer.lr import CosineAnnealingDecay
from paddle.regularizer import L2Decay
from paddle import ParamAttr

1. Data Preparation

1.1 Data Preparation

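I collected images from the web covering 20 common photo categories and annotated them. The dataset is for my own experiments and is not public, but you can organize your own data in the format shown below. The training set contains 16,706 images, the validation set 2,168 images, and the test set 2,117 images.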

Figure 2: Sample images from the smart album dataset

The 20 categories are: Vehicle, Sky, Food, Person, Building, Animal, Cartoons, Certificate, Electronic, Screenshot, BankCard, Mountain, Sea, Bill, Selfie, Night, Aircraft, Flower, Child, Ship

Image file name	Image annotation
028012.png	1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0
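As a rough sketch of how one such annotation row could be read, the snippet below splits a tab-separated line into a file name and a 20-dimensional multi-hot vector. The helper name `parse_annotation` is hypothetical, and mapping position i of the vector to the i-th category in the list above is an assumption; the article does not state the order explicitly.

```python
import numpy as np

# Category names as listed in the article. Mapping position i of the
# annotation to CATEGORIES[i] is an assumption, not confirmed by the source.
CATEGORIES = [
    "Vehicle", "Sky", "Food", "Person", "Building", "Animal", "Cartoons",
    "Certificate", "Electronic", "Screenshot", "BankCard", "Mountain", "Sea",
    "Bill", "Selfie", "Night", "Aircraft", "Flower", "Child", "Ship",
]


def parse_annotation(line, num_classes=20):
    """Split 'name<TAB>l1,l2,...' into a file name and a multi-hot vector."""
    name, label_str = line.rstrip("\n").split("\t")
    label = np.array([int(v) for v in label_str.split(",")], dtype="float32")
    if label.shape[0] != num_classes:
        raise ValueError("expected %d labels, got %d" % (num_classes, label.shape[0]))
    return name, label


# The sample row from the table above
sample = "028012.png\t" + ",".join(["1", "1"] + ["0"] * 18)
name, label = parse_annotation(sample)
# Names of the categories whose slot is set to 1
active = [CATEGORIES[i] for i in np.flatnonzero(label)]
print(name, active)
```

Under that ordering assumption, the sample row would mark the photo as both Vehicle and Sky.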

Using the PIL library, pick one image and visualize it to get a feel for the data in this dataset.

img = Image.open('/home/aistudio/work/dataset/album/img/028012.png')
img = np.array(img)
plt.figure(figsize=(10, 10))
plt.imshow(img)

1.2 Data Preprocessing

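Image classification networks place certain requirements on the format and size of their input images. Before the data is fed into the model, it must be preprocessed so that the images meet the needs of training and inference. In addition, to enlarge the training set, curb overfitting, and improve the model's generalization, the experiment also uses several basic data augmentation methods.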

The data preprocessing in this experiment consists of the following steps:

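Image decoding: convert the image to a NumPy array;
Resize: scale the image so that its shorter side is 256;
Random crop: randomly crop a sub-image from the original image and the annotation image. If the target crop size is larger than the original image, padding is added at the bottom right. The crop size is [512, 512];
Random flip: flip the image horizontally with a given probability. A probability of 0.5 is used here;
Image crop: crop the image to a uniform 224×224 so that every image fed to the model has the same size;
Normalization: scale the pixel values and standardize each channel by subtracting a mean and dividing by a standard deviation, so that the input distribution has approximately zero mean and unit variance. This makes the optimization landscape smoother and helps training converge more easily;
Channel transposition: image data is stored as [H, W, C] (height, width, channels), while the network expects training data as [C, H, W], so the axes must be rearranged, for example from [224, 224, 3] to [3, 224, 224].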
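The last two steps of the list, normalization and channel transposition, can be sketched in a few lines of NumPy. The function names `normalize_image` and `to_chw` are illustrative, and the mean and standard deviation are the commonly used ImageNet statistics, assumed here because the article's own values are not shown.

```python
import numpy as np

# ImageNet channel statistics; an assumption, the article's values are not shown
MEAN = np.array([0.485, 0.456, 0.406], dtype="float32")
STD = np.array([0.229, 0.224, 0.225], dtype="float32")


def normalize_image(img, scale=1.0 / 255.0, mean=MEAN, std=STD):
    """Scale uint8 pixels to [0, 1], then standardize each channel."""
    img = img.astype("float32") * scale
    # mean and std have shape (3,), so they broadcast over the last (channel) axis
    return (img - mean) / std


def to_chw(img):
    """Reorder [H, W, C] to [C, H, W]."""
    return img.transpose((2, 0, 1))


dummy = np.full((224, 224, 3), 255, dtype="uint8")  # an all-white RGB image
out = to_chw(normalize_image(dummy))
print(out.shape, out.dtype)
```

Normalizing before transposing keeps the channel axis last, which lets the per-channel mean and standard deviation broadcast without any reshaping.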

The code for each preprocessing method is presented below.

# Define decode_image: decode raw image bytes into a NumPy array
def decode_image(img, to_rgb=True):
    data = np.frombuffer(img, dtype='uint8')
    img = cv2.imdecode(data, 1)
    if to_rgb:
        assert img.shape[2] == 3, 'invalid shape of image[%s]' % (
            img.shape, )
        # OpenCV decodes to BGR; reverse the channel axis to get RGB
        img = img[:, :, ::-1]

    return img

# Define rand_crop_image: randomly crop a region of the image and resize it to `size`
def rand_crop_image(img, size, scale=None, ratio=None, interpolation=-1):
    interpolation = interpolation if interpolation >= 0 else None
    if isinstance(size, int):
        size = (size, size)  # passed to cv2.resize, which expects (width, height)

    scale = [0.08, 1.0] if scale is None else scale
    ratio = [3. / 4., 4. / 3.] if ratio is None else ratio

    # Draw a random aspect ratio (width / height) from the `ratio` range
    aspect_ratio = math.sqrt(random.uniform(*ratio))
    w = 1. * aspect_ratio
    h = 1. / aspect_ratio

    img_h, img_w = img.shape[:2]

    # Cap the area fraction so that the crop box fits inside the image
    bound = min((float(img_w) / img_h) / (w**2),
                (float(img_h) / img_w) / (h**2))
    scale_max = min(scale[1], bound)
    scale_min = min(scale[0], bound)

    target_area = img_w * img_h * random.uniform(scale_min, scale_max)
    target_size = math.sqrt(target_area)
    # Width and height of the crop box
    w = int(target_size * w)
    h = int(target_size * h)
    # Randomly choose the top-left corner of the crop box
    i = random.randint(0, img_w - w)
    j = random.randint(0, img_h - h)
    # Crop that region to form the new image
    img = img[j:j + h, i:i + w, :]
    # Resize the cropped image to the requested size
    if interpolation is None:
        return cv2.resize(img, size)
    else:
        return cv2.resize(img, size, interpolation=interpolation)
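The geometry inside rand_crop_image can be checked without OpenCV. The sketch below repeats only the box-sampling arithmetic for an assumed 640×480 image; `sample_crop_box` is a hypothetical helper written for this illustration. The point is that the `bound` term keeps the sampled box inside the image for any aspect ratio that is drawn.

```python
import math
import random


def sample_crop_box(img_w, img_h, scale=(0.08, 1.0), ratio=(3 / 4, 4 / 3), rng=random):
    """Return (left, top, width, height) of a random crop box."""
    aspect = math.sqrt(rng.uniform(*ratio))
    w, h = aspect, 1.0 / aspect
    # Largest area fraction for which a box of this aspect ratio still fits
    bound = min((img_w / img_h) / (w ** 2), (img_h / img_w) / (h ** 2))
    area = img_w * img_h * rng.uniform(min(scale[0], bound), min(scale[1], bound))
    side = math.sqrt(area)
    box_w, box_h = int(side * w), int(side * h)
    # Top-left corner chosen so that the box stays inside the image
    left = rng.randint(0, img_w - box_w)
    top = rng.randint(0, img_h - box_h)
    return left, top, box_w, box_h


rng = random.Random(0)  # fixed seed so the run is repeatable
boxes = [sample_crop_box(640, 480, rng=rng) for _ in range(1000)]
inside = all(l >= 0 and t >= 0 and l + w <= 640 and t + h <= 480
             for l, t, w, h in boxes)
print(inside)
```

Because the crop is resized to a fixed output size afterwards, the random area and aspect ratio act as scale and stretch augmentation rather than changing the tensor shape seen by the network.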

# Define rand_flip_image: randomly flip the image; flip_code selects the flip type
# flip_code=1: horizontal flip; flip_code=0: vertical flip; flip_code=-1: both horizontal and vertical
def rand_flip_image(img, flip_code=1):
    assert flip_code in [-1, 0, 1], "flip_code should be a value in [-1, 0, 1]"

    # Flip the image with OpenCV with probability 0.5
    if random.randint(0, 1) == 1:
        return cv2.flip(img, flip_code)
    else:
        return img

# Define resize_image: resize the image
def resize_image(img, size=None, resize_short=None, interpolation=-1):
    interpolation = interpolation if interpolation >= 0 else None
    if resize_short is not None and resize_short > 0:
        w = None
        h = None
    elif size is not None:
        resize_short = None
        w = size if isinstance(size, int) else size[0]
        h = size if isinstance(size, int) else size[1]
    else:
        raise ValueError("invalid params for resize_image: "
                         "both 'size' and 'resize_short' are None")

    img_h, img_w = img.shape[:2]
    if resize_short is not None:
        # Scale so that the shorter side equals resize_short, keeping the aspect ratio
        percent = float(resize_short) / min(img_w, img_h)
        w = int(round(img_w * percent))
        h = int(round(img_h * percent))
    if interpolation is None:
        return cv2.resize(img, (w, h))
    else:
        return cv2.resize(img, (w, h), interpolation=interpolation)
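As a quick worked example of the resize_short branch, the helper below (hypothetical, written only for this illustration) repeats the same arithmetic without touching any pixels, for an assumed 640×480 image and its portrait counterpart.

```python
def short_side_target(img_w, img_h, resize_short=256):
    """Return the (w, h) that resize_image would request for resize_short."""
    percent = float(resize_short) / min(img_w, img_h)
    return int(round(img_w * percent)), int(round(img_h * percent))


# Landscape: the height (480) is the short side and becomes 256
landscape = short_side_target(640, 480)
# Portrait: the width (480) is the short side and becomes 256
portrait = short_side_target(480, 640)
print(landscape, portrait)
```

The longer side comes out as 341 in both cases, so the aspect ratio is preserved up to rounding.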

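# Define crop_image: crop the image
def crop_image(img, size):
    if isinstance(size, int):
        size = (size, size)

    w, h = size  # size is unpacked as (w, h)
    img_h, img_w = img.shape[:2]
    w_start = (img_w - w) // 2
    h_start = (img_h - h) // 2
    # NOTE: the source listing is cut off in the middle of the line above. The "// 2" and the
    # lines below are a reconstruction that assumes a center crop, not the author's verified code.
    w_end = w_start + w
    h_end = h_start + h
    return img[h_start:h_end, w_start:w_end, :]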