[Intel University-Industry Partnership Course] Cats vs. Dogs Image Classification

Article link: https://blog.csdn.net/xiao_ba17/article/details/135535568

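1. Assignment Overview

1.1 Problem Description

In this problem you face a classic machine learning classification challenge: Cats vs. Dogs. Your task is to build a classification model that accurately determines whether an image shows a cat or a dog.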

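1.2 Expected Solution

Your goal is to train a machine learning model that, given an image, accurately predicts whether it shows a cat or a dog. The model should generalize to unseen images and perform well on the test data. We expect you to deploy it to a simulated production environment, where inference time and binary classification accuracy (measured as the F1 score) are the main grading criteria.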

1.3 Dataset

Dataset:

Link: https://pan.baidu.com/s/1kfIuyXuvexREWAJ1ndFs1w

Extraction code: jc34

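2. Data Preprocessing

2.1 Dataset Structure

The dataset comes in two parts, the train and test folders. Because the results have to be scored with F1, which needs ground-truth labels, this project uses only the labeled data in the train folder.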

Here is one image from train.

[Figure: sample image from the train folder]

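2.2 Splitting the Dataset

To have labeled data for validating the model after training, I split the train dataset described above into a new train set and a new test set.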

import os
import shutil

# Path to the original labeled dataset (replace with your actual path)
dataset_dir = '/kaggle/working/dac_train/train'

# Output directories for the new training and test sets (replace as needed)
train_dir = '/kaggle/working/train'
test_dir = '/kaggle/working/test'
os.makedirs(train_dir, exist_ok=True)
os.makedirs(test_dir, exist_ok=True)

# File names have the form '<class>.<index>.<extension>'.
# For each class, indices below 10000 go to the training set; the rest go to the test set.
for filename in os.listdir(dataset_dir):
    if 'cat' not in filename and 'dog' not in filename:
        continue  # skip anything that is neither a cat nor a dog image
    src = os.path.join(dataset_dir, filename)
    if int(filename.split('.')[1]) < 10000:
        dst = os.path.join(train_dir, filename)
    else:
        dst = os.path.join(test_dir, filename)
    shutil.copyfile(src, dst)
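
The split rule can be checked without touching the file system. The following is a minimal sketch, assuming file names follow the `<class>.<index>.<extension>` pattern that the `split('.')[1]` call implies; the helper name `assign_split` and the sample file names are hypothetical and exist only for illustration.

```python
def assign_split(filename, threshold=10000):
    """Return 'train' or 'test' for a cat/dog file name, or None for anything else."""
    if 'cat' not in filename and 'dog' not in filename:
        return None
    index = int(filename.split('.')[1])
    return 'train' if index < threshold else 'test'


# Hypothetical file names used only to exercise the rule.
samples = ['cat.0.jpg', 'cat.9999.jpg', 'cat.10000.jpg', 'dog.12499.jpg', 'notes.txt']
for name in samples:
    print(name, '->', assign_split(name))
```

Because the threshold is applied to each class separately, both classes contribute the same index range to the training set, which keeps the two classes balanced as long as the source folder is balanced.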

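2.3 Preprocessing the Data

To keep the data fed to the model consistent, I defined a preprocess_data function that preprocesses the images: it extracts the label from the file name, resizes every image to the same size, and normalizes the pixel values.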

import os
import cv2
import numpy as np

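train_dir = '/kaggle/working/train'  # training set directory created above (replace as needed)
test_dir = '/kaggle/working/test'  # test set directory created above (replace as needed)
img_size = 224  # resize every image to 224x224 pixels (adjust as needed)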

def preprocess_data(directory):
    images = []
    labels = []
    
    for filename in os.listdir(directory):
        if 'cat' in filename:
            label = 0
        elif 'dog' in filename:
            label = 1
        else:
            continue
        
        img_path = os.path.join(directory, filename)
        img = cv2.imread(img_path)
        if img is None:
            print(f"Failed to load image: {img_path}")
            continue
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (img_size, img_size))
        img = img.astype('float32') / 255.0  # scale pixel values to [0, 1]
        
        images.append(img)
        labels.append(label)
    
    return np.array(images), np.array(labels)

# Preprocess the training set
train_images, train_labels = preprocess_data(train_dir)

# Preprocess the test set
test_images, test_labels = preprocess_data(test_dir)
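
Since preprocess_data keeps every image in memory as float32, it is worth estimating the size of the resulting array before running it. The sketch below is a back-of-the-envelope calculation, assuming the training split holds 20,000 images (10,000 per class, which is what the threshold in 2.2 yields if the indices start at 0); the helper name `array_gib` is hypothetical.

```python
def array_gib(num_images, img_size=224, channels=3, bytes_per_value=4):
    """Size in GiB of a float32 array shaped (num_images, img_size, img_size, channels)."""
    total_bytes = num_images * img_size * img_size * channels * bytes_per_value
    return total_bytes / 1024 ** 3


train_count = 20000  # assumption: 10,000 cats + 10,000 dogs
print(f"{array_gib(train_count):.1f} GiB for {train_count} images")
```

While np.array(images) builds the stacked array, the Python list of per-image arrays is still alive, so peak usage can approach twice this figure. If that does not fit in memory, a smaller img_size or loading images lazily inside the dataset object are the usual ways out.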

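2.4 Defining the Dataset

For this project I defined a CustomDataset class that holds the preprocessed images and labels, so that a data loader can later feed them to the model in batches during training.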

from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, images, labels, transform=None):
        self.images = images
        self.labels = labels
        self.transform = transform
    
    def __len__(self):
        return len(self.images)
    
    def __getitem__(self, index):
        image = self.images[index]
        label = self.labels[index]
        
        if self.transform:
            image = self.transform(image)
        
        return image, label

2.5 Building the Dataset

Using the dataset class defined above, I load the preprocessed images and labels and wrap the datasets in data loaders, ready for training.

from torch.utils.data import DataLoader
from torchvision import transforms

transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5])
])

# Create data loaders for the training and test sets
train_dataset = CustomDataset(train_images, train_labels, transform=transform)
test_dataset = CustomDataset(test_images, test_labels, transform=transform)

train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)
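
For a float32 array, ToTensor only reorders the axes from height x width x channel to channel x height x width; it rescales by 255 only for uint8 input. Normalize then computes (x - mean) / std per channel. The sketch below reproduces that arithmetic in plain Python to show the value range the model actually sees; the function name `normalize` is hypothetical.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-value arithmetic applied by Normalize(mean=0.5, std=0.5)."""
    return (value - mean) / std


# Pixel values were already scaled to [0, 1] in preprocess_data.
for v in (0.0, 0.5, 1.0):
    print(v, '->', normalize(v))
```

With mean 0.5 and standard deviation 0.5, inputs in [0, 1] end up in [-1, 1], centered on zero.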

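3. Recognizing Cat and Dog Images with a Convolutional Neural Network

3.1 Neural Network Structure

The simplest example of a neural network structure is the single-layer perceptron. The perceptron is the most basic neural network model and consists of an input layer, weights, an activation function, and an output layer. Its components are:

Input layer: receives the input features. Each input feature is associated with a weight.
Weights: each input feature has an associated weight that expresses how strongly that feature influences the model's output.
Weighted sum: each input feature is multiplied by its weight, and the products are added together.
Activation function: the weighted sum is passed through an activation function, which determines whether the neuron fires. Common choices include the step function, the sigmoid function, and ReLU (Rectified Linear Unit).
Output layer: the output of the activation function is the final output of the network.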
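
The components listed above fit in a few lines of code. This is a minimal sketch of a single perceptron with a step activation; the bias term, the function names, and the hand-picked weights (chosen so that the unit computes logical AND) are assumptions added for illustration and are not part of the project code.

```python
def step(z):
    """Step activation: the neuron fires (1) when the weighted sum is non-negative."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias, activation=step):
    """Weighted sum of the inputs plus a bias, passed through the activation function."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(weighted_sum)


# Hypothetical weights that make the perceptron compute logical AND.
weights, bias = [1.0, 1.0], -1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, '->', perceptron([a, b], weights, bias))
```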

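3.2 Convolutional Neural Networks

A convolutional neural network (CNN) is a deep learning model designed for grid-structured data such as images and video. CNNs have been highly successful in computer vision tasks because they capture the spatial structure of an image effectively.

3.3 Deep Neural Networks

A deep neural network (DNN) is a neural network with multiple hidden layers, which is what makes the model "deep". Deep neural networks are the core building block of deep learning: they can learn and represent more abstract and more complex features of the data, and they apply to a wide range of machine learning tasks.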

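3.4 Custom Network Structure

Classifying cats and dogs is a binary classification task. For this project I used PyTorch to define a convolutional neural network with four convolution-pooling blocks followed by one fully connected layer, which is sufficient for the task.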

import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super(CNN, self).__init__()
        self.conv1 = nn.Conv2d(in_channels=3, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.bn1 = nn.BatchNorm2d(64)
        self.relu = nn.ReLU()
        self.pool1 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv2 = nn.Conv2d(in_channels=64, out_channels=64, kernel_size=3, stride=1, padding=1)
        self.bn2 = nn.BatchNorm2d(64)
        self.pool2 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv3 = nn.Conv2d(in_channels=64, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.bn3 = nn.BatchNorm2d(32)
        self.pool3 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.conv4 = nn.Conv2d(in_channels=32, out_channels=32, kernel_size=3, stride=1, padding=1)
        self.bn4 = nn.BatchNorm2d(32)
        self.pool4 = nn.MaxPool2d(kernel_size=2, stride=2)
        self.flatten = nn.Flatten()
        self.fc1 = nn.Linear(in_features=32*14*14, out_features=2)

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.pool1(x)

        x = self.conv2(x)
        x = self.bn2(x)
        x = self.relu(x)
        x = self.pool2(x)

        x = self.conv3(x)
        x = self.bn3(x)
        x = self.relu(x)
        x = self.pool3(x)

        x = self.conv4(x)
        x = self.bn4(x)
        x = self.relu(x)
        x = self.pool4(x)

        x = self.flatten(x)
        x = self.fc1(x)
        return x
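
The value 32*14*14 passed to fc1 follows from the input size and the four pooling layers. The sketch below traces the spatial size through the network using the standard output-size formula for convolution and pooling layers, assuming the 224x224 input chosen in 2.3; the helper name `conv_out` is hypothetical.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size of a convolution or pooling layer along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1


size = 224  # input height and width set by img_size
for block in range(1, 5):
    size = conv_out(size, kernel=3, stride=1, padding=1)  # 3x3 conv with padding 1 keeps the size
    size = conv_out(size, kernel=2, stride=2)              # 2x2 max pool with stride 2 halves it
    print(f"after block {block}: {size}x{size}")

in_features = 32 * size * size  # 32 channels come out of conv4
print("fc1 in_features:", in_features)
```

If img_size is changed, in_features of fc1 has to be recomputed the same way, otherwise the flatten output and the linear layer no longer match.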

The figure below shows the network structure of the model.

[Figure: network structure of the model]

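4. Training on the GPU

4.1 Parameter Settings

Instantiate the custom network, move it to the GPU (falling back to the CPU if no GPU is available), and set up the loss function and the optimizer.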

import torch.optim as optim

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = CNN()
model.to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)

4.2 Training on the GPU

Feed the training data to the model for 10 epochs, printing the average loss every 256 batches.

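num_epochs = 10
print_every = 256  # print the average loss every 256 batches

for epoch in range(num_epochs):
    model.train()  # training mode: BatchNorm uses per-batch statistics
    running_loss = 0.0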
    
    for batch_idx, (images, labels) in enumerate(train_loader, 1):
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        
        outputs = model(images)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        
        running_loss += loss.item()

        if batch_idx % print_every == 0:
            print(f"Epoch {epoch+1}, Batch {batch_idx}/{len(train_loader)}, Loss: {running_loss/print_every}")
            running_loss = 0.0
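
The choice of print_every determines how much of each epoch shows up in the log. The sketch below works this out for a batch size of 32, assuming a training split of 20,000 images (the count is an assumption based on the split threshold in 2.2); the helper name `report_points` is hypothetical.

```python
import math


def report_points(num_samples, batch_size=32, print_every=256):
    """Return the batch count, the batch indices that print a loss, and the unreported tail."""
    num_batches = math.ceil(num_samples / batch_size)  # DataLoader keeps the last partial batch
    reported = [b for b in range(1, num_batches + 1) if b % print_every == 0]
    unreported = num_batches - (reported[-1] if reported else 0)
    return num_batches, reported, unreported


num_batches, reported, unreported = report_points(20000)
print(f"{num_batches} batches per epoch, loss printed at {reported}, "
      f"last {unreported} batches never reported")
```

Under that assumption the loss is printed twice per epoch, and the batches after the second print are trained on but never appear in the log. A smaller print_every, or one extra print at the end of the epoch, would cover them.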

The figure below shows the training process.

[Figure: training log]

4.3 F1 Score and Inference Time on the Test Set

Run inference on the test set, compute the F1 score, and record the inference time.

import time
from sklearn.metrics import f1_score

predicted_labels = []
true_labels = []

correct = 0
total = 0

# Keep the model on the GPU and switch to evaluation mode,
# so that BatchNorm uses its running statistics instead of per-batch statistics
model.to(device)
model.eval()

with torch.no_grad():
    inference_start_time = time.time()

    for images, labels in test_loader:
        # Move the inputs and labels to the GPU
        images, labels = images.to(device), labels.to(device)

        outputs = model(images)
        _, predicted = torch.max(outputs, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        predicted_labels.extend(predicted.cpu().numpy())  # .cpu() moves the data back to the CPU
        true_labels.extend(labels.cpu().numpy())

    # Total inference time over the whole test set
    inference_end_time = time.time()
    total_inference_time = inference_end_time - inference_start_time
    print(f"Total Inference Time: {total_inference_time} seconds")

# F1 score (label 1, dog, is the positive class)
f1 = f1_score(true_labels, predicted_labels)
print(f"F1 Score: {f1}")

# Test accuracy in percent
accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy}%")
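
For reference, the binary F1 score is the harmonic mean of precision and recall for the positive class, which is label 1 (dog) under f1_score's defaults. The sketch below computes it from scratch in plain Python; the function name `binary_f1` and the toy labels are hypothetical and are not results from this project.

```python
def binary_f1(true_labels, predicted_labels, positive=1):
    """F1 score of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(true_labels, predicted_labels))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (1 = dog, 0 = cat).
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(binary_f1(y_true, y_pred))
```

Unlike accuracy, F1 ignores true negatives, so it responds to how the errors are distributed between missed dogs and cats mistaken for dogs.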

The results are shown below.

[Figure: inference time, F1 score, and accuracy on the GPU]

4.4 Saving the Model

Save the trained model to disk.

torch.save(model, 'model.pth')

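5. Moving to the CPU

5.1 Creating the Model Structure

On the CPU machine, define the same network structure (the CNN class must be available, because the file contains the whole pickled model), then load the saved model file.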

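# map_location='cpu' remaps tensors that were saved from the GPU.
# Recent PyTorch releases (2.6 and later) also need weights_only=False to load a whole pickled model.
model = torch.load('model.pth', map_location='cpu')
model.eval()  # switch the model to evaluation mode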

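5.2 Running Inference on the CPU

On the CPU machine, preprocess the data and build the data loaders in the same way as on the GPU machine. Then run inference on the test set with the model and check the inference time and the F1 score.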

import time
from sklearn.metrics import f1_score

predicted_labels = []
true_labels = []

correct = 0
total = 0

# Start time of the whole inference run
inference_start_time = time.time()

with torch.no_grad():
    for images, labels in test_loader:
        outputs = model(images)
        _, predicted = torch.max(outputs, 1)

        total += labels.size(0)
        correct += (predicted == labels).sum().item()

        predicted_labels.extend(predicted.numpy())
        true_labels.extend(labels.numpy())

# End time of the whole inference run
inference_end_time = time.time()

# Total inference time over the whole test set
total_inference_time = inference_end_time - inference_start_time
print(f"Total Inference Time: {total_inference_time} seconds")

# F1 score (label 1, dog, is the positive class)
f1 = f1_score(true_labels, predicted_labels)
print(f"F1 Score: {f1}")

# Test accuracy in percent
accuracy = 100 * correct / total
print(f"Test Accuracy: {accuracy}%")
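
Since the same timing code is repeated for every model variant, it can be factored into a small helper. The following is one possible sketch, not the code used for the measurements in this post: it uses time.perf_counter, which is intended for measuring short durations, and the names `timed`, `timings`, and `num_images` as well as the stand-in workload are hypothetical.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(results, key):
    """Store the wall-clock duration of the enclosed block in results[key]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[key] = time.perf_counter() - start


timings = {}
with timed(timings, 'inference'):
    checksum = sum(i * i for i in range(100_000))  # stand-in workload, not real inference

num_images = 5000  # assumption: number of images in the test split
per_image_ms = timings['inference'] / num_images * 1000
print(f"total: {timings['inference']:.4f} s, per image: {per_image_ms:.4f} ms")
```

In the real loop, the body of the with block would be the pass over test_loader, and dividing by the number of images gives a per-image latency that is easier to compare across datasets of different sizes.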

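The results are shown below.

[Figure: inference time, F1 score, and accuracy on the CPU]

Compared with the GPU run, inference on the CPU is considerably slower.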

6. Using oneAPI Components

6.1 Optimizing with Intel Extension for PyTorch

The previous section showed that inference on the CPU is slow, so here I use Intel Extension for PyTorch to optimize the model and speed up inference.

import intel_extension_for_pytorch as ipex
import torch

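model = torch.load('model.pth', map_location='cpu')  # remap tensors saved from the GPU
model.eval()  # switch the model to evaluation mode

# Apply the IPEX optimizations for FP32 inference on the CPU
model = ipex.optimize(model=model, dtype=torch.float32)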

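Running inference once more with the optimized model gives the results below.

[Figure: inference time, F1 score, and accuracy after IPEX optimization]

Compared with the unoptimized model, the inference time is cut roughly in half.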

6.2 Saving the Optimized Model

Save the weights of the optimized model.

torch.save(model.state_dict(), 'optimized_model.pth')

6.3 Quantizing the Model with Intel® Neural Compressor

Next, I quantize the optimized model to shrink it and make it run faster.

import os
import torch
from neural_compressor.config import PostTrainingQuantConfig, AccuracyCriterion
from neural_compressor import quantization
from sklearn.metrics import accuracy_score

# Evaluation function: returns the accuracy that tuning tries to preserve
# (note that it is measured on train_loader here)
def eval_func(model):
    with torch.no_grad():
        y_true = []
        y_pred = []

        for inputs, labels in train_loader:
            inputs = inputs.to('cpu')
            labels = labels.to('cpu')
            preds_probs = model(inputs)
            preds_class = torch.argmax(preds_probs, dim=-1)
            y_true.extend(labels.numpy())
            y_pred.extend(preds_class.numpy())

        return accuracy_score(y_true, y_pred)

# Quantization configuration
conf = PostTrainingQuantConfig(backend='ipex',  # use Intel Extension for PyTorch as the backend
                               accuracy_criterion=AccuracyCriterion(higher_is_better=True,
                                                                   criterion='relative',
                                                                   tolerable_loss=0.01))

# Run post-training quantization
q_model = quantization.fit(model,
                           conf,
                           calib_dataloader=train_loader,
                           eval_func=eval_func)

# Save the quantized model
quantized_model_path = './quantized_models'
os.makedirs(quantized_model_path, exist_ok=True)

q_model.save(quantized_model_path)
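
The accuracy criterion tells the tuner how much accuracy it may give up. The sketch below illustrates how I read the 'relative' criterion with tolerable_loss=0.01: the quantized model is accepted if its accuracy drops by at most 1% of the baseline value. It is an illustration of that reading, not code from Neural Compressor, and the function name and the accuracy values are hypothetical.

```python
def meets_relative_criterion(baseline, candidate, tolerable_loss=0.01, higher_is_better=True):
    """Accept the candidate if its relative degradation versus the baseline is within tolerance."""
    if higher_is_better:
        relative_loss = (baseline - candidate) / baseline
    else:
        relative_loss = (candidate - baseline) / baseline
    return relative_loss <= tolerable_loss


# Hypothetical accuracies, chosen only to show both outcomes.
print(meets_relative_criterion(0.90, 0.895))  # relative drop of about 0.56%
print(meets_relative_criterion(0.90, 0.88))   # relative drop of about 2.2%
```

With a relative criterion the allowed drop scales with the baseline, so a 1% tolerance on a 0.90 baseline permits accuracies down to 0.891.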

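When quantization succeeds, output like the following appears.

[Figure: output of a successful quantization run]

The quantized model is saved as two files, a .pt file and a .json file.

[Figure: the saved quantized model files]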

6.4 Running Inference on the CPU with the Quantized Model

Finally, load the quantized model.

import torch
import json

# Paths to the quantized model and its configuration file
model_path = 'quantized_models/best_model.pt'
json_config_path = 'quantized_models/best_configure.json'

# Load the PyTorch model
quantized_model = torch.load(model_path, map_location='cpu')

# Load the JSON configuration file
with open(json_config_path, 'r') as json_file:
    json_config = json.load(json_file)

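Run inference with it once and check the results.

[Figure: inference time, F1 score, and accuracy of the quantized model]

After quantization the model is faster than before, and its accuracy is unchanged.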

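7. Summary

With the oneAPI optimization components, the model's inference time drops substantially, from the original 28 s to 14 s. Quantization then brings it down further to 7 s. Throughout optimization and quantization the F1 score barely moves and stays at about 0.9. The project shows that the oneAPI tools can compress a model effectively: they reduce its size and speed up inference while preserving its accuracy.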
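
The timings quoted above translate into speedup factors as follows. This is a small sketch using only the three rounded figures from the summary; the dictionary keys are my own labels.

```python
# Inference times in seconds, as quoted in the summary (rounded).
timings = {
    'CPU baseline': 28.0,
    'IPEX optimized': 14.0,
    'IPEX + quantized': 7.0,
}

baseline = timings['CPU baseline']
speedups = {name: baseline / seconds for name, seconds in timings.items()}

for name, factor in speedups.items():
    print(f"{name}: {timings[name]:.0f} s, {factor:.1f}x vs. baseline")
```

Each step halves the inference time, so the two steps together give a fourfold speedup over the unoptimized CPU run.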