The Best Next-Step Project for Machine Learning Beginners: A Simple CNN for FashionMNIST Clothing Classification (Very Detailed! Complete PyTorch Code Included)

不见你字样

Category: Machine Learning. Tags: machine learning, cnn, pytorch

Article link: https://blog.csdn.net/m0_66642542/article/details/141780595

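Preface

In the previous post, "A Simple PyTorch Implementation of FashionMNIST Clothing Classification and Visualization", we used the simplest classifier, an MLP, for the classification task. For image tasks, however, convolutional neural networks (CNNs) usually have the advantage. A CNN learns its features hierarchically (representation learning), and because the same kernels slide over every position of the image, it can classify its input in a largely shift-invariant way (shift-invariant classification).

This post focuses on implementing and explaining the CNN part. Everything else is almost identical to the previous post, so refer to that post if anything outside the network definition is unclear.

Explaining the CNN code

The CNN network definition: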

import torch.nn as nn
import torch.nn.functional as F

class Conv2d(nn.Module):
    def __init__(self):
        super(Conv2d, self).__init__()
        # No stride is given for the convolutional layers, so it defaults to 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=16, kernel_size=5)
        # No stride is given for the pooling layer, so it defaults to kernel_size (2 here)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.classify = nn.Linear(256, 10)

    def forward(self, x):
        # Input shape is 64*1*28*28: batch size, channel_num, height, width
        batch = x.size(0)
        x = self.conv1(x)
        # Shape is now 64*32*24*24, because 24 = 28-kernel_size+1 = 28-5+1
        x = F.relu(self.pool(x))
        # Shape is now 64*32*12*12
        x = F.relu(self.pool(self.conv2(x)))
        # Shape is now 64*16*4*4, so the Linear layer takes 16*4*4 = 256 inputs
        # Flatten from four dimensions to two for the fully connected classifier
        x = x.reshape(batch, -1)
        x = self.classify(x)
        return x

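Initialization

During initialization we define two convolutional layers, a pooling layer, and a Linear layer for the downstream classification task. Because the images in this dataset have a single input channel, conv1 uses in_channels=1. (Channels can be thought of as the separate layers of an image: a grayscale image has 1 channel, while a color image has 3, one each for red, green, and blue.)

The number of output channels, out_channels, is a hyperparameter chosen by hand. It is also the number of convolution kernels: each kernel produces one output channel. Using several kernels lets the layer capture different features of the image. Loosely speaking, on a face image kernel A might respond to eyes while kernel B responds to mouths.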
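To make the link between out_channels and the number of kernels concrete, here is a small sketch in plain Python that counts the learnable parameters of the two convolutional layers. The helper name conv2d_param_count is made up for this illustration, and it assumes square kernels with one bias per output channel (the default for nn.Conv2d).

```python
def conv2d_param_count(in_channels, out_channels, kernel_size, bias=True):
    """Learnable parameters of a 2D convolution with square kernels.

    Hypothetical helper for illustration; it is not part of PyTorch.
    """
    # Each of the out_channels kernels spans every input channel,
    # so one kernel holds in_channels * kernel_size * kernel_size weights.
    weights = out_channels * in_channels * kernel_size * kernel_size
    # One bias term per output channel, i.e. per kernel.
    biases = out_channels if bias else 0
    return weights + biases


conv1_params = conv2d_param_count(in_channels=1, out_channels=32, kernel_size=5)
conv2_params = conv2d_param_count(in_channels=32, out_channels=16, kernel_size=5)
print("conv1:", conv1_params)  # 32 kernels of size 1*5*5, plus 32 biases
print("conv2:", conv2_params)  # 16 kernels of size 32*5*5, plus 16 biases
```

Doubling out_channels doubles the number of kernels, and with it the parameter count of that layer.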

The pooling layer uses kernel_size=2 and does not specify a stride. In PyTorch, nn.MaxPool2d defaults stride to kernel_size, so here stride=2.

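Forward pass

The training input has shape torch.Size([64, 1, 28, 28]), which stands for torch.Size([batch, channel_num, height, width]). After the first convolution, channel_num changes from 1 to the 32 output channels, and the height and width become 28-kernel_size+1 = 28-5+1 = 24. The shape after the first convolution is therefore torch.Size([64, 32, 24, 24]).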
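The size arithmetic above is a special case of the usual convolution output-size formula, floor((n + 2*padding - kernel_size) / stride) + 1. The sketch below is plain Python with a made-up helper name, conv_output_size, and it ignores dilation, which this network does not use.

```python
def conv_output_size(size, kernel_size, stride=1, padding=0):
    """Output length along one spatial dimension of a convolution.

    Hypothetical helper for illustration; assumes dilation of 1.
    """
    return (size + 2 * padding - kernel_size) // stride + 1


# First convolution in this post: 28x28 input, 5x5 kernel, stride 1, no padding.
after_conv1 = conv_output_size(28, kernel_size=5)
print(after_conv1)

# With padding=2 the same kernel would keep the spatial size unchanged.
same_size = conv_output_size(28, kernel_size=5, padding=2)
print(same_size)
```

With no padding and stride 1 the formula reduces to n - kernel_size + 1, which is the expression used in the code comments.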

Next comes a pooling layer with kernel size 2 and stride 2, which halves the height and width, so the shape becomes torch.Size([64, 32, 12, 12]).
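To see why a 2x2 max pool with stride 2 halves each spatial dimension, here is a minimal pure-Python sketch of max pooling on a single 4x4 feature map. The function max_pool_2d and the sample values are invented for this illustration; the real network uses nn.MaxPool2d on batched tensors.

```python
def max_pool_2d(grid, kernel_size=2, stride=None):
    """Max pooling over a 2D list of numbers (illustrative, not PyTorch)."""
    # Mirror the nn.MaxPool2d default: stride equals kernel_size when omitted.
    if stride is None:
        stride = kernel_size
    height, width = len(grid), len(grid[0])
    out_h = (height - kernel_size) // stride + 1
    out_w = (width - kernel_size) // stride + 1
    pooled = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            # Collect the values inside the current window and keep the largest.
            window = [
                grid[i * stride + di][j * stride + dj]
                for di in range(kernel_size)
                for dj in range(kernel_size)
            ]
            row.append(max(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 6, 2, 4],
]
pooled = max_pool_2d(feature_map)
print(pooled)  # a 2x2 grid: one maximum per non-overlapping 2x2 window
```

Because the windows do not overlap, a 24x24 map yields 12 windows per dimension, which matches the shape stated above.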

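The second convolution and pooling layer work the same way: the convolution gives 12-5+1 = 8, and pooling halves that to 4, so the resulting shape is torch.Size([64, 16, 4, 4]).

Finally, a Linear layer performs the classification. After the two convolution and pooling stages the shape is torch.Size([64, 16, 4, 4]), which must be flattened to two dimensions to serve as input to the Linear layer. Its input size is channel_num*height*width = 16*4*4 = 256.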
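The whole shape walk-through can be reproduced with a few lines of plain Python. This is only a sketch: the function trace_shapes and the stage labels such as "pool1" are names chosen for this illustration, while the layer settings are copied from the network definition above.

```python
def trace_shapes(height=28, width=28):
    """Follow (channels, height, width) through conv1 -> pool -> conv2 -> pool."""
    # (label, out_channels or None for pooling, kernel_size, stride)
    layers = [
        ("conv1", 32, 5, 1),
        ("pool1", None, 2, 2),
        ("conv2", 16, 5, 1),
        ("pool2", None, 2, 2),
    ]
    channels = 1  # FashionMNIST images are grayscale
    shapes = [("input", (channels, height, width))]
    for name, out_channels, kernel_size, stride in layers:
        if out_channels is not None:
            channels = out_channels  # a convolution changes the channel count
        # No padding anywhere, so the same formula covers conv and pooling.
        height = (height - kernel_size) // stride + 1
        width = (width - kernel_size) // stride + 1
        shapes.append((name, (channels, height, width)))
    return shapes


shapes = trace_shapes()
for name, shape in shapes:
    print(name, shape)

channels, height, width = shapes[-1][1]
flat_features = channels * height * width
print("Linear in_features:", flat_features)
```

If the input resolution changes, the hard-coded 256 in nn.Linear(256, 10) has to change with it; calling trace_shapes(32, 32), for instance, ends at a 16x5x5 feature map.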

Complete code

import torch
import torchvision
from torchvision import datasets,transforms
from torch.utils.data import DataLoader
import torch.nn as nn
from torch.optim import Adam
import argparse
from tqdm import tqdm
from sklearn.metrics import accuracy_score
import torch.nn.functional as F

def download():
    # Convert images to tensors and normalize pixel values to [-1, 1]
    Trans = transforms.Compose(
        [torchvision.transforms.ToTensor(), torchvision.transforms.Normalize(mean=[0.5], std=[0.5])])

    # Download the FashionMNIST training and test sets
    train_data = datasets.FashionMNIST(
        root="data",
        train=True,
        download=True,
        transform=Trans,
    )

    test_data = datasets.FashionMNIST(
        root="data",
        train=False,
        download=True,
        transform=Trans,
    )

    # Shuffle the training set every epoch; evaluate the whole test set as one batch
    train_Dataloader = DataLoader(train_data, batch_size=64, shuffle=True)
    test_Dataloader = DataLoader(test_data, batch_size=len(test_data))

    return train_Dataloader, test_Dataloader, train_data, test_data


class Conv2d(nn.Module):
    def __init__(self):
        super(Conv2d, self).__init__()
        # No stride is given for the convolutional layers, so it defaults to 1
        self.conv1 = nn.Conv2d(in_channels=1, out_channels=32, kernel_size=5)
        self.conv2 = nn.Conv2d(in_channels=32, out_channels=16, kernel_size=5)
        # No stride is given for the pooling layer, so it defaults to kernel_size (2 here)
        self.pool = nn.MaxPool2d(kernel_size=2)
        self.classify = nn.Linear(256, 10)

    def forward(self, x):
        # Input shape is 64*1*28*28: batch size, channel_num, height, width
        batch = x.size(0)
        x = self.conv1(x)
        # Shape is now 64*32*24*24, because 24 = 28-kernel_size+1 = 28-5+1
        x = F.relu(self.pool(x))
        # Shape is now 64*32*12*12
        x = F.relu(self.pool(self.conv2(x)))
        # Shape is now 64*16*4*4, so the Linear layer takes 16*4*4 = 256 inputs
        # Flatten from four dimensions to two for the fully connected classifier
        x = x.reshape(batch, -1)
        x = self.classify(x)
        return x


def train(net, train_dataloader, loss_function, optimizer):
    # `device` is the module-level variable set in the __main__ block below
    net.train()
    for x,y in tqdm(train_dataloader):
        x = x.to(device)
        y = y.to(device)
        pred = net(x)

        loss = loss_function(pred, y)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def test(net, test_dataloader):
    net.eval()
    # Gradients are not needed for evaluation
    with torch.no_grad():
        for x,y in test_dataloader:
            x = x.to(device)
            pred = net(x)
            pred = pred.argmax(dim=1).cpu().numpy()

            # y never left the CPU, so it converts to NumPy directly
            acc = accuracy_score(y.numpy(), pred)
            print("acc:",acc)




if __name__ == "__main__":
    train_dataloader, test_dataloader, train_data, test_data = download()
    parser = argparse.ArgumentParser(description='conv', formatter_class=argparse.ArgumentDefaultsHelpFormatter)

    parser.add_argument('--lr', type = float, default=0.001)
    parser.add_argument('--epoch', type = int, default=10)

    args = parser.parse_args()

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    net = Conv2d().to(device)
    loss_function = nn.CrossEntropyLoss()
    # Use the learning rate from the command line instead of a hard-coded value
    optimizer = Adam(net.parameters(), lr = args.lr)

    for epoch in range(args.epoch):
        print("training epoch:{}".format(epoch))
        train(net, train_dataloader, loss_function, optimizer)
        test(net, test_dataloader)