FashionMNIST Classification
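First, I confirmed that every image in the dataset has shape 1x28x28. Image sizes vary in most datasets, but FashionMNIST's are fixed, which saves a lot of cleaning work. Next, I picked a baseline model. Given the dataset's size and low resolution, a large model is clearly a poor fit, so I planned to keep the network under 20 layers and try ResNet-18 as the baseline. I would then use image augmentation and batch normalization to improve its performance.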
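The shape check mentioned above can be sketched as a small helper that counts the distinct (C, H, W) shapes in any dataset yielding `(image_tensor, label)` pairs. The helper name `shape_histogram` and the `limit` parameter are hypothetical, and random tensors stand in for FashionMNIST so the sketch runs offline.

```python
import torch
from collections import Counter


def shape_histogram(dataset, limit=None):
    """Count distinct (C, H, W) image shapes in a dataset of (image, label) pairs.

    `limit` (hypothetical parameter) stops after that many samples.
    """
    counts = Counter()
    for i, (img, _) in enumerate(dataset):
        if limit is not None and i >= limit:
            break
        counts[tuple(img.shape)] += 1
    return counts


# Stand-in for FashionMNIST: five random 1x28x28 "images" with labels
fake = [(torch.rand(1, 28, 28), i % 10) for i in range(5)]
print(shape_histogram(fake))  # a single key, (1, 28, 28), means the sizes are uniform
```

On the real data, the same helper could be applied to `torchvision.datasets.FashionMNIST(root=..., train=True, download=True, transform=transforms.ToTensor())`, where `root` is whatever local directory you choose. A result with exactly one key confirms that no resizing or cropping is needed before batching.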
1. Import packages and modules
import os
import sys
import time
import torch
from torch import nn, optim
import torch.nn.functional as F
import torchvision
from torchvision import transforms
# os.environ["CUDA_VISIBLE_DEVICES"] = "7"  # TODO: uncomment if a GPU is available (must be set before CUDA is first used)
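Rather than toggling the GPU line by hand, a common PyTorch pattern is to pick the device at runtime and fall back to the CPU. This is a minimal sketch. The GPU index `"7"` is only the example from the commented line, and the variable names are illustrative.

```python
import os
import torch

# Optional: expose only one GPU. This must run before the first CUDA call,
# otherwise the process has already enumerated the visible devices.
# os.environ["CUDA_VISIBLE_DEVICES"] = "7"

# Use the GPU when one is visible, otherwise train on the CPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print("training on", device)

# The model and each batch must live on the same device, e.g.:
x = torch.zeros(2, 1, 28, 28, device=device)  # a dummy FashionMNIST-sized batch
```

Later, the model would be moved once with `net.to(device)` and each batch with `X.to(device)` inside the training loop.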
2. Define the model
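Define the global average pooling layer:
class GlobalAvgPool2d(nn.Module):
    """
    Global average pooling layer.
    Implemented by setting the window of an ordinary average pooling
    to the input's height and width.
    """
    def __init__(self):
        super(GlobalAvgPool2d, self).__init__()

    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=x.size()[2:])

# Flatten layer: reshape (batch, C, H, W) into (batch, C*H*W) for the fully connected layer
class FlattenLayer(torch.nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()

    def forward(self, x):  # x shape: (batch, *, *, ...)
        return x.view(x.shape[0], -1)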
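As a sanity check, the two helper layers above can be compared against built-in equivalents. Global average pooling should match a mean over the height and width axes (and `nn.AdaptiveAvgPool2d(1)`), and `FlattenLayer` should match `nn.Flatten()`. This sketch copies the two classes so it runs on its own. The feature-map size is arbitrary.

```python
import torch
from torch import nn
import torch.nn.functional as F


class GlobalAvgPool2d(nn.Module):
    # Copy of the layer above: the pooling window covers the full H x W
    def forward(self, x):
        return F.avg_pool2d(x, kernel_size=x.size()[2:])


class FlattenLayer(nn.Module):
    # Copy of the layer above: keep the batch axis, flatten the rest
    def forward(self, x):
        return x.view(x.shape[0], -1)


torch.manual_seed(0)
x = torch.randn(4, 512, 3, 3)      # (batch, channels, H, W), arbitrary feature map
gap = GlobalAvgPool2d()(x)         # one value per channel
print(gap.shape)                   # torch.Size([4, 512, 1, 1])

# Both comparisons should print True (up to float rounding)
print(torch.allclose(gap, x.mean(dim=(2, 3), keepdim=True), atol=1e-6))
print(torch.allclose(gap, nn.AdaptiveAvgPool2d(1)(x), atol=1e-6))

flat = FlattenLayer()(gap)         # ready for nn.Linear(512, 10)
print(flat.shape)                  # torch.Size([4, 512])
print(torch.equal(flat, nn.Flatten()(gap)))
```

Because global average pooling collapses any H x W to 1 x 1, the final fully connected layer's input size depends only on the channel count, not on the 28x28 input resolution.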
Define the residual block (with batch normalization):
class Residual(nn.Module):
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        """
        use_1x1conv: whether to add an extra 1x1 convolution on the shortcut so its
                     channel count (and spatial size) matches the main path
        stride: stride of conv1 and of the 1x1 shortcut conv; ResNet downsamples between
                stages with stride-2 convolutions instead of pooling layers, a neat idea
        """
        super(Residual, self).__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)  # reshape the shortcut to match Y
        return F.relu(Y + X)  # add the shortcut, then apply ReLU
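The shortcut addition `Y + X` only works when both tensors have the same shape. A convolution's output side length is floor((H + 2p - k) / s) + 1, so with stride 2 a 14x14 input becomes 7x7 on both the 3x3 main path (p=1) and the 1x1 shortcut (p=0). The sketch below repeats the block so it is self-contained and checks three cases. The batch size and channel counts are arbitrary choices.

```python
import torch
from torch import nn
import torch.nn.functional as F


class Residual(nn.Module):
    # Same structure as the residual block above
    def __init__(self, in_channels, out_channels, use_1x1conv=False, stride=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_channels, out_channels, 3, padding=1, stride=stride)
        self.conv2 = nn.Conv2d(out_channels, out_channels, 3, padding=1)
        self.conv3 = (nn.Conv2d(in_channels, out_channels, 1, stride=stride)
                      if use_1x1conv else None)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.bn2 = nn.BatchNorm2d(out_channels)

    def forward(self, X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3 is not None:
            X = self.conv3(X)
        return F.relu(Y + X)


X = torch.rand(4, 64, 14, 14)

same = Residual(64, 64)                              # identity shortcut
print(same(X).shape)                                 # torch.Size([4, 64, 14, 14])

down = Residual(64, 128, use_1x1conv=True, stride=2) # halve H/W, double channels
print(down(X).shape)                                 # torch.Size([4, 128, 7, 7])

# Without the 1x1 conv the shapes (4,128,7,7) and (4,64,14,14) cannot be added
try:
    Residual(64, 128, stride=2)(X)
    mismatch_failed = False
except RuntimeError:
    mismatch_failed = True
print("shortcut without 1x1 conv fails:", mismatch_failed)
```

This is why `use_1x1conv=True` is needed whenever a block changes the channel count or uses a stride other than 1. Identity shortcuts only apply within a stage.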