Introductory Machine Learning Notes (Week 11)

This week I watched Mu Li's Dive into Deep Learning course. Mu Li focuses heavily on writing code but does not spend much time explaining the concepts, so I also consulted other materials to fill in the gaps. Below is a summary of the lectures I watched this week.

Batch Normalization

In a deep neural network, the data sits at the bottom and the loss function at the top. During backpropagation the gradients flow from top to bottom, and they tend to be larger near the top and smaller near the bottom, so the upper layers converge quickly while the lower layers converge slowly. The lower layers near the data learn low-level content such as local edges and textures, and these change slowly; yet every time the lower layers change, all the layers above them must change as well. This slows down convergence, which motivates batch normalization.

Note: the "change" here refers to the shift in the distribution across different batches, not to changes in the parameters.

Batch normalization consistently accelerates the convergence of deep networks. It works as follows: in each training iteration, we first standardize the input by subtracting its mean and dividing by its standard deviation, both computed over the current mini-batch. We then apply a scale coefficient and a shift. It is precisely this standardization based on batch statistics that gives batch normalization its name.

As shown in the figure below:

  • B denotes the mini-batch.
  • ε is a small constant added to the variance estimate so that the divisor in the normalization is never zero.
  • xi is a vector; the mean and the variance are vectors as well.
  • Fixing the distribution keeps the gradients and outputs within a certain distribution, which is relatively stable.

  • Given an input xi from the mini-batch, batch normalization subtracts the mean from each sample, divides by the standard deviation, multiplies by γ, and adds β, producing the output xi+1.
  • γ is the scale parameter and β is the shift parameter; both are learnable.
  • This amounts to a form of regularization: the original distribution is approximately constrained to a normal distribution with mean 0 and variance 1, using the sample distribution to approximate the population distribution.
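Written out, the normalization described above is (standard batch-normalization notation, matching the from-scratch code later in these notes):

```latex
\mu_B = \frac{1}{|B|}\sum_{x_i \in B} x_i,\qquad
\sigma_B^2 = \frac{1}{|B|}\sum_{x_i \in B}(x_i - \mu_B)^2
```

```latex
\hat{x}_i = \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}},\qquad
y_i = \gamma \odot \hat{x}_i + \beta
```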

Note: applying batch normalization to a mini-batch of size 1 would learn nothing, because after subtracting the mean every hidden unit would be 0. Batch normalization is therefore effective and stable only with sufficiently large mini-batches.
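A tiny NumPy check of this note:

```python
import numpy as np

# a "mini-batch" containing a single sample with 3 features
X = np.array([[2.0, -1.0, 5.0]])
mean = X.mean(axis=0)   # equals the sample itself
var = X.var(axis=0)     # all zeros
X_hat = (X - mean) / np.sqrt(var + 1e-5)
print(X_hat)            # every entry is 0: there is nothing left to learn
```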

  • As noted above, the learnable parameters of a batch normalization layer are γ and β.
  • Batch normalization can be applied to the output of a fully connected or convolutional layer, before the activation function. The output of such a layer normally passes through an activation to become nonlinear; batch normalization, which is a linear transformation, is inserted before that point so that the values change less drastically, and the nonlinear activation is applied afterwards.
  • Batch normalization can also be applied to the input of a fully connected or convolutional layer, linearly transforming the input so that its mean and variance are well behaved.
  • For fully connected layers, batch normalization acts on the feature dimension (the columns; the rows are the samples), computing a scalar mean and variance for each feature. The input or output of each fully connected layer is normalized this way and then adjusted with γ and β.
  • For convolutional layers, batch normalization acts on the channel dimension. When a convolution has multiple output channels, batch normalization is applied to each channel's output, and each channel has its own scale and shift parameters, both scalars. A 1*1 convolution is equivalent to a fully connected layer, so its channels play the role of features, and the channel dimension is treated like the feature dimension of a fully connected layer. If the mini-batch contains m samples and the convolution output for each channel has height p and width q, then batch normalization is performed over the m * p * q elements of each output channel at once. Computing the mean and variance thus gathers values from all spatial positions, and the same mean and variance are applied within a given channel to normalize the value at every spatial position.
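A small NumPy sketch of the channel-wise statistics described above (shapes follow the (batch, channel, height, width) convention used by PyTorch):

```python
import numpy as np

# mini-batch: m=4 samples, 3 channels, 5x5 feature maps
X = np.random.randn(4, 3, 5, 5)

# one mean/variance per channel, computed over the m*p*q = 4*5*5 elements
mean = X.mean(axis=(0, 2, 3), keepdims=True)   # shape (1, 3, 1, 1)
var = X.var(axis=(0, 2, 3), keepdims=True)

X_hat = (X - mean) / np.sqrt(var + 1e-5)
# after normalization each channel has (approximately) zero mean, unit variance
print(X_hat.mean(axis=(0, 2, 3)))
```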

  • The original paper proposed batch normalization to reduce internal covariate shift, i.e., the change in the distribution of variable values during training.
  • Later papers found that it does not actually reduce internal covariate shift, and suggested instead that it may control model complexity by injecting noise into each mini-batch, with μB^ acting as a random shift and σB^ as a random scale, thereby learning stable means and variances.
  • Since batch normalization can thus be seen as a way of controlling model complexity, there is no need to combine it with dropout.

Summary:

Note that with batch normalization the learning rate can be set larger, which speeds up training, without the usual problem of a too-large learning rate causing exploding gradients in the upper layers and vanishing gradients in the lower layers. Once every layer's input has roughly the same distribution, a larger learning rate can be used; batch normalization generally does not change model accuracy.

Code implementation of batch normalization

Implementation from scratch

1. Define the batch_norm function, which computes the per-batch mean and variance, updates the global (moving) mean and variance, and returns the output Y together with the updated global statistics.

import torch
from torch import nn
from d2l import torch as d2l

# moving_mean, moving_var: global (inference-time) mean and variance
def batch_norm(X,gamma,beta,moving_mean,moving_var,eps,momentum):
    # determine whether we are in training or inference mode
    if not torch.is_grad_enabled():
        # inference: use the moving statistics
        X_hat = (X-moving_mean)/torch.sqrt(moving_var+eps)
    else:
        assert len(X.shape) in (2,4)
        if len(X.shape) == 2:
            # fully connected layer: mean and variance along the feature dimension
            mean = X.mean(dim=0) # one value per feature (reduce over the batch)
            var = ((X-mean)**2).mean(dim=0)
        else:
            # 2D convolution: mean and variance per channel (axis=1 kept)
            mean = X.mean(dim=(0,2,3),keepdim=True) # shape (1,n,1,1)
            var = ((X-mean)**2).mean(dim=(0,2,3),keepdim=True)
        # in training mode, normalize with the current batch statistics
        X_hat = (X-mean) / torch.sqrt(var+eps)
        # update the moving averages of the mean and variance
        moving_mean = momentum*moving_mean+(1.0-momentum)*mean
        moving_var = momentum*moving_var+(1.0-momentum)*var
    Y = gamma * X_hat + beta  # scale and shift
    return Y,moving_mean.data,moving_var.data
        

2. Create a BatchNorm layer: first work out the parameter shape from the output dimension, then initialize the parameters, and finally call the function above and return the output Y.

class BatchNorm(nn.Module):
    # num_features: number of outputs of a fully connected layer
    #               or output channels of a convolutional layer
    # num_dims: 2 for fully connected layers, 4 for convolutional layers
    def __init__(self, num_features,num_dims):
        super(BatchNorm,self).__init__()
        if num_dims == 2:
            shape = (1,num_features)
        else:
            shape = (1,num_features,1,1)
        # scale and shift parameters involved in gradient updates,
        # initialized to 1 and 0 respectively
        self.gamma = nn.Parameter(torch.ones(shape))
        self.beta = nn.Parameter(torch.zeros(shape))
        # non-parameter variables initialized to 0 and 1; not updated by gradients
        self.moving_mean = torch.zeros(shape)
        self.moving_var = torch.ones(shape)

    def forward(self,X):
        # X is the input to the layer
        # if moving_mean/moving_var are not on X's device, move them there
        if self.moving_mean.device != X.device:
            self.moving_mean = self.moving_mean.to(X.device)
            self.moving_var = self.moving_var.to(X.device)
        # save the updated moving_mean and moving_var
        Y,self.moving_mean,self.moving_var = batch_norm(
            X,self.gamma,self.beta,self.moving_mean,self.moving_var,
            eps = 1e-5,momentum=0.9
        )
        return Y

3. Apply BatchNorm to a LeNet model

net = nn.Sequential(
    nn.Conv2d(1,6,kernel_size=5),BatchNorm(6,num_dims=4),nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2,stride=2),
    nn.Conv2d(6,16,kernel_size=5),BatchNorm(16,num_dims=4),nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2,stride=2),nn.Flatten(),
    nn.Linear(16*4*4,120),BatchNorm(120,num_dims=2),nn.Sigmoid(),
    nn.Linear(120,84),BatchNorm(84,num_dims=2),nn.Sigmoid(),
    nn.Linear(84,10)
)

4. Train the model and update the parameters

lr,num_epochs,batch_size = 1.0,10,256
train_iter,test_iter = d2l.load_data_fashion_mnist(batch_size)
d2l.train_ch6(net,train_iter,test_iter,num_epochs,lr,'mps')

Concise implementation

The concise implementation simply uses PyTorch's built-in layers.

net = nn.Sequential(
    nn.Conv2d(1,6,kernel_size=5),nn.BatchNorm2d(6),nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2,stride=2),
    nn.Conv2d(6,16,kernel_size=5),nn.BatchNorm2d(16),nn.Sigmoid(),
    nn.AvgPool2d(kernel_size=2,stride=2),nn.Flatten(),
    nn.Linear(256,120),nn.BatchNorm1d(120),nn.Sigmoid(),
    nn.Linear(120,84),nn.BatchNorm1d(84),nn.Sigmoid(),
    nn.Linear(84,10)
)

d2l.train_ch6(net,train_iter,test_iter,num_epochs,lr,'mps')

It is easy to see that after adding batch normalization the model converges faster and the accuracy also improves.

Residual Networks (ResNet)

Making a neural network ever deeper does not necessarily help. As shown in the figure below, the star marks the optimum, F denotes a function class, and the area of each closed region represents the complexity of the class (roughly, the number of layers). As the complexity increases, the region grows larger, yet it can end up farther from the optimum; these are non-nested function classes.

If, however, each increase in complexity yields a region that contains the previous one, so that the more complex model contains the earlier model, then the model can only get closer to the optimum; these are nested function classes. This is what motivates residual networks (ResNet).

The core idea of ResNet is that every additional layer should easily be able to contain the original function as one of its elements.

  • Let us focus on the inside of the network. Suppose the original input is x and the desired underlying mapping is f(x). The traditional approach of increasing depth by stacking layers tries to fit the mapping f(x) directly.
  • ResNet's idea is to stack layers without increasing the difficulty of optimization: instead of fitting f(x) directly, each block only needs to fit the residual mapping f(x) - x, which is easier to optimize in practice.
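A toy sketch of the idea above (not the real residual block, which is built from conv/BN layers below):

```python
import numpy as np

def residual_block(x, g):
    """Output is the learned residual g(x) plus the input itself."""
    return g(x) + x

x = np.array([1.0, 2.0, 3.0])

# if the residual g(x) is driven to zero, the block is exactly the identity,
# so adding such a block never shrinks the function class
identity = residual_block(x, lambda t: np.zeros_like(t))
print(identity)   # [1. 2. 3.]
```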

  • ResNet retains VGG's full 3*3 convolution design. A residual block first has two 3*3 convolutional layers with the same number of output channels, each followed by a batch normalization layer and a ReLU activation. A cross-layer data path then skips these two convolutions and adds the input directly before the final ReLU activation. This design requires the output of the two convolutional layers to have the same shape as the input so that they can be added.
  • To change the number of channels, an extra 1*1 convolutional layer is introduced to transform the input into the required shape before the addition.

Of course, we can also try various other combinations, adding the input to the outputs of other layers.

ResNet blocks come in two kinds:

  • The first kind halves the height and width: the first convolutional layer uses stride 2, halving the height and width and doubling the number of channels.
  • The second kind keeps the height and width: several such blocks follow the first kind, with all convolutional layers using stride 1.

The figure below shows the ResNet-18 architecture. It is divided into 5 stages and uses 4 modules made up of residual blocks, where each module uses several residual blocks with the same number of output channels. Each module contains 4 convolutional layers; adding the first 7*7 convolutional layer and the final fully connected layer gives 18 layers in total, hence the name ResNet-18.
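The count of 18 layers can be checked quickly:

```python
# 4 modules, each with 2 residual blocks, each block with 2 conv layers
conv_in_modules = 4 * 2 * 2      # 16
total = 1 + conv_in_modules + 1  # first 7x7 conv + final fully connected layer
print(total)                     # 18
```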

Summary:

Note: residual blocks are skip connections, which make very deep networks much easier to train; the input can also propagate forward faster through the residual connections between layers.

Code implementation of ResNet

1. Implement the ResNet residual block. By default it preserves the input shape; setting stride to 2 halves the height and width, and a 1*1 convolution transforms the input X so that its channels and shape match, allowing the addition; the sum is then passed through the activation function to produce the output Y.

import torch
from torch import nn
from torch.nn import functional as F
from d2l import torch as d2l

class Residual(nn.Module):
    def __init__(self,input_channels,num_channels,
                 use_1x1conv=False,strides=1 ):
        super(Residual,self).__init__()
        self.conv1 = nn.Conv2d(input_channels,num_channels,
                               kernel_size=3,padding=1,stride=strides)
        self.conv2 = nn.Conv2d(num_channels,num_channels,
                               kernel_size=3,padding=1)
        if use_1x1conv:
            self.conv3 = nn.Conv2d(input_channels,num_channels,
                                   kernel_size=1,stride=strides)
        else:
            self.conv3 = None
        self.bn1 = nn.BatchNorm2d(num_channels)
        self.bn2 = nn.BatchNorm2d(num_channels)

    def forward(self,X):
        Y = F.relu(self.bn1(self.conv1(X)))
        Y = self.bn2(self.conv2(Y))
        if self.conv3:
            X = self.conv3(X)
        Y += X
        return F.relu(Y)

2. Check that the input and output shapes match

blk = Residual(input_channels=3,num_channels=3)
X = torch.rand(size=(4,3,6,6))
Y = blk(X)
Y.shape

'''
torch.Size([4, 3, 6, 6])
'''

3. The block can also change the number of channels while halving the height and width

blk = Residual(input_channels=3,num_channels=6,use_1x1conv=True,strides=2)
blk(X).shape

'''
torch.Size([4, 6, 3, 3])
'''

4. The first two layers of ResNet are the same as in GoogLeNet described earlier: a 7*7 convolutional layer with 64 output channels and stride 2, followed by a 3*3 max pooling layer with stride 2. The difference is that ResNet adds a batch normalization layer after each convolution.

b1 = nn.Sequential(nn.Conv2d(1,64,kernel_size=7,stride=2,padding=3),
                   nn.BatchNorm2d(64),nn.ReLU(),
                   nn.MaxPool2d(kernel_size=3,stride=2,padding=1))

5. Define the ResNet blocks. The first module keeps the same number of channels as the input, and since a max pooling layer with stride 2 has already been applied, it does not need to reduce the height and width. Each subsequent module doubles the number of channels in its first residual block and halves the height and width.

def resnet_block(input_channels,num_channels,num_residuals,
                 first_block=False):
    blk=[]
    for i in range(num_residuals):
        if i==0 and not first_block:
            blk.append(Residual(input_channels,num_channels,
                                use_1x1conv=True,strides=2))
        else:
            blk.append(Residual(num_channels,num_channels))
    
    return blk

6. Build the network

b2 = nn.Sequential(*resnet_block(64,64,2,first_block=True))
b3 = nn.Sequential(*resnet_block(64,128,2))
b4 = nn.Sequential(*resnet_block(128,256,2))
b5 = nn.Sequential(*resnet_block(256,512,2))

net = nn.Sequential(b1,b2,b3,b4,b5,
                    nn.AdaptiveAvgPool2d((1,1)),
                    nn.Flatten(),
                    nn.Linear(512,10))

7. Inspect the output shapes

X = torch.rand(size=(1,1,224,224))
for layer in net:
    X = layer(X)
    print(layer.__class__.__name__,'output shape:\t',X.shape)
    
'''
Sequential output shape:     torch.Size([1, 64, 56, 56])
Sequential output shape:     torch.Size([1, 64, 56, 56])
Sequential output shape:     torch.Size([1, 128, 28, 28])
Sequential output shape:     torch.Size([1, 256, 14, 14])
Sequential output shape:     torch.Size([1, 512, 7, 7])
AdaptiveAvgPool2d output shape:     torch.Size([1, 512, 1, 1])
Flatten output shape:     torch.Size([1, 512])
Linear output shape:     torch.Size([1, 10])
'''

8. Training

lr,num_epochs,batch_size = 0.05,10,256
train_iter,test_iter = d2l.load_data_fashion_mnist(batch_size,resize=96)
d2l.train_ch6(net,train_iter,test_iter,num_epochs,lr,d2l.try_gpu())

The accuracy improves considerably.

Kaggle competition: Classify-Leaves

1. Import the required modules

from torch import nn
import torch.utils.data as Data
from torchvision import transforms
import torchvision
from PIL import Image
import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
import torch
from d2l import torch as d2l
from sklearn.model_selection import train_test_split, KFold
from torch.optim.lr_scheduler import CosineAnnealingLR
from tqdm import tqdm
import ttach as tta

2. Define the class that builds the leaves dataset

class LeavesSet(Data.Dataset):
    """
    construct the dataset
    """

    def __init__(self, images_path, images_label, transform=None, train=True):
        self.imgs = [os.path.join('./classify-leaves/', ''.join(image_path)) for image_path in images_path]
        if train:
            self.train = True
            self.labels = images_label
        else:
            self.train = False

        self.transform = transform

    def __getitem__(self, index):
        image_path = self.imgs[index]
        pil_img = Image.open(image_path)
        if self.transform:
            transform = self.transform
        else:
            transform = transforms.Compose([
                transforms.Resize((224, 224)),
                transforms.ToTensor(),
            ])
        data = transform(pil_img)
        if self.train:
            image_label = self.labels[index]
            return data, image_label
        else:
            return data

    def __len__(self):
        return len(self.imgs)

3. Load the data and encode the labels as integer indices

def load_data_leaves(train_transform=None, test_transform=None):
    """
    load initial data to dataloader ,and encode the label
    """
    train_data = pd.read_csv('./classify-leaves/train.csv')
    test_data = pd.read_csv('./classify-leaves/test.csv')

    labelencoder = LabelEncoder()
    labelencoder.fit(train_data['label'])
    train_data['label'] = labelencoder.transform(train_data['label'])
    label_map = dict(zip(labelencoder.classes_, labelencoder.transform(labelencoder.classes_)))
    label_inv_map = {v: k for k, v in label_map.items()}

    train_dataSet = LeavesSet(train_data['image'], train_data['label'], transform=train_transform, train=True)
    test_dataSet = LeavesSet(test_data['image'], images_label=0, transform=test_transform, train=False)

    return (
        train_dataSet,
        test_dataSet,
        label_map,
        label_inv_map
    )
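For illustration, the label_map/label_inv_map construction above can be mimicked with plain Python on a few hypothetical labels (sklearn's LabelEncoder assigns indices in sorted order of the unique class names):

```python
labels = ['acer_rubrum', 'abies_concolor', 'acer_rubrum', 'quercus_alba']

# LabelEncoder assigns indices in sorted order of the unique class names
classes = sorted(set(labels))
label_map = {name: i for i, name in enumerate(classes)}
label_inv_map = {i: name for name, i in label_map.items()}

encoded = [label_map[name] for name in labels]
print(label_map)   # {'abies_concolor': 0, 'acer_rubrum': 1, 'quercus_alba': 2}
print(encoded)     # [1, 0, 1, 2]
```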

4. Data augmentation with random transforms

train_transform = transforms.Compose([
    # randomly crop 0.8 to 1.0 of the original area,
    # with aspect ratio between 3/4 and 4/3
    transforms.RandomResizedCrop(128, scale=(0.8, 1.0), ratio=(3.0 / 4.0, 4.0 / 3.0)),
    # random horizontal flip
    transforms.RandomHorizontalFlip(),
    # randomly jitter brightness, contrast, and saturation
    transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4),
    # convert to a tensor
    transforms.ToTensor(),
    # standardize each channel of the image
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

test_transform = transforms.Compose([
    # resize the shorter side of the image to 256 pixels
    transforms.Resize(256),
    # center-crop a 128x128 region
    transforms.CenterCrop(128),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
])

5. Create the training and test datasets and the label dictionaries

train_dataset, test_dataset, label_map, label_inv_map = load_data_leaves(train_transform, test_transform)

In label_map, the keys are the class names and the values are the numeric labels:

{'abies_concolor': 0,
 'abies_nordmanniana': 1,
 'acer_campestre': 2,
 'acer_ginnala': 3,
 'acer_griseum': 4,
 'acer_negundo': 5,
 'acer_palmatum': 6,
 'acer_pensylvanicum': 7,
 'acer_platanoides': 8,
 'acer_pseudoplatanus': 9,
 'acer_rubrum': 10,
 'acer_saccharinum': 11,
 'acer_saccharum': 12,
 'aesculus_flava': 13,
 'aesculus_glabra': 14,
 'aesculus_hippocastamon': 15,
 'aesculus_pavi': 16,
 'ailanthus_altissima': 17,
 'albizia_julibrissin': 18,
 'amelanchier_arborea': 19,
 'amelanchier_canadensis': 20,
 'amelanchier_laevis': 21,
 'asimina_triloba': 22,
 'betula_alleghaniensis': 23,
 'betula_jacqemontii': 24,
...
 'ulmus_parvifolia': 171,
 'ulmus_procera': 172,
 'ulmus_pumila': 173,
 'ulmus_rubra': 174,
 'zelkova_serrata': 175}

label_inv_map is the inverse of label_map: the keys are the numeric labels and the values are the class names.

{0: 'abies_concolor',
 1: 'abies_nordmanniana',
 2: 'acer_campestre',
 3: 'acer_ginnala',
 4: 'acer_griseum',
 5: 'acer_negundo',
 6: 'acer_palmatum',
 7: 'acer_pensylvanicum',
 8: 'acer_platanoides',
 9: 'acer_pseudoplatanus',
 10: 'acer_rubrum',
 11: 'acer_saccharinum',
 12: 'acer_saccharum',
 13: 'aesculus_flava',
 14: 'aesculus_glabra',
 15: 'aesculus_hippocastamon',
 16: 'aesculus_pavi',
 17: 'ailanthus_altissima',
 18: 'albizia_julibrissin',
 19: 'amelanchier_arborea',
 20: 'amelanchier_canadensis',
 21: 'amelanchier_laevis',
 22: 'asimina_triloba',
 23: 'betula_alleghaniensis',
 24: 'betula_jacqemontii',
...
 171: 'ulmus_parvifolia',
 172: 'ulmus_procera',
 173: 'ulmus_pumila',
 174: 'ulmus_rubra',
 175: 'zelkova_serrata'}

6. Define a helper that optionally freezes the earlier layers of the model

def set_parameter_requires_grad(model, feature_extracting):
    """
    Optionally freeze the model's parameters (feature extraction)
    """
    if feature_extracting:
        for param in model.parameters():
            param.requires_grad = False

7. The ResNet model

def resnet_model(num_classes,feature_exact=False):
    """
    ResNet model
    """
    model_ft = torchvision.models.resnet50(pretrained=True)
    set_parameter_requires_grad(model_ft, feature_exact)
    num_ftrs = model_ft.fc.in_features
    model_ft.fc = nn.Sequential(nn.Linear(num_ftrs, num_classes))
    return model_ft

8. Define the hyperparameters

k_folds = 5
num_epochs = 20
learning_rate = 1e-4
weight_decay = 1e-3
loss_function = nn.CrossEntropyLoss()
# per-fold results
results = {}
# random seed
torch.manual_seed(1)
# device
device = d2l.try_gpu()
# Define the k-fold Cross Validator
kfold = KFold(n_splits=k_folds, shuffle=True)

9. Train the model with gradient updates, using 5-fold cross validation to split the training data into training and validation folds, and compute the accuracy and loss on both.

for fold,(train_ids,valid_ids) in enumerate(kfold.split(train_dataset)):
    print(f'Fold {fold}')
    print('-' * 20)
    # Sample elements randomly from a given list of ids, no replacement
    train_subsampler = torch.utils.data.SubsetRandomSampler(train_ids) # draws samples from the full dataset by the given indices
    valid_subsampler = torch.utils.data.SubsetRandomSampler(valid_ids)
    # Define data loaders for the training and validation data in this fold
    trainloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, sampler=train_subsampler,num_workers=4)
    validloader = torch.utils.data.DataLoader(train_dataset, batch_size=64, sampler=valid_subsampler,num_workers=4)
    # Initialize a model and put it on the device specified
    model = resnet_model(176)
    model = model.to(device)
    model.device = device
    # Initialize optimizer
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate, weight_decay=weight_decay)
    # Adjust the learning rate during training following a cosine curve
    scheduler = CosineAnnealingLR(optimizer,T_max=10)

    # Run the training loop for defined number of epochs
    for epoch in range(num_epochs):
        model.train()
        # print the current epoch
        print(f'Starting epoch {epoch+1}')
        # These are used to record information in training
        train_losses = []
        train_accs = []
        # Iterate the training set by batches and show Progress bar
        for batch in tqdm(trainloader):
            # Move images and labels to GPU
            imgs, labels = batch
            imgs = imgs.to(device)
            labels = labels.to(device)
            # Predict value / Forward the data
            logits = model(imgs)
            # Calculate loss
            loss = loss_function(logits, labels)
            # Clear gradients for parameters
            optimizer.zero_grad()
            # Compute gradients for parameters
            loss.backward()
            # Update the parameters with computed gradients
            optimizer.step()
            # Compute the accuracy for current batch/dim=-1: last dimension
            acc = (logits.argmax(dim=-1)==labels).float().mean()
            # Record the loss and accuracy
            train_losses.append(loss.cpu().detach().numpy())
            train_accs.append(acc.cpu().detach().numpy())
        # Update change lr condition
        scheduler.step()
        # The average loss and accuracy of the training set is the average of the recorded values
        train_loss = (np.sum(train_losses)/len(train_losses))
        train_acc = (np.sum(train_accs)/len(train_accs))
        # Print the information  current epoch/aggregate epoch  average loss average acc
        print(f'[Train | {epoch+1:03d}/{num_epochs:03d}] loss={train_loss:.4f} acc={train_acc:.4f}')
        # Train process(all epochs) is complete
        # print('Training process has finished.Saving trained model')
        print('Starting validation')

        # Start Validation
        model.eval()
        # These are used to record information in validation
        valid_losses = []
        valid_accs = []
        with torch.no_grad():
            for batch in tqdm(validloader):
                imgs, labels = batch
                imgs = imgs.to(device)
                labels = labels.to(device)
                # No gradient in validation
                logits = model(imgs)
                loss = loss_function(logits, labels)
                acc = (logits.argmax(dim=-1)==labels).float().mean()
                # Record loss and accuracy
                valid_losses.append(loss.cpu().detach().numpy())
                valid_accs.append(acc.cpu().detach().numpy())
            # The average loss and accuracy
            valid_loss = np.sum(valid_losses)/len(valid_losses)
            valid_acc = np.sum(valid_accs)/len(valid_accs)
        print(f'[Valid | {epoch + 1:03d}/{num_epochs:03d}] loss={valid_loss:.4f} acc={valid_acc:.4f}')
        print('-'*10)
    print(f'Accuracy for fold{fold}:{valid_acc}')
    results[fold] = valid_acc
    # Saving the model
    print(f'saving model with loss {train_loss:.3f}')
    save_path = f'./model/resnet-fold-{fold}.pth'
    torch.save(model.state_dict(), save_path)
    print('-'*20)
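The CosineAnnealingLR scheduler used above varies the learning rate according to a cosine curve; a minimal NumPy sketch of the formula (with eta_min = 0, PyTorch's default):

```python
import numpy as np

def cosine_annealing(lr_max, T_max, t, lr_min=0.0):
    """Learning rate at step t of a cosine annealing schedule."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + np.cos(np.pi * t / T_max))

lr_max, T_max = 1e-4, 10
lrs = [cosine_annealing(lr_max, T_max, t) for t in range(T_max + 1)]
print(lrs[0])    # starts at lr_max
print(lrs[-1])   # decays to lr_min (0 here) at t = T_max
```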

10. Compute the average validation accuracy

print(f'K-FOLD CROSS VALIDATION RESULTS FOR {k_folds} FOLDS')
print('-'*10)
total_summation = 0.0
for key,value in results.items():
    print(f'FOLD {key}:{value}')
    total_summation += value

print(f'Average:{total_summation/len(results.items())}')

'''
K-FOLD CROSS VALIDATION RESULTS FOR 5 FOLDS
----------
FOLD 0:0.9174123961350014
FOLD 1:0.9178925875959725
FOLD 2:0.9110054147654566
FOLD 3:0.9100460184031519
FOLD 4:0.9173442577493602
Average:0.9147401349297886
'''

11. Model prediction: define the test loader, run each fold's saved model on the test set, and save the predictions as five csv files.

testLoader = torch.utils.data.DataLoader(test_dataset, batch_size=64, num_workers=4)
# create model and load weights from checkpoint
model = resnet_model(176)
model = model.to(device)
# load the all folds
for test_fold in range(k_folds):
    model_path = f'./model/resnet-fold-{test_fold}.pth'
    saveFilename = f'./submission/submission-{test_fold}.csv'
    # Load model
    model.load_state_dict(torch.load(model_path))
    # Make sure the model is in eval mode
    model.eval()
    # Test-time augmentation to improve robustness and accuracy
    tta_model = tta.ClassificationTTAWrapper(model,tta.aliases.five_crop_transform(200,200))
    # Initialize a list to store the predictions
    predictions = []
    # Iterate the testing set by batches
    for batch in tqdm(testLoader):
        imgs = batch
        with torch.no_grad():
            logits = tta_model(imgs.to(device))
            # extend can add predictions in iterate data
            # Take the class with greatest logit as prediction and record it
            predictions.extend(logits.argmax(dim=-1).cpu().numpy().tolist())
    preds = []
    for i in predictions:
        preds.append(label_inv_map[i])

    test_data = pd.read_csv('./classify-leaves/test.csv')
    test_data['label'] = pd.Series(preds)
    submission = pd.concat([test_data['image'], test_data['label']], axis=1)
    submission.to_csv(saveFilename, index=False)
    print('ResNet Model Results Done !!!!')

12. Load each fold model's predicted label index for every image

df0 = pd.read_csv('./submission/submission-0.csv')
df1 = pd.read_csv('./submission/submission-1.csv')
df2 = pd.read_csv('./submission/submission-2.csv')
df3 = pd.read_csv('./submission/submission-3.csv')
df4 = pd.read_csv('./submission/submission-4.csv')
# Convert the result to a number
list_num_label0,list_num_label1,list_num_label2,list_num_label3,list_num_label4 = [],[],[],[],[]
for i in range(len(df0)):
    list_num_label0.append(label_map[df0['label'][i]])
    list_num_label1.append(label_map[df1['label'][i]])
    list_num_label2.append(label_map[df2['label'][i]])
    list_num_label3.append(label_map[df3['label'][i]])
    list_num_label4.append(label_map[df4['label'][i]])
    
# Concat all the data
df_all = df0.copy()
df_all.drop(['label'], axis=1, inplace=True)
df_all['num_label0'] = list_num_label0
df_all['num_label1'] = list_num_label1
df_all['num_label2'] = list_num_label2
df_all['num_label3'] = list_num_label3
df_all['num_label4'] = list_num_label4
df_all.head()

13. Transpose the predictions so that each row holds one model's predictions for every image; this makes it easy to take the mode of each column and vote for the most likely label.

df_all_transpose = df_all.copy().drop(['image'],axis=1).transpose()
df_all_transpose.head()

14. Vote by taking the mode of each column (the most frequent label index in that column), then transpose back; the first column now holds the most likely label index for each image.

df_mode = df_all_transpose.mode().transpose()
df_mode.head()
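On a toy table the transpose-and-mode voting looks like this (hypothetical predictions of 3 models on 4 images):

```python
import pandas as pd

# rows: images; columns: predicted label index from each model
df = pd.DataFrame({
    'num_label0': [3, 1, 7, 2],
    'num_label1': [3, 1, 5, 2],
    'num_label2': [4, 1, 5, 2],
})

# transpose so each image is a column, take the per-column mode (the vote),
# then transpose back so each row is an image again
voted = df.transpose().mode().transpose()[0]
print(voted.tolist())   # [3, 1, 5, 2]
```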

15. Map each index back to its class name and append the result to the table

voting_class = []
for each in df_mode[0]:
    voting_class.append(label_inv_map[each])
df_all['label'] = voting_class
df_all.head()

16. Finally, save the image path column and the label column as submission.csv

# save the best result as csv
# choose the image and label columns as the result
df_submission = df_all[['image','label']].copy()
# save the result file
df_submission.to_csv('/kaggle/working/submission.csv', index=False)
print('Voting results of resnest successfully saved!')

Multi-GPU Training

This part was run on Kaggle with the GPU T4*2 accelerator for multi-GPU training.

1. Implement a simple ResNet-18 network

import torch
from torch import nn
from d2l import torch as d2l

def resnet18(num_classes,in_channels=1):
    def resnet_block(in_channels,out_channels,
                     num_residuals,first_block=False):
        blk = []
        for i in range(num_residuals):
            if i==0 and not first_block:
                blk.append(
                    d2l.Residual(in_channels,out_channels,use_1x1conv=True,strides=2)
                )
            else:
                blk.append(d2l.Residual(out_channels,out_channels))
        
        return nn.Sequential(*blk)
    net = nn.Sequential(
        nn.Conv2d(in_channels,64,kernel_size=3,stride=1,padding=1),
        nn.BatchNorm2d(64),
        nn.ReLU()
    )
    net.add_module('resnet_block1',resnet_block(64,64,2,first_block=True))
    net.add_module('resnet_block2',resnet_block(64,128,2))
    net.add_module('resnet_block3',resnet_block(128,256,2))
    net.add_module('resnet_block4',resnet_block(256,512,2))
    net.add_module('global_avg_pool',nn.AdaptiveAvgPool2d((1,1)))
    net.add_module('fc',nn.Sequential(nn.Flatten(),nn.Linear(512,num_classes)))

    return net

2. Initialize the network and get the list of GPUs

net = resnet18(10)
devices = d2l.try_all_gpus()

net
'''
Sequential( (0): Conv2d(1, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
  (1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
  (2): ReLU()
  (resnet_block1): Sequential(
    (0): Residual(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
    (1): Residual(
      (conv1): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv2): Conv2d(64, 64, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (bn1): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (bn2): BatchNorm2d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
      (relu): ReLU(inplace=True)
    )
  )
  (resnet_block2): Sequential(
    (0): Residual(
      (conv1): Conv2d(64, 128, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
      (conv2): Conv2d(128, 128, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (conv3): Conv2d(64, 128, kernel_size=(1, 1), stride=(2, 2))
...  (fc): Sequential(
    (0): Flatten(start_dim=1, end_dim=-1)
    (1): Linear(in_features=512, out_features=10, bias=True)
  )
)
'''

devices
'''
[device(type='cuda', index=0), device(type='cuda', index=1)]
'''

3. Training: select how many GPUs to train on

def train(net,num_gpus,batch_size,lr):
    train_iter,test_iter = d2l.load_data_fashion_mnist(batch_size)
    devices = [d2l.try_gpu(i) for i in range(num_gpus)]
    def init_weights(m):
        if type(m) in [nn.Linear,nn.Conv2d]:
            nn.init.normal_(m.weight,std=0.01)
    net.apply(init_weights)
    # wrap the model for multiple GPUs
    net = nn.DataParallel(net,device_ids=devices)
    trainer = torch.optim.SGD(net.parameters(),lr)
    loss = nn.CrossEntropyLoss()
    timer,num_epochs = d2l.Timer(),10
    animator = d2l.Animator('epoch', 'test acc', xlim=[1, num_epochs])
    for epoch in range(num_epochs):
        net.train()
        timer.start()
        for X,y in train_iter:
            trainer.zero_grad()
            X,y = X.to(devices[0]),y.to(devices[0])
            l = loss(net(X),y)
            l.backward()
            trainer.step()
        timer.stop()
        animator.add(epoch + 1, (d2l.evaluate_accuracy_gpu(net, test_iter)))
    print(f'test accuracy: {animator.Y[0][-1]:.2f}, {timer.avg():.1f} s/epoch '
          f'on {str(devices)}')

4. Train on a single GPU

train(net, num_gpus=1, batch_size=256, lr=0.1)

5. Train on two GPUs

train(net, num_gpus=2, batch_size=256, lr=0.1)

The test accuracy improves somewhat, and each training epoch becomes significantly faster.

Personal Summary

This week I mainly studied batch normalization and ResNet, a convolutional neural network architecture that is now extremely popular and widely used. Next week I will continue with other topics in deep learning and read the corresponding papers, combining theory with practice.
