Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches

Article index

Code: PRIS-CV/PMG-Progressive-Multi-Granularity-Training
Paper: Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches

Overview

This post is a code walkthrough of Fine-Grained Visual Classification via Progressive Multi-Granularity Training of Jigsaw Patches (PMG). It only analyzes the key parts of the PMG code, as a way to understand the paper more deeply; for the full model, please refer to the original paper.

Main text


First, the author splits each image into jigsaw patches. The patch size increases step by step, which corresponds to progressive training; the paper describes this as training with different steps. The code is as follows:

#train.py
# data preprocessing code omitted ....
# training starts here
    for epoch in range(start_epoch, nb_epoch):
        print('\nEpoch: %d' % epoch)
        net.train()
        # five loss accumulators: one per step (1-3), one for the concat step, and train_loss for the overall total
        train_loss = 0
        train_loss1 = 0
        train_loss2 = 0
        train_loss3 = 0
        train_loss4 = 0
        correct = 0
        total = 0
        idx = 0
        for batch_idx, (inputs, targets) in enumerate(trainloader):
            idx = batch_idx
            if inputs.shape[0] < batch_size:
                continue
            if use_cuda:
                inputs, targets = inputs.to(device), targets.to(device)
            inputs, targets = Variable(inputs), Variable(targets)

            # update learning rate
            for nlr in range(len(optimizer.param_groups)):
                optimizer.param_groups[nlr]['lr'] = cosine_anneal_schedule(epoch, nb_epoch, lr[nlr])

            # Step 1
            optimizer.zero_grad()
            inputs1 = jigsaw_generator(inputs, 8) # the jigsaw puzzle generator from the paper: splits the original image into an n x n grid (n given by the second argument, here 8) and re-stitches the shuffled patches
            output_1, _, _, _ = netp(inputs1) # forward pass; why the network returns four outputs, and what they are for, is explained below
            loss1 = CELoss(output_1, targets) * 1
            loss1.backward()
            optimizer.step()

            # Step 2
            optimizer.zero_grad()
            inputs2 = jigsaw_generator(inputs, 4)
            _, output_2, _, _ = netp(inputs2)
            loss2 = CELoss(output_2, targets) * 1
            loss2.backward()
            optimizer.step()

            # Step 3
            optimizer.zero_grad()
            inputs3 = jigsaw_generator(inputs, 2)
            _, _, output_3, _ = netp(inputs3)
            loss3 = CELoss(output_3, targets) * 1
            loss3.backward()
            optimizer.step()

            # Step 4
            optimizer.zero_grad()
            _, _, _, output_concat = netp(inputs)
            concat_loss = CELoss(output_concat, targets) * 2
            concat_loss.backward()
            optimizer.step()

            #  training log
            _, predicted = torch.max(output_concat.data, 1)
            total += targets.size(0)
            correct += predicted.eq(targets.data).cpu().sum()

            train_loss += (loss1.item() + loss2.item() + loss3.item() + concat_loss.item())
            train_loss1 += loss1.item()
            train_loss2 += loss2.item()
            train_loss3 += loss3.item()
            train_loss4 += concat_loss.item()
        # printing / logging code omitted
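
The learning-rate update in the loop above calls cosine_anneal_schedule, a helper that this post does not show. Below is a minimal sketch of a cosine-annealing schedule with the same call signature; the exact formula is an assumption, not the repo's verbatim code:

import numpy as np

def cosine_anneal_schedule(t, nb_epoch, lr):
    # decay the learning rate from its initial value lr down to 0 over nb_epoch
    # epochs, following half a cosine period
    return float(lr / 2 * (np.cos(np.pi * (t % nb_epoch) / nb_epoch) + 1))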

Clearly, in train.py the author uses jigsaw_generator() to build jigsaw inputs at different granularities (inputs1, inputs2, inputs3, and the original inputs), feeds each of them into the network netp to obtain an output at the matching granularity, and computes a separate cross-entropy loss for each. Each loss is backpropagated in its own optimizer step; the summed train_loss is only accumulated for logging. To understand what happens to these inputs inside the network, we need to look at the PMG model:
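
For reference, here is a minimal sketch of what jigsaw_generator could look like: it splits each image into an n x n grid of patches and shuffles them. The repo's own helper may differ in detail (it is not reproduced in this post), so treat the version below as an illustration rather than the official code:

import torch

def jigsaw_generator(images, n):
    # images: (B, C, H, W) tensor with H and W divisible by n
    B, C, H, W = images.shape
    bh, bw = H // n, W // n
    order = torch.randperm(n * n).tolist()  # random permutation of the n*n grid cells
    jigsaws = images.clone()
    for dst, src in enumerate(order):
        dr, dc = (dst // n) * bh, (dst % n) * bw
        sr, sc = (src // n) * bh, (src % n) * bw
        # copy the source patch into the destination grid cell
        jigsaws[:, :, dr:dr + bh, dc:dc + bw] = images[:, :, sr:sr + bh, sc:sc + bw]
    return jigsaws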

First, the model is loaded. As the code below shows, the backbone is ResNet-50, modified here to return the feature maps of its successive stages, i.e. features at different depths (five tensors in the forward pass shown later). The backbone is then wrapped by PMG, with feature_size=512 and 200 output classes.

def load_model(model_name, pretrain=True, require_grad=True):
    print('==> Building model..')
    if model_name == 'resnet50_pmg':
        net = resnet50(pretrained=pretrain)
        for param in net.parameters():
            param.requires_grad = require_grad
        net = PMG(net, 512, 200)

    return net
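
Note that the training loop calls netp rather than net: in train.py the model returned by load_model is wrapped for (multi-)GPU execution with torch.nn.DataParallel. A rough sketch of that wiring (the device handling below is an assumption for illustration):

import torch

net = load_model('resnet50_pmg', pretrain=True, require_grad=True)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
net.to(device)
netp = torch.nn.DataParallel(net)  # netp is the handle used in the training loop above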

PMG defines a number of blocks; the easiest way to understand them is to follow the forward pass. It first calls self.features, i.e. the ResNet model passed in, which returns five feature maps from different stages. Note that the author only uses the last three, xf3, xf4 and xf5, and discards xf1 and xf2. The paper does not explain why; presumably the shallow features are too low-level and noisy to add much. Each of the three feature maps goes through its own conv block to produce xl1, xl2 and xl3, which are then pooled and classified separately. x_concat simply concatenates the three pooled features and classifies them jointly.
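
The conv_block modules below rely on a BasicConv helper that the post does not reproduce. Assuming the usual Conv-BatchNorm-ReLU pattern (consistent with the arguments PMG passes to it), it looks roughly like this:

import torch.nn as nn

class BasicConv(nn.Module):
    # minimal sketch of a Conv -> BatchNorm -> ReLU block; argument names
    # follow the calls made in PMG.__init__ below
    def __init__(self, in_planes, out_planes, kernel_size, stride=1,
                 padding=0, relu=True, bn=True, bias=False):
        super(BasicConv, self).__init__()
        self.conv = nn.Conv2d(in_planes, out_planes, kernel_size=kernel_size,
                              stride=stride, padding=padding, bias=bias)
        self.bn = nn.BatchNorm2d(out_planes) if bn else None
        self.relu = nn.ReLU() if relu else None

    def forward(self, x):
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x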

class PMG(nn.Module):
    def __init__(self, model, feature_size, classes_num):
        super(PMG, self).__init__()

        self.features = model
        self.max1 = nn.MaxPool2d(kernel_size=56, stride=56)
        self.max2 = nn.MaxPool2d(kernel_size=28, stride=28)
        self.max3 = nn.MaxPool2d(kernel_size=14, stride=14)
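        # global max pooling: the 56/28/14 kernel sizes match the spatial sizes of xf3/xf4/xf5 (see the shape comments in forward)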
        self.num_ftrs = 2048 * 1 * 1
        self.elu = nn.ELU(inplace=True)

        self.classifier_concat = nn.Sequential(
            nn.BatchNorm1d(1024 * 3),
            nn.Linear(1024 * 3, feature_size),
            nn.BatchNorm1d(feature_size),
            nn.ELU(inplace=True),
            nn.Linear(feature_size, classes_num),
        )

        self.conv_block1 = nn.Sequential(
            BasicConv(self.num_ftrs//4, feature_size, kernel_size=1, stride=1, padding=0, relu=True),
            BasicConv(feature_size, self.num_ftrs//2, kernel_size=3, stride=1, padding=1, relu=True)
        )
        self.classifier1 = nn.Sequential(
            nn.BatchNorm1d(self.num_ftrs//2),
            nn.Linear(self.num_ftrs//2, feature_size),
            nn.BatchNorm1d(feature_size),
            nn.ELU(inplace=True),
            nn.Linear(feature_size, classes_num),
        )

        self.conv_block2 = nn.Sequential(
            BasicConv(self.num_ftrs//2, feature_size, kernel_size=1, stride=1, padding=0, relu=True),
            BasicConv(feature_size, self.num_ftrs//2, kernel_size=3, stride=1, padding=1, relu=True)
        )
        self.classifier2 = nn.Sequential(
            nn.BatchNorm1d(self.num_ftrs//2),
            nn.Linear(self.num_ftrs//2, feature_size),
            nn.BatchNorm1d(feature_size),
            nn.ELU(inplace=True),
            nn.Linear(feature_size, classes_num),
        )

        self.conv_block3 = nn.Sequential(
            BasicConv(self.num_ftrs, feature_size, kernel_size=1, stride=1, padding=0, relu=True),
            BasicConv(feature_size, self.num_ftrs//2, kernel_size=3, stride=1, padding=1, relu=True)
        )
        self.classifier3 = nn.Sequential(
            nn.BatchNorm1d(self.num_ftrs//2),
            nn.Linear(self.num_ftrs//2, feature_size),
            nn.BatchNorm1d(feature_size),
            nn.ELU(inplace=True),
            nn.Linear(feature_size, classes_num),
        )

    def forward(self, x):
        #   x = torch.Size([8, 3, 448, 448])
        # xf1 = torch.Size([8, 64, 112, 112])
        # xf2 = torch.Size([8, 256, 112, 112])
        # xf3 = torch.Size([8, 512, 56, 56])
        # xf4 = torch.Size([8, 1024, 28, 28])
        # xf5 = torch.Size([8, 2048, 14, 14])
        xf1, xf2, xf3, xf4, xf5 = self.features(x)

        xl1 = self.conv_block1(xf3)
        xl2 = self.conv_block2(xf4)
        xl3 = self.conv_block3(xf5)
        
        xl1 = self.max1(xl1)
        xl1 = xl1.view(xl1.size(0), -1)
        xc1 = self.classifier1(xl1)

        xl2 = self.max2(xl2)
        xl2 = xl2.view(xl2.size(0), -1)
        xc2 = self.classifier2(xl2)

        xl3 = self.max3(xl3)
        xl3 = xl3.view(xl3.size(0), -1)
        xc3 = self.classifier3(xl3)
          
        x_concat = torch.cat((xl1, xl2, xl3), -1)
        x_concat = self.classifier_concat(x_concat)
        return xc1, xc2, xc3, x_concat

So the four return values correspond to classification results obtained from different ResNet stages after an extra round of convolutions.
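
At test time the four heads can be combined by simply summing their logits, one simple way to merge the multi-granularity predictions into a final result (this mirrors how the paper combines them at inference). The snippet below is a sketch of that idea, reusing the variable names from the training loop:

net.eval()
with torch.no_grad():
    xc1, xc2, xc3, x_concat = net(inputs)  # the full image is used at test time, no jigsaw shuffling
    logits = xc1 + xc2 + xc3 + x_concat    # logit-level ensemble of the four classifiers
    _, predicted = torch.max(logits, 1)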

Summary:

That completes the training. To sum up: the so-called progressive training simply takes the feature maps from different stages, turns each of them into a classification result, and pairs them with the different inputs in train.py; each of these outputs is supervised as its own objective during training, and the inputs are shuffled into jigsaw puzzles of different granularities.
Compared with conventional classification: a conventional classifier only supervises the output of the last stage, whereas PMG also supervises several intermediate stages and feeds in jigsaw patches of different granularities. This pushes the network to pay attention to fine-grained local features; in short, it makes the network more robust.
