Semantic Segmentation Series 12: APCNet (PyTorch implementation)

yumaomi

Last modified: 2022-06-15 23:24:52


Category: Semantic Segmentation. Tags: pytorch, deep learning, artificial intelligence, computer vision

First published: 2022-06-15 19:17:38

Article link: https://blog.csdn.net/yumaomi/article/details/125279662


APCNet (Adaptive Pyramid Context Network) was published at CVPR 2019.

Paper: Adaptive Pyramid Context Network for Semantic Segmentation

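Introduction

Consider the earlier work:

PSPNet proposed the PPM (Pyramid Pooling Module) to aggregate global context;
ParseNet used the simple but effective GAP (Global Average Pooling) to encode global context;
DANet adopted a self-attention mechanism to capture global information at arbitrary distances;
PSANet introduced the PSA module (a form of attention) to aggregate information, again encoding global context through attention.

All of these works address the fusion of global information and offer ways to improve results on objects at multiple scales.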

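After reviewing this prior work, the authors identify three problems to solve:

Multi-scale
Adaptive regions (Adaptive)
Weights for fusing global and local information (Global-guided Local Affinity, GLA)

First, the multi-scale problem. In semantic segmentation, objects vary in size and position. Models that do not aggregate contextual information struggle with objects whose sizes differ greatly, and they also lose some detail.
Second, adaptive regions. Not every region of an image is relevant to the object being segmented. Put differently, some pixels strongly influence whether the object is segmented correctly, while others have almost no effect. These pixels (or relevant regions) are also not necessarily adjacent to the object; they may be far away from it. The model therefore needs to select regions adaptively and identify the important ones that help segment the object correctly.
Third, GLA, which is an issue shared by many models: once the context vectors have been built, how should they be weighted against the original feature map, and how should those weights be chosen and computed?

Motivated by these three problems, the authors propose APCNet as a solution.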

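Model

APCNet's pyramid is built from several ACM modules, similar to the PPM module in PSPNet. Each ACM module takes a scale parameter s that determines the region size.

ACM (Adaptive Context Module)

The authors propose the ACM module to address the three problems above.

In essence, ACM uses GLA to compute a context vector for each local position and weights that vector onto the feature map, thereby aggregating contextual information.

ACM consists of two branches, referred to here as the GLA branch and the Aggregate branch.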
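
To make the role of the scale parameter concrete, here is a minimal sketch of how one adaptive average pooling per scale could partition a feature map into s x s regions. The tensor sizes are illustrative assumptions; the scales (1, 2, 3, 6) are the defaults used in the implementation later in this post.

```python
import torch
import torch.nn.functional as F

# Hypothetical backbone output: batch 2, 512 channels, 28x28 spatial grid.
features = torch.randn(2, 512, 28, 28)

# One adaptive average pooling per pyramid scale s yields an s x s grid of regions.
pool_scales = (1, 2, 3, 6)
region_shapes = {}
for s in pool_scales:
    pooled = F.adaptive_avg_pool2d(features, s)
    region_shapes[s] = tuple(pooled.shape)
    print(f"s={s}: pooled shape {region_shapes[s]}, {s * s} regions")
```

Each ACM in the pyramid works on one of these grids, so a small s gives a few coarse regions and a larger s gives finer ones.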

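GLA branch

In the GLA branch, the feature map output by the backbone is denoted X. X first passes through a 1x1 Conv to produce a feature map x, and spatial global pooling then maps x to a global information vector g(X). Next, x and g(X) are combined (added together in the implementation below) and passed through a 1x1 Conv and a Sigmoid activation to generate the GLA vector $\alpha^s$, which is then reshaped to give the branch output.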
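
The GLA branch can be sketched roughly as follows. This is an illustration under assumed sizes (512 channels, a 28x28 grid, s = 3), not the paper's reference code; the layer names `global_conv` and `gla_conv` are hypothetical, and combining x with g(X) by addition follows the implementation later in this post.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
batch, channels, height, width, s = 2, 512, 28, 28, 3  # illustrative sizes

x = torch.randn(batch, channels, height, width)  # reduced feature map x
global_conv = nn.Conv2d(channels, channels, kernel_size=1)  # hypothetical name
gla_conv = nn.Conv2d(channels, s * s, kernel_size=1)        # hypothetical name

with torch.no_grad():
    # Global information g(X): spatial average pooling followed by a 1x1 conv.
    g = global_conv(F.adaptive_avg_pool2d(x, 1))  # (B, C, 1, 1)

    # Add g to every position (broadcast), then predict s*s affinities per pixel.
    logits = gla_conv(x + g)                      # (B, s*s, H, W)
    affinity = torch.sigmoid(logits)

    # One row per pixel, one column per region: (B, H*W, s*s).
    affinity = affinity.permute(0, 2, 3, 1).reshape(batch, height * width, s * s)

print(affinity.shape)
```

Because of the Sigmoid, every affinity lies between 0 and 1, and the rows are not normalized to sum to 1 as they would be with a softmax.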

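Aggregate branch

In the Aggregate branch, the feature map X passes through AdaptivePooling (size=s), a Conv (kernel size=1x1), and a reshape to give $y^s$ of shape $s^2 \times 512$. This $y^s$ is matrix-multiplied with the GLA vector $\alpha^s$ from the GLA branch, producing a result of shape $hw \times 512$. This step completes the initial feature fusion. Finally, the result is reshaped to the original spatial size and added to the residual from the GLA part to give the overall fused output.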
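
The matrix product in this branch is a weighted sum of region vectors for every pixel. The following sketch checks the shapes with random stand-in tensors (the sizes are assumptions for illustration) and confirms that the batched matmul agrees with the explicit sum over regions.

```python
import torch

torch.manual_seed(0)
batch, channels, height, width, s = 2, 512, 28, 28, 3  # illustrative sizes
hw, regions = height * width, s * s

affinity = torch.rand(batch, hw, regions)  # stand-in for alpha^s, values in [0, 1)
y = torch.randn(batch, regions, channels)  # stand-in for y^s, one vector per region

# Batched matrix product: (B, hw, s^2) @ (B, s^2, C) -> (B, hw, C)
z = torch.matmul(affinity, y)

# The same computation written as an explicit weighted sum over regions j.
z_sum = torch.einsum("bij,bjc->bic", affinity, y)
print(z.shape, torch.allclose(z, z_sum, atol=1e-4))

# Reshape back to a feature map so it can be added to the residual features.
z_map = z.permute(0, 2, 1).reshape(batch, channels, height, width)
print(z_map.shape)
```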

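The whole process can also be written as a formula:

$z_i^s = \sum_{j=1}^{s \times s} \alpha_{i,j}^s \, y_j^s$

where $\alpha_{i,j}^s = f_s(x_i, g(X), j)$.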

Reproducing the model

Model

backbone ResNet50

import torch
import torch.nn as nn

class BasicBlock(nn.Module):
    expansion: int = 1  # BasicBlock does not expand channels (4 applies to Bottleneck only)
    def __init__(self, inplanes, planes, stride = 1, downsample = None, groups = 1,
        base_width = 64, dilation = 1, norm_layer = None):
        
        super(BasicBlock, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if groups != 1 or base_width != 64:
            raise ValueError("BasicBlock only supports groups=1 and base_width=64")
        if dilation > 1:
            raise NotImplementedError("Dilation > 1 not supported in BasicBlock")
        # Both self.conv1 and self.downsample layers downsample the input when stride != 1
        self.conv1 = nn.Conv2d(inplanes, planes ,kernel_size=3, stride=stride, 
                               padding=dilation,groups=groups, bias=False,dilation=dilation)
        
        self.bn1 = norm_layer(planes)
        self.relu = nn.ReLU(inplace=True)
        # The second conv keeps the resolution; only conv1 (and downsample) apply the stride.
        self.conv2 = nn.Conv2d(planes, planes ,kernel_size=3, stride=1, 
                               padding=dilation,groups=groups, bias=False,dilation=dilation)
        
        self.bn2 = norm_layer(planes)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)

        return out


class Bottleneck(nn.Module):
    expansion = 4

    def __init__(self, inplanes, planes, stride=1, downsample= None,
        groups = 1, base_width = 64, dilation = 1, norm_layer = None,):
        super(Bottleneck, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        width = int(planes * (base_width / 64.0)) * groups
        # Both self.conv2 and self.downsample layers downsample the input when stride != 1
        self.conv1 = nn.Conv2d(inplanes, width, kernel_size=1, stride=1, bias=False)
        self.bn1 = norm_layer(width)
        self.conv2 = nn.Conv2d(width, width, kernel_size=3, stride=stride, bias=False, padding=dilation, dilation=dilation)
        self.bn2 = norm_layer(width)
        self.conv3 = nn.Conv2d(width, planes * self.expansion, kernel_size=1, stride=1, bias=False)
        self.bn3 = norm_layer(planes * self.expansion)
        self.relu = nn.ReLU(inplace=True)
        self.downsample = downsample
        self.stride = stride

    def forward(self, x):
        identity = x

        out = self.conv1(x)
        out = self.bn1(out)
        out = self.relu(out)

        out = self.conv2(out)
        out = self.bn2(out)
        out = self.relu(out)

        out = self.conv3(out)
        out = self.bn3(out)

        if self.downsample is not None:
            identity = self.downsample(x)

        out += identity
        out = self.relu(out)
        return out


class ResNet(nn.Module):
    def __init__(
        self,block, layers,num_classes = 1000, zero_init_residual = False, groups = 1,
        width_per_group = 64, replace_stride_with_dilation = None, norm_layer = None):
        super(ResNet, self).__init__()
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        self._norm_layer = norm_layer
        self.inplanes = 64
        self.dilation = 2
        if replace_stride_with_dilation is None:
            # each element in the tuple indicates if we should replace
            # the 2x2 stride with a dilated convolution instead
            replace_stride_with_dilation = [False, False, False]
            
        if len(replace_stride_with_dilation) != 3:
            raise ValueError(
                "replace_stride_with_dilation should be None "
                f"or a 3-element tuple, got {replace_stride_with_dilation}"
            )
        self.groups = groups
        self.base_width = width_per_group
        self.conv1 = nn.Conv2d(3, self.inplanes, kernel_size=7, stride=2, padding=3, bias=False)
        self.bn1 = norm_layer(self.inplanes)
        self.relu = nn.ReLU(inplace=True)
        self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
        self.layer1 = self._make_layer(block, 64, layers[0])
        self.layer2 = self._make_layer(block, 128, layers[1], stride=1, dilate=replace_stride_with_dilation[0])
        self.layer3 = self._make_layer(block, 256, layers[2], stride=2, dilate=replace_stride_with_dilation[1])
        self.layer4 = self._make_layer(block, 512, layers[3], stride=1, dilate=replace_stride_with_dilation[2])
        self.avgpool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(512 * block.expansion, num_classes)

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.kaiming_normal_(m.weight, mode="fan_out", nonlinearity="relu")
            elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
                nn.init.constant_(m.weight, 1)
                nn.init.constant_(m.bias, 0)

        # Zero-initialize the last BN in each residual branch,
        # so that the residual branch starts with zeros, and each residual block behaves like an identity.
        # This improves the model by 0.2~0.3% according to https://arxiv.org/abs/1706.02677
        if zero_init_residual:
            for m in self.modules():
                if isinstance(m, Bottleneck):
                    nn.init.constant_(m.bn3.weight, 0)  # type: ignore[arg-type]
                elif isinstance(m, BasicBlock):
                    nn.init.constant_(m.bn2.weight, 0)  # type: ignore[arg-type]

    def _make_layer(
        self,
        block,
        planes,
        blocks,
        stride = 1,
        dilate = False,
    ):
        norm_layer = self._norm_layer
        downsample = None
        previous_dilation = self.dilation
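        if dilate:
            self.dilation *= stride
            # Note: torchvision sets stride = 1 here. This listing keeps the stride as passed in,
            # so the overall output stride (8) is fixed by the strides given in __init__.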
            
        if stride != 1 or self.inplanes != planes * block.expansion:
            downsample = nn.Sequential(
                nn.Conv2d(self.inplanes,  planes * block.expansion, kernel_size=1, stride=stride, bias=False),
                norm_layer(planes * block.expansion))

        layers = []
        layers.append(
            block(
                self.inplanes, planes, stride, downsample, self.groups, self.base_width, previous_dilation, norm_layer
            )
        )
        self.inplanes = planes * block.expansion
        for _ in range(1, blocks):
            layers.append(
                block(
                    self.inplanes,
                    planes,
                    groups=self.groups,
                    base_width=self.base_width,
                    dilation=self.dilation,
                    norm_layer=norm_layer,
                )
            )
        return nn.Sequential(*layers)

    def _forward_impl(self, x):

        x = self.conv1(x)
        x = self.bn1(x)
        x = self.relu(x)
        x = self.maxpool(x)
        x = self.layer1(x)
        x = self.layer2(x)
        x = self.layer3(x)
        x = self.layer4(x)
        
        return x

    def forward(self, x) :
        return self._forward_impl(x)

    # Factory helpers: they take no self, so mark them as static methods.
    @staticmethod
    def _resnet(block, layers, pretrained_path = None, **kwargs,):
        model = ResNet(block, layers, **kwargs)
        if pretrained_path is not None:
            # strict=False tolerates keys that do not match between checkpoint and model
            model.load_state_dict(torch.load(pretrained_path, map_location="cpu"),  strict=False)
        return model
    
    @staticmethod
    def resnet50(pretrained_path=None, **kwargs):
        return ResNet._resnet(Bottleneck, [3, 4, 6, 3],pretrained_path,**kwargs)
    
    @staticmethod
    def resnet101(pretrained_path=None, **kwargs):
        return ResNet._resnet(Bottleneck, [3, 4, 23, 3],pretrained_path,**kwargs)
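
As a quick sanity check on the backbone above, the following sketch works out its output stride from the strides written in the listing (stem conv 2, max pool 2, layer1 to layer4 with 1, 1, 2, 1). The helper names are hypothetical, and the ceil-division size rule is an assumption that matches the kernel and padding choices used here.

```python
# Strides as written in the listing: stem conv, max pool, then layer1..layer4.
strides = {"conv1": 2, "maxpool": 2, "layer1": 1, "layer2": 1, "layer3": 2, "layer4": 1}

def output_stride(stride_table):
    """Multiply all stage strides to get the total downsampling factor."""
    total = 1
    for s in stride_table.values():
        total *= s
    return total

def feature_size(input_size, stride_table):
    """Spatial size after each stage, assuming every strided stage computes ceil(n / s)."""
    size = input_size
    for s in stride_table.values():
        size = (size + s - 1) // s
    return size

print(output_stride(strides))      # 8
print(feature_size(224, strides))  # 28
```

An output stride of 8 is why the segmentation head later upsamples its features by a factor of 8.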

APCNet main structure

import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.transforms import Resize
class ACMModle(nn.Module):
    def __init__(self, in_channels=2048, channels=512, pool_scale=1, fusion=True):
        super(ACMModle, self).__init__()
        self.pool_scale = pool_scale
        self.in_channels = in_channels
        self.channels = channels
        self.fusion = fusion
        
        # Global Information vector
        self.reduce_Conv = nn.Conv2d(self.in_channels, self.channels, 1)
        self.reduce_Pool_Conv = nn.Conv2d(self.in_channels, self.channels, 1)
        
        self.residual_conv = nn.Conv2d(self.channels, self.channels, 1)
        self.global_info = nn.Conv2d(self.channels, self.channels, 1)
        self.gla = nn.Conv2d(self.channels, self.pool_scale**2, 1, 1, 0)
        
        if self.fusion:
            self.fusion_conv = nn.Conv2d(self.channels, self.channels, 1)


    def forward(self, x):
        batch_size, c, h, w = x.shape
        # Aggregate branch: pool the input into s x s regions, then reduce channels with a 1x1 conv.
        pooled_x = F.adaptive_avg_pool2d(x, self.pool_scale)
        pooled_x = self.reduce_Pool_Conv(pooled_x)
        # (B, C, s, s) -> (B, s*s, C)
        pooled_x = pooled_x.view(batch_size, self.channels, -1).permute(0, 2, 1).contiguous()

        # GLA branch: reduced features x plus the global information vector g(X).
        x = self.reduce_Conv(x)
        GI = self.global_info(F.adaptive_avg_pool2d(x, 1))
        GI = F.interpolate(GI, size=x.shape[2:], mode='nearest')  # expand the 1x1 vector to H x W
        
        # (B, s*s, H, W) -> (B, H*W, s*s): one affinity per pixel and region
        Affinity_matrix = self.gla(x + GI).permute(0, 2, 3, 1).reshape(batch_size, -1, self.pool_scale**2)
        Affinity_matrix = torch.sigmoid(Affinity_matrix)
        
        # (B, H*W, s*s) @ (B, s*s, C) -> (B, H*W, C)
        MatrixProduct = torch.matmul(Affinity_matrix, pooled_x)
        MatrixProduct = MatrixProduct.permute(0, 2, 1).contiguous()
        MatrixProduct = MatrixProduct.view(batch_size, self.channels, x.size(2), x.size(3))
        MatrixProduct = self.residual_conv(MatrixProduct)
        # Residual connection with the reduced features
        Z_out = F.relu(MatrixProduct + x)
        
        if self.fusion:
            Z_out = self.fusion_conv(Z_out)
        return Z_out
    
    
class ACMModuleList(nn.ModuleList):
    def __init__(self, pool_scales = [1,2,3,6], in_channels = 2048, channels = 512):
        super(ACMModuleList, self).__init__()
        self.pool_scales = pool_scales
        self.in_channels = in_channels
        self.channels = channels
        
        for pool_scale in pool_scales:
            self.append(
                ACMModle(in_channels, channels, pool_scale)
            )
            
    def forward(self, x):
        out = []
        for ACM in self:
            ACM_out = ACM(x)
            out.append(ACM_out)
        return out
    
class APCNet(nn.Module):
    def __init__(self, num_classes):
        super(APCNet, self).__init__()
        self.num_classes = num_classes
        self.backbone = ResNet.resnet50(replace_stride_with_dilation=[1,2,4])
        self.in_channels = 2048
        self.channels = 512
        self.ACM_pyramid = ACMModuleList(pool_scales=[1,2,3,6], in_channels=self.in_channels, channels=self.channels)
        self.conv1 = nn.Sequential(
            nn.Conv2d(4*self.channels + self.in_channels, self.channels, 3, padding=1),
            nn.BatchNorm2d(self.channels),
            nn.ReLU()
        )
        self.cls_conv = nn.Conv2d(self.channels, self.num_classes, 3, padding=1)
        
    def forward(self, x):
        x = self.backbone(x)
        ACM_out = self.ACM_pyramid(x)
        ACM_out.append(x)
        x = torch.cat(ACM_out, dim=1)
        x = self.conv1(x)
        # Upsample by 8 to undo the backbone's output stride before classifying.
        x = F.interpolate(x, scale_factor=8, mode='bilinear', align_corners=False)
        x = self.cls_conv(x)
        return x
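
The head of the network concatenates the four ACM outputs with the backbone features, which gives 4 * 512 + 2048 = 4096 channels. Below is a minimal shape check with random stand-in tensors. The 7x7 grid is an assumption chosen to keep the example light, and `bottleneck` and `classifier` are hypothetical names for layers shaped like `conv1` and `cls_conv` above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

batch, in_channels, channels, num_classes = 1, 2048, 512, 33

# Stand-ins for the backbone output and the four ACM outputs (one per pool scale).
backbone_out = torch.randn(batch, in_channels, 7, 7)
acm_outs = [torch.randn(batch, channels, 7, 7) for _ in range(4)]

# Concatenate along the channel dimension: 4 * 512 + 2048 = 4096 channels.
fused = torch.cat(acm_outs + [backbone_out], dim=1)

bottleneck = nn.Conv2d(4 * channels + in_channels, channels, 3, padding=1)
classifier = nn.Conv2d(channels, num_classes, 3, padding=1)

with torch.no_grad():
    x = bottleneck(fused)
    # Undo the backbone's output stride of 8.
    x = F.interpolate(x, scale_factor=8, mode="bilinear", align_corners=False)
    logits = classifier(x)

print(fused.shape, logits.shape)
```

Upsampling before the classifier, as in the listing, runs the 3x3 classifier at full resolution. Classifying first and upsampling the logits would be cheaper, though that is a different design from the one shown.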

Dataset

The dataset used is CamVid.

# Import libraries
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '0'

import torch
import torch.nn as nn
import torch.optim as optim
import torch.nn.functional as F
from torch.utils.data import Dataset, DataLoader, random_split
from tqdm import tqdm
import warnings
warnings.filterwarnings("ignore")
import os.path as osp
import matplotlib.pyplot as plt
from PIL import Image
import numpy as np
import albumentations as A
from albumentations.pytorch.transforms import ToTensorV2

torch.manual_seed(17)
# Custom dataset: CamVidDataset
class CamVidDataset(torch.utils.data.Dataset):
    """CamVid Dataset. Read images and masks, then apply a fixed augmentation and preprocessing pipeline.
    
    Args:
        images_dir (str): path to images folder
        masks_dir (str): path to segmentation masks folder
            (mask files are expected to share the image file names)
    """
    
    def __init__(self, images_dir, masks_dir):
        self.transform = A.Compose([
            A.Resize(224, 224),
            A.HorizontalFlip(),
            A.VerticalFlip(),
            A.Normalize(),
            ToTensorV2(),
        ]) 
        self.ids = os.listdir(images_dir)
        self.images_fps = [os.path.join(images_dir, image_id) for image_id in self.ids]
        self.masks_fps = [os.path.join(masks_dir, image_id) for image_id in self.ids]

    
    def __getitem__(self, i):
        # read data
        image = np.array(Image.open(self.images_fps[i]).convert('RGB'))
        mask = np.array( Image.open(self.masks_fps[i]).convert('RGB'))
        image = self.transform(image=image,mask=mask)
        
        return image['image'], image['mask'][:,:,0]
        
    def __len__(self):
        return len(self.ids)
    
    
# Set the dataset paths
DATA_DIR = r'dataset\camvid' # adjust to your own path
x_train_dir = os.path.join(DATA_DIR, 'train_images')
y_train_dir = os.path.join(DATA_DIR, 'train_labels')
x_valid_dir = os.path.join(DATA_DIR, 'valid_images')
y_valid_dir = os.path.join(DATA_DIR, 'valid_labels')
    
train_dataset = CamVidDataset(
    x_train_dir, 
    y_train_dir, 
)
val_dataset = CamVidDataset(
    x_valid_dir, 
    y_valid_dir, 
)

train_loader = DataLoader(train_dataset, batch_size=8, shuffle=True,drop_last=True)
val_loader = DataLoader(val_dataset, batch_size=8, shuffle=True,drop_last=True)

model = APCNet(num_classes=33).cuda()
#model.load_state_dict(torch.load(r"checkpoints/resnet101-5d3b4d8f.pth"), strict=False)
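
Before training, it can help to confirm that the mask values fit the chosen number of classes, since cross-entropy loss fails on labels outside [0, num_classes). The helper below is a hypothetical sketch, not part of the original code; it uses num_classes=33 from the model above and assumes 255 is the ignore index, as set in the loss function in the training section.

```python
import torch

def check_label_range(mask, num_classes, ignore_index=255):
    """Return the sorted label values in `mask` that fall outside [0, num_classes)."""
    values = torch.unique(mask)  # sorted unique label values
    return [int(v) for v in values
            if int(v) != ignore_index and not (0 <= int(v) < num_classes)]

# Toy mask: 40 is out of range for 33 classes, 255 is ignored.
mask = torch.tensor([[0, 5, 32], [255, 12, 40]])
print(check_label_range(mask, num_classes=33))  # [40]
```

Running this over a few batches from `train_loader` would reveal out-of-range labels before they surface as an error inside the loss.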

Model training

from d2l import torch as d2l
from tqdm import tqdm
import pandas as pd
# Use multi-class cross-entropy loss
lossf = nn.CrossEntropyLoss(ignore_index=255)
# Use the SGD optimizer for training
optimizer = optim.SGD(model.parameters(), lr=0.1)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.5, last_epoch=-1)

# Train for 100 epochs
epochs_num = 100
def train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,scheduler,
               devices=d2l.try_all_gpus()):
    timer, num_batches = d2l.Timer(), len(train_iter)
    animator = d2l.Animator(xlabel='epoch', xlim=[1, num_epochs], ylim=[0, 1],
                            legend=['train loss', 'train acc', 'test acc'])
    net = nn.DataParallel(net, device_ids=devices).to(devices[0])
    
    loss_list = []
    train_acc_list = []
    test_acc_list = []
    epochs_list = []
    time_list = []
    
    for epoch in range(num_epochs):
        # Sum of training loss, sum of training accuracy, no. of examples,
        # no. of predictions
        metric = d2l.Accumulator(4)
        for i, (features, labels) in enumerate(train_iter):
            timer.start()
            l, acc = d2l.train_batch_ch13(
                net, features, labels.long(), loss, trainer, devices)
            metric.add(l, acc, labels.shape[0], labels.numel())
            timer.stop()
            if (i + 1) % (num_batches // 5) == 0 or i == num_batches - 1:
                animator.add(epoch + (i + 1) / num_batches,
                             (metric[0] / metric[2], metric[1] / metric[3],
                              None))
        test_acc = d2l.evaluate_accuracy_gpu(net, test_iter)
        animator.add(epoch + 1, (None, None, test_acc))
        scheduler.step()
        print(f"epoch {epoch+1} --- loss {metric[0] / metric[2]:.3f} ---  train acc {metric[1] / metric[3]:.3f} --- test acc {test_acc:.3f} --- cost time {timer.sum()}")
        
        #---------Save training metrics---------------
        df = pd.DataFrame()
        loss_list.append(metric[0] / metric[2])
        train_acc_list.append(metric[1] / metric[3])
        test_acc_list.append(test_acc)
        epochs_list.append(epoch+1)
        time_list.append(timer.sum())
        
        df['epoch'] = epochs_list
        df['loss'] = loss_list
        df['train_acc'] = train_acc_list
        df['test_acc'] = test_acc_list
        df['time'] = time_list
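        os.makedirs("savefile", exist_ok=True)  # make sure the output folder exists
        df.to_excel("savefile/APCNet_camvid.xlsx")
        #----------------Save the model-------------------
        if np.mod(epoch+1, 5) == 0:
            os.makedirs("checkpoints", exist_ok=True)
            torch.save(model.state_dict(), f'checkpoints/APCNet_{epoch+1}.pth')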
train_ch13(model, train_loader, val_loader, lossf, optimizer, epochs_num,scheduler)
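
For reference, the learning-rate schedule configured above (base lr 0.1, `step_size=50`, `gamma=0.5`) can be reproduced with plain arithmetic. The function `step_lr` is a hypothetical helper that mirrors the StepLR rule; it does not call PyTorch.

```python
def step_lr(base_lr, epoch, step_size=50, gamma=0.5):
    """Learning rate after `epoch` completed scheduler steps under a StepLR rule."""
    return base_lr * gamma ** (epoch // step_size)

# Learning rate used during each of the 100 training epochs.
schedule = [step_lr(0.1, epoch) for epoch in range(100)]
print(schedule[0], schedule[49], schedule[50], schedule[99])
```

With 100 epochs the rate is halved only once: the first 50 epochs use 0.1 and the last 50 use 0.05.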