Paper notes: Aggregated Residual Transformations for Deep Neural Networks
Paper link: https://arxiv.org/pdf/1611.05431.pdf
Code link: https://github.com/prlz77/ResNeXt.pytorch
Abstract:
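VGG demonstrated a simple yet effective strategy for building very deep networks: stacking blocks of the same shape. The simplicity of this rule reduces the risk of over-adapting the hyperparameters to a specific dataset. The Inception modules showed that carefully designed topologies can reach compelling accuracy at low theoretical complexity, and they do so with a split-transform-merge strategy. In an Inception module, the input is split into a few lower-dimensional embeddings (by 1x1 convolutions), transformed by specialized filters (3x3 or 5x5), and finally merged by concatenation. This split-transform-merge behavior is expected to approach the representational power of large, dense layers, but at a considerably lower computational complexity.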
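In this paper, we propose a simple architecture that adopts the repeated-layer strategy of VGG/ResNets while also exploiting the split-transform-merge strategy. A module in the network performs a set of transformations, each on a low-dimensional embedding, and their outputs are aggregated by summation (see the figure below). This design allows us to extend to any large number of transformations without specialized designs.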
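The aggregation described above can be written compactly. The notation below follows the paper rather than these notes: x is the input to the block, T_i is the transformation on the i-th path (all paths share one topology), C is the number of paths, and y is the block output once the residual shortcut is added.

```latex
\mathcal{F}(x) = \sum_{i=1}^{C} \mathcal{T}_i(x),
\qquad
y = x + \sum_{i=1}^{C} \mathcal{T}_i(x)
```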
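Block:
Blocks are built according to two rules:
(i) If blocks produce feature maps of the same spatial size, they share the same hyperparameters (width and filter sizes).
(ii) Each time the feature map is downsampled by a factor of 2, the width of the blocks is multiplied by 2.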
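The two rules can be sketched in a few lines of plain Python. The function name `stage_plan` and the numbers passed to it (a 56x56 feature map, width 256, four stages) are hypothetical values chosen for illustration, not taken from these notes. The sketch also computes a rough relative cost, width squared times spatial area, to show that rule (ii) keeps the cost per block about constant across stages.

```python
def stage_plan(input_size, base_width, num_stages):
    """Return a list of (spatial_size, width) pairs, one per stage.

    Hypothetical helper: within a stage the hyperparameters stay fixed
    (rule i); between stages the size halves and the width doubles (rule ii).
    """
    plan = []
    size, width = input_size, base_width
    for stage in range(num_stages):
        if stage > 0:
            size //= 2   # feature map downsampled by a factor of 2
            width *= 2   # so the block width is multiplied by 2
        plan.append((size, width))
    return plan


plan = stage_plan(input_size=56, base_width=256, num_stages=4)

# Rough relative cost of a width-to-width convolution over the whole map.
costs = [width * width * size * size for size, width in plan]

for (size, width), cost in zip(plan, costs):
    print(f"{size}x{size} feature map, width {width}, relative cost {cost}")
```

Halving the spatial size divides the area by 4 while doubling the width multiplies the channel-to-channel cost by 4, so the two effects cancel.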
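We introduce a new hyperparameter, the cardinality C, defined as the number of aggregated transformations (32 in the figure below). Experiments show that increasing the cardinality is more effective than increasing the width or the depth.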
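The block design is shown in the figure below; its three forms are equivalent. Unlike Inception and Inception-ResNet modules, all paths share the same topology, so designing each path requires only minimal extra effort. In form (c), grouped convolution makes the module more compact, with the number of groups equal to the cardinality C. Form (c) does not apply in every case, however (the paper notes that the equivalent forms are nontrivial only when the block has a depth of at least 3).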
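The link between the multi-path form and the grouped-convolution form can be checked numerically. The sketch below assumes PyTorch is installed and uses small hypothetical values (C = 4 paths of width d = 3). It compares one grouped 3x3 convolution with the same weights applied path by path and then concatenated. It covers only the 3x3 layer, not the full bottleneck.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

C, d = 4, 3  # hypothetical cardinality and per-path width
grouped = nn.Conv2d(C * d, C * d, kernel_size=3, padding=1, groups=C, bias=False)
x = torch.randn(2, C * d, 8, 8)

with torch.no_grad():
    # Form (c): a single grouped convolution over all channels.
    y_grouped = grouped(x)

    # Path-by-path view: each path sees its own d input channels and
    # uses its own slice of the grouped weight, of shape (d, d, 3, 3).
    outputs = []
    for i in range(C):
        x_i = x[:, i * d:(i + 1) * d]
        w_i = grouped.weight[i * d:(i + 1) * d]
        outputs.append(F.conv2d(x_i, w_i, padding=1))
    y_paths = torch.cat(outputs, dim=1)

max_diff = (y_grouped - y_paths).abs().max().item()
print(f"max abs difference: {max_diff:.2e}")
```

The two results agree up to floating-point error, which is why `groups=cardinality` in a single `nn.Conv2d` can stand in for C separate 3x3 paths.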
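Network structure:
When comparing models, complexity and parameter count should be kept as close as possible, because they represent the inherent capacity of a model. We therefore take the ResNet block as the baseline and choose the cardinality C and the width d so that complexity and parameter count stay close to those of the ResNet. The resulting values are shown in the table below, where the third row is the product of the first two rows.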
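The trade-off between C and d can be reproduced with simple arithmetic. The sketch below assumes a bottleneck with 256 input and output channels, for which the paper counts roughly C * (256*d + 3*3*d*d + d*256) parameters. The (C, d) pairs are the ones listed in the paper's table, not values derived here, and the helper name `bottleneck_params` is hypothetical.

```python
def bottleneck_params(cardinality, width, channels=256):
    """Approximate parameter count of one bottleneck block.

    Each of the `cardinality` paths has a 1x1 reduce, a 3x3 conv and a
    1x1 expand layer; biases and batch-norm parameters are ignored.
    """
    reduce_params = channels * width
    conv_params = 3 * 3 * width * width
    expand_params = width * channels
    return cardinality * (reduce_params + conv_params + expand_params)


# (cardinality C, bottleneck width d) pairs as listed in the paper's table.
settings = [(1, 64), (2, 40), (4, 24), (8, 14), (32, 4)]

group_widths = [c * d for c, d in settings]             # third row = C * d
params = [bottleneck_params(c, d) for c, d in settings]

for (c, d), gw, p in zip(settings, group_widths, params):
    print(f"C={c:2d}  d={d:2d}  group conv width={gw:3d}  params={p}")
```

Every setting lands near 70k parameters, with C=1, d=64 corresponding to the original ResNet bottleneck.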
The model structure is shown in the figure below. In stages 3, 4, and 5, downsampling takes place in the first block of each stage.
Code:
import torch.nn as nn
import torch.nn.functional as F
from torch.nn import init
import torch


class ResNeXtBottleneck(nn.Module):
    """
    ResNeXt bottleneck type C (https://github.com/facebookresearch/ResNeXt/blob/master/models/resnext.lua)
    """

    def __init__(self, in_channels, out_channels, stride, cardinality, base_width, widen_factor):
        """ Constructor
        Args:
            in_channels: input channel dimensionality
            out_channels: output channel dimensionality
            stride: conv stride. Replaces pooling layer.
            cardinality: num of convolution groups.
            base_width: base number of channels in each group.
            widen_factor: factor to reduce the input dimensionality before convolution.
        """
        super(ResNeXtBottleneck, self).__init__()
        width_ratio = out_channels / (widen_factor * 64.)
        # D is the total width of the grouped 3x3 conv: cardinality x per-group width.
        D = cardinality * int(base_width * width_ratio)
        self.conv_reduce = nn.Conv2d(in_channels, D, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn_reduce = nn.BatchNorm2d(D)
        # groups=cardinality turns the 3x3 conv into a grouped convolution.
        self.conv_conv = nn.Conv2d(D, D, kernel_size=3, stride=stride, padding=1, groups=cardinality, bias=False)
        self.bn = nn.BatchNorm2d(D)
        self.conv_expand = nn.Conv2d(D, out_channels, kernel_size=1, stride=1, padding=0, bias=False)
        self.bn_expand = nn.BatchNorm2d(out_channels)

        # Identity shortcut by default; a 1x1 projection when the channel count changes.
        self.shortcut = nn.Sequential()
        if in_channels != out_channels:
            self.shortcut.add_module('shortcut_conv',
                                     nn.Conv2d(in_channels, out_channels, kernel_size=1, stride=stride, padding=0,
                                               bias=False))
            self.shortcut.add_module('shortcut_bn', nn.BatchNorm2d(out_channels))

    def forward(self, x):
        bottleneck = self.conv_reduce(x)
        bottleneck = F.relu(self.bn_reduce(bottleneck), inplace=True)
        bottleneck = self.conv_conv(bottleneck)
        bottleneck = F.relu(self.bn(bottleneck), inplace=True)
        bottleneck = self.conv_expand(bottleneck)
        bottleneck = self.bn_expand(bottleneck)
        residual = self.shortcut(x)
        return F.relu(residual + bottleneck, inplace=True)


class CifarResNeXt(nn.Module):
    """
    ResNext optimized for the Cifar dataset, as specified in
    https://arxiv.org/pdf/1611.05431.pdf
    """

    def __init__(self, cardinality, depth, nlabels, base_width, widen_factor=4):
        """ Constructor
        Args:
            cardinality: number of convolution groups.
            depth: number of layers.
            nlabels: number of classes
            base_width: base number of channels in each group.
            widen_factor: factor to adjust the channel dimensionality
        """
        super(CifarResNeXt, self).__init__()
        self.cardinality = cardinality
        self.depth = depth
        # 3 stages x 3 conv layers per bottleneck, plus the first conv and the classifier.
        self.block_depth = (self.depth - 2) // 9
        self.base_width = base_width
        self.widen_factor = widen_factor
        self.nlabels = nlabels
        self.output_size = 64
        self.stages = [64, 64 * self.widen_factor, 128 * self.widen_factor, 256 * self.widen_factor]

        self.conv_1_3x3 = nn.Conv2d(3, 64, 3, 1, 1, bias=False)
        self.bn_1 = nn.BatchNorm2d(64)
        self.stage_1 = self.block('stage_1', self.stages[0], self.stages[1], 1)
        self.stage_2 = self.block('stage_2', self.stages[1], self.stages[2], 2)
        self.stage_3 = self.block('stage_3', self.stages[2], self.stages[3], 2)
        self.classifier = nn.Linear(self.stages[3], nlabels)
        # kaiming_normal_ is the in-place replacement for the deprecated kaiming_normal.
        init.kaiming_normal_(self.classifier.weight)

        # Kaiming init for conv weights, 1 for batch-norm weights, 0 for all biases.
        for key in self.state_dict():
            if key.split('.')[-1] == 'weight':
                if 'conv' in key:
                    init.kaiming_normal_(self.state_dict()[key], mode='fan_out')
                if 'bn' in key:
                    self.state_dict()[key][...] = 1
            elif key.split('.')[-1] == 'bias':
                self.state_dict()[key][...] = 0

    def block(self, name, in_channels, out_channels, pool_stride=2):
        """ Stack n bottleneck modules where n is inferred from the depth of the network.
        Args:
            name: string name of the current block.
            in_channels: number of input channels
            out_channels: number of output channels
            pool_stride: factor to reduce the spatial dimensionality in the first bottleneck of the block.
        Returns: a Module consisting of n sequential bottlenecks.
        """
        block = nn.Sequential()
        for bottleneck in range(self.block_depth):
            name_ = '%s_bottleneck_%d' % (name, bottleneck)
            if bottleneck == 0:
                # Only the first bottleneck changes the channel count and applies the stride.
                block.add_module(name_, ResNeXtBottleneck(in_channels, out_channels, pool_stride, self.cardinality,
                                                          self.base_width, self.widen_factor))
            else:
                block.add_module(name_,
                                 ResNeXtBottleneck(out_channels, out_channels, 1, self.cardinality, self.base_width,
                                                   self.widen_factor))
        return block

    def forward(self, x):
        x = self.conv_1_3x3(x)
        x = F.relu(self.bn_1(x), inplace=True)
        x = self.stage_1(x)
        x = self.stage_2(x)
        x = self.stage_3(x)
        # Fixed 8x8 pooling window: assumes 32x32 inputs, which reach 8x8 after stage_3.
        x = F.avg_pool2d(x, 8, 1)
        x = x.view(-1, self.stages[3])
        return self.classifier(x)
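

if __name__ == '__main__':
    # forward() pools with a fixed 8x8 window, so feed a 32x32 CIFAR-sized image.
    # With a 224x224 input the final view() would fold spatial positions into the batch dimension.
    x = torch.randn((1, 3, 32, 32))
    net = CifarResNeXt(cardinality=8, depth=29, nlabels=10, base_width=112, widen_factor=14)
    out = net(x)
    print(out)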
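
The width arithmetic in the listing above can be checked without building the network. The sketch below assumes the 8x64d, depth-29 setting that the paper reports for CIFAR (cardinality 8, base_width 64, widen_factor 4). The helper `bottleneck_width` is a hypothetical name that mirrors the first two lines of `ResNeXtBottleneck.__init__`.

```python
def bottleneck_width(out_channels, cardinality, base_width, widen_factor):
    """Width D of the grouped 3x3 conv, computed as in ResNeXtBottleneck."""
    width_ratio = out_channels / (widen_factor * 64.0)
    return cardinality * int(base_width * width_ratio)


# Assumed configuration: 8x64d with depth 29.
cardinality, depth, base_width, widen_factor = 8, 29, 64, 4

# Bottlenecks per stage: 3 stages x 3 conv layers each, plus first conv and classifier.
block_depth = (depth - 2) // 9

# Output channels of stage_1, stage_2 and stage_3.
stage_outputs = [64 * widen_factor, 128 * widen_factor, 256 * widen_factor]

# Grouped-conv width D in each stage.
widths = [bottleneck_width(c, cardinality, base_width, widen_factor)
          for c in stage_outputs]

print("bottlenecks per stage:", block_depth)
print("stage output channels:", stage_outputs)
print("grouped conv widths:  ", widths)
```

Under this assumption each stage holds 3 bottlenecks, and the grouped convolution is twice as wide as the stage output, with every group keeping 64, 128, and 256 channels in the three stages respectively.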