Cardinality

Cardinality is the number of parallel paths (branches) inside a block; in the figure below, the right-hand block has a cardinality of 32. The left figure shows the basic ResNet bottleneck block with an input channel size of 64; the right figure shows the basic ResNeXt block with an input channel size of 128. Despite the wider input, the two blocks have roughly the same number of parameters.
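The "roughly the same number of parameters" claim can be checked with a quick back-of-the-envelope calculation. This is a sketch that assumes 256 input/output channels for both blocks, bias-free convolutions, and ignores BatchNorm parameters:

```python
def conv_params(in_ch, out_ch, k, groups=1):
    # Conv2d weight tensor has shape (out_ch, in_ch // groups, k, k)
    return out_ch * (in_ch // groups) * k * k

# ResNet bottleneck: 256 -> 1x1 -> 64 -> 3x3 -> 64 -> 1x1 -> 256
resnet = conv_params(256, 64, 1) + conv_params(64, 64, 3) + conv_params(64, 256, 1)

# ResNeXt block (32x4d): 256 -> 1x1 -> 128 -> grouped 3x3 (32 groups) -> 128 -> 1x1 -> 256
resnext = conv_params(256, 128, 1) + conv_params(128, 128, 3, groups=32) + conv_params(128, 256, 1)

print(resnet, resnext)  # 69632 70144
```

The grouped 3x3 convolution is what keeps the wider block cheap: its cost is divided by the number of groups.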
There are three equivalent forms of the ResNeXt block, shown below. Form (a) is the basic ResNeXt unit. Merging the output-side 1x1 convolutions gives the equivalent network (b), whose structure resembles Inception-ResNet.
Further merging the input-side 1x1 convolutions gives the equivalent network (c), which resembles a network built from grouped convolutions. (Grouped convolution: suppose the previous layer produces N feature maps, i.e. channels = N, and the group count is M. The grouped convolution first splits the channels into M groups of N/M channels each, convolves each group independently, and then concatenates the group outputs along the channel dimension to form this layer's output.)
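In PyTorch, grouped convolution is just the `groups` argument of `nn.Conv2d`. A minimal sketch with N = 64 channels and M = 32 groups, showing how grouping shrinks the weight tensor while leaving the output shape unchanged:

```python
import torch
import torch.nn as nn

# Ordinary 3x3 convolution: every output channel sees all 64 input channels
full = nn.Conv2d(64, 64, kernel_size=3, padding=1, bias=False)
# Grouped 3x3 convolution: channels split into 32 groups of 64/32 = 2 channels each
grouped = nn.Conv2d(64, 64, kernel_size=3, padding=1, groups=32, bias=False)

print(full.weight.shape)     # torch.Size([64, 64, 3, 3])
print(grouped.weight.shape)  # torch.Size([64, 2, 3, 3]) -- each filter sees only its group

x = torch.randn(1, 64, 8, 8)
print(grouped(x).shape)      # torch.Size([1, 64, 8, 8]) -- group outputs concatenated
```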
ResNeXt Network Structure

The figure below shows the architecture of ResNeXt-50 (32x4d): 50 is the total number of convolutional and fully connected layers, 32 is the cardinality, and 4d means each path (repeated branch) is 4 channels wide, so the whole block has 32 x 4 = 128 channels.
The difference from the basic block is that the bottleneck version uses three convolutions, 1x1, 3x3, and 1x1, which compress the channel dimension, perform the spatial convolution, and restore the channel dimension, respectively. Here inplane is the input channel count, plane the output channel count, and expansion a multiplier on the output channels. In the basic block, expansion is 1 and can be ignored: the output channel count is simply plane. The bottleneck, however, deliberately compresses and then re-expands the channels, so plane no longer denotes the output channel count but the compressed channel count inside the block, and the output channel count becomes plane * expansion. Then comes the network body, corresponding to the 128 to 256 to 512 to 1024 to 2048 progression in figure (c) above.
The shortcut's job is to expand the input's channel count to match the output's, so the residual addition can be performed.
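As a quick illustration of the projection shortcut, here is a sketch with made-up sizes (64 input channels, a block output of 256 channels, stride 2): a 1x1 convolution matches both the channel count and the spatial size of the block's output.

```python
import torch
import torch.nn as nn

# 1x1 conv + BN that matches the main path's channel count and spatial stride
shortcut = nn.Sequential(
    nn.Conv2d(64, 256, kernel_size=1, stride=2, bias=False),
    nn.BatchNorm2d(256)
)

x = torch.randn(2, 64, 32, 32)
print(shortcut(x).shape)  # torch.Size([2, 256, 16, 16]) -- now addable to the block output
```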
import torch
import torch.nn as nn
import torch.nn.functional as F

class Block(nn.Module):
    '''Grouped convolution block (form (c)).'''
    expansion = 2

    def __init__(self, in_planes, cardinality=32, bottleneck_width=4, stride=1):
        '''
        in_planes: channel size of the input
        cardinality: number of groups
        bottleneck_width: channel size of each group
        '''
        super(Block, self).__init__()
        group_width = cardinality * bottleneck_width
        self.conv1 = nn.Conv2d(in_planes, group_width, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(group_width)
        # 3x3 grouped convolution: channels are divided into `cardinality` groups
        self.conv2 = nn.Conv2d(group_width, group_width, kernel_size=3, stride=stride, padding=1, groups=cardinality, bias=False)
        self.bn2 = nn.BatchNorm2d(group_width)
        self.conv3 = nn.Conv2d(group_width, self.expansion * group_width, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(self.expansion * group_width)
        # Project the input when its shape differs from the output, so the residual addition is valid
        self.shortcut = nn.Sequential()
        if stride != 1 or in_planes != self.expansion * group_width:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_planes, self.expansion * group_width, kernel_size=1, stride=stride, bias=False),
                nn.BatchNorm2d(self.expansion * group_width)
            )

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        out += self.shortcut(x)
        out = F.relu(out)
        return out
class ResNeXt(nn.Module):
    def __init__(self, num_blocks, cardinality, bottleneck_width, num_classes=10):
        '''
        num_blocks: list, number of blocks in each stage
        cardinality: number of groups
        bottleneck_width: channel size of each group in the first stage
        '''
        super(ResNeXt, self).__init__()
        self.cardinality = cardinality
        self.bottleneck_width = bottleneck_width
        self.in_planes = 64
        self.conv1 = nn.Conv2d(3, 64, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        # size 32x32
        self.layer1 = self._make_layer(num_blocks[0], 1)
        # size 32x32
        self.layer2 = self._make_layer(num_blocks[1], 2)
        # size 16x16
        self.layer3 = self._make_layer(num_blocks[2], 2)
        # size 8x8
        self.linear = nn.Linear(cardinality * bottleneck_width * 8, num_classes)

    def _make_layer(self, num_blocks, stride):
        # the first block of a stage uses `stride`; the remaining num_blocks-1 use stride 1
        strides = [stride] + [1] * (num_blocks - 1)
        layers = []
        for stride in strides:
            layers.append(Block(self.in_planes, self.cardinality, self.bottleneck_width, stride))
            self.in_planes = Block.expansion * self.cardinality * self.bottleneck_width
        # Double bottleneck_width after each stage.
        self.bottleneck_width *= 2
        return nn.Sequential(*layers)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = F.avg_pool2d(out, 8)
        out = out.view(out.size(0), -1)
        out = self.linear(out)
        return out
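The channel bookkeeping in `_make_layer` can be traced by hand. A sketch reproducing the arithmetic for the 32x4d configuration above (cardinality 32, initial width 4, expansion 2), showing the per-stage output channels and why the linear layer's input size is `cardinality * bottleneck_width * 8`:

```python
expansion = 2
cardinality, width = 32, 4  # the 32x4d configuration

out_channels = []
for stage in range(3):
    group_width = cardinality * width             # channels inside this stage's blocks
    out_channels.append(expansion * group_width)  # channels leaving the stage
    width *= 2                                    # width doubles after each stage

print(out_channels)  # [256, 512, 1024]
# Width doubles three times, so the final channel count is
# cardinality * initial_width * 8 -- exactly the linear layer's input size.
print(32 * 4 * 8)    # 1024
```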
Exercise 2:
Define a ResNeXt-32 (16x8d) by completing the line of code below, then train and test this network for the first 20 epochs in the same way as DenseNet, and analyze why the results are good or bad.
ResNeXt32_16x8d = ResNeXt([3, 4, 3], 16, 8)
run(ResNeXt32_16x8d, num_epochs=20)
Epoch: 0
Train loss: 2.167 | Train Acc: 18.150% (363/2000)
Test Loss: 4.153 | Test Acc: 10.000% (100/1000)
Epoch: 1
Train loss: 1.680 | Train Acc: 37.750% (755/2000)
Test Loss: 1.449 | Test Acc: 49.000% (490/1000)
Epoch: 2
Train loss: 1.060 | Train Acc: 63.650% (1273/2000)
Test Loss: 2.061 | Test Acc: 39.500% (395/1000)
Epoch: 3
Train loss: 0.667 | Train Acc: 78.150% (1563/2000)
Test Loss: 1.388 | Test Acc: 59.700% (597/1000)
Epoch: 4
Train loss: 0.528 | Train Acc: 81.150% (1623/2000)
Test Loss: 0.491 | Test Acc: 84.300% (843/1000)
Epoch: 5
Train loss: 0.387 | Train Acc: 87.450% (1749/2000)
Test Loss: 0.523 | Test Acc: 83.800% (838/1000)
Epoch: 6
Train loss: 0.326 | Train Acc: 89.450% (1789/2000)
Test Loss: 0.901 | Test Acc: 73.400% (734/1000)
Epoch: 7
Train loss: 0.284 | Train Acc: 91.100% (1822/2000)
Test Loss: 0.897 | Test Acc: 71.400% (714/1000)
Epoch: 8
Train loss: 0.198 | Train Acc: 94.150% (1883/2000)
Test Loss: 0.510 | Test Acc: 81.900% (819/1000)
Epoch: 9
Train loss: 0.168 | Train Acc: 95.600% (1912/2000)
Test Loss: 0.294 | Test Acc: 90.700% (907/1000)
Epoch: 10
Train loss: 0.158 | Train Acc: 95.500% (1910/2000)
Test Loss: 0.614 | Test Acc: 81.300% (813/1000)
Epoch: 11
Train loss: 0.152 | Train Acc: 95.250% (1905/2000)
Test Loss: 0.292 | Test Acc: 90.100% (901/1000)
Epoch: 12
Train loss: 0.167 | Train Acc: 95.050% (1901/2000)
Test Loss: 0.421 | Test Acc: 86.900% (869/1000)
Epoch: 13
Train loss: 0.125 | Train Acc: 96.350% (1927/2000)
Test Loss: 0.189 | Test Acc: 94.500% (945/1000)
Epoch: 14
Train loss: 0.090 | Train Acc: 97.700% (1954/2000)
Test Loss: 0.178 | Test Acc: 94.300% (943/1000)
Epoch: 15
Train loss: 0.089 | Train Acc: 97.550% (1951/2000)
Test Loss: 0.401 | Test Acc: 88.500% (885/1000)
Epoch: 16
Train loss: 0.094 | Train Acc: 97.150% (1943/2000)
Test Loss: 0.282 | Test Acc: 91.300% (913/1000)
Epoch: 17
Train loss: 0.096 | Train Acc: 97.450% (1949/2000)
Test Loss: 0.189 | Test Acc: 94.300% (943/1000)
Epoch: 18
Train loss: 0.098 | Train Acc: 97.550% (1951/2000)
Test Loss: 0.216 | Test Acc: 93.000% (930/1000)
Epoch: 19
Train loss: 0.127 | Train Acc: 96.000% (1920/2000)
Test Loss: 0.156 | Test Acc: 94.900% (949/1000)
Compared with DenseNet, ResNeXt-32 improves both in how quickly accuracy rises and in its final performance on the training and test sets. Its training is less stable, however: between epochs the loss sometimes increases and the accuracy sometimes drops.
train_loader, _ = data_loader()
dataiter = iter(train_loader)
images, _ = next(dataiter)  # get a batch of images
with SummaryWriter(comment="ResNeXt32") as s:
    s.add_graph(ResNeXt32_16x8d, (Variable(images),))