This series of articles introduces a recently popular idea in network design: re-parameterization, which merges multiple layers into a single one to speed up inference.
This article covers RepVGG: Making VGG-style ConvNets Great Again, the pioneering paper on re-parameterization, which applies the technique to a VGG-style classification network.
1. Theory
RepVGG has 5 stages, and each stage begins with a stride-2 convolution that performs downsampling. The figure below shows the first 4 layers of one stage.
RepVGG's key innovation is that the shortcut structure is used only during training; at test time the parameters are re-parameterized, which removes the shortcut branches and reduces the parameter count.
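For orientation, here is a condensed sketch of the training-time block, modeled on the official implementation (the `conv_bn` helper and the branch names `rbr_dense`, `rbr_1x1`, `rbr_identity` match the code quoted in section 2; other constructor details are simplified):

```python
import torch.nn as nn

def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
    # conv (bias=False) followed by BN; the pair is fused into one conv at deploy time
    result = nn.Sequential()
    result.add_module('conv', nn.Conv2d(in_channels, out_channels, kernel_size,
                                        stride, padding, groups=groups, bias=False))
    result.add_module('bn', nn.BatchNorm2d(out_channels))
    return result

class RepVGGBlock(nn.Module):
    def __init__(self, in_channels, out_channels, stride=1, groups=1):
        super().__init__()
        self.in_channels, self.groups = in_channels, groups
        # the BN-only shortcut exists only where input and output shapes match
        self.rbr_identity = (nn.BatchNorm2d(in_channels)
                             if out_channels == in_channels and stride == 1 else None)
        self.rbr_dense = conv_bn(in_channels, out_channels, 3, stride, padding=1, groups=groups)
        self.rbr_1x1 = conv_bn(in_channels, out_channels, 1, stride, padding=0, groups=groups)
        self.nonlinearity = nn.ReLU()

    def forward(self, inputs):
        if hasattr(self, 'rbr_reparam'):  # deploy mode: a single fused 3x3 conv
            return self.nonlinearity(self.rbr_reparam(inputs))
        id_out = 0 if self.rbr_identity is None else self.rbr_identity(inputs)
        return self.nonlinearity(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)
```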
1.1 Fusing a BN layer and a conv layer into a single conv layer
Let $(\mu, \sigma, \gamma, \beta)$ denote the accumulated mean, standard deviation, and learned scaling factor and bias of a BN layer, and let $M \in \mathbb{R}^{N \times C_1 \times H_1 \times W_1}$ denote its input. The output of the BN layer is:

$$\mathrm{bn}(M, \mu, \sigma, \gamma, \beta)_{:,i,:,:} = \left( M_{:,i,:,:} - \mu_i \right) \frac{\gamma_i}{\sigma_i} + \beta_i, \qquad \forall\, 1 \le i \le C_1$$
Let

$$W'_{i,:,:,:} = \frac{\gamma_i}{\sigma_i}\, W_{i,:,:,:}, \qquad b'_i = \beta_i - \frac{\mu_i \gamma_i}{\sigma_i},$$

so that for a conv layer with kernel $W$ followed by this BN,

$$\mathrm{bn}(M * W, \mu, \sigma, \gamma, \beta)_{:,i,:,:} = (M * W')_{:,i,:,:} + b'_i,$$

which gives the parameters of an equivalent conv layer. The BN layer's parameters are thereby merged into the conv layer's parameters.
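As a sanity check, this fusion can be verified numerically in a few lines (a standalone sketch; the layer shapes and variable names here are illustrative, not from the repo):

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(16).eval()
# give BN non-trivial statistics so the comparison is meaningful
bn.running_mean.uniform_(-1, 1); bn.running_var.uniform_(0.5, 2)
bn.weight.data.uniform_(0.5, 2); bn.bias.data.uniform_(-1, 1)

std = (bn.running_var + bn.eps).sqrt()
t = (bn.weight / std).reshape(-1, 1, 1, 1)
fused = nn.Conv2d(8, 16, 3, padding=1, bias=True)
fused.weight.data = conv.weight * t                            # W' = (gamma / sigma) W
fused.bias.data = bn.bias - bn.running_mean * bn.weight / std  # b' = beta - mu gamma / sigma

x = torch.randn(2, 8, 32, 32)
print(torch.allclose(bn(conv(x)), fused(x), atol=1e-5))  # True
```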
1.2 Fusing the three branches into one
Use superscripts $(3)$, $(1)$, and $(0)$ to denote the 3x3conv+bn, 1x1conv+bn, and shortcut+bn branches respectively, and let $M^{(1)}$ and $M^{(2)}$ denote the block's input and output. They are related by:

$$M^{(2)} = \mathrm{bn}\!\left(M^{(1)} * W^{(3)}, \mu^{(3)}, \sigma^{(3)}, \gamma^{(3)}, \beta^{(3)}\right) + \mathrm{bn}\!\left(M^{(1)} * W^{(1)}, \mu^{(1)}, \sigma^{(1)}, \gamma^{(1)}, \beta^{(1)}\right) + \mathrm{bn}\!\left(M^{(1)}, \mu^{(0)}, \sigma^{(0)}, \gamma^{(0)}, \beta^{(0)}\right)$$
Applying the transformation above to each of the three branches, and treating the shortcut as a 1x1 conv whose kernel is an identity matrix, converts 3x3conv+bn, 1x1conv+bn, and shortcut+bn into one 3x3 kernel, two 1x1 kernels, and three bias vectors.
Then:
- add the three bias vectors to get the final bias;
- zero-pad each 1x1 kernel to 3x3;
- add the three 3x3 kernels to get the final 3x3 conv.
The figure below illustrates these steps:
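Beyond the diagram, the padding and summation steps can be verified directly by linearity of convolution (a standalone sketch, independent of the repo code):

```python
import torch
import torch.nn.functional as F

C = 4
x = torch.randn(1, C, 8, 8)
w3, b3 = torch.randn(C, C, 3, 3), torch.randn(C)
w1, b1 = torch.randn(C, C, 1, 1), torch.randn(C)
# the shortcut as a 1x1 conv whose kernel is the identity matrix
wid = torch.zeros(C, C, 1, 1)
for i in range(C):
    wid[i, i, 0, 0] = 1.0

# three parallel branches (padding=1 for the 3x3 conv keeps spatial sizes equal)
y_branches = (F.conv2d(x, w3, b3, padding=1)
              + F.conv2d(x, w1, b1)
              + F.conv2d(x, wid))

# fuse: zero-pad the 1x1 kernels to 3x3, then sum kernels and biases
w = w3 + F.pad(w1, [1, 1, 1, 1]) + F.pad(wid, [1, 1, 1, 1])
b = b3 + b1  # the shortcut branch in this sketch carries no bias
y_fused = F.conv2d(x, w, b, padding=1)
print(torch.allclose(y_branches, y_fused, atol=1e-4))  # True
```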
The per-stage structure of the authors' RepVGG is shown below (for reference, RepVGG-A uses 1, 2, 4, 14, 1 layers across its 5 stages, with widths min(64, 64a), 64a, 128a, 256a, 512b; RepVGG-B deepens the middle stages):
where a and b are configurable width multipliers:
2. Code
2.1 Re-parameterization
Re-parameterization is implemented by get_equivalent_kernel_bias():
def get_equivalent_kernel_bias(self):
    # fuse conv+bn within each branch, then merge the three branches
    kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
    kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
    kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
    return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, \
           bias3x3 + bias1x1 + biasid
It involves four steps:
- fuse the BN layers:
def _fuse_bn_tensor(self, branch):
    if branch is None:
        return 0, 0
    if isinstance(branch, nn.Sequential):
        # conv+bn branch (the 3x3 or the 1x1 path)
        kernel = branch.conv.weight
        running_mean = branch.bn.running_mean
        running_var = branch.bn.running_var
        gamma = branch.bn.weight
        beta = branch.bn.bias
        eps = branch.bn.eps
    else:
        # BN-only shortcut branch: build an identity kernel once and cache it
        assert isinstance(branch, nn.BatchNorm2d)
        if not hasattr(self, 'id_tensor'):
            input_dim = self.in_channels // self.groups
            kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
            for i in range(self.in_channels):
                kernel_value[i, i % input_dim, 1, 1] = 1
            self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
        kernel = self.id_tensor
        running_mean = branch.running_mean
        running_var = branch.running_var
        gamma = branch.weight
        beta = branch.bias
        eps = branch.eps
    # W' = (gamma / sigma) * W,  b' = beta - mu * gamma / sigma
    std = (running_var + eps).sqrt()
    t = (gamma / std).reshape(-1, 1, 1, 1)
    return kernel * t, beta - running_mean * gamma / std
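To make the id_tensor construction concrete, here is a tiny illustrative check (not from the repo) for groups=1: the 3x3 kernel with a single 1 at the center of each channel's own filter reproduces its input exactly.

```python
import torch
import torch.nn.functional as F

C = 3
kernel = torch.zeros(C, C, 3, 3)
for i in range(C):
    kernel[i, i, 1, 1] = 1.0   # a single 1 at the center of channel i's own filter

x = torch.randn(1, C, 5, 5)
# convolving with this kernel (padding=1) returns the input unchanged
print(torch.allclose(F.conv2d(x, kernel, padding=1), x))  # True
```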
- add the three bias vectors to get the final bias:
  bias = bias3x3 + bias1x1 + biasid
- zero-pad the 1x1 kernel to 3x3:
def _pad_1x1_to_3x3_tensor(self, kernel1x1):
    if kernel1x1 is None:
        return 0
    else:
        # pad one zero on each side of the two spatial dimensions
        return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])
- add the three 3x3 kernels to get the final 3x3 conv:
  kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid
2.2 Switching to deploy mode
First train the model in the normal way, then convert it to deploy mode via switch_to_deploy(), which:
- computes the equivalent kernel and bias of the inference-time model, and
- loads these equivalent parameters into a single Conv2d.
def switch_to_deploy(self):
    if hasattr(self, 'rbr_reparam'):
        return  # already converted
    kernel, bias = self.get_equivalent_kernel_bias()
    # a single 3x3 conv (with bias) replaces the three training-time branches
    self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels,
                                 out_channels=self.rbr_dense.conv.out_channels,
                                 kernel_size=self.rbr_dense.conv.kernel_size,
                                 stride=self.rbr_dense.conv.stride,
                                 padding=self.rbr_dense.conv.padding,
                                 dilation=self.rbr_dense.conv.dilation,
                                 groups=self.rbr_dense.conv.groups,
                                 bias=True)
    self.rbr_reparam.weight.data = kernel
    self.rbr_reparam.bias.data = bias
    # drop the training-time branches so only the fused conv remains
    for para in self.parameters():
        para.detach_()
    self.__delattr__('rbr_dense')
    self.__delattr__('rbr_1x1')
    if hasattr(self, 'rbr_identity'):
        self.__delattr__('rbr_identity')
    if hasattr(self, 'id_tensor'):
        self.__delattr__('id_tensor')
    self.deploy = True
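Putting it together, a typical conversion looks like this (a sketch using the RepVGGBlock outline from section 1; the multi-branch and fused outputs match only in eval mode, where BN uses its running statistics rather than batch statistics):

```python
import torch

block = RepVGGBlock(64, 64, stride=1)
block.eval()                      # freeze BN statistics before comparing outputs

x = torch.randn(1, 64, 56, 56)
y_multi_branch = block(x)         # training-time structure: 3x3 + 1x1 + identity

block.switch_to_deploy()          # fuse everything into block.rbr_reparam
y_single_branch = block(x)

print(torch.allclose(y_multi_branch, y_single_branch, atol=1e-5))  # True
```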