python batchnorm2d_PyTorch 卷积与BatchNorm的融合

最新推荐文章于 2024-06-26 11:57:46 发布

weixin_39882870

最新推荐文章于 2024-06-26 11:57:46 发布

阅读量365

点赞数

文章标签： python batchnorm2d

该博客介绍了如何在PyTorch中实现卷积层与批量归一化层的融合，以此提高模型运行速度。作者提供了一个简单的Python实现，并展示了融合操作后的速度提升和精度损失。通过测试表明，融合操作尤其是在BN层较多的情况下能显著提升速度，虽然会带来一定的精度下降，但适合于需要速度优化的场景。

摘要由CSDN通过智能技术生成

2020-05-27 更新PyTorch已经官方支持了合并操作：Captain Jack：MergeBN && Quantization PyTorch 官方解决方案zhuanlan.zhihu.com

2. 有用户爸爸/妈妈(我是讲女权的)在用我的这套代码的时候出现了各种错误，如果还是打算用这套，我将最新版同步到了github上，后面也会不定期同步：https://github.com/qinjian623/pytorch_toys/blob/master/post_quant/fusion.pygithub.com

原文：2018-11-11(本文最后一次更新的时间，神tm的日子...)

融合Conv和BatchNorm是个很基本的优化提速方法，很多框架应该都提供了功能。自己因为一个Weekend Project的需求，需要在PyTorch的Python里直接这个事情给做了。

这个融合优化属于经济上净赚的事情，精度理论上无损(实际上有损，但是很小，既然都提速了，八成要弄量化，这个精度掉的更夸张)，速度有大幅度提升，尤其是BN层接的特别多的情况。

融合原理

卷积的工作：

BN的工作：

带入的话可以推出来，融合后的新卷积：

新的卷积就直接顺路完成BN的工作。

测试结果：

在我的笔记本上的测试，CPU版本应该是同步的吧，否则这个结果也是不靠谱的，当然这个结果也不是严肃结果，没平均，没热机。不过能定性说明问题就OK，单位是秒。

import torch

import torch.nn as nn

import torchvision as tv

class DummyModule(nn.Module):

def __init__(self):

super(DummyModule, self).__init__()

def forward(self, x):

# print("Dummy, Dummy.")

return x

def fuse(conv, bn):

w = conv.weight

mean = bn.running_mean

var_sqrt = torch.sqrt(bn.running_var + bn.eps)

beta = bn.weight

gamma = bn.bias

if conv.bias is not None:

b = conv.bias

else:

b = mean.new_zeros(mean.shape)

w = w * (beta / var_sqrt).reshape([conv.out_channels, 1, 1, 1])

b = (b - mean)/var_sqrt * beta + gamma

fused_conv = nn.Conv2d(conv.in_channels,

conv.out_channels,

conv.kernel_size,

conv.stride,

conv.padding,

bias=True)

fused_conv.weight = nn.Parameter(w)

fused_conv.bias = nn.Parameter(b)

return fused_conv

def fuse_module(m):

children = list(m.named_children())

c = None

cn = None

for name, child in children:

if isinstance(child, nn.BatchNorm2d):

bc = fuse(c, child)

m._modules[cn] = bc

m._modules[name] = DummyModule()

c = None

elif isinstance(child, nn.Conv2d):

c = child

cn = name

else:

fuse_module(child)

def test_net(m):

p = torch.randn([1, 3, 224, 224])

import time

s = time.time()

o_output = m(p)

print("Original time: ", time.time() - s)

fuse_module(m)

s = time.time()

f_output = m(p)

print("Fused time: ", time.time() - s)

print("Max abs diff: ", (o_output - f_output).abs().max().item())

assert(o_output.argmax() == f_output.argmax())

# print(o_output[0][0].item(), f_output[0][0].item())

print("MSE diff: ", nn.MSELoss()(o_output, f_output).item())

def test_layer():

p = torch.randn([1, 3, 112, 112])

conv1 = m.conv1

bn1 = m.bn1

o_output = bn1(conv1(p))

fusion = fuse(conv1, bn1)

f_output = fusion(p)

print(o_output[0][0][0][0].item())

print(f_output[0][0][0][0].item())

print("Max abs diff: ", (o_output - f_output).abs().max().item())

print("MSE diff: ", nn.MSELoss()(o_output, f_output).item())

m = tv.models.resnet152(True)

m.eval()

print("Layer level test: ")

test_layer()

print("============================")

print("Module level test: ")

m = tv.models.resnet18(True)

m.eval()

test_net(m)

weixin_39882870

关注

0
点赞
踩
0

收藏

觉得还不错? 一键收藏
0
评论
复制链接

分享到 QQ

分享到新浪微博

扫一扫