Notes on understanding maskrcnn_benchmark: the relative order of batch norm, relu, and dropout, and whether dropout can be used


业精于勤荒于嬉-行成于思而毁于随


Article link: https://blog.csdn.net/m0_37644085/article/details/89205770


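This post discusses the relative order of Batch Normalization (BN), ReLU, and Dropout in convolutional neural networks. BN should come before ReLU, while Dropout is usually applied to fully connected layers and works poorly on convolutional layers. In modern architectures, BN combined with global average pooling reduces overfitting, and Dropout is used less and less.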


PS:

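1. How to implement global average pooling in a convolutional neural network. Before doing so, it is worth reading the ResNet paper to understand the benefits of global average pooling as a replacement for fully connected layers.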

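2. ~~Dropout could only go at the two fully connected layers of the box branch; this can be tried later during fine-tuning.~~ Fully connected layers reduce the dimensionality of the feature map before it is fed into softmax, but they also cause overfitting, so pooling can be used in their place. That settles the earlier question of whether to use dropout in the fc layers: with global average pooling (GAP) it is not needed.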
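To make the comparison in the two notes above concrete, here is a minimal PyTorch sketch of a ResNet-style head that applies global average pooling before a single small linear layer, next to a plain flatten-plus-fc head. The class name `GapHead`, the helper `count_params`, and all sizes are hypothetical and chosen only for illustration; this is not code from maskrcnn_benchmark.

```python
import torch
import torch.nn as nn


class GapHead(nn.Module):
    """Global average pooling followed by one small linear classifier."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # (N, C, H, W) -> (N, C, 1, 1)
        self.fc = nn.Linear(in_channels, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pool(x)
        x = torch.flatten(x, 1)  # (N, C)
        return self.fc(x)


def count_params(module: nn.Module) -> int:
    """Total number of learnable parameters in a module."""
    return sum(p.numel() for p in module.parameters())


# Hypothetical sizes, assumed only for this example.
C, H, W, NUM_CLASSES = 256, 7, 7, 10

gap_head = GapHead(C, NUM_CLASSES)
# Baseline: flatten the whole feature map and feed it to one big fc layer.
fc_head = nn.Sequential(nn.Flatten(), nn.Linear(C * H * W, NUM_CLASSES))

features = torch.randn(2, C, H, W)
gap_logits = gap_head(features)
fc_logits = fc_head(features)

print(count_params(gap_head), count_params(fc_head))
```

With these sizes the pooled head has roughly 49 times fewer weights than the flatten-plus-fc head, which is the usual argument for why it overfits less and why dropout becomes less necessary there.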

I. Relative order of batch norm, relu, dropout, etc.

conv2d + init.weight→bn→relu

conv2d + init.weight and bias→ relu

conv→relu→..........→conv→relu

Supplement: how PyTorch's torch.nn.init initializes weights and biases

II. The Dropout layer (2012) and the BN layer: is the Dropout layer effective?

I. Relative order of batch norm, relu, dropout, etc.

In Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, the authors state that "we would like to ensure that for any parameter values, the network always produces activations with the desired distribution". In other words, the normalization is there to hand the activation layer inputs with the desired distribution.

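The Batch Normalization layer is therefore inserted right after a Conv or fully connected layer and before an activation layer such as ReLU. Dropout, in turn, should be placed after the activation layer.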

-> CONV/FC -> BatchNorm -> ReLU (or other activation) -> Dropout -> CONV/FC ->
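The ordering above can be sketched as a small `nn.Sequential` in PyTorch. This is only an illustration under assumed channel counts, input size, and dropout probability, not a block taken from maskrcnn_benchmark.

```python
import torch
import torch.nn as nn

# CONV -> BatchNorm -> ReLU -> Dropout -> CONV, in that order.
block = nn.Sequential(
    # bias=False: BatchNorm's own shift parameter makes a conv bias redundant.
    nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False),
    nn.BatchNorm2d(16),
    nn.ReLU(),
    nn.Dropout(p=0.5),  # assumed probability, for illustration only
    nn.Conv2d(16, 32, kernel_size=3, padding=1),
)

x = torch.randn(8, 3, 32, 32)  # assumed input size

block.train()
# What ReLU receives: per-channel normalized values (zero mean in train mode).
pre_act = block[1](block[0](x))
post_act = block[2](pre_act)

out = block(x)
print(out.shape)
```

Because BN sits before ReLU, the activation sees roughly zero-mean, unit-variance inputs per channel during training, which is the point made by the quoted sentence from the paper.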

In the code I am reading there are several patterns, none of which uses dropout:

conv2d + init.weight→bn→relu
conv2d + init.weight and bias→ relu
conv→relu→..........→conv→relu
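The three patterns listed above can be sketched roughly as follows. The helper names, kernel sizes, and channel counts are hypothetical; the `kaiming_uniform_(..., a=1)` call for the first pattern follows the one quoted in this post, while the initialization used for the second pattern here (same weight init plus a zero bias) is an assumption made for the example.

```python
import torch
import torch.nn as nn


def conv_bn_relu(in_ch: int, out_ch: int) -> nn.Sequential:
    """Pattern 1: conv2d (weight init only, no bias) -> bn -> relu."""
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1, bias=False)
    nn.init.kaiming_uniform_(conv.weight, a=1)
    return nn.Sequential(conv, nn.BatchNorm2d(out_ch), nn.ReLU())


def conv_relu(in_ch: int, out_ch: int) -> nn.Sequential:
    """Pattern 2: conv2d (weight and bias init) -> relu, no normalization."""
    conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
    nn.init.kaiming_uniform_(conv.weight, a=1)  # assumed init
    nn.init.constant_(conv.bias, 0)             # assumed init
    return nn.Sequential(conv, nn.ReLU())


def conv_relu_stack(channels: list) -> nn.Sequential:
    """Pattern 3: conv -> relu -> ... -> conv -> relu with default init."""
    layers = []
    for in_ch, out_ch in zip(channels[:-1], channels[1:]):
        layers += [nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU()]
    return nn.Sequential(*layers)


p1 = conv_bn_relu(3, 8)
p2 = conv_relu(3, 8)
p3 = conv_relu_stack([3, 8, 8, 8])

x = torch.randn(2, 3, 8, 8)
y1, y2, y3 = p1(x), p2(x), p3(x)
print(y1.shape, y2.shape, y3.shape)
```

With `a=1` the gain used by `kaiming_uniform_` is 1, so the weights are drawn uniformly from `[-sqrt(3 / fan_in), sqrt(3 / fan_in)]`.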

The first pattern is in ResNet.py

The constructor first defines the conv2d and bn layers and initializes each conv2d with nn.init.kaiming_uniform_(l.weight, a=1).
Reading this together with forward, the structure is: conv2d + init.weight→bn→relu

        self.conv1 = Conv2d(
            in_channels,
            bottleneck_channels,
            kernel_size=1,
            stride=stride_1x1,
            bias=False,
        )
        self.bn1 = norm_func(bottleneck_channels)
      
        self.conv2 = Conv2d(
            bottleneck_channels,
            bottleneck_channels,
            kernel_size=3,
            stride=stride_3x3,
            padding=dilation,
            bias=False,
            groups=num_groups,
            dilation=dilation
        )
        self.bn2 = norm_func(bottleneck_channels)

        self.conv3 = Conv2d(
            bottleneck_channels, out_channel