DY-ReLU (ECCV 2020): A Highly Cost-Effective Activation Function | Dynamic ReLU

chenzy_hust

Category: Attention Mechanisms

Article link: https://blog.csdn.net/weixin_42096202/article/details/109727151

This paper came out quite a while ago; here is a brief note on it.
Paper: https://arxiv.org/pdf/2003.10027.pdf

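Abstract:

Rectified linear units (ReLU) are commonly used in deep neural networks. So far, ReLU and its generalizations (either non-parametric or parametric) are static, performing identically for all input samples. In this paper, we propose Dynamic ReLU (DY-ReLU), a dynamic rectifier whose parameters are generated by a hyper function over all input elements. The key insight is that DY-ReLU encodes the global context into the hyper function and adapts the piecewise linear activation function accordingly. Compared with its static counterpart, DY-ReLU has negligible extra computational cost but significantly more representation capability, especially for lightweight neural networks. By simply using DY-ReLU for MobileNetV2, the top-1 accuracy on ImageNet classification is boosted from 72.0% to 76.2% with only 5% additional FLOPs.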

Introduction:

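Existing activation functions are shown in the figure below. ReLU and its variants all behave identically for every input, and this static treatment leads the authors to ask: should an activation function be static or dynamic?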
[Figure: existing activation functions (ReLU and its variants)]
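To answer this question, the paper proposes Dynamic ReLU (DY-ReLU), a parameterized piecewise linear function whose parameters are computed by a hyper function. The figure below illustrates this dynamic activation; the core idea is that the hyper function encodes the global context of the input and uses it to guide the piecewise linear activation that follows.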
[Figure: schematic of Dynamic ReLU]
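To make the "piecewise linear" part concrete, here is a minimal plain-Python illustration (no PyTorch). The activation is taken as the maximum over K linear pieces a_k·x + b_k. With slopes (1, 0) and intercepts (0, 0) it is exactly ReLU; DY-ReLU's point is that a hyper function changes these coefficients per input. The function name `piecewise_max` and the "dynamic" coefficient values are hypothetical, chosen only for illustration.

```python
def piecewise_max(x, a, b):
    """Return the max over k of a[k] * x + b[k] for a scalar input x."""
    return max(ak * x + bk for ak, bk in zip(a, b))


# Static ReLU: K = 2 pieces, y = 1*x + 0 and y = 0*x + 0
relu_a, relu_b = (1.0, 0.0), (0.0, 0.0)
print([piecewise_max(x, relu_a, relu_b) for x in (-2.0, 0.5, 3.0)])  # [0.0, 0.5, 3.0]

# Hypothetical input-dependent coefficients, standing in for hyper-function output
dyn_a, dyn_b = (1.2, 0.3), (0.1, 0.0)
print([piecewise_max(x, dyn_a, dyn_b) for x in (-2.0, 0.5, 3.0)])
```

Unlike ReLU, the second setting can output a negative value (about -0.6 at x = -2), which is the kind of extra flexibility the dynamic coefficients provide.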

Dynamic ReLU:

A. DY-ReLU
[Figure: DY-ReLU formulation]
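It consists of two functions:

Hyper function θ(x): computes the parameters of the activation function;
Activation function f_{θ(x)}(x): computes the activation output for the input, using the parameters generated by the hyper function.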
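Written out as a formula (a reconstruction following the paper's notation and consistent with the `lambdas` and `init_v`/`bias` buffers in the code below), for channel c of input x with K linear pieces:

```latex
\begin{aligned}
y_c &= f_{\theta(x)}(x_c) = \max_{1 \le k \le K} \left\{ a_c^k(x)\, x_c + b_c^k(x) \right\} \\
a_c^k(x) &= \alpha^k + \lambda_a \, \Delta a_c^k(x), \qquad
b_c^k(x) = \beta^k + \lambda_b \, \Delta b_c^k(x)
\end{aligned}
```

Here the residuals Δa and Δb are the hyper function's outputs squashed to [-1, 1] by 2σ(·) - 1 (global average pooling, FC, ReLU, FC, as in `get_relu_coefs`). With the defaults K = 2, α¹ = 1, α² = β¹ = β² = 0, λ_a = 1.0 and λ_b = 0.5, zero residuals give y_c = max(x_c, 0), i.e. plain ReLU.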

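B. Variations of Dynamic ReLU

The paper proposes three types of DY-ReLU in total:
1. DyReLUA: coefficients shared across both spatial positions and channels;

2. DyReLUB: shared across spatial positions, but separate for each channel;

3. DyReLUC: shared across neither spatial positions nor channels.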
[Figure: the three DY-ReLU variants]
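The variants differ mainly in how many coefficients the hyper function must produce. As a rough illustration, the sketch below counts the outputs and the weights plus biases of the hyper-function layers, following the layer shapes in the Code section: fc1 maps C to C/R, fc2 maps C/R to 2K (DyReLUA) or 2K·C (DyReLUB, DyReLUC), and DyReLUC adds a 1×1 conv from C channels to 1 for spatial attention. The helper `hyper_params` and the values C = 64, R = 4, K = 2 are hypothetical; note that the blog's DyReLUB actually defaults to R = 8.

```python
def hyper_params(channels, reduction=4, k=2, variant="B"):
    """Count outputs and weights+biases of a DY-ReLU hyper function.

    fc1: C -> C//R; fc2: C//R -> 2K (variant A) or 2K*C (variants B and C).
    Variant C also has a 1x1 conv C -> 1 for the spatial attention branch.
    """
    hidden = channels // reduction
    n_out = 2 * k if variant == "A" else 2 * k * channels
    params = (channels * hidden + hidden) + (hidden * n_out + n_out)
    if variant == "C":
        params += channels + 1  # spatial 1x1 conv: C weights + 1 bias
    return n_out, params


for v in ("A", "B", "C"):
    print(v, hyper_params(64, reduction=4, k=2, variant=v))
# A (4, 1108)
# B (256, 5392)
# C (256, 5457)
```

Going from A to B multiplies the fc2 output by C, which is where most of the extra parameters come from; the spatial branch of C adds comparatively little.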
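The authors' experiments lead to the following findings:

1. DyReLUB and DyReLUC are better suited to image classification;

2. DyReLUB and DyReLUC are better suited to the backbone of keypoint detection networks, while DyReLUC is better suited to the keypoint detection head;

3. For image classification, plugging DyReLU into MobileNetV2 improves ImageNet top-1 accuracy by 4.2 percentage points (72.0% to 76.2%);

4. For keypoint detection, applying DyReLU improves AP by 3.5.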

Code:

import torch
import torch.nn as nn


class DyReLU(nn.Module):
    def __init__(self, channels, reduction=4, k=2, conv_type='2d'):
        super(DyReLU, self).__init__()
        self.channels = channels
        self.k = k
        self.conv_type = conv_type
        assert self.conv_type in ['1d', '2d']

        self.fc1 = nn.Linear(channels, channels // reduction)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(channels // reduction, 2 * k)
        self.sigmoid = nn.Sigmoid()

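        # Residual scales (lambda_a = 1.0 for slopes, lambda_b = 0.5 for intercepts)
        # and initial coefficients (a^1 = 1, all others 0): zero residuals give plain ReLU
        self.register_buffer('lambdas', torch.tensor([1.] * k + [0.5] * k))
        self.register_buffer('init_v', torch.tensor([1.] + [0.] * (2 * k - 1)))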

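    def get_relu_coefs(self, x):
        # Global average pooling over spatial dims: BxCxL (or BxCxHxW) -> BxC
        theta = torch.mean(x, dim=-1)
        if self.conv_type == '2d':
            theta = torch.mean(theta, dim=-1)
        # Hyper function: FC -> ReLU -> FC
        theta = self.fc1(theta)
        theta = self.relu(theta)
        theta = self.fc2(theta)
        # Normalize the residuals to [-1, 1]
        theta = 2 * self.sigmoid(theta) - 1
        return theta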

    def forward(self, x):
        raise NotImplementedError


class DyReLUA(DyReLU):
    def __init__(self, channels, reduction=4, k=2, conv_type='2d'):
        super(DyReLUA, self).__init__(channels, reduction, k, conv_type)
        # fc2 from the base class already outputs 2*k coefficients shared by all channels

    def forward(self, x):
        assert x.shape[1] == self.channels
        theta = self.get_relu_coefs(x)

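        relu_coefs = theta.view(-1, 2 * self.k) * self.lambdas + self.init_v
        # Swap batch and last axis: BxCxL -> LxCxBx1 (BxCxHxW -> WxCxHxBx1 for '2d')
        x_perm = x.transpose(0, -1).unsqueeze(-1)
        # Broadcast against the Bxk slopes and intercepts -> ...xBxk
        output = x_perm * relu_coefs[:, :self.k] + relu_coefs[:, self.k:]
        # Max over the k pieces, then swap back: LxCxB -> BxCxL
        result = torch.max(output, dim=-1)[0].transpose(0, -1)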

        return result


class DyReLUB(DyReLU):
    def __init__(self, channels, reduction=8, k=2, conv_type='2d'):
        super(DyReLUB, self).__init__(channels, reduction, k, conv_type)
        self.fc2 = nn.Linear(channels // reduction, 2 * k * channels)

    def forward(self, x):
        assert x.shape[1] == self.channels
        theta = self.get_relu_coefs(x)

        relu_coefs = theta.view(-1, self.channels, 2 * self.k) * self.lambdas + self.init_v

        if self.conv_type == '1d':
            # BxCxL -> LxBxCx1
            x_perm = x.permute(2, 0, 1).unsqueeze(-1)
            output = x_perm * relu_coefs[:, :, :self.k] + relu_coefs[:, :, self.k:]
            # LxBxCx2 -> BxCxL
            result = torch.max(output, dim=-1)[0].permute(1, 2, 0)

        elif self.conv_type == '2d':
            # BxCxHxW -> HxWxBxCx1
            x_perm = x.permute(2, 3, 0, 1).unsqueeze(-1)
            output = x_perm * relu_coefs[:, :, :self.k] + relu_coefs[:, :, self.k:]
            # HxWxBxCx2 -> BxCxHxW
            result = torch.max(output, dim=-1)[0].permute(2, 3, 0, 1)

        return result


class DyReLUC(nn.Module):
    def __init__(self,
                 channels,
                 reduction=4,
                 k=2,
                 tau=10,
                 gamma=1/3):
        super().__init__()
        self.channels = channels
        self.reduction = reduction
        self.k = k
        self.tau = tau
        self.gamma = gamma

        self.coef = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(),
            nn.Conv2d(channels // reduction, 2 * k * channels, 1),
            nn.Sigmoid()
        )
        self.spatial_conv = nn.Conv2d(channels, 1, 1)

        # default parameter setting
        # lambdaA = 1.0, lambdaB = 0.5;
        # alphaA1 = 1, alphaA2=alphaB1=alphaB2=0
        self.register_buffer('lambdas', torch.tensor([1.] * k + [0.5] * k))
        self.register_buffer('bias', torch.tensor([1.] + [0.] * (2 * k - 1)))

    def forward(self, x):
        N, C, H, W = x.size()
        coef = self.coef(x)
        coef = 2 * coef - 1

        # coefficient update
        coef = coef.view(-1, self.channels, 2 * self.k) * self.lambdas + self.bias

        # spatial attention: one score per location (Nx1xHxW), softmax over all
        # H*W positions with temperature tau, scaled by gamma*H*W and clipped at 1
        gamma = self.gamma * H * W
        spatial = self.spatial_conv(x)
        spatial = spatial.view(N, 1, -1) / self.tau
        spatial = torch.softmax(spatial, dim=-1) * gamma
        spatial = torch.clamp(spatial, 0, 1).view(N, 1, H, W)

        # activations
        # NCHW --> HWNC1
        x_perm = x.permute(2, 3, 0, 1).unsqueeze(-1)
        # HWNC1 * NCK --> HWNCK
        output = x_perm * coef[:, :, :self.k] + coef[:, :, self.k:]

        # permute spatial from Nx1xHxW to HxWxNx1x1 so it broadcasts over C and k
        spatial = spatial.permute(2, 3, 0, 1).unsqueeze(-1)
        output = spatial * output

        # maxout and HWNC --> NCHW
        result = torch.max(output, dim=-1)[0].permute(2, 3, 0, 1)
        return result
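DyReLUC additionally weights each spatial location. Its spatial branch produces one score per position; the scores are divided by the temperature tau, softmax-normalized over all H×W positions, multiplied by gamma·H·W (gamma = 1/3 by default) and clipped at 1, so every location gets a weight in [0, 1]. Below is a plain-Python sketch of just this normalization step, using made-up scores for a 2×2 feature map; the function name `spatial_attention` is hypothetical.

```python
import math


def spatial_attention(scores, tau=10.0, gamma=1 / 3):
    """Normalize a flat list of per-location scores:
    min(gamma * HW * softmax(scores / tau), 1)."""
    hw = len(scores)
    scaled = [s / tau for s in scores]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [min(gamma * hw * e / total, 1.0) for e in exps]


# Hypothetical 1x1-conv scores for a 2x2 feature map, flattened row-major
weights = spatial_attention([0.0, 5.0, 10.0, 40.0])
print([round(w, 3) for w in weights])
```

With equal scores every location receives exactly gamma (1/3 by default), so gamma acts as the baseline attention level; a location whose softmax share exceeds 1/(gamma·H·W) saturates at 1.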

Experiments:

1. Comparison with ReLU and its variants:
[Figure: comparison of DY-ReLU with ReLU and its variants]
2. Improvements from DY-ReLU:
