Bipolar ReLU Activation Function
Paper link: [Shifting Mean Activation Towards Zero with Bipolar Activation Functions](https://arxiv.org/abs/1709.04054)
Year: 2018
Introduction
Bipolar ReLU is an extension of ReLU that shifts a layer's mean activation towards zero; combined with appropriate weight initialization, it can reduce the need for BatchNorm layers. In CNNs trained without batch normalization, Bipolar ReLU speeds up the reduction of training error.
Before Bipolar ReLU was proposed, several ReLU variants had already replaced the zero output in the negative region with negative values, allowing the mean activation to stay closer to zero.
Method
In a neural network, ReLU keeps only the positive part of its input, which biases the mean activation in the positive direction: negative inputs are mapped to zero, so the output cannot remain zero-centered. To correct this, the paper defines the bipolar version $f_B$ of an activation function $f$ as:
$$
f_B(x_i) = \begin{cases} f(x_i), & i \bmod 2 = 0 \\ -f(-x_i), & i \bmod 2 \neq 0 \end{cases}
$$
For convolutional layers, the activation function is flipped for half of the feature maps. This ensures that the mean of the output activations is shifted towards zero regardless of the input. For example, with $f = \text{ReLU}$ and input $x = (-1, -1, 2, 2)$, the bipolar output is $(0, -1, 2, 0)$ with mean 0.25, whereas plain ReLU gives $(0, 0, 2, 2)$ with mean 1.
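As a sketch of what "flipping the activation for half of the feature maps" looks like in code, here is a minimal autograd-friendly version that alternates along the channel dimension of an (N, C, H, W) tensor; the helper name `bipolar_relu` and the choice of dim 1 are illustrative assumptions, not the paper's reference code:

```python
import torch
import torch.nn.functional as F

def bipolar_relu(x: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: bipolar ReLU along the channel dimension (dim 1)."""
    out = x.clone()
    out[:, 0::2] = F.relu(x[:, 0::2])    # even channels: f(x_i) = max(x_i, 0)
    out[:, 1::2] = -F.relu(-x[:, 1::2])  # odd channels: -f(-x_i) = min(x_i, 0)
    return out

# e.g. a conv feature map: half of the 16 channels keep ReLU, half are flipped
y = bipolar_relu(torch.randn(8, 16, 32, 32))
```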
(Figure: the Bipolar ReLU curve)
(Figure: the Bipolar ELU curve)
Bipolar ReLU and Bipolar ELU exhibit more stable dynamics and are less prone to exploding means and variances.
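To make the mean-shift claim concrete, a quick check (reusing the `bipolar_relu` sketch above) on standard-normal input: plain ReLU pushes the mean to roughly $1/\sqrt{2\pi} \approx 0.40$, while the bipolar version stays near zero because the flipped halves cancel on average:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(10_000, 64)  # standard-normal activations

print(F.relu(x).mean())        # roughly 0.40: ReLU shifts the mean upward
print(bipolar_relu(x).mean())  # roughly 0.00: flipped channels cancel the shift
```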
PyTorch Code
```python
import torch
import torch.nn as nn
from torch.autograd import Function

# Implementation of the BReLU activation function with a custom backward step
class brelu(Function):
    '''
    Implementation of the BReLU activation function.

    Shape:
        - Input: (N, *) where * means any number of additional dimensions
        - Output: (N, *), same shape as the input

    References:
        - BReLU paper: https://arxiv.org/pdf/1709.04054.pdf

    Examples:
        >>> brelu_activation = brelu.apply
        >>> t = torch.randn((5, 5), dtype=torch.float, requires_grad=True)
        >>> t = brelu_activation(t)
    '''

    # both forward and backward are @staticmethods
    @staticmethod
    def forward(ctx, input):
        """
        In the forward pass we receive a Tensor containing the input and return
        a Tensor containing the output. ctx is a context object that can be used
        to stash information for the backward computation via the
        ctx.save_for_backward method.
        """
        ctx.save_for_backward(input)  # save input for the backward pass
        # even/odd indices along the first dimension of the input
        input_shape = input.shape[0]
        even_indices = list(range(0, input_shape, 2))
        odd_indices = list(range(1, input_shape, 2))
        # clone the input tensor so the input itself is left untouched
        output = input.clone()
        # apply f(x) = ReLU(x) where i mod 2 == 0
        output[even_indices] = output[even_indices].clamp(min=0)
        # apply -f(-x) = -ReLU(-x) = min(x, 0) where i mod 2 != 0
        output[odd_indices] = output[odd_indices].clamp(max=0)
        return output

    @staticmethod
    def backward(ctx, grad_output):
        """
        In the backward pass we receive a Tensor containing the gradient of the
        loss with respect to the output, and we need to compute the gradient of
        the loss with respect to the input.
        """
        grad_input = None  # default output
        input, = ctx.saved_tensors  # restore input from context
        # if the input does not require grad, return None to save computation
        if ctx.needs_input_grad[0]:
            grad_input = grad_output.clone()
            # even/odd indices along the first dimension of the input
            input_shape = input.shape[0]
            even_indices = list(range(0, input_shape, 2))
            odd_indices = list(range(1, input_shape, 2))
            # ReLU branch: gradient flows where the input is non-negative
            mask_even = (input[even_indices] >= 0).to(grad_input.dtype)
            grad_input[even_indices] = mask_even * grad_input[even_indices]
            # -ReLU(-x) branch: gradient flows where the input is negative
            mask_odd = (input[odd_indices] < 0).to(grad_input.dtype)
            grad_input[odd_indices] = mask_odd * grad_input[odd_indices]
        return grad_input
```
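A short usage sketch for the class above: wrapping `brelu.apply` in a small `nn.Module` (the wrapper name `BReLU` is my own, not from the paper) and checking the hand-written backward against PyTorch's numerical gradient checker:

```python
# Wrap the Function in a module and sanity-check the hand-written backward.
class BReLU(nn.Module):
    def forward(self, x):
        return brelu.apply(x)

# gradcheck compares the custom backward against numerical gradients;
# it requires double precision inputs.
t = torch.randn(6, 4, dtype=torch.double, requires_grad=True)
assert torch.autograd.gradcheck(brelu.apply, (t,))

layer = BReLU()
print(layer(torch.randn(4, 3)))  # rows alternate between ReLU and -ReLU(-x)
```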
- In a quick experiment of my own, the loss decreased rather slowly with Bipolar ReLU; the results were not as good as the paper claims.