Notes Summary: An Overview of Common Activation Functions - CSDN Blog

Article link: https://blog.csdn.net/qq_62827972/article/details/140323177

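Everything in this article is implemented with torch. It is a personal review log of my learning process and is suitable for beginners.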

The ReLU function

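The most popular activation function is the rectified linear unit (ReLU), because it is simple to implement and performs well on a wide range of prediction tasks. ReLU provides a very simple nonlinear transformation. Given an element $x$, the ReLU function is defined as the maximum of that element and 0: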

$\operatorname{ReLU}(x) = \max(x, 0)$

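Informally, the ReLU function keeps only the positive elements and discards all negative ones by setting the corresponding activations to 0. To get some intuition, we can plot the function. As the plot shows, the activation function is piecewise linear.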

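# Jupyter/IPython magic; remove this line when running as a plain script
%matplotlib inline
import torch
from matplotlib.pyplot import plot

# Sample points in [-8, 8); requires_grad=True lets us plot derivatives later
x = torch.arange(-8.0, 8.0, 0.1, requires_grad=True)
y = torch.relu(x)
# detach() returns tensors outside the autograd graph so matplotlib can plot them
plot(x.detach(), y.detach())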
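The definition of ReLU as an elementwise maximum can also be checked numerically. The following is a minimal sketch, not part of the original notes: the sample points are hypothetical values chosen for illustration, and `torch.maximum` is used only to spell out $\max(x, 0)$ by hand.

```python
import torch

# Hypothetical sample points, chosen only for illustration.
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0])

relu_builtin = torch.relu(x)
# Elementwise max(x, 0), written out directly from the definition.
relu_manual = torch.maximum(x, torch.zeros_like(x))

print(relu_builtin)
print(torch.equal(relu_builtin, relu_manual))  # True
```

Negative entries become 0 and positive entries pass through unchanged, which is the "keep the positive part" behaviour described above.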

Next, we plot the derivative of the ReLU function, which is 0 for negative inputs and 1 for positive inputs.

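# x.grad is still None before the first backward pass, so there is nothing to zero yet
# y is a vector, so pass a vector of ones as the upstream gradient
y.backward(torch.ones_like(x), retain_graph=True)
plot(x.detach(), x.grad)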
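To see the individual gradient values rather than a curve, here is a small sketch on a handful of hypothetical sample points (an assumption for illustration, not from the original notes). Note that ReLU is not differentiable at exactly 0; the value printed there reflects the convention of the PyTorch build being used, which as far as I know is 0.

```python
import torch

# Hypothetical sample points, including the non-differentiable point x = 0.
x = torch.tensor([-2.0, -0.5, 0.0, 0.5, 2.0], requires_grad=True)
y = torch.relu(x)

# y is a vector, so we supply a vector of ones as the upstream gradient;
# this is equivalent to calling y.sum().backward().
y.backward(torch.ones_like(x))

print(x.grad)
```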

The sigmoid function

sigmoid is often called a squashing function: it compresses any input in the range (-inf, inf) to some value in the interval (0, 1):

$\operatorname{sigmoid}(x) = \frac{1}{1 + \exp(-x)}$

y = torch.sigmoid(x)
plot(x.detach(), y.detach())
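As a quick check of the squashing behaviour, the sketch below compares `torch.sigmoid` with the formula written out by hand and confirms that the outputs stay strictly inside (0, 1). The sample points are hypothetical and chosen only for illustration.

```python
import torch

# Hypothetical sample points spanning the plotted range.
x = torch.tensor([-8.0, -1.0, 0.0, 1.0, 8.0])

y = torch.sigmoid(x)
# The definition 1 / (1 + exp(-x)), written out by hand.
manual = 1 / (1 + torch.exp(-x))

print(y)
print(torch.allclose(y, manual))          # True
print(bool(((y > 0) & (y < 1)).all()))    # True
```

For inputs of much larger magnitude, floating-point rounding can make the result equal to exactly 0 or 1, so the strict inequality is a property of the mathematical function rather than a guarantee for every float input.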

The derivative of the sigmoid function is given by the following formula:

$\frac{d}{dx} \operatorname{sigmoid}(x) = \frac{\exp(-x)}{(1 + \exp(-x))^{2}} = \operatorname{sigmoid}(x)\left(1 - \operatorname{sigmoid}(x)\right)$
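The second equality in the formula may not be obvious at first glance. One way to see it, sketched here as a supplementary derivation, is to split the fraction into two factors and use $1 - \operatorname{sigmoid}(x) = \frac{\exp(-x)}{1 + \exp(-x)}$:

```latex
\begin{aligned}
\frac{d}{dx} \operatorname{sigmoid}(x)
  &= \frac{d}{dx} \left(1 + \exp(-x)\right)^{-1} \\
  &= \frac{\exp(-x)}{\left(1 + \exp(-x)\right)^{2}} \\
  &= \frac{1}{1 + \exp(-x)} \cdot \frac{\exp(-x)}{1 + \exp(-x)} \\
  &= \operatorname{sigmoid}(x) \left(1 - \operatorname{sigmoid}(x)\right)
\end{aligned}
```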

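The derivative of the sigmoid function is plotted below. Note that when the input is 0, the derivative reaches its maximum of 0.25; the farther the input moves from 0 in either direction, the closer the derivative gets to 0.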

# backward() accumulates into x.grad, so clear the gradient left over from the previous plot
x.grad.zero_()
y.backward(torch.ones_like(x), retain_graph=True)
plot(x.detach(), x.grad)
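The closed-form derivative can be compared against what autograd computes. This is a minimal sketch on hypothetical sample points (not from the original notes); it checks that the gradient matches sigmoid(x)(1 - sigmoid(x)) and that the value at 0 is 0.25.

```python
import torch

# Hypothetical sample points; index 2 is x = 0, where the derivative peaks.
x = torch.tensor([-4.0, -1.0, 0.0, 1.0, 4.0], requires_grad=True)
y = torch.sigmoid(x)
y.backward(torch.ones_like(x))

# Closed-form derivative: sigmoid(x) * (1 - sigmoid(x)).
formula = (y * (1 - y)).detach()

print(torch.allclose(x.grad, formula))  # True
print(x.grad[2].item())                 # 0.25
```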

The tanh function

Like the sigmoid function, the tanh (hyperbolic tangent) function also squashes its input, mapping it to the interval (-1, 1).
The tanh function is defined as:

$\operatorname{tanh}(x) = \frac{1 - \exp(-2x)}{1 + \exp(-2x)}$

y = torch.tanh(x)
plot(x.detach(), y.detach())
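The formula for tanh can be checked against `torch.tanh` in the same way. The sketch below uses hypothetical sample points and simply evaluates $(1 - \exp(-2x)) / (1 + \exp(-2x))$ by hand.

```python
import torch

# Hypothetical sample points, chosen only for illustration.
x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0])

y = torch.tanh(x)
# The definition (1 - exp(-2x)) / (1 + exp(-2x)), written out by hand.
manual = (1 - torch.exp(-2 * x)) / (1 + torch.exp(-2 * x))

print(torch.allclose(y, manual))     # True
print(bool((y.abs() < 1).all()))     # True
```

Unlike sigmoid, the output is centred on 0: tanh(0) is 0 and the function is odd, so negative inputs map to negative outputs.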

The derivative of the tanh function is:

$\frac{d}{dx} \operatorname{tanh}(x) = 1 - \operatorname{tanh}^2(x)$

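The derivative of the tanh function is plotted below. As the input approaches 0, the derivative approaches its maximum of 1. As with the sigmoid function, the farther the input moves from 0 in either direction, the closer the derivative gets to 0.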

# backward() accumulates into x.grad, so clear the gradient left over from the previous plot
x.grad.zero_()
y.backward(torch.ones_like(x), retain_graph=True)
plot(x.detach(), x.grad)
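As an independent check that does not rely on autograd, the identity $1 - \operatorname{tanh}^2(x)$ can be compared with a central finite difference. This is a swapped-in technique for illustration, not the method used in the notes above; the sample points and the step size `h` are assumptions.

```python
import torch

# Hypothetical sample points; float64 keeps the finite-difference error small.
x = torch.tensor([-3.0, -1.0, 0.0, 1.0, 3.0], dtype=torch.float64)
h = 1e-5  # assumed step size for the central difference

# Central difference: (f(x + h) - f(x - h)) / (2h).
numeric = (torch.tanh(x + h) - torch.tanh(x - h)) / (2 * h)
# Closed-form derivative: 1 - tanh(x)^2.
formula = 1 - torch.tanh(x) ** 2

print(torch.allclose(numeric, formula, atol=1e-8))  # True
print(formula[2].item())                            # 1.0
```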