GELU and ReLU Activation Functions: Principles and Implementation Code

Ai君臣

Published 2024-08-12 14:23:20

Tags: deep learning, python, pytorch, activation functions

Article link: https://blog.csdn.net/liuchenbaidu/article/details/141070709

GELU and ReLU activation functions

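GELU (Hendrycks and Gimpel 2016) has several implementations; the exact version is defined as $\text{GELU}(x) = x \cdot \Phi(x)$, where $\Phi(x)$ is the cumulative distribution function of the standard Gaussian distribution.
In practice, a cheaper approximation is commonly used: $\text{GELU}(x) \approx 0.5 \cdot x \cdot \left(1 + \tanh\left[\sqrt{\frac{2}{\pi}} \cdot \left(x + 0.044715 \cdot x^3\right)\right]\right)$ (the original GPT-2 model was also trained with this approximation).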
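
To make the relationship between the two forms concrete, the following is a minimal pure-Python sketch that evaluates both and measures how far apart they are. It relies on the standard identity that the Gaussian CDF equals $0.5 \cdot (1 + \text{erf}(x / \sqrt{2}))$. The function names `gelu_exact` and `gelu_tanh` and the grid over [-5, 5] are illustrative choices, not part of the original post.

```python
import math


def gelu_exact(x: float) -> float:
    """Exact GELU: x times the standard Gaussian CDF, written via erf."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU, following the formula above."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Largest absolute gap between the two forms on a grid over [-5, 5]
xs = [i / 100.0 for i in range(-500, 501)]
max_gap = max(abs(gelu_exact(x) - gelu_tanh(x)) for x in xs)
print(f"max |exact - tanh| on [-5, 5]: {max_gap:.2e}")
```

On this grid the gap stays small, which is why the approximation can usually stand in for the exact form.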

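import torch
import torch.nn as nn


class GELU(nn.Module):
    """Tanh approximation of GELU (the form used to train GPT-2)."""

    def __init__(self):
        super().__init__()

    def forward(self, x):
        # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
        return 0.5 * x * (1 + torch.tanh(
            torch.sqrt(torch.tensor(2.0 / torch.pi)) *
            (x + 0.044715 * torch.pow(x, 3))
        ))

import matplotlib.pyplot as plt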

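# Instantiate both activations for a side-by-side comparison
gelu, relu = GELU(), nn.ReLU()

# 100 evenly spaced sample points in the range [-3, 3]
x = torch.linspace(-3, 3, 100)
y_gelu, y_relu = gelu(x), relu(x)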

plt.figure(figsize=(8, 3))
for i, (y, label) in enumerate(zip([y_gelu, y_relu], ["GELU", "ReLU"]), 1):
    plt.subplot(1, 2, i)
    plt.plot(x, y)
    plt.title(f"{label} activation function")
    plt.xlabel("x")
    plt.ylabel(f"{label}(x)")
    plt.grid(True)

plt.tight_layout()
plt.show()
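
The plot compares the two curves visually; a few sample values make the same comparison without a plotting backend. The sketch below is pure Python and assumes the standard definition $\text{ReLU}(x) = \max(0, x)$, which the post does not state explicitly. The helper names `relu` and `gelu_tanh` and the sample points are illustrative.

```python
import math


def relu(x: float) -> float:
    """ReLU: passes positive inputs through and clamps the rest to zero."""
    return max(0.0, x)


def gelu_tanh(x: float) -> float:
    """Tanh approximation of GELU, same formula as the GELU class above."""
    inner = math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)
    return 0.5 * x * (1.0 + math.tanh(inner))


# Print both activations at a handful of inputs on either side of zero
for x in (-3.0, -1.0, -0.5, 0.0, 0.5, 1.0, 3.0):
    print(f"x={x:+.1f}  ReLU={relu(x):+.4f}  GELU={gelu_tanh(x):+.4f}")
```

ReLU is exactly zero for every negative input, while GELU is smooth and takes small negative values there; for large positive inputs the two nearly coincide.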