Common Activation Functions in Neural Networks 10: The GELU Function

Latest recommended article published 2025-02-24 23:07:37

亲持红叶


Article link: https://blog.csdn.net/hbkybkzw/article/details/145621594


GELU

Gaussian Error Linear Unit (GELU)

Function and derivative

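GELU formula (definition and approximations)
$\mathrm{GELU}(x) = x \cdot P(X \le x) = x\,\Phi(x), \quad X \sim \mathcal{N}(0, 1)$
where $\Phi(x)$ is the cumulative distribution function (CDF) of the standard normal distribution. For a normal distribution with mean $\mu$ and standard deviation $\sigma$, the CDF is
$F(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(t-\mu)^2}{2\sigma^2}} \, dt$

where $\mu$ and $\sigma$ are the mean and standard deviation ($\sigma^2$ is the variance); GELU uses $\mu = 0$, $\sigma = 1$, so $F = \Phi$. Because this integral has no closed form in elementary functions, the authors found that GELU can be approximated as
$\mathrm{GELU}(x) \approx 0.5\, x \left( 1 + \tanh\left[ \sqrt{\frac{2}{\pi}} \left( x + 0.044715\, x^3 \right) \right] \right)$

or
$\mathrm{GELU}(x) \approx x\, \sigma(1.702\, x)$
where $\sigma(\cdot)$ here denotes the sigmoid function (not the standard deviation above).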
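To get a feel for how close the two approximations are, the sketch below compares them against the exact $x\,\Phi(x)$ on a dense grid. It uses Python's `math.erf` instead of SciPy so it runs with the standard library alone; the function names and the grid range $[-6, 6]$ are illustrative choices, not part of the original post.

```python
import math

def gelu_exact(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi expressed via the error function
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation: 0.5 x (1 + tanh(sqrt(2/pi) (x + 0.044715 x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))

def gelu_sigmoid(x: float) -> float:
    # Sigmoid approximation: x * sigmoid(1.702 x) = x / (1 + exp(-1.702 x))
    return x / (1.0 + math.exp(-1.702 * x))

# Evaluate the maximum absolute error on an evenly spaced grid over [-6, 6]
xs = [-6.0 + 12.0 * i / 10000 for i in range(10001)]
err_tanh = max(abs(gelu_tanh(x) - gelu_exact(x)) for x in xs)
err_sigmoid = max(abs(gelu_sigmoid(x) - gelu_exact(x)) for x in xs)
print(f"max |tanh approx - exact|    = {err_tanh:.2e}")
print(f"max |sigmoid approx - exact| = {err_sigmoid:.2e}")
```

On this grid the tanh form should come out markedly closer to the exact function than the sigmoid form, which is cheaper but coarser.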

GELU formula (error function erf form)
$\begin{aligned} \mathrm{GELU}(x) &= x\cdot\Phi(x) \\ &= x\cdot\frac{1}{2}\left(1+\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right) \end{aligned}$
where $\Phi(x)$ is the CDF of the standard normal distribution and $\mathrm{erf}(x)$ is the error function
$\mathrm{erf}(x) = \frac{2}{\sqrt{\pi}} \int_0^x e^{-t^2} \, dt$
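As a sanity check of the identity $\Phi(x) = \frac{1}{2}\left(1+\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right)$, the sketch below integrates the standard normal density numerically with composite Simpson's rule and compares the result with the erf expression. The lower limit of $-10$ (the mass below it is negligible) and the step count are assumptions chosen for illustration.

```python
import math

def normal_pdf(t: float) -> float:
    # Standard normal density (mu = 0, sigma = 1)
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def cdf_by_integration(x: float, lower: float = -10.0, n: int = 2000) -> float:
    # Composite Simpson's rule on [lower, x]; n must be even
    h = (x - lower) / n
    total = normal_pdf(lower) + normal_pdf(x)
    for i in range(1, n):
        weight = 4.0 if i % 2 == 1 else 2.0
        total += weight * normal_pdf(lower + i * h)
    return total * h / 3.0

def cdf_by_erf(x: float) -> float:
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

for x in (-2.0, -0.5, 0.0, 1.0, 2.5):
    print(f"x={x:+.1f}  integral={cdf_by_integration(x):.10f}  erf form={cdf_by_erf(x):.10f}")
```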

GELU derivative

Let $\phi(x)$ be the probability density function (PDF) of the standard normal distribution:
$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$

and $\Phi(x)$ its CDF, so that $\Phi'(x) = \phi(x)$:
$\Phi(x) = \frac{1}{2}\left(1+\mathrm{erf}\left(\frac{x}{\sqrt{2}}\right)\right)$

Then, by the product rule,
$\begin{aligned} \frac{d}{dx}\mathrm{GELU}(x) &= \left(x\cdot\Phi(x)\right)' \\ &= \Phi(x) + x\cdot\Phi'(x) \\ &= \Phi(x) + x\cdot\frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \\ &= \Phi(x) + x\cdot\phi(x) \end{aligned}$
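A quick way to confirm the derivative $\Phi(x) + x\,\phi(x)$ is to compare it with a central finite difference of GELU itself. This is a minimal sketch; the step size `h` and the sample points are arbitrary choices.

```python
import math

def gelu(x: float) -> float:
    # Exact GELU via erf
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_derivative(x: float) -> float:
    # Analytic derivative: Phi(x) + x * phi(x)
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

def central_difference(f, x: float, h: float = 1e-5) -> float:
    # Numerical derivative (f(x + h) - f(x - h)) / (2h)
    return (f(x + h) - f(x - h)) / (2.0 * h)

for x in (-3.0, -1.0, 0.0, 0.5, 2.0):
    print(f"x={x:+.1f}  analytic={gelu_derivative(x):.6f}  numeric={central_difference(gelu, x):.6f}")
```

At $x = 0$ the second term vanishes, so the derivative is exactly $\Phi(0) = 0.5$.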

Plots of the function and its derivative

Plotting code

import numpy as np
from scipy.special import erf
import matplotlib.pyplot as plt

# Define the GELU function (exact form using the error function erf)
def gelu(x):
    return 0.5 * x * (1 + erf(x / np.sqrt(2)))

# Define the derivative of GELU: Phi(x) + x * phi(x)
def gelu_derivative(x):
    return 0.5 * (1 + erf(x / np.sqrt(2))) + (x / np.sqrt(2 * np.pi)) * np.exp(-x ** 2 / 2)

# Generate data
x = np.linspace(-5, 5, 1000)
y = gelu(x)
y1 = gelu_derivative(x)

# Plot both curves
plt.figure(figsize=(12, 8))
ax = plt.gca()
plt.plot(x, y, label='GELU')
plt.plot(x, y1, label='Derivative')
plt.title('GELU and Derivative')

# Hide the top and right spines
ax.spines['right'].set_color('none')
ax.spines['top'].set_color('none')

# Put the x-axis tick labels at the bottom
ax.xaxis.set_ticks_position('bottom')

# Move the spines so that the axes cross at the origin
ax.spines['bottom'].set_position(('data', 0))
ax.yaxis.set_ticks_position('left')
ax.spines['left'].set_position(('data', 0))

plt.legend(loc=2)
plt.show()

Figure: GELU and its derivative

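Advantages and disadvantages

Advantages of GELU
1. GELU can be viewed as a smoothed version of ReLU: if the standard normal CDF in $x\,\Phi(x)$ is replaced by the CDF of $\mathcal{N}(0, \sigma^2)$ and $\sigma \to 0$ (mean 0, variance shrinking to zero), the CDF becomes a step function and GELU reduces to ReLU. GELU is nonlinear and smooth (differentiable everywhere). It also has a probabilistic interpretation: it is the expectation of a stochastic regularizer that keeps the input $x$ with probability $\Phi(x)$ and sets it to zero otherwise.
2. Practical tips for GELU: first, it is recommended to train with an optimizer that uses momentum, which is standard for deep neural networks; second, it is important to use a close approximation of the Gaussian CDF, such as the approximations given above.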
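Both claims in item 1 can be checked numerically. The sketch below estimates $E[x \cdot m]$ with $m \sim \mathrm{Bernoulli}(\Phi(x))$ by Monte Carlo and compares it with $x\,\Phi(x)$, then evaluates the variant $x\,\Phi(x/\sigma)$ (the CDF of $\mathcal{N}(0, \sigma^2)$) for several $\sigma$. The helper names, sample count `n`, and `seed` are hypothetical choices made for this illustration.

```python
import math
import random

def phi_cdf(z: float) -> float:
    # Standard normal CDF Phi(z)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def gelu(x: float) -> float:
    # Deterministic GELU: x * Phi(x)
    return x * phi_cdf(x)

def monte_carlo_gelu(x: float, n: int = 200_000, seed: int = 0) -> float:
    # Stochastic gate: output x with probability Phi(x), else 0.
    # Averaging n samples estimates E[x * m] with m ~ Bernoulli(Phi(x)).
    rng = random.Random(seed)
    p = phi_cdf(x)
    kept = sum(1 for _ in range(n) if rng.random() < p)
    return x * kept / n

def gelu_sigma(x: float, sigma: float) -> float:
    # Variant x * P(X <= x) with X ~ N(0, sigma^2), i.e. x * Phi(x / sigma)
    return x * phi_cdf(x / sigma)

for x in (-1.0, 0.5, 1.5):
    print(f"x={x:+.1f}  GELU={gelu(x):.4f}  Monte Carlo={monte_carlo_gelu(x):.4f}")

# sigma -> 0: the CDF becomes a step function, so the output approaches ReLU max(x, 0)
# sigma -> infinity: Phi(x / sigma) -> 0.5, so the output approaches the linear x / 2
for sigma in (1.0, 0.1, 1e-3, 1e6):
    print(sigma, [round(gelu_sigma(x, sigma), 4) for x in (-1.0, -0.05, 0.05, 1.0)])
```

The last loop also shows why the ReLU limit is a vanishing variance rather than an infinite one: a very large $\sigma$ turns the function into the straight line $x/2$.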
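Disadvantages of GELU
1. Higher computational cost: GELU is more expensive to compute than simpler choices such as ReLU, because it involves the error function or an approximation of it.
2. Task-dependent performance: how well an activation function works can vary across tasks and datasets.
3. Lower interpretability: its more complex behavior can make it harder to interpret and debug.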

GELU in PyTorch

Code

import torch

# Create the GELU activation
f = torch.nn.GELU()  # GELU activation module provided by PyTorch
x = torch.randn(2)   # random tensor used as input

gelu_x = f(x)        # apply GELU

print(f"x: \n{x}")
print(f"gelu_x:\n{gelu_x}")

"""Output"""
x: 
tensor([ 1.6743, -1.2534])
gelu_x:
tensor([ 1.5956, -0.1316])

GELU in TensorFlow

Code

python: 3.10.9

tensorflow: 2.18.0

import tensorflow as tf

# Get the GELU activation function
gelu = tf.keras.activations.gelu

# Random input (commented out); the values from the PyTorch example are used instead
# x = tf.random.normal([2])
x = [ 1.6743, -1.2534]

# Apply GELU
gelu_x = gelu(x)

print(f"x: \n{x}")
print(f"gelu_x:\n{gelu_x}")

"""Output"""
x: 
[1.6743, -1.2534]
gelu_x:
[ 1.5955479  -0.13164471]
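The TensorFlow output can be reproduced with the standard library alone, which confirms that the default call uses the exact erf form. The sketch also computes the tanh approximation for the same inputs; both TensorFlow (`tf.keras.activations.gelu(x, approximate=True)`) and PyTorch (`torch.nn.GELU(approximate='tanh')`) expose a switch for that form, though it is worth checking the documentation for your installed version. The PyTorch output above may differ in the last printed digit, likely because its random inputs were themselves displayed rounded to four decimals.

```python
import math

def gelu_exact(x: float) -> float:
    # Exact form (the default, non-approximate GELU): x * Phi(x) via erf
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # tanh approximation of GELU
    c = math.sqrt(2.0 / math.pi)
    return 0.5 * x * (1.0 + math.tanh(c * (x + 0.044715 * x ** 3)))

inputs = [1.6743, -1.2534]  # same inputs as the TensorFlow example
for v in inputs:
    print(f"x={v:+.4f}  exact={gelu_exact(v):.7f}  tanh={gelu_tanh(v):.7f}")
```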