GELU (Gaussian Error Linear Unit):
- Formula:
$$\text{GELU}(x) = x \cdot \Phi(x) = x \cdot \frac{1}{2}\left(1 + \text{erf}\left(\frac{x}{\sqrt{2}}\right)\right)$$
- Approximate formula (the version commonly used in practice):
$$\text{GELU}(x) \approx 0.5 \cdot x \cdot \left(1 + \tanh\left(\sqrt{\frac{2}{\pi}} \cdot \left(x + 0.044715 \cdot x^3\right)\right)\right)$$
SwiGLU (Swish-Gated Linear Unit):
- Formula:
$$\text{SwiGLU}(x) = \text{Swish}(\text{Linear}_1(x)) \cdot \text{Linear}_2(x)$$
where the product is elementwise and Swish is a smooth activation function:
$$\text{Swish}(x) = x \cdot \sigma(x) = \frac{x}{1 + e^{-x}}$$
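As a quick numerical check (a minimal sketch, assuming PyTorch), the Swish defined above is exactly what PyTorch ships as SiLU:
import torch
import torch.nn.functional as F

x = torch.randn(2, 5)
# Swish(x) = x * sigmoid(x) is available in PyTorch as F.silu
print(torch.allclose(x * torch.sigmoid(x), F.silu(x)))  # True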
GELU implementation (PyTorch built-in):
import torch
import torch.nn as nn

# GELU activation (PyTorch built-in; defaults to the exact erf form)
gelu = nn.GELU()

# Input tensor
x = torch.randn(2, 5)
output = gelu(x)
print(output)
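For reference, the exact formula can also be written out directly with torch.erf; a minimal sketch (the helper name gelu_exact is just for illustration) that should match nn.GELU() up to floating-point error:
import math
import torch
import torch.nn as nn

def gelu_exact(x):
    # GELU(x) = x * Phi(x), with Phi the standard normal CDF written via erf
    return x * 0.5 * (1 + torch.erf(x / math.sqrt(2.0)))

x = torch.randn(2, 5)
print(torch.allclose(gelu_exact(x), nn.GELU()(x), atol=1e-6))  # True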
GELU approximate implementation:
import torch
import torch.nn as nn

class GELUApprox(nn.Module):
    def forward(self, x):
        # tanh-based GELU approximation
        return 0.5 * x * (1 + torch.tanh(
            torch.sqrt(torch.tensor(2.0 / torch.pi)) * (x + 0.044715 * x ** 3)
        ))

# Example
x = torch.randn(2, 5)
gelu_approx = GELUApprox()
output = gelu_approx(x)
print(output)
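On PyTorch 1.12+ the same approximation is exposed as nn.GELU(approximate='tanh'), which gives a quick sanity check of the class above:
import torch
import torch.nn as nn

x = torch.randn(2, 5)
manual = GELUApprox()(x)                  # tanh approximation defined above
builtin = nn.GELU(approximate='tanh')(x)  # PyTorch's built-in tanh approximation
print(torch.allclose(manual, builtin, atol=1e-6))  # True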
SwiGLU implementation:
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLU(nn.Module):
    def __init__(self, d_model):
        super().__init__()
        # Two linear projections of the same input:
        # linear1 feeds the Swish gate, linear2 is the linear branch
        self.linear1 = nn.Linear(d_model, d_model)
        self.linear2 = nn.Linear(d_model, d_model)

    def forward(self, x):
        # F.silu is PyTorch's implementation of Swish: x * sigmoid(x)
        return F.silu(self.linear1(x)) * self.linear2(x)

# Input tensor
x = torch.randn(2, 5)

# SwiGLU activation
swiglu = SwiGLU(d_model=5)
output = swiglu(x)
print(output)
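In Transformer feed-forward blocks, SwiGLU is typically followed by a down-projection back to d_model, as in LLaMA-style models. A minimal sketch, assuming an expansion width d_ff chosen by the model designer (the names gate/up/down are illustrative):
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUFFN(nn.Module):
    # Feed-forward block: down( Swish(gate(x)) * up(x) )
    def __init__(self, d_model, d_ff):
        super().__init__()
        self.gate = nn.Linear(d_model, d_ff, bias=False)  # Swish-gated branch
        self.up = nn.Linear(d_model, d_ff, bias=False)    # linear branch
        self.down = nn.Linear(d_ff, d_model, bias=False)  # back to d_model

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))

ffn = SwiGLUFFN(d_model=5, d_ff=16)
print(ffn(torch.randn(2, 5)).shape)  # torch.Size([2, 5])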