GLU (Gated Linear Unit): An Introduction


coder1479


Column: Deep Learning. Tags: pytorch, deep learning

Article link: https://blog.csdn.net/m0_48742971/article/details/123431686


Preface

A brief introduction to the structure of the gated linear unit.

Original paper

GLU was proposed in "Language Modeling with Gated Convolutional Networks" (Dauphin et al.), published in 2017 (the first version appeared in 2016).

Network structure

The figure below shows a single layer; such layers can be stacked.
[Figure: structure of one GLU layer]

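Formula

Layer l computes the following, where * denotes convolution in the original paper:
$$h_l(\mathbf{X}) = (\mathbf{X} * \mathbf{W} + \mathbf{b}) \otimes \sigma(\mathbf{X} * \mathbf{V} + \mathbf{c})$$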

where:
X is the input.
W, V, b and c are learned parameters.
$\sigma$ is the sigmoid function in the original paper.
⊗ is the element-wise product, also called the Hadamard product.

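As the formula shows, the input X takes two paths: the result of one path is passed on unchanged, while the other passes through the activation function σ. The two results are then multiplied element-wise, so the second path acts as a gate on the first.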
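To make the two paths concrete, here is a minimal sketch of one layer on plain tensors. It assumes an ordinary matrix multiplication in place of the paper's convolution, and the sizes and random parameter values are purely illustrative.

```python
import torch

torch.manual_seed(0)

# Illustrative sizes (assumed): 4 input vectors with 8 features, mapped to 5 output features.
batch, in_features, out_features = 4, 8, 5
X = torch.randn(batch, in_features)

# Parameters of the two paths; randomly initialised here instead of being learned.
W = torch.randn(in_features, out_features)  # weights of the ungated path
b = torch.zeros(out_features)               # bias of the ungated path
V = torch.randn(in_features, out_features)  # weights of the gate path
c = torch.zeros(out_features)               # bias of the gate path

linear_path = X @ W + b                # passed on unchanged
gate_path = torch.sigmoid(X @ V + c)   # squashed into the range (0, 1)
h = linear_path * gate_path            # element-wise (Hadamard) product

print(h.shape)  # torch.Size([4, 5])
```

Each gate value scales the corresponding entry of the ungated path, so values near 0 suppress it and values near 1 let it through almost unchanged.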

PyTorch documentation

torch.nn.GLU(dim=-1)

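$\mathrm{GLU}(a, b) = a \otimes \sigma(b)$, where a is the first half of the input and b is the second half, split along dim.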

(Skim the documentation first; code examples and a detailed explanation follow.)

Parameters
dim (int) – the dimension on which to split the input. Default: -1

Shape
Input: $(*_1, N, *_2)$, where $*$ means any number of additional dimensions
Output: $(*_1, M, *_2)$, where $M = N/2$ (so N, the size along dim, must be even)
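A short sketch of the shape rule on a 3-D tensor, splitting along dim=1; the sizes are arbitrary choices for illustration. It also shows that an odd size along the split dimension is rejected.

```python
import torch
from torch import nn

x = torch.randn(2, 6, 5)       # (*1, N, *2) with N = 6 along dim=1
out = nn.GLU(dim=1)(x)
print(out.shape)               # torch.Size([2, 3, 5]), i.e. M = N / 2 = 3

# A size of 5 along dim=1 cannot be halved, so PyTorch raises an error.
odd = torch.randn(2, 5, 5)
try:
    nn.GLU(dim=1)(odd)
    raised = False
except RuntimeError as err:
    raised = True
    print("error:", err)
```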

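Note that nn.GLU only halves its input and gates one half with the other. To compute the GLU layer from the original paper with it, you have to construct the two parts a and b yourself, i.e. the outputs of the two paths with parameters W, b and V, c.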
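One possible way to build a and b, sketched under assumptions: a single nn.Conv1d produces 2 × out_channels channels, which is equivalent to stacking the convolutions with W and V; its first half plays the role of a and its second half b, and nn.GLU(dim=1) applies the gate. The class name GatedConvLayer and all sizes are hypothetical. The left zero-padding by k − 1 follows the paper's idea of not letting a position see future tokens, while residual connections and other details of the paper's full model are left out.

```python
import torch
import torch.nn.functional as F
from torch import nn


class GatedConvLayer(nn.Module):
    """Hypothetical sketch of (X*W + b) ⊗ σ(X*V + c) with * as a 1-D convolution."""

    def __init__(self, in_channels: int, out_channels: int, kernel_size: int):
        super().__init__()
        self.kernel_size = kernel_size
        # One convolution computes both paths: the first out_channels output channels
        # act as a = X*W + b, the remaining out_channels act as b = X*V + c.
        self.conv = nn.Conv1d(in_channels, 2 * out_channels, kernel_size)
        # For (batch, channels, length) tensors the channel dimension is dim=1.
        self.glu = nn.GLU(dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_channels, length). Left-pad by k-1 so the output at position t
        # only depends on inputs at positions <= t and the length is preserved.
        x = F.pad(x, (self.kernel_size - 1, 0))
        return self.glu(self.conv(x))


layer = GatedConvLayer(in_channels=16, out_channels=32, kernel_size=3)
x = torch.randn(4, 16, 10)   # (batch, channels, sequence length)
y = layer(x)
print(y.shape)               # torch.Size([4, 32, 10])
```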

Code examples

Understanding the dim parameter

1. Leave dim unset, so the default value -1 is used.


>>> import torch
>>> from torch import nn
>>> m = nn.GLU()  # dim defaults to -1
>>> input = torch.randn(4, 2)
>>> input
tensor([[ 0.4562,  0.7670],
        [ 1.7934,  0.7769],
        [-0.3021, -0.1275],
        [-1.4728,  0.7495]])
>>> output = m(input)
>>> output
tensor([[ 0.3115],
        [ 1.2285],
        [-0.1414],
        [-1.0001]])

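Explanation: dim=-1 means the last dimension, which for a 2-D matrix is the column dimension, so the input is split by columns: the first column is a, the second column is b, and each output element is a⊗σ(b).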
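The printed output above can be checked by hand. This sketch rebuilds the input from the values shown (which are rounded to 4 decimals, so the match is approximate) and applies $a \otimes \sigma(b)$ to the two columns directly.

```python
import torch
from torch import nn

# The input printed in example 1, rounded to 4 decimals as displayed.
x = torch.tensor([[ 0.4562,  0.7670],
                  [ 1.7934,  0.7769],
                  [-0.3021, -0.1275],
                  [-1.4728,  0.7495]])

a = x[:, :1]   # first half of the last dimension: column 0
b = x[:, 1:]   # second half of the last dimension: column 1
manual = a * torch.sigmoid(b)

print(manual)                                # approx. [[0.3115], [1.2285], [-0.1414], [-1.0001]]
print(torch.allclose(nn.GLU()(x), manual))   # True
```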

2. Set dim=0.

>>> input = torch.randn(4, 3)
>>> m = nn.GLU(dim=0)    # dim=0
>>> output = m(input)
>>> output
tensor([[-0.9414, -0.0830, -0.5450],
        [-0.1251, -1.1556,  0.6469]])
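With dim=0 the split happens along the rows instead: for a (4, 3) input, rows 0 and 1 form a and rows 2 and 3 form b, which is why the output above has shape (2, 3). A seeded sketch that checks this against a manual computation (the seed is only there to make the run repeatable):

```python
import torch
from torch import nn

torch.manual_seed(0)
x = torch.randn(4, 3)          # 4 rows along dim=0, so each half has 2 rows

out = nn.GLU(dim=0)(x)
a, b = x[:2], x[2:]            # a: rows 0-1, b: rows 2-3
manual = a * torch.sigmoid(b)

print(out.shape)                     # torch.Size([2, 3])
print(torch.allclose(out, manual))   # True
```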