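Feed-Forward Fully Connected Layer
What is the feed-forward fully connected layer:
In the Transformer, the feed-forward fully connected layer is a fully connected network made of two linear layers.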
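Written as a formula, and ignoring dropout, the layer computes the following for the vector x at each position. The symbols W_1, b_1, W_2, b_2 are notation introduced here for the weights and biases of the two linear layers; they do not appear in the original notes.

```latex
\mathrm{FFN}(x) = \max(0,\; x W_1 + b_1)\, W_2 + b_2,
\qquad W_1 \in \mathbb{R}^{d_{\text{model}} \times d_{\text{ff}}},\quad
W_2 \in \mathbb{R}^{d_{\text{ff}} \times d_{\text{model}}}
```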
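Role of the feed-forward fully connected layer:
The attention mechanism alone may not fit complex processes well enough, so two additional layers are added to increase the model's capacity.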
code
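# Feed-forward fully connected layer
import torch.nn as nn
import torch.nn.functional as F

class PositionwiseFeedForward(nn.Module):
    def __init__(self, d_model, d_ff, dropout=0.1) -> None:
        """
        d_model: input dimension of the first linear layer (and output dimension of the second)
        d_ff: dimension of the hidden layer
        dropout: dropout probability
        """
        super(PositionwiseFeedForward, self).__init__()
        self.line1 = nn.Linear(d_model, d_ff)
        self.line2 = nn.Linear(d_ff, d_model)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x):
        # Linear -> ReLU -> Dropout -> Linear
        return self.line2(self.dropout(F.relu(self.line1(x))))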
Test:
Output:
ff_result.shape = torch.Size([2, 4, 512])
ff_result = tensor([[[-0.0589, -1.3885, -0.8852, ..., -0.4463, -0.9892, 2.7384],
[ 0.2426, -1.1040, -1.1298, ..., -0.9296, -1.5262, 1.0632],
[ 0.0318, -0.8362, -0.9389, ..., -1.6359, -1.8531, -0.1163],
[ 1.1119, -1.2007, -1.5487, ..., -0.8869, 0.1711, 1.7431]],
[[-0.2358, -0.9319, 0.8866, ..., -1.2987, 0.2001, 1.5415],
[-0.1448, -0.7505, -0.3023, ..., -0.2585, -0.8902, 0.6206],
[ 1.8106, -0.8460, 1.6487, ..., -1.1931, 0.0535, 0.8415],
[ 0.2669, -0.3897, 1.1560, ..., 0.1138, -0.2795, 1.8780]]],
grad_fn=<ViewBackward0>)
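The code that produced this output is not shown. As a rough sketch under assumed values (a random input of shape [2, 4, 512], d_ff = 2048, dropout switched off with eval()), the following shows that the output keeps the input shape and that the layer treats every position independently, which is what "position-wise" in the class name refers to. It uses nn.Sequential as a stand-in for the class, so the numbers will not match the printed values above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

d_model, d_ff = 512, 2048  # d_ff = 2048 is an assumed value, not given in the notes

# Same sequence of operations as the class above: Linear -> ReLU -> Dropout -> Linear
ffn = nn.Sequential(
    nn.Linear(d_model, d_ff),
    nn.ReLU(),
    nn.Dropout(0.1),
    nn.Linear(d_ff, d_model),
)
ffn.eval()  # disable dropout so the comparison below is deterministic

x = torch.randn(2, 4, d_model)  # (batch, sequence length, d_model)

with torch.no_grad():
    full = ffn(x)  # all positions in one call
    # Apply the network to each position separately, then stack along the sequence axis
    per_position = torch.stack(
        [ffn(x[:, i, :]) for i in range(x.size(1))], dim=1
    )

print(full.shape)  # torch.Size([2, 4, 512])
print(torch.allclose(full, per_position, atol=1e-4))
```

Because the same weights are applied to each position on its own, information is mixed across positions only by the attention sublayer, not by this layer.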
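Normalization Layer
Role of the normalization layer:
It is a standard layer in deep network models. As the number of layers grows, the values produced after many layers of computation can become too large or too small, which can make training unstable and convergence very slow. A normalization layer is therefore added after a certain number of layers to keep the feature values within a reasonable range.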
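In formula form, the normalization used in these notes acts on the last (feature) dimension of each position. Here d is the number of features, μ and σ are the mean and standard deviation over that dimension, ⊙ is element-wise multiplication, and w, b, ε stand for the learnable scale, the learnable shift, and the small constant eps. The d − 1 in σ reflects the default behaviour of x.std in PyTorch.

```latex
\mathrm{LayerNorm}(x) = w \odot \frac{x - \mu}{\sigma + \epsilon} + b,
\qquad \mu = \frac{1}{d}\sum_{i=1}^{d} x_i,\quad
\sigma = \sqrt{\frac{1}{d-1}\sum_{i=1}^{d} (x_i - \mu)^2}
```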
code
import torch
import torch.nn as nn

class LayerNorm(nn.Module):
    def __init__(self, features, eps=1e-6) -> None:
        # features: dimension of the word embeddings
        # eps: a small constant added to the denominator to avoid division by zero
        super(LayerNorm, self).__init__()
        # Learnable scale and shift parameters, updated during training
        self.w = nn.Parameter(torch.ones(features))
        self.b = nn.Parameter(torch.zeros(features))
        self.eps = eps

    def forward(self, x):
        mean = x.mean(-1, keepdim=True)
        stddev = x.std(-1, keepdim=True)
        # * is element-wise multiplication, not a dot product
        return self.w * (x - mean) / (stddev + self.eps) + self.b
Test:
Output:
ln_result.shape = torch.Size([2, 4, 512])
ln_result = tensor([[[ 1.3255e+00, 7.7968e-02, -1.7036e+00, ..., -1.3097e-01,
4.9385e-01, 1.3975e-03],
[-1.0717e-01, -1.8999e-01, -1.0603e+00, ..., 2.9285e-01,
1.0337e+00, 1.0597e+00],
[ 1.0801e+00, -1.5308e+00, -1.6577e+00, ..., -1.0050e-01,
-3.7577e-02, 4.1453e-01],
[ 4.2174e-01, -1.1476e-01, -5.9897e-01, ..., -8.2557e-01,
1.2285e+00, 2.2961e-01]],
[[-1.3024e-01, -6.9125e-01, -8.4373e-01, ..., -4.7106e-01,
2.3697e-01, 2.4667e+00],
[-1.8319e-01, -5.0278e-01, -6.6853e-01, ..., -3.3992e-02,
-4.8510e-02, 2.3002e+00],
[-5.7036e-01, -1.4439e+00, -2.9533e-01, ..., -4.9297e-01,
9.9002e-01, 9.1294e-01],
[ 2.8479e-02, -1.2107e+00, -4.9597e-01, ..., -6.0751e-01,
3.1257e-01, 1.7796e+00]]], grad_fn=<AddBackward0>)
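As a quick sanity check of what this normalization is meant to do, the sketch below applies the same computation as a plain function to an input that is deliberately shifted and scaled, then confirms that every position ends up with mean close to 0 and standard deviation close to 1 over the feature dimension. The input values and the function name layer_norm are assumptions made for illustration, not the test used in the notes.

```python
import torch
import torch.nn.functional as F


def layer_norm(x, w, b, eps=1e-6):
    """Normalize over the last dimension: w * (x - mean) / (std + eps) + b."""
    mean = x.mean(-1, keepdim=True)
    stddev = x.std(-1, keepdim=True)  # unbiased estimate (divides by n - 1)
    return w * (x - mean) / (stddev + eps) + b


torch.manual_seed(0)
features = 512
# Input shifted and scaled away from mean 0 / std 1 on purpose
x = torch.randn(2, 4, features) * 5.0 + 3.0
w = torch.ones(features)   # initial scale, as in the class
b = torch.zeros(features)  # initial shift, as in the class

y = layer_norm(x, w, b)
print(y.shape)                 # torch.Size([2, 4, 512])
print(y.mean(-1).abs().max())  # close to 0
print(y.std(-1))               # close to 1 at every position

# PyTorch's built-in layer norm uses the biased variance and places eps
# inside the square root, so it agrees with the version above only approximately.
builtin = F.layer_norm(x, (features,))
print((y - builtin).abs().max())
```

The small gap to the built-in version comes from the n − 1 versus n divisor in the variance and from where eps is added; with 512 features the two differ by roughly 0.1%.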
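Sublayer Connection Structure
What is the sublayer connection structure:
As shown in the figure, a residual connection (skip connection) is also used around each sublayer and its normalization layer. This whole unit, the sublayer together with its connections, is therefore called a sublayer connection. Each encoder layer contains two sublayers, and these two sublayers together with the connections around them form two sublayer connection structures.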
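The idea can be sketched as a small function. This is only one possible arrangement, assumed here for illustration: it normalizes after the residual addition, as described in the original Transformer paper. Implementations also commonly normalize the input before the sublayer instead, and the description above does not say which order is used. The nn.Linear sublayer is a hypothetical stand-in for attention or the feed-forward layer, and PyTorch's built-in nn.LayerNorm is used for brevity.

```python
import torch
import torch.nn as nn


def sublayer_connection(x, sublayer, norm, dropout):
    """Residual (skip) connection around a sublayer, followed by normalization.

    Post-norm arrangement: norm(x + dropout(sublayer(x))).
    """
    return norm(x + dropout(sublayer(x)))


torch.manual_seed(0)
size = 512
x = torch.randn(2, 4, size)

sublayer = nn.Linear(size, size)  # hypothetical stand-in for a real sublayer
norm = nn.LayerNorm(size)         # built-in layer norm, used here for brevity
dropout = nn.Dropout(0.1)

out = sublayer_connection(x, sublayer, norm, dropout)
print(out.shape)  # torch.Size([2, 4, 512])
```

Adding x back to the sublayer output requires both to have the same shape, which is why every sublayer maps d_model-sized vectors to d_model-sized vectors.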
code
# Sublayer connection structure
class SublayerConnection(nn.Module):
    def __init__(self,size,dropout=0.1