BERT: Study Notes

yayakoko

Last modified 2024-07-02 12:41:33

Tags: bert, deep learning, artificial intelligence, pytorch

First published 2024-07-01 17:14:38

Article link: https://blog.csdn.net/yayakoko/article/details/140098231

1. Transformer

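The Transformer consists of two parts, an encoder stack and a decoder stack. The encoder stack is made up of several encoder layers, and the decoder stack likewise of several decoder layers.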

Encoder layer: self-attention + a fully connected feed-forward network

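Multi-head self-attention: Q, K and V (queries, keys, values) are linear projections of the input, split across several heads.
Formula (per head): $\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}}\right)V$, where $d_k$ is the per-head key dimension. The outputs of all heads are concatenated and passed through an output projection.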
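The formula maps almost line for line onto code. Here is a minimal single-head NumPy sketch, assuming Q, K and V are given directly (in the model they would be learned projections of the input). The function name is illustrative, and $d_k = 64$ is picked only because it matches BERT-Base's per-head size of 768 / 12.

```python
import numpy as np


def scaled_dot_product_attention(q, k, v):
    # q: (n_q, d_k), k: (n_k, d_k), v: (n_k, d_v)
    d_k = q.shape[-1]
    scores = q @ k.T / np.sqrt(d_k)                 # (n_q, n_k)
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability; softmax result is unchanged
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # each row sums to 1
    return weights @ v, weights                     # output: (n_q, d_v)


rng = np.random.default_rng(0)
q = rng.normal(size=(4, 64))
k = rng.normal(size=(4, 64))
v = rng.normal(size=(4, 64))
out, w = scaled_dot_product_attention(q, k, v)
print(out.shape, np.allclose(w.sum(axis=-1), 1.0))  # (4, 64) True
```

Dividing by $\sqrt{d_k}$ keeps the dot products from growing with the head dimension; without it, large scores push the softmax toward a near one-hot distribution with very small gradients.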

Decoder layer: masked self-attention + encoder-decoder attention + a fully connected feed-forward network

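Masked multi-head self-attention: $\mathrm{softmax}\left(\frac{QK^{T}}{\sqrt{d_{k}}} + M\right)V$, where $M$ is the causal mask with $M_{ij} = 0$ for $j \le i$ and $M_{ij} = -\infty$ for $j > i$, so each position attends only to itself and earlier positions.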
Encoder-decoder attention: Q comes from the output of the previous decoder layer; K and V come from the output of the last encoder.
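Both decoder attention steps can share one helper. A common way to implement the mask is additively: blocked scores become $-\infty$ before the softmax, so they get exactly zero weight (multiplying a score by 0 instead would still leave $e^0 = 1$ of unnormalized weight on future positions). The NumPy sketch below assumes random inputs as stand-ins for the decoder input and the last encoder's output, omits residual connections, LayerNorm and multiple heads, and uses hypothetical names throughout.

```python
import numpy as np


def attention(q, k, v, mask=None):
    # softmax(Q K^T / sqrt(d_k) + M) V; mask M holds 0 (keep) or -inf (block).
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if mask is not None:
        scores = scores + mask
    scores = scores - scores.max(axis=-1, keepdims=True)  # stability; -inf stays -inf
    weights = np.exp(scores)                              # exp(-inf) = 0
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights


def causal_mask(n):
    # M[i, j] = -inf for j > i, so position i only sees positions 0..i.
    mask = np.zeros((n, n))
    mask[np.triu_indices(n, k=1)] = -np.inf
    return mask


rng = np.random.default_rng(0)
d_model, n_src, n_tgt = 16, 7, 4               # source and target lengths may differ
enc_out = rng.normal(size=(n_src, d_model))    # stand-in for the last encoder's output
dec_in = rng.normal(size=(n_tgt, d_model))     # stand-in for the decoder-side input
w = {name: rng.normal(size=(d_model, d_model)) for name in ("q1", "k1", "v1", "q2", "k2", "v2")}

# 1) Masked self-attention: Q, K, V all come from the decoder side.
self_out, self_w = attention(dec_in @ w["q1"], dec_in @ w["k1"], dec_in @ w["v1"],
                             mask=causal_mask(n_tgt))
# 2) Encoder-decoder attention: Q from the decoder side, K and V from the encoder output.
cross_out, cross_w = attention(self_out @ w["q2"], enc_out @ w["k2"], enc_out @ w["v2"])

print(np.allclose(np.triu(self_w, k=1), 0))  # True: no weight on future positions
print(cross_w.shape, cross_out.shape)        # (4, 7) (4, 16)
```

The cross-attention weight matrix is (target length × source length) and needs no causal mask, since every target position may look at the whole source sequence.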

2. BERT

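BERT is built from a stack of Transformer encoders only (it has no decoder).
BERT-Base: 12 encoder layers, 12 attention heads per layer, hidden size 768.
BERT-Large: 24 encoder layers, 16 attention heads per layer, hidden size 1024.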
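The two configurations can be sanity-checked with a little arithmetic. This pure-Python sketch computes the per-head size and a rough count of encoder parameters. It assumes the feed-forward inner size is 4× the hidden size (3072 for Base, 4096 for Large) and counts weights, biases and LayerNorm parameters, but leaves out the embeddings and the pooler; the helper names are hypothetical.

```python
def head_dim(hidden_size, num_heads):
    # Each head works on an equal slice of the hidden vector.
    assert hidden_size % num_heads == 0
    return hidden_size // num_heads


def encoder_layer_params(hidden_size, ffn_mult=4):
    # Assumption: feed-forward inner size = ffn_mult * hidden_size.
    ffn = ffn_mult * hidden_size
    # Q, K, V and output projections, each a (hidden x hidden) weight plus a bias.
    attention = 4 * (hidden_size * hidden_size + hidden_size)
    # Two Linear layers: hidden -> ffn and ffn -> hidden (weights + biases).
    feed_forward = (hidden_size * ffn + ffn) + (ffn * hidden_size + hidden_size)
    # Two LayerNorms, each with a scale and a shift vector.
    layer_norms = 2 * (2 * hidden_size)
    return attention + feed_forward + layer_norms


configs = {"BERT-Base": (12, 12, 768), "BERT-Large": (24, 16, 1024)}
for name, (layers, heads, hidden) in configs.items():
    per_layer = encoder_layer_params(hidden)
    print(f"{name}: head size {head_dim(hidden, heads)}, "
          f"{per_layer:,} params per layer, {layers * per_layer:,} in the encoder stack")
```

This prints a head size of 64 for both models, 7,087,872 parameters per Base layer (85,054,464 across 12 layers) and 12,596,224 per Large layer (302,309,376 across 24 layers). Most of the remainder of the roughly 110M and 340M totals reported for BERT sits in the token embedding table.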
BERT structure: the multi-head self-attention module used in each encoder layer, written in PyTorch:

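```python
import math

import torch
import torch.nn as nn


class MultiHeadAttention(nn.Module):
    def __init__(self, hidden_size, head_num):
        super().__init__()
        assert hidden_size % head_num == 0, "hidden_size must be divisible by head_num"
        self.hidden_size = hidden_size
        self.head_num = head_num
        self.head_size = hidden_size // head_num  # integer division, e.g. 768 // 12 = 64
        self.query = nn.Linear(hidden_size, hidden_size)
        self.key = nn.Linear(hidden_size, hidden_size)
        self.value = nn.Linear(hidden_size, hidden_size)

    def transpose_dim(self, x):
        # (B, N, hidden_size) -> (B, N, head_num, head_size) -> (B, head_num, N, head_size)
        x_new_shape = x.size()[:-1] + (self.head_num, self.head_size)
        x = x.view(*x_new_shape)
        return x.permute(0, 2, 1, 3)

    def forward(self, x, attention_mask=None):
        query_layer = self.query(x)
        key_layer = self.key(x)
        value_layer = self.value(x)

        # Equivalent to transpose_dim:
        # B, N = query_layer.shape[0], query_layer.shape[1]
        # multi_query = query_layer.view(B, N, self.head_num, self.head_size).transpose(1, 2)
        multi_query = self.transpose_dim(query_layer)
        multi_key = self.transpose_dim(key_layer)
        multi_value = self.transpose_dim(value_layer)

        # Scaled dot-product scores: (B, head_num, N, N)
        attention_scores = torch.matmul(multi_query, multi_key.transpose(-1, -2))
        attention_scores = attention_scores / math.sqrt(self.head_size)
        if attention_mask is not None:
            # Assumes an additive mask broadcastable to (B, head_num, N, N),
            # e.g. shape (B, 1, 1, N) holding 0 for real tokens and a large
            # negative value for padding positions.
            attention_scores = attention_scores + attention_mask

        attention_probs = torch.softmax(attention_scores, dim=-1)
        context_layer = torch.matmul(attention_probs, multi_value)  # (B, head_num, N, head_size)
        # Merge the heads back: (B, N, head_num, head_size) -> (B, N, hidden_size)
        context_layer = context_layer.permute(0, 2, 1, 3).contiguous()
        context_layer_shape = context_layer.size()[:-2] + (self.hidden_size,)
        context_layer = context_layer.view(*context_layer_shape)

        return context_layer
```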
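The split/merge steps in `forward` (`transpose_dim`, then `permute` + `view` at the end) are easy to get wrong, so here is a NumPy sketch that checks the round trip, with `reshape` and `transpose` standing in for `view` and `permute`. Shapes follow BERT-Base (hidden size 768, 12 heads); the function names are illustrative.

```python
import numpy as np


def split_heads(x, head_num):
    # (B, N, hidden) -> (B, N, head_num, head_size) -> (B, head_num, N, head_size)
    b, n, hidden = x.shape
    return x.reshape(b, n, head_num, hidden // head_num).transpose(0, 2, 1, 3)


def merge_heads(x):
    # (B, head_num, N, head_size) -> (B, N, head_num, head_size) -> (B, N, hidden)
    b, h, n, d = x.shape
    return x.transpose(0, 2, 1, 3).reshape(b, n, h * d)


x = np.arange(2 * 5 * 768, dtype=float).reshape(2, 5, 768)
heads = split_heads(x, head_num=12)
print(heads.shape)                            # (2, 12, 5, 64)
print(np.array_equal(merge_heads(heads), x))  # True
# Head 1 of token 0 holds hidden features 64..127 of that token.
print(np.array_equal(heads[0, 1, 0], x[0, 0, 64:128]))  # True
```

NumPy's `reshape` silently copies when the transposed array is not contiguous; in PyTorch, `view` refuses to do that, which is why the module calls `.contiguous()` after `permute`.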