BERT in TensorFlow 2.0: A Named Entity Recognition (NER) Example

Latest recommended article published on 2024-08-09 08:21:36

辰溪0502

Tags: deep learning, tensorflow, neural networks

Article link: https://blog.csdn.net/weixin_43788143/article/details/108270656

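I searched online for a long time for a TensorFlow 2.0 implementation of the BERT model and could not find one. As most readers know, BERT uses the Encoder part of the Transformer. I had already found a TensorFlow 2.0 Transformer implementation and used it for Chinese-English translation, so I modified that Transformer slightly: I removed the Decoder, kept only the Encoder, and found some data to build a named entity recognition example. After training, the model's accuracy was around 86%, which seemed good enough to share. It is not guaranteed to be fully correct, so corrections are welcome.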

Data preprocessing

For data preprocessing, see this post. I finished the preprocessing in advance, so the data could be run as soon as the BERT model was complete.

Code overview

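1. Positional information
We know that an advantage of the Transformer is that it does not have to process a sequence step by step: it weights every vector in the sequence at the same time. As a result, the model itself places no requirement on the position of each vector. In practice, though, position matters. Without positional information, "I go from Beijing to Shanghai" and "I go from Shanghai to Beijing" look the same to the model, yet the two sentences clearly mean very different things. So a positional encoding has to be added to the text vectors. The BERT model needs this positional information as well.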
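The idea of a positional encoding can be sketched as follows. This is a minimal NumPy sketch that assumes the sinusoidal scheme of the original Transformer, which a Transformer-derived encoder would typically reuse; the function name `positional_encoding` and the sizes used are illustrative and not taken from the author's code.

```python
import numpy as np


def positional_encoding(max_seq_len, d_model):
    """Return a (1, max_seq_len, d_model) array of sinusoidal position encodings."""
    positions = np.arange(max_seq_len)[:, np.newaxis]  # (max_seq_len, 1)
    dims = np.arange(d_model)[np.newaxis, :]           # (1, d_model)
    # Dimensions 2i and 2i+1 share the frequency 1 / 10000^(2i / d_model).
    angle_rates = 1.0 / np.power(10000.0, (2 * (dims // 2)) / float(d_model))
    angles = positions * angle_rates                   # (max_seq_len, d_model)
    angles[:, 0::2] = np.sin(angles[:, 0::2])          # even dimensions use sine
    angles[:, 1::2] = np.cos(angles[:, 1::2])          # odd dimensions use cosine
    return angles[np.newaxis, ...].astype(np.float32)


pos_enc = positional_encoding(max_seq_len=50, d_model=16)
print(pos_enc.shape)  # (1, 50, 16)
```

The encoding is added to the token embeddings, so the same word receives a different vector at each position and the two Beijing/Shanghai sentences are no longer identical to the model.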
2. Building the mask
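The post does not show the mask code, so here is a minimal NumPy sketch of a padding mask. It assumes that sequences are padded with token id 0 and that the mask should broadcast over attention logits of shape (batch_size, n_heads, seq_len, seq_len); the name `create_padding_mask` is hypothetical and is not the author's `create_mask`.

```python
import numpy as np


def create_padding_mask(token_ids, pad_id=0):
    """Mark padding positions with 1.0 and real tokens with 0.0.

    The result has shape (batch_size, 1, 1, seq_len) so that it broadcasts
    over attention logits of shape (batch_size, n_heads, seq_len, seq_len).
    """
    mask = (np.asarray(token_ids) == pad_id).astype(np.float32)
    return mask[:, np.newaxis, np.newaxis, :]


# Two sentences padded to length 4 (pad_id=0 is an assumption).
batch = np.array([[7, 6, 0, 0],
                  [1, 2, 3, 0]])
mask = create_padding_mask(batch)
print(mask.shape)     # (2, 1, 1, 4)
print(mask[0, 0, 0])  # [0. 0. 1. 1.]
```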

3. Self-attention and Multi-Head Attention

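As a rough illustration of what self-attention computes, the following NumPy sketch implements scaled dot-product attention with an optional padding mask. It assumes the standard formulation softmax(QK^T / sqrt(d_k))V and a mask in which 1.0 marks positions to ignore; the function names are illustrative, not the author's. Multi-head attention projects the input into `n_heads` smaller subspaces, applies this same operation in each one, and concatenates the results.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def scaled_dot_product_attention(q, k, v, mask=None):
    """q, k, v: (..., seq_len, depth). mask: broadcastable to the logits, 1.0 = ignore."""
    d_k = q.shape[-1]
    # Similarity of every query with every key, scaled by sqrt(depth).
    logits = np.matmul(q, np.swapaxes(k, -1, -2)) / np.sqrt(d_k)
    if mask is not None:
        # A large negative logit drives the softmax weight of masked keys to zero.
        logits = logits + mask * -1e9
    weights = softmax(logits, axis=-1)
    return np.matmul(weights, v), weights


rng = np.random.default_rng(0)
x = rng.normal(size=(1, 4, 8)).astype(np.float32)            # (batch, seq_len, depth)
mask = np.array([[[0.0, 0.0, 0.0, 1.0]]], dtype=np.float32)  # last token is padding
out, weights = scaled_dot_product_attention(x, x, x, mask)    # self-attention: q = k = v
print(out.shape, weights.shape)  # (1, 4, 8) (1, 4, 4)
```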

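4. Encoder
Both the encoder and the decoder are built by stacking multiple encoder layers and decoder layers. There is too much code to show all of it here.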

import tensorflow as tf


class EncoderLayer(tf.keras.layers.Layer):
    """One encoder block: multi-head self-attention followed by a feed-forward network."""

    def __init__(self, d_model, n_heads, ddf, dropout_rate=0.1):
        super(EncoderLayer, self).__init__()

        # MutilHeadAttention and point_wise_feed_forward_network are not shown in this post
        self.mha = MutilHeadAttention(d_model, n_heads)
        self.ffn = point_wise_feed_forward_network(d_model, ddf)

        self.layernorm1 = tf.keras.layers.LayerNormalization(epsilon=1e-6)
        self.layernorm2 = tf.keras.layers.LayerNormalization(epsilon=1e-6)

        self.dropout1 = tf.keras.layers.Dropout(dropout_rate)
        self.dropout2 = tf.keras.layers.Dropout(dropout_rate)

    def call(self, inputs, training, mask):
        # Multi-head self-attention: query, key and value are all `inputs`
        att_output, _ = self.mha(inputs, inputs, inputs, mask)
        att_output = self.dropout1(att_output, training=training)
        # Residual connection + layer normalization
        out1 = self.layernorm1(inputs + att_output)  # (batch_size, input_seq_len, d_model)
        # Position-wise feed-forward network
        ffn_output = self.ffn(out1)
        ffn_output = self.dropout2(ffn_output, training=training)
        out2 = self.layernorm2(out1 + ffn_output)  # (batch_size, input_seq_len, d_model)
        return out2

5. Putting it together

class BERT(tf.keras.Model):
    """Encoder-only Transformer with a per-token classification layer on top."""

    def __init__(self, n_layers, d_model, n_heads, diff,
                 input_vocab_size, target_vocab_size,
                 max_seq_len, drop_rate=0.1):
        super(BERT, self).__init__()

        # Encoder is the stack of EncoderLayer blocks (not shown in this post)
        self.encoder = Encoder(n_layers, d_model, n_heads, diff,
                               input_vocab_size, max_seq_len, drop_rate)

        # Projects every token representation to target_vocab_size logits
        self.final_layer = tf.keras.layers.Dense(target_vocab_size)

    def call(self, inputs, training, encode_padding_mask):
        encode_out = self.encoder(inputs, training, encode_padding_mask)
        # print(encode_out.shape)

        final_out = self.final_layer(encode_out)

        return final_out
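

# `model` is assumed to be a BERT instance created beforehand (the original code
# called the BERT class itself, which cannot work). create_mask, loss_fun, optimizer,
# train_loss and train_accuracy are not shown in this post.
def train_step(inputs, targets):
    # Build the padding mask
    encode_padding_mask = create_mask(inputs)

    with tf.GradientTape() as tape:
        predictions = model(inputs, True, encode_padding_mask)
        # Debug output: compare one sample's predictions with its targets
        print(predictions[1])
        print(targets[1])
        loss = loss_fun(targets, predictions)
    # Compute the gradients
    gradients = tape.gradient(loss, model.trainable_variables)
    # Apply the gradients to update the weights
    optimizer.apply_gradients(zip(gradients, model.trainable_variables))

    # Record loss and accuracy
    train_loss(loss)
    train_accuracy(targets, predictions)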
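
The training step relies on `loss_fun` and `train_accuracy`, which the post does not show. The NumPy sketch below illustrates one common way to compute them for padded sequences: cross-entropy and accuracy averaged over non-padding positions only. It assumes that label id 0 marks padding and that the model outputs unnormalized logits; `masked_cross_entropy` and `masked_accuracy` are hypothetical names, and the author's own functions may differ.

```python
import numpy as np


def masked_cross_entropy(targets, logits, pad_id=0):
    """Mean cross-entropy over non-padding positions.

    targets: (batch, seq_len) integer label ids.
    logits:  (batch, seq_len, n_labels) unnormalized scores.
    """
    targets = np.asarray(targets)
    # Log-softmax, computed stably by subtracting the per-position maximum.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))
    # Pick the log-probability of the correct label at every position.
    picked = np.take_along_axis(log_probs, targets[..., np.newaxis], axis=-1)[..., 0]
    keep = (targets != pad_id).astype(logits.dtype)
    return float(-(picked * keep).sum() / keep.sum())


def masked_accuracy(targets, logits, pad_id=0):
    """Fraction of non-padding positions whose argmax matches the target."""
    targets = np.asarray(targets)
    keep = targets != pad_id
    correct = (logits.argmax(axis=-1) == targets) & keep
    return float(correct.sum() / keep.sum())


targets = np.array([[1, 2, 0]])          # last position is padding (assumed id 0)
logits = np.array([[[0.0, 5.0, 0.0],     # predicts label 1: correct
                    [0.0, 5.0, 0.0],     # predicts label 1: wrong, target is 2
                    [9.0, 0.0, 0.0]]])   # padding position, ignored
print(masked_accuracy(targets, logits))  # 0.5
```

Masking matters for NER because padded positions are trivially easy to predict, so including them would inflate the reported accuracy.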