BERT + BiLSTM-CRF for NER

Late on Friday afternoon I came across a project that uses a BERT language model as the input layer for NER; downstream it can be a CNN or simply a CRF layer, with BERT acting as a drop-in replacement for a word2vec model. Original repo: https://github.com/macanv/BERT-BiLSTM-CRF-NER. Note that it requires TensorFlow 1.9.

The overall logic is fairly simple. Don't be intimidated by how much code Google wrote: in essence, the BERT model replaces the word2vec part of the original network, and Google's pre-trained BERT weights are fine-tuned on the downstream task. Google's open-source code mostly uses the Estimator API, but you don't have to. The concrete pipeline: convert your raw data to TFRecord format, read it with the Dataset API (or however you like), run it through BERT to produce embeddings, load Google's pre-trained BERT checkpoint, and feed the resulting tensor into your own network, whether CNN or RNN. You never need a tf.nn.embedding_lookup step again. The biggest weakness of word2vec is ambiguous words: "苹果" (apple) can be a fruit, a company, or a movie, yet word2vec assigns it a single vector regardless of context, whereas BERT produces different vectors depending on the surrounding text. The short sketch below contrasts the two.
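
A minimal sketch of that contrast, assuming a local checkout of google-research/bert (for the modeling import) and the Chinese checkpoint linked later in this post; the vocabulary size 21128 comes from that release, and the placeholder shapes are illustrative:

import tensorflow as tf
from bert import modeling  # modeling.py from google-research/bert; adjust the import path to your checkout

input_ids = tf.placeholder(tf.int32, [None, 128])   # "苹果" gets the same id in every sentence
input_mask = tf.placeholder(tf.int32, [None, 128])

# word2vec style: one fixed vector per token id, context is ignored.
static_table = tf.get_variable("w2v", [21128, 128])
static_emb = tf.nn.embedding_lookup(static_table, input_ids)   # same id -> same vector

# BERT style: the vector for the same token id depends on the whole sentence.
bert_config = modeling.BertConfig.from_json_file("chinese_L-12_H-768_A-12/bert_config.json")
model = modeling.BertModel(config=bert_config, is_training=False,
                           input_ids=input_ids, input_mask=input_mask,
                           use_one_hot_embeddings=False)
contextual_emb = model.get_sequence_output()   # [batch, seq_len, 768], context-dependent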

With that in mind, here is my own bert+lstm+crf implementation for NER:

import tensorflow as tf
from tensorflow.contrib.layers.python.layers import initializers

from bert import modeling              # modeling.py from google-research/bert
from lstm_crf_layer import BLSTM_CRF   # BiLSTM-CRF layer from the repo linked above

FLAGS = tf.flags.FLAGS  # lstm_size, cell, num_layers, droupout_rate are defined in the repo's script


class BertLstmNer(object):
    def __init__(self, bert_config, is_training, input_ids, input_mask,
                 segment_ids, labels, num_labels, use_one_hot_embeddings, init_checkpoint):
        self.bert_config = bert_config
        self.is_training = is_training
        self.input_ids = input_ids
        self.input_mask = input_mask
        self.segment_ids = segment_ids
        self.labels = labels
        self.num_labels = num_labels
        self.use_one_hot_embeddings = use_one_hot_embeddings
        self.init_checkpoint = init_checkpoint

        # BERT replaces the usual embedding_lookup: build the full transformer graph.
        model = modeling.BertModel(
            config=self.bert_config,
            is_training=self.is_training,
            input_ids=self.input_ids,
            input_mask=self.input_mask,
            token_type_ids=self.segment_ids,
            use_one_hot_embeddings=self.use_one_hot_embeddings
        )

        # Initialize the BERT variables from Google's pre-trained checkpoint.
        tvars = tf.trainable_variables()
        assignment_map, _ = modeling.get_assignment_map_from_checkpoint(tvars, self.init_checkpoint)
        tf.train.init_from_checkpoint(self.init_checkpoint, assignment_map)

        # Contextual embedding for every token: [batch_size, seq_length, embedding_size]
        embedding = model.get_sequence_output()
        max_seq_length = embedding.shape[1].value

        # [batch_size] vector holding the true (unpadded) length of each sequence in the batch
        used = tf.sign(tf.abs(input_ids))
        lengths = tf.reduce_sum(used, axis=1)

        # BiLSTM + CRF head on top of the BERT output ('droupout_rate' is the repo's own spelling)
        blstm_crf = BLSTM_CRF(embedded_chars=embedding, hidden_unit=FLAGS.lstm_size, cell_type=FLAGS.cell,
                              num_layers=FLAGS.num_layers,
                              droupout_rate=FLAGS.droupout_rate, initializers=initializers, num_labels=num_labels,
                              seq_length=max_seq_length, labels=labels, lengths=lengths, is_training=is_training)

        (self.total_loss, logits, trans, self.pred_ids) = blstm_crf.add_blstm_crf_layer()

        with tf.name_scope("train_op"):
            self.train_op = tf.train.AdamOptimizer().minimize(self.total_loss)
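
For completeness, a minimal sketch of wiring the class up outside the Estimator API; the placeholder names, bert_path, and the batch_* arrays (which would come from your TFRecord reader) are hypothetical, and num_labels=11 matches the label set visible in the training log below:

bert_config = modeling.BertConfig.from_json_file(bert_path + 'bert_config.json')

input_ids = tf.placeholder(tf.int32, [None, 128], name='input_ids')
input_mask = tf.placeholder(tf.int32, [None, 128], name='input_mask')
segment_ids = tf.placeholder(tf.int32, [None, 128], name='segment_ids')
labels = tf.placeholder(tf.int32, [None, 128], name='labels')

ner = BertLstmNer(bert_config, is_training=True,
                  input_ids=input_ids, input_mask=input_mask,
                  segment_ids=segment_ids, labels=labels,
                  num_labels=11, use_one_hot_embeddings=False,
                  init_checkpoint=bert_path + 'bert_model.ckpt')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    _, loss = sess.run([ner.train_op, ner.total_loss],
                       feed_dict={input_ids: batch_ids, input_mask: batch_mask,
                                  segment_ids: batch_segs, labels: batch_labels})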

If you want to run the code from the GitHub repo linked above, you need at least TensorFlow 1.9, because it depends on some TPU interfaces and uses the high-level Estimator API. A quick sanity check:

In [1]: import tensorflow  as tf

In [2]: tf.__version__
Out[2]: '1.9.0'

Lower versions will fail with the following error:

AttributeError: module 'tensorflow.contrib.tpu' has no attribute 'InputPipelineConfig'
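
To fail fast with a readable message instead of hitting that AttributeError mid-run, you can add a small guard at the top of the script; a minimal sketch:

import tensorflow as tf
from distutils.version import LooseVersion

# Abort early if the installed TensorFlow predates the TPU/Estimator interfaces the repo uses.
if LooseVersion(tf.__version__) < LooseVersion('1.9.0'):
    raise RuntimeError('TensorFlow >= 1.9 is required, found %s' % tf.__version__)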

Before running, download Google's open-source Chinese language model:

https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip
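
If you prefer to script the download, a minimal sketch (wget plus unzip works just as well; the data/ target directory matches the paths used below):

import os
import zipfile
from six.moves.urllib.request import urlretrieve

url = 'https://storage.googleapis.com/bert_models/2018_11_03/chinese_L-12_H-768_A-12.zip'
fname = 'chinese_L-12_H-768_A-12.zip'
if not os.path.exists(fname):
    urlretrieve(url, fname)      # download once
with zipfile.ZipFile(fname) as zf:
    zf.extractall('data/')       # yields data/chinese_L-12_H-768_A-12/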

Modify the paths in the code to point at your local copies:

else:
    bert_path = '/Users/zhoumeixu/Documents/python/BERT-BiLSTM-CRF-NER/data/chinese_L-12_H-768_A-12/'
    root_path = '/Users/zhoumeixu/Documents/python/BERT-BiLSTM-CRF-NER'

Run the training command:

python bert_lstm_ner.py --task_name="NER" --do_train=True --do_eval=True --do_predict=True --data_dir=NERdata --max_seq_length=128 --train_batch_size=32 --learning_rate=2e-5 --num_train_epochs=3.0 --output_dir=./output/result_dir/

At startup the script first converts the raw data files into TFRecord files, so create the output/result_dir directory beforehand; the sketch below covers both.
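
Creating the directory, and for reference, roughly what each serialized training example contains. The feature names (input_ids, input_mask, segment_ids, label_ids) follow the BERT convention; the repo's own converter handles tokenization and label mapping, so this is only an illustrative sketch:

import collections
import tensorflow as tf

tf.gfile.MakeDirs('./output/result_dir')    # idempotent; avoids the missing-directory error

def to_tf_example(input_ids, input_mask, segment_ids, label_ids):
    # Pack one padded sequence and its per-token labels into BERT-style int64 features.
    def feat(values):
        return tf.train.Feature(int64_list=tf.train.Int64List(value=list(values)))
    features = collections.OrderedDict([
        ('input_ids', feat(input_ids)),
        ('input_mask', feat(input_mask)),
        ('segment_ids', feat(segment_ids)),
        ('label_ids', feat(label_ids)),
    ])
    return tf.train.Example(features=tf.train.Features(feature=features))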

What the run looks like:

INFO:tensorflow:  name = bert/encoder/layer_11/intermediate/dense/kernel:0, shape = (768, 3072), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/intermediate/dense/bias:0, shape = (3072,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/dense/kernel:0, shape = (3072, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/LayerNorm/beta:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/encoder/layer_11/output/LayerNorm/gamma:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/pooler/dense/kernel:0, shape = (768, 768), *INIT_FROM_CKPT*
INFO:tensorflow:  name = bert/pooler/dense/bias:0, shape = (768,), *INIT_FROM_CKPT*
INFO:tensorflow:  name = rnn_layer/bidirectional_rnn/fw/basic_lstm_cell/kernel:0, shape = (896, 512)
INFO:tensorflow:  name = rnn_layer/bidirectional_rnn/fw/basic_lstm_cell/bias:0, shape = (512,)
INFO:tensorflow:  name = rnn_layer/bidirectional_rnn/bw/basic_lstm_cell/kernel:0, shape = (896, 512)
INFO:tensorflow:  name = rnn_layer/bidirectional_rnn/bw/basic_lstm_cell/bias:0, shape = (512,)
INFO:tensorflow:  name = project/hidden/W:0, shape = (256, 128)
INFO:tensorflow:  name = project/hidden/b:0, shape = (128,)
INFO:tensorflow:  name = project/logits/W:0, shape = (128, 11)
INFO:tensorflow:  name = project/logits/b:0, shape = (11,)
INFO:tensorflow:  name = crf_loss/transitions:0, shape = (11, 11)
INFO:tensorflow:Done calling model_fn.
INFO:tensorflow:Create CheckpointSaverHook.
INFO:tensorflow:Graph was finalized.
INFO:tensorflow:Running local_init_op.
INFO:tensorflow:Done running local_init_op.
INFO:tensorflow:Saving checkpoints for 0 into ./output/result_dir/model.ckpt.

Finally, I hope everyone gets this running smoothly and pushes their model accuracy up another level.
