BERT BertModel Class: Source Code Walkthrough

The source code is in bert/modeling.py.

1. Usage Example

The usage example given in the BertModel class docstring:

    # Already been converted into WordPiece token ids
    input_ids = tf.constant([[31, 51, 99], [15, 5, 0]])
    input_mask = tf.constant([[1, 1, 1], [1, 1, 0]])
    token_type_ids = tf.constant([[0, 0, 1], [0, 2, 0]])

    config = modeling.BertConfig(vocab_size=32000, hidden_size=512,
      num_hidden_layers=8, num_attention_heads=6, intermediate_size=1024)

    model = modeling.BertModel(config=config, is_training=True,
      input_ids=input_ids, input_mask=input_mask, token_type_ids=token_type_ids)

    label_embeddings = tf.get_variable(...)
    pooled_output = model.get_pooled_output()
    logits = tf.matmul(pooled_output, label_embeddings)

Usage is straightforward: build the model with model = modeling.BertModel(...) and then fetch whatever you need through the class's accessor methods.
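
For token-level tasks such as sequence labeling, the per-token encoder output from get_sequence_output() can be used instead of the pooled [CLS] vector. The following is a minimal TensorFlow 1.x sketch that reuses the model built above; num_labels and the dense projection are illustrative additions, not part of the original docstring:

    # Reusing `model` from the example above.
    num_labels = 5  # hypothetical tag-set size for this sketch

    # [batch_size, seq_length, hidden_size] -> [batch_size, seq_length, num_labels]
    sequence_output = model.get_sequence_output()
    token_logits = tf.layers.dense(
        sequence_output, num_labels,
        kernel_initializer=tf.truncated_normal_initializer(stddev=0.02))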

2. The Constructor

The main parameters of BertModel:

  • config: an instance of BertConfig, e.g. bert_config = modeling.BertConfig.from_json_file(FLAGS.bert_config_file)
  • is_training: whether the model is being trained; controls whether dropout is applied
  • input_ids: int32 Tensor of shape [batch_size, seq_length] (see the sketch after this list for how the three input tensors are typically built)
  • input_mask: (optional) int32 Tensor of shape [batch_size, seq_length]
  • token_type_ids: (optional) int32 Tensor of shape [batch_size, seq_length]
  • use_one_hot_embeddings: (optional) bool, whether to use one-hot embedding lookups

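A minimal sketch of how input_ids, input_mask and token_type_ids are usually prepared for a padded sentence pair. The token ids below are made up for illustration; in practice they come from the WordPiece tokenizer:

    import tensorflow as tf

    # Hypothetical WordPiece ids for "[CLS] a b [SEP] c d [SEP]" padded to length 8.
    input_ids      = tf.constant([[101, 2023, 2003, 102, 2009, 3835, 102, 0]])
    # 1 for real tokens, 0 for padding.
    input_mask     = tf.constant([[1,   1,    1,    1,   1,    1,    1,   0]])
    # 0 for first-segment tokens (and padding), 1 for second-segment tokens.
    token_type_ids = tf.constant([[0,   0,    0,    0,   1,    1,    1,   0]])
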
The constructor mainly performs the following steps:

  1. validate the input tensors;
  2. embeddings: produce self.embedding_output;
  3. encoder: produce self.all_encoder_layers;
  4. take self.sequence_output = self.all_encoder_layers[-1];
     sequence_output has shape [batch_size, seq_length, hidden_size];
  5. pooler: produce self.pooled_output,
     whose shape is [batch_size, hidden_size].

class BertModel(object):
    def __init__(self,
                 config,
                 is_training,
                 input_ids,
                 input_mask=None,
                 token_type_ids=None,
                 use_one_hot_embeddings=False,
                 scope=None):
        """Constructor for BertModel.
        Raises:
          ValueError: The config is invalid or one of the input tensor shapes
            is invalid.
        """
        config = copy.deepcopy(config)
        if not is_training:
            config.hidden_dropout_prob = 0.0
            config.attention_probs_dropout_prob = 0.0

        input_shape = get_shape_list(input_ids, expected_rank=2)
        batch_size = input_shape[0]
        seq_length = input_shape[1]

        if input_mask is None:
            input_mask = tf.ones(
                shape=[batch_size, seq_length], dtype=tf.int32)

        if token_type_ids is None:
            token_type_ids = tf.zeros(
                shape=[batch_size, seq_length], dtype=tf.int32)

        with tf.variable_scope(scope, default_name="bert"):
            with tf.variable_scope("embeddings"):
                # Perform embedding lookup on the word ids.
                (self.embedding_output, self.embedding_table) = embedding_lookup(
                    input_ids=input_ids,
                    vocab_size=config.vocab_size,
                    embedding_size=config.hidden_size,
                    initializer_range=config.initializer_range,
                    word_embedding_name="word_embeddings",
                    use_one_hot_embeddings=use_one_hot_embeddings)

                # Add positional embeddings and token type embeddings, then layer
                # normalize and perform dropout.
                self.embedding_output = embedding_postprocessor(
                    input_tensor=self.embedding_output,
                    use_token_type=True,
                    token_type_ids=token_type_ids,
                    token_type_vocab_size=config.type_vocab_size,
                    token_type_embedding_name="token_type_embeddings",
                    use_position_embeddings=True,
                    position_embedding_name="position_embeddings",
                    initializer_range=config.initializer_range,
                    max_position_embeddings=config.max_position_embeddings,
                    dropout_prob=config.hidden_dropout_prob)

            with tf.variable_scope("encoder"):
                # This converts a 2D mask of shape [batch_size, seq_length] to a 3D
                # mask of shape [batch_size, seq_length, seq_length] which is used
                # for the attention scores.
                attention_mask = create_attention_mask_from_input_mask(
                    input_ids, input_mask)

                # Run the stacked transformer.
                # `sequence_output` shape = [batch_size, seq_length, hidden_size].
                self.all_encoder_layers = transformer_model(
                    input_tensor=self.embedding_output,
                    attention_mask=attention_mask,
                    hidden_size=config.hidden_size,
                    num_hidden_layers=config.num_hidden_layers,
                    num_attention_heads=config.num_attention_heads,
                    intermediate_size=config.intermediate_size,
                    intermediate_act_fn=get_activation(config.hidden_act),
                    hidden_dropout_prob=config.hidden_dropout_prob,
                    attention_probs_dropout_prob=config.attention_probs_dropout_prob,
                    initializer_range=config.initializer_range,
                    do_return_all_layers=True)

            self.sequence_output = self.all_encoder_layers[-1]
            # The "pooler" converts the encoded sequence tensor of shape
            # [batch_size, seq_length, hidden_size] to a tensor of shape
            # [batch_size, hidden_size]. This is necessary for segment-level
            # (or segment-pair-level) classification tasks where we need a fixed
            # dimensional representation of the segment.
            with tf.variable_scope("pooler"):
                # We "pool" the model by simply taking the hidden state corresponding
                # to the first token. We assume that this has been pre-trained
                first_token_tensor = tf.squeeze(
                    self.sequence_output[:, 0:1, :], axis=1)
                self.pooled_output = tf.layers.dense(
                    first_token_tensor,
                    config.hidden_size,
                    activation=tf.tanh,
                    kernel_initializer=create_initializer(config.initializer_range))

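The encoder block above first turns the 2-D input_mask into a 3-D attention mask. The sketch below reproduces the effect of create_attention_mask_from_input_mask on a toy mask (shapes are hard-coded for readability; this mirrors the helper's broadcast logic rather than calling it):

    import tensorflow as tf

    # 2-D mask: 1 = real token, 0 = padding (batch_size=1, seq_length=4).
    input_mask = tf.constant([[1, 1, 1, 0]], dtype=tf.int32)

    # Broadcast the key-side mask across every query position, giving a
    # [batch_size, seq_length, seq_length] tensor with mask[b, i, j] = input_mask[b, j].
    to_mask = tf.cast(tf.reshape(input_mask, [1, 1, 4]), tf.float32)
    broadcast_ones = tf.ones([1, 4, 1], dtype=tf.float32)
    attention_mask = broadcast_ones * to_mask

Every query position may attend to every non-padding key position; inside the transformer the zero entries are turned into a large negative bias added to the attention scores.
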
The accessor methods for retrieving specific outputs are:

  • get_pooled_output(): returns the pooler output
  • get_sequence_output(): returns the final hidden layer of the encoder
  • get_all_encoder_layers(): returns the outputs of all encoder layers
  • get_embedding_output(): returns the output of the embedding layer (the input to the transformer)
  • get_embedding_table(): returns the word embedding table

get_sequence_output simply returns the final hidden layer of the encoder, i.e.
self.sequence_output = self.all_encoder_layers[-1]:

    def get_pooled_output(self):
        return self.pooled_output

    def get_sequence_output(self):
        """Gets final hidden layer of encoder.

        Returns:
          float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
          to the final hidden of the transformer encoder.
        """
        return self.sequence_output

    def get_all_encoder_layers(self):
        return self.all_encoder_layers

    def get_embedding_output(self):
        """Gets output of the embedding lookup (i.e., input to the transformer).

        Returns:
          float Tensor of shape [batch_size, seq_length, hidden_size] corresponding
          to the output of the embedding layer, after summing the word
          embeddings with the positional embeddings and the token type embeddings,
          then performing layer normalization. This is the input to the transformer.
        """
        return self.embedding_output

    def get_embedding_table(self):
        return self.embedding_table
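
Besides the final layer, intermediate layers can also be used as features. A short sketch (the choice of the second-to-last layer is a common feature-extraction convention, not something the source code prescribes):

    # Reusing the `model` built in the usage example above.
    all_layers = model.get_all_encoder_layers()   # list of [batch_size, seq_length, hidden_size] tensors
    second_to_last = all_layers[-2]               # illustrative layer choice for feature extraction
    embeddings = model.get_embedding_output()     # word + position + segment embeddings, layer-normalized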