BERT model return values

BERT's output is a tuple of four elements; the transformers docstring for `BertModel` describes them as follows:

Return:
        :obj:`tuple(torch.FloatTensor)` comprising various elements depending on the configuration (:class:`~transformers.BertConfig`) and inputs:
        last_hidden_state (:obj:`torch.FloatTensor` of shape :obj:`(batch_size, sequence_length, hidden_size)`):
            Sequence of hidden-states at the output of the last layer of the model.
        pooler_output (:obj:`torch.FloatTensor`: of shape :obj:`(batch_size, hidden_size)`):
            Last layer hidden-state of the first token of the sequence (classification token)
            further processed by a Linear layer and a Tanh activation function. The Linear
            layer weights are trained from the next sentence prediction (classification)
            objective during pre-training.
            This output is usually *not* a good summary
            of the semantic content of the input, you're often better with averaging or pooling
            the sequence of hidden-states for the whole input sequence.
        hidden_states (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_hidden_states=True``):
            Tuple of :obj:`torch.FloatTensor` (one for the output of the embeddings + one for the output of each layer)
            of shape :obj:`(batch_size, sequence_length, hidden_size)`.
            Hidden-states of the model at the output of each layer plus the initial embedding outputs.
        attentions (:obj:`tuple(torch.FloatTensor)`, `optional`, returned when ``config.output_attentions=True``):
            Tuple of :obj:`torch.FloatTensor` (one for each layer) of shape
            :obj:`(batch_size, num_heads, sequence_length, sequence_length)`.
            Attentions weights after the attention softmax, used to compute the weighted average in the self-attention
            heads.
  1. last_hidden_state: shape (batch_size, sequence_length, hidden_size), with hidden_size = 768 for bert-base; it is the sequence of hidden states output by the last layer of the model.
  2. pooler_output: shape (batch_size, hidden_size); the last-layer hidden state of the first token of the sequence (the classification token, [CLS]), further processed by a Linear layer and a Tanh activation. This output is usually not a good summary of the semantic content of the input; averaging or pooling the hidden states over the whole input sequence is often better (see the mean-pooling sketch after this list).
  3. hidden_states: optional, returned only when config.output_hidden_states=True. It is a tuple whose first element is the embedding output and whose remaining elements are the outputs of each layer; every element has shape (batch_size, sequence_length, hidden_size).
  4. attentions: optional, returned only when config.output_attentions=True. It is also a tuple, with one element per layer containing that layer's attention weights (after the softmax), which are used to compute the weighted average in the self-attention heads.
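
Since pooler_output is usually not a great sentence representation, here is a minimal mean-pooling sketch over last_hidden_state, masked by attention_mask. It is an illustration, not part of the library: the variable names (`mask`, `sentence_embedding`) are my own, and it assumes a transformers version where the tokenizer object is callable (older releases expose the same behaviour via `encode_plus`/`batch_encode_plus`).

```python
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertModel.from_pretrained('bert-base-uncased')

# Encode a small batch with padding so attention_mask actually contains zeros.
inputs = tokenizer(["Hello, my dog is cute", "A short one"],
                   padding=True, return_tensors="pt")

with torch.no_grad():
    last_hidden_state = model(**inputs)[0]   # (batch_size, seq_len, hidden_size)

# Mean-pool only over real tokens: zero out padding positions, then divide
# by the number of real tokens in each sequence.
mask = inputs["attention_mask"].unsqueeze(-1).float()          # (batch_size, seq_len, 1)
sentence_embedding = (last_hidden_state * mask).sum(dim=1) / mask.sum(dim=1)
print(sentence_embedding.shape)              # torch.Size([2, 768])
```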

Example code is given below:

import torch
from transformers import BertModel, BertTokenizer

# Each architecture is provided with several classes for fine-tuning on down-stream tasks;
# here we only look at the base BertModel.
BERT_MODEL_CLASSES = [BertModel]

# All the classes for an architecture can be initialized from pretrained weights.
# Note that additional weights added for fine-tuning are only initialized
# and need to be trained on the down-stream task.
pretrained_weights = 'bert-base-uncased'
tokenizer = BertTokenizer.from_pretrained(pretrained_weights)
for model_class in BERT_MODEL_CLASSES:
    # Load the pretrained model, asking it to also return all hidden-states
    # and attention weights at each layer.
    model = model_class.from_pretrained(pretrained_weights,
                                        output_hidden_states=True,
                                        output_attentions=True)

    input_ids = torch.tensor([tokenizer.encode("Let's see all hidden-states and attentions on this text")])
    print("input_ids:", input_ids)

    # Older transformers versions return a plain tuple; in v4+ the same tuple
    # can be obtained by passing return_dict=False to the model call.
    last_hidden_state, pooler_output, all_hidden_states, all_attentions = model(input_ids)
    print("last_hidden_state.shape:", last_hidden_state.shape)
    print("pooler_output.shape:", pooler_output.shape)
    print("len(all_hidden_states):", len(all_hidden_states))
    print("len(all_attentions):", len(all_attentions))
    print("all_hidden_states[-2]:", all_hidden_states[-2])
    print("all_hidden_states[-2].shape:", all_hidden_states[-2].shape)

Output:

input_ids: tensor([[ 101, 2292, 1005, 1055, 2156, 2035, 5023, 1011, 2163, 1998, 3086, 2015, 2006, 2023, 3793, 102]])
last_hidden_state.shape: torch.Size([1, 16, 768])
pooler_output.shape: torch.Size([1, 768])
len(all_hidden_states): 13
len(all_attentions): 12
all_hidden_states[-2]: tensor([[[ 0.3522, -0.6508,  0.4068,  ..., -0.5943, -0.1012,  0.3161],
         [ 0.9840, -0.2480,  0.0171,  ..., -0.0287,  1.1418, -0.4333],
         [ 0.0406,  0.0278, -0.0156,  ..., -0.0117, -0.0351,  0.0244],
         ...,
         [-0.4968,  0.1059,  0.1520,  ..., -1.0849,  0.3682,  0.6323],
         [-0.0365, -0.2779, -0.3252,  ..., -0.0088,  0.0322, -0.4090],
         [ 0.0271,  0.0178, -0.0082,  ...,  0.0126, -0.0168,  0.0107]]],
       grad_fn=<NativeLayerNormBackward>)
all_hidden_states[-2].shape: torch.Size([1, 16, 768])
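
The sequence length of 16 in the shapes above matches the 16 ids printed in input_ids. To see which word pieces they correspond to, the ids can be mapped back with convert_ids_to_tokens (a small standalone sketch; the ids are copied from the output above):

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# The 16 ids printed above: [CLS] + 14 word pieces + [SEP].
ids = [101, 2292, 1005, 1055, 2156, 2035, 5023, 1011, 2163, 1998,
       3086, 2015, 2006, 2023, 3793, 102]
print(tokenizer.convert_ids_to_tokens(ids))
# Expected (roughly): ['[CLS]', 'let', "'", 's', 'see', 'all', 'hidden', '-',
#                      'states', 'and', 'attention', '##s', 'on', 'this', 'text', '[SEP]']
```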

As the output shows, len(all_hidden_states) is 13 while len(all_attentions) is 12: all_hidden_states has one extra entry because it also contains the embedding-layer output in addition to one entry per transformer layer. This can be verified directly, as in the sketch below.
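
A quick sanity check (a sketch, not part of the original code): the lines below assume they run right after the example above, so `last_hidden_state`, `all_hidden_states` and `all_attentions` are still in scope.

```python
# Assumes last_hidden_state, all_hidden_states and all_attentions come from the example above.
print(len(all_hidden_states) - len(all_attentions))           # 1: the extra embedding output
print(torch.equal(all_hidden_states[-1], last_hidden_state))  # True: the last entry is the final layer
print(all_hidden_states[0].shape)   # embedding output: torch.Size([1, 16, 768])
print(all_attentions[0].shape)      # torch.Size([1, 12, 16, 16]) = (batch, num_heads, seq_len, seq_len)
```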

### Tokenizer return value: type and contents

When you call the encoding methods of `BertTokenizer`, you normally get back a dictionary-like object. It contains several key-value pairs representing different encoded views of the input string.

Take the following example:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('./bert-base-uncased')
encoded_input = tokenizer("Hello, my dog is cute",
                          padding='max_length',
                          truncation=True,
                          max_length=10,
                          return_tensors="pt")
```

The snippet creates a tokenizer instance based on the pretrained `'bert-base-uncased'` model and encodes one sentence with it; the keyword arguments control the output format.

#### Structure of the returned value

- **input_ids**: a list of integers (or a PyTorch tensor when `return_tensors='pt'` is given) holding the indices of the tokens in the vocabulary. These ids are the basic units BERT uses to understand and process natural language.
- **attention_mask**: a tensor of the same shape indicating which tokens should take part in the attention computation: 1 on real tokens, 0 on padding positions. This keeps the model from attending to positions that carry no information.
- **token_type_ids**: distinguishes the two segments when a sentence pair is passed, helping the model capture the relation between the sentences; for a single sentence it is simply all zeros (for BERT it is returned by default).

So, with padding='max_length' and max_length=10, the 8 real tokens ([CLS], the word pieces of the sentence, and [SEP]) are followed by two padding positions whose attention_mask entries are 0, and the result looks roughly like this:

```python
{
  "input_ids": tensor([[  101,  7592,  1010,  2048,  3899,  2003, 10140,   102,     0,     0]]),
  "token_type_ids": tensor([[0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]),
  "attention_mask": tensor([[1, 1, 1, 1, 1, 1, 1, 1, 0, 0]])
}
```

These are the main components produced by calling the tokenizer directly (or, equivalently, its `encode_plus()` method).
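
To see token_type_ids doing real work, pass a sentence pair. The sketch below is illustrative: the second sentence is made up, and it assumes a transformers version where the tokenizer object is callable (otherwise `encode_plus` accepts the same arguments).

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

# Single sentence: every position gets segment id 0.
single = tokenizer("Hello, my dog is cute")
print(single["token_type_ids"])

# Sentence pair: positions belonging to the second sentence (including its [SEP]) get segment id 1.
pair = tokenizer("Hello, my dog is cute", "He likes playing fetch")
print(pair["input_ids"])
print(pair["token_type_ids"])
```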