An introduction to the heads in transformers

BertForPreTraining

Bert Model with two heads on top as done during the pretraining: a masked language modeling head and a next sentence prediction (classification) head.

It has two heads:

MLM head: it can loosely be thought of as a fully connected layer (in fact it is more than one layer: the hidden states go through Linear(hidden_size → hidden_size) → activation → LayerNorm → Linear(hidden_size → vocab_size)); it predicts the masked tokens. See the sketch after the code below.

NSP head: next sentence prediction; also a fully connected layer, hidden_size → 2.

# excerpt from transformers' BERT source (modeling_bert.py)
class BertPreTrainingHeads(nn.Module):
    def __init__(self, config):
        super().__init__()
        self.predictions = BertLMPredictionHead(config)  # MLM head
        self.seq_relationship = nn.Linear(config.hidden_size, 2)  # NSP head

    def forward(self, sequence_output, pooled_output):
        prediction_scores = self.predictions(sequence_output)
        seq_relationship_score = self.seq_relationship(pooled_output)
        return prediction_scores, seq_relationship_score
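For reference, the MLM head described above roughly has the following internal structure. This is a simplified sketch of transformers' BertLMPredictionHead / BertPredictionHeadTransform: the real implementation resolves the activation from config.hidden_act and ties the decoder weights to the input embeddings, and details may vary across library versions.

from torch import nn

class BertPredictionHeadTransform(nn.Module):
    # Linear(hidden_size -> hidden_size) -> activation -> LayerNorm
    def __init__(self, config):
        super().__init__()
        self.dense = nn.Linear(config.hidden_size, config.hidden_size)
        self.transform_act_fn = nn.GELU()  # simplified; the real code uses config.hidden_act
        self.LayerNorm = nn.LayerNorm(config.hidden_size, eps=config.layer_norm_eps)

    def forward(self, hidden_states):
        return self.LayerNorm(self.transform_act_fn(self.dense(hidden_states)))

class BertLMPredictionHead(nn.Module):
    # transform -> Linear(hidden_size -> vocab_size): per-token logits over the vocabulary
    def __init__(self, config):
        super().__init__()
        self.transform = BertPredictionHeadTransform(config)
        self.decoder = nn.Linear(config.hidden_size, config.vocab_size)

    def forward(self, hidden_states):
        return self.decoder(self.transform(hidden_states))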

BertLMHeadModel

Bert Model with a language modeling head on top for CLM fine-tuning.

It has a single LM head (the same prediction head structure as MLM); the training objective is to predict each token from the tokens before it, i.e. causal language modeling (CLM).

import torch
from transformers import AutoTokenizer, BertLMHeadModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertLMHeadModel.from_pretrained("bert-base-uncased", is_decoder=True)  # is_decoder=True enables the causal attention mask needed for CLM

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
outputs = model(**inputs, labels=inputs["input_ids"])
loss = outputs.loss
logits = outputs.logits

BertForMaskedLM

Bert Model with a language modeling head on top.

It has a single MLM head; the training objective is to predict the masked tokens.

from transformers import AutoTokenizer, BertForMaskedLM
import torch

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# retrieve index of [MASK]
mask_token_index = (inputs.input_ids == tokenizer.mask_token_id)[0].nonzero(as_tuple=True)[0]

predicted_token_id = logits[0, mask_token_index].argmax(axis=-1)
tokenizer.decode(predicted_token_id)

labels = tokenizer("The capital of France is Paris.", return_tensors="pt")["input_ids"]
# mask labels of non-[MASK] tokens
labels = torch.where(inputs.input_ids == tokenizer.mask_token_id, labels, -100)

outputs = model(**inputs, labels=labels)
round(outputs.loss.item(), 2)

BertForNextSentencePrediction

Bert Model with a next sentence prediction (classification) head on top.

It has a single NSP head (a linear layer, hidden_size → 2).
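A minimal usage sketch (along the lines of the Hugging Face docs example; the sentence pair here is only illustrative). Label 0 means sentence B really follows sentence A, label 1 means sentence B is a random sentence:

import torch
from transformers import AutoTokenizer, BertForNextSentencePrediction

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForNextSentencePrediction.from_pretrained("bert-base-uncased")

prompt = "In Italy, pizza served in formal settings is presented unsliced."
next_sentence = "The sky is blue due to the shorter wavelength of blue light."
encoding = tokenizer(prompt, next_sentence, return_tensors="pt")

# label 1 = sentence B is a random sentence, not the real next sentence
outputs = model(**encoding, labels=torch.LongTensor([1]))
loss = outputs.loss
logits = outputs.logits  # shape: (batch_size, 2)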

BertForSequenceClassification

Bert Model transformer with a sequence classification/regression head on top (a linear layer on top of the pooled output) e.g. for GLUE tasks.

A single fully connected head; the output dimension equals the number of classes (num_labels; for regression, num_labels = 1).
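A minimal usage sketch for a hypothetical binary classification setup; num_labels controls the output size of the head on top of the pooled output:

import torch
from transformers import AutoTokenizer, BertForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# num_labels sets the output dimension of the classification head
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, num_labels)
predicted_class_id = logits.argmax(dim=-1).item()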

BertForMultipleChoice

Bert Model with a multiple choice classification head on top (a linear layer on top of the pooled output and a softmax) e.g. for RocStories/SWAG tasks.

A single fully connected head with output dimension 1, which can be understood as a score for each choice.

It is mainly used for answer-selection tasks. For example, given a question with four options A/B/C/D,

the inputs are constructed as follows:

[CLS] question [SEP] A [SEP]

[CLS] question [SEP] B [SEP]

[CLS] question [SEP] C [SEP]

[CLS] question [SEP] D [SEP]

All four sequences are fed through the model (as one example with four choices).

The correct answer should receive the highest score; a softmax is taken over the choices with a cross-entropy loss (somewhat like a listwise loss in text matching). See the sketch below.
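A minimal sketch of how the four (question, option) sequences are batched and scored. The question and options are hypothetical; the tokenizer builds each pair as [CLS] question [SEP] option [SEP]:

import torch
from transformers import AutoTokenizer, BertForMultipleChoice

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForMultipleChoice.from_pretrained("bert-base-uncased")

question = "Where do penguins mostly live?"  # hypothetical question
choices = ["Antarctica", "The Sahara desert", "Downtown Paris", "The Moon"]  # hypothetical options

# one (question, choice) pair per option -> tensors of shape (num_choices, seq_len)
encoding = tokenizer([question] * len(choices), choices, return_tensors="pt", padding=True)
# the model expects (batch_size, num_choices, seq_len)
inputs = {k: v.unsqueeze(0) for k, v in encoding.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (batch_size, num_choices), one score per option
best_choice = logits.argmax(dim=-1).item()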

BertForQuestionAnswering

Bert Model with a span classification head on top for extractive question-answering tasks like SQuAD (linear layers on top of the hidden-states output to compute span start logits and span end logits).

A single fully connected head for extractive answer prediction: it predicts the start and end positions of the answer span.
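A minimal usage sketch (the question and context are hypothetical): the head produces one start logit and one end logit per token, and the answer span is decoded from the argmax positions.

import torch
from transformers import AutoTokenizer, BertForQuestionAnswering

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = BertForQuestionAnswering.from_pretrained("bert-base-uncased")

question = "What is the capital of France?"  # hypothetical question
context = "Paris is the capital and most populous city of France."

inputs = tokenizer(question, context, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# start_logits / end_logits: one score per token for being the span start / end
start_index = outputs.start_logits.argmax(dim=-1).item()
end_index = outputs.end_logits.argmax(dim=-1).item()
answer = tokenizer.decode(inputs["input_ids"][0, start_index:end_index + 1])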

Reference:

https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertForQuestionAnswering
