[Coding] BERT fine-tuning: using the output features of any chosen layers to compute the final logits
Some papers report that using the output features of the last four layers gives better results. Based on the Hugging Face model outputs, this can be written as:
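# Ask BERT to return the hidden states of every layer, not only the last one.
# (Older transformers releases set output_hidden_states=True in the config instead.)
outputs = self.bert(input_ids,
                    attention_mask=attention_mask,
                    token_type_ids=token_type_ids,
                    position_ids=position_ids,
                    head_mask=head_mask,
                    output_hidden_states=True)
# Tuple of (batch, seq_len, hidden_dim) tensors: the embedding output plus one per layer.
# With tuple outputs from BertModel this is outputs[2]; outputs[1] is the pooler output.
hidden_states = outputs.hidden_states
# Concatenate the last four layers along the feature axis -> (batch, seq_len, 4 * hidden_dim)
pooled_output = torch.cat([hidden_states[i] for i in [-4, -3, -2, -1]], dim=-1)
# Keep only the first ([CLS]) token -> (batch, 4 * hidden_dim)
pooled_output = pooled_output[:, 0, :]
pooled_output = self.dropout(pooled_output)
# The classifier input size must be 4 * hidden_dim, because four layers are concatenated
logits = self.classifier(pooled_output)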
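The same idea extends to any set of layers, as the title suggests. Below is a minimal sketch, assuming the hidden states arrive as a tuple of (batch, seq_len, hidden_dim) tensors. `ConcatLayersHead` and `fake_hidden_states` are hypothetical names used only for illustration and are not part of transformers; random tensors stand in for real BERT outputs so the snippet runs without downloading a model.

```python
import torch
import torch.nn as nn


class ConcatLayersHead(nn.Module):
    """Hypothetical classification head over the [CLS] vectors of chosen layers."""

    def __init__(self, hidden_size, num_labels, layers=(-4, -3, -2, -1), dropout=0.1):
        super().__init__()
        self.layers = tuple(layers)
        self.dropout = nn.Dropout(dropout)
        # The input width grows with the number of concatenated layers.
        self.classifier = nn.Linear(hidden_size * len(self.layers), num_labels)

    def forward(self, hidden_states):
        # hidden_states: sequence of (batch, seq_len, hidden_size) tensors.
        # Take the first ([CLS]) token of each selected layer.
        cls_vectors = [hidden_states[i][:, 0, :] for i in self.layers]
        # Concatenate along the feature axis -> (batch, hidden_size * n_layers).
        features = torch.cat(cls_vectors, dim=-1)
        return self.classifier(self.dropout(features))


# Stand-in for BERT-base hidden states: embedding output + 12 layers = 13 tensors.
fake_hidden_states = tuple(torch.randn(2, 8, 768) for _ in range(13))

head = ConcatLayersHead(hidden_size=768, num_labels=3)
logits = head(fake_hidden_states)
print(logits.shape)  # torch.Size([2, 3])
```

Slicing the [CLS] token before concatenating gives the same result as concatenating first and slicing afterwards, but avoids building the full (batch, seq_len, 4 * hidden_dim) tensor. To use other layers, pass e.g. `layers=(-1, -2)`; the classifier width adjusts automatically.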
References: