RoBERTa: A Robustly Optimized BERT Pretraining Approach
1. Setup
BERT takes as input a concatenation of two segments (sequences of tokens), $x_1, \ldots, x_N$ and $y_1, \ldots, y_M$.
Segments usually consist of more than one natural sentence.
The two segments are presented as a single input sequence to BERT with special tokens delimiting them: [CLS], $x_1, \ldots, x_N$, [SEP], $y_1, \ldots, y_M$, [EOS]. $M$ and $N$ are constrained such that $M + N < T$, where $T$ is a parameter that controls the maximum sequence length during training.
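To make the packing concrete, here is a minimal sketch (not the authors' code) of how two token segments could be combined into one BERT-style input. The function name `pack_segments` and the default `T=512` are illustrative assumptions; the special-token names and the $M + N < T$ constraint follow the paper's notation.

```python
# Illustrative sketch of the paper's input format; pack_segments and
# T=512 are assumptions, not part of the original RoBERTa codebase.

def pack_segments(x, y, T=512):
    """Concatenate segments x and y with delimiting special tokens:
    [CLS], x_1..x_N, [SEP], y_1..y_M, [EOS], subject to M + N < T."""
    assert len(x) + len(y) < T, "segments must satisfy M + N < T"
    return ["[CLS]", *x, "[SEP]", *y, "[EOS]"]

print(pack_segments(["the", "cat", "sat"], ["it", "purred"]))
# ['[CLS]', 'the', 'cat', 'sat', '[SEP]', 'it', 'purred', '[EOS]']
```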