StructBERT Explained

License: CC 4.0 BY-SA

Tags: #MachineLearning #ArtificialIntelligence #NaturalLanguageProcessing #DeepLearning


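Introduction

StructBERT is an NLP pre-training model proposed by Alibaba DAMO Academy that makes several improvements on top of the original BERT. This article describes what StructBERT changes relative to BERT. The reference paper is "StructBERT: Incorporating Language Structures into Pre-training for Deep Language Understanding".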

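StructBERT vs. BERT

The main difference is that StructBERT adds two pre-training tasks, each with its own objective: the word structural objective and the sentence structural objective.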

Word Structural Objective

[Figure: illustration of the word structural objective]
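The motivation for this task is that a good language model should be able to reconstruct a sentence whose words have been shuffled. Concretely, as shown in the figure above, 15% of the tokens are masked just as in BERT; in addition, a trigram is randomly selected from the unmasked tokens and its order is shuffled. The outputs at the shuffled positions are fed to a softmax that predicts which token originally stood at each position, so the model learns to restore the original order. The objective function is: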
[Equation: objective function of the word structural task]
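Written out, the objective takes roughly the following form. This is a sketch reconstructed from the surrounding description; the symbols are assumptions: $t_1, \ldots, t_K$ are the tokens of the shuffled span in their original order, $pos_i$ is the token predicted for position $i$, and $\theta$ denotes the model parameters.

```latex
\arg\max_{\theta} \sum \log P\left(pos_1 = t_1,\; pos_2 = t_2,\; \ldots,\; pos_K = t_K \;\middle|\; t_1, t_2, \ldots, t_K,\; \theta\right)
```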
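Here K is the length of the shuffled subsequence (K = 3 in the paper). The objective maximizes the probability that the output sequence is the original sequence before shuffling.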
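The data-construction step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `shuffle_trigram`, its parameters, and the use of plain string tokens are all assumptions made for this example.

```python
import random


def shuffle_trigram(tokens, masked_positions, k=3, rng=random):
    """Shuffle one randomly chosen span of k unmasked tokens.

    Returns (shuffled_tokens, positions, labels), where labels[i] is the
    original token that belongs at positions[i]. These labels are what a
    softmax over the vocabulary would be trained to predict.
    """
    masked = set(masked_positions)
    # Candidate start indices whose k-token window contains no masked token.
    starts = [
        s for s in range(len(tokens) - k + 1)
        if not any(p in masked for p in range(s, s + k))
    ]
    if not starts:
        # No eligible span: return the input unchanged with no targets.
        return list(tokens), [], []

    start = rng.choice(starts)
    positions = list(range(start, start + k))
    labels = [tokens[p] for p in positions]

    # Note: a random shuffle can occasionally return the original order.
    permuted = labels[:]
    rng.shuffle(permuted)

    shuffled = list(tokens)
    shuffled[start:start + k] = permuted
    return shuffled, positions, labels


if __name__ == "__main__":
    sentence = ["the", "cat", "sat", "on", "the", "mat"]
    shuffled, positions, labels = shuffle_trigram(
        sentence, masked_positions=[0], rng=random.Random(0)
    )
    print(shuffled, positions, labels)
```

The model input would be `shuffled`, and the loss would be computed only at `positions`, using `labels` as the targets.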

Sentence Structural Objective

[Figure: illustration of the sentence structural objective]
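Next Sentence Prediction (NSP) asks whether the second sentence follows the first one in the original text; it is essentially a binary classification task. For BERT this task is too easy, typically reaching 97%-98% accuracy, so StructBERT extends it. As shown in the figure above, NSP becomes a three-way classification problem: predict whether the second sentence comes before the current sentence, comes after it, or is a randomly sampled sentence. When the training data is constructed, each of the three cases makes up one third of the examples.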
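The three-way sampling described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's code: the function name `make_sentence_pair`, the label ids, and the idea of drawing the random sentence from a separate pool `other_sentences` are all choices made for this example.

```python
import random

# Label ids are an arbitrary choice for this sketch.
NEXT, PREVIOUS, RANDOM = 0, 1, 2


def make_sentence_pair(doc, idx, other_sentences, rng=random):
    """Build one (sentence_a, sentence_b, label) example for doc[idx].

    doc: list of sentences from one document; doc[idx] must have both a
         previous and a next sentence.
    other_sentences: pool from which the random sentence is drawn
         (assumed here to come from other documents).
    """
    if not 0 < idx < len(doc) - 1:
        raise ValueError("idx needs both a previous and a next sentence")

    # Each of the three cases is chosen with probability 1/3.
    label = rng.choice([NEXT, PREVIOUS, RANDOM])
    if label == NEXT:
        second = doc[idx + 1]
    elif label == PREVIOUS:
        second = doc[idx - 1]
    else:
        second = rng.choice(other_sentences)
    return doc[idx], second, label


if __name__ == "__main__":
    document = ["First sentence.", "Second sentence.", "Third sentence."]
    pool = ["An unrelated sentence.", "Another unrelated sentence."]
    print(make_sentence_pair(document, 1, pool, rng=random.Random(0)))
```

A classifier on top of the pair representation would then be trained on `label` as a three-class target.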

Summary

The paper states its main contributions as the following two points:

• We propose novel structural pre-training that extends BERT by incorporating the word structural objective and the sentence structural objective to leverage language structures in contextualized representation. This enables the StructBERT to explicitly model language structures by forcing it to reconstruct the right order of words and sentences for correct prediction.
• StructBERT significantly outperforms all published state-of-the-art models on a wide range of NLU tasks. This model extends the superiority of BERT, and boosts the performance in many language understanding applications such as semantic textual similarity, sentiment analysis, textual entailment, and question answering.