SpanBERT Paper Reading Notes

绿箭薄荷

Category: Study. Tags: bert, deep learning, natural language processing

Article link: https://blog.csdn.net/qq_40478033/article/details/121140275


Link to the original paper

https://aclanthology.org/2020.tacl-1.5/

1. Abstract

SpanBERT extends BERT by
(1) masking contiguous random spans, rather than random tokens;
(2) training the span boundary representations to predict the entire content of the masked span, without relying on the individual token representations within it.

2. Introduction

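The paper introduces a new span-boundary objective (SBO), under which the model learns to predict an entire masked span from the observed tokens at its boundary.
SBO encourages the model to store span-level information at the boundary tokens, where it can be easily accessed during fine-tuning. Figure 1 illustrates the authors' approach.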

[Figure 1: overview of the SpanBERT approach]

3. Model

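SpanBERT is a self-supervised pre-training method designed to better represent and predict spans of text. The notes list three points:
1. It uses a different random process to mask spans of tokens, rather than individual ones.
2. It introduces the SBO, which tries to predict the entire masked span using only the representations of the tokens at the span's boundary.
3. Each training example is a single contiguous segment of text (rather than two), and BERT's next sentence prediction objective is not used.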

3.1 Span Masking

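Given a sequence of tokens $X = (x_1, x_2, \ldots, x_n)$, a subset of tokens $Y \subseteq X$ is selected by iteratively sampling spans of text until the masking budget (e.g. 15% of $X$) has been spent.
At each iteration, a span length (number of words) is first sampled from a geometric distribution, $l \sim \mathrm{Geo}(p)$, which is skewed towards shorter spans.
Then the starting point of the span to be masked is selected at random. Complete words (rather than subword tokens) are always sampled, and the starting point must be the beginning of a word.
Following preliminary trials, the authors set $p = 0.2$ with a maximum span length $l_{max} = 10$, which yields a mean span length of $\mathrm{mean}(l) = 3.8$.
All the tokens in a span are replaced with [MASK] or sampled tokens.
Figure 2 shows the span length distribution.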
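As a quick check of the numbers above, the span-length distribution can be computed directly. This is a minimal sketch under one assumption that the notes do not state: lengths above 10 are excluded and the remaining probabilities are renormalized. The function name `span_length_distribution` is made up for illustration.

```python
def span_length_distribution(p=0.2, l_max=10):
    """Geo(p) restricted to lengths 1..l_max.

    Assumption: the truncated probabilities are renormalized to sum to 1.
    """
    weights = [p * (1 - p) ** (length - 1) for length in range(1, l_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


probs = span_length_distribution()
mean_length = sum(length * prob for length, prob in enumerate(probs, start=1))
print(round(mean_length, 2))  # about 3.8, matching the mean quoted above
```

Under this assumption the probabilities decrease monotonically with length, which is what "skewed towards shorter spans" means in practice.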

[Figure 2: distribution of span lengths]
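The sampling loop described in this section could be sketched roughly as follows. This is an illustration, not the authors' code. Several details are assumptions: the budget is counted in words rather than subword tokens, the last span is trimmed to fit the remaining budget, overlapping spans are rejected and resampled, and every selected word is simply replaced with [MASK]. The name `sample_span_mask` is hypothetical.

```python
import random


def sample_span_mask(words, mask_ratio=0.15, p=0.2, l_max=10, rng=None):
    """Return the set of word indices selected for masking."""
    if not words:
        return set()
    rng = rng or random.Random()
    lengths = list(range(1, l_max + 1))
    # Unnormalized geometric weights; random.choices normalizes them.
    weights = [p * (1 - p) ** (length - 1) for length in lengths]
    budget = min(len(words), max(1, round(len(words) * mask_ratio)))
    masked = set()
    while len(masked) < budget:
        length = rng.choices(lengths, weights=weights)[0]
        # Assumption: trim the span so the budget is never exceeded.
        length = min(length, budget - len(masked))
        start = rng.randrange(0, len(words) - length + 1)
        span = range(start, start + length)
        # Assumption: reject spans that overlap an already masked word.
        if any(i in masked for i in span):
            continue
        masked.update(span)
    return masked


words = [f"w{i}" for i in range(100)]
masked = sample_span_mask(words, rng=random.Random(0))
masked_words = ["[MASK]" if i in masked else w for i, w in enumerate(words)]
```

Because spans always start and end on whole words here, the word-boundary constraint from the text is satisfied by construction; a real pipeline would map each selected word to its subword tokens.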

3.2 Span Boundary Objective（SBO）

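Ideally, the representation at the end of a span should summarize as much of the span's internal content as possible. The SBO is therefore introduced: it predicts each token of a masked span using only the representations of the observed tokens at the boundaries (Figure 1).
Let $x_1, \ldots, x_n$ denote the output of the transformer encoder for each token in the sequence. Given a masked span of tokens $(x_s, \ldots, x_e) \in Y$, where $(s, e)$ are its start and end positions, the authors represent each token $x_i$ in the span using the output encodings of the external boundary tokens $x_{s-1}$ and $x_{e+1}$, together with the position embedding $p_{i-s+1}$ of the target token: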

$$y_i = f(x_{s-1}, x_{e+1}, p_{i-s+1})$$

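where the position embeddings $p_1, p_2, \ldots$ mark the relative positions of the masked tokens with respect to the left boundary token $x_{s-1}$.
The authors implement $f(\cdot)$ as a 2-layer feed-forward network with GeLU activations and layer normalization:
$$h_0 = [x_{s-1}; x_{e+1}; p_{i-s+1}]$$
$$h_1 = \mathrm{LayerNorm}(\mathrm{GeLU}(W_1 h_0))$$
$$y_i = \mathrm{LayerNorm}(\mathrm{GeLU}(W_2 h_1))$$
The vector representation $y_i$ is then used to predict the token $x_i$ and to compute a cross-entropy loss.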
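To make the shape of $f(\cdot)$ concrete, here is a minimal pure-Python sketch of a 2-layer feed-forward network with GeLU and layer normalization applied to the concatenated boundary encodings and position embedding. It is only an illustration: the weights are random, biases and the learnable layer-norm gain and offset are omitted, and the hidden size of 4 and all function names are assumptions.

```python
import math
import random


def gelu(v):
    # Exact GeLU: x * Phi(x), with Phi the standard normal CDF.
    return [0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0))) for x in v]


def layer_norm(v, eps=1e-5):
    # Normalization only; no learnable gain or offset (an assumption).
    mean = sum(v) / len(v)
    var = sum((x - mean) ** 2 for x in v) / len(v)
    return [(x - mean) / math.sqrt(var + eps) for x in v]


def linear(weight, v):
    # Matrix-vector product; weight is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in weight]


def sbo_representation(x_left, x_right, p_rel, w1, w2):
    """y_i = f(x_{s-1}, x_{e+1}, p_{i-s+1}) as a 2-layer feed-forward net."""
    h0 = x_left + x_right + p_rel  # list concatenation
    h1 = layer_norm(gelu(linear(w1, h0)))
    return layer_norm(gelu(linear(w2, h1)))


rng = random.Random(0)
d = 4  # toy hidden size
w1 = [[rng.uniform(-1, 1) for _ in range(3 * d)] for _ in range(d)]
w2 = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(d)]
x_left = [rng.uniform(-1, 1) for _ in range(d)]
x_right = [rng.uniform(-1, 1) for _ in range(d)]
p_rel = [rng.uniform(-1, 1) for _ in range(d)]
y_i = sbo_representation(x_left, x_right, p_rel, w1, w2)
```

Note that nothing inside the span enters the computation: only the two boundary encodings and the relative position of the target token are used.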

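For each token $x_i$ in a masked span $(x_s, \ldots, x_e)$, SpanBERT sums two losses: (1) the span boundary objective and (2) the regular masked language model (MLM) objective. The input embeddings are reused for the target tokens in both MLM and SBO.
$$\mathcal{L}(x_i) = \mathcal{L}_{\mathrm{MLM}}(x_i) + \mathcal{L}_{\mathrm{SBO}}(x_i)$$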
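The summed loss for one masked token can be illustrated with a small sketch. It assumes that each objective produces a vector of vocabulary logits for the token (one from the MLM head, one from the SBO head) and that each term is a standard cross-entropy. The function names and the toy vocabulary size are hypothetical.

```python
import math


def softmax(logits):
    # Subtract the max for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def token_loss(mlm_logits, sbo_logits, target_id):
    """Cross-entropy of the MLM head plus cross-entropy of the SBO head."""
    mlm_loss = -math.log(softmax(mlm_logits)[target_id])
    sbo_loss = -math.log(softmax(sbo_logits)[target_id])
    return mlm_loss + sbo_loss


# Toy vocabulary of 4 tokens with uniform predictions from both heads.
loss = token_loss([0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0], target_id=2)
```

With uniform predictions over 4 tokens, each term equals log 4, so the total is 2 log 4.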

3.3 Single-Sequence Training

Rather than two half-segments that sum up to n tokens together, SpanBERT simply samples a single contiguous segment of up to n = 512 tokens.
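A rough sketch of drawing one contiguous segment from a tokenized document is shown below. The notes do not say how the segment's position is chosen, so the uniformly random start offset is an assumption, as is the name `sample_segment`.

```python
import random


def sample_segment(doc_tokens, max_len=512, rng=None):
    """Return one contiguous segment of at most max_len tokens."""
    rng = rng or random.Random()
    if len(doc_tokens) <= max_len:
        return list(doc_tokens)
    # Assumption: the start offset is chosen uniformly at random.
    start = rng.randrange(0, len(doc_tokens) - max_len + 1)
    return list(doc_tokens[start:start + max_len])


doc = list(range(1000))  # stand-in for token ids
segment = sample_segment(doc, rng=random.Random(0))
```

In contrast to BERT's two-segment input, there is no second segment and therefore nothing for a next sentence prediction objective to classify.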

Summary

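In summary, SpanBERT pre-trains span representations by (1) masking spans of full words using a geometric-distribution-based masking scheme (Section 3.1), and (2) optimizing an auxiliary span-boundary objective (Section 3.2) in addition to MLM, using a single-sequence data pipeline (Section 3.3). A procedural description can be found in Appendix A of the paper.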
