Paper Walkthrough: Spelling Error Correction with Soft-Masked BERT

华师数据学院·王嘉宁

Article link: https://blog.csdn.net/qq_36426650/article/details/121533938

Paper Walkthrough: Spelling Error Correction with Soft-Masked BERT (ACL 2020)

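Spelling error correction is an important yet challenging task, because solving it well depends on human-level language understanding. This paper focuses on Chinese spelling error correction. The current state-of-the-art method is based on BERT: at each position of the sentence it selects one character from a candidate list (which includes leaving the character unchanged) as the correction. The accuracy of this approach can be sub-optimal, however, because BERT does not have sufficient capability to detect whether there is an error at each position, apparently because of the way it is pre-trained with masked language modeling.
To address this problem, the paper proposes a new end-to-end method based on BERT, consisting of an error detection network and an error correction network, with the two modules connected through the proposed soft-masking technique.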

Our method of using ‘Soft-Masked BERT’ is general, and it may be employed in other language detection-correction problems.

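Spelling error correction serves downstream applications such as search and OCR. This paper focuses on character-level correction.

Summary:

No.	Attribute	Value
1	Model name	SoftMasked BERT
2	Field	Natural language processing, Chinese spelling check
3	Research topic	Pre-trained language models
4	Core idea	Application of BERT
5	GitHub source	https://github.com/hiyoung123/SoftMaskedBert
6	Paper PDF	https://aclanthology.org/2020.acl-main.82.pdf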

1. Challenges:

World knowledge must be brought to bear on spelling error correction;
A certain amount of inference is required.

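2. Related Work and Motivation:

Previous spelling error correction methods fall into two groups: traditional machine learning methods and deep learning methods.
BERT is now commonly used for spelling error correction, but its error detection ability is still not good enough. The authors hypothesize that this is because masked language model pre-training masks only about 15% of the characters, so the model may learn only the distribution of masked tokens and tend not to make any correction.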

the way of pre-training BERT with mask language modeling in which only about 15% of the characters in the text are masked, and thus it only learns the distribution of masked tokens and tends to choose not to make any correction.

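This paper proposes Soft-Masked BERT, which consists of a detection network and a correction network:

detection network: uses a Bi-GRU to predict whether the character at each position is erroneous; the predicted probability is then used for soft masking;
correction network: uses BERT to predict the probability of each candidate correction.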

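Soft masking is an extension of hard masking:

hard masking: a 0/1 vector, where 0 means the position is left uncorrected and 1 means it is to be corrected;
soft masking: a fractional value (the error probability) at each position, used to build one embedding vector per character, which is then fed into the correction network.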

3. Method

Soft-Masked BERT is composed of a detection network based on Bi-GRU and a correction network based on BERT. The detection network predicts the probabilities of errors and the correction network predicts the probabilities of error corrections, while the former passes its prediction results to the latter using soft masking.

The model architecture is shown in the figure below:
[Figure: Soft-Masked BERT model architecture]

Detection Network

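Each input token is represented by an input embedding, the sum of its word embedding, position embedding, and segment embedding. These embeddings pass through a bidirectional GRU network, which performs binary classification at every position (1 means the token is erroneous, 0 means it is correct) and outputs the probability $p_i$ of label 1, i.e. the probability that the token is an error.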
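The detection step can be illustrated with a minimal pure-Python sketch. It assumes the Bi-GRU hidden states are already computed (the recurrent network itself is omitted) and applies a sigmoid over a linear projection to get the per-position error probability. The names `detect_errors`, `weights`, and `bias`, the toy values, and the 0.5 decision threshold are illustrative assumptions, not taken from the paper's code.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def detect_errors(hidden_states, weights, bias, threshold=0.5):
    """Turn each (assumed precomputed) Bi-GRU hidden state into an error
    probability p_i and a 0/1 label (1 = erroneous, 0 = correct).

    `threshold` is a hypothetical cut-off used only to derive the label.
    """
    probs = []
    for h in hidden_states:
        score = sum(w * x for w, x in zip(weights, h)) + bias
        probs.append(sigmoid(score))
    labels = [1 if p >= threshold else 0 for p in probs]
    return probs, labels


# Toy hidden states for a 3-character sentence (made-up values).
hidden_states = [[-0.5, 0.0], [2.0, 1.0], [-1.0, -1.0]]
probs, labels = detect_errors(hidden_states, weights=[1.0, 1.0], bias=0.0)
print(labels)
```

In this toy input only the middle position gets a positive score, so it is the only one flagged as a likely error.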
soft masking: the input embedding $e_i$ and the mask embedding $e_{mask}$ are combined in a weighted sum:
$e_i' = p_i\cdot e_{mask} + (1 - p_i)\cdot e_i$

The resulting $e_i'$ is the soft-masked embedding at position $i$.
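As a small worked example of the weighted sum above, the sketch below computes $e_i'$ for a toy two-dimensional embedding and contrasts it with a hard-masking variant. The function names, the toy vectors, and the `threshold` parameter of `hard_mask` are assumptions made for illustration.

```python
def soft_mask(e, e_mask, p):
    """Soft masking: e' = p * e_mask + (1 - p) * e, element-wise."""
    return [p * m + (1.0 - p) * x for x, m in zip(e, e_mask)]


def hard_mask(e, e_mask, p, threshold=0.5):
    """Hard masking (illustrative): replace the embedding with the mask
    embedding when p reaches a hypothetical threshold, else keep it."""
    return list(e_mask) if p >= threshold else list(e)


e_mask = [1.0, 1.0]   # toy [MASK] embedding
e_i = [0.0, 2.0]      # toy input embedding of one character

unchanged = soft_mask(e_i, e_mask, 0.0)   # p = 0: the input embedding is kept
masked = soft_mask(e_i, e_mask, 1.0)      # p = 1: fully replaced by the mask
blended = soft_mask(e_i, e_mask, 0.25)    # in between: a mixture of both
print(unchanged, masked, blended)
```

With $p_i = 0$ the character's own embedding passes through untouched, with $p_i = 1$ the position behaves like an ordinary masked token, and intermediate values interpolate between the two, which is what distinguishes soft masking from the all-or-nothing hard variant.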

Correction Network

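The soft-masked embeddings are fed into BERT's masked language modeling model.
Here $p_i$, the weight in the soft-masking formula, is the probability that position $i$ is erroneous. Taking the final-layer hidden state $h_i^c$ of BERT, a residual connection combines it with the input embedding $e_i$: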
$h_i' = h_i^c + e_i$
A multi-class classification over the candidate characters is then performed at each position to obtain the corrected result.
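The residual connection and the per-position classification can be sketched as follows. This is a minimal illustration under stated assumptions: the BERT hidden state is taken as given, the classifier is a single linear layer followed by softmax, and the three-character vocabulary, the weight matrix `W`, and the bias `b` are invented toy values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]


def correct_position(h_c, e, W, b, vocab):
    """Residual connection h' = h^c + e, then a softmax classifier over
    the candidate characters; returns the top candidate and all probabilities."""
    h = [hc + ei for hc, ei in zip(h_c, e)]
    logits = [sum(w * x for w, x in zip(row, h)) + bj for row, bj in zip(W, b)]
    probs = softmax(logits)
    best = max(range(len(vocab)), key=probs.__getitem__)
    return vocab[best], probs


vocab = ["天", "夫", "大"]                  # toy candidate characters
h_c = [0.5, -0.5]                          # toy final-layer BERT hidden state
e_i = [0.5, 1.5]                           # toy input embedding
W = [[2.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]  # one row of weights per candidate
b = [0.0, 0.0, 0.0]

char, probs = correct_position(h_c, e_i, W, b, vocab)
print(char)
```

Here the residual sum is `[1.0, 1.0]`, the logits are `[2.0, 1.0, -1.0]`, and the first candidate is selected.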
Learning
Training objective: the detection network and the correction network each have their own loss function: