A Robust Adversarial Training Approach to Machine Reading Comprehension


2020 AAAI. Baidu, Peking University, Xiamen University.

Motivation:

For robustness, one of the most promising ways is to augment the training dataset with adversarial examples.
However, since the types of adversarial examples are innumerable, it is not adequate to design them manually.

In this paper, we propose a novel robust adversarial training approach to improve the robustness of MRC models in a more generic way.

Specifically, the approach dynamically generates adversarial examples based on the parameters of the current model and further trains the model on the generated examples in an iterative schedule.
It does not require any specification of adversarial attack types.

[Figure: examples of misleading texts (AddSent, AddAnsCtx) added to a passage]
AddSent (Jia and Liang 2017) generates misleading text by modifying the question according to certain rules, followed by manual proofreading.
AddAnsCtx generates misleading text by removing the answer words from the answer sentences.

Method:
  1. Take a well-trained MRC model as the adversarial generator, and train perturbation embedding sequences to minimize the output probabilities of the real answers.
  2. Greedily sample word sequences from the perturbation embeddings as misleading texts to create and enrich the adversarial example set.
  3. Train the MRC model to maximize the probabilities of the real answers, so as to defend against those adversarial examples.

In detail:
During this generation stage, the model is treated as a generator and all model parameters are fixed.
The training only perturbs each passage input e_p with an additional perturbation embedding sequence.
[Equation: the perturbation embedding sequence e′ of length l is inserted into the passage embedding e_p at position k]

k is the insertion position index (chosen randomly);
l is the length of the perturbation sequence e′.
[Equation: each position of the perturbation sequence is a weighted combination of word embeddings]
For each position i, the weights over the vocabulary words sum to 1:
[Equation: normalization of the weights]
where α_ij is a trainable parameter (the weight of vocabulary word w_j at position i).
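A plausible reconstruction of the equations behind the two figures above (my own notation; the paper's exact symbols may differ): each position i of the perturbation sequence is a convex combination of the embeddings of the local-vocabulary words w_j,

$$
e'_i = \sum_{j=1}^{|V|} \hat{\alpha}_{ij}\, E(w_j), \qquad
\hat{\alpha}_{ij} = \frac{\exp(\alpha_{ij})}{\sum_{j'} \exp(\alpha_{ij'})}, \qquad
\sum_{j} \hat{\alpha}_{ij} = 1
$$

where E(w_j) is the word embedding of w_j and the softmax normalization guarantees that the weights at each position sum to 1.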

Question (my note): are W and α the same across different examples, or trained separately per example?

1. To generate misleading answer texts that distract the MRC model, a cross-entropy loss is designed.
It aims to cheat the model into believing that the answer is located inside the perturbation embedding sequence.

[Equation: cross-entropy loss on the distracting answer span s_d]

where s_d is the distracting answer span located inside the perturbation embedding sequence.
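For a span-extraction model with start/end distributions, this loss plausibly takes the standard form (my reconstruction, not copied from the paper):

$$
\mathcal{L}_d = -\log P_{\text{start}}\big(s_d^{\text{start}}\big) - \log P_{\text{end}}\big(s_d^{\text{end}}\big)
$$

so minimizing it pushes the model toward predicting the distracting span s_d inside the perturbation.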

2. To generate misleading context texts, a loss function is designed that minimizes the model's estimated probability of the ground-truth span s_g.
[Equation: loss term penalizing the probability of the ground-truth span s_g]
3. The overall training loss is defined by combining the two terms. The larger this loss, the harder the model is to fool; the smaller the loss, the better the generated noise.
[Equation: combined generator training loss L]
In addition, a regularization term R_s is added
to control the similarity between the perturbation embeddings and the question & answer.
[Equation: regularization term R_s]
sim(·, ·) is defined as a bag-of-words cosine similarity function:
[Equation: bag-of-words cosine similarity]
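Written out, the standard bag-of-words cosine similarity between two texts x and y is

$$
\text{sim}(x, y) = \frac{v_x \cdot v_y}{\lVert v_x \rVert\, \lVert v_y \rVert}
$$

where v_x and v_y are the bag-of-words (count) vectors of x and y; how exactly the perturbation embeddings are mapped back to a bag of words for this computation is not spelled out in my notes.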

Finally,
[Equation: final objective L plus the regularization term R_s]
We repeat the training process for each instance until the loss L converges or falls below a certain threshold, and then return the weight matrix W for further sampling.
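To make the generation step concrete, here is a minimal PyTorch-style sketch of the per-instance procedure described above. The MRC model interface, the way λ_q, λ_p, λ_c enter the loss, and the embedding-level stand-in for the bag-of-words similarity are all my own assumptions for illustration, not the authors' code.

```python
import torch
import torch.nn.functional as F

def train_perturbation(mrc_model, passage_emb, question_emb, vocab_emb,
                       k, l, s_g, s_d, lam_q=10.0, lam_p=-10.0, lam_c=0.5,
                       steps=200, threshold=1.5, lr=0.1):
    """Train a perturbation weight matrix for ONE instance; model params stay frozen.

    passage_emb:  (P, H) embeddings of the original passage tokens
    question_emb: (Q, H) embeddings of the question tokens
    vocab_emb:    (V, H) embeddings of the local vocabulary (V ~ 200)
    k: insertion position, l: perturbation length
    s_g: (start, end) of the ground-truth span in the ORIGINAL passage
    s_d: (start, end) of the distracting span, relative to the perturbation
    Assumes mrc_model(question_emb, passage_emb) returns start/end log-probs.
    """
    answer_emb = passage_emb[s_g[0]:s_g[1] + 1]            # ground-truth answer tokens
    g0, g1 = s_g
    if g0 >= k:                                            # the span shifts if it lies after k
        g0, g1 = g0 + l, g1 + l

    V, _ = vocab_emb.shape
    alpha = torch.zeros(l, V, requires_grad=True)          # trainable weights, one row per position
    opt = torch.optim.Adam([alpha], lr=lr)

    for _ in range(steps):
        w = F.softmax(alpha, dim=-1)                       # each row sums to 1
        pert = w @ vocab_emb                               # (l, H) perturbation embeddings
        perturbed = torch.cat([passage_emb[:k], pert, passage_emb[k:]], dim=0)

        log_p_start, log_p_end = mrc_model(question_emb, perturbed)

        # (1) cheat the model into predicting the distracting span inside the perturbation
        loss_d = -(log_p_start[k + s_d[0]] + log_p_end[k + s_d[1]])
        # (2) push probability mass away from the ground-truth span
        loss_g = log_p_start[g0] + log_p_end[g1]
        # (3) regularizer R_s: similarity to question and answer (mean-pooled embeddings
        #     used here as a stand-in for the paper's bag-of-words similarity)
        r_s = (lam_q * F.cosine_similarity(pert.mean(0), question_emb.mean(0), dim=0)
               + lam_p * F.cosine_similarity(pert.mean(0), answer_emb.mean(0), dim=0))
        loss = loss_d + loss_g + lam_c * r_s               # how lam_c enters is my assumption

        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() < threshold:                        # stop once the noise is good enough
            break

    return F.softmax(alpha, dim=-1).detach()               # weight matrix W for greedy sampling
```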

Greedy sampling

[Equation: greedy sampling takes the maximum-weighted token at each position]
We simply sample the maximum-weighted vocabulary token at each position of the perturbation.


Therefore, for each instance, generating a misleading text amounts to sampling the maximum-weighted token sequence from the trained weight matrix.
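Under the same assumed names as the sketch above, greedy sampling is just a per-position argmax over the returned weight matrix:

```python
def sample_misleading_text(weight_matrix, vocab):
    """Pick the maximum-weighted local-vocabulary token at each perturbation position."""
    return [vocab[int(row.argmax())] for row in weight_matrix]
```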

Retraining with Adversarial Examples

We enrich the training data with the sampled adversarial examples and retrain the models on the enriched data (data augmentation).
Given a misleading text A and its corresponding triple <q, p, s>, we insert the misleading text A back into the passage at its position k.
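A minimal sketch of rebuilding the training instance, assuming the passage is already tokenized, the answer span is token-indexed, and the misleading text is inserted between sentences so it never splits the answer (helper names are my own):

```python
def build_adversarial_instance(question, passage_tokens, answer_span, misleading_tokens, k):
    """Insert the misleading text A at position k to form a new <q, p', s> triple."""
    new_passage = passage_tokens[:k] + misleading_tokens + passage_tokens[k:]
    start, end = answer_span
    if start >= k:                           # answer positions after k shift by |A|
        start += len(misleading_tokens)
        end += len(misleading_tokens)
    return question, new_passage, (start, end)
```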

Overall procedure:
[Figure: the full iterative adversarial training pipeline]
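Putting the pieces together, my reading of the overall pipeline is roughly the following loop; train_perturbation, sample_misleading_text, and build_adversarial_instance are the hypothetical helpers sketched above, and embed_fn / build_vocab_fn / retrain_fn are assumed callables supplied by the underlying MRC implementation.

```python
import random

def robust_adversarial_training(mrc_model, train_data, embed_fn, build_vocab_fn, retrain_fn,
                                T=5, sample_ratio=0.05, pert_len=10):
    """Iteratively generate adversarial examples and retrain the MRC model.

    train_data:     list of (question, passage_tokens, answer_span) triples
    embed_fn:       maps a token list to a (len, H) embedding tensor    (assumed)
    build_vocab_fn: builds the ~200-token local vocabulary per instance (assumed)
    retrain_fn:     retrains the model on a dataset with early stopping (assumed)
    """
    data = list(train_data)
    for _ in range(T):                                            # at most T iterations
        adversarial = []
        for q, p, s in random.sample(data, int(sample_ratio * len(data))):
            vocab = build_vocab_fn(q, p)
            k = random.randrange(len(p))                          # random insertion position
            # (the paper inserts between sentences; token-level here for brevity)
            w = train_perturbation(mrc_model, embed_fn(p), embed_fn(q), embed_fn(vocab),
                                   k, pert_len, s_g=s,
                                   s_d=(pert_len // 4, 3 * pert_len // 4))  # distract span in the middle
            a = sample_misleading_text(w, vocab)
            adversarial.append(build_adversarial_instance(q, p, s, a, k))
        data = data + adversarial                                 # enrich the training set
        mrc_model = retrain_fn(mrc_model, data)                   # retrain with early stopping
    return mrc_model
```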

Experiments

Evaluation is on the standard SQuAD development set and five different types of adversarial test sets.
[Table: the adversarial test sets]
Experimental settings
• We randomly insert the perturbation embedding between sentences, so k is not fixed.
• We limit the perturbation sequence length l to 10.
• We randomly set λq and λp to -10 or 10, and set λc to 0.5.
• We set s_d to a span of random length in the middle of each perturbation embedding sequence.
• We set the loss threshold to 1.5 and the maximum number of training steps to 200 (most training losses become stable, with differences below 1e-3, around 200 steps).
• In the training iteration, we set the maximum number of iterations T to 5 and the training-loss stopping threshold to 12.0.
• We randomly sample 5% of the training data for adversarial training; larger ratios did not give satisfactory performance within a single iteration in our early experiments.
• After sampling, we retrain the MRC models following an early-stopping strategy.
• A local vocabulary is collected for each batch: for each training instance, we utilize a local vocabulary V in which the tokens are mainly related to the question and passage.
• To make the model easier to converge, the vocabulary size is limited to 200 (a simple construction is sketched after this list).
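One simple way to realize this per-instance local vocabulary (my own construction, not necessarily the authors'):

```python
from collections import Counter

def build_local_vocab(question_tokens, passage_tokens, max_size=200):
    """Collect tokens related to the question and passage, capped at max_size entries."""
    counts = Counter(question_tokens + passage_tokens)
    return [tok for tok, _ in counts.most_common(max_size)]
```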


Results

[Table: main results on the SQuAD dev set and the adversarial test sets]

[Figure: distribution of adversarial examples from the different datasets]

  • The ASD dataset overlaps more with AS and AA.
  • Our data has a more extensive distribution in the space; this extensiveness enables it to cover more types of adversarial examples.

Ablation study
[Table: ablation study]
An example of a generated result:

[Figure: example of a generated misleading text]
