[paper] EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES (FGSM)

ch1762

Category: AEs. Tags: deep learning, machine learning

Article link: https://blog.csdn.net/weixin_43150428/article/details/106028051

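Early explanations attributed adversarial examples to the nonlinearity and overfitting of neural networks. This paper argues instead that adversarial examples arise from the linear behavior of neural networks in high-dimensional spaces. It also proposes a simple and fast method for generating adversarial examples, the fast gradient sign method (FGSM).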

Summary of the paper's main points:

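Linear behavior makes a model easy to train, while nonlinear behavior helps it resist adversarial perturbations. In other words, models that are easy to optimize are also easy to perturb.
Adversarial training improves a model's robustness to adversarial examples.
Generic regularization strategies (such as dropout, pretraining, and model averaging) do not significantly reduce a model's vulnerability to adversarial examples, but switching to a nonlinear model family such as RBF networks can.
Adversarial examples arise from the linear behavior of neural networks in high-dimensional spaces.
In essence, FGSM perturbs the input image along the sign of the model's weights, which maximizes the dot product with the weights under a max-norm budget. A perturbation that is small in every pixel can therefore cause a large change in activation, which yields an adversarial example.
Adversarial examples tend to lie near the model's decision boundary, where they form contiguous adversarial regions.
Ensembling gives only limited resistance to adversarial examples; an ensemble is not substantially more robust than a single model.
The transferability of adversarial examples can be explained by different models learning similar decision boundaries: the more closely the learned weights agree, the more similar the boundaries.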

Linear explanation

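Let $w$ be the weight vector of the model and $\tilde{x} = x + \eta$ the adversarial input. Then $w^T\tilde{x} = w^Tx+w^T\eta$.
The adversarial perturbation increases the activation by $w^T\eta$. Under the max-norm constraint $||\eta||_{\infty} \le \epsilon$, this increase is maximized by choosing $\eta=\epsilon\,sign(w)$. If $w$ has $n$ dimensions and the average magnitude of its elements is $m$, the activation grows by $\epsilon mn$. The norm $||\eta||_{\infty}$ does not grow with the dimension, but the change in activation grows linearly with $n$, so in high dimensions many tiny changes to the input add up to one large change in the output. The paper describes this property as "accidental steganography":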
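The scaling argument above can be checked numerically. The following is a minimal sketch in plain Python, assuming weights drawn uniformly from $[-1, 1]$ and $\epsilon = 0.01$; the helper names `sign` and `activation_shift` are illustrative and do not come from the paper.

```python
import random


def sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def activation_shift(w, eps):
    """Change in w^T x caused by the perturbation eta = eps * sign(w)."""
    eta = [eps * sign(wi) for wi in w]
    return sum(wi * ei for wi, ei in zip(w, eta))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed; the weights are purely illustrative
    eps = 0.01              # max-norm budget, the same for every dimension n
    for n in (10, 1_000, 100_000):
        w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        m = sum(abs(wi) for wi in w) / n  # average weight magnitude
        # The shift equals eps * m * n, so it grows linearly with n
        # even though no input coordinate moves by more than eps.
        print(n, activation_shift(w, eps), eps * m * n)
```

The two printed quantities should agree up to floating-point rounding, since $w^T sign(w) = ||w||_1 = mn$.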

We can think of this as a sort of “accidental steganography,” where a linear model is forced to attend exclusively to the signal that aligns most closely with its weights, even if multiple signals are present and other signals have much greater amplitude.

FGSM

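Let $\theta$ be the model parameters, $x$ the input to the model, $y$ the target label associated with $x$, and $J(\theta,x,y)$ the loss function. Linearizing the loss around the current value of $\theta$ gives the optimal max-norm constrained adversarial perturbation:

$$\eta = \epsilon\,sign(\nabla_x J(\theta,x,y))$$

The adversarial example is then $\tilde{x} = x + \eta$. The required gradient can be computed efficiently with backpropagation.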
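As a concrete illustration of the perturbation $\eta = \epsilon\,sign(\nabla_x J(\theta,x,y))$, here is a minimal sketch for a logistic regression model with labels $y\in\{-1,1\}$, where the input gradient has a closed form. The model choice, the toy numbers, and the function names (`loss`, `grad_x`, `fgsm`) are assumptions made for the example; for a deep network the gradient would come from an autodiff framework instead.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(v):
    return (v > 0) - (v < 0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss(w, b, x, y):
    """Logistic loss J = softplus(-y * (w^T x + b)) for y in {-1, +1}."""
    return softplus(-y * (dot(w, x) + b))


def grad_x(w, b, x, y):
    """Gradient of the loss with respect to the input x."""
    coeff = -y * sigmoid(-y * (dot(w, x) + b))
    return [coeff * wi for wi in w]


def fgsm(w, b, x, y, eps):
    """Return x + eps * sign(grad_x J): one fast-gradient-sign step."""
    g = grad_x(w, b, x, y)
    return [xi + eps * sign(gi) for xi, gi in zip(x, g)]


if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.5], 0.1       # hypothetical trained parameters
    x, y = [0.3, 0.2, -0.1], 1         # hypothetical correctly classified input
    x_adv = fgsm(w, b, x, y, eps=0.25)
    print("clean loss:      ", loss(w, b, x, y))
    print("adversarial loss:", loss(w, b, x_adv, y))
```

Each coordinate of the input moves by at most $\epsilon$, yet the loss on the perturbed input is higher than on the clean one.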

Adversarial training

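The FGSM-based adversarial training objective is:

$$\tilde{J}(\theta,x,y) = \alpha J(\theta,x,y) + (1-\alpha)\,J(\theta,\ x+\epsilon\,sign(\nabla_x J(\theta,x,y)),\ y)$$

Here $\alpha$ balances the loss on clean inputs against the loss on their adversarial counterparts; the paper uses $\alpha = 0.5$.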
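To make the FGSM-based adversarial training objective concrete, the sketch below evaluates a weighted sum of clean and adversarial loss, $\alpha J(\theta,x,y) + (1-\alpha)J(\theta,\tilde{x},y)$, for a logistic regression model and takes plain gradient-descent steps on it. The model, the toy dataset used in the usage note, the learning rate, and all function names are assumptions for illustration, not the paper's experimental setup. The adversarial input is treated as a constant when differentiating with respect to the parameters, which is a common simplification.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(v):
    return (v > 0) - (v < 0)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss(w, b, x, y):
    return softplus(-y * (dot(w, x) + b))


def fgsm(w, b, x, y, eps):
    # For logistic regression, sign(grad_x J) = -y * sign(w).
    return [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]


def adversarial_objective(w, b, data, eps, alpha=0.5):
    """Mean of alpha * J(x, y) + (1 - alpha) * J(x_adv, y) over the data."""
    total = 0.0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, eps)
        total += alpha * loss(w, b, x, y) + (1 - alpha) * loss(w, b, x_adv, y)
    return total / len(data)


def train_step(w, b, data, eps, alpha=0.5, lr=0.1):
    """One gradient-descent step on the adversarial objective.

    The adversarial points are recomputed from the current parameters
    and then held fixed while the parameter gradient is taken.
    """
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, eps)
        for weight, point in ((alpha, x), (1 - alpha, x_adv)):
            coeff = -y * sigmoid(-y * (dot(w, point) + b))  # dJ/d(w^T x + b)
            grad_w = [g + weight * coeff * p for g, p in zip(grad_w, point)]
            grad_b += weight * coeff
    n = len(data)
    new_w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
    return new_w, b - lr * grad_b / n
```

A possible usage: start from `w, b = [0.0, 0.0], 0.0`, call `train_step` repeatedly on a small list of `(x, y)` pairs with `eps=0.1`, and monitor `adversarial_objective` to confirm that it decreases.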

Adversarial training of linear models versus weight decay

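Take logistic regression as an example to see how adversarial examples are generated.
Suppose the model distinguishes labels $y\in\{-1,1\}$ using $P(y=1)=\sigma(w^Tx+b)$.
Training with gradient descent then minimizes:

$$\mathbb{E}_{x,y\sim p_{data}}\ \zeta(-y(w^Tx+b))$$

where $\zeta(z)=log(1+exp(z))$ is the softplus function. For this model the sign of the gradient with respect to $x$ is $-sign(w)$ (more precisely $-y\,sign(w)$, which reduces to $-sign(w)$ for $y=1$), and $w^Tsign(w)=||w||_1$. The adversarial version of logistic regression therefore minimizes, as written in the paper:

$$\mathbb{E}_{x,y\sim p_{data}}\ \zeta(y(\epsilon||w||_1-w^Tx-b))$$

This expression resembles $L1$ regularization, but there are important differences.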

This is somewhat similar to L1 regularization. However, there are some important differences. Most significantly, the L1 penalty is subtracted off the model’s activation during training, rather than added to the training cost. This means that the penalty can eventually start to disappear if the model learns to make confident enough predictions that ζ saturates. This is not guaranteed to happen; in the underfitting regime, adversarial training will simply worsen underfitting. We can thus view L1 weight decay as being more “worst case” than adversarial training, because it fails to deactivate in the case of good margin.
If we move beyond logistic regression to multiclass softmax regression, L1 weight decay becomes even more pessimistic, because it treats each of the softmax’s outputs as independently perturbable, when in fact it is usually not possible to find a single η that aligns with all of the class’s weight vectors. Weight decay overestimates the damage achievable with perturbation even more in the case of a deep network with multiple hidden units. Because L1 weight decay overestimates the amount of damage an adversary can do, it is necessary to use a smaller L1 weight decay coefficient than the $\epsilon$ associated with the precision of our features.
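The difference between the two penalties can be seen numerically. The sketch below, in plain Python with illustrative names and toy numbers, compares three quantities for logistic regression: the loss evaluated at the FGSM point, the closed form $\zeta(\epsilon||w||_1 - y(w^Tx+b))$ (which matches the paper's expression for $y=1$), and an ordinary loss with an added L1 weight decay term $\lambda||w||_1$. The coefficient `lam` is a hypothetical weight decay strength.

```python
import math


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def sign(v):
    return (v > 0) - (v < 0)


def softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def loss_at_fgsm_point(w, b, x, y, eps):
    """Logistic loss evaluated at x_adv = x - eps * y * sign(w)."""
    x_adv = [xi - eps * y * sign(wi) for xi, wi in zip(x, w)]
    return softplus(-y * (dot(w, x_adv) + b))


def closed_form_adversarial_loss(w, b, x, y, eps):
    """softplus(eps * ||w||_1 - y * (w^T x + b)): penalty inside the activation."""
    l1 = sum(abs(wi) for wi in w)
    return softplus(eps * l1 - y * (dot(w, x) + b))


def l1_weight_decay_loss(w, b, x, y, lam):
    """Clean logistic loss plus lam * ||w||_1: penalty added to the cost."""
    l1 = sum(abs(wi) for wi in w)
    return softplus(-y * (dot(w, x) + b)) + lam * l1


if __name__ == "__main__":
    w, b = [2.0, -1.0, 0.5], 0.1
    confident_x = [10.0, -10.0, 10.0]  # large margin for the label y = 1
    # With a large margin the adversarial penalty saturates away,
    # while the L1 weight decay term stays at lam * ||w||_1.
    print(closed_form_adversarial_loss(w, b, confident_x, 1, eps=0.25))
    print(l1_weight_decay_loss(w, b, confident_x, 1, lam=0.25))
```

On a confidently classified input the adversarial loss is close to zero, whereas the weight decay loss stays near $\lambda||w||_1$, which is the sense in which L1 weight decay "fails to deactivate in the case of good margin".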
