EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES

EnEn1998

Article link: https://blog.csdn.net/weixin_39929275/article/details/108435846

Reading notes: EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES

ABSTRACT
INTRODUCTION
RELATED WORK
THE LINEAR EXPLANATION OF ADVERSARIAL EXAMPLES
LINEAR PERTURBATION OF NON-LINEAR MODELS
ADVERSARIAL TRAINING OF LINEAR MODELS VERSUS WEIGHT DECAY
QUESTIONS

First edited 2020.9.7 by EnEn

ABSTRACT

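This paper argues that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature.
It also provides a simple and fast method for generating adversarial examples.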

INTRODUCTION

In many cases, models with different architectures, trained on different subsets of the training data, misclassify the same adversarial example. This suggests that adversarial examples expose blind spots in our training algorithms that keep neural networks from classifying them correctly. The paper summarizes the earlier, speculative explanations as follows:
*The cause of these adversarial examples was a mystery, and speculative explanations have suggested it is due to extreme nonlinearity of deep neural networks, perhaps combined with insufficient model averaging and insufficient regularization of the purely supervised learning problem.*

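We disagree: linear behavior in high-dimensional spaces is sufficient to cause adversarial examples. We find that adding adversarial examples to the training set provides a regularization benefit beyond that of dropout alone. Generic regularization strategies (such as dropout, pretraining, and model averaging) do not significantly reduce a model's vulnerability to adversarial examples, but switching to nonlinear model families (such as RBF networks) can reduce it.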

This explanation implies a tradeoff between ease of training and robustness. In the paper's words:
Our explanation suggests a fundamental tension between designing models that are easy to train due to their linearity and designing models that use nonlinear effects to resist adversarial perturbation. In the long run, it may be possible to escape this tradeoff by designing more powerful optimization methods that can successfully train more nonlinear models.

RELATED WORK

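Box-constrained L-BFGS can reliably find adversarial examples.
On some datasets, adversarial examples are so close to the original images that the human eye cannot tell them apart.
The same adversarial example is often misclassified by classifiers trained on different training data or built with different architectures.
Shallow softmax regression models are also vulnerable to adversarial examples.
Training on adversarial examples can regularize the model, but this is impractical because generating them is expensive.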

THE LINEAR EXPLANATION OF ADVERSARIAL EXAMPLES

This section is adapted from @鱼非子子.
Why adversarial examples exist in linear models

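Input features have limited precision: a digital image uses 8 bits per pixel, so any information below $1/255$ of the dynamic range is discarded. Consequently, when every element of the perturbation $\eta$ added to a sample $x$ is smaller than the feature precision, the classifier should not be able to distinguish $x$ from the adversarial example $\tilde{x}=x+\eta$. In other words, for a well-behaved classifier, if $\epsilon$ is small enough to be discarded, then as long as $||\eta||_{\infty}<\epsilon$ we expect the classifier to assign $x$ and $\tilde{x}$ to the same class.
Figure: plot of $y = sign(x)$.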

Now consider the dot product of a weight vector $\omega$ and the adversarial example $\tilde{x}$: $\omega^{\top}\tilde{x}=\omega^{\top}(x+\eta)=\omega^{\top}x+\omega^{\top}\eta$. The adversarial perturbation increases the activation by $\omega^{\top}\eta$. Subject to the max-norm constraint on $\eta$, the authors maximize this increase by choosing $\eta=\epsilon\, sign(\omega)$. If $\omega$ has $n$ dimensions and the average magnitude of its elements is $m$, the activation grows by $\omega^{\top}\eta=\epsilon||\omega||_1=\epsilon mn$. $||\eta||_{\infty}$ does not change with the dimension $n$, but the increase in activation $\epsilon mn$ grows linearly with $n$. For a high-dimensional problem, many very small changes to the input can therefore add up to a large change in the output.

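The linear explanation therefore shows that a linear model is also vulnerable to adversarial examples, provided its input has sufficiently high dimensionality.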
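The scaling argument above can be checked numerically. The following is a minimal sketch, not code from the paper: the weights are random values invented for illustration, `activation_shift` is a hypothetical helper name, and $\epsilon$ is set to the 8-bit step $1/255$. It shows that the change in activation grows roughly linearly with $n$ while the max norm of the perturbation stays fixed.

```python
import random

def activation_shift(w, eps):
    """Change in w.x caused by the max-norm perturbation eta = eps * sign(w)."""
    sign = lambda v: (v > 0) - (v < 0)
    eta = [eps * sign(wi) for wi in w]
    # The dot product w.eta equals eps * ||w||_1.
    return sum(wi * ei for wi, ei in zip(w, eta))

random.seed(0)
eps = 1.0 / 255  # one 8-bit quantization step (assumed value for illustration)
shifts = {}
for n in (10, 1000, 100000):
    # Hypothetical weights with average magnitude m of about 0.5.
    w = [random.uniform(-1.0, 1.0) for _ in range(n)]
    shifts[n] = activation_shift(w, eps)
    print(n, round(shifts[n], 4))
```

Every element of the perturbation has magnitude `eps` regardless of `n`, yet the printed shift is approximately `eps * 0.5 * n`.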

LINEAR PERTURBATION OF NON-LINEAR MODELS

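Suppose neural networks are too linear to resist linear adversarial perturbation,
and let

$\theta$ be the parameters of the model
$x$ be the input to the model
$y$ be the targets associated with $x$
$J(\theta,x,y)$ be the cost function used to train the neural network
Linearizing $J(\theta,x,y)$ around the current value of $\theta$ gives the optimal max-norm constrained perturbation $\eta=\epsilon\, sign(\nabla_x J(\theta,x,y))$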

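This is the "fast gradient sign method" (FGSM) for generating adversarial examples; the required gradient can be computed efficiently with backpropagation.
With $\epsilon=0.25$:
a shallow softmax classifier on the MNIST dataset has an error rate of $99.9\%$ with an average confidence of $79.3\%$
a maxout classifier on the MNIST dataset has an error rate of $89.4\%$ with an average confidence of $97.6\%$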
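The fast gradient sign method can be sketched in a few lines. This is an illustrative sketch only: it uses a toy logistic regression model whose input gradient is known in closed form instead of a network trained with backpropagation, and the weights, input, and helper names (`loss_grad_x`, `fgsm`, `score`) are assumptions made for the example, not values or code from the paper.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sign(v):
    return (v > 0) - (v < 0)

def score(w, b, x):
    """Linear activation w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def loss_grad_x(w, b, x, y):
    """Gradient w.r.t. x of J = softplus(-y * (w.x + b)), for y in {-1, +1}."""
    z = score(w, b, x)
    return [-y * wi * sigmoid(-y * z) for wi in w]

def fgsm(x, grad, eps):
    """Fast gradient sign method: x_adv = x + eps * sign(grad_x J)."""
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]

# Hypothetical toy model and input, chosen only for illustration.
w, b = [2.0, -1.0, 0.5], 0.0
x, y = [0.2, 0.1, 0.1], 1
x_adv = fgsm(x, loss_grad_x(w, b, x, y), eps=0.25)

print(score(w, b, x), score(w, b, x_adv))
```

In this toy case the activation moves from positive to negative, so the predicted label flips even though no input element changes by more than `eps`.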

ADVERSARIAL TRAINING OF LINEAR MODELS VERSUS WEIGHT DECAY

The simplest probabilistic model is logistic regression.
Suppose we train a logistic regression model to recognize labels $y\in \{-1,1\}$ with $P(y=1)=\sigma(\omega^{\top}x+b)$ (so $P(y=-1)=1-\sigma(\omega^{\top}x+b)$), where $\sigma$ is the sigmoid function. Training then consists of gradient descent on $\mathbb{E}_{x,y\sim P_{data}}\zeta(-y(\omega^{\top}x+b))$, where $\zeta(z)=\log(1+e^z)$ is the softplus function.

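Applying FGSM to this model, the perturbation is $\eta=\epsilon\, sign(\nabla_x J(\theta,x,y))=\epsilon\, sign(\nabla_x \zeta(-y(\omega^{\top}x+b)))=\epsilon\, sign(-y\,\omega\,\sigma(-y(\omega^{\top}x+b)))=-\epsilon y\, sign(\omega)$, because $\sigma(\cdot)>0$. For $y=1$ this matches the paper's statement that the sign of the gradient is $-sign(\omega)$. Since $\omega^{\top}sign(\omega)=||\omega||_1$ and $y^2=1$, substituting $\tilde{x}=x+\eta$ gives the adversarial version of logistic regression: $\mathbb{E}_{x,y\sim P_{data}}\zeta(-y(\omega^{\top}\tilde{x}+b))=\mathbb{E}_{x,y\sim P_{data}}\zeta(\epsilon||\omega||_1-y(\omega^{\top}x+b))$. The paper writes this as $\mathbb{E}_{x,y\sim P_{data}}\zeta(y(\epsilon||\omega||_1-\omega^{\top}x-b))$, which is the same expression when $y=1$.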

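The formula above is very similar to $L^1$ regularization, with one important difference: in adversarial training the $L^1$ penalty is subtracted from the model's activation during training, whereas $L^1$ regularization adds the penalty to the training cost. As a result, once the model makes predictions confident enough that $\zeta$ saturates, the adversarial penalty can start to disappear.
Weight decay overestimates the damage a perturbation can achieve, and even more so in a deep network with multiple hidden units. Because $L^1$ weight decay overestimates the amount of damage an adversary can do, its coefficient must be smaller than the $\epsilon$ associated with the precision of the features.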
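The difference between the two penalties can be seen numerically. The sketch below is illustrative only: the weights, inputs, and helper names are made up, the adversarial objective is taken as $\zeta(\epsilon||\omega||_1-y(\omega^{\top}x+b))$, and the $L^1$ weight decay coefficient is set equal to $\epsilon$ purely to compare the two on the same scale.

```python
import math

def softplus(z):
    # Numerically stable log(1 + e^z).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def l1_norm(w):
    return sum(abs(wi) for wi in w)

def margin(w, b, x, y):
    return y * (sum(wi * xi for wi, xi in zip(w, x)) + b)

def adversarial_loss(w, b, x, y, eps):
    # The penalty sits inside softplus: it is subtracted from the margin.
    return softplus(eps * l1_norm(w) - margin(w, b, x, y))

def l1_decay_loss(w, b, x, y, lam):
    # The penalty is added to the cost, outside softplus.
    return softplus(-margin(w, b, x, y)) + lam * l1_norm(w)

def extra_cost(scale, eps=0.1):
    """Return (adversarial, L1-decay) cost added on top of the clean loss."""
    w, b = [1.0, -1.0], 0.0      # hypothetical weights
    x, y = [scale, -scale], 1    # margin = 2 * scale
    clean = softplus(-margin(w, b, x, y))
    return (adversarial_loss(w, b, x, y, eps) - clean,
            l1_decay_loss(w, b, x, y, eps) - clean)

for scale in (1.0, 100.0):
    print(scale, extra_cost(scale))
```

With a large margin the extra cost from the adversarial objective becomes negligible, while the $L^1$ weight decay term stays at `eps * ||w||_1` no matter how confident the prediction is.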

QUESTIONS

Can the sentence highlighted in yellow be read as "adversarial training keeps the model from fitting a suitable function, and $L^1$ weight decay is even less likely to fit one, because it fails to deactivate in the case of good margin"? This is somewhat similar to $L^1$ regularization. However, there are some important differences. Most significantly, the $L^1$ penalty is subtracted off the model’s activation during training, rather than added to the training cost. This means that the penalty can eventually start to disappear if the model learns to make confident enough predictions that ζ saturates. This is not guaranteed to happen; in the underfitting regime, adversarial training will simply worsen underfitting. We can thus view $L^1$ weight decay as being more “worst case” than adversarial training, because it fails to deactivate in the case of good margin.
In the previous question, what does "good margin" mean? We can thus view $L^1$ weight decay as being more “worst case” than adversarial training, because it fails to deactivate in the case of good margin.
Why does $L^1$ weight decay overestimate the damage an adversary can do? Weight decay overestimates the damage achievable with perturbation even more in the case of a deep network with multiple hidden units. Because $L^1$ weight decay overestimates the amount of damage an adversary can do, it is necessary to use a smaller $L^1$ weight decay coefficient than the ε associated with the precision of our features.
In $\mathbb{E}_{x,y\sim P_{data}}$ $\zeta(-y(\omega^{\top}\tilde{x}+b))$ $\underbrace{=}_{\tilde{x}=x+\eta,\eta=\epsilon-sign(\omega)}$ $\mathbb{E}_{x,y\sim P_{data}}$ $\zeta(y(\epsilon||\omega||_1-\omega^{\top}x-b))$,
expanding the left-hand side gives
$-y(\omega^{\top}\tilde{x}+b)$
$=-y(\omega^{\top}(x+\epsilon-sign(\omega))+b)$
$=y(||\omega||_1-\omega^{\top}\epsilon-\omega^{\top}x-b)$
This does not match the formula in the paper. In other words, why does $||\omega||_1-\omega^{\top}\epsilon$ equal $\epsilon||\omega||_1$?
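One possible resolution of the last question, offered as a sketch under an assumption: the subscript is read as the product $\eta=-\epsilon y\, sign(\omega)$ (the FGSM perturbation for logistic regression) and not as the difference $\epsilon-sign(\omega)$. Using $\omega^{\top}sign(\omega)=||\omega||_1$ and $y^2=1$:

$$
\begin{aligned}
-y(\omega^{\top}\tilde{x}+b)
&= -y\left(\omega^{\top}\left(x-\epsilon y\, sign(\omega)\right)+b\right) \\
&= \epsilon y^{2}\,\omega^{\top}sign(\omega)-y(\omega^{\top}x+b) \\
&= \epsilon||\omega||_1-y(\omega^{\top}x+b)
\end{aligned}
$$

For $y=1$ this equals $y(\epsilon||\omega||_1-\omega^{\top}x-b)$, the form printed in the paper. Under this reading the term $\omega^{\top}\epsilon$ never appears, because $\epsilon$ multiplies $sign(\omega)$ and is not added to it.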
