Paper notes: Explaining and Harnessing Adversarial Examples


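This paper extends earlier work on adversarial examples and shows that the phenomenon is not limited to neural networks. Before this paper, explanations focused on nonlinearity and overfitting. The paper instead argues that the main cause of these models' vulnerability to adversarial examples is their linear nature. It gives a quantitative analysis of why adversarial examples work across different architectures and training sets, and uses that analysis to derive a fast method for generating adversarial examples.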


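The earlier, speculative explanation for adversarial examples was the extreme nonlinearity of deep neural networks, combined with overfitting caused by insufficient model averaging and insufficient regularization in purely supervised models. This paper argues that linear behavior in a high-dimensional space is already sufficient to produce adversarial examples. It also argues that adversarial training can provide a regularization effect similar to dropout (though it was not applied in practice because of its training cost), whereas generic regularization techniques do not solve the adversarial example problem; switching to a nonlinear model family such as RBF networks does. There is therefore a tradeoff between linearity and nonlinearity: linear models are easier to train, while nonlinear models can resist adversarial examples. (In the long run, it may be possible to escape this tradeoff by designing more powerful optimization methods that can successfully train more nonlinear models.)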
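To make the high-dimensional argument concrete, here is a minimal NumPy sketch (my own illustration, not code from the paper). It assumes a single linear unit with random Gaussian weights `w` and perturbs the input by `eta = eps * sign(w)`, so the max-norm of the perturbation stays at `eps` while the activation shifts by `eps * ||w||_1`, which grows roughly linearly with the input dimension `n`. The function name and the choice of Gaussian weights are assumptions made for the example.

```python
import numpy as np

def linear_activation_shift(n, eps=0.01, seed=0):
    """Change in w.x when x is perturbed by eta = eps * sign(w)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=n)        # hypothetical weight vector
    eta = eps * np.sign(w)        # every coordinate moves by at most eps
    return float(w @ eta)         # equals eps * ||w||_1

# The per-coordinate change is fixed, but the total shift grows with n.
for n in (10, 1_000, 100_000):
    print(n, linear_activation_shift(n))
```

Each input coordinate changes by only 0.01, yet the shift in the activation keeps growing as dimensions are added, which is the core of the linear explanation.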


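"These results have often been interpreted as being a flaw in deep networks in particular, even though linear classifiers have the same problem." Only deep models have the capacity to fit a nonlinear function; shallow models cannot. People mistakenly assume that deep learning is more vulnerable to adversarial attacks, when in fact a deep model can at least represent a function that resists them, while shallow models cannot. A shallow model cannot give different outputs for different inputs while also giving the same output for nearby inputs. Of course, there is no theoretical guarantee that training will find a function with all the properties we want: standard supervised training does not ensure that the learned function resists adversarial examples, so this requirement has to be encoded explicitly in the training procedure.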

Fast generation of adversarial examples: Fast Gradient Sign Method

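The Fast Gradient Sign Method (FGSM) is simple, but because the perturbation is computed from w (through the sign of the gradient), the generated adversarial examples do not cause misclassification 100% of the time. The authors note that adversarial examples can also be generated in other ways, for example by rotating x by a small angle in the direction of the gradient. However, as the paper mentions later, adversarial training on examples generated this way does not regularize as well, perhaps because such transformations are easy to learn (a rotation matrix has a simple structure): "However, we did not find nearly as powerful of a regularizing result from this process, perhaps because these kinds of adversarial examples are not as difficult to solve." The fact that such simple methods produce adversarial examples is in turn evidence for the linear explanation.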
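As a concrete illustration, FGSM perturbs the input by η = ε · sign(∇ₓ J(θ, x, y)). The sketch below applies this to plain logistic regression, where the input gradient has the closed form (p − y) · w, so the perturbation really is determined by the sign of `w`. This is a minimal sketch under my own assumptions (random weights, a random "image" of 784 values, labels in {0, 1}, and the helper names `sigmoid` and `fgsm_logistic`); it is not the paper's code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_logistic(x, y, w, b, eps):
    """FGSM for logistic regression with label y in {0, 1}.

    Loss: J = -[y log p + (1 - y) log(1 - p)], with p = sigmoid(w.x + b).
    The gradient of J with respect to x is (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)   # step of size eps in max-norm

rng = np.random.default_rng(0)
w = rng.normal(size=784)               # hypothetical trained weights
b = 0.0
x = rng.uniform(0.0, 1.0, size=784)    # hypothetical input
y = 1.0                                # assumed true label

x_adv = fgsm_logistic(x, y, w, b, eps=0.25)
print("confidence in true class, clean:      ", sigmoid(x @ w + b))
print("confidence in true class, adversarial:", sigmoid(x_adv @ w + b))
```

For a deep network the only change is that `grad_x` comes from backpropagation instead of a closed form.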







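Adversarial training has a regularizing effect, but adversarial examples differ from other forms of data augmentation: augmented data usually comes from transformations that are expected to occur in the test set, whereas adversarial examples are unlikely to occur naturally. Earlier adversarial training results also did not beat dropout on the benchmark; the authors suggest this was partly because only a limited range of adversarial examples could be tried: "it was difficult to experiment extensively with expensive adversarial examples based on L-BFGS."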

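Adversarial training can be expressed as a regularized objective built on the fast gradient sign method [the equivalence here requires θ to be the parameters of an already trained model]. The experiments in the paper use α = 0.5 (an initial guess; other values may work better). Adversarial training of a network that also uses dropout gives a lower error rate than dropout alone (from 0.94% to 0.84%).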
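For reference, the FGSM-based adversarial objective can be written as below. This is my reconstruction from the paper's definitions, where J is the training loss, θ the parameters, ε the max-norm bound on the perturbation, and α the mixing weight between the clean and adversarial terms:

```latex
\tilde{J}(\theta, x, y)
  = \alpha \, J(\theta, x, y)
  + (1 - \alpha) \, J\bigl(\theta,\; x + \epsilon \, \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right),\; y\bigr)
```

With α = 0.5 the clean example and its FGSM counterpart contribute equally to the loss.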





One reason that the existence of adversarial examples can seem counter-intuitive is that most of us have poor intuitions for high dimensional spaces. We live in three dimensions, so we are not used to small effects in hundreds of dimensions adding up to create a large effect. There is another way that our intuitions serve us poorly.


The simple RBF networks with low capacity are naturally immune to adversarial examples, in the sense that they have low confidence when they are fooled. The error rate on adversarial examples generated with FGSM is still 55.4%, but the confidence on these mistaken examples is only 1.2%, compared with 60.6% on the clean test set. "We can't expect a model with such low capacity to get the right answer at all points of space, but it does correctly respond by reducing its confidence considerably on points it does not 'understand.'" In other words, the property we want is that on inputs the model does not "understand", even if it cannot generalize, it should not give a wrong answer with high confidence.
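The low-confidence behavior is easy to see in a toy version of an RBF unit. The sketch below is my own simplification, assuming a single prototype `mu` and a scalar precision `beta`, with confidence `exp(-beta * ||x - mu||^2)`; the paper's network is more general, so treat this only as an illustration of why confidence collapses away from the prototype.

```python
import numpy as np

def rbf_confidence(x, mu, beta=1.0):
    """Confidence of one hypothetical RBF unit: exp(-beta * ||x - mu||^2)."""
    diff = x - mu
    return float(np.exp(-beta * (diff @ diff)))

mu = np.zeros(784)        # hypothetical class prototype
near = mu + 0.01          # small per-coordinate offset
far = mu + 0.25           # FGSM-sized offset in every coordinate

print("near the prototype:", rbf_confidence(near, mu))
print("far from it:       ", rbf_confidence(far, mu))
```

Because the response depends on distance rather than on a dot product, a perturbation that moves every coordinate pushes the input away from the prototype and the unit simply stops responding, instead of extrapolating confidently as a linear unit would.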



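As mentioned earlier, adversarial examples generated for one model often remain effective against another model, even one trained on a different training set, and the models often assign the adversarial example to the same wrong class. The explanation is that adversarial perturbations are highly aligned with a model's weight vector, and that different models learn similar functions when trained to perform the same task. The paper supports this by measuring how correlated the mistakes of different models are. Adversarial examples are generated with a maxout network and then fed to an RBF network and a softmax classifier. On the examples the maxout network misclassifies, the RBF network predicts the maxout network's class 16.0% of the time and the softmax classifier 54.6% of the time (these numbers are driven partly by the models' different error rates). Restricted to examples where both models being compared make a mistake, softmax agrees with maxout's class 84.6% of the time and RBF 54.3% of the time. For comparison, RBF agrees with softmax's class 53.6% of the time. Is it necessary to look at what RBF predicts when softmax and maxout are both wrong? It seems computable: "a significant proportion of them are consistent with linear behavior being a major cause of cross-model generalization."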
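The "agreement on shared mistakes" statistic discussed above can be computed in a few lines. This is a minimal sketch with made-up predictions; the function name `agreement_on_shared_errors` and the toy arrays are hypothetical, and the paper does not publish code for this measurement.

```python
import numpy as np

def agreement_on_shared_errors(pred_a, pred_b, labels):
    """Among examples that both models misclassify, the fraction on which
    the two models predict the same (wrong) class."""
    pred_a, pred_b, labels = map(np.asarray, (pred_a, pred_b, labels))
    both_wrong = (pred_a != labels) & (pred_b != labels)
    if not both_wrong.any():
        return float("nan")    # undefined when there are no shared errors
    return float(np.mean(pred_a[both_wrong] == pred_b[both_wrong]))

# Toy data: five adversarial examples with true labels 0..4.
labels       = [0, 1, 2, 3, 4]
maxout_pred  = [1, 1, 0, 0, 4]   # wrong on examples 0, 2, 3
softmax_pred = [1, 1, 0, 2, 3]   # wrong on examples 0, 2, 3, 4

rate = agreement_on_shared_errors(maxout_pred, softmax_pred, labels)
print(rate)
```

The same function extends to the three-model question raised above: build the mask from examples where softmax and maxout are both wrong, then check which class RBF predicts on that subset.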





  1. One hypothesis is that generative training could provide more constraint on the training process, or cause the model to learn what to distinguish "real" from "fake" data and be confident only on "real" data.
  2. Another hypothesis about why adversarial examples exist is that individual models have strange quirks but averaging over many models can cause adversarial examples to wash out. With an ensemble of 12 maxout networks, the error rate on adversarial examples is still 91.1%. On adversarial examples generated against a single maxout network in the ensemble, the ensemble's error rate drops only to 87.9%, so ensembling gives some resistance to adversarial examples, but only a limited amount.



  1. Adversarial examples can be explained as a property of high-dimensional dot products. They are a result of models being too linear, rather than too nonlinear.
  2. The generalization of adversarial examples across different models can be explained as a result of adversarial perturbations being highly aligned with the weight vectors of a model, and different models learning similar functions when trained to perform the same task.
  3. The direction of perturbation, rather than the specific point in space, matters most. Space is not full of pockets of adversarial examples that finely tile the reals like the rational numbers.
  4. Because it is the direction that matters most, adversarial perturbations generalize across different clean examples. (Why?)
  5. We have introduced a family of fast methods for generating adversarial examples.
  6. We have demonstrated that adversarial training can result in regularization; even further regularization than dropout.
  7. We have run control experiments that failed to reproduce this effect with simpler but less efficient regularizers including L1 weight decay and adding noise.
  8. Models that are easy to optimize are easy to perturb.
  9. Linear models lack the capacity to resist adversarial perturbation; only structures with a hidden layer (where the universal approximator theorem applies) should be trained to resist adversarial perturbation.
  10. RBF networks are resistant to adversarial examples.
  11. Models trained to model the input distribution are not resistant to adversarial examples.
  12. Ensembles are not resistant to adversarial examples.

Some further observations concerning rubbish class examples are presented in the appendix:

  1. Rubbish class examples are ubiquitous and easily generated.
  2. Shallow linear models are not resistant to rubbish class examples.
  3. RBF networks are resistant to rubbish class examples.




