Written: 2017-12-15
Introduction
Adversarial attacks add a small perturbation to an image to fool an existing model, causing it to misclassify the image as a chosen label.
adversarial examples: inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input causes the model to output an incorrect answer with high confidence.
fooling images: images that are unrecognizable to humans but are nonetheless labeled as recognizable objects by DNNs. The authors demonstrated how a DNN will classify a noise-filled image constructed with their technique as a television with high confidence.
reason: we argue instead that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature.
phenomenon: several machine learning models, including state-of-the-art neural networks, are vulnerable to adversarial examples.
example: it is possible to take an image that a state-of-the-art convolutional network classifies as one class (e.g. “panda”) and change it almost imperceptibly to the human eye in such a way that the network suddenly classifies the image as any other class of choice (e.g. “gibbon”). We say that we break, or fool, ConvNets. See the image below for an illustration:
How to compute the small perturbation vector
FGSM(Fast Gradient Sign Method)
method: compute the gradient of the loss function J with respect to the input x, take its sign to build the perturbation matrix (see the figure above), and update the image:
x_adv = x + ε · sign(∇_x J(θ, x, y))
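The FGSM update can be sketched in a few lines of numpy. Here the "model" is a stand-in softmax regression (weights `W`, bias `b` are hypothetical placeholders for a trained network, not anything from the paper); the gradient of the cross-entropy loss with respect to the input is computed analytically for that model.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # hypothetical model: 3 classes, 4 input features
b = rng.normal(size=3)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def loss_grad_wrt_input(x, y):
    """Gradient dJ/dx of the cross-entropy loss for softmax regression."""
    p = softmax(W @ x + b)
    onehot = np.eye(3)[y]
    return W.T @ (p - onehot)

x = rng.normal(size=4)
y = int(np.argmax(softmax(W @ x + b)))  # the currently predicted label

eps = 0.5
# FGSM step: move each input coordinate by eps in the sign of the gradient
x_adv = x + eps * np.sign(loss_grad_wrt_input(x, y))
```

Because only the sign of the gradient is used, every coordinate of the perturbation has magnitude at most ε, which is what keeps the change imperceptible.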
JSMA
(1) compute the forward derivative ∇F(X∗) ,
Here i indexes the input features and j indexes the classes of the Jacobian ∂F_j/∂X_i.
(2) construct a saliency map S based on the derivative,
(3) modify the input feature i_max = argmax_i S(X, Y*)[i] by θ
Summary: differentiate the output Y with respect to the input X, build a saliency map S from that Jacobian, and update the most salient feature of x.
But what if we do not know the structure of the model? Can we still generate adversarial samples?
black-box attack
(a) has no information about the structure or parameters of the DNN,
(b) does not have access to any large training dataset.
strategy
1. Substitute Model Training: build a model F approximating the oracle model O's decision boundaries.
2. Adversarial Sample Crafting: use the substitute network F to craft adversarial samples, which are then misclassified by the oracle O due to the transferability of adversarial samples.
substitute model training
- Initial Collection (1): The adversary collects a very small set S_0 of inputs representative of the input domain.
- Architecture Selection (2): The adversary selects an architecture to be trained as the substitute F.
- Substitute Training: The adversary iteratively trains more accurate substitute DNNs F_ρ by repeating the following for ρ ∈ 0 … ρ_max:
- Labeling (3): By querying the oracle O for the output label O(x), the adversary labels each sample x ∈ S_ρ in its substitute training set S_ρ.
- Training (4): The adversary trains the architecture chosen at step (2) using substitute training set Sρ in conjunction with classical training techniques.
- Augmentation (5): The adversary applies the paper's augmentation technique to the substitute training set S_ρ to produce a larger substitute training set S_ρ+1 with more synthetic training points.
Summary: the method first fits a substitute model F whose decision boundaries match those of the unknown oracle model O, then uses either of the two methods introduced above to modify the input X.
How to fit F: since the adversary has no access to the original large training set, it starts from a small seed dataset and generates more data with the Jacobian-based dataset augmentation described in the paper.
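Jacobian-based dataset augmentation can be sketched as below. All concrete pieces here are stand-ins: `oracle` is a toy black-box labeler, and the substitute F is a hypothetical logistic-regression model assumed to be already (partially) trained. Each new point is the old point pushed by λ in the sign of the substitute's Jacobian for the oracle-assigned label, which probes the oracle near its decision boundary.

```python
import numpy as np

rng = np.random.default_rng(2)

def oracle(x):
    """Stand-in for the black-box oracle O: returns a label for x."""
    return int(x.sum() > 0)

# Hypothetical substitute model F: a sigmoid unit, assumed trained elsewhere.
w = rng.normal(size=4)

def substitute_grad(x, label):
    """Gradient of F's output for `label` with respect to the input x."""
    p = 1.0 / (1.0 + np.exp(-w @ x))
    g = p * (1 - p) * w            # derivative of the sigmoid output
    return g if label == 1 else -g

def jacobian_augmentation(S_rho, lam=0.1):
    """S_{rho+1} = S_rho ∪ {x + lam * sign(dF[O(x)]/dx) : x in S_rho}."""
    new_points = [x + lam * np.sign(substitute_grad(x, oracle(x)))
                  for x in S_rho]
    return S_rho + new_points

S0 = [rng.normal(size=4) for _ in range(5)]
S1 = jacobian_augmentation(S0)   # doubles the training set each round
```

Each augmentation round doubles the set, so only a handful of rounds (and oracle queries on the new points) are needed before the substitute tracks O's boundary well enough for the attack to transfer.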
Reference:
Fast Gradient Sign Method : https://arxiv.org/pdf/1412.6572.pdf
JSMA : https://arxiv.org/pdf/1511.07528.pdf
Practical Black-Box Attacks against Machine Learning : https://arxiv.org/pdf/1602.02697.pdf