Adversarial Samples Misleading the Model (Generating Adversarial Samples to Fool a Model)

Written on: 2017.12.15

Introduction

Add perturbations to an image in order to fool an existing model into misclassifying it as a specified label.

Adversarial examples – inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence.

Fooling images – images that are unrecognizable to humans but are nonetheless labeled as recognizable objects by DNNs; the authors demonstrated how a DNN will classify a noise-filled image constructed with their technique as a television with high confidence.

Reason – the FGSM paper argues that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature (rather than nonlinearity or overfitting, as earlier explanations suggested).

Phenomenon – several machine learning models, including state-of-the-art neural networks, are vulnerable to adversarial examples.

Example – it is possible to take an image that a state-of-the-art convolutional network classifies as one class (e.g. "panda") and change it almost imperceptibly to the human eye so that the network suddenly classifies it as any other class of choice (e.g. "gibbon"). We say that we break, or fool, ConvNets. See the image below for an illustration:

(Figure: a panda image plus an imperceptible perturbation is classified as a gibbon with high confidence.)

How to compute the small perturbation vector

FGSM (Fast Gradient Sign Method)

method:

$\eta = \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right)$

That is, differentiate the loss function with respect to the input x to generate the perturbation matrix; see the figure above for how the image is then updated (a minimal sketch follows).
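A minimal PyTorch sketch of one FGSM step, assuming `model` is a differentiable classifier taking a batched input tensor `x` with integer labels `y`; the `epsilon` value here is purely illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.01):
    """Craft x_adv = x + epsilon * sign(grad_x J(theta, x, y))."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # J(theta, x, y)
    loss.backward()                            # gradient of the loss w.r.t. the input
    # Take one step in the direction that increases the loss fastest under an L-inf budget.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```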

JSMA (Jacobian-based Saliency Map Attack)

(1) compute the forward derivative $\nabla F(X)$,

$\nabla F(X) = \dfrac{\partial F(X)}{\partial X} = \left[\dfrac{\partial F_j(X)}{\partial x_i}\right]_{i \in 1..N,\; j \in 1..K}$

Here $i$ indexes the input features ($N$ input dimensions) and $j$ indexes the output classes ($K$ classes).
(2) construct a saliency map S based on the derivative,

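For reference, the saliency map defined in the JSMA paper (for input features that should be increased to reach the target class $t$) has the form:

$$
S(X, t)[i] =
\begin{cases}
0, & \text{if } \dfrac{\partial F_t(X)}{\partial X_i} < 0 \ \text{ or } \ \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(X)}{\partial X_i} > 0 \\[2ex]
\dfrac{\partial F_t(X)}{\partial X_i} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(X)}{\partial X_i} \right|, & \text{otherwise}
\end{cases}
$$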

(3) modify the input feature $i_{\max} = \arg\max_i S(X, Y)[i]$ by $\theta$

Summary: differentiate the output Y with respect to the input X, construct the saliency map from that derivative, and then update x (a minimal sketch of one step follows).
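A minimal PyTorch sketch of one JSMA step under these definitions, assuming `model` maps a single (unbatched) input `x` of N features to K logits, `target` is the desired class, and `theta` is the per-feature perturbation; this is a simplified single-feature variant, whereas the paper perturbs pairs of features:

```python
import torch

def jsma_step(model, x, target, theta=1.0):
    """One simplified JSMA step: forward derivative -> saliency map -> perturb one feature."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)                      # shape (K,)
    # Forward derivative (Jacobian): dF_j(X)/dx_i, shape (K, N) for a flattened input.
    jac = torch.stack([torch.autograd.grad(logits[j], x, retain_graph=True)[0].flatten()
                       for j in range(logits.shape[0])])
    alpha = jac[target]                                            # dF_t(X)/dx_i
    beta = jac.sum(dim=0) - alpha                                  # sum over classes j != t
    # Saliency map: zero where the feature does not help the target, else alpha * |beta|.
    sal = torch.where((alpha >= 0) & (beta <= 0), alpha * beta.abs(), torch.zeros_like(alpha))
    i_max = sal.argmax()                                           # i_max = argmax_i S(X, Y)[i]
    x_adv = x.detach().flatten().clone()
    x_adv[i_max] += theta                                          # modify the chosen feature by theta
    return x_adv.view_as(x)
```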

But what if we do not know the structure of the model? Can we still generate adversarial samples?

black-box attack

The adversary (a) has no information about the structure or parameters of the DNN, and
(b) does not have access to any large training dataset.

strategy

1. Substitute Model Training: build a model F approximating the oracle model O's decision boundaries.
2. Adversarial Sample Crafting: use the substitute network F to craft adversarial samples, which are then misclassified by the oracle O due to the transferability of adversarial samples.

substitute model training

  1. Initial Collection (1): The adversary collects a very small set S_0 of inputs representative of the input domain.
  2. Architecture Selection (2): The adversary selects an architecture to be trained as the substitute F.
  3. Substitute Training: The adversary iteratively trains more accurate substitute DNNs F_ρ by repeating the following for ρ = 0 .. ρ_max (a minimal sketch of this loop is given after the list):
    • Labeling (3): The adversary labels each sample x ∈ S_ρ in its substitute training set S_ρ by querying the oracle O for the output label O(x).
    • Training (4): The adversary trains the architecture chosen at step (2) on the substitute training set S_ρ using classical training techniques.
    • Augmentation (5): The adversary applies the Jacobian-based augmentation technique to the substitute training set S_ρ to produce a larger substitute training set S_{ρ+1} with more synthetic training points.
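A minimal PyTorch sketch of this loop under stated assumptions: `oracle` is a callable returning an integer label for a single example, `seed_inputs` is a small list of same-shaped tensors, and `make_model` builds the chosen substitute architecture; the helper `jacobian_sign` and the hyperparameters `lam`, `epochs`, `lr` are illustrative names, not from the paper.

```python
import torch
import torch.nn.functional as nnF

def jacobian_sign(model, x, label):
    """Sign of the substitute's Jacobian column for the oracle-assigned label."""
    x = x.clone().detach().requires_grad_(True)
    out = model(x.unsqueeze(0)).squeeze(0)
    return torch.autograd.grad(out[label], x)[0].sign()

def train_substitute(oracle, seed_inputs, make_model, rho_max=3, lam=0.1, epochs=10, lr=0.01):
    """Steps (3)-(5): label with the oracle, train the substitute, augment, repeat."""
    S = [x.clone() for x in seed_inputs]               # S_0: small representative seed set
    for rho in range(rho_max + 1):
        ys = torch.tensor([oracle(x) for x in S])       # Labeling: query the oracle O for O(x)
        X = torch.stack(S)
        model = make_model()                            # architecture chosen in step (2)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):                         # Training: classical supervised training on S_rho
            opt.zero_grad()
            nnF.cross_entropy(model(X), ys).backward()
            opt.step()
        if rho < rho_max:                               # Augmentation: Jacobian-based dataset augmentation
            S = S + [x + lam * jacobian_sign(model, x, int(y)) for x, y in zip(S, ys)]
    return model
```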

(Figure: overview of the substitute DNN training procedure, steps (1)–(5).)

Summary: the method first fits a substitute model F that has approximately the same decision boundaries as the unknown model O, and then uses the two methods introduced above to modify the input X.
How to fit F: since the adversary does not have the original large dataset, it starts from a small seed set and generates additional data with the Jacobian-based Dataset Augmentation described in the paper.
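Written out, the augmentation step from the referenced black-box paper has the form (λ is a small step size, J_F is the Jacobian of the substitute F, and O(x) is the oracle's label for x):

$$
S_{\rho+1} = \left\{\, x + \lambda \cdot \operatorname{sgn}\!\big( J_F(x)[\,O(x)\,] \big) \;:\; x \in S_\rho \,\right\} \cup S_\rho
$$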


References:
Fast Gradient Sign Method : https://arxiv.org/pdf/1412.6572.pdf
JSMA : https://arxiv.org/pdf/1511.07528.pdf
Practical Black-Box Attacks against Machine Learning : https://arxiv.org/pdf/1602.02697.pdf
