Adversarial Samples Misleading the Model (Generating Adversarial Samples to Fool a Model)

Written on: 2017.12.15

Introduction

Add perturbations to an image in order to fool an existing model into misclassifying it as a specified label.

Adversarial examples – inputs formed by applying small but intentionally worst-case perturbations to examples from the dataset, such that the perturbed input results in the model outputting an incorrect answer with high confidence.

Fooling images – images that are unrecognizable to humans but are nonetheless labeled as recognizable objects by DNNs; the authors demonstrated how a DNN will classify a noise-filled image constructed with their technique as a television with high confidence.

Reason – the FGSM paper argues that the primary cause of neural networks' vulnerability to adversarial perturbation is their linear nature (rather than nonlinearity or overfitting, as earlier explanations suggested).

Phenomenon – several machine learning models, including state-of-the-art neural networks, are vulnerable to adversarial examples.

Example – it is possible to take an image that a state-of-the-art convolutional network classifies as one class (e.g. "panda") and change it almost imperceptibly to the human eye so that the network suddenly classifies it as any other class of choice (e.g. "gibbon"). We say that we break, or fool, ConvNets. See the image below for an illustration:

(Figure: a panda image plus an imperceptible perturbation is classified as a gibbon with high confidence.)

How to compute the small perturbation vector

FGSM (Fast Gradient Sign Method)

method:

$\eta = \epsilon \cdot \mathrm{sign}\left(\nabla_x J(\theta, x, y)\right)$

That is, differentiate the loss function with respect to the input x to generate the perturbation matrix; see the figure above for how the image is then updated (a minimal sketch follows).
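A minimal PyTorch sketch of one FGSM step, assuming `model` is a differentiable classifier taking a batched input tensor `x` with integer labels `y`; the `epsilon` value here is purely illustrative:

```python
import torch
import torch.nn.functional as F

def fgsm_example(model, x, y, epsilon=0.01):
    """Craft x_adv = x + epsilon * sign(grad_x J(theta, x, y))."""
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)   # J(theta, x, y)
    loss.backward()                            # gradient of the loss w.r.t. the input
    # Take one step in the direction that increases the loss fastest under an L-inf budget.
    return (x_adv + epsilon * x_adv.grad.sign()).detach()
```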

JSMA (Jacobian-based Saliency Map Attack)

(1) compute the forward derivative $\nabla F(X)$,

$\nabla F(X) = \dfrac{\partial F(X)}{\partial X} = \left[\dfrac{\partial F_j(X)}{\partial x_i}\right]_{i \in 1..N,\; j \in 1..K}$

Here $i$ indexes the input features ($N$ input dimensions) and $j$ indexes the output classes ($K$ classes).
(2) construct a saliency map S based on the derivative,

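For reference, the saliency map defined in the JSMA paper (for input features that should be increased to reach the target class $t$) has the form:

$$
S(X, t)[i] =
\begin{cases}
0, & \text{if } \dfrac{\partial F_t(X)}{\partial X_i} < 0 \ \text{ or } \ \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(X)}{\partial X_i} > 0 \\[2ex]
\dfrac{\partial F_t(X)}{\partial X_i} \left| \displaystyle\sum_{j \neq t} \dfrac{\partial F_j(X)}{\partial X_i} \right|, & \text{otherwise}
\end{cases}
$$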

(3) modify the input feature $i_{\max} = \arg\max_i S(X, Y)[i]$ by $\theta$

Summary: differentiate the output Y with respect to the input X, construct the saliency map from that derivative, and then update x (a minimal sketch of one step follows).
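A minimal PyTorch sketch of one JSMA step under these definitions, assuming `model` maps a single (unbatched) input `x` of N features to K logits, `target` is the desired class, and `theta` is the per-feature perturbation; this is a simplified single-feature variant, whereas the paper perturbs pairs of features:

```python
import torch

def jsma_step(model, x, target, theta=1.0):
    """One simplified JSMA step: forward derivative -> saliency map -> perturb one feature."""
    x = x.clone().detach().requires_grad_(True)
    logits = model(x.unsqueeze(0)).squeeze(0)                      # shape (K,)
    # Forward derivative (Jacobian): dF_j(X)/dx_i, shape (K, N) for a flattened input.
    jac = torch.stack([torch.autograd.grad(logits[j], x, retain_graph=True)[0].flatten()
                       for j in range(logits.shape[0])])
    alpha = jac[target]                                            # dF_t(X)/dx_i
    beta = jac.sum(dim=0) - alpha                                  # sum over classes j != t
    # Saliency map: zero where the feature does not help the target, else alpha * |beta|.
    sal = torch.where((alpha >= 0) & (beta <= 0), alpha * beta.abs(), torch.zeros_like(alpha))
    i_max = sal.argmax()                                           # i_max = argmax_i S(X, Y)[i]
    x_adv = x.detach().flatten().clone()
    x_adv[i_max] += theta                                          # modify the chosen feature by theta
    return x_adv.view_as(x)
```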

But what if we do not know the structure of the model? Can we still generate adversarial samples?

black-box attack

The adversary (a) has no information about the structure or parameters of the DNN, and
(b) does not have access to any large training dataset.

strategy

1. Substitute Model Training: build a model F approximating the oracle model O's decision boundaries.
2. Adversarial Sample Crafting: use the substitute network F to craft adversarial samples, which are then misclassified by the oracle O due to the transferability of adversarial samples.

substitute model training

  1. Initial Collection (1): The adversary collects a very small set S_0 of inputs representative of the input domain.
  2. Architecture Selection (2): The adversary selects an architecture to be trained as the substitute F.
  3. Substitute Training: The adversary iteratively trains more accurate substitute DNNs F_ρ by repeating the following for ρ = 0 .. ρ_max (a minimal sketch of this loop is given after the list):
    • Labeling (3): The adversary labels each sample x ∈ S_ρ in its substitute training set S_ρ by querying the oracle O for the output label O(x).
    • Training (4): The adversary trains the architecture chosen at step (2) on the substitute training set S_ρ using classical training techniques.
    • Augmentation (5): The adversary applies the Jacobian-based augmentation technique to the substitute training set S_ρ to produce a larger substitute training set S_{ρ+1} with more synthetic training points.
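A minimal PyTorch sketch of this loop under stated assumptions: `oracle` is a callable returning an integer label for a single example, `seed_inputs` is a small list of same-shaped tensors, and `make_model` builds the chosen substitute architecture; the helper `jacobian_sign` and the hyperparameters `lam`, `epochs`, `lr` are illustrative names, not from the paper.

```python
import torch
import torch.nn.functional as nnF

def jacobian_sign(model, x, label):
    """Sign of the substitute's Jacobian column for the oracle-assigned label."""
    x = x.clone().detach().requires_grad_(True)
    out = model(x.unsqueeze(0)).squeeze(0)
    return torch.autograd.grad(out[label], x)[0].sign()

def train_substitute(oracle, seed_inputs, make_model, rho_max=3, lam=0.1, epochs=10, lr=0.01):
    """Steps (3)-(5): label with the oracle, train the substitute, augment, repeat."""
    S = [x.clone() for x in seed_inputs]               # S_0: small representative seed set
    for rho in range(rho_max + 1):
        ys = torch.tensor([oracle(x) for x in S])       # Labeling: query the oracle O for O(x)
        X = torch.stack(S)
        model = make_model()                            # architecture chosen in step (2)
        opt = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):                         # Training: classical supervised training on S_rho
            opt.zero_grad()
            nnF.cross_entropy(model(X), ys).backward()
            opt.step()
        if rho < rho_max:                               # Augmentation: Jacobian-based dataset augmentation
            S = S + [x + lam * jacobian_sign(model, x, int(y)) for x, y in zip(S, ys)]
    return model
```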

(Figure: overview of the substitute DNN training procedure, steps (1)–(5).)

Summary: the method first fits a substitute model F that has approximately the same decision boundaries as the unknown model O, and then uses the two methods introduced above to modify the input X.
How to fit F: since the adversary does not have the original large dataset, it starts from a small seed set and generates additional data with the Jacobian-based Dataset Augmentation described in the paper.
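Written out, the augmentation step from the referenced black-box paper has the form (λ is a small step size, J_F is the Jacobian of the substitute F, and O(x) is the oracle's label for x):

$$
S_{\rho+1} = \left\{\, x + \lambda \cdot \operatorname{sgn}\!\big( J_F(x)[\,O(x)\,] \big) \;:\; x \in S_\rho \,\right\} \cup S_\rho
$$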


References:
Fast Gradient Sign Method : https://arxiv.org/pdf/1412.6572.pdf
JSMA : https://arxiv.org/pdf/1511.07528.pdf
Practical Black-Box Attacks against Machine Learning : https://arxiv.org/pdf/1602.02697.pdf
