Start point
Importance
The task of adding imperceptible noise to the training data in order to confuse any corresponding classifier trained on it, causing it to make wrong predictions as often as possible when facing clean test data.
Disadvantages of previous work
Data poisoning
Focuses on the restriction that only a small fraction of the training data is allowed to change, whereas this work focuses on adding bounded noise that is as small as possible.
Adversarial examples
- Since the classifier is given and fixed, there is no two-party game involved.
- Deep models are very sensitive to such adversarial examples due to the high dimensionality of the input data and the linear nature inside deep neural networks.
Background and hypothesis
Research background
In other words, we consider the task of adding imperceptible noise to the training data, hoping to maximally confuse any corresponding classifier trained on it, causing it to make wrong predictions as often as possible when facing clean test data.
Hypothesis
- Use an imaginary neural network as the victim classifier
- Measure the predictive accuracy between the true labels and the classifier's predictions when it takes only the adversarial noise as input (see the sketch after this list).
- It is the linearity inside deep models that makes the adversarial noise effective.
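A minimal sketch of this noise-only check, assuming PyTorch, a trained victim classifier `f_theta`, and a trained noise generator `g_xi` (all names are illustrative, not the paper's code):

```python
import torch

def noise_only_accuracy(f_theta, g_xi, test_loader):
    """Feed ONLY the generated noise g_xi(x) into the classifier and measure
    agreement with the true labels; high accuracy suggests the noise itself
    carries the label information."""
    f_theta.eval()
    g_xi.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in test_loader:
            preds = f_theta(g_xi(x)).argmax(dim=1)  # noise pattern only, no x
            correct += (preds == y).sum().item()
            total += y.numel()
    return correct / total
```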
Central idea
In this work, we propose a general framework for generating training-time adversarial data by letting an auto-encoder watch and move against an imaginary victim classifier. We further propose a simple yet effective training scheme that trains both networks by decoupling the alternating update procedure for stability. Experiments on image data confirm the effectiveness of the proposed method; in particular, the adversarial data remains effective even when a different classifier is used, making the approach more useful in realistic settings.
Plan
Module 1: DeepConfuse
Goal
The goal of this work is to perturb the training data by adding artificial, imperceptible noise such that, at test time, the classifier's behavior will be dramatically different on the clean test set.
Basic idea
First, define a noise generator that takes one training sample x in X and transforms it into an imperceptible noise pattern in the same space X. Choose the noise generator to be an encoder-decoder neural network. The objective is to find a noise generator such that the classifier trained on the perturbed data has the worst possible performance on the clean test set.
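Written out, this is a bi-level optimization (a reconstruction from the description above; L denotes the classification loss and ε the perturbation budget):

```latex
\max_{\xi} \ \mathbb{E}_{(x,y)\sim \mathcal{D}_{\mathrm{test}}}
  \Big[ \mathcal{L}\big(f_{\theta^{*}(\xi)}(x),\, y\big) \Big]
\quad \text{s.t.} \quad
\theta^{*}(\xi) = \arg\min_{\theta} \sum_{(x_i,y_i)\in \mathcal{D}_{\mathrm{train}}}
  \mathcal{L}\big(f_{\theta}(x_i + g_{\xi}(x_i)),\, y_i\big),
\qquad \lVert g_{\xi}(x) \rVert_{\infty} \le \epsilon .
```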
Problem
This non-convex bi-level optimization problem is challenging, especially due to the nonlinear equality constraint: the classifier's optimal parameters are themselves defined as the result of training on the perturbed data.
Module 2: Memory-Efficient DeepConfuse (using some commonly accepted tricks from reinforcement learning for stability)
Assumption
fθ (the victim classifier) and gξ (the noise generator) are both neural networks.
Goal
The same as module 1
Basic idea
Alternately update fθ on the adversarial training data via gradient descent, and update gξ on the clean data via gradient ascent.
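Note that the ascent on gξ must differentiate through the classifier's dependence on ξ, i.e., through (at least) one simulated update of fθ; otherwise no gradient reaches the generator. A hedged sketch of this core ξ-update with a one-step unroll, assuming PyTorch 2.x (`torch.func.functional_call`); all names are illustrative:

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def unrolled_update(f_theta, g_xi, opt_g, train_batch, clean_batch, lr_f, eps):
    """Update g_xi by gradient ascent on the clean-data loss, differentiating
    through one simulated gradient-descent step of the classifier."""
    x, y = train_batch
    xc, yc = clean_batch
    params = dict(f_theta.named_parameters())

    # simulated classifier step on perturbed data (keeps the graph back to xi)
    noise = g_xi(x).clamp(-eps, eps)
    loss_f = F.cross_entropy(functional_call(f_theta, params, (x + noise,)), y)
    grads = torch.autograd.grad(loss_f, list(params.values()), create_graph=True)
    params_next = {k: v - lr_f * g for (k, v), g in zip(params.items(), grads)}

    # clean-data loss at the simulated parameters; ascend w.r.t. xi
    loss_clean = F.cross_entropy(functional_call(f_theta, params_next, (xc,)), yc)
    opt_g.zero_grad()
    (-loss_clean).backward()
    opt_g.step()
```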
Problem
If we directly use this alternating approach, neither fθ nor gξ converges in practice; the remedy (per the central idea above) is to decouple the alternating update procedure.
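One way to realize the decoupling (a hedged sketch, not the paper's exact Algorithm 2): first let fθ run a full pass over the adversarial data while storing its parameter trajectory, then replay the stored trajectory to update gξ, instead of interleaving the two updates per batch. Reusing the hypothetical `unrolled_update` above:

```python
import copy
import torch
import torch.nn.functional as F

def decoupled_epoch(f_theta, g_xi, opt_f, opt_g, loader, lr_f, eps):
    # phase 1: train the classifier on adversarial data with g_xi frozen,
    # storing a checkpoint BEFORE each update (the point the unroll simulates)
    snapshots = []
    for x, y in loader:
        snapshots.append(copy.deepcopy(f_theta.state_dict()))
        with torch.no_grad():
            noise = g_xi(x).clamp(-eps, eps)
        loss = F.cross_entropy(f_theta(x + noise), y)
        opt_f.zero_grad()
        loss.backward()
        opt_f.step()

    # phase 2: replay the trajectory; at each stored point, update g_xi
    # (in practice checkpoints would be thinned to keep memory bounded)
    for snap, (x, y) in zip(snapshots, loader):
        f_theta.load_state_dict(snap)
        unrolled_update(f_theta, g_xi, opt_g, (x, y), (x, y), lr_f, eps)
```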
Label Specific Adversaries
Goal
- Want the classifier to make wrong predictions
- Want the classifier's wrong predictions to match specific, predefined labels
Main idea
Denote by η a predefined label-transformation function that maps one label to another.
Method
Replace the gradient ascent with gradient descent in line 10 of Algorithm 2, and replace y with η(y) in the same line, keeping everything else unchanged.
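For illustration (these are assumptions, not the paper's exact choices): η could be a cyclic shift over the K classes, and the generator update from the earlier sketch would then descend on the clean-data loss measured against η(y):

```python
def eta(y, num_classes=10):
    # example label transformation: cyclic shift; any fixed mapping works
    return (y + 1) % num_classes

# label-specific variant of the final lines of `unrolled_update`:
# loss_clean = F.cross_entropy(functional_call(f_theta, params_next, (xc,)), eta(yc))
# opt_g.zero_grad(); loss_clean.backward(); opt_g.step()  # descent, not ascent
```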
Experiment
Dataset
Classical MNIST, CIFAR-10, and a subset of ImageNet
Evaluation
Performance Evaluation of Training Time Adversary
- Test accuracy when the classifier is trained on the original training set versus the adversarial training set.
- Concretely, we fit a PCA [24] model on the final hidden layer's output of each fθ on the adversarial training data; then, using the same projection model, we project the clean data into the same space. The visualization shows that the classifier trained on the adversarial data cannot differentiate the clean samples (see the sketch after this list).
- Test accuracy of models trained on adversarial training data generated under different perturbation budgets ε.
- Examine the results when the training data is only partially modified. Concretely, under different perturbation constraints, we varied the percentage of adversarial samples in the training data while keeping all other configurations the same.
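A hedged sketch of the PCA check with scikit-learn, assuming a helper `hidden_features` that returns the final hidden layer's activations as a NumPy array (an illustrative name):

```python
from sklearn.decomposition import PCA

def project_features(hidden_features, x_adv, x_clean, n_components=2):
    """Fit PCA on features of the adversarial training data, then project the
    clean data with the SAME fitted model for a fair comparison."""
    pca = PCA(n_components=n_components)
    z_adv = pca.fit_transform(hidden_features(x_adv))
    z_clean = pca.transform(hidden_features(x_clean))
    return z_adv, z_clean  # e.g. scatter-plot both, colored by class
```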
Evaluation of Transferability
- After the adversarial data is obtained, we train several different classifiers on the same adversarial data and evaluate their performance on the clean test set.
- The experimental results show that the adversarial noise produced by gξ is general enough that even non-NN classifiers such as random forests and SVMs are vulnerable and perform poorly, as expected (see the sketch after this list).
- Test performance when using different model architectures.
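A minimal sketch of such a transferability check with scikit-learn (illustrative names; the actual classifiers and hyperparameters in the paper may differ):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def transfer_eval(X_adv, y_train, X_test, y_test):
    """Train non-NN classifiers on the (flattened) adversarial training data
    and score them on the clean test set."""
    results = {}
    for name, clf in [("random_forest", RandomForestClassifier(n_estimators=100)),
                      ("svm", SVC())]:
        clf.fit(X_adv.reshape(len(X_adv), -1), y_train)
        results[name] = clf.score(X_test.reshape(len(X_test), -1), y_test)
    return results
```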