Understanding "Learning to Confuse: Generating Training Time Adversarial Data with Auto-Encoder" (NeurIPS 2019)

Starting point

Importance

The task is to add imperceptible noise to the training data in order to confuse any classifier trained on it, making it produce as many wrong predictions as possible when facing clean test data.

Disadvantages of previous work

Data poisoning

Previous data-poisoning work focuses on the restriction that only a small number of training samples may be changed, whereas this work focuses on adding bounded noise that is as small as possible to each sample.

Adversarial examples

  1. Since the classifier is given and fixed, there is no two-party game involved.
  2. Deep models are very sensitive to such adversarial examples due to the high dimensionality of the input data and the linear nature of deep neural networks.

Background and hypothesis

Research background

In other words, we consider the task of adding imperceptible noise to the training data, hoping to maximally confuse any corresponding classifier trained on it by letting it make wrong predictions as much as possible when facing clean test data.

Hypothesis

  1. Use an imaginary neural network as the victim classifier.
  2. Measure the predictive accuracy between the true labels and the predictions when only the adversarial noise is taken as input (see the sketch after this list).
  3. It is the linearity inside deep models that makes the adversarial noise effective.
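
As a rough illustration of point 2, the check below feeds the bare noise pattern (no image content) to the trained classifier and measures how often it still recovers the true labels; a high value would indicate that the classifier has learned to rely on the noise. This is a minimal sketch with hypothetical names, not the paper's evaluation code.

```python
# Minimal sketch (PyTorch, hypothetical names): check how well the trained victim
# classifier f_theta predicts the true labels when shown ONLY the adversarial
# noise g_xi(x), without the original image content.
import torch

@torch.no_grad()
def noise_only_accuracy(f_theta, g_xi, loader, device="cpu"):
    f_theta.eval(); g_xi.eval()
    correct, total = 0, 0
    for x, y in loader:                      # clean samples and true labels
        x, y = x.to(device), y.to(device)
        noise = g_xi(x)                      # adversarial noise pattern only
        pred = f_theta(noise).argmax(dim=1)  # classify the bare noise
        correct += (pred == y).sum().item()
        total += y.numel()
    return correct / total                   # high value => classifier relies on the noise
```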

Central idea

In this work, we propose a general framework for generating training-time adversarial data by letting an auto-encoder watch and move against an imaginary victim classifier. We further propose a simple yet effective training scheme that trains both networks by decoupling the alternating update procedure for stability. Experiments on image data confirm the effectiveness of the proposed method; in particular, the adversarial data remains effective even when a different classifier is used, making the attack more useful in a realistic setting.

Plan

Module 1: Deep Confuse

Goal

The goal of this work is to perturb the training data by adding artificial, imperceptible noise such that, at test time, the classifier's behavior will be dramatically different on the clean test set.

Basic idea

First, define a noise generator that takes one training sample x in X and transforms it into an imperceptible noise pattern in the same space X. The noise generator is chosen to be an encoder-decoder neural network. The objective is to find a noise generator such that a classifier trained on the resulting perturbed data has the worst possible performance on the clean test set.
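
A minimal sketch of such a generator, assuming (as an illustration, not the paper's exact architecture) a small convolutional encoder-decoder whose output is squashed into [-eps, eps] with a scaled tanh:

```python
# Minimal sketch (PyTorch, hypothetical architecture): an encoder-decoder noise
# generator g_xi that maps a training image x to a noise pattern in the same
# space, kept imperceptible by bounding it to [-eps, eps].
import torch
import torch.nn as nn

class NoiseGenerator(nn.Module):
    def __init__(self, channels=1, eps=0.3):   # channels=1 assumes MNIST-like inputs
        super().__init__()
        self.eps = eps
        self.encoder = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, channels, 4, stride=2, padding=1),
        )

    def forward(self, x):
        noise = self.decoder(self.encoder(x))
        return self.eps * torch.tanh(noise)   # ||g_xi(x)||_inf <= eps

# The adversarial training sample is then x + g_xi(x), clipped to the valid pixel range.
```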

Problem

This non-convex optimization problem is challenging, especially due to the nonlinear equality constraint.
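
The equality constraint is that the classifier parameters must themselves minimize the training loss on the perturbed data, which makes the problem bi-level. A rough reconstruction of the objective (my own notation, not copied from the paper): the outer problem maximizes the classifier's loss on clean data over the generator parameters ξ, subject to θ being the minimizer of the training loss on the perturbed data.

```latex
% Bi-level objective (rough reconstruction, notation may differ from the paper)
\max_{\xi}\; \mathbb{E}_{(x,y)\sim\mathcal{D}}
    \big[\, \mathcal{L}\!\left(f_{\theta^{*}(\xi)}(x),\, y\right) \big]
\quad\text{s.t.}\quad
\theta^{*}(\xi) = \arg\min_{\theta}\;
    \mathbb{E}_{(x,y)\sim\mathcal{D}}
    \big[\, \mathcal{L}\!\left(f_{\theta}\big(x + g_{\xi}(x)\big),\, y\right) \big],
\qquad \lVert g_{\xi}(x) \rVert_{\infty} \le \epsilon .
```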

Module 2: Memory-Efficient Deep Confuse (using some commonly accepted tricks from reinforcement learning for stability)

Assumption

fθ (the victim classifier) and gξ (the noise generator) are both assumed to be neural networks.

Goal

The same as Module 1.

Basic idea

Alternately update fθ on the adversarial training data via gradient descent and update gξ on the clean data via gradient ascent.
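
A minimal sketch of one such alternating step (PyTorch >= 2.0, hypothetical training code; the paper's actual memory-efficient procedure differs in how it collects and replays the classifier's updates). Since the clean-data loss depends on ξ only through the classifier's update on perturbed data, the generator step differentiates through a one-step pseudo-update of fθ:

```python
# Minimal sketch of the alternating update: f_theta descends on the loss over
# adversarial training data, g_xi ascends on the loss over clean data.
import torch
import torch.nn.functional as F
from torch.func import functional_call

def alternating_step(f_theta, g_xi, opt_f, opt_g, x, y, inner_lr=0.1):
    names = [n for n, _ in f_theta.named_parameters()]
    params = [p for _, p in f_theta.named_parameters()]

    # Generator step: gradient ASCENT on the clean loss of a pseudo-updated classifier.
    x_adv = (x + g_xi(x)).clamp(0, 1)
    loss_adv = F.cross_entropy(f_theta(x_adv), y)
    grads = torch.autograd.grad(loss_adv, params, create_graph=True)
    pseudo = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}
    loss_clean = F.cross_entropy(functional_call(f_theta, pseudo, (x,)), y)
    opt_g.zero_grad()
    (-loss_clean).backward()      # ascent: maximize the clean-data loss w.r.t. xi
    opt_g.step()

    # Classifier step: ordinary gradient DESCENT on the (detached) perturbed batch.
    x_adv = (x + g_xi(x).detach()).clamp(0, 1)
    loss_f = F.cross_entropy(f_theta(x_adv), y)
    opt_f.zero_grad()
    loss_f.backward()
    opt_f.step()
```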

Problem

If this alternating approach is used directly, neither fθ nor gξ converges in practice; the paper therefore decouples the alternating update procedure for stability.

Label Specific Adversaries

Goal

  1. Want the classifier to make wrong predictions.
  2. Want the classifier's wrong predictions to follow a specific, attacker-chosen pattern (e.g., misclassify samples with true label y as η(y)).

Main idea

Denote η as a predefined label transformation function which maps one label to another.

Modification

Replace the gradient ascent in line 10 of Algorithm 2 with gradient descent, and replace y with η(y) in the same line, while keeping everything else unchanged.
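
A minimal sketch of the resulting generator step, in the same style as the earlier alternating-update sketch (hypothetical code; the choice of η below is an assumption, not necessarily the paper's):

```python
# Label-specific generator step: the loss on clean data is measured against
# eta(y) and the generator performs gradient DESCENT on it, pushing the trained
# classifier toward the chosen wrong labels.
import torch
import torch.nn.functional as F
from torch.func import functional_call

def eta(y, num_classes=10):
    # example label transformation (an assumption, not necessarily the paper's choice)
    return (y + 1) % num_classes

def label_specific_generator_step(f_theta, g_xi, opt_g, x, y, inner_lr=0.1):
    names = [n for n, _ in f_theta.named_parameters()]
    params = [p for _, p in f_theta.named_parameters()]
    x_adv = (x + g_xi(x)).clamp(0, 1)
    loss_adv = F.cross_entropy(f_theta(x_adv), y)
    grads = torch.autograd.grad(loss_adv, params, create_graph=True)
    pseudo = {n: p - inner_lr * g for n, p, g in zip(names, params, grads)}
    loss_target = F.cross_entropy(functional_call(f_theta, pseudo, (x,)), eta(y))
    opt_g.zero_grad()
    loss_target.backward()      # descent toward eta(y), instead of ascent away from y
    opt_g.step()
```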

Experiment

Dataset

The classical MNIST and CIFAR-10 datasets, and a subset of ImageNet.

Evaluation

Performance Evaluation of Training Time Adversary

  1. Compare test accuracy when the classifier is trained on the original training set versus the adversarial training set.
  2. Concretely, a PCA [24] model is fit on the final hidden layer's output of each fθ on the adversarial training data; then, using the same projection model, the clean data is projected into the same space. It can be shown that the classifier trained on the adversarial data cannot differentiate the clean samples in this feature space (see the sketch after this list).
  3. Measure the test accuracy of the corresponding models trained on adversarial training data generated under different perturbation budgets ε.
  4. Examine the result when the training data is only partially modified. Concretely, under different perturbation constraints, the percentage of adversarial samples in the training data is varied while keeping all other configurations the same.
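
A minimal sketch of the PCA check in point 2, assuming the hidden-layer features have already been extracted into NumPy arrays (hypothetical helper, not the paper's code):

```python
# Fit PCA on the adversarially trained classifier's last-hidden-layer features of
# the adversarial training data, then project the clean data with the SAME model
# to inspect whether the clean classes remain separable in that space.
from sklearn.decomposition import PCA

def project_features(hidden_adv, hidden_clean, n_components=2):
    """hidden_adv / hidden_clean: arrays of shape (n_samples, n_features)
    extracted from f_theta's final hidden layer (extraction code not shown)."""
    pca = PCA(n_components=n_components).fit(hidden_adv)
    return pca.transform(hidden_adv), pca.transform(hidden_clean)
```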

Evaluation of Transferability

  1. After the adversarial data is obtained, several different classifiers are trained on the same adversarial data and their performance is evaluated on the clean test set.
  2. The experimental results show that the adversarial noise produced by gξ is general enough that even non-NN classifiers such as random forest and SVM are vulnerable and, as expected, perform poorly (see the sketch after this list).
  3. Compare test performance when using different model architectures.
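
A minimal sketch of the transferability check with non-NN classifiers, assuming flattened feature matrices for the adversarial training set and the clean test set (hypothetical variable names):

```python
# Train non-NN classifiers on the adversarial training data and evaluate them on
# the clean test set, to check that the generated noise transfers beyond neural nets.
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def transfer_eval(X_adv_train, y_train, X_clean_test, y_test):
    results = {}
    for name, clf in [("random forest", RandomForestClassifier(n_estimators=100)),
                      ("SVM (RBF)", SVC(kernel="rbf"))]:
        clf.fit(X_adv_train, y_train)                      # trained on adversarial data
        results[name] = clf.score(X_clean_test, y_test)    # tested on clean data
    return results
```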