Start point
Importance
The task of adding imperceptible noise to the training data in order to confuse any corresponding classifier trained on it, causing it to make wrong predictions as often as possible when facing clean test data.
Disadvantages of previous work
Data poisoning
Focuses on the restriction that only a small fraction of the training data is allowed to change, whereas this work focuses on adding bounded noise that is as small as possible.
Adversarial examples
- Since the classifier is given and fixed, there is no two-party game involved.
- Deep models are very sensitive to such adversarial examples due to the high dimensionality of the input data and the linear nature inside deep neural networks.
Background and hypothesis
Research background
In other words, we consider the task of adding imperceptible noise to the training data, hoping to maximally confuse any corresponding classifier trained on it, causing it to make wrong predictions as often as possible when facing clean test data.
Hypothesis
- Use an imaginary neural network as the victim classifier
- Measure the predictive accuracy between the true labels and the classifier's predictions when it takes only the adversarial noise as input (see the sketch after this list).
- It is the linearity inside deep models that makes the adversarial noise effective.
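A minimal sketch of this noise-only check, assuming PyTorch, a trained victim classifier `f_theta`, and a trained noise generator `g_xi` (all names are illustrative, not the paper's code):

```python
import torch

def noise_only_accuracy(f_theta, g_xi, test_loader):
    """Feed ONLY the generated noise g_xi(x) into the classifier and measure
    agreement with the true labels; high accuracy suggests the noise itself
    carries the label information."""
    f_theta.eval()
    g_xi.eval()
    correct, total = 0, 0
    with torch.no_grad():
        for x, y in test_loader:
            preds = f_theta(g_xi(x)).argmax(dim=1)  # noise pattern only, no x
            correct += (preds == y).sum().item()
            total += y.numel()
    return correct / total
```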
Central idea
In this work, we propose a general framework for generating training-time adversarial data by letting an auto-encoder watch and move against an imaginary victim classifier. We further propose a simple yet effective training scheme that trains both networks by decoupling the alternating update procedure for stability. Experiments on image data confirm the effectiveness of the proposed method; in particular, the adversarial data remains effective even when a different classifier is used, making the approach more useful in realistic settings.
Plan
Module 1: DeepConfuse
Goal
The goal of this work is to perturb the training data by adding artificial, imperceptible noise such that, at test time, the classifier's behavior will be dramatically different on the clean test set.
Basic idea
First, define a noise generator that takes one training sample x in X and transforms it into an imperceptible noise pattern in the same space X. Choose the noise generator to be an encoder-decoder neural network. The objective is to find a noise generator such that the classifier trained on the perturbed data has the worst possible performance on the clean test set.
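Written out, this is a bi-level optimization (a reconstruction from the description above; L denotes the classification loss and ε the perturbation budget):

```latex
\max_{\xi} \ \mathbb{E}_{(x,y)\sim \mathcal{D}_{\mathrm{test}}}
  \Big[ \mathcal{L}\big(f_{\theta^{*}(\xi)}(x),\, y\big) \Big]
\quad \text{s.t.} \quad
\theta^{*}(\xi) = \arg\min_{\theta} \sum_{(x_i,y_i)\in \mathcal{D}_{\mathrm{train}}}
  \mathcal{L}\big(f_{\theta}(x_i + g_{\xi}(x_i)),\, y_i\big),
\qquad \lVert g_{\xi}(x) \rVert_{\infty} \le \epsilon .
```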
Problem
This non-convex bi-level optimization problem is challenging, especially due to the nonlinear equality constraint: the classifier's optimal parameters are themselves defined as the result of training on the perturbed data.
Module 2: Memory-Efficient DeepConfuse (using some commonly accepted tricks from reinforcement learning for stability)
Assumption
fθ (the victim classifier) and gξ (the noise generator) are both neural networks.
Goal
The same as module 1
Basic idea
Alternately update fθ on the adversarial training data via gradient descent, and update gξ on the clean data via gradient ascent.
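Note that the ascent on gξ must differentiate through the classifier's dependence on ξ, i.e., through (at least) one simulated update of fθ; otherwise no gradient reaches the generator. A hedged sketch of this core ξ-update with a one-step unroll, assuming PyTorch 2.x (`torch.func.functional_call`); all names are illustrative:

```python
import torch
import torch.nn.functional as F
from torch.func import functional_call

def unrolled_update(f_theta, g_xi, opt_g, train_batch, clean_batch, lr_f, eps):
    """Update g_xi by gradient ascent on the clean-data loss, differentiating
    through one simulated gradient-descent step of the classifier."""
    x, y = train_batch
    xc, yc = clean_batch
    params = dict(f_theta.named_parameters())

    # simulated classifier step on perturbed data (keeps the graph back to xi)
    noise = g_xi(x).clamp(-eps, eps)
    loss_f = F.cross_entropy(functional_call(f_theta, params, (x + noise,)), y)
    grads = torch.autograd.grad(loss_f, list(params.values()), create_graph=True)
    params_next = {k: v - lr_f * g for (k, v), g in zip(params.items(), grads)}

    # clean-data loss at the simulated parameters; ascend w.r.t. xi
    loss_clean = F.cross_entropy(functional_call(f_theta, params_next, (xc,)), yc)
    opt_g.zero_grad()
    (-loss_clean).backward()
    opt_g.step()
```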
Problem
If we directly use this alternating approach, neither fθ nor gξ converges in practice; the remedy (per the central idea above) is to decouple the alternating update procedure.
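One way to realize the decoupling (a hedged sketch, not the paper's exact Algorithm 2): first let fθ run a full pass over the adversarial data while storing its parameter trajectory, then replay the stored trajectory to update gξ, instead of interleaving the two updates per batch. Reusing the hypothetical `unrolled_update` above:

```python
import copy
import torch
import torch.nn.functional as F

def decoupled_epoch(f_theta, g_xi, opt_f, opt_g, loader, lr_f, eps):
    # phase 1: train the classifier on adversarial data with g_xi frozen,
    # storing a checkpoint BEFORE each update (the point the unroll simulates)
    snapshots = []
    for x, y in loader:
        snapshots.append(copy.deepcopy(f_theta.state_dict()))
        with torch.no_grad():
            noise = g_xi(x).clamp(-eps, eps)
        loss = F.cross_entropy(f_theta(x + noise), y)
        opt_f.zero_grad()
        loss.backward()
        opt_f.step()

    # phase 2: replay the trajectory; at each stored point, update g_xi
    # (in practice checkpoints would be thinned to keep memory bounded)
    for snap, (x, y) in zip(snapshots, loader):
        f_theta.load_state_dict(snap)
        unrolled_update(f_theta, g_xi, opt_g, (x, y), (x, y), lr_f, eps)
```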
Label Specific Adversaries
Goal
- Want the classifier to make wrong predictions
- Want the classifier's wrong predictions to match specific, predefined labels
Main idea
Denote by η a predefined label-transformation function that maps one label to another.
Method
Replace the gradient ascent with gradient descent in line 10 of Algorithm 2, and replace y with η(y) in the same line, keeping everything else unchanged.
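For illustration (these are assumptions, not the paper's exact choices): η could be a cyclic shift over the K classes, and the generator update from the earlier sketch would then descend on the clean-data loss measured against η(y):

```python
def eta(y, num_classes=10):
    # example label transformation: cyclic shift; any fixed mapping works
    return (y + 1) % num_classes

# label-specific variant of the final lines of `unrolled_update`:
# loss_clean = F.cross_entropy(functional_call(f_theta, params_next, (xc,)), eta(yc))
# opt_g.zero_grad(); loss_clean.backward(); opt_g.step()  # descent, not ascent
```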
Experiment
Dataset
Classical MNIST, CIFAR-10, and a subset of ImageNet
Evaluation
Performance Evaluation of Training Time Adversary
- Test accuracy when the classifier is trained on the original training set versus the adversarial training set.
- Concretely, we fit a PCA [24] model on the final hidden layer's output of each fθ on the adversarial training data; then, using the same projection model, we project the clean data into the same space. The visualization shows that the classifier trained on the adversarial data cannot differentiate the clean samples (see the sketch after this list).
- Test accuracy of models trained on adversarial training data generated under different perturbation budgets ε.
- Examine the results when the training data is only partially modified. Concretely, under different perturbation constraints, we varied the percentage of adversarial samples in the training data while keeping all other configurations the same.
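A hedged sketch of the PCA check with scikit-learn, assuming a helper `hidden_features` that returns the final hidden layer's activations as a NumPy array (an illustrative name):

```python
from sklearn.decomposition import PCA

def project_features(hidden_features, x_adv, x_clean, n_components=2):
    """Fit PCA on features of the adversarial training data, then project the
    clean data with the SAME fitted model for a fair comparison."""
    pca = PCA(n_components=n_components)
    z_adv = pca.fit_transform(hidden_features(x_adv))
    z_clean = pca.transform(hidden_features(x_clean))
    return z_adv, z_clean  # e.g. scatter-plot both, colored by class
```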
Evaluation of Transferability
- After the adversarial data is obtained, we train several different classifiers on the same adversarial data and evaluate their performance on the clean test set.
- The experimental results show that the adversarial noise produced by gξ is general enough that even non-NN classifiers such as random forests and SVMs are vulnerable and perform poorly, as expected (see the sketch after this list).
- Test performance when using different model architectures.
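A minimal sketch of such a transferability check with scikit-learn (illustrative names; the actual classifiers and hyperparameters in the paper may differ):

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

def transfer_eval(X_adv, y_train, X_test, y_test):
    """Train non-NN classifiers on the (flattened) adversarial training data
    and score them on the clean test set."""
    results = {}
    for name, clf in [("random_forest", RandomForestClassifier(n_estimators=100)),
                      ("svm", SVC())]:
        clf.fit(X_adv.reshape(len(X_adv), -1), y_train)
        results[name] = clf.score(X_test.reshape(len(X_test), -1), y_test)
    return results
```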