The Limitations of Deep Learning in Adversarial Settings
paper notes:
This paper introduces the background of adversarial examples, including adversarial goals and capabilities, and then explains how to generate adversarial examples with the forward derivative: just take the derivative of the network's output with respect to the input features. That is, the Jacobian of the learned function F at input X: ∇F(X) = ∂F(X)/∂X = [∂F_j(X)/∂x_i] for every output class j and input feature i.
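A quick sketch of computing that forward derivative with autograd; the toy PyTorch model, input size, and class count here are illustrative assumptions, not the paper's setup:

```python
# Sketch: the forward derivative is the Jacobian of the network output
# w.r.t. the input. The model `net` and shapes are stand-ins.
import torch
from torch.autograd.functional import jacobian

net = torch.nn.Sequential(          # toy stand-in for the attacked DNN
    torch.nn.Linear(784, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 10), torch.nn.Softmax(dim=-1),
)

x = torch.rand(784)                 # one flattened input sample

# J[j, i] = dF_j(x) / dx_i : one row per output class,
# one column per input feature.
J = jacobian(net, x)
print(J.shape)                      # torch.Size([10, 784])
```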
The general crafting algorithm they propose iterates: pick the most salient input features according to the saliency map and perturb them, until the input is misclassified as the target or a distortion budget is exceeded (sketched below).
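A minimal sketch of that crafting loop, heavily simplified: the paper actually searches over pairs of features, while this version perturbs one feature at a time. theta and max_changed are illustrative parameters, and saliency_map is the helper sketched after the next paragraph.

```python
# Simplified paraphrase of the JSMA-style crafting loop (the paper's
# Algorithm 1 perturbs feature *pairs*; one feature here for brevity).
import torch
from torch.autograd.functional import jacobian

def craft(net, x, target, theta=1.0, max_changed=112):
    x_adv = x.clone()
    changed = 0
    while net(x_adv).argmax() != target and changed < max_changed:
        J = jacobian(net, x_adv)                    # forward derivative
        S = saliency_map(J, target)                 # per-feature scores
        i = S.argmax()                              # most salient pixel
        x_adv[i] = (x_adv[i] + theta).clamp(0, 1)   # increase intensity
        changed += 1                                # distortion budget
    return x_adv
```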
The saliency map is the spotlight of this paper. They derive it from the forward derivative; it helps explain why adversarial examples exist, and they succeed in crafting samples both by increasing and by decreasing pixel intensities.
They found that decreasing intensities is less successful because it reduces information entropy, making it harder for the DNN to extract the information it needs to classify.
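A sketch of the saliency map for the increasing-intensity case, following my reading of the paper's definition (the Jacobian J and target class t come from the forward-derivative sketch above):

```python
# Sketch: adversarial saliency map for *increasing* features.
# J has shape [num_classes, num_features]; t is the target class.
import torch

def saliency_map(J, t):
    target_grad = J[t]                   # dF_t / dx_i
    other_grad = J.sum(dim=0) - J[t]     # sum over j != t of dF_j / dx_i
    # A feature only helps if pushing it up raises the target class
    # while lowering the other classes in aggregate; otherwise score 0.
    mask = (target_grad > 0) & (other_grad < 0)
    return torch.where(mask, target_grad * other_grad.abs(),
                       torch.zeros_like(target_grad))
```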
In the evaluation, they study source-target class pairs and find that some pairs are harder than others. They define a hardness measure (quantifying the distance between two classes) and an adversarial distance (a predictive measure derived from adversarial saliency maps). Finally, they study human perception of adversarial samples.
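As a rough sketch of how I understand the adversarial distance — one minus the fraction of features with a non-zero saliency score, so a target with few exploitable features sits "further away". This is my paraphrase rather than the paper's exact formula, reusing the helpers above:

```python
# Sketch (my paraphrase): adversarial distance from the saliency map.
import torch

def adversarial_distance(J, t):
    S = saliency_map(J, t)
    useful = (S > 0).float().mean()   # fraction of exploitable features
    return 1.0 - useful               # higher = predicted to be harder
```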
Strengths:
1. reduces the distortion (L0: the number of features altered)
2. introduces the adversarial saliency map
3. moves toward mitigating adversarial examples: the hardness measure and adversarial distance
Detailed comments, possible improvements, or related ideas:
1. A defense may be possible by evaluating the regularity of examples: for instance, the squared difference between each pair of neighbouring pixels is always higher for adversarial examples than for benign ones (a sketch follows).
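A sketch of that regularity check as a total-variation-style statistic; the threshold tau is an assumption that would need calibrating on benign data:

```python
# Sketch: sum of squared differences between neighbouring pixels,
# used as a crude adversarial-input detector. tau is a placeholder.
import torch

def neighbour_roughness(img):
    """img: 2-D tensor (H, W). Returns the total squared difference
    between horizontally and vertically adjacent pixels."""
    dh = (img[:, 1:] - img[:, :-1]) ** 2   # horizontal neighbours
    dv = (img[1:, :] - img[:-1, :]) ** 2   # vertical neighbours
    return dh.sum() + dv.sum()

def looks_adversarial(img, tau):
    return neighbour_roughness(img) > tau
```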