2021.12.24 Paper No. 10 (CVPR 2020), quick read
Paper link: Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors
Code link: Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors
Keywords
- detection of adversarial examples
- suitable for any pre-trained neural network classifier
- influence functions
- k-nearest neighbor
Contributions
We use influence functions to measure the impact of every training sample on the validation set data. From the influence scores, we find the most supportive training samples for any given validation example. A k-nearest neighbor (k-NN) model fitted on the DNN’s activation layers is employed to search for the ranking of these supporting training samples. We observe that these samples are highly correlated with the nearest neighbors of the normal inputs, while this correlation is much weaker for adversarial inputs. We train an adversarial detector using the k-NN ranks and distances and show that it successfully distinguishes adversarial examples.
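The influence score underlying this pipeline measures how the loss on a validation point would change if a given training sample were upweighted: $I(z, z_{test}) = -\nabla_\theta L(z_{test})^\top H_\theta^{-1} \nabla_\theta L(z)$. As a minimal sketch of that idea (not the paper's implementation, which applies it to a DNN), the toy below computes exact influence scores for an L2-regularized logistic regression, where the Hessian can be formed directly; all data and hyperparameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary logistic regression stands in for the classifier.
n, d = 40, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

lam = 1e-2  # L2 regularization keeps the Hessian positive definite


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Fit by plain gradient descent (sufficient for this toy problem).
w = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ w)
    w -= 0.5 * (X.T @ (p - y) / n + lam * w)

# Hessian of the regularized mean loss at the fitted parameters.
p = sigmoid(X @ w)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)
H_inv = np.linalg.inv(H)

# Influence of upweighting each training sample on one "validation" point's
# loss: I = -g_test^T H^{-1} g_i (per-sample gradients, regularizer kept in H).
x_t, y_t = X[0], y[0]
g_test = (sigmoid(x_t @ w) - y_t) * x_t
influences = np.array([
    -g_test @ H_inv @ ((p_i - y_i) * x_i)
    for x_i, y_i, p_i in zip(X, y, p)
])

# Most helpful (supportive) samples: upweighting them DECREASES the test
# loss, i.e. the most negative influence scores.
helpful = np.argsort(influences)[:5]
```

Note that a sample's influence on itself is always helpful here, since $-g^\top H^{-1} g \le 0$ when $H$ is positive definite; the paper uses the same ranking idea at DNN scale, where $H^{-1}$ must be approximated rather than inverted exactly.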
Figure 1 shows that, in the PCA embedding space, a normal sample's nearest neighbors and its most helpful training samples lie very close together, whereas adversarial samples exhibit no such correspondence (an interesting observation).
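A Figure-1-style visualization can be reproduced by projecting the layer activations to two dimensions with PCA and comparing where the two sample sets land. The sketch below uses random stand-in activations (the real inputs would be activations from the pre-trained network) and a plain SVD-based PCA:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in activations of 200 training samples at some DNN layer.
A = rng.normal(size=(200, 64))

# 2-component PCA via SVD of the mean-centered activation matrix.
A_c = A - A.mean(axis=0)
U, S, Vt = np.linalg.svd(A_c, full_matrices=False)
emb = A_c @ Vt[:2].T  # (200, 2) embedding for plotting

# Proximity in the embedding between a query sample and all others;
# for normal inputs, nearest neighbors and most-helpful samples should
# cluster at small distances, per Figure 1.
q = emb[0]
dist = np.linalg.norm(emb - q, axis=1)
```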
Methods
The figure above illustrates the proposed NNIF algorithm, which is fairly easy to follow. That said, the core algorithm does not seem to be fully spelled out here; I may simply have missed it on this quick pass, so interested readers should consult the paper itself.
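To make the detector's input concrete, here is a sketch of NNIF-style features under my reading of the method: given the influence-ranked most-helpful training samples for an input, look up their ranks and distances in a k-NN search over the training activations, and feed those as features to a downstream detector. All arrays are hypothetical stand-ins, and the brute-force k-NN replaces whatever index the authors actually use:

```python
import numpy as np

rng = np.random.default_rng(2)

n_train, d_act = 100, 16
train_act = rng.normal(size=(n_train, d_act))  # training activations (stand-in)
x_act = rng.normal(size=d_act)                 # activations of the test input

# Indices of the M most "helpful" training samples for this input, as ranked
# by influence scores (random stand-ins here; see the Contributions section).
influence = rng.normal(size=n_train)
M = 5
helpful = np.argsort(influence)[-M:]

# Brute-force nearest-neighbor search in activation space.
dists = np.linalg.norm(train_act - x_act, axis=1)
order = np.argsort(dists)              # order[r] = index of the r-th neighbor
rank_of = np.empty(n_train, dtype=int)
rank_of[order] = np.arange(n_train)    # rank_of[i] = neighbor rank of sample i

# NNIF-style features: the k-NN ranks and distances of the helpful samples.
# For normal inputs these ranks tend to be small (strong correlation with the
# nearest neighbors); for adversarial inputs the correlation is much weaker.
feat_ranks = rank_of[helpful]
feat_dists = dists[helpful]
features = np.concatenate([feat_ranks, feat_dists])
```

A simple classifier (e.g. logistic regression) trained on such feature vectors from normal and adversarial inputs would then play the role of the adversarial detector.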
Results
Below are some figures showing improvements over prior work; I will not go into them further: