2021.12.24 Paper No. 10 (CVPR 2020), quick read
Paper link: Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors
Code link: Detecting Adversarial Samples Using Influence Functions and Nearest Neighbors
Keywords
- detection of adversarial examples
- suitable for any pre-trained neural network classifier
- influence functions
- k-nearest neighbor
Contributions
We use influence functions to measure the impact of every training sample on the validation set data. From the influence scores, we find the most supportive training samples for any given validation example. A k-nearest neighbor (k-NN) model fitted on the DNN’s activation layers is employed to search for the ranking of these supporting training samples. We observe that these samples are highly correlated with the nearest neighbors of the normal inputs, while this correlation is much weaker for adversarial inputs. We train an adversarial detector using the k-NN ranks and distances and show that it successfully distinguishes adversarial examples.
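The influence score underlying this pipeline measures how the loss on a validation point would change if a given training sample were upweighted: $I(z, z_{test}) = -\nabla_\theta L(z_{test})^\top H_\theta^{-1} \nabla_\theta L(z)$. As a minimal sketch of that idea (not the paper's implementation, which applies it to a DNN), the toy below computes exact influence scores for an L2-regularized logistic regression, where the Hessian can be formed directly; all data and hyperparameters are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: binary logistic regression stands in for the classifier.
n, d = 40, 3
X = rng.normal(size=(n, d))
w_true = np.array([1.5, -2.0, 0.5])
y = (X @ w_true + 0.1 * rng.normal(size=n) > 0).astype(float)

lam = 1e-2  # L2 regularization keeps the Hessian positive definite


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


# Fit by plain gradient descent (sufficient for this toy problem).
w = np.zeros(d)
for _ in range(2000):
    p = sigmoid(X @ w)
    w -= 0.5 * (X.T @ (p - y) / n + lam * w)

# Hessian of the regularized mean loss at the fitted parameters.
p = sigmoid(X @ w)
H = (X.T * (p * (1 - p))) @ X / n + lam * np.eye(d)
H_inv = np.linalg.inv(H)

# Influence of upweighting each training sample on one "validation" point's
# loss: I = -g_test^T H^{-1} g_i (per-sample gradients, regularizer kept in H).
x_t, y_t = X[0], y[0]
g_test = (sigmoid(x_t @ w) - y_t) * x_t
influences = np.array([
    -g_test @ H_inv @ ((p_i - y_i) * x_i)
    for x_i, y_i, p_i in zip(X, y, p)
])

# Most helpful (supportive) samples: upweighting them DECREASES the test
# loss, i.e. the most negative influence scores.
helpful = np.argsort(influences)[:5]
```

Note that a sample's influence on itself is always helpful here, since $-g^\top H^{-1} g \le 0$ when $H$ is positive definite; the paper uses the same ranking idea at DNN scale, where $H^{-1}$ must be approximated rather than inverted exactly.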
Figure 1 shows that, in the PCA embedding space, a normal sample's nearest neighbors and its most helpful training samples lie very close together, whereas adversarial samples exhibit no such correspondence (an interesting observation).
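A Figure-1-style visualization can be reproduced by projecting the layer activations to two dimensions with PCA and comparing where the two sample sets land. The sketch below uses random stand-in activations (the real inputs would be activations from the pre-trained network) and a plain SVD-based PCA:

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in activations of 200 training samples at some DNN layer.
A = rng.normal(size=(200, 64))

# 2-component PCA via SVD of the mean-centered activation matrix.
A_c = A - A.mean(axis=0)
U, S, Vt = np.linalg.svd(A_c, full_matrices=False)
emb = A_c @ Vt[:2].T  # (200, 2) embedding for plotting

# Proximity in the embedding between a query sample and all others;
# for normal inputs, nearest neighbors and most-helpful samples should
# cluster at small distances, per Figure 1.
q = emb[0]
dist = np.linalg.norm(emb - q, axis=1)
```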
Methods
The figure above illustrates the proposed NNIF algorithm, which is fairly easy to follow. That said, the core algorithm does not seem to be fully spelled out here; I may simply have missed it on this quick pass, so interested readers should consult the paper itself.
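To make the detector's input concrete, here is a sketch of NNIF-style features under my reading of the method: given the influence-ranked most-helpful training samples for an input, look up their ranks and distances in a k-NN search over the training activations, and feed those as features to a downstream detector. All arrays are hypothetical stand-ins, and the brute-force k-NN replaces whatever index the authors actually use:

```python
import numpy as np

rng = np.random.default_rng(2)

n_train, d_act = 100, 16
train_act = rng.normal(size=(n_train, d_act))  # training activations (stand-in)
x_act = rng.normal(size=d_act)                 # activations of the test input

# Indices of the M most "helpful" training samples for this input, as ranked
# by influence scores (random stand-ins here; see the Contributions section).
influence = rng.normal(size=n_train)
M = 5
helpful = np.argsort(influence)[-M:]

# Brute-force nearest-neighbor search in activation space.
dists = np.linalg.norm(train_act - x_act, axis=1)
order = np.argsort(dists)              # order[r] = index of the r-th neighbor
rank_of = np.empty(n_train, dtype=int)
rank_of[order] = np.arange(n_train)    # rank_of[i] = neighbor rank of sample i

# NNIF-style features: the k-NN ranks and distances of the helpful samples.
# For normal inputs these ranks tend to be small (strong correlation with the
# nearest neighbors); for adversarial inputs the correlation is much weaker.
feat_ranks = rank_of[helpful]
feat_dists = dists[helpful]
features = np.concatenate([feat_ranks, feat_dists])
```

A simple classifier (e.g. logistic regression) trained on such feature vectors from normal and adversarial inputs would then play the role of the adversarial detector.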
Results
Below are some figures showing improvements over prior work; I will not go into them further: