2022-06-01
Survey of work related to the paper Using Honeypots to Catch Adversarial Attacks on Neural Networks (2020)
Trojaning Attack on Neural Networks (2017)
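The attack first inverts the neural network to generate a general trojan trigger, then retrains the model on an external dataset to inject malicious behavior into it. The malicious behavior is activated only by inputs stamped with the trojan trigger.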
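The trigger-generation step can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a single linear "neuron" stands in for the selected internal neuron of a real network, and generate_trigger and activation are hypothetical names, not code from the paper. The sketch runs gradient ascent on the pixels inside a trigger mask so that the neuron's activation grows, and it does not cover the retraining step.

```python
def activation(weights, x):
    """Activation of a toy linear neuron: the dot product w . x."""
    return sum(w * xi for w, xi in zip(weights, x))


def generate_trigger(weights, mask, steps=50, lr=0.1):
    """Gradient ascent on masked pixels to maximize the toy neuron's activation.

    weights: weights of the (assumed linear) neuron, one per input pixel
    mask:    True where the trigger may change the input, False elsewhere
    """
    # Start masked pixels at mid-gray; pixels outside the mask stay at 0.
    trigger = [0.5 if m else 0.0 for m in mask]
    for _ in range(steps):
        # For a linear neuron, d(w . x)/dx_i = w_i, so each step moves along w.
        # Pixel values are clipped to the valid range [0, 1].
        trigger = [
            min(1.0, max(0.0, t + lr * w)) if m else 0.0
            for t, w, m in zip(trigger, weights, mask)
        ]
    return trigger


weights = [0.5, -1.0, 2.0, 0.3]
mask = [False, True, True, False]
trigger = generate_trigger(weights, mask)
print(trigger, activation(weights, trigger))
```

In a real network the gradient would come from backpropagation rather than from reading the weights directly; the masked, clipped update is the part this sketch is meant to show.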
BadNets: Identifying Vulnerabilities in the Machine Learning Model Supply Chain (2017)
The paper describes a method for injecting a backdoor into a neural network model and the effect the backdoor has on the model.
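A minimal sketch of backdoor injection by data poisoning, under stated assumptions: images are 2-D lists of floats, the trigger is a small bright square in the bottom-right corner, and the names stamp_trigger and poison_dataset plus the rate parameter are hypothetical choices for illustration, not taken from the paper. A model trained on the returned data would be expected to learn to associate the trigger with the target label.

```python
import random


def stamp_trigger(image, size=2, value=1.0):
    """Return a copy of a 2-D image with a bright square in the bottom-right corner."""
    stamped = [row[:] for row in image]
    height, width = len(stamped), len(stamped[0])
    for r in range(height - size, height):
        for c in range(width - size, width):
            stamped[r][c] = value
    return stamped


def poison_dataset(images, labels, target_label, rate=0.1, seed=0):
    """Stamp the trigger on a random fraction of samples and relabel them.

    Returns the new images, the new labels, and the set of poisoned indices.
    The input lists are not modified.
    """
    rng = random.Random(seed)
    n_poison = int(len(images) * rate)
    chosen = set(rng.sample(range(len(images)), n_poison))
    new_images, new_labels = [], []
    for i, (img, lab) in enumerate(zip(images, labels)):
        if i in chosen:
            new_images.append(stamp_trigger(img))
            new_labels.append(target_label)  # attacker-chosen label
        else:
            new_images.append([row[:] for row in img])
            new_labels.append(lab)
    return new_images, new_labels, chosen


images = [[[0.0] * 4 for _ in range(4)] for _ in range(20)]
labels = [i % 5 for i in range(20)]
poisoned_images, poisoned_labels, chosen = poison_dataset(images, labels, target_label=7, rate=0.25)
print(sorted(chosen))
```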
Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods (2017)
A Partial Break of the Honeypots Defense to Catch Adversarial Attacks (2020)
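This paper analyzes how well the baseline version of Honeypots holds up as a defense. The attack it proposes renders the baseline version completely ineffective, and in response the Honeypots authors introduced randomness and multiple layers to mitigate it.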
Evading Adversarial Example Detection Defenses with Orthogonal Projected Gradient Descent (2021)
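Evading an adversarial example detection defense requires an adversarial example that is simultaneously (a) misclassified by the model and (b) judged non-adversarial by the detector. The authors find that existing attacks that try to satisfy several constraints at once often over-optimize one constraint at the expense of the other. They introduce Orthogonal Projected Gradient Descent, an improved attack for generating adversarial examples that avoids this problem by orthogonalizing the gradients while running a standard gradient-based attack. The method successfully evades Honeypots detection.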
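The gradient orthogonalization idea can be sketched as follows. This is a hedged illustration, not the paper's implementation: gradients are plain lists of floats, both are assumed to be gradients of losses that the attacker wants to decrease, the function names are hypothetical, and the projection back onto the allowed perturbation ball is left out. Each step follows the gradient of the constraint that is not yet satisfied, after removing its component along the other gradient so that progress on one constraint does not undo the other to first order.

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthogonalize(g, h, eps=1e-12):
    """Remove from g its component along h; the result is orthogonal to h."""
    hh = dot(h, h)
    if hh < eps:
        # h is (numerically) zero, so there is nothing to project out.
        return list(g)
    scale = dot(g, h) / hh
    return [gi - scale * hi for gi, hi in zip(g, h)]


def orthogonal_step(x, grad_cls, grad_det, is_misclassified, lr=0.01):
    """One descent step on the constraint that is still unsatisfied.

    grad_cls: gradient of the (assumed) misclassification loss at x
    grad_det: gradient of the (assumed) detector-evasion loss at x
    """
    if is_misclassified:
        # Already fools the classifier: work on the detector only.
        direction = orthogonalize(grad_det, grad_cls)
    else:
        # Not yet misclassified: work on the classifier only.
        direction = orthogonalize(grad_cls, grad_det)
    return [xi - lr * di for xi, di in zip(x, direction)]


print(orthogonalize([1.0, 1.0], [1.0, 0.0]))
print(orthogonal_step([0.0, 0.0], [1.0, 1.0], [1.0, 0.0], is_misclassified=False, lr=0.5))
```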
Feature-Indistinguishable Attack to Circumvent Trapdoor-Enabled Defense (2021)
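The paper proposes a new black-box adversarial attack against Honeypots, called the Feature-Indistinguishable Attack (FIA). It evades Honeypots detection by crafting adversarial examples that are indistinguishable, in feature (i.e., neuron activation) space, from benign examples of the target class.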
Attack as Defense: Characterizing Adversarial Examples using Robustness (2021)
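The authors propose a new characteristic for distinguishing adversarial examples from benign ones, based on the observation that adversarial examples are significantly less robust than benign ones. Because existing robustness measurements do not scale to large networks, they propose a new defense framework called Attack as Defense (A2D), which detects adversarial examples by efficiently evaluating an example's robustness. A2D uses the cost of attacking an input as its robustness measure and flags the less robust examples as adversarial, since less robust examples are easier to attack.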
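A toy sketch of the attack-cost idea, under loud assumptions: the model is a linear binary classifier, the "attack" is a fixed-size sign-gradient step repeated until the label flips, and attack_cost, looks_adversarial, and the threshold value are hypothetical names and numbers chosen for illustration. The real framework runs actual attacks against a deep network; only the principle (few steps to flip means low robustness) carries over.

```python
def predict(weights, bias, x):
    """Toy linear binary classifier: class 1 if w . x + b >= 0, else class 0."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else 0


def sign(v):
    return 1.0 if v > 0 else -1.0 if v < 0 else 0.0


def attack_cost(weights, bias, x, step=0.05, max_steps=100):
    """Count the sign-gradient steps needed to flip the prediction.

    Returns max_steps if the label never flips within the budget.
    """
    original = predict(weights, bias, x)
    # Push the score down if the input is class 1, up if it is class 0.
    direction = -1.0 if original == 1 else 1.0
    current = list(x)
    for n in range(1, max_steps + 1):
        current = [c + direction * step * sign(w) for c, w in zip(current, weights)]
        if predict(weights, bias, current) != original:
            return n
    return max_steps


def looks_adversarial(weights, bias, x, threshold):
    """Flag inputs that are cheap to attack, i.e. less robust than the threshold."""
    return attack_cost(weights, bias, x) < threshold


w, b = [1.0, 1.0], 0.0
print(attack_cost(w, b, [1.0, 1.0]), attack_cost(w, b, [0.06, 0.06]))
```

In practice the threshold would have to be calibrated on the attack costs of known benign inputs; the value used when calling looks_adversarial here would be a placeholder.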
Other detection models
Dongyu Meng and Hao Chen. 2017. MagNet: a two-pronged defense against adversarial examples. In Proc. of CCS.
Weilin Xu, David Evans, and Yanjun Qi. 2018. Feature squeezing: Detecting adversarial examples in deep neural networks. In Proc. of NDSS.
Characterizing adversarial subspaces using local intrinsic dimensionality. In Proc. of ICLR. (2018)
Philip Sperl, Ching-yu Kao, Peng Chen, and Konstantin Böttinger. DLA: dense-layer analysis for adversarial example detection. CoRR, abs/1911.01921, 2019.
Detection based defense against adversarial examples from the steganalysis point of view. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4825–4834, 2019. (CVPR)
Detecting adversarial examples from sensitivity inconsistency of spatial-transform domain. arXiv preprint arXiv:2103.04302, 2021. (AAAI)