Towards Class-Oriented Poisoning Attacks Against Neural Networks: Paper Notes

wwweiyx

Category: AI Security. Tags: paper reading, algorithms

Article link: https://blog.csdn.net/weiyuxin107/article/details/127984495


1. Paper information

Title	Towards Class-Oriented Poisoning Attacks Against Neural Networks
Author	Bingyin Zhao
Venue/Publisher	WACV 2022
PDF	📄 Online PDF
Code	None

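The paper studies class-oriented availability attacks. Conventional availability attacks only aim to lower the model's overall accuracy; this paper additionally considers lowering the accuracy of specific classes, or forcing the model to predict inputs of every other class as a chosen target class.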

2. Introduction

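The paper proposes class-oriented availability attacks and generates the poison data with a gradient-based optimization method. A model trained on this poison data shows abnormal accuracy on specific classes.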

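The optimization objective of availability attacks is a bi-level optimization problem: the attacker chooses the poison set $\mathcal{D}_{p}$ to maximize the validation loss of the model that results from training on the clean and poisoned data together.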

$\underset{\mathcal{D}_{p}}{\arg \max } \sum_{(\boldsymbol{x}, y) \in \mathcal{D}_{val}} L\left[\mathcal{F}_{\theta^{*}}(\boldsymbol{x}), y, \theta^{*}\right]$
s.t. $\theta^{*} \in \underset{\theta \in \Theta}{\arg \min } \sum_{(\boldsymbol{x}, y) \in \mathcal{D}_{tr} \cup \mathcal{D}_{p}} L\left[\mathcal{F}_{\theta}(\boldsymbol{x}), y, \theta\right]$
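
To make the two levels concrete, here is a minimal sketch that evaluates the outer objective for one candidate poison set. Everything in it is an illustrative assumption rather than the paper's setup: a 1-D nearest-centroid classifier stands in for the neural network, the 0-1 error stands in for the loss $L$, and the data and the names `train_centroids`, `validation_error`, `d_tr`, `d_val`, `d_p` are hypothetical.

```python
from statistics import mean

def train_centroids(data):
    """Inner problem (toy stand-in): fit a 1-D nearest-centroid classifier."""
    by_class = {}
    for x, y in data:
        by_class.setdefault(y, []).append(x)
    return {y: mean(xs) for y, xs in by_class.items()}

def predict(centroids, x):
    """Return the label whose centroid is closest to x."""
    return min(centroids, key=lambda y: abs(x - centroids[y]))

def validation_error(centroids, val):
    """Outer objective (toy stand-in): number of misclassified validation points."""
    return sum(predict(centroids, x) != y for x, y in val)

# Hypothetical toy data: (feature, label) pairs.
d_tr = [(0.0, 0), (1.0, 0), (4.0, 1), (5.0, 1)]
d_val = [(0.5, 0), (1.5, 0), (2.0, 0), (4.5, 1)]
d_p = [(6.0, 0), (7.0, 0), (-4.0, 1)]  # one candidate poison set, chosen by hand

clean_err = validation_error(train_centroids(d_tr), d_val)
poisoned_err = validation_error(train_centroids(d_tr + d_p), d_val)
print(clean_err, poisoned_err)
```

An actual attack would search over $\mathcal{D}_{p}$ to maximize the outer value; the sketch only scores a single hand-picked candidate.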
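Threat model
- The attacker knows the learning algorithm, the model architecture, the hyperparameters and the training dataset
- The attacker can inject poisoned data into the training dataset and modify labels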

3. Method

Two attack types

Class-oriented availability attacks come in two forms:

COEG: class-oriented error-generic

Goal: make the model classify every input as the target class (the supplanter class).

Objective function: $\underset{\mathcal{D}_{p}}{\arg \max } \sum_{(\boldsymbol{x}, y) \in \mathcal{D}_{val}} L\left[\mathcal{F}_{\theta^{*}}(\boldsymbol{x}), y, \theta^{*}\right]$
s.t. $\quad \theta^{*} \in \underset{\theta \in \Theta}{\arg \min } \sum_{(\boldsymbol{x}, y) \in \mathcal{D}_{tr} \cup \mathcal{D}_{p}} L\left[\mathcal{F}_{\theta}(\boldsymbol{x}), y_{s}, \theta\right]$

$y_s$ denotes the target (supplanter) class.

COES: class-oriented error-specific

Goal: lower the accuracy on the victim classes while keeping the accuracy on the non-victim classes (all other classes) unchanged.

Objective function: $\underset{\mathcal{D}_{p}}{\arg \max } \sum_{(\boldsymbol{x}, y) \in \mathcal{D}_{val}} L\left[\mathcal{F}_{\theta^{*}}(\boldsymbol{x}), y_{v}, \theta^{*}\right]$
s.t. $\quad \theta^{*} \in \underset{\theta \in \Theta}{\arg \min } \sum_{(\boldsymbol{x}, y) \in \mathcal{D}_{tr} \cup \mathcal{D}_{p}} L\left[\mathcal{F}_{\theta}(\boldsymbol{x}), y_{\bar{v}}, \theta\right]$

$y_v$ denotes the victim classes and $y_{\bar{v}}$ denotes the non-victim classes.
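
Both goals are stated in terms of accuracy on individual classes, so an attack would be judged by a per-class breakdown rather than a single overall number. The following is a small sketch of such a metric; the function name `per_class_accuracy` and the example labels are hypothetical and not taken from the paper.

```python
def per_class_accuracy(y_true, y_pred):
    """Return {class: fraction of that class's samples predicted correctly}."""
    totals, hits = {}, {}
    for t, p in zip(y_true, y_pred):
        totals[t] = totals.get(t, 0) + 1
        hits[t] = hits.get(t, 0) + (t == p)
    return {c: hits[c] / totals[c] for c in totals}

# Hypothetical predictions of a model poisoned against victim class 1.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 0, 2, 0, 2, 2]
acc = per_class_accuracy(y_true, y_pred)
print(acc)
```

In this made-up example the victim class drops to zero accuracy while the non-victim classes stay at full accuracy, which is the pattern the COES goal describes.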

Training method for the two attacks

COEG Attack

Objective function:
- $L=\lambda \cdot L_{f_{y_{s}}}-L_{f_{y_{o}}}$
- $L_{f_{y_{s}}}=f_{y_{s}}(\boldsymbol{x})$
- $f_{y_k}$ is the logit corresponding to the categorical label $y_k$
- $f_{y_{o}}(\boldsymbol{x})$ is the logit output of the ground-truth class

Poisoned image $x_{p}$:

$\boldsymbol{x}_{\boldsymbol{p}}=\boldsymbol{x}_{\boldsymbol{o}}-\epsilon \cdot \operatorname{sign}\left(\nabla_{\boldsymbol{x}_{o}}\left(\lambda \cdot L_{f_{y_{s}}}-L_{f_{y_{o}}}\right)\right)$
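
The update above is a single signed-gradient step on the input. Below is a minimal sketch that applies the formula exactly as written in these notes. It assumes a linear classifier with logits $f(\boldsymbol{x}) = W\boldsymbol{x} + b$, so that $\nabla_{\boldsymbol{x}} f_{y_k} = W_{y_k}$ has a closed form; the paper targets neural networks, where this gradient would come from automatic differentiation. The weights, input and helper names (`logits`, `coeg_objective`, `coeg_poison`) are hypothetical.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def logits(W, b, x):
    """Logits of an assumed linear classifier: f(x) = W x + b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bk for row, bk in zip(W, b)]

def coeg_objective(W, b, x, y_s, y_o, lam):
    """L = lam * f_{y_s}(x) - f_{y_o}(x)."""
    f = logits(W, b, x)
    return lam * f[y_s] - f[y_o]

def coeg_poison(W, x_o, y_s, y_o, lam, eps):
    """x_p = x_o - eps * sign(grad_x L); for a linear model grad_x L = lam*W[y_s] - W[y_o]."""
    grad = [lam * ws - wo for ws, wo in zip(W[y_s], W[y_o])]
    return [xi - eps * sign(g) for xi, g in zip(x_o, grad)]

# Hypothetical 3-class model on 2-D inputs.
W = [[1.0, -2.0], [1.0, 3.0], [-1.0, 0.5]]
b = [0.0, 0.0, 0.0]
x_o, y_o, y_s = [0.2, 0.4], 0, 1

x_p = coeg_poison(W, x_o, y_s, y_o, lam=2.0, eps=0.1)
before = coeg_objective(W, b, x_o, y_s, y_o, lam=2.0)
after = coeg_objective(W, b, x_p, y_s, y_o, lam=2.0)
print(x_p, before, after)
```

For the linear stand-in, the step changes $L$ by exactly $-\epsilon \cdot \|\nabla_{\boldsymbol{x}} L\|_1$, which makes the effect of $\epsilon$ and $\lambda$ easy to check by hand.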

Algorithm procedure
COES Attack

COES has to lower the accuracy on the target (victim) classes and, at the same time, preserve the accuracy on the other classes.

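The poisoning process consists of the following steps:
1. Select the same number of images from each class
2. Use Algorithm 2 to strengthen or weaken the features in each image that correspond to its label
3. Change the labels of the target class
Objective function: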

$\begin{cases}\lambda \cdot L_{f_{y_{s}}}-L_{f_{y_{o}}}, & \text { if } x_{o} \in \mathcal{C}_{v} \\ L_{f_{y_{o}}}, & \text { otherwise }\end{cases}$

poisoned image $x_{p}$

$\begin{cases}x_{o}-\epsilon \cdot \operatorname{sign}\left(\nabla_{x_{o}}\left(\lambda \cdot L_{f_{y_{s}}}-L_{f_{y_{o}}}\right)\right), & \text { if } x_{o} \in \mathcal{C}_{v} \\ x_{o}+\epsilon \cdot \operatorname{sign}\left(\nabla_{x_{o}}\left(L_{f_{y_{o}}}\right)\right), & \text { otherwise }\end{cases}$
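
The piecewise rule can be sketched in the same way: victim-class images get the COEG-style step, and all other images get a step along the gradient of their own ground-truth logit. As before, this assumes a linear classifier so the gradients are simply rows of a hypothetical weight matrix `W`. The notes do not say which label the victim images receive in step 3; relabelling them to $y_s$ here is an assumption made only for illustration.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def coes_poison(W, x_o, y_o, victim_classes, y_s, lam, eps):
    """Apply the piecewise COES update to one sample and return (x_p, label)."""
    if y_o in victim_classes:
        # Victim class: x_p = x_o - eps * sign(grad_x(lam * f_{y_s} - f_{y_o})).
        grad = [lam * ws - wo for ws, wo in zip(W[y_s], W[y_o])]
        x_p = [xi - eps * sign(g) for xi, g in zip(x_o, grad)]
        return x_p, y_s  # assumption: victim samples are relabelled to y_s
    # Non-victim class: x_p = x_o + eps * sign(grad_x f_{y_o}); label unchanged.
    grad = W[y_o]
    x_p = [xi + eps * sign(g) for xi, g in zip(x_o, grad)]
    return x_p, y_o

# Hypothetical 3-class linear model and two samples (class 0 is the victim).
W = [[1.0, -2.0], [1.0, 3.0], [-1.0, 0.5]]
samples = [([0.2, 0.4], 0), ([0.6, 0.1], 2)]
poisoned = [
    coes_poison(W, x, y, victim_classes={0}, y_s=1, lam=2.0, eps=0.1)
    for x, y in samples
]
print(poisoned)
```

The sketch covers steps 2 and 3 of the list above; the per-class sampling of step 1 would happen before the loop.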

Algorithm 2: