Overview
This article falls within the field of machine learning fairness; the paper discussed is "Exacerbating Algorithmic Bias through Fairness Attacks".
Its main contribution is to propose two poisoning attacks that target fairness:
(1) Anchoring attack
Skews the decision boundary by placing poisoned points in the close vicinity of specific target points.
(2) Influence attack
Maximizes the covariance between the sensitive attribute and the decision outcome.
Details
I. Background
Machine learning models are vulnerable to various types of adversarial attacks that degrade their performance. Just as accuracy can be targeted, a malicious attacker can equally take fairness as the attack objective.
II. Background on Poisoning Attacks
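In the standard data-poisoning setup this line of work builds on (e.g., Koh et al.), the attacker injects a poisoned set $\mathcal{D}_p$ with $|\mathcal{D}_p| = \epsilon |\mathcal{D}_c|$ for some budget $\epsilon$ into the clean training set $\mathcal{D}_c$; the victim then trains on the union, so the attack can be sketched as the bilevel problem
$$\max_{\mathcal{D}_p} \; L_{adv}\left(\hat{\theta} ; \mathcal{D}_{\text{test}}\right) \quad \text{s.t.} \quad \hat{\theta}=\arg\min_{\theta} L\left(\theta ; \mathcal{D}_c \cup \mathcal{D}_p\right)$$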
III. Fairness-Targeting Poisoning Attacks
1. Influence Attack
The attack measures unfairness via the covariance between the sensitive attribute $z$ and the signed distance $d_{\theta}(x)$ to the decision boundary, approximated empirically as
$$\operatorname{Cov}\left(z, d_{\theta}(x)\right) \approx \frac{1}{N} \sum_{i=1}^{N}\left(z_{i}-\bar{z}\right) d_{\theta}\left(x_{i}\right)$$
The adversarial loss then combines an accuracy term with this fairness term, weighted by $\lambda$:
$$L_{adv}\left(\hat{\theta} ; \mathcal{D}_{\text{test}}\right)=\ell_{\text{acc}}+\lambda \ell_{\text{fairness}}, \quad \text{where } \ell_{\text{fairness}}=\frac{1}{N} \sum_{i=1}^{N}\left(z_{i}-\bar{z}\right) d_{\hat{\theta}}\left(x_{i}\right)$$
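To make these terms concrete, here is a minimal NumPy sketch, assuming a linear model so that the signed distance to the boundary is $d_{\theta}(x)=\theta^{\top}x+b$, and using a hinge loss as an illustrative stand-in for the accuracy term (all function and variable names are illustrative, not the paper's):

```python
import numpy as np

def fairness_loss(theta, b, X, z):
    """l_fairness: empirical covariance between the sensitive attribute z
    and the signed distance to the decision boundary d_theta(x)."""
    d = X @ theta + b                      # d_theta(x_i) for a linear model
    return np.mean((z - z.mean()) * d)

def adversarial_loss(theta, b, X, y, z, lam):
    """L_adv = l_acc + lambda * l_fairness.
    Hinge loss stands in for the accuracy term (labels y in {-1, +1})."""
    l_acc = np.mean(np.maximum(0.0, 1.0 - y * (X @ theta + b)))
    return l_acc + lam * fairness_loss(theta, b, X, z)
```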
2. Anchoring Attack
(1) Random anchoring attack (RAA)
Chooses the target point uniformly at random.
(2) Non-random anchoring attack (NRAA)
Chooses as the target a "popular" point, i.e. one close to many other points that share its sensitive attribute and label, so that poisoned points placed nearby influence as many similar points as possible.
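The sketch below illustrates both target-selection strategies and the poison-placement step; the neighborhood radius tau, the perturbation scale sigma, and all names are assumptions for illustration, and the paper's exact placement procedure may differ:

```python
import numpy as np

def pick_target(X, y, z, group, label, random=True, tau=1.0, seed=0):
    """Pick a target point from a given demographic group and label.
    RAA: uniformly at random.  NRAA: the 'popular' point with the most
    neighbors sharing its group and label within radius tau."""
    rng = np.random.default_rng(seed)
    idx = np.where((z == group) & (y == label))[0]
    if random:
        return X[rng.choice(idx)]
    counts = [np.sum(np.linalg.norm(X[idx] - X[i], axis=1) < tau) for i in idx]
    return X[idx[int(np.argmax(counts))]]

def make_poison(target, group, label, n, sigma=0.05, seed=0):
    """Place n poisoned points in the close vicinity of the target, with
    the same demographic as the target but the opposite label
    (labels assumed in {-1, +1}), so the decision boundary is skewed."""
    rng = np.random.default_rng(seed)
    X_p = target + sigma * rng.standard_normal((n, target.shape[0]))
    return X_p, np.full(n, -label), np.full(n, group)
```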
IV. Experiments
1. Fairness Metrics
(1) Statistical Parity Difference (SPD)
$$SPD=\left|p\left(\hat{Y}=+1 \mid x \in \mathcal{D}_{a}\right)-p\left(\hat{Y}=+1 \mid x \in \mathcal{D}_{d}\right)\right|$$
(2) Equality of Opportunity Difference (EOD)
$$EOD=\left|p\left(\hat{Y}=+1 \mid x \in \mathcal{D}_{a}, Y=+1\right)-p\left(\hat{Y}=+1 \mid x \in \mathcal{D}_{d}, Y=+1\right)\right|$$
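Both metrics can be computed directly from model predictions; a minimal NumPy sketch, where the array names are illustrative and z == 1 marks the advantaged group $\mathcal{D}_a$, z == 0 the disadvantaged group $\mathcal{D}_d$:

```python
import numpy as np

def statistical_parity_difference(y_pred, z):
    """SPD: absolute gap in positive-prediction rates between the
    advantaged group (z == 1) and the disadvantaged group (z == 0)."""
    return abs(np.mean(y_pred[z == 1] == 1) - np.mean(y_pred[z == 0] == 1))

def equality_of_opportunity_difference(y_pred, y_true, z):
    """EOD: the same gap restricted to truly positive examples (Y = +1),
    i.e. the difference in true-positive rates between the groups."""
    pos = y_true == 1
    return abs(np.mean(y_pred[(z == 1) & pos] == 1)
               - np.mean(y_pred[(z == 0) & pos] == 1))
```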
2. Methods Compared
(1)Influence Attack on Fairness (IAF)
The influence attack on fairness is the most effective of all the compared attacks at degrading the fairness measures.
(2)Random Anchoring Attack (RAA)
(3)Non-random Anchoring Attack (NRAA)
(4)Influence Attack (Koh et al.)
Attacks only accuracy, not fairness.
(5) Poisoning Attack Against Algorithmic Fairness (Solans et al.)