Introduction
$\mathrm{FGSM}$ is the seminal work on generating adversarial examples via gradient-based iterative attacks. When I first encountered this line of work, what puzzled me most was why the papers apply the sign function $\mathrm{sign}$ to the gradient, since this deflects the generated adversarial perturbation away from the steepest-ascent direction by an acute angle.
In a Stanford lecture, Ian Goodfellow gave an empirically motivated explanation for applying $\mathrm{sign}$ to the gradient: under the linearity assumption, if the number of coordinates in which the adversarial perturbation agrees in sign with the gradient exceeds some constant, then moving the sample along the perturbation direction is enough to enter an adversarial subregion. This argument shows that the direction is still adversarially effective after applying $\mathrm{sign}$, but in principle it is not the best attack direction. I recently came across a paper that studies exactly this question: through theoretical analysis and experimental validation, the authors find that applying $\mathrm{sign}$ to the gradient makes the attack noticeably less efficient. The code link in the paper is dead, so I reimplemented the code from the algorithm flowchart in the paper.
Paper link: https://arxiv.org/abs/2110.12734
Theoretical Analysis
Given a sample $x$ with label $y$ and loss function $\mathcal{L}(x,y)$, let $x^{adv}_t$ denote the adversarial example generated at step $t$, with $x^{adv}_0=x$. From the first-order Taylor expansion of a multivariate function we obtain the following system of equations:
$$\left\{\begin{aligned}\mathcal{L}(x^{adv}_T,y)&=\mathcal{L}(x^{adv}_{T-1},y)+(x^{adv}_T-x^{adv}_{T-1})\cdot \nabla \mathcal{L}(x^{adv}_{T-1},y)+O(\|x^{adv}_T-x^{adv}_{T-1}\|^2)\\\mathcal{L}(x^{adv}_{T-1},y)&=\mathcal{L}(x^{adv}_{T-2},y)+(x^{adv}_{T-1}-x^{adv}_{T-2})\cdot \nabla \mathcal{L}(x^{adv}_{T-2},y)+O(\|x^{adv}_{T-1}-x^{adv}_{T-2}\|^2)\\\mathcal{L}(x^{adv}_{T-2},y)&=\mathcal{L}(x^{adv}_{T-3},y)+(x^{adv}_{T-2}-x^{adv}_{T-3})\cdot \nabla \mathcal{L}(x^{adv}_{T-3},y)+O(\|x^{adv}_{T-2}-x^{adv}_{T-3}\|^2)\\ \vdots \\ \mathcal{L}(x^{adv}_{3},y)&=\mathcal{L}(x^{adv}_{2},y)+(x^{adv}_{3}-x^{adv}_{2})\cdot \nabla \mathcal{L}(x^{adv}_{2},y)+O(\|x^{adv}_{3}-x^{adv}_{2}\|^2)\\ \mathcal{L}(x^{adv}_{2},y)&=\mathcal{L}(x^{adv}_{1},y)+(x^{adv}_{2}-x^{adv}_{1})\cdot \nabla \mathcal{L}(x^{adv}_{1},y)+O(\|x^{adv}_{2}-x^{adv}_{1}\|^2)\\\mathcal{L}(x^{adv}_{1},y)&=\mathcal{L}(x^{adv}_0,y)+(x^{adv}_{1}-x^{adv}_0)\cdot \nabla \mathcal{L}(x^{adv}_0,y)+O(\|x^{adv}_{1}-x^{adv}_0\|^2) \end{aligned}\right.$$
Adding these equations, the intermediate loss terms telescope and we obtain the following formula:
$$\mathcal{L}(x^{adv}_T,y)=\mathcal{L}(x,y)+\sum\limits_{t=0}^{T-1}(x^{adv}_{t+1}-x^{adv}_t)\cdot \nabla\mathcal{L}(x^{adv}_t,y)+\sum\limits_{t=0}^{T-1}O(\|x^{adv}_{t+1}-x^{adv}_t\|^2)$$
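This telescoping first-order expansion is easy to sanity-check numerically. Below is a minimal sketch with a toy quadratic loss (the quadratic and all names here are my own illustration, not from the paper): the exact change in the loss should match the accumulated first-order terms up to the neglected second-order remainders.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
A = A @ A.T + np.eye(5)            # symmetric positive-definite Hessian

def L(x):
    return 0.5 * x @ A @ x         # toy loss

def grad(x):
    return A @ x                   # its exact gradient

x0 = rng.normal(size=5)
steps = [1e-3 * rng.normal(size=5) for _ in range(10)]   # small delta_t

# accumulate the first-order terms sum_t delta_t . grad L(x_t)
xt, first_order = x0.copy(), 0.0
for d in steps:
    first_order += d @ grad(xt)
    xt = xt + d

exact = L(xt) - L(x0)              # true total change of the loss
print(abs(exact - first_order))    # tiny: only the O(||delta_t||^2) terms remain
```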
Let $\delta_t=x_{t+1}^{adv}-x_t^{adv}$ and $g_t=\nabla\mathcal{L}(x^{adv}_t,y)$; then
$$\begin{aligned}\mathcal{L}(x^{adv}_T,y)&=\mathcal{L}(x,y)+\sum\limits_{t=0}^{T-1}\delta_t \cdot g_t + \sum\limits_{t=0}^{T-1}O(\|\delta_t\|^2)\\&=\mathcal{L}(x,y)+\sum\limits_{t=0}^{T-1}\|\delta_t\|\cdot \|g_t\|\cdot \cos\langle \delta_t, g_t \rangle + \sum\limits_{t=0}^{T-1}O(\|\delta_t\|^2)\\&\approx \mathcal{L}(x,y)+\sum\limits_{t=0}^{T-1}\|\delta_t\|\cdot\|g_t\|\cdot \cos\langle \delta_t, g_t \rangle \end{aligned}$$
Suppose $x^{adv}_t=\left[x^1_t,x^2_t,\cdots,x^D_t\right]$ and $g_t=\left[\nabla_{x^1_t},\nabla_{x^2_t},\cdots,\nabla_{x^D_t}\right]$, where $D=H\times W \times C$. Taking $\delta_t=\mathrm{sign}(g_t)$, the cosine $\cos\theta_t$ between $\delta_t$ and $g_t$ is
$$\cos \theta_t=\frac{g_t\cdot \mathrm{sign}{(g_t)}}{\|g_t\| \|\mathrm{sign}(g_t)\|}$$
The numerator reduces to the 1-norm of the gradient:
$$\begin{aligned}g_t \cdot \mathrm{sign}(g_t)&=\nabla_{x^1_t} \cdot \mathrm{sign}(\nabla_{x^1_t})+\nabla_{x^2_t} \cdot \mathrm{sign}(\nabla_{x^2_t})+\cdots+\nabla_{x^D_t} \cdot \mathrm{sign}(\nabla_{x^D_t})\\&=\left|\nabla_{x^1_t}\right|+\left|\nabla_{x^2_t}\right|+\cdots+\left|\nabla_{x^D_t}\right|\\&=\left\|g_t\right\|_1\end{aligned}$$
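This identity is immediate to confirm numerically (a throwaway check, with a random vector standing in for the flattened gradient):

```python
import numpy as np

rng = np.random.default_rng(1)
g = rng.normal(size=1000)        # stand-in for a flattened gradient g_t

dot = g @ np.sign(g)             # g_t . sign(g_t)
l1 = np.abs(g).sum()             # ||g_t||_1
print(np.isclose(dot, l1))       # True
```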
Moreover, $\|g_t\|_0\approx D$ in practice (gradient entries are rarely exactly zero), so $\|\mathrm{sign}(g_t)\|=\sqrt{\|g_t\|_0}\approx \sqrt{D}$. Among the $p$-norms with $p\ge 1$ the 1-norm is the largest, i.e. $\|\cdot\|_1 \ge \|\cdot\|_2$, while the Cauchy–Schwarz inequality gives $\|\cdot\|_1 \le \sqrt{D}\,\|\cdot\|_2$. Substituting,
$$\cos \theta_t=\frac{\|g_t\|_1}{\|g_t\|\|\mathrm{sign}(g_t)\|}\Longrightarrow \frac{1}{\sqrt{D}} < \cos \theta_t \le 1$$
and therefore, since $\|\delta_t\|=\|\mathrm{sign}(g_t)\|\approx\sqrt{D}$,
$$1 < \|\delta_t\| \cos \theta_t \le \sqrt{D}$$
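Both bounds can be checked empirically; the sketch below draws random Gaussian "gradients" (the dimension $D=1000$ is an arbitrary choice of mine):

```python
import numpy as np

rng = np.random.default_rng(2)
D = 1000
for _ in range(100):
    g = rng.normal(size=D)                    # random stand-in gradient
    delta = np.sign(g)                        # FGSM direction
    cos = (g @ delta) / (np.linalg.norm(g) * np.linalg.norm(delta))
    assert 1 / np.sqrt(D) < cos <= 1          # 1/sqrt(D) < cos(theta_t) <= 1
    assert 1 < np.linalg.norm(delta) * cos <= np.sqrt(D)
print("bounds hold")
```

For Gaussian gradients the cosine concentrates near $\sqrt{2/\pi}\approx 0.80$, i.e. the sign direction is noticeably misaligned with the true gradient.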
Now introduce a new adversarial perturbation $\delta_t^{\prime}$ whose direction coincides with the gradient, i.e.
$$\cos \langle \delta_t^{\prime},g_t\rangle=\cos \phi_t=1$$
To keep its step length consistent with FGSM's $\|\mathrm{sign}(g_t)\|$, define the scaling factor
$$\zeta=\frac{\|\mathrm{sign}(g_t)\|}{\|g_t\|}$$
so that
$$\begin{aligned}&\delta^{\prime}_t= \zeta \cdot g_t\\\Longrightarrow&\|\delta^\prime_t\|=\frac{\|\mathrm{sign}(g_t)\|}{\|g_t\|}\cdot \|g_t\|=\|\mathrm{sign}(g_t)\|\end{aligned}$$
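The rescaling can be verified in a few lines: $\zeta\cdot g_t$ keeps the gradient's direction (cosine 1) while matching the length of $\mathrm{sign}(g_t)$.

```python
import numpy as np

rng = np.random.default_rng(3)
g = rng.normal(size=784)                     # flattened gradient of a 28x28 input

zeta = np.linalg.norm(np.sign(g)) / np.linalg.norm(g)
delta_new = zeta * g                         # the scaled perturbation delta'_t

# same L2 length as sign(g) ...
print(np.isclose(np.linalg.norm(delta_new), np.linalg.norm(np.sign(g))))   # True
# ... but perfectly aligned with the gradient
cos = (delta_new @ g) / (np.linalg.norm(delta_new) * np.linalg.norm(g))
print(np.isclose(cos, 1.0))                  # True
```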
From the above it follows that
$$\|\delta_t\|\cos \theta_t \le \|\delta_t^\prime\|\cos \phi_t=\|\mathrm{sign}(g_t)\|\approx\sqrt{D}$$
Given the perturbation budget $\epsilon$, the adversarial perturbation at step $t$ is
$$x^{adv}_{t+1}-x^{adv}_t=\mathrm{clip}_{\epsilon}^{x}\left(x^{adv}_t + \alpha \cdot \delta^{\prime}_t\right)-x^{adv}_t$$
The detailed algorithm flowchart of $\mathrm{FGNM}$ is shown below.
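In outline, each FGNM iteration rescales the raw gradient by $\zeta$ instead of taking its sign, then clips back into the $\epsilon$-ball. Below is a minimal numpy sketch of that loop, using a toy softmax "model" in place of a real network (the toy model, names, and hyperparameters are my own illustration, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(4)
W = rng.normal(size=(10, 64))                # toy linear classifier: logits = W x

def loss_and_grad(x, y):
    logits = W @ x
    p = np.exp(logits - logits.max())
    p /= p.sum()                             # softmax probabilities
    loss = -np.log(p[y])                     # cross-entropy for label y
    d_logits = p.copy()
    d_logits[y] -= 1.0
    return loss, W.T @ d_logits              # dL/dx

def fgnm_attack(x, y, alpha=0.01, epsilon=0.1, T=10):
    x_adv = x.copy()
    for _ in range(T):
        _, g = loss_and_grad(x_adv, y)
        zeta = np.linalg.norm(np.sign(g)) / np.linalg.norm(g)   # FGNM scaling factor
        x_adv = x_adv + alpha * zeta * g                        # step along the raw gradient
        x_adv = np.clip(x_adv, x - epsilon, x + epsilon)        # stay inside the eps-ball
    return x_adv

x, y = rng.normal(size=64), 3
loss0, _ = loss_and_grad(x, y)
lossT, _ = loss_and_grad(fgnm_attack(x, y), y)
print(lossT > loss0)                         # the attack should increase the loss
```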
Experimental Results
The figure below visualizes the raw-gradient direction and the $\mathrm{sign}$-ed gradient direction as arrow plots. The raw gradient converges to the optimum faster and more efficiently, whereas applying $\mathrm{sign}$ to the gradient pushes the update away from the optimal direction, so more iterations are needed to reach the optimum.
The next two figures compare the average attack success rates of different methods. The authors use $\mathrm{Inc\text{-}v3}$ as the white-box model and report the average attack success rate against four black-box models ($\mathrm{Inc\text{-}v4}$, $\mathrm{Res152}$, $\mathrm{IncRes}$, and $\mathrm{Den161}$). Under every iterative attack, applying the paper's method yields better results.
Implementation
from torchvision import datasets, transforms
import torch
import torch.nn as nn

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.Sq1 = nn.Sequential(
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2),  # output: (16, 28, 28)
            nn.ReLU(),
            nn.MaxPool2d(kernel_size=2),  # (16, 14, 14)
        )
        self.Sq2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2),  # (32, 14, 14)
            nn.ReLU(),
            nn.MaxPool2d(2),  # (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)

    def forward(self, x):
        x = self.Sq1(x)
        x = self.Sq2(x)
        x = x.view(x.size(0), -1)
        return self.out(x)

def FGM_attack(inputs, targets, net, alpha, epsilon, attack_type):
    delta = torch.zeros_like(inputs, requires_grad=True)
    outputs = net(inputs + delta)
    loss = nn.CrossEntropyLoss()(outputs, targets)
    loss.backward()
    grad = delta.grad.detach()
    if attack_type == 'FGNM':
        # zeta = ||sign(g)|| / ||g||, per sample, so that ||zeta * g|| = ||sign(g)||
        zeta = (torch.norm(torch.sign(grad), p=2, dim=(1, 2, 3), keepdim=True)
                / torch.norm(grad, p=2, dim=(1, 2, 3), keepdim=True))
        delta.data = torch.clamp(delta + alpha * zeta * grad, -epsilon, epsilon)
    else:
        # FGSM: step along the sign of the gradient
        delta.data = torch.clamp(delta + alpha * torch.sign(grad), -epsilon, epsilon)
    return delta.detach()

def main():
    alpha = 0.2
    epsilon = 0.5
    total = 0
    correct1 = 0
    correct2 = 0
    model = CNN()
    model.load_state_dict(torch.load('model/model.pt', map_location='cpu'))
    model.eval()
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model.to(device)
    mnist_train = datasets.MNIST("mnist-data", train=True, download=True, transform=transforms.ToTensor())
    train_loader = torch.utils.data.DataLoader(mnist_train, batch_size=5, shuffle=True)
    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        total += targets.size(0)
        delta1 = FGM_attack(inputs, targets, model, alpha, epsilon, 'FGNM')
        adv_images1 = torch.clamp(inputs + delta1, 0, 1)
        _, predicted1 = torch.max(model(adv_images1).data, 1)
        correct1 += predicted1.eq(targets.data).cpu().sum().item()
        print('The FGNM accuracy:', correct1, total, correct1 / total)
        delta2 = FGM_attack(inputs, targets, model, alpha, epsilon, 'FGSM')
        adv_images2 = torch.clamp(inputs + delta2, 0, 1)
        _, predicted2 = torch.max(model(adv_images2).data, 1)
        correct2 += predicted2.eq(targets.data).cpu().sum().item()
        print('The FGSM accuracy:', correct2, total, correct2 / total)
    print('The FGNM accuracy:', correct1)
    print('The FGSM accuracy:', correct2)

if __name__ == '__main__':
    main()
With the attack step size $\alpha=0.2$, the experiment produces the following results.