FGNM: A Pure-Gradient Variant of the FGSM Adversarial Attack

Introduction

$\mathrm{FGSM}$ is the pioneering work on generating adversarial examples with gradient-based iterative attacks. When I first studied this line of work, what puzzled me most was why the papers apply the $\mathrm{sign}$ function to the gradient, since this shifts the direction of the generated adversarial perturbation away from the steepest-ascent direction by an acute angle. In a Stanford lecture, Ian Goodfellow gave an empirical justification for the $\mathrm{sign}$: under the linearity assumption, for a given sample, if the number of coordinates where the perturbation agrees in sign with the gradient exceeds some constant, then moving the sample along that perturbation is enough to push it into the adversarial region. This explains why the direction obtained after applying $\mathrm{sign}$ is still an effective attack direction, but in principle it is not the best one. I recently came across a paper that discusses exactly this question: through theoretical analysis and experimental validation, the authors find that applying $\mathrm{sign}$ to the gradient makes the attack less efficient. The code link in the paper is dead, so I reimplemented the method from the algorithm flowchart in the paper.
Paper link: https://arxiv.org/abs/2110.12734
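The acute-angle offset mentioned above is easy to see numerically. The sketch below (using a randomly drawn mock gradient, not one taken from the paper) measures the cosine similarity between a gradient vector and its sign; for a Gaussian gradient it comes out around $0.8$, well below $1$:

```python
import torch

torch.manual_seed(0)
D = 3 * 32 * 32                    # e.g. a CIFAR-sized input, D = H*W*C
g = torch.randn(D)                 # mock gradient vector
cos = torch.dot(g, torch.sign(g)) / (g.norm() * torch.sign(g).norm())
print(cos.item())                  # well below 1: sign(g) deviates from g
```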

Theoretical Analysis

Given a sample $x$ with label $y$ and loss function $\mathcal{L}(x,y)$, let $x^{adv}_t$ denote the adversarial example generated at step $t$ (with $x^{adv}_0=x$). A first-order Taylor expansion of the loss at each step gives, for $t=0,1,\dots,T-1$,

$$\mathcal{L}(x^{adv}_{t+1},y)=\mathcal{L}(x^{adv}_{t},y)+(x^{adv}_{t+1}-x^{adv}_{t})\cdot\nabla\mathcal{L}(x^{adv}_{t},y)+O\!\left(\|x^{adv}_{t+1}-x^{adv}_{t}\|^2\right).$$

Summing these $T$ equations telescopes the left-hand sides, yielding

$$\mathcal{L}(x^{adv}_T,y)=\mathcal{L}(x,y)+\sum_{t=0}^{T-1}(x^{adv}_{t+1}-x^{adv}_t)\cdot\nabla\mathcal{L}(x^{adv}_t,y)+\sum_{t=0}^{T-1}O\!\left(\|x^{adv}_{t+1}-x^{adv}_t\|^2\right).$$

Let $\delta_t=x^{adv}_{t+1}-x^{adv}_t$ and $g_t=\nabla\mathcal{L}(x^{adv}_t,y)$. Then

$$\begin{aligned}\mathcal{L}(x^{adv}_T,y)&=\mathcal{L}(x,y)+\sum_{t=0}^{T-1}\delta_t\cdot g_t+\sum_{t=0}^{T-1}O(\|\delta_t\|^2)\\&=\mathcal{L}(x,y)+\sum_{t=0}^{T-1}\|\delta_t\|\,\|g_t\|\cos\langle\delta_t,g_t\rangle+\sum_{t=0}^{T-1}O(\|\delta_t\|^2)\\&\approx\mathcal{L}(x,y)+\sum_{t=0}^{T-1}\|\delta_t\|\,\|g_t\|\cos\langle\delta_t,g_t\rangle.\end{aligned}$$

Suppose $x^{adv}_t=\left[x^1_t,x^2_t,\cdots,x^D_t\right]$ and $g_t=\left[\nabla_{x^1_t},\nabla_{x^2_t},\cdots,\nabla_{x^D_t}\right]$, where $D=H\times W\times C$. Taking $\delta_t=\mathrm{sign}(g_t)$ as in FGSM, the cosine $\cos\theta_t$ between $\delta_t$ and $g_t$ is

$$\cos\theta_t=\frac{g_t\cdot\mathrm{sign}(g_t)}{\|g_t\|\,\|\mathrm{sign}(g_t)\|}.$$

Because

$$g_t\cdot\mathrm{sign}(g_t)=\sum_{i=1}^{D}\nabla_{x^i_t}\cdot\mathrm{sign}(\nabla_{x^i_t})=\sum_{i=1}^{D}\left|\nabla_{x^i_t}\right|=\|g_t\|_1,$$

and because $\|g_t\|_0\approx D$, so that $\|\mathrm{sign}(g_t)\|=\sqrt{\|g_t\|_0}\approx\sqrt{D}$, and the $1$-norm dominates every other $p$-norm ($\|\cdot\|_1\ge\|\cdot\|$), we obtain

$$\cos\theta_t=\frac{\|g_t\|_1}{\|g_t\|\,\|\mathrm{sign}(g_t)\|}\Longrightarrow\frac{1}{\sqrt{D}}<\cos\theta_t\le 1,$$

and therefore

$$1<\|\delta_t\|\cos\theta_t\le\sqrt{D}.$$

Now let the new perturbation $\delta'_t$ point exactly along the gradient, i.e. $\cos\langle\delta'_t,g_t\rangle=\cos\phi_t=1$. To keep its magnitude consistent with the FGSM step size $\|\mathrm{sign}(g_t)\|$, rescale the gradient by

$$\zeta=\frac{\|\mathrm{sign}(g_t)\|}{\|g_t\|},$$

so that

$$\delta'_t=\zeta\cdot g_t\Longrightarrow\|\delta'_t\|=\frac{\|\mathrm{sign}(g_t)\|}{\|g_t\|}\cdot\|g_t\|=\|\mathrm{sign}(g_t)\|.$$

From the bounds above it follows that

$$\|\delta_t\|\cos\theta_t\le\|\delta'_t\|\cos\phi_t=\sqrt{D},$$

so the gradient-aligned step never increases the loss more slowly than the sign step. Given the budget $\epsilon$, the adversarial perturbation at step $t$ is

$$x^{adv}_{t+1}-x^{adv}_t=\mathrm{clip}^{x}_{\epsilon}\!\left(x^{adv}_t+\alpha\cdot\delta'_t\right)-x^{adv}_t.$$

The detailed $\mathrm{FGNM}$ algorithm flowchart is shown below.
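The rescaling above can be sanity-checked in a few lines. This sketch (again with a random mock gradient rather than a real model gradient) verifies that $\delta'_t=\zeta\cdot g_t$ has the same magnitude as the FGSM step $\mathrm{sign}(g_t)$ while being aligned with $g_t$:

```python
import torch

torch.manual_seed(0)
g = torch.randn(28 * 28)                    # mock gradient, D = 784
zeta = torch.sign(g).norm() / g.norm()      # zeta = ||sign(g)|| / ||g||
delta = zeta * g                            # FGNM perturbation delta'

same_norm = torch.isclose(delta.norm(), torch.sign(g).norm())
cos = torch.dot(delta, g) / (delta.norm() * g.norm())
print(bool(same_norm), cos.item())          # same step size as FGSM, cosine 1 (up to float error)
```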

Experimental Results

The figure below visualizes the raw gradient direction and the $\mathrm{sign}$-of-gradient direction as arrow plots. The raw gradient converges to the optimum faster and more efficiently, whereas applying $\mathrm{sign}$ to the gradient pushes it away from the optimal direction, so more iterations are needed to reach the optimum.

The two figures below compare the average attack success rates of different methods. The authors use $\mathrm{Inc\text{-}v3}$ as the white-box model and report the average attack success rate against four black-box models ($\mathrm{Inc\text{-}v4}$, $\mathrm{Res152}$, $\mathrm{IncRes}$, $\mathrm{Den161}$). Under every iterative attack method, applying the paper's technique achieves better results.

Implementation

from torchvision import datasets, transforms
from torch.utils.data import DataLoader, Dataset
import torch
import torch.nn as nn
from torch.autograd import Variable
import torch.optim as optim
import torch.nn.functional as F
import os

class CNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.Sq1 = nn.Sequential(         
            nn.Conv2d(in_channels=1, out_channels=16, kernel_size=5, stride=1, padding=2),   # output: (16, 28, 28)
            nn.ReLU(),                    
            nn.MaxPool2d(kernel_size=2),    # (16, 14, 14)
        )
        self.Sq2 = nn.Sequential(
            nn.Conv2d(in_channels=16, out_channels=32, kernel_size=5, stride=1, padding=2),  # (32, 14, 14)
            nn.ReLU(),                      
            nn.MaxPool2d(2),                # (32, 7, 7)
        )
        self.out = nn.Linear(32 * 7 * 7, 10)   

    def forward(self, x):
        x = self.Sq1(x)
        x = self.Sq2(x)
        x = x.view(x.size(0), -1)          
        output = self.out(x)
        return output
        
def FGM_attack(inputs, targets, net, alpha, epsilon, attack_type):
    # Build a one-step perturbation for either FGNM or FGSM.
    delta = torch.zeros_like(inputs)
    delta.requires_grad = True
    outputs = net(inputs + delta)
    loss = nn.CrossEntropyLoss()(outputs, targets)
    loss.backward()
    grad = delta.grad.detach()
    if attack_type == 'FGNM':
        # zeta = ||sign(g)|| / ||g||, computed per sample over the image dims
        zeta = torch.norm(torch.sign(grad), p=2, dim=(1, 2, 3), keepdim=True) \
            / torch.norm(grad, p=2, dim=(1, 2, 3), keepdim=True)
        delta.data = torch.clamp(delta + alpha * zeta * grad, -epsilon, epsilon)
    else:
        delta.data = torch.clamp(delta + alpha * torch.sign(grad), -epsilon, epsilon)
    delta = delta.detach()
    return delta
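As a quick smoke test of the FGNM branch, here is a self-contained sketch that runs one FGNM step on a toy, randomly initialized linear classifier (not the trained CNN checkpoint used below) and checks that the resulting perturbation respects the $\epsilon$ budget:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
net = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 10))  # toy stand-in classifier
inputs = torch.rand(4, 1, 28, 28)
targets = torch.randint(0, 10, (4,))
alpha, epsilon = 0.2, 0.5

# one FGNM step: rescale the raw gradient so its norm matches the sign step
delta = torch.zeros_like(inputs, requires_grad=True)
loss = nn.CrossEntropyLoss()(net(inputs + delta), targets)
loss.backward()
grad = delta.grad.detach()
zeta = torch.norm(torch.sign(grad), p=2, dim=(1, 2, 3), keepdim=True) \
    / torch.norm(grad, p=2, dim=(1, 2, 3), keepdim=True)
perturb = torch.clamp(alpha * zeta * grad, -epsilon, epsilon)

print(bool((perturb.abs() <= epsilon).all()))  # the perturbation stays inside the budget
```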

def main():
    alpha = 0.2
    epsilon = 0.5
    total = 0
    correct1 = 0
    correct2 = 0
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = CNN().to(device)
    model.load_state_dict(torch.load('model/model.pt', map_location=device))
    model.eval()
    mnist_train = datasets.MNIST("mnist-data", train=True, download=True, transform=transforms.ToTensor())
    train_loader = DataLoader(mnist_train, batch_size=5, shuffle=True)

    for batch_idx, (inputs, targets) in enumerate(train_loader):
        inputs, targets = inputs.to(device), targets.to(device)
        total += targets.size(0)

        # FGNM: step along the rescaled raw gradient
        delta1 = FGM_attack(inputs, targets, model, alpha, epsilon, 'FGNM')
        adv_images1 = torch.clamp(inputs + delta1, 0, 1)
        _, predicted1 = torch.max(model(adv_images1).data, 1)
        correct1 += predicted1.eq(targets.data).cpu().sum().item()
        print('The FGNM accuracy:', correct1, total, correct1 / total)

        # FGSM: step along the sign of the gradient
        delta2 = FGM_attack(inputs, targets, model, alpha, epsilon, 'FGSM')
        adv_images2 = torch.clamp(inputs + delta2, 0, 1)
        _, predicted2 = torch.max(model(adv_images2).data, 1)
        correct2 += predicted2.eq(targets.data).cpu().sum().item()
        print('The FGSM accuracy:', correct2, total, correct2 / total)
    print('The FGNM accuracy:', correct1)
    print('The FGSM accuracy:', correct2)

if __name__ == '__main__':
    main()

With the attack step size set to $\alpha=0.2$, the experiments produce the following results.
