Adversarial Example Attacks with FGSM

The linear explanation of adversarial examples

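Digital images are usually encoded with 8 bits per pixel, so any information below 1/255 of the dynamic range is discarded. Let the original image be $\bm{x}$ and the perturbation be $\bm{\eta}$; the perturbed image is then:
$$\tilde{\bm{x}}=\bm{x}+\bm{\eta}$$
If every element of $\bm{\eta}$ is smaller than the precision of the features, it would be unreasonable for the classifier to respond differently to $\bm{x}$ and $\tilde{\bm{x}}$. Formally, for well-separated classes we expect the classifier to assign the same class to $\bm{x}$ and $\tilde{\bm{x}}$ as long as the max norm satisfies $\parallel\bm{\eta}\parallel_{\infty}<\epsilon$, where $\parallel\bm{\eta}\parallel_{\infty}=\max(|\eta_{1}|,|\eta_{2}|,...,|\eta_{n}|)$ and $\epsilon$ is small enough to be imperceptible. Now consider the dot product between a weight vector $\bm{w}$ and the adversarial example $\tilde{\bm{x}}$:
$$\bm{w}^{T}\tilde{\bm{x}}=\bm{w}^{T}\bm{x}+\bm{w}^{T}\bm{\eta}$$
The perturbation makes the activation grow by $\bm{w}^{T}\bm{\eta}$. Under the max-norm constraint each component of $\bm{\eta}$ can have magnitude at most $\epsilon$, so the growth is maximized by pushing every component to that bound in the direction of the matching weight, that is, $\bm{\eta}=\epsilon\,\mathrm{sign}(\bm{w})$. If $\bm{w}$ has $n$ dimensions and the average magnitude of its elements is $m$, the activation then grows by $\bm{w}^{T}\bm{\eta}=\epsilon\parallel\bm{w}\parallel_{1}=\epsilon mn$. The bound $\parallel\bm{\eta}\parallel_{\infty}$ does not grow with the dimensionality of the problem, but the change in activation grows linearly with $n$. In a high-dimensional space, many tiny changes to the input can therefore add up to one large change in the output.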
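The linear growth with $n$ can be checked numerically. The following is a minimal sketch, assuming weights drawn uniformly from [-1, 1] and $\epsilon=0.007$; the function name `activation_shift` and the random weights are illustrative assumptions, not part of the original derivation.

```python
import random


def activation_shift(n, eps=0.007, seed=0):
    """Return (w^T eta, eps * m * n) for a random weight vector of dimension n."""
    rng = random.Random(seed)
    # Hypothetical weights: uniform in [-1, 1], so the average magnitude m is about 0.5
    w = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    # Worst-case perturbation under the max-norm constraint: eta = eps * sign(w)
    eta = [eps * ((wi > 0) - (wi < 0)) for wi in w]
    shift = sum(wi * ei for wi, ei in zip(w, eta))  # w^T eta
    m = sum(abs(wi) for wi in w) / n                # average weight magnitude
    return shift, eps * m * n


for n in (10, 1_000, 100_000):
    shift, predicted = activation_shift(n)
    print(f"n={n:>6}  w^T eta={shift:.4f}  eps*m*n={predicted:.4f}")
```

Each component of the perturbation stays at 0.007 in magnitude, yet the shift in the activation keeps growing as the dimension increases, and it matches $\epsilon mn$ up to floating-point rounding.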

Linear perturbation of non-linear models

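[Figure: FGSM applied to GoogLeNet; a panda image plus a small sign-of-gradient perturbation is classified as a gibbon]
The figure above shows fast adversarial example generation applied to GoogLeNet. A small vector that humans cannot perceive is added to the input. Its elements equal $\epsilon$ times the sign of the gradient of the cost function with respect to the input pixels, $\bm{\eta}=\epsilon\,\mathrm{sign}(\nabla_{\bm{x}}J(\bm{\theta},\bm{x},y))$. Here $\epsilon=0.007$, which corresponds to the magnitude of the smallest bit of an 8-bit image encoding after GoogLeNet's conversion to real numbers. After the noise is added, the panda is classified as a gibbon with 99.3% confidence. Because the perturbation only needs the sign of a single gradient computation, the method is called the fast gradient sign method, or FGSM for short.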
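The update rule can be worked through by hand on a model whose input gradient has a closed form. The sketch below assumes a logistic-regression classifier, for which the gradient of the cross-entropy loss with respect to the input is $(p-y)\bm{w}$; the weights, the input, and $\epsilon=0.25$ are hypothetical toy values chosen for illustration and do not come from the GoogLeNet experiment.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm_logistic(w, b, x, y, eps):
    """One FGSM step: x + eps * sign(grad_x J), with J the cross-entropy loss."""
    # For logistic regression, grad_x J = (p - y) * w
    p = predict(w, b, x)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(gi) for xi, gi in zip(x, grad)]


# Hypothetical toy model and input (not taken from the article)
w = [0.8, -0.5, 0.3, -0.9]
b = 0.0
x = [0.2, -0.1, 0.1, -0.2]
y = 1

x_adv = fgsm_logistic(w, b, x, y, eps=0.25)
print("clean p(y=1):      ", round(predict(w, b, x), 3))
print("adversarial p(y=1):", round(predict(w, b, x_adv), 3))
```

With these values the clean input is assigned to class 1, and after one step of size 0.25 per component the probability of class 1 drops below 0.5, so the predicted class flips.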

Implementing the FGSM algorithm:

import torch
import torch.nn.functional as F


class Attack(object):
    def __init__(self, net):
        self.net = net
        self.criterion = F.cross_entropy

    def fgsm(self, x, y, eps=0.03, x_val_min=-1, x_val_max=1):
        # Work on a detached copy so the caller's tensor is left untouched
        x_adv = x.clone().detach().requires_grad_(True)
        logits = self.net(x_adv)
        loss = self.criterion(logits, y)
        self.net.zero_grad()
        loss.backward()

        # Step in the direction that increases the loss, then clip to the valid input range
        x_adv = x_adv + eps * x_adv.grad.sign()
        x_adv = torch.clamp(x_adv, x_val_min, x_val_max)

        return x_adv.detach()

Attacking a pre-trained model with foolbox

Install: pip install foolbox==2.0.0
You need one image from the ImageNet dataset, saved as val.JPEG, together with its label. The attack below targets PyTorch's pre-trained ResNet-18 model.

import foolbox
import torch
import torchvision.models as models
import numpy as np
import cv2
import os

# instantiate the model
resnet18 = models.resnet18(pretrained=True).eval()
if torch.cuda.is_available():
    resnet18 = resnet18.cuda()
mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
fmodel = foolbox.models.PyTorchModel(
    resnet18, bounds=(0, 1), num_classes=1000, preprocessing=(mean, std))

# get source image and label
image_path = 'val.JPEG'
image = cv2.imread(image_path)
image = cv2.resize(image, (224,) * 2)[..., ::-1].transpose((2, 0, 1)).astype(np.float32)
image = image / 255.  # because our model expects values in [0, 1]
image = np.expand_dims(image, axis=0)
label = np.array([0])

print('True label', label)
print('predicted class', np.argmax(fmodel.forward(image), axis=1))

# apply attack on source image
attack = foolbox.attacks.FGSM(fmodel)
# if the attack fails, an array filled with nan is returned
adversarial = attack(image, label, epsilons=[0.1], max_epsilon=0)
np.save('adversarial.npy', adversarial)  # save the image produced by the attack

print('adversarial class', np.argmax(fmodel.forward(adversarial), axis=1))

Output:

True label 0
predicted class 0
adversarial class 997

The parameters accepted when calling the attack object are:
__call__(self, input_or_adv, label=None, unpack=True, epsilons=1000, max_epsilon=1)
Parameters:

  • input_or_adv (numpy.ndarray or Adversarial): The original, unperturbed input as a numpy.ndarray or an Adversarial instance.
  • label (int): The reference label of the original input. Must be passed if input_or_adv is a numpy.ndarray, must not be passed if input_or_adv is an Adversarial instance.
  • unpack (bool): If true, returns the adversarial input, otherwise returns the Adversarial object.
  • epsilons (int or Iterable[float]): Either Iterable of step sizes in the direction of the sign of the gradient or number of step sizes between 0 and max_epsilon that should be tried.
  • max_epsilon (float): Largest step size if epsilons is not an iterable.

If epsilons is an int, it is expanded as:

epsilons= np.linspace(0, max_epsilon, num=epsilons + 1)[1:]

Alternatively, epsilons can be an iterable of floats, for example:

epsilons=[0.1, 0.2, 0.3]

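FGSM starts from the small epsilon values and stops at the first one for which the attack succeeds. If the attack fails for every value, an array filled with nan is returned. To attack with a single epsilon value, pass a one-element list as epsilons; max_epsilon is then not used, because it only applies when epsilons is not an iterable. For example: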

adversarial = attack(image, label, epsilons=[0.001], max_epsilon=0)
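The search over step sizes described above can be sketched in a few lines. This is a simplified illustration and not foolbox's actual implementation: the helper names `epsilon_schedule` and `search_smallest_epsilon`, the toy classifier, and the precomputed `grad_sign` are all assumptions, and clipping to the model's input bounds is left out.

```python
import math


def epsilon_schedule(epsilons=1000, max_epsilon=1.0):
    """Expand an int into evenly spaced step sizes in (0, max_epsilon]; pass iterables through."""
    if isinstance(epsilons, int):
        # Same values as np.linspace(0, max_epsilon, num=epsilons + 1)[1:]
        return [max_epsilon * i / epsilons for i in range(1, epsilons + 1)]
    return list(epsilons)


def search_smallest_epsilon(predict, grad_sign, x, label, epsilons=1000, max_epsilon=1.0):
    """Try the step sizes in order; return the first one that changes the prediction."""
    for eps in epsilon_schedule(epsilons, max_epsilon):
        x_adv = [xi + eps * si for xi, si in zip(x, grad_sign)]
        if predict(x_adv) != label:
            return eps, x_adv
    # Mirror the behaviour described above: a failed attack yields NaNs
    return None, [math.nan] * len(x)


# Hypothetical toy classifier: class 1 if the features sum to a positive number
def toy_predict(x):
    return int(sum(x) > 0)


x, label = [0.3, 0.2], 1
grad_sign = [-1, -1]  # assumed sign of the loss gradient for this toy input

print(search_smallest_epsilon(toy_predict, grad_sign, x, label, epsilons=10))
print(search_smallest_epsilon(toy_predict, grad_sign, x, label, epsilons=[0.1]))
```

With ten evenly spaced steps the search stops at the first step size that flips the toy prediction, while the single value 0.1 is too small and the call falls through to the NaN result.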

Running the attack over the whole dataset

import foolbox
import torch
import torchvision.models as models
import numpy as np
import torchvision.datasets as datasets
import torchvision.transforms as transforms
import os

os.environ["CUDA_VISIBLE_DEVICES"] = '1,2'
# instantiate the model
model = models.resnet101(pretrained=True).cuda().eval()
model = torch.nn.DataParallel(model)
mean = np.array([0.485, 0.456, 0.406]).reshape((3, 1, 1))
std = np.array([0.229, 0.224, 0.225]).reshape((3, 1, 1))
fmodel = foolbox.models.PyTorchModel(
    model, bounds=(0, 1), num_classes=1000, preprocessing=(mean, std))

# get source image and label

val_dir = '/home/ws/winycg/imagenet/ILSVRC2012_img_val/'
val_dataset = datasets.ImageFolder(val_dir, transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    ]))
val_loader = torch.utils.data.DataLoader(val_dataset, batch_size=100, shuffle=False,
                                         num_workers=16, pin_memory=(torch.cuda.is_available()))


# apply attack on source image
attack = foolbox.attacks.FGSM(fmodel)
# count the samples for which the attack failed (returned as all-nan arrays)
def adversarial_num(output):
    correct_num = 0
    for i in range(output.size(0)):
        if torch.isnan(output[i])[0, 0, 0]:
            correct_num += 1
    return correct_num


def acc_num(output, target, topk=(1,)):
    """Returns the number of correct top-k predictions for each k in topk"""
    maxk = max(topk)
    _, pred = output.topk(maxk, 1, True, True)
    pred = pred.t()
    correct = pred.eq(target.view(1, -1).expand_as(pred))

    number = []
    for k in topk:
        # reshape instead of view: the slice may not be contiguous
        correct_k = correct[:k].reshape(-1).float().sum(0).item()
        number.append(correct_k)
    return number

total = 0
ori_top1_correct = 0
adv_top1_correct = 0

for batch_idx, (inputs, targets) in enumerate(val_loader):
    # foolbox 2.x takes NumPy arrays, so the batch is converted with .numpy() below

    ori_output = fmodel.forward(inputs.numpy())
    ori_top1_correct += acc_num(torch.from_numpy(ori_output), targets, (1,))[0]

    adversarial_input = attack(inputs.numpy(), targets.numpy(), epsilons=[0.3], max_epsilon=0)
    #adv_outputs = fmodel.forward(adversarial_input)

    adv_top1_correct += adversarial_num(torch.from_numpy(adversarial_input))

    total += targets.size(0)
ori_top1_acc = ori_top1_correct / total
adv_top1_acc = adv_top1_correct / total

print('original top1 accuracy:', ori_top1_acc)
print('adversarial top1 accuracy:', adv_top1_acc)
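The counters collected by the loop can also be turned into an attack success rate. The sketch below assumes, as the script above does, that only failed attacks on correctly classified inputs come back as NaN arrays; the function name `summarize` and the counts passed to it are hypothetical and are not results from this article.

```python
def summarize(total, clean_correct, failed_attacks):
    """Turn the three counters into clean accuracy, robust accuracy and attack success rate."""
    clean_acc = clean_correct / total
    # Inputs that survive the attack are the ones still classified correctly
    robust_acc = failed_attacks / total
    # Success rate is measured only over inputs that were correct to begin with
    success_rate = (1.0 - failed_attacks / clean_correct) if clean_correct else 0.0
    return clean_acc, robust_acc, success_rate


# Hypothetical counts, for illustration only
clean_acc, robust_acc, success_rate = summarize(total=1000, clean_correct=770, failed_attacks=77)
print("clean accuracy:", clean_acc)
print("accuracy under attack:", robust_acc)
print("attack success rate:", success_rate)
```

Reporting the success rate over originally correct inputs separates the effect of the attack from mistakes the model already makes on clean data.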

Visualizing the perturbed sample and the sign map

import matplotlib.pyplot as plt
import cv2
import numpy as np

plt.subplot(1, 3, 1)
plt.title('Original')
image_path = 'val.JPEG'
image = cv2.imread(image_path)
image = cv2.resize(image, (224,) * 2)[..., ::-1] / 255  # BGR -> RGB, then [0, 255] -> [0, 1]
plt.imshow(image)
plt.axis('off')

plt.subplot(1, 3, 2)
plt.title('Adversarial')
# the saved array has a leading batch axis: drop it, then convert CHW -> HWC
adversarial = np.load('adversarial.npy')[0].transpose((1, 2, 0))
plt.imshow(adversarial)
plt.axis('off')

plt.subplot(1, 3, 3)
plt.title('Difference')
sign = np.sign(adversarial - image)
plt.imshow((sign- sign.min())/ (sign.max()-sign.min()))
plt.axis('off')

plt.show()

[Figure: output of the script above, with panels Original, Adversarial, and Difference]

Adversarial training

Standard supervised training does not require the learned function to be resistant to adversarial examples. The authors of the FGSM paper argue that an adversarial objective function based on FGSM is an effective regularizer:
$$\tilde{J}(\bm{\theta},\bm{x},y)=\alpha J(\bm{\theta},\bm{x},y)+(1-\alpha)J(\bm{\theta},\bm{x}+\epsilon\,\mathrm{sign}(\nabla_{\bm{x}}J(\bm{\theta},\bm{x},y)),y)$$
The authors use $\alpha=0.5$ by default. With this objective the supply of adversarial examples is continually regenerated during training, so that they always target the current version of the model.
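The combined objective can be evaluated directly on a small model. The following is a minimal sketch, assuming a logistic-regression classifier with cross-entropy loss so that the input gradient is $(p-y)\bm{w}$; the function names, the toy weights, and $\epsilon=0.25$ are illustrative assumptions and not the authors' training setup.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss(w, b, x, y):
    """Cross-entropy loss J(theta, x, y) of a logistic-regression model, y in {0, 1}."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return -(y * math.log(p) + (1 - y) * math.log(1.0 - p))


def fgsm_example(w, b, x, y, eps):
    """x + eps * sign(grad_x J); for this model grad_x J = (p - y) * w."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((gi > 0) - (gi < 0)) for xi, gi in zip(x, grad)]


def adversarial_objective(w, b, x, y, eps=0.25, alpha=0.5):
    """alpha * J(x) + (1 - alpha) * J(x_adv), with x_adv built from the current weights."""
    x_adv = fgsm_example(w, b, x, y, eps)
    return alpha * loss(w, b, x, y) + (1 - alpha) * loss(w, b, x_adv, y)


# Hypothetical toy model and sample
w, b = [0.8, -0.5, 0.3, -0.9], 0.0
x, y = [0.2, -0.1, 0.1, -0.2], 1

print("clean loss:          ", round(loss(w, b, x, y), 3))
print("adversarial objective:", round(adversarial_objective(w, b, x, y), 3))
```

Because the adversarial example is rebuilt from the current weights on every call, minimizing this objective in a training loop keeps penalizing whatever perturbation is worst for the model at that moment.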
