Adversarial Attack and Defense on Neural Networks in PyTorch

The rise of deep learning and neural networks has brought various opportunities and applications, such as object detection and text-to-speech, into modern society. Yet, despite their seemingly high accuracy, neural networks (and almost all machine learning models) can suffer from adversarial examples: data manipulated very slightly from the original training samples. In fact, past research has shown that as long as you know the “correct” way to change your data, you can force your network to perform poorly on data that may not look any different to the human eye! These deliberate manipulations of data to lower model accuracy are called adversarial attacks, and the war between attack and defense is an ongoing, popular research topic in the machine learning domain.

This article provides an overview of one of the simplest yet most effective attacks, the Fast Gradient Sign Method (FGSM), along with its implementation in PyTorch and a defense against it through adversarial training.

Side Note: This article assumes prior knowledge of building simple neural networks and training them in PyTorch. If you are not familiar with these, it is recommended to check out the PyTorch tutorials first.

The History of Adversarial Examples and Attacks

Adversarial examples can be defined as inputs or data that are perturbed in order to fool a machine learning network. This idea was formulated by Goodfellow et al. in the paper “Explaining and Harnessing Adversarial Examples” from the ICLR 2015 conference. While publications before this paper claimed that adversarial examples were caused by the nonlinearity and overfitting of machine learning models, Goodfellow et al. argued that neural networks are in fact vulnerable to these examples because of the high linearity of their architectures. Models such as LSTMs and activation functions such as ReLU still often behave in a very linear way, and hence these models are easily fooled by linear perturbations. They then followed up by providing a simple and fast one-step method of generating adversarial examples: the Fast Gradient Sign Method.

Fast Gradient Sign Method (FGSM)

Figure 1: The classic illustration of an FGSM adversarial example (image from the 2015 ICLR paper).

The Fast Gradient Sign Method (FGSM) is a white-box attack, meaning the attack is generated with full knowledge of the given network architecture. FGSM is based on the idea that normal networks follow gradient descent to find the lowest point of the loss; hence, if we follow the sign of the gradient (going in the opposite direction of gradient descent), we can maximise the loss by adding only a small amount of perturbation.

FGSM can hence be described as the following mathematical expression:

x’ = x + ε · sign(∇ₓ J(θ, x, y))

where x’ is the perturbed x, generated by adding a small constant ε whose sign follows the direction of the gradient of the loss J with respect to x (θ denotes the model parameters and y the true label). Figure 1 is the classic illustration of an FGSM attack in the computer vision domain. With a less than 1% change to the image that isn’t visually recognisable by us, the image went from being correctly classified with mediocre confidence to being falsely classified with high confidence.

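To make the expression concrete, here is a minimal sketch of FGSM written directly in PyTorch. It assumes torch and torch.nn.functional (as F) are imported, that the model returns log-probabilities (as the network built later in this article does), and that inputs are images in the [0, 1] range; the function name is just for illustration:

def fgsm_sketch(model, x, y, eps):
    # x' = x + eps * sign( grad_x J(theta, x, y) )
    x = x.clone().detach().requires_grad_(True)
    loss = F.nll_loss(model(x), y)       # J(theta, x, y) for a model that outputs log-probabilities
    loss.backward()                      # gradient of the loss with respect to the input x
    x_adv = x + eps * x.grad.sign()      # step by eps in the direction of the gradient's sign
    return torch.clamp(x_adv, 0., 1.).detach()   # keep pixels in the valid [0, 1] range
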
The fact that such a simple method can fool a deep neural network is further evidence that adversarial examples exist because of neural networks' linearity.

FGSM in PyTorch

To build the FGSM attack in PyTorch, we can use the CleverHans library, provided and carefully maintained by Ian Goodfellow and Nicolas Papernot. The library provides multiple attacks and defenses and is widely used today for benchmarking. Although the majority of its attacks were implemented in TensorFlow, the code for FGSM was also recently released for PyTorch.

The library can be downloaded and installed with the following command:

pip install git+https://github.com/tensorflow/cleverhans.git#egg=cleverhans

We will use the simple MNIST dataset to demonstrate how to build the attack.

Creating the model and data loader

Firstly, we have to create an ordinary PyTorch model and data loader for the MNIST dataset.

For demonstration, we will build a simple convolutional network as follows:

# Creating a simple network (imports assumed for the rest of the article)
import torch
import torch.nn as nn
import torch.nn.functional as F

class LeNet5(nn.Module):

    def __init__(self):
        super(LeNet5, self).__init__()
        self.conv1 = nn.Conv2d(1, 6, 5, padding=2)   # 1x28x28 -> 6x28x28
        self.conv2 = nn.Conv2d(6, 16, 5)             # 6x14x14 -> 16x10x10
        self.fc1 = nn.Linear(16 * 5 * 5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.max_pool2d(x, 2)
        x = F.relu(self.conv2(x))
        x = F.max_pool2d(x, 2)
        x = x.view(-1, 16 * 5 * 5)
        x = F.relu(self.fc1(x))
        x = F.relu(self.fc2(x))
        x = self.fc3(x)

        return F.log_softmax(x, dim=-1)

and the data loader as follows:

# Data loaders for MNIST (torchvision provides the dataset and transforms)
from torchvision import datasets, transforms

batch_size = 128  # batch size used throughout this article

train_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=True, download=True,
                   transform=transforms.ToTensor()),
    batch_size=batch_size, shuffle=True)
test_loader = torch.utils.data.DataLoader(
    datasets.MNIST('data', train=False, transform=transforms.ToTensor()),
    batch_size=batch_size)

Afterwards, we implement a standard training loop to train the network on clean data:

def trainTorch(torch_model, train_loader, test_loader,
        nb_epochs=NB_EPOCHS, batch_size=BATCH_SIZE, train_end=-1, test_end=-1, learning_rate=LEARNING_RATE, optimizer=None):


    train_loss = []
    total = 0
    correct = 0
    step = 0
    for _epoch in range(nb_epochs):
      for xs, ys in train_loader:
        xs, ys = Variable(xs), Variable(ys)
        if torch.cuda.is_available():
          xs, ys = xs.cuda(), ys.cuda()
        optimizer.zero_grad()
        preds = torch_model(xs)
        loss = F.nll_loss(preds, ys)
        loss.backward()  # calc gradients
        train_loss.append(loss.data.item())
        optimizer.step()  # update the model parameters


        preds_np = preds.cpu().detach().numpy()
        correct += (np.argmax(preds_np, axis=1) == ys.cpu().detach().numpy()).sum()
        total += train_loader.batch_size
        step += 1
        if total % 1000 == 0:
          acc = float(correct) / total
          print('[%s] Training accuracy: %.2f%%' % (step, acc * 100))
          total = 0
          correct = 0

By setting the batch size to 128, the number of epochs to 4, and the learning rate to 0.001, the network achieves an accuracy of around 98% on the MNIST dataset after training.

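For reference, a training run with these settings might be wired up roughly as follows. This is only a sketch: the constant names mirror the defaults in the function signature above, and the choice of the Adam optimizer is an assumption (it is the optimizer used in the adversarial training code later in this article):

import torch.optim as optim

# Hyperparameters quoted above (names mirror the defaults used by trainTorch)
NB_EPOCHS = 4
BATCH_SIZE = 128
LEARNING_RATE = 0.001

torch_model = LeNet5()
if torch.cuda.is_available():
    torch_model = torch_model.cuda()

optimizer = optim.Adam(torch_model.parameters(), lr=LEARNING_RATE)
trainTorch(torch_model, train_loader, test_loader,
           nb_epochs=NB_EPOCHS, batch_size=BATCH_SIZE,
           learning_rate=LEARNING_RATE, optimizer=optimizer)
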
Applying the Attack

After training the network, we can then apply the FGSM attack given the network architecture.

To do so, we first have to import the required function from CleverHans:

from cleverhans.future.torch.attacks.fast_gradient_method import fast_gradient_method

This allows us to call the fast_gradient_method() function, which is simple and straightforward: Given the model, an input x, an ε, and a norm (norm=np.inf, 1, or 2), the function outputs a perturbed x.

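For example, a single call on a batch of MNIST images might look like the following (the ε value of 0.1 and the clipping range are choices for this demo rather than requirements of the API):

# xs: a batch of test images in [0, 1]; torch_model: the trained network
xs_adv = fast_gradient_method(torch_model, xs,
                              eps=0.1,       # maximum size of the perturbation
                              norm=np.inf,   # measure the perturbation in the L-infinity norm
                              clip_min=0.,   # keep perturbed pixels inside
                              clip_max=1.)   # the valid image range
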
We can then slightly modify the original evaluation loop, feeding the perturbed x instead of the original x, to measure the results as follows:

def evalAdvAttack(fgsm_model=None, test_loader=None):
    print("Evaluating single model results on adv data")
    total = 0
    correct = 0
    fgsm_model.eval()
    for xs, ys in test_loader:
      if torch.cuda.is_available():
        xs, ys = xs.cuda(), ys.cuda()
      # Generate FGSM adversarial examples (L-infinity norm, eps=0.1, pixels clipped to [0, 1])
      xs = fast_gradient_method(fgsm_model, xs, eps=0.1, norm=np.inf, clip_min=0., clip_max=1.)
      xs, ys = Variable(xs), Variable(ys)
      preds1 = fgsm_model(xs)
      preds_np1 = preds1.cpu().detach().numpy()
      finalPred = np.argmax(preds_np1, axis=1)
      correct += (finalPred == ys.cpu().detach().numpy()).sum()
      total += test_loader.batch_size
    acc = float(correct) / total
    print('Adv accuracy: {:.3f}%'.format(acc * 100))

In testing, the above attack forces the accuracy to drop drastically from 98% to around 4%, showing that small perturbations, if pointed in the right direction, can make the network perform very poorly.

Adversarial Training in PyTorch

In the same paper, Goodfellow et al. proposed adversarial training to combat these examples. In simple terms, adversarial examples generated from the training set are also included in the training.

This concept can be implemented by feeding both the original and the perturbed training data into the network during the same training run. Note that both types of data should be used during adversarial training to prevent a loss in accuracy on the original, clean data. The code below is my implementation of adversarial training:

def advTrain(torch_model, train_loader, test_loader,
        nb_epochs=NB_EPOCHS, batch_size=BATCH_SIZE, train_end=-1, test_end=-1, learning_rate=LEARNING_RATE):
    optimizer = optim.Adam(torch_model.parameters(), lr=learning_rate)
    train_loss = []
    total = 0
    correct = 0
    totalAdv = 0
    correctAdv = 0
    step = 0
    # breakstep = 0
    for _epoch in range(nb_epochs):
      for xs, ys in train_loader:
        #Normal Training
        xs, ys = Variable(xs), Variable(ys)
        if torch.cuda.is_available():
          xs, ys = xs.cuda(), ys.cuda()
        optimizer.zero_grad()
        preds = torch_model(xs)
        loss = F.nll_loss(preds, ys)
        loss.backward()  # calc gradients
        train_loss.append(loss.data.item())
        optimizer.step()  # update the model parameters
        preds_np = preds.cpu().detach().numpy()
        correct += (np.argmax(preds_np, axis=1) == ys.cpu().detach().numpy()).sum()
        total += train_loader.batch_size


        #Adversarial Training
        xs = fast_gradient_method(torch_model, xs, eps=0.3, norm=np.inf, clip_min=0., clip_max=1.)
        xs, ys = Variable(xs), Variable(ys)
        if torch.cuda.is_available():
            xs, ys = xs.cuda(), ys.cuda()
        optimizer.zero_grad()
        preds = torch_model(xs)
        loss = F.nll_loss(preds, ys)
        loss.backward()  # calc gradients
        train_loss.append(loss.data.item())
        optimizer.step()  # update the model parameters
        preds_np = preds.cpu().detach().numpy()
        correctAdv += (np.argmax(preds_np, axis=1) == ys.cpu().detach().numpy()).sum()
        totalAdv += train_loader.batch_size
        
        step += 1
        if total % 1000 == 0:
          acc = float(correct) / total
          print('[%s] Clean Training accuracy: %.2f%%' % (step, acc * 100))
          total = 0
          correct = 0
          accAdv = float(correctAdv) / totalAdv
          print('[%s] Adv Training accuracy: %.2f%%' % (step, accAdv * 100))
          totalAdv = 0
          correctAdv = 0

Note that the network starts from a checkpoint where it has already been trained on clean data. Both clean and adversarial examples are fed into the network during adversarial training to prevent a decrease in accuracy on clean data during further training.

With the same batch size, number of epochs, and learning rate, we can increase the accuracy on adversarial examples back to approximately 90% while maintaining the accuracy on clean data.

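Putting the pieces together, the fine-tuning step can be run roughly as follows, picking up the clean-trained torch_model from earlier (a sketch reusing the hyperparameter values quoted above, not a prescribed recipe):

# Fine-tune on clean batches plus FGSM-perturbed batches (eps=0.3 inside advTrain)
advTrain(torch_model, train_loader, test_loader,
         nb_epochs=NB_EPOCHS, batch_size=BATCH_SIZE,
         learning_rate=LEARNING_RATE)

# Re-evaluate robustness against the same FGSM attack (eps=0.1 inside evalAdvAttack)
evalAdvAttack(fgsm_model=torch_model, test_loader=test_loader)
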
Problem with Adversarial Training

Although the example above illustrates how adversarial training can be adopted to make the model more robust, one main issue is that it is only effective against the specific type of attack the model is trained on. With different attacks generating different adversarial examples, the adversarial training method needs to be further investigated and evaluated for better adversarial defense.

Conclusion

FGSM and adversarial training are among the earliest attacks and defenses. More recent attacks such as the C&W attack and DeepFool, and defenses such as distillation, have opened up new opportunities for future research and investigation. This article serves as an introduction to the field of adversarial attacks and hopefully sparks your interest to dig deeper into this field! The full code of my implementation is also posted on my GitHub.

Thank you for making it this far 🙏! I will be posting more on different areas of computer vision/deep learning, so make sure to check out my other articles and articles by Chuan En Lin too!

Translated from: https://towardsdatascience.com/adversarial-attack-and-defense-on-neural-networks-in-pytorch-82b5bcd9171
