Adversarial Machine Learning: the Fast Gradient Sign Method in Practice

The previous post, "Adversarial Machine Learning: the classic FGSM paper EXPLAINING AND HARNESSING ADVERSARIAL EXAMPLES", covered at the theoretical level how the Fast Gradient Sign Method finds adversarial examples. Its core idea is to assume that the network's final objective $J(\theta, x, y)$ has an approximately linear relationship with the input $x$; then, under the constraint $L_{\infty}(x, x+\delta) < \epsilon$, $x$ is pushed along the gradient direction $\nabla_{x} J(\theta, x, y)$ so that the loss increases.
This post puts the FGSM method into practice.
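In code, that update is only a few lines. Here is a minimal sketch of the FGSM step in TensorFlow 1.x (the helper name and tensor arguments are illustrative, not cleverhans's API):

import tensorflow as tf

def fgsm_perturb(x, loss, eps=0.2, clip_min=0., clip_max=1.):
  # One FGSM step: x_adv = clip(x + eps * sign(dJ/dx)).
  grad = tf.gradients(loss, x)[0]    # gradient of the loss w.r.t. the input
  adv_x = x + eps * tf.sign(grad)    # shift every pixel by +/- eps
  return tf.clip_by_value(adv_x, clip_min, clip_max)  # keep pixels in [0, 1]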
TensorFlow has released an official adversarial machine learning library, cleverhans, which bundles most of the attack and defense methods proposed in the academic literature to date.

Installing cleverhans

cleverhans is built on TensorFlow, so TensorFlow must be installed before installing it.
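For example, with pip (cleverhans 3.0.1 targets TensorFlow 1.x, not 2.x; the exact lower bound here is an assumption):

pip install 'tensorflow>=1.8,<2.0'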

To make it easier to modify cleverhans's code and inspect intermediate results, and to avoid permission hassles, we can download the source locally and then modify the PYTHONPATH environment variable so that Python uses our customized copy directly.

wget https://github.com/tensorflow/cleverhans/archive/v3.0.1.tar.gz
tar -xvf v3.0.1.tar.gz

Assuming the extracted cleverhans directory is x/cleverhans-3.0.1, update the environment variable:

export PYTHONPATH=x/cleverhans-3.0.1:$PYTHONPATH
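To confirm that Python now picks up the local copy, the following should print a path under x/cleverhans-3.0.1:

python -c "import cleverhans; print(cleverhans.__file__)"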

The code:

"""
This tutorial shows how to generate adversarial examples using FGSM
and train a model using adversarial training with TensorFlow.
It is very similar to mnist_tutorial_keras_tf.py, which does the same
thing but with a dependence on keras.
The original paper can be found at:
https://arxiv.org/abs/1412.6572
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from __future__ import unicode_literals

import logging
import numpy as np
import tensorflow as tf
from tensorflow.python.platform import flags

from cleverhans.loss import CrossEntropy
from cleverhans.dataset import MNIST
from cleverhans.utils_tf import model_eval
from cleverhans.train import train
from cleverhans.attacks import FastGradientMethod
from cleverhans.utils import AccuracyReport, set_log_level
from cleverhans_tutorials.tutorial_models import ModelBasicCNN

FLAGS = flags.FLAGS

NB_EPOCHS = 6
BATCH_SIZE = 128
LEARNING_RATE = 0.001
CLEAN_TRAIN = True
BACKPROP_THROUGH_ATTACK = False
NB_FILTERS = 64


def mnist_tutorial(train_start=0, train_end=60000, test_start=0,
                   test_end=10000, nb_epochs=NB_EPOCHS, batch_size=BATCH_SIZE,
                   learning_rate=LEARNING_RATE,
                   clean_train=CLEAN_TRAIN,
                   testing=False,
                   backprop_through_attack=BACKPROP_THROUGH_ATTACK,
                   nb_filters=NB_FILTERS, num_threads=None,
                   label_smoothing=0.1):
  """
  MNIST cleverhans tutorial
  :param train_start: index of first training set example
  :param train_end: index of last training set example
  :param test_start: index of first test set example
  :param test_end: index of last test set example
  :param nb_epochs: number of epochs to train model
  :param batch_size: size of training batches
  :param learning_rate: learning rate for training
  :param clean_train: perform normal training on clean examples only
                      before performing adversarial training.
  :param testing: if true, complete an AccuracyReport for unit tests
                  to verify that performance is adequate
  :param backprop_through_attack: If True, backprop through adversarial
                                  example construction process during
                                  adversarial training.
  :param label_smoothing: float, amount of label smoothing for cross entropy
  :return: an AccuracyReport object
  """

  # Object used to keep track of (and return) key accuracies
  report = AccuracyReport()

  # Set TF random seed to improve reproducibility
  tf.set_random_seed(1234)

  # Set logging level to see debug information
  set_log_level(logging.DEBUG)

  # Create TF session
  if num_threads:
    config_args = dict(intra_op_parallelism_threads=num_threads)
  else:
    config_args = {}
  sess = tf.Session(config=tf.ConfigProto(**config_args))

  # Get MNIST data
  mnist = MNIST(train_start=train_start, train_end=train_end,
                test_start=test_start, test_end=test_end)
  x_train, y_train = mnist.get_set('train')
  x_test, y_test = mnist.get_set('test')

  # Use Image Parameters
  img_rows, img_cols, nchannels = x_train.shape[1:4]
  nb_classes = y_train.shape[1]

  # Define input TF placeholder
  x = tf.placeholder(tf.float32, shape=(None, img_rows, img_cols,
                                        nchannels))
  y = tf.placeholder(tf.float32, shape=(None, nb_classes))

  # Train an MNIST model
  train_params = {
      'nb_epochs': nb_epochs,
      'batch_size': batch_size,
      'learning_rate': learning_rate
  }
  eval_params = {'batch_size': batch_size}
  fgsm_params = {
      'eps': 0.2,
      'clip_min': 0.,
      'clip_max': 1.
  }
  rng = np.random.RandomState([2017, 8, 30])

  def do_eval(preds, x_set, y_set, report_key, is_adv=None):
    if is_adv:
      # debug=True is our local modification to model_eval, used to dump
      # the intermediate logits of adversarial examples
      acc = model_eval(sess, x, y, preds, x_set, y_set,
                       args=eval_params, debug=True)
    else:
      # covers both clean evaluation (is_adv=False) and the
      # training-error calls that pass is_adv=None
      acc = model_eval(sess, x, y, preds, x_set, y_set, args=eval_params)
    setattr(report, report_key, acc)
    if is_adv is None:
      report_text = None
    elif is_adv:
      report_text = 'adversarial'
    else:
      report_text = 'legitimate'
    if report_text:
      print('Test accuracy on %s examples: %0.4f' % (report_text, acc))

  if clean_train:
    model = ModelBasicCNN('model1', nb_classes, nb_filters)
    preds = model.get_logits(x)
    loss = CrossEntropy(model, smoothing=label_smoothing)

    def evaluate():
      do_eval(preds, x_test, y_test, 'clean_train_clean_eval', False)

    train(sess, loss, x_train, y_train, evaluate=evaluate,
          args=train_params, rng=rng, var_list=model.get_params())

    # Calculate training error
    if testing:
      do_eval(preds, x_train, y_train, 'train_clean_train_clean_eval')

    # Initialize the Fast Gradient Sign Method (FGSM) attack object and
    # graph
    fgsm = FastGradientMethod(model, sess=sess)
    adv_x = fgsm.generate(x, **fgsm_params)
    preds_adv = model.get_logits(adv_x)
    # Evaluate the accuracy of the MNIST model on adversarial examples
    do_eval(preds_adv, x_test, y_test, 'clean_train_adv_eval', True)
    # Calculate training error
    if testing:
      do_eval(preds_adv, x_train, y_train, 'train_clean_train_adv_eval')
  return report


def main(argv=None):
  from cleverhans_tutorials import check_installation
  check_installation(__file__)

  mnist_tutorial(nb_epochs=FLAGS.nb_epochs, batch_size=FLAGS.batch_size,
                 learning_rate=FLAGS.learning_rate,
                 clean_train=FLAGS.clean_train,
                 backprop_through_attack=FLAGS.backprop_through_attack,
                 nb_filters=FLAGS.nb_filters)


if __name__ == '__main__':
  flags.DEFINE_integer('nb_filters', NB_FILTERS,
                       'Model size multiplier')
  flags.DEFINE_integer('nb_epochs', NB_EPOCHS,
                       'Number of epochs to train model')
  flags.DEFINE_integer('batch_size', BATCH_SIZE,
                       'Size of training batches')
  flags.DEFINE_float('learning_rate', LEARNING_RATE,
                     'Learning rate for training')
  flags.DEFINE_bool('clean_train', CLEAN_TRAIN, 'Train on clean examples')
  flags.DEFINE_bool('backprop_through_attack', BACKPROP_THROUGH_ATTACK,
                    ('If True, backprop through adversarial example '
                     'construction process during adversarial training'))

  tf.app.run()

Just run this script and you're done.
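For example, assuming the script is saved as mnist_tutorial_tf.py (the filename is an assumption; the flags map to the DEFINE_* calls at the bottom of the script):

python mnist_tutorial_tf.py --nb_epochs 6 --batch_size 128 --learning_rate 0.001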

Here are some attack results:

$\epsilon$    attack success rate
0.4           97.5%
0.3           91.2%
0.2           66%
0.1           23%
0.08          8%
0.06          5.1%
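One way to produce such a sweep is to re-run the attack with different eps values. A sketch, reusing the fgsm, model, and do_eval objects built in the tutorial above (the report keys are illustrative):

# Sweep epsilon and evaluate the attack at each setting.
for eps in [0.4, 0.3, 0.2, 0.1, 0.08, 0.06]:
  adv_x = fgsm.generate(x, eps=eps, clip_min=0., clip_max=1.)
  preds_adv = model.get_logits(adv_x)
  # do_eval reports accuracy on adversarial examples;
  # the attack success rate is roughly 1 - accuracy
  do_eval(preds_adv, x_test, y_test, 'adv_eval_eps_%g' % eps, True)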

Logits of some adversarial examples:

real y: [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0]
predict: [0.21056737, -0.13243113, 0.51316774, 0.51515293, -0.22385997, -0.36223292, -0.28980073, -0.61583054, 0.5446904, -0.14677687]

real y: [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
predict: [0.0622759, -0.15177919, 0.28638932, 0.48950654, -0.22479388, 0.34589207, -0.0262568, -0.24448192, 0.38647628, 0.0044806357]

real y: [0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
predict: [-0.048175618, -0.29588905, 0.6424468, 0.2426339, -0.08349679, -0.25063646, -0.2635439, -0.39338973, 0.80821, 0.10917343]

Looking at these outputs, the wrong labels never actually come out with especially high confidence, so the very high confidences reported in the paper may just be a coincidence.
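As a sanity check, we can push the first set of logits above through a softmax to get actual confidences (a quick NumPy calculation; the values are copied from the output):

import numpy as np

# Logits of the first adversarial example above (true label 7, predicted argmax 8).
logits = np.array([0.21056737, -0.13243113, 0.51316774, 0.51515293,
                   -0.22385997, -0.36223292, -0.28980073, -0.61583054,
                   0.5446904, -0.14677687])
probs = np.exp(logits) / np.exp(logits).sum()
print(probs.max())  # ~0.16: the winning class is nowhere near high confidence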
