A full walkthrough: using Python and deep learning to generate adversarial examples, add a perturbation to any image, and restore the original resolution

I. Preface

Every language has its own strengths: C suits embedded development, C++ suits desktop software, Java suits mobile apps, and JavaScript suits the web, while Python seems to have been born for machine learning and artificial intelligence.
This time I use Python to craft adversarial examples against a trained model, apply the resulting perturbation to an arbitrary image, and then restore the image to its original size; this post records the whole process.
First, a few basic concepts. I describe them only as I understand them, which may not be precise; for rigorous definitions please look them up yourself.

(i) What is deep learning?

Take machine learning as an example. For a computer to recognize that a picture contains a bird, it has to learn, and its learning process is nothing like human thinking. It goes through a large set of curated bird pictures and, using methods similar to digital image processing, decomposes each picture into individual features. It memorizes these features and works out which of them appear across most of the bird pictures; those recurring features are then treated as the key evidence that a picture contains a bird. When you later hand the computer a new picture, if it can find similar features in it, it classifies the picture as a bird.

(ii) What is a model?

A model, then, is what was just described: the library of key features of some object, distilled from a large number of pictures. Of course, the library can hold the key features of many kinds of objects, not just birds or baboons.

(iii) What is an adversarial example?

An adversarial example presupposes a trained model, that is, a feature library that can already describe a given object fairly accurately, with fairly high probability, using the features it has recorded; the adversarial example is crafted against that model.
Before talking about adversarial examples, a few questions need to be cleared up, or the topic is very hard to understand. The explanations online are full of jargon and hard to follow, so I will try plain words; there may be inaccuracies, and for rigorous definitions please look them up yourself.

1. The goal of the attack

The goal is this: take a trained model and any picture of an object the model covers, say a picture of a small bird, which the model correctly recognizes as a bird. After some processing of the picture, however, this model, or even other models covering the same class, suddenly classify it as an elephant. Is that the end? No, the real point comes next: the processed picture must not look different to the human eye. A person sees essentially no change, yet the computer can no longer recognize the picture. Only then has the attack succeeded.

2. Who does the attacking?

The attacker is, of course, the picture you picked, the one to be classified by the model.

3. What is the attack against?

Here is the key point: what exactly are we attacking? The attack is aimed at the model that tries to recognize the object in the picture. Once that is clear, those impressive-sounding terms you see online, white-box attack and black-box attack, finally make sense.
A white-box attack is attacking your own shield with your own spear: you use the very model that crafted the perturbation to classify the picture, which in normal use simply tests whether the perturbation succeeded.
A black-box attack is attacking the shield with someone else's spear: you take a model trained by someone else on different data, one that also contains the bird class, and check whether it can still recognize the cute little bird in the picture.
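To make the distinction concrete, here is a minimal, hedged sketch of a black-box test: classify the adversarial image with a model that was not used to craft it. The file names here are placeholders, not files produced by this post.

import numpy as np
from keras.models import load_model
from imageio import imread

# Hypothetical file names; any model trained elsewhere will do for a black-box test.
other_model = load_model('someone_elses_model.h5')
adv = imread('adv_example.png', pilmode='RGB').astype('float32') / 255.0
pred = other_model.predict(adv[np.newaxis, ...])  # add a batch dimension
print('black-box prediction:', np.argmax(pred))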

4. How is the attack carried out?

As just mentioned, the question is how to process the picture, that is, how to add the perturbation. This is what the experts spend their research on. There is no shortage of methods and algorithms; well-known ones include FGSM, IGSM, DeepFool and JSMA. Implementing them is easy too: you do not have to write them yourself, you can call them directly from the art package. This art package is the Adversarial Robustness Toolbox, an adversarial-example toolbox developed by IBM, and under Python it can be installed directly with pip3, as sketched below.
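For reference, the package name on PyPI is adversarial-robustness-toolbox; installation is a one-liner (the attack classes used later live under art.attacks.evasion in the version I used, though import paths may differ between ART releases):

pip3 install adversarial-robustness-toolbox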
What follows is the key point!!!
But I want to say more than that, namely something that puzzled me for a long time when I first touched deep learning: if all we do is add a perturbation to the picture, what does it have to do with the model? Could I skip the model and just add the perturbation directly? The explanations of adversarial examples you find online are thin and never stress this, so no wonder a self-taught learner like me was confused. A few points need spelling out.

(1) There are two models involved

One model is the one used to craft the perturbation, that is, the model used to process the picture; the other is the model under attack, the one used for testing. Of course they can be the same model, and that is exactly the difference between a white-box and a black-box attack.

(2) The relationship between the crafting model and the perturbation

Does adding a perturbation to a picture need a model at all? Put differently, what role does the model play here? The answer is yes, it is needed, and its role is decisive. Roughly, I understand it like this: adding a perturbation is really fitting the picture toward the features of some other object, possibly over many iterations. What does fitting mean? Simply put, some of the bird features in the picture get blended with gorilla features: a region that would have been recognized as a bird's wing, after fitting in gorilla features, is now read as a gorilla's arm, and the whole picture ends up classified as a gorilla. In other words, the model's role is to select the features of some other object and guide how those recorded features are fitted into this picture, so that the picture gets recognized as something else.
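To see why the model is decisive, here is a toy sketch with a hypothetical linear classifier: the direction of the perturbation comes entirely from the model's weights (its gradient), and the budget eps only controls how far we move. This is an illustration of the idea, not the code used later.

import numpy as np

rng = np.random.RandomState(0)
W = rng.normal(size=(2, 32 * 32 * 3))   # hypothetical linear "model" with 2 classes
x = rng.rand(32 * 32 * 3)               # a fake "image" with values in [0, 1]
target = 1                              # the class we want the model to predict
grad = W[target]                        # d(score of target class)/d(pixels) for a linear model
eps = 2.0 / 255.0                       # perturbation budget
x_adv = np.clip(x + eps * np.sign(grad), 0.0, 1.0)  # nudge every pixel toward the target class
print('largest pixel change:', np.abs(x_adv - x).max())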

(3) What do we have to do in this process?

Given the fitting idea just described, the question of how much to fit comes up: the gorilla has plenty of features, so how many of them do we blend into this picture? This matters a great deal. Add too much and the visual appearance suffers, and not affecting the visual appearance is the precondition for everything; add too little and the attacked model's chance of classifying the picture correctly goes back up. We therefore need a balance between what the human eye sees and what the computer sees. That degree of fitting is the value we keep adjusting.
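In code, that tunable degree is usually a single scalar, commonly called eps. A hedged sketch of how one might scan it with ART; my_model, images and classifier are assumed to be the objects created in the CIFAR-10 attack code further down, so this is only an outline, not runnable on its own.

import numpy as np
from art.attacks.evasion import FastGradientMethod

clean_pred = np.argmax(my_model.predict(images), axis=1)   # predictions before the attack
for eps in (0.01, 0.03, 0.05, 0.1):
    adv = FastGradientMethod(classifier, eps=eps).generate(x=images)
    adv_pred = np.argmax(my_model.predict(adv), axis=1)     # predictions after the attack
    print('eps=%.2f  flipped %d/%d  max pixel change %.3f'
          % (eps, int((adv_pred != clean_pred).sum()), len(images),
             float(np.abs(adv - images).max())))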

II. Implementation

I just spent a lot of space on these concepts because I think they really matter, hence the long-windedness.
Now let me walk through the whole implementation. The complete code, including the two trained models, is attached at the end of the post.

(i) Obtaining a model

1. Choosing the training data

There is plenty of choice in training data. I looked at two options: CIFAR-10 and Inception v3. I chose the former because I wanted to try training a model myself, and the latter because it is Google's ready-made pretrained model, mature and trained on higher-resolution images (299×299), whereas the CIFAR-10 model works on 32×32 images. For the actual run I used the latter.

2. Training the model

For training, I tried building my own model on CIFAR-10. One thing deserves special mention: if you train directly with Keras and the dataset is not present locally, Keras will download it automatically, but that download is very slow and fails easily. I therefore recommend downloading the dataset yourself as cifar-10-batches-py.tar.gz (download link at the end of the post). Note, and this is important: the file you download may carry a different name, but after downloading you must rename it to exactly this name and place it in the expected directory. On Ubuntu that is /home/XXX/.keras/datasets/cifar-10-batches-py.tar.gz; on Windows I do not remember exactly, but it is roughly under the user folder, .keras/datasets/cifar-10-batches-py.tar.gz, and yes, Windows also uses the .gz archive. Only in that location will Keras find it by default; a quick way to check is sketched below.
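A tiny check I use (my own helper, not part of Keras) to confirm the archive is where keras.datasets.cifar10.load_data() will look for it:

import os

# ~/.keras/datasets is where Keras caches datasets on both Ubuntu and Windows.
expected = os.path.join(os.path.expanduser('~'), '.keras', 'datasets',
                        'cifar-10-batches-py.tar.gz')
print(expected, 'exists:', os.path.isfile(expected))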
Alright, straight to the training code.

"""
#Trains a ResNet on the CIFAR10 dataset.
ResNet v1:
[Deep Residual Learning for Image Recognition
](https://arxiv.org/pdf/1512.03385.pdf)
ResNet v2:
[Identity Mappings in Deep Residual Networks
](https://arxiv.org/pdf/1603.05027.pdf)
Model|n|200-epoch accuracy|Original paper accuracy |sec/epoch GTX1080Ti
:------------|--:|-------:|-----------------------:|---:
ResNet20   v1|  3| 92.16 %|                 91.25 %|35
ResNet32   v1|  5| 92.46 %|                 92.49 %|50
ResNet44   v1|  7| 92.50 %|                 92.83 %|70
ResNet56   v1|  9| 92.71 %|                 93.03 %|90
ResNet110  v1| 18| 92.65 %|            93.39+-.16 %|165
ResNet164  v1| 27|     - %|                 94.07 %|  -
ResNet1001 v1|N/A|     - %|                 92.39 %|  -
 
Model|n|200-epoch accuracy|Original paper accuracy |sec/epoch GTX1080Ti
:------------|--:|-------:|-----------------------:|---:
ResNet20   v2|  2|     - %|                     - %|---
ResNet32   v2|N/A| NA    %|            NA         %| NA
ResNet44   v2|N/A| NA    %|            NA         %| NA
ResNet56   v2|  6| 93.01 %|            NA         %|100
ResNet110  v2| 12| 93.15 %|            93.63      %|180
ResNet164  v2| 18|     - %|            94.54      %|  -
ResNet1001 v2|111|     - %|            95.08+-.14 %|  -
"""

from __future__ import print_function
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.datasets import cifar10
import numpy as np
import os
import tensorflow as tf

# To avoid running out of GPU memory, allocate GPU memory on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))

# Training parameters
batch_size = 32  # orig paper trained all networks with batch_size=128
epochs = 200
data_augmentation = True
num_classes = 10

# Subtracting pixel mean improves accuracy
subtract_pixel_mean = True

# Model parameter
# ----------------------------------------------------------------------------
#           |      | 200-epoch | Orig Paper| 200-epoch | Orig Paper| sec/epoch
# Model     |  n   | ResNet v1 | ResNet v1 | ResNet v2 | ResNet v2 | GTX1080Ti
#           |v1(v2)| %Accuracy | %Accuracy | %Accuracy | %Accuracy | v1 (v2)
# ----------------------------------------------------------------------------
# ResNet20  | 3 (2)| 92.16     | 91.25     | -----     | -----     | 35 (---)
# ResNet32  | 5(NA)| 92.46     | 92.49     | NA        | NA        | 50 ( NA)
# ResNet44  | 7(NA)| 92.50     | 92.83     | NA        | NA        | 70 ( NA)
# ResNet56  | 9 (6)| 92.71     | 93.03     | 93.01     | NA        | 90 (100)
# ResNet110 |18(12)| 92.65     | 93.39+-.16| 93.15     | 93.63     | 165(180)
# ResNet164 |27(18)| -----     | 94.07     | -----     | 94.54     | ---(---)
# ResNet1001| (111)| -----     | 92.39     | -----     | 95.08+-.14| ---(---)
# ---------------------------------------------------------------------------
n = 3

# Model version
# Orig paper: version = 1 (ResNet v1), Improved ResNet: version = 2 (ResNet v2)
version = 1

# Computed depth from supplied model parameter n
if version == 1:
    depth = n * 6 + 2
elif version == 2:
    depth = n * 9 + 2

# Model name, depth and version
model_type = 'ResNet%dv%d' % (depth, version)

# Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

# Input image dimensions.
input_shape = x_train.shape[1:]

# Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255

# If subtract pixel mean is enabled
if subtract_pixel_mean:
    x_train_mean = np.mean(x_train, axis=0)
    x_train -= x_train_mean
    x_test -= x_train_mean

print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
print('y_train shape:', y_train.shape)

# Convert class vectors to binary class matrices.
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)


def lr_schedule(epoch):
    """Learning Rate Schedule
    Learning rate is scheduled to be reduced after 80, 120, 160, 180 epochs.
    Called automatically every epoch as part of callbacks during training.
    # Arguments
        epoch (int): The number of epochs
    # Returns
        lr (float32): learning rate
    """
    lr = 1e-3
    if epoch > 180:
        lr *= 0.5e-3
    elif epoch > 160:
        lr *= 1e-3
    elif epoch > 120:
        lr *= 1e-2
    elif epoch > 80:
        lr *= 1e-1
    print('Learning rate: ', lr)
    return lr


def resnet_layer(inputs,
                 num_filters=16,
                 kernel_size=3,
                 strides=1,
                 activation='relu',
                 batch_normalization=True,
                 conv_first=True):
    """2D Convolution-Batch Normalization-Activation stack builder
    # Arguments
        inputs (tensor): input tensor from input image or previous layer
        num_filters (int): Conv2D number of filters
        kernel_size (int): Conv2D square kernel dimensions
        strides (int): Conv2D square stride dimensions
        activation (string): activation name
        batch_normalization (bool): whether to include batch normalization
        conv_first (bool): conv-bn-activation (True) or
            bn-activation-conv (False)
    # Returns
        x (tensor): tensor as input to the next layer
    """
    conv = Conv2D(num_filters,
                  kernel_size=kernel_size,
                  strides=strides,
                  padding='same',
                  kernel_initializer='he_normal',
                  kernel_regularizer=l2(1e-4))

    x = inputs
    if conv_first:
        x = conv(x)
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
    else:
        if batch_normalization:
            x = BatchNormalization()(x)
        if activation is not None:
            x = Activation(activation)(x)
        x = conv(x)
    return x


def resnet_v1(input_shape, depth, num_classes=10):
    """ResNet Version 1 Model builder [a]
    Stacks of 2 x (3 x 3) Conv2D-BN-ReLU
    Last ReLU is after the shortcut connection.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filters is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Features maps sizes:
    stage 0: 32x32, 16
    stage 1: 16x16, 32
    stage 2:  8x8,  64
    The Number of parameters is approx the same as Table 6 of [a]:
    ResNet20 0.27M
    ResNet32 0.46M
    ResNet44 0.66M
    ResNet56 0.85M
    ResNet110 1.7M
    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)
    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 6 != 0:
        raise ValueError('depth should be 6n+2 (eg 20, 32, 44 in [a])')
    # Start model definition.
    num_filters = 16
    num_res_blocks = int((depth - 2) / 6)

    inputs = Input(shape=input_shape)
    x = resnet_layer(inputs=inputs)
    # Instantiate the stack of residual units
    for stack in range(3):
        for res_block in range(num_res_blocks):
            strides = 1
            if stack > 0 and res_block == 0:  # first layer but not first stack
                strides = 2  # downsample
            y = resnet_layer(inputs=x,
                             num_filters=num_filters,
                             strides=strides)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters,
                             activation=None)
            if stack > 0 and res_block == 0:  # first layer but not first stack
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])
            x = Activation('relu')(x)
        num_filters *= 2

    # Add classifier on top.
    # v1 does not use BN after last shortcut connection-ReLU
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model


def resnet_v2(input_shape, depth, num_classes=10):
    """ResNet Version 2 Model builder [b]
    Stacks of (1 x 1)-(3 x 3)-(1 x 1) BN-ReLU-Conv2D or also known as
    bottleneck layer
    First shortcut connection per layer is 1 x 1 Conv2D.
    Second and onwards shortcut connection is identity.
    At the beginning of each stage, the feature map size is halved (downsampled)
    by a convolutional layer with strides=2, while the number of filter maps is
    doubled. Within each stage, the layers have the same number of filters and
    the same feature map sizes.
    Features maps sizes:
    conv1  : 32x32,  16
    stage 0: 32x32,  64
    stage 1: 16x16, 128
    stage 2:  8x8,  256
    # Arguments
        input_shape (tensor): shape of input image tensor
        depth (int): number of core convolutional layers
        num_classes (int): number of classes (CIFAR10 has 10)
    # Returns
        model (Model): Keras model instance
    """
    if (depth - 2) % 9 != 0:
        raise ValueError('depth should be 9n+2 (eg 56 or 110 in [b])')
    # Start model definition.
    num_filters_in = 16
    num_res_blocks = int((depth - 2) / 9)

    inputs = Input(shape=input_shape)
    # v2 performs Conv2D with BN-ReLU on input before splitting into 2 paths
    x = resnet_layer(inputs=inputs,
                     num_filters=num_filters_in,
                     conv_first=True)

    # Instantiate the stack of residual units
    for stage in range(3):
        for res_block in range(num_res_blocks):
            activation = 'relu'
            batch_normalization = True
            strides = 1
            if stage == 0:
                num_filters_out = num_filters_in * 4
                if res_block == 0:  # first layer and first stage
                    activation = None
                    batch_normalization = False
            else:
                num_filters_out = num_filters_in * 2
                if res_block == 0:  # first layer but not first stage
                    strides = 2    # downsample

            # bottleneck residual unit
            y = resnet_layer(inputs=x,
                             num_filters=num_filters_in,
                             kernel_size=1,
                             strides=strides,
                             activation=activation,
                             batch_normalization=batch_normalization,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_in,
                             conv_first=False)
            y = resnet_layer(inputs=y,
                             num_filters=num_filters_out,
                             kernel_size=1,
                             conv_first=False)
            if res_block == 0:
                # linear projection residual shortcut connection to match
                # changed dims
                x = resnet_layer(inputs=x,
                                 num_filters=num_filters_out,
                                 kernel_size=1,
                                 strides=strides,
                                 activation=None,
                                 batch_normalization=False)
            x = keras.layers.add([x, y])

        num_filters_in = num_filters_out

    # Add classifier on top.
    # v2 has BN-ReLU before Pooling
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = AveragePooling2D(pool_size=8)(x)
    y = Flatten()(x)
    outputs = Dense(num_classes,
                    activation='softmax',
                    kernel_initializer='he_normal')(y)

    # Instantiate model.
    model = Model(inputs=inputs, outputs=outputs)
    return model


if version == 2:
    model = resnet_v2(input_shape=input_shape, depth=depth)
else:
    model = resnet_v1(input_shape=input_shape, depth=depth)

model.compile(loss='categorical_crossentropy',
              optimizer=Adam(learning_rate=lr_schedule(0)),
              metrics=['accuracy'])
model.summary()
print(model_type)

# Prepare model model saving directory.
save_dir = os.path.join(os.getcwd(), 'saved_models')
model_name = 'cifar10_%s_model.h5' % model_type
if not os.path.isdir(save_dir):
    os.makedirs(save_dir)
filepath = os.path.join(save_dir, model_name)

# Prepare callbacks for model saving and for learning rate adjustment.
checkpoint = ModelCheckpoint(filepath=filepath,
                             monitor='val_acc',
                             verbose=1,
                             save_best_only=True)

lr_scheduler = LearningRateScheduler(lr_schedule)

lr_reducer = ReduceLROnPlateau(factor=np.sqrt(0.1),
                               cooldown=0,
                               patience=5,
                               min_lr=0.5e-6)

callbacks = [checkpoint, lr_reducer, lr_scheduler]

# Run training, with or without data augmentation.
if not data_augmentation:
    print('Not using data augmentation.')
    model.fit(x_train, y_train,
              batch_size=batch_size,
              epochs=epochs,
              validation_data=(x_test, y_test),
              shuffle=True,
              callbacks=callbacks)
else:
    print('Using real-time data augmentation.')
    # This will do preprocessing and realtime data augmentation:
    datagen = ImageDataGenerator(
        # set input mean to 0 over the dataset
        featurewise_center=False,
        # set each sample mean to 0
        samplewise_center=False,
        # divide inputs by std of dataset
        featurewise_std_normalization=False,
        # divide each input by its std
        samplewise_std_normalization=False,
        # apply ZCA whitening
        zca_whitening=False,
        # epsilon for ZCA whitening
        zca_epsilon=1e-06,
        # randomly rotate images in the range (deg 0 to 180)
        rotation_range=0,
        # randomly shift images horizontally
        width_shift_range=0.1,
        # randomly shift images vertically
        height_shift_range=0.1,
        # set range for random shear
        shear_range=0.,
        # set range for random zoom
        zoom_range=0.,
        # set range for random channel shifts
        channel_shift_range=0.,
        # set mode for filling points outside the input boundaries
        fill_mode='nearest',
        # value used for fill_mode = "constant"
        cval=0.,
        # randomly flip images
        horizontal_flip=True,
        # randomly flip images
        vertical_flip=False,
        # set rescaling factor (applied before any other transformation)
        rescale=None,
        # set function that will be applied on each input
        preprocessing_function=None,
        # image data format, either "channels_first" or "channels_last"
        data_format=None,
        # fraction of images reserved for validation (strictly between 0 and 1)
        validation_split=0.0)

    # Compute quantities required for featurewise normalization
    # (std, mean, and principal components if ZCA whitening is applied).
    datagen.fit(x_train)

    # Fit the model on the batches generated by datagen.flow().
    model.fit_generator(datagen.flow(x_train, y_train, batch_size=batch_size),
                        validation_data=(x_test, y_test),
                        epochs=epochs, verbose=1, workers=4,
                        callbacks=callbacks)
# Save the model
model.save(filepath)

# Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])

I will not explain the code line by line; work through it yourself. In the end it produces a model file named cifar10_ResNet20v1_model.h5.
The code automatically tries to train on the GPU and falls back to the CPU if no GPU is available, which is painfully slow. I may write another post on installing the NVIDIA GPU driver + CUDA + cuDNN combo on Ubuntu, but not here.
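If you are unsure whether TensorFlow can actually see your GPU, a quick check with the TF 1.x API used above:

import tensorflow as tf
from tensorflow.python.client import device_lib

# List the devices TensorFlow can use; a GPU shows up as '/device:GPU:0'.
print([d.name for d in device_lib.list_local_devices()])
print('GPU available:', tf.test.is_gpu_available())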

(ii) Processing the image (generating the adversarial image)

For generating the adversarial image I use the inception_v3.ckpt model, because it works at a higher resolution. I added fairly detailed comments to this part, so please read through them.

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
import PIL
import numpy as np
import tempfile
from urllib.request import urlretrieve
import tarfile
import os
from PIL import Image
import json
import matplotlib.pyplot as plt
import matplotlib.image as mp

# First, set up the input image. Use tf.Variable instead of tf.placeholder so that it is trainable; we can still feed it when needed.
tf.logging.set_verbosity(tf.logging.ERROR)
sess = tf.InteractiveSession()
image = tf.Variable(tf.zeros((299, 299, 3)))

# Load the Inception v3 model
def inception(image, reuse):
    # scale pixel values from [0, 1] to [-1, 1]; expand_dims adds a batch dimension
    preprocessed = tf.multiply(tf.subtract(tf.expand_dims(image, 0), 0.5), 2.0)
    # weight_decay: L2 weight decay (disabled here)
    arg_scope = nets.inception.inception_v3_arg_scope(weight_decay=0.0)
    # arg_scope supplies default arguments to slim's layer functions, keeping the model code compact
    # Define the Inception v3 architecture; inception_v3.ckpt only stores the parameter values
    with slim.arg_scope(arg_scope):
        # logits: output of the Inception v3 forward pass
        logits, _ = nets.inception.inception_v3(
            preprocessed, 1001, is_training=False, reuse=reuse)
        logits = logits[:, 1:]  # ignore background class
        # softmax normalizes the logits into probabilities in (0, 1) that sum to 1
        probs = tf.nn.softmax(logits)  # probabilities
    return logits, probs


logits, probs = inception(image, reuse=False)

# Load the pretrained weights
data_dir = './saved_models'
# inception_tarball, _ = urlretrieve(
#     'http://download.tensorflow.org/models/inception_v3_2016_08_28.tar.gz')
# tarfile.open(inception_tarball, 'r:gz').extractall(data_dir)
restore_vars = [
    var for var in tf.global_variables()
    # keep only the variables whose names start with 'InceptionV3/'
    if var.name.startswith('InceptionV3/')
]

# Create a saver
saver = tf.train.Saver(restore_vars)
# Restore the model parameters from the checkpoint
saver.restore(sess, os.path.join(data_dir, 'inception_v3.ckpt'))

# Show the image, classify it, and display the classification result
# JSON file with the ImageNet class labels
imagenet_json, _ = urlretrieve(
    'http://www.anishathalye.com/media/2017/07/25/imagenet.json')
with open(imagenet_json) as f:
    imagenet_labels = json.load(f)


# Display the image and a bar chart of its top-10 class probabilities
def classify(img, correct_class=None, target_class=None, label='o'):
    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 8))
    fig.sca(ax1)
    p = sess.run(probs, feed_dict={image: img})[0]
    ax1.imshow(img)
    fig.sca(ax1)
    topk = list(p.argsort()[-10:][::-1])
    topprobs = p[topk]
    print(topprobs)
    barlist = ax2.bar(range(10), topprobs)
    # highlight the target class in red and the correct class in green, if given
    if target_class is not None and target_class in topk:
        barlist[topk.index(target_class)].set_color('r')
    if correct_class is not None and correct_class in topk:
        barlist[topk.index(correct_class)].set_color('g')
    plt.sca(ax2)
    plt.ylim([0, 1.1])
    plt.xticks(range(10),
               [imagenet_labels[i][:15] for i in topk],
               rotation='vertical')
    fig.subplots_adjust(bottom=0.2)
    plt.show()


# Load the image and make sure it is classified correctly
img_path = './picture/test.jpg'  # the original image; the perturbed result is saved as test_adv.jpg below
# true class of the image
# img_class = 388  # "giant panda"
img = PIL.Image.open(img_path)
# the larger of width and height
big_dim = max(img.width, img.height)
# True if the image is wider than it is tall
wide = img.width > img.height
# new width: 299 if the image is portrait, otherwise scaled to preserve the aspect ratio
new_w = 299 if not wide else int(img.width * 299 / img.height)
# new height: 299 if the image is landscape, otherwise scaled to preserve the aspect ratio
new_h = 299 if wide else int(img.height * 299 / img.width)
# resize so the shorter side becomes 299, then crop a 299x299 patch
img = img.resize((new_w, new_h)).crop((0, 0, 299, 299))
# normalize to [0, 1]
img = (np.asarray(img) / 255.0).astype(np.float32)
classify(img)
# classify(img, correct_class=img_class, label='o')

# TensorFlow ops for feeding an image into the trainable variable
x = tf.placeholder(tf.float32, (299, 299, 3))
x_hat = image  # our trainable adversarial input
# assign the value of x to x_hat
assign_op = tf.assign(x_hat, x)

# Gradient descent step that maximizes the log-probability of the target class
# scalar placeholder for the learning rate
learning_rate = tf.placeholder(tf.float32, ())
# scalar placeholder for the target class index
y_hat = tf.placeholder(tf.int32, ())
# convert the label to one-hot format
labels = tf.one_hot(y_hat, 1000)
# cross-entropy between the model output and the target class
loss = tf.nn.softmax_cross_entropy_with_logits(logits=logits, labels=[labels])
# gradient descent optimizer that updates only x_hat
optim_step = tf.train.GradientDescentOptimizer(
    learning_rate).minimize(loss, var_list=[x_hat])

# Projection step: keep the adversarial example visually close to the original image, and clip it to [0, 1] so it remains a valid image
epsilon = tf.placeholder(tf.float32, ())

below = x - epsilon
above = x + epsilon
projected = tf.clip_by_value(tf.clip_by_value(x_hat, below, above), 0, 1)
with tf.control_dependencies([projected]):
    project_step = tf.assign(x_hat, projected)

# Now synthesize the adversarial example; we arbitrarily pick the gibbon as our target class
demo_epsilon = 2.0 / 255.0  # a really small perturbation
demo_lr = 1e-2
# number of optimization steps
demo_steps = 100
# label index of the target class in the dataset, i.e. the gibbon's index
demo_target = 368  # "gibbon"

# Initialization: copy the original image into the trainable variable x_hat
sess.run(assign_op, feed_dict={x: img})

# projected gradient descent
for i in range(demo_steps):
    # gradient descent step
    _, loss_value = sess.run(
        [optim_step, loss],
        feed_dict={learning_rate: demo_lr, y_hat: demo_target})
    # project step
    sess.run(project_step, feed_dict={x: img, epsilon: demo_epsilon})
    if (i + 1) % 10 == 0:
        print('step %d, loss=%g' % (i + 1, loss_value))

adv = x_hat.eval()  # retrieve the adversarial example
# import cv2
# adv = cv2.resize(adv, (800, 800))  # resize up to 800x800
# img_r = 800 - adv.shape[0]  # rows of padding needed along dimension 0
# img_b = 800 - adv.shape[1]  # columns of padding needed along dimension 1
# img_pad = np.pad(adv, ((0, img_r), (0, img_b), (0, 0)), 'constant', constant_values=0)

mp.imsave('./picture/test_adv.jpg', adv)
classify(adv)
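A small sanity check I like to run right after the loop above (it reuses img, adv and demo_epsilon defined there): projected gradient descent should have kept every pixel within the epsilon budget and inside [0, 1].

# Verify the perturbation stayed within the budget and the valid pixel range.
diff = np.abs(adv - img)
print('max pixel change: %.5f (budget %.5f)' % (diff.max(), demo_epsilon))
print('value range of adv: [%.3f, %.3f]' % (adv.min(), adv.max()))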

I also paste here the code that generates adversarial images with the CIFAR-10 model, in case you need it. The most important part is that it uses IBM's art toolbox, which provides the calls to the attack algorithms mentioned above.

from os.path import abspath
import sys
import os
import tensorflow as tf

sys.path.append(abspath('.'))

import keras
import numpy as np
import pickle

import matplotlib.pyplot as plt
plt.show()

from keras.datasets import cifar10
from keras.models import load_model
from keras.utils import to_categorical
from imageio import imread
from PIL import Image

from art.classifiers import KerasClassifier
from art.attacks.evasion import FastGradientMethod
from art.attacks.evasion import BasicIterativeMethod
from art.attacks.evasion import SaliencyMapMethod
from art.attacks.evasion import DeepFool
# directory containing the input images
input_dir = "./picture"
# directory for the output images
output_dir = "./out"

# width of the image array
image_width = 32
# height of the image array
image_height = 32
# number of images per batch
batch_size = 10

# batch array shape
# (batch size, height, width, channels)
batch_shape = [batch_size, image_height, image_width, 3]

# Load a batch of images from input_dir
def load_images(input_dir, batch_shape,Model):
    # initialize the batch with zeros
    images = np.zeros(batch_shape)

    filenames = []
    idx = 0
    batch_size = batch_shape[0]
    for filepath in sorted(tf.gfile.Glob(os.path.join(input_dir, '*.png'))):
        with tf.gfile.Open(filepath, "rb") as f:
            # normalize: either divide by 255 for [0, 1], or divide by 127.5 and subtract 1 for [-1, 1]
            images[idx, :, :, :] = imread(f, pilmode='RGB').astype('float32')/255.0
            # images[idx, :, :, :] = imread(f, pilmode='RGB').astype(np.float) * 2.0 / 255.0 - 1.0
        filenames.append(os.path.basename(filepath))
        idx += 1
        if idx == batch_size:
            yield filenames, images,idx
            filenames = []
            images = np.zeros(batch_shape)
            idx = 0
    if idx > 0:
        yield filenames, images,idx

# Save the images
def save_images(images, filenames, output_dir):
    for i, filename in enumerate(filenames):
        # Images for inception classifier are normalized to be in [-1, 1] interval,
        # so rescale them back to [0, 1].
        with tf.gfile.Open(os.path.join(output_dir, filename), 'w') as f:
            img = (images[i, :, :, :] * 255.0).astype(np.uint8)
            # img = (((images[i, :, :, :] + 1.0) * 0.5) * 255.0).astype(np.uint8)
            Image.fromarray(img).save(f, format='png')

# To avoid running out of GPU memory, allocate GPU memory on demand
config = tf.ConfigProto()
config.gpu_options.allow_growth = True
keras.backend.tensorflow_backend.set_session(tf.Session(config=config))

# Load the trained CIFAR-10 model
my_model = load_model('./saved_models/cifar10_ResNet20v1_model.h5')

# Load the images to be processed
image_iterator = load_images(input_dir, batch_shape,my_model)

# Get the first batch of images
filenames, images ,idx= next(image_iterator)

# Wrap the model in an ART classifier
classifier = KerasClassifier( model=my_model)

# Craft adversarial samples with FGSM (add the perturbation)
epsilon = 0.03  # Maximum perturbation
adv_fgsm_crafter = FastGradientMethod(classifier)
x_test_adv_fgsm = adv_fgsm_crafter.generate(x=images, eps=epsilon)

# Craft adversarial samples with IGSM
# epsilon = 0.015  # Maximum perturbation
# stepsize = 0.005
# adv_igsm_crafter = BasicIterativeMethod(classifier, eps=epsilon, eps_step=stepsize)
# x_test_adv_igsm = adv_igsm_crafter.generate(x=images)


# Display the images before and after perturbation
fig = plt.figure(figsize=(idx, 2))
columns = idx
rows = 2
for i in range(0, idx):
    img = images[i].reshape(32, 32, 3)
    fig.add_subplot(rows, columns, i+1)
    plt.imshow(img)
    plt.axis('off')
    # predicted class
    y_pred = my_model.predict(images[i].reshape(1, 32, 32, 3))
    print(np.argmax(y_pred), end=' ')

for i in range(0, idx):
    img_adv = x_test_adv_fgsm[i].reshape(32, 32, 3)
    fig.add_subplot(rows, columns, i+idx+1)
    plt.imshow(img_adv)
    plt.axis('off')
    # predicted class
    y_pred = my_model.predict(x_test_adv_fgsm[i].reshape(1, 32, 32, 3))
    print(np.argmax(y_pred), end=' ')

plt.show()

# Save the adversarial images
save_images(x_test_adv_fgsm, filenames, output_dir)
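One caveat worth checking: saving to 8-bit PNG quantizes the pixel values, which can slightly weaken the perturbation. A hedged sketch, reusing the names defined above, that reloads the saved files and prints the predictions again:

# Reload the saved adversarial images and confirm the predictions are still flipped.
reloaded = np.zeros_like(x_test_adv_fgsm)
for i, filename in enumerate(filenames[:idx]):
    with tf.gfile.Open(os.path.join(output_dir, filename), 'rb') as f:
        reloaded[i] = imread(f, pilmode='RGB').astype('float32') / 255.0
print('predictions after reload:',
      np.argmax(my_model.predict(reloaded[:idx]), axis=1))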

(iii) Restoring the image to its original size

If you read the perturbation code carefully, you will notice that whichever method you use, the first step is always to resize the input image to a fixed size, so after processing the image ends up smaller: the CIFAR-10 pipeline leaves you with only 32×32 pixels, and Inception v3 with 299×299, which is still not enough. If my original image was 800×800, I have to try enlarging the result back up. Simply padding with pixels is not an option, the image would look terrible, so I tried a few interpolation methods: nearest-neighbor, bilinear, and bicubic. Look at the results and decide for yourself.

from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
import math

# Nearest-neighbor interpolation
def NN_interpolation(img, dstH, dstW):
    scrH, scrW, _ = img.shape
    retimg = np.zeros((dstH, dstW, 3), dtype=np.uint8)
    for i in range(dstH):
        for j in range(dstW):
            scrx = round((i + 1) * (scrH / dstH))
            scry = round((j + 1) * (scrW / dstW))
            retimg[i, j] = img[scrx - 1, scry - 1]
    return retimg

# Bilinear interpolation
def BiLinear_interpolation(img, dstH, dstW):
    scrH, scrW, _ = img.shape
    img = np.pad(img, ((0, 1), (0, 1), (0, 0)), 'constant')
    retimg = np.zeros((dstH, dstW, 3), dtype=np.uint8)
    for i in range(dstH):
        for j in range(dstW):
            scrx = (i + 1) * (scrH / dstH) - 1
            scry = (j + 1) * (scrW / dstW) - 1
            x = math.floor(scrx)
            y = math.floor(scry)
            u = scrx - x
            v = scry - y
            retimg[i, j] = (1 - u) * (1 - v) * img[x, y] + u * (1 - v) * img[x + 1, y] + (1 - u) * v * img[
                x, y + 1] + u * v * img[x + 1, y + 1]
    return retimg


def BiBubic(x):
    x = abs(x)
    if x <= 1:
        return 1 - 2 * (x ** 2) + (x ** 3)
    elif x < 2:
        return 4 - 8 * x + 5 * (x ** 2) - (x ** 3)
    else:
        return 0

# Bicubic interpolation
def BiCubic_interpolation(img, dstH, dstW):
    scrH, scrW, _ = img.shape
    # img=np.pad(img,((1,3),(1,3),(0,0)),'constant')
    retimg = np.zeros((dstH, dstW, 3), dtype=np.uint8)
    for i in range(dstH):
        for j in range(dstW):
            scrx = i * (scrH / dstH)
            scry = j * (scrW / dstW)
            x = math.floor(scrx)
            y = math.floor(scry)
            u = scrx - x
            v = scry - y
            tmp = 0
            for ii in range(-1, 2):
                for jj in range(-1, 2):
                    if x + ii < 0 or y + jj < 0 or x + ii >= scrH or y + jj >= scrW:
                        continue
                    tmp += img[x + ii, y + jj] * BiBubic(ii - u) * BiBubic(jj - v)
            retimg[i, j] = np.clip(tmp, 0, 255)
    return retimg


im_path = './picture/test_adv.jpg'
image = np.array(Image.open(im_path))

image1 = NN_interpolation(image, 800, 800)
image1 = Image.fromarray(image1.astype('uint8')).convert('RGB')
image1.save('./picture/test_NN.jpg')

image2 = BiLinear_interpolation(image, 800, 800)
image2 = Image.fromarray(image2.astype('uint8')).convert('RGB')
image2.save('./picture/test_BiLinear.jpg')

image3 = BiCubic_interpolation(image, 800, 800)
image3 = Image.fromarray(image3.astype('uint8')).convert('RGB')
image3.save('./picture/test_BiCubic.jpg')
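For comparison, PIL's Image.resize already implements these three resamplers; this is just an alternative I am noting, not what the hand-written code above does. The output file names here are my own placeholders.

# Same upscaling done with PIL's built-in resamplers.
image_pil = Image.open(im_path)
image_pil.resize((800, 800), Image.NEAREST).save('./picture/test_NN_pil.jpg')
image_pil.resize((800, 800), Image.BILINEAR).save('./picture/test_BiLinear_pil.jpg')
image_pil.resize((800, 800), Image.BICUBIC).save('./picture/test_BiCubic_pil.jpg')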

(iv) Further restoring the visual quality (this may backfire)

To be honest, by this step (and even the previous one) I am no longer sure of myself: could this processing cancel out the perturbation added earlier? Hard to say; I have no way to verify it.
So let me say up front: this post is only a record. I cannot guarantee the method is entirely correct; it is meant as a reference.
My way of further restoring the visual quality is image blending. If you do not trust this step, or if it really does break the perturbation, just stop after step three. Without further ado, the code.

from PIL import Image
import math
import matplotlib.pyplot as plt

img = Image.open('./picture/test.jpg')  # image 1: the original image
img_NN = Image.open('./picture/test_NN.jpg')  # image 2: nearest-neighbor upscale
img_BiLinear = Image.open('./picture/test_BiLinear.jpg')  # image 2: bilinear upscale
img_BiCubic = Image.open('./picture/test_BiCubic.jpg')  # image 2: bicubic upscale

# Image.blend() can only blend images of identical size, so this helper crops images to match.
def cut_img(img, x, y):
    """
    Crop an image around its center point.
    :param img: the image to crop
    :param x: target width
    :param y: target height
    :return: the cropped image
    """
    x_center = img.size[0] / 2
    y_center = img.size[1] / 2
    new_x1 = x_center - x//2
    new_y1 = y_center - y//2
    new_x2 = x_center + x//2
    new_y2 = y_center + y//2
    new_img = img.crop((new_x1, new_y1, new_x2, new_y2))
    return new_img


#print(img1.size, img2.size)

# # take the pixel size of the smaller of the two images
# new_x = min(img1.size, img_NN.size)[0]
# new_y = min(img1.size, img_NN.size)[1]
#
# new_img1 = cut_img(img1, new_x, new_y)
# new_img2 = cut_img(img_NN, new_x, new_y)
#print(new_img1.size, new_img2.size)

# Blend the images; the last argument is the weight given to the second image
final_img_NN = Image.blend(img, img_NN, (math.sqrt(5)-1)/2)
final_img_BiLinear = Image.blend(img, img_BiLinear, (math.sqrt(5)-1)/2)
final_img_BiCubic = Image.blend(img, img_BiCubic, (math.sqrt(5)-1)/2)
# Don't ask why it is (math.sqrt(5)-1)/2 -- that's the golden ratio, haha!!

final_img_NN.save('./picture/test_NN_blend.jpg')
final_img_BiLinear.save('./picture/test_BiLinear_blend.jpg')
final_img_BiCubic.save('./picture/test_BiCubic_blend.jpg')

# final_img_NN.show()




# fig = plt.figure(figsize=(4, 1))
# columns = 4
# rows = 1
# fig.add_subplot(rows, columns, 1)
# plt.imshow(img)
# plt.axis('off')
#
# fig.add_subplot(rows, columns, 2)
# plt.imshow(final_img_NN)
# plt.axis('off')
#
# fig.add_subplot(rows, columns, 3)
# plt.imshow(final_img_BiLinear)
# plt.axis('off')
#
# fig.add_subplot(rows, columns, 4)
# plt.imshow(final_img_BiCubic)
# plt.axis('off')

# plt.show()

III. Summary

The code in the later parts is not what matters; the explanations in Part I are, and I hope they serve as a reminder. Of course, those explanations are only my own understanding, with no guarantee of correctness; corrections and criticism are welcome.

IV. Attachments

Attachments:
Download link for cifar-10-batches-py.tar.gz
Link to the code for this post (including the two trained models)
