This example trains a stacked what-where autoencoder built on residual blocks on the MNIST dataset. It illustrates two influential methods developed over the past few years.
An SWWAE (stacked what-where auto-encoder) is, more precisely, a kind of convolutional autoencoder, because the 'what-where' information only arises at the pooling layers of a CNN.
Annotated code
'''Trains a stacked what-where autoencoder built on residual blocks on the
MNIST dataset. It exemplifies two influential methods that have been developed
in the past few years.
The first is the idea of properly 'unpooling.' During any max pool, the
exact location (the 'where') of the maximal value in a pooled receptive field
is lost; however, it can be very useful in the overall reconstruction of an
input image. Therefore, if the 'where' is handed from the encoder
to the corresponding decoder layer, features being decoded can be 'placed' in
the right location, allowing for reconstructions of much higher fidelity.
(A small NumPy illustration of these 'where' switches is included after the
getwhere() helper in the code below.)
# References
- Visualizing and Understanding Convolutional Networks
Matthew D Zeiler, Rob Fergus
https://arxiv.org/abs/1311.2901v3
- Stacked What-Where Auto-encoders
Junbo Zhao, Michael Mathieu, Ross Goroshin, Yann LeCun
https://arxiv.org/abs/1506.02351v8
The second idea exploited here is that of residual learning. Residual blocks
ease the training process by allowing skip connections that give the network
the ability to be as linear (or non-linear) as the data sees fit. This allows
much deeper networks to be easily trained. The residual element seems to
be advantageous in the context of this example as it allows a nice symmetry
between the encoder and decoder. Normally, in the decoder, the final
projection to the space where the image is reconstructed is linear; however,
this does not have to be the case for a residual block, as the degree to which
its output is linear or non-linear is determined by the data it is fed.
However, in order to cap the reconstruction in this example, a hard sigmoid is
applied at the output because we know the MNIST digits are mapped to [0, 1].
# References
- Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
https://arxiv.org/abs/1512.03385v1
- Identity Mappings in Deep Residual Networks
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
https://arxiv.org/abs/1603.05027v3
'''
from __future__ import print_function
import numpy as np
from keras.datasets import mnist
from keras.models import Model
from keras.layers import Activation
from keras.layers import UpSampling2D, Conv2D, MaxPooling2D
from keras.layers import Input, BatchNormalization, ELU
import matplotlib.pyplot as plt
import keras.backend as K
from keras import layers
def convresblock(x, nfeats=8, ksize=3, nskipped=2, elu=True):
    """The proposed residual block from [4].

    Running with elu=True will use ELU nonlinearity and running with
    elu=False will use BatchNorm + RELU nonlinearity. While ELU's are fast
    due to the fact they do not suffer from BatchNorm overhead, they may
    overfit because they do not offer the stochastic element of the batch
    formation process of BatchNorm, which acts as a good regularizer.

    # Arguments
        x: 4D tensor, the tensor to feed through the block
        nfeats: Integer, number of feature maps for conv layers.
        ksize: Integer, width and height of conv kernels in first convolution.
        nskipped: Integer, number of conv layers for the residual function.
        elu: Boolean, whether to use ELU or BN+RELU
            (ReLU: rectified linear unit; BN: batch normalization).

    # Input shape
        4D tensor with shape:
        `(batch, channels, rows, cols)`

    # Output shape
        4D tensor with shape:
        `(batch, filters, rows, cols)`
    """
    y0 = Conv2D(nfeats, ksize, padding='same')(x)
    y = y0
    for i in range(nskipped):
        if elu:
            y = ELU()(y)
        else:
            y = BatchNormalization(axis=1)(y)
            y = Activation('relu')(y)
        y = Conv2D(nfeats, 1, padding='same')(y)
    return layers.add([y0, y])
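# Illustration (not part of the original example): the block above computes
# y0 + F(y0), where y0 is the first convolution of the input and F is a short
# stack of nonlinearities and 1x1 convolutions, so when F stays close to zero
# the block behaves like a (projected) identity mapping. A hypothetical call
# using the BatchNorm + ReLU variant instead of ELU would look like:
#     z = convresblock(some_4d_tensor, nfeats=16, ksize=3, nskipped=2, elu=False)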
def getwhere(x):
    ''' Calculate the 'where' mask that contains switches indicating which
    index contained the max value when MaxPool2D was applied. Using the
    gradient of the sum is a nice trick to keep everything high level.
    '''
    y_prepool, y_postpool = x
    return K.gradients(K.sum(y_postpool), y_prepool)
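# --- Illustration (not part of the original example) ---
# A minimal NumPy sketch, assuming a single 4x4 feature map pooled with a 2x2
# window, of what the 'where' mask computed by getwhere() encodes: a 1 at the
# argmax position of each pooling window and 0 elsewhere.
_demo = np.array([[1., 3., 2., 0.],
                  [4., 2., 1., 5.],
                  [0., 1., 2., 2.],
                  [3., 1., 0., 4.]], dtype='float32')
# 2x2 max pooling via reshape, then nearest-neighbour upsampling back to 4x4
_postpool = _demo.reshape(2, 2, 2, 2).max(axis=(1, 3))
_upsampled = np.kron(_postpool, np.ones((2, 2), dtype='float32'))
# 1 exactly where each window's maximum was taken, 0 elsewhere
_where = (_demo == _upsampled).astype('float32')
# Multiplying the upsampled map by _where 'places' each pooled value back at
# its original location, which is what the decoder below does with
# UpSampling2D followed by layers.multiply.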
if K.backend() == 'tensorflow':
    raise RuntimeError('This example can only run with the '
                       'Theano backend for the time being, '
                       'because it requires taking the gradient '
                       'of a gradient, which isn\'t '
                       'supported for all TensorFlow ops.')
# This example assumes 'channels_first' data format.
K.set_image_data_format('channels_first')
# input image dimensions
img_rows, img_cols = 28, 28
# the data, shuffled and split between train and test sets
(x_train, _), (x_test, _) = mnist.load_data()
x_train = x_train.reshape(x_train.shape[0], 1, img_rows, img_cols)
x_test = x_test.reshape(x_test.shape[0], 1, img_rows, img_cols)
x_train = x_train.astype('float32')
x_test = x_test.astype('float32')
x_train /= 255
x_test /= 255
print('x_train shape:', x_train.shape)
print(x_train.shape[0], 'train samples')
print(x_test.shape[0], 'test samples')
# The size of the kernel used for the MaxPooling2D
pool_size = 2
# The total number of feature maps at each layer
nfeats = [8, 16, 32, 64, 128]
# The sizes of the pooling kernel at each layer
pool_sizes = np.array([1, 1, 1, 1, 1]) * pool_size
# The convolution kernel size
ksize = 3
# Number of epochs to train for
epochs = 5
# Batch size during training
batch_size = 128
if pool_size == 2:
    # if using a 5 layer net of pool_size = 2
    x_train = np.pad(x_train, [[0, 0], [0, 0], [2, 2], [2, 2]],
                     mode='constant')
    x_test = np.pad(x_test, [[0, 0], [0, 0], [2, 2], [2, 2]], mode='constant')
    nlayers = 5
elif pool_size == 3:
    # if using a 3 layer net of pool_size = 3
    x_train = x_train[:, :, :-1, :-1]
    x_test = x_test[:, :, :-1, :-1]
    nlayers = 3
else:
    import sys
    sys.exit('Script supports pool_size of 2 and 3.')
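# Note (added for clarity): with pool_size = 2 and 5 pooling layers, padding the
# 28x28 digits to 32x32 lets the spatial size halve cleanly at every step
# (32 -> 16 -> 8 -> 4 -> 2 -> 1); with pool_size = 3 and 3 layers, cropping to
# 27x27 gives 27 -> 9 -> 3 -> 1.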
# Shape of input to train on (note that model is fully convolutional however)
input_shape = x_train.shape[1:]
# The final list of the size of axis=1 for all layers, including input
nfeats_all = [input_shape[0]] + nfeats
# First build the encoder, all the while keeping track of the 'where' masks
img_input = Input(shape=input_shape)
# We push the 'where' masks to the following list
wheres = [None] * nlayers
y = img_input
for i in range(nlayers):
    y_prepool = convresblock(y, nfeats=nfeats_all[i + 1], ksize=ksize)
    y = MaxPooling2D(pool_size=(pool_sizes[i], pool_sizes[i]))(y_prepool)
    wheres[i] = layers.Lambda(
        getwhere, output_shape=lambda x: x[0])([y_prepool, y])
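# Note (added for clarity): with the default channels_first, pool_size = 2
# settings, the pooled encoder outputs shrink as 8x16x16 -> 16x8x8 -> 32x4x4
# -> 64x2x2 -> 128x1x1, while each stored 'where' mask keeps the corresponding
# pre-pool resolution.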
# Now build the decoder, and use the stored 'where' masks to place the features
for i in range(nlayers):
    ind = nlayers - 1 - i
    y = UpSampling2D(size=(pool_sizes[ind], pool_sizes[ind]))(y)
    y = layers.multiply([y, wheres[ind]])
    y = convresblock(y, nfeats=nfeats_all[ind], ksize=ksize)
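# Note (added for clarity): each decoder step mirrors the encoder: upsample,
# gate with the stored 'where' mask to restore the max locations, then apply a
# residual block. The final block uses nfeats_all[0] = 1 feature map, so the
# output recovers the single-channel input image shape.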
# Use hard_sigmoid to clip the range of the reconstruction
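# hard_sigmoid is a piecewise-linear approximation of the sigmoid (roughly
# clip(0.2 * x + 0.5, 0, 1) in Keras), so the reconstruction is clamped to the
# [0, 1] range of the normalized MNIST pixels.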
y = Activation('hard_sigmoid')(y)
# Define the model and its mean square error loss, and compile it with Adam
model = Model(img_input, y)
model.compile('adam', 'mse')
# Fit the model
model.fit(x_train, x_train,
          batch_size=batch_size,
          epochs=epochs,
          validation_data=(x_test, x_test))
# Plot
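# Note (added for clarity): concatenating along axis 1 (the channel axis) pairs
# each test digit with its reconstruction; the reshape and stacking below then
# tile the 25 pairs into a 5 x 10 grid, alternating original and reconstruction
# within each row.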
x_recon = model.predict(x_test[:25])
x_plot = np.concatenate((x_test[:25], x_recon), axis=1)
x_plot = x_plot.reshape((5, 10, input_shape[-2], input_shape[-1]))
x_plot = np.vstack([np.hstack(x) for x in x_plot])
plt.figure()
plt.axis('off')
plt.title('Test Samples: Originals/Reconstructions')
plt.imshow(x_plot, interpolation='none', cmap='gray')
plt.savefig('reconstructions.png')
Running the code
Keras documentation
Chinese: http://keras-cn.readthedocs.io/en/latest/
Example downloads
https://github.com/keras-team/keras
https://github.com/keras-team/keras/tree/master/examples
Complete project download
For readers without download credits, join QQ group 452205574 for the shared folder,
which includes the code, the dataset (images), a trained model, and the required library files.