# Deep Learning Tutorial, Theano Study Guide 4 (Translation): Convolutional Neural Networks

## Sparse Connectivity

CNNs exploit local spatial correlation by enforcing a local connectivity pattern between neurons of adjacent layers. The hidden units in layer m are connected to only a local subset of the units in layer m-1, those lying within spatially contiguous receptive fields. The relationship is illustrated in the figure below:
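Stacking such locally connected layers makes each unit respond to a progressively larger region of the input. As a small illustrative sketch (plain Python, not part of the tutorial code; the function name is made up), the receptive field after stacking stride-1 layers grows by k-1 pixels per layer of filter width k:

```python
def receptive_field(filter_widths):
    """Receptive field (in input pixels) of one unit after stacking
    layers with the given 1-D filter widths (stride 1, no pooling)."""
    size = 1
    for k in filter_widths:
        size += k - 1  # each layer extends the field by k - 1 pixels
    return size

# two stacked layers of width-3 filters: each top unit sees 5 input pixels
print(receptive_field([3, 3]))  # -> 5
```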

## Details of the Model

$$h^k_{ij} = tanh ( (W^k * x)_{ij} + b_k )$$

Figure 1: Example of a convolutional layer. (Note a slight inconsistency: the text below denotes the feature-map weights as $W^0$ and $W^1$, while the figure labels them $W^1$ and $W^2$.)
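As a quick sanity check of the formula above, here is a numpy/scipy sketch (illustrative names; not part of the tutorial code) that computes one feature map $h^k$ from a single-channel input $x$, using the scipy.signal.convolve2d routine that, as noted below, ConvOp mirrors:

```python
import numpy
from scipy.signal import convolve2d

rng = numpy.random.RandomState(0)
x = rng.rand(5, 5)        # single-channel input image
W_k = rng.rand(3, 3)      # one 3x3 filter W^k
b_k = 0.1                 # scalar bias b_k for this feature map

# (W^k * x)_{ij}: 'valid' mode keeps only fully overlapping positions
h_k = numpy.tanh(convolve2d(x, W_k, mode='valid') + b_k)
print(h_k.shape)  # a 5x5 input convolved with a 3x3 filter gives a 3x3 map
```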

## ConvOp

ConvOp is the workhorse for implementing a convolutional layer in Theano; it essentially replicates the functionality of scipy.signal.convolve2d from the scipy toolkit. In short, ConvOp takes two symbolic inputs:

* A 4D tensor corresponding to a mini-batch of input images, with shape [mini-batch size, number of input feature maps, image height, image width].
* A 4D tensor corresponding to the weight matrix $W$, with shape [number of feature maps at layer m, number of feature maps at layer m-1, filter height, filter width].
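To make the shape bookkeeping concrete, here is a loop-based pure-numpy reference (illustrative only, with a made-up `conv4d` name) of what a valid-mode convolution over these two 4D tensors produces, summing over the input feature maps:

```python
import numpy
from scipy.signal import convolve2d

def conv4d(inputs, filters):
    """inputs: (batch, in_maps, H, W); filters: (out_maps, in_maps, fh, fw).
    Returns (batch, out_maps, H - fh + 1, W - fw + 1)."""
    batch, in_maps, H, W = inputs.shape
    out_maps, in_maps2, fh, fw = filters.shape
    assert in_maps == in_maps2  # layer m-1 map counts must match
    out = numpy.zeros((batch, out_maps, H - fh + 1, W - fw + 1))
    for n in range(batch):
        for k in range(out_maps):
            for c in range(in_maps):  # sum contributions over input maps
                out[n, k] += convolve2d(inputs[n, c], filters[k, c], mode='valid')
    return out

rng = numpy.random.RandomState(0)
imgs = rng.rand(1, 3, 20, 16)   # mini-batch of one 3-channel image
w = rng.rand(2, 3, 9, 9)        # 2 output maps, 3 input maps, 9x9 filters
print(conv4d(imgs, w).shape)    # -> (1, 2, 12, 8)
```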

```python
import numpy

import theano
import theano.tensor as T
from theano.tensor.nnet import conv

rng = numpy.random.RandomState(23455)

# instantiate 4D tensor for input
input = T.tensor4(name='input')

# initialize shared variable for weights.
w_shp = (2, 3, 9, 9)
w_bound = numpy.sqrt(3 * 9 * 9)
W = theano.shared(numpy.asarray(
        rng.uniform(
            low=-1.0 / w_bound,
            high=1.0 / w_bound,
            size=w_shp),
        dtype=input.dtype), name='W')

# initialize shared variable for bias (1D tensor) with random values
# IMPORTANT: biases are usually initialized to zero. However in this
# particular application, we simply apply the convolutional layer to
# an image without learning the parameters. We therefore initialize
# them to random values to "simulate" learning.
b_shp = (2,)
b = theano.shared(numpy.asarray(
        rng.uniform(low=-.5, high=.5, size=b_shp),
        dtype=input.dtype), name='b')

# build symbolic expression that computes the convolution of input
# with filters in W
conv_out = conv.conv2d(input, W)

# build symbolic expression to add bias and apply activation function,
# i.e. produce neural net layer output
# A few words on dimshuffle:
#   dimshuffle is a powerful tool in reshaping a tensor;
#   what it allows you to do is to shuffle dimensions around
#   but also to insert new ones along which the tensor will be
#   broadcastable;
#   dimshuffle('x', 2, 'x', 0, 1)
#   This will work on 3d tensors with no broadcastable
#   dimensions. The first dimension will be broadcastable,
#   then we will have the third dimension of the input tensor as
#   the second of the resulting tensor, etc. If the tensor has
#   shape (20, 30, 40), the resulting tensor will have dimensions
#   (1, 40, 1, 20, 30). (AxBxC tensor is mapped to 1xCx1xAxB tensor)
#   More examples:
#    dimshuffle('x') -> make a 0d (scalar) into a 1d vector
#    dimshuffle(0, 1) -> identity
#    dimshuffle(1, 0) -> inverts the first and second dimensions
#    dimshuffle('x', 0) -> make a row out of a 1d vector (N to 1xN)
#    dimshuffle(0, 'x') -> make a column out of a 1d vector (N to Nx1)
#    dimshuffle(2, 0, 1) -> AxBxC to CxAxB
#    dimshuffle(0, 'x', 1) -> AxB to Ax1xB
#    dimshuffle(1, 'x', 0) -> AxB to Bx1xA
output = T.nnet.sigmoid(conv_out + b.dimshuffle('x', 0, 'x', 'x'))

# create theano function to compute filtered images
f = theano.function([input], output)

import pylab
from PIL import Image

# open random image of dimensions 639x516
img = Image.open(open('images/3wolfmoon.jpg'))
img = numpy.asarray(img, dtype='float64') / 256.

# put image in 4D tensor of shape (1, 3, height, width)
img_ = img.swapaxes(0, 2).swapaxes(1, 2).reshape(1, 3, 639, 516)
filtered_img = f(img_)

# plot original image and first and second components of output
pylab.subplot(1, 3, 1); pylab.axis('off'); pylab.imshow(img)
pylab.gray()
# recall that the convOp output (filtered image) is actually a "minibatch",
# of size 1 here, so we take index 0 in the first dimension:
pylab.subplot(1, 3, 2); pylab.axis('off'); pylab.imshow(filtered_img[0, 0, :, :])
pylab.subplot(1, 3, 3); pylab.axis('off'); pylab.imshow(filtered_img[0, 1, :, :])
pylab.show()
```
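The `dimshuffle('x', 0, 'x', 'x')` call above has a direct numpy analogue. As an aside (plain numpy, not Theano code; values are illustrative), reshaping the bias vector to shape `(1, n_filters, 1, 1)` lets broadcasting add one bias per feature map:

```python
import numpy

b = numpy.array([0.5, -0.5])           # one bias per output feature map
conv_out = numpy.zeros((1, 2, 4, 4))   # (batch, n_filters, height, width)

# numpy analogue of b.dimshuffle('x', 0, 'x', 'x'): shape (2,) -> (1, 2, 1, 1)
out = conv_out + b.reshape(1, -1, 1, 1)
print(out[0, 0, 0, 0], out[0, 1, 0, 0])  # -> 0.5 -0.5
```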


## Max Pooling

Another important concept in CNNs is max-pooling, a form of non-linear down-sampling. Max-pooling partitions the input image into a set of non-overlapping rectangles and, for each such sub-region, outputs the maximum value.
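Before the Theano version below, a short pure-numpy sketch of 2x2 max-pooling (the function name is made up for illustration) makes the operation explicit:

```python
import numpy

def max_pool_2x2(img):
    """Non-overlapping 2x2 max-pooling; trailing rows/cols that do not
    fill a full 2x2 window are dropped (like ignore_border=True)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    # group pixels into 2x2 blocks, then take the max within each block
    return img.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = numpy.arange(16).reshape(4, 4)
print(max_pool_2x2(x))
# each output entry is the max of one 2x2 block:
# [[ 5  7]
#  [13 15]]
```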

```python
from theano.tensor.signal import downsample

input = T.dtensor4('input')
maxpool_shape = (2, 2)
pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=True)
f = theano.function([input], pool_out)

invals = numpy.random.RandomState(1).rand(3, 2, 5, 5)
print 'With ignore_border set to True:'
print 'invals[0, 0, :, :] =\n', invals[0, 0, :, :]
print 'output[0, 0, :, :] =\n', f(invals)[0, 0, :, :]

pool_out = downsample.max_pool_2d(input, maxpool_shape, ignore_border=False)
f = theano.function([input], pool_out)
print 'With ignore_border set to False:'
print 'invals[1, 0, :, :] =\n ', invals[1, 0, :, :]
print 'output[1, 0, :, :] =\n ', f(invals)[1, 0, :, :]
```

Running this code produces output similar to the following:

```
With ignore_border set to True:
invals[0, 0, :, :] =
[[  4.17022005e-01   7.20324493e-01   1.14374817e-04   3.02332573e-01 1.46755891e-01]
 [  9.23385948e-02   1.86260211e-01   3.45560727e-01   3.96767474e-01 5.38816734e-01]
 [  4.19194514e-01   6.85219500e-01   2.04452250e-01   8.78117436e-01 2.73875932e-02]
 [  6.70467510e-01   4.17304802e-01   5.58689828e-01   1.40386939e-01 1.98101489e-01]
 [  8.00744569e-01   9.68261576e-01   3.13424178e-01   6.92322616e-01 8.76389152e-01]]
output[0, 0, :, :] =
[[ 0.72032449  0.39676747]
 [ 0.6852195   0.87811744]]

With ignore_border set to False:
invals[1, 0, :, :] =
[[ 0.01936696  0.67883553  0.21162812  0.26554666  0.49157316]
 [ 0.05336255  0.57411761  0.14672857  0.58930554  0.69975836]
 [ 0.10233443  0.41405599  0.69440016  0.41417927  0.04995346]
 [ 0.53589641  0.66379465  0.51488911  0.94459476  0.58655504]
 [ 0.90340192  0.1374747   0.13927635  0.80739129  0.39767684]]
output[1, 0, :, :] =
[[ 0.67883553  0.58930554  0.69975836]
 [ 0.66379465  0.94459476  0.58655504]
 [ 0.90340192  0.80739129  0.39767684]]
```

## Putting It All Together

We now have everything needed to build a LeNet-style model in Theano. We start with the LeNetConvPoolLayer class, which implements a combined convolution and max-pooling layer:

```python
class LeNetConvPoolLayer(object):

    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        """
        Allocate a LeNetConvPoolLayer with shared variable internal parameters.

        :type rng: numpy.random.RandomState
        :param rng: a random number generator used to initialize weights

        :type input: theano.tensor.dtensor4
        :param input: symbolic image tensor, of shape image_shape

        :type filter_shape: tuple or list of length 4
        :param filter_shape: (number of filters, num input feature maps,
                              filter height, filter width)

        :type image_shape: tuple or list of length 4
        :param image_shape: (batch size, num input feature maps,
                             image height, image width)

        :type poolsize: tuple or list of length 2
        :param poolsize: the downsampling (pooling) factor (#rows, #cols)
        """
        assert image_shape[1] == filter_shape[1]
        self.input = input

        # initialize weight values: the fan-in of each hidden neuron is
        # restricted by the size of the receptive fields.
        fan_in = numpy.prod(filter_shape[1:])
        W_values = numpy.asarray(rng.uniform(
              low=-numpy.sqrt(3. / fan_in),
              high=numpy.sqrt(3. / fan_in),
              size=filter_shape), dtype=theano.config.floatX)
        self.W = theano.shared(value=W_values, name='W')

        # the bias is a 1D tensor -- one bias per output feature map
        b_values = numpy.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, name='b')

        # convolve input feature maps with filters
        conv_out = conv.conv2d(input, self.W,
                filter_shape=filter_shape, image_shape=image_shape)

        # downsample each feature map individually, using maxpooling
        pooled_out = downsample.max_pool_2d(conv_out, poolsize,
                                            ignore_border=True)

        # add the bias term. Since the bias is a vector (1D array), we first
        # reshape it to a tensor of shape (1, n_filters, 1, 1). Each bias will
        # thus be broadcasted across mini-batches and feature map width & height
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store parameters of this layer
        self.params = [self.W, self.b]
```


## Running the Code

Run the code with:

```
python code/convolutional_mlp.py
```

This produces output of the following form:

```
Optimization complete.
Best validation score of 0.910000 % obtained at iteration 17800, with test
performance 0.920000 %
The code for file convolutional_mlp.py ran for 380.28m

Optimization complete.
Best validation score of 0.910000 % obtained at iteration 15500, with test
performance 0.930000 %
The code for file convolutional_mlp.py ran for 46.76m
```