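In the previous post we discussed the denoising autoencoder (dA) and the details of its Theano implementation. In this section we turn to the main application of the dA, which is building a stacked denoising autoencoder (SdA). We use MNIST handwritten digit recognition as the example problem and solve it with an SdA.
An SdA is built by stacking a series of denoising autoencoders: the middle layer (the encoding layer) of each dA serves as the input layer of the next one. Stacked layer by layer, they form a deep network, and this network makes up the representation part of the SdA. This part is trained layer by layer with unsupervised learning, so that each layer learns to reconstruct its input signal after random noise has been added to it. The output of each dA's encoding layer can then be viewed as a simplified representation of the original input signal.
Once the network formed by stacking the dAs has been trained, the encoding layer of the last dA is connected to a logistic regression network as its input layer. This yields a new multi-layer BP (backpropagation) network whose hidden-layer weights are the weight matrices obtained when the dAs were trained layer by layer. We then treat it as a standard BP network and apply the usual backpropagation algorithm for supervised learning until it reaches the performance we want.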
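Readers may wonder: why not just use a multi-layer BP network directly? Training the denoising autoencoders (dA) layer by layer first and then assembling them into a BP network for supervised learning seems like a lot of trouble. In fact, people tried to build deep networks with many hidden layers soon after BP networks appeared. They quickly found a problem, though. A BP network adjusts its weights by stochastic gradient descent on backpropagated errors, and as the network gets deeper, the weight updates shrink for hidden layers farther from the output layer. Such deep networks therefore learn very slowly, which directly limited their use, and before the rise of deep learning there were essentially no successful practical applications of deep networks.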
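To get a rough feel for why the updates shrink with depth, note that the derivative of the sigmoid never exceeds 0.25. The sketch below is an illustration only: it assumes sigmoid activations and ignores the weight matrices, which also scale the gradient in a real network, and the helper name `max_gradient_scale` is made up for this example.

```python
import math


def sigmoid(z):
    """Logistic sigmoid."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_derivative(z):
    """Derivative of the sigmoid; its maximum is 0.25, reached at z = 0."""
    s = sigmoid(z)
    return s * (1.0 - s)


def max_gradient_scale(n_layers):
    """Upper bound on the product of sigmoid derivatives across n_layers.

    Hypothetical helper: it ignores the weights, so it only illustrates
    the trend, not the exact gradient of a real network.
    """
    return sigmoid_derivative(0.0) ** n_layers


if __name__ == '__main__':
    for depth in (1, 2, 5, 10):
        print('layers: %2d, bound: %.2e' % (depth, max_gradient_scale(depth)))
```

Even in this best case the factor contributed by the activations drops by at least a factor of four per layer, which is one way to see why the layers closest to the input learn so slowly.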
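In the SdA, we first train the individual denoising autoencoders layer by layer in an unsupervised way. This can be seen as the network automatically discovering features of the problem domain, using automatic feature extraction to find features well suited to the problem. The dA training can be viewed as an initial training of the multi-layer BP network, and the final supervised learning then fine-tunes the network weights. This goes a long way toward solving the slow convergence of deep BP networks and makes them practical.
First we define the stacked denoising autoencoder (SdA) class: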

    from __future__ import print_function

    import os
    import sys
    import timeit

    import numpy

    import theano
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    from logistic_regression import LogisticRegression
    from hidden_layer import HiddenLayer
    from denosing_autoencoder import DenosingAutoencoder


    class SdA(object):
        def __init__(
            self,
            numpy_rng,
            theano_rng=None,
            n_ins=784,
            hidden_layers_sizes=[500, 500],
            n_outs=10,
            corruption_levels=[0.1, 0.1]
        ):
            self.sigmoid_layers = []
            self.dA_layers = []
            self.params = []
            self.n_layers = len(hidden_layers_sizes)

            assert self.n_layers > 0

            if not theano_rng:
                theano_rng = RandomStreams(numpy_rng.randint(2 ** 30))
            # symbolic inputs: a minibatch of images and their labels
            self.x = T.matrix('x')
            self.y = T.ivector('y')
            for i in range(self.n_layers):
                # the first layer reads the raw input, later layers read
                # the output of the previous hidden layer
                if i == 0:
                    input_size = n_ins
                else:
                    input_size = hidden_layers_sizes[i - 1]
                if i == 0:
                    layer_input = self.x
                else:
                    layer_input = self.sigmoid_layers[-1].output

                sigmoid_layer = HiddenLayer(rng=numpy_rng,
                                            input=layer_input,
                                            n_in=input_size,
                                            n_out=hidden_layers_sizes[i],
                                            activation=T.nnet.sigmoid)
                self.sigmoid_layers.append(sigmoid_layer)
                self.params.extend(sigmoid_layer.params)
                # the dA shares its weights and hidden bias with the
                # hidden layer created above
                dA_layer = DenosingAutoencoder(numpy_rng=numpy_rng,
                                               theano_rng=theano_rng,
                                               input=layer_input,
                                               n_visible=input_size,
                                               n_hidden=hidden_layers_sizes[i],
                                               W=sigmoid_layer.W,
                                               bhid=sigmoid_layer.b)
                self.dA_layers.append(dA_layer)
            # logistic regression layer on top of the last hidden layer
            self.logLayer = LogisticRegression(
                input=self.sigmoid_layers[-1].output,
                n_in=hidden_layers_sizes[-1],
                n_out=n_outs
            )
            self.params.extend(self.logLayer.params)
            self.finetune_cost = self.logLayer.negative_log_likelihood(self.y)
            self.errors = self.logLayer.errors(self.y)

        def pretraining_functions(self, train_set_x, batch_size):
            index = T.lscalar('index')
            corruption_level = T.scalar('corruption')
            learning_rate = T.scalar('lr')
            batch_begin = index * batch_size
            batch_end = batch_begin + batch_size
            pretrain_fns = []
            # one compiled training function per dA layer
            for dA in self.dA_layers:
                cost, updates = dA.get_cost_updates(corruption_level,
                                                    learning_rate)
                fn = theano.function(
                    inputs=[
                        index,
                        theano.In(corruption_level, value=0.2),
                        theano.In(learning_rate, value=0.1)
                    ],
                    outputs=cost,
                    updates=updates,
                    givens={
                        self.x: train_set_x[batch_begin: batch_end]
                    }
                )
                pretrain_fns.append(fn)
            return pretrain_fns

        def build_finetune_functions(self, datasets, batch_size, learning_rate):
            (train_set_x, train_set_y) = datasets[0]
            (valid_set_x, valid_set_y) = datasets[1]
            (test_set_x, test_set_y) = datasets[2]
            n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
            n_valid_batches //= batch_size
            n_test_batches = test_set_x.get_value(borrow=True).shape[0]
            n_test_batches //= batch_size
            index = T.lscalar('index')
            # plain SGD on all parameters of the stacked network
            gparams = T.grad(self.finetune_cost, self.params)
            updates = [
                (param, param - gparam * learning_rate)
                for param, gparam in zip(self.params, gparams)
            ]
            train_fn = theano.function(
                inputs=[index],
                outputs=self.finetune_cost,
                updates=updates,
                givens={
                    self.x: train_set_x[
                        index * batch_size: (index + 1) * batch_size
                    ],
                    self.y: train_set_y[
                        index * batch_size: (index + 1) * batch_size
                    ]
                },
                name='train'
            )
            test_score_i = theano.function(
                [index],
                self.errors,
                givens={
                    self.x: test_set_x[
                        index * batch_size: (index + 1) * batch_size
                    ],
                    self.y: test_set_y[
                        index * batch_size: (index + 1) * batch_size
                    ]
                },
                name='test'
            )
            valid_score_i = theano.function(
                [index],
                self.errors,
                givens={
                    self.x: valid_set_x[
                        index * batch_size: (index + 1) * batch_size
                    ],
                    self.y: valid_set_y[
                        index * batch_size: (index + 1) * batch_size
                    ]
                },
                name='valid'
            )

            # evaluate the error on every validation / test minibatch
            def valid_score():
                return [valid_score_i(i) for i in range(n_valid_batches)]

            def test_score():
                return [test_score_i(i) for i in range(n_test_batches)]

            return train_fn, valid_score, test_score

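In the constructor, n_ins is the dimensionality of the input signal. hidden_layers_sizes is a list in which each element gives the number of neurons in one hidden layer, so any number of layers can be defined; with the defaults above there are two. n_outs is the number of output neurons, which is 10 because we are recognizing handwritten digits. corruption_levels gives the random-noise level for each denoising autoencoder (dA); in the defaults above each is 10%. Note that the constructor shown here does not actually use this argument: the noise level is passed in later, when the pretraining functions are called.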
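For intuition about what a corruption level of 10% means, here is a minimal sketch in plain Python. It assumes masking noise, where each input component is independently set to zero with probability equal to the corruption level. The dA class from the previous post is not reproduced here, so treat that choice of noise as an assumption; the function name `corrupt` is hypothetical as well.

```python
import random


def corrupt(values, corruption_level, rng):
    """Return a copy of values with roughly corruption_level of entries zeroed.

    Assumed masking noise: each entry is kept with probability
    1 - corruption_level and replaced by 0.0 otherwise.
    """
    return [v if rng.random() >= corruption_level else 0.0 for v in values]


if __name__ == '__main__':
    rng = random.Random(0)
    pixels = [1.0] * 784  # stand-in for one flattened 28x28 image
    noisy = corrupt(pixels, 0.1, rng)
    zeroed = sum(1 for v in noisy if v == 0.0)
    print('zeroed %d of %d inputs' % (zeroed, len(pixels)))
```

The autoencoder is then asked to reconstruct the clean `pixels` from `noisy`, which is what pushes it toward features that are robust to missing inputs.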
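While building the network, the hidden layers of the BP network are created first, and their weights and biases are then shared with the corresponding denoising autoencoders (dA). With the default parameters this gives an input layer of 784 neurons, a first hidden layer of 500 neurons, a second hidden layer of 500 neurons, and an output layer of 10 neurons. The loop in the code does the following:
When i=0:
input_size = 784, layer_input = x, i.e. the original input signal
BP hidden layer: input=x (the original input signal), n_in=784 (28*28), n_out=hidden_layers_sizes[0]=500, activation function is the sigmoid
dA: input=the original input signal, n_visible=784, n_hidden=hidden_layers_sizes[0]=500, weights and bias shared with the hidden layer defined above
When i=1:
input_size=500
layer_input=output of the previous layer
BP hidden layer: input=output of the previous layer, n_in=500, n_out=hidden_layers_sizes[1]=500, activation function is the sigmoid
dA: input=output of the previous layer, n_visible=500, n_hidden=hidden_layers_sizes[1]=500, weights and bias shared with the hidden layer defined above
That ends the loop. Next the final logistic regression layer is defined: its input is the output of the last hidden layer, with 500 input nodes and 10 output nodes.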
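The walkthrough above can be reproduced with a few lines of plain Python. This is only a sketch of the size bookkeeping, not part of the SdA class, and the function name `layer_shapes` is made up for illustration.

```python
def layer_shapes(n_ins, hidden_layers_sizes, n_outs):
    """Return (n_in, n_out) for every hidden layer plus the logistic layer."""
    shapes = []
    for i, n_hidden in enumerate(hidden_layers_sizes):
        # Same rule as the constructor: the first layer reads the raw input,
        # every later layer reads the previous hidden layer's output.
        input_size = n_ins if i == 0 else hidden_layers_sizes[i - 1]
        shapes.append((input_size, n_hidden))
    # Logistic regression sits on top of the last hidden layer.
    shapes.append((hidden_layers_sizes[-1], n_outs))
    return shapes


if __name__ == '__main__':
    print(layer_shapes(784, [500, 500], 10))
```

With the defaults this gives (784, 500), (500, 500) and (500, 10), matching the three steps described above.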
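Once the network structure has been created, the SdA class defines a two-stage training procedure: pretraining_functions trains the denoising autoencoders (dA) layer by layer, while build_finetune_functions trains the BP network. Since this code closely resembles the DenosingAutoencoder and MLP classes, we won't go over it again here.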
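The reason pretraining carries over to fine-tuning is that each dA and its hidden layer hold the same parameter objects rather than copies. The toy below mimics that with plain Python objects. It is a sketch only: `SharedValue`, `ToyHiddenLayer` and `ToyAutoencoder` are hypothetical stand-ins, not the Theano shared variables or the classes used in this post.

```python
class SharedValue(object):
    """Tiny stand-in for a shared parameter (hypothetical, illustration only)."""

    def __init__(self, value):
        self.value = value


class ToyHiddenLayer(object):
    def __init__(self, W, b):
        self.W = W
        self.b = b


class ToyAutoencoder(object):
    def __init__(self, W, bhid):
        # Reuse the caller's parameter objects instead of creating new ones.
        self.W = W
        self.bhid = bhid

    def pretrain_step(self, delta):
        # Pretend gradient step: modify the shared weight in place.
        self.W.value -= delta


def demo_weight_sharing():
    hidden = ToyHiddenLayer(W=SharedValue(1.0), b=SharedValue(0.0))
    autoencoder = ToyAutoencoder(W=hidden.W, bhid=hidden.b)
    autoencoder.pretrain_step(0.25)
    # The hidden layer sees the update because both hold the same object.
    return hidden.W.value


if __name__ == '__main__':
    print(demo_weight_sharing())
```

So by the time build_finetune_functions starts its gradient updates, the hidden layers already carry the weights learned during pretraining.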
Next we define the SdAEngine class, which carries out the actual model training:

    from __future__ import print_function

    import os
    import sys
    import timeit

    import numpy

    import theano
    import theano.tensor as T
    from theano.tensor.shared_randomstreams import RandomStreams

    from mnist_loader import MnistLoader
    from mlp import HiddenLayer
    from sda import SdA


    class SdAEngine(object):
        def __init__(self):
            print('create SdAEngine')

        def train(self, finetune_lr=0.1, pretraining_epochs=15,
                  pretrain_lr=0.001, training_epochs=1000,
                  dataset='mnist.pkl.gz', batch_size=1):
            loader = MnistLoader()
            datasets = loader.load_data(dataset)
            train_set_x, train_set_y = datasets[0]
            valid_set_x, valid_set_y = datasets[1]
            test_set_x, test_set_y = datasets[2]
            n_train_batches = train_set_x.get_value(borrow=True).shape[0]
            n_train_batches //= batch_size
            numpy_rng = numpy.random.RandomState(89677)
            print('... building the model')
            sda = SdA(
                numpy_rng=numpy_rng,
                n_ins=28 * 28,
                hidden_layers_sizes=[1000, 1000, 1000],
                n_outs=10
            )
            print('... getting the pretraining functions')
            pretraining_fns = sda.pretraining_functions(train_set_x=train_set_x,
                                                        batch_size=batch_size)
            print('... pre-training the model')
            start_time = timeit.default_timer()
            # stage 1: unsupervised, layer-wise pretraining of each dA
            corruption_levels = [.1, .2, .3]
            for i in range(sda.n_layers):
                for epoch in range(pretraining_epochs):
                    c = []
                    for batch_index in range(n_train_batches):
                        c.append(pretraining_fns[i](index=batch_index,
                                                    corruption=corruption_levels[i],
                                                    lr=pretrain_lr))
                    print('Pre-training layer %i, epoch %d, cost %f' % (i, epoch, numpy.mean(c)))
            end_time = timeit.default_timer()
            print(('The pretraining code for file ' +
                   os.path.split(__file__)[1] +
                   ' ran for %.2fm' % ((end_time - start_time) / 60.)), file=sys.stderr)
            print('... getting the finetuning functions')
            train_fn, validate_model, test_model = sda.build_finetune_functions(
                datasets=datasets,
                batch_size=batch_size,
                learning_rate=finetune_lr
            )
            print('... finetuning the model')
            # stage 2: supervised fine-tuning with patience-based early stopping
            patience = 10 * n_train_batches
            patience_increase = 2.

            improvement_threshold = 0.995

            validation_frequency = min(n_train_batches, patience // 2)
            best_validation_loss = numpy.inf
            best_iter = 0
            test_score = 0.
            start_time = timeit.default_timer()
            done_looping = False
            epoch = 0
            while (epoch < training_epochs) and (not done_looping):
                epoch = epoch + 1
                for minibatch_index in range(n_train_batches):
                    minibatch_avg_cost = train_fn(minibatch_index)
                    iter = (epoch - 1) * n_train_batches + minibatch_index
                    if (iter + 1) % validation_frequency == 0:
                        validation_losses = validate_model()
                        this_validation_loss = numpy.mean(validation_losses)
                        print('epoch %i, minibatch %i/%i, validation error %f %%' %
                              (epoch, minibatch_index + 1, n_train_batches,
                               this_validation_loss * 100.))
                        if this_validation_loss < best_validation_loss:
                            # extend patience only for a clear improvement
                            if (
                                this_validation_loss < best_validation_loss *
                                improvement_threshold
                            ):
                                patience = max(patience, iter * patience_increase)
                            best_validation_loss = this_validation_loss
                            best_iter = iter
                            test_losses = test_model()
                            test_score = numpy.mean(test_losses)
                            print(('     epoch %i, minibatch %i/%i, test error of '
                                   'best model %f %%') %
                                  (epoch, minibatch_index + 1, n_train_batches,
                                   test_score * 100.))
                    if patience <= iter:
                        done_looping = True
                        break
            end_time = timeit.default_timer()
            print(
                (
                    'Optimization complete with best validation score of %f %%, '
                    'on iteration %i, '
                    'with test performance %f %%'
                )
                % (best_validation_loss * 100., best_iter + 1, test_score * 100.)
            )
            print(('The training code for file ' +
                   os.path.split(__file__)[1] +
                   ' ran for %.2fm' % ((end_time - start_time) / 60.)), file=sys.stderr)

The code above is essentially a combination of the DenosingAutoencoder and MLP training algorithms, so there is not much new to explain.
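One part of the fine-tuning loop that is easy to misread is the patience rule for early stopping, so here is a small replay of it in plain Python. It is a sketch under stated assumptions: validation runs once per epoch (which is what validation_frequency works out to with the settings above), the validation losses are supplied as a list instead of being computed by a model, and the function name `replay_patience` is made up for this example.

```python
def replay_patience(validation_losses, n_train_batches,
                    patience_increase=2.0, improvement_threshold=0.995):
    """Replay the patience rule on one validation loss per epoch.

    Returns (best_loss, best_iter, last_iter), where last_iter is the
    iteration at which training stopped or ran out of epochs.
    """
    patience = 10 * n_train_batches
    best_loss = float('inf')
    best_iter = 0
    last_iter = 0
    for epoch, loss in enumerate(validation_losses, start=1):
        for minibatch_index in range(n_train_batches):
            last_iter = (epoch - 1) * n_train_batches + minibatch_index
            # Validate on the last minibatch of every epoch.
            if (last_iter + 1) % n_train_batches == 0:
                if loss < best_loss:
                    if loss < best_loss * improvement_threshold:
                        # A clear improvement extends the training budget.
                        patience = max(patience, last_iter * patience_increase)
                    best_loss = loss
                    best_iter = last_iter
            # Same stopping test as the training loop, run every minibatch.
            if patience <= last_iter:
                return best_loss, best_iter, last_iter
    return best_loss, best_iter, last_iter


if __name__ == '__main__':
    print(replay_patience([0.5] + [0.4] * 11, n_train_batches=2))
```

With two minibatches per epoch and a loss that stops improving after the second epoch, the replay halts at iteration 20, which is the initial patience of 10 epochs. A later improvement at iteration t would push the limit out to 2t instead.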
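Combining the code above with the LogisticRegression, HiddenLayer, MnistLoader and other classes introduced earlier gives a complete stacked denoising autoencoder (SdA). Here is the code that trains the network: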

    from sda_engine import SdAEngine

    if __name__ == '__main__':
        engine = SdAEngine()
        engine.train()

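Running this code takes a whole night on my Mac laptop and gives a recognition error rate of about 1%.
As you can see, the stacked denoising autoencoder (SdA) falls somewhat short of the convolutional neural network (CNN) introduced earlier in both training speed and recognition accuracy, which shows that different networks suit different tasks. For image recognition the CNN is the first choice, while in areas such as image search the SdA works better.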