【theano-windows】Study Notes 10: Multilayer Perceptron for Handwritten Digit Classification


风翼冰舟


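Preface

The previous post covered softmax, so the natural next step is the basic multilayer perceptron (MLP). An MLP simply computes w*x+b, passes it through an activation function, and feeds the result to the next layer as its input x, like this: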

$output=\cdots f^3(f^2(f^1(x*w^1+b^1)*w^2+b^2)*w^3+b^3)\cdots$
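When the perceptron is used for classification, the usual approach is to put a softmax layer at the end; for regression or curve fitting, well, I'll get to that when I need it. With sigmoid as the activation function, the units of each layer are computed as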

$y^i=\frac{1}{1+e^{-(y^{i-1}*w^i+b^i)}}\quad (1\leq i<n,\ y^0=x),\qquad y^n_j=\frac{e^{y^{n-1}*w^n_j+b^n_j}}{\sum_{k=1}^{K}e^{y^{n-1}*w^n_k+b^n_k}}$

where layer $n$ is the softmax output layer, $K$ is its number of units (one per class), and $w^n_j$ is the $j$-th column of its weight matrix.
As is customary, the reference:

Multilayer Perceptron

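Background

Hyperparameters

These parameters cannot be optimized by gradient descent. Strictly speaking, finding the optimal values for them is not a feasible problem: we cannot optimize each of them separately, we cannot use the gradient methods introduced earlier (some of them take discrete values, others real values), and the optimization problem is non-convex, so finding even a (local) minimum can take a lot of effort. (Author's note: in plain terms, things like the number of neurons and the learning rate are all hyperparameters.)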

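Nonlinearity

This is just the activation function. Many activation functions have appeared by now; the Caffe documentation lists the ones it supports. Early networks mainly used sigmoid and tanh, and the two can be converted into each other: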

$2*sigmoid(x)-1=\tanh(\frac{x}{2})$

that is, tanh is a sigmoid rescaled to the range (-1, 1).
For a more detailed comparison, see "In neural networks, how do the activation functions sigmoid and tanh differ apart from their output range?"
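A quick numeric check of the identity above, using only the standard library (`sigmoid` is defined here for illustration). The last assertion-style comparison shows that the sign matters: $1-2*sigmoid(x)$ equals $-\tanh(\frac{x}{2})$, not $\tanh(\frac{x}{2})$.

```python
import math


def sigmoid(x):
    """Logistic sigmoid 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))


# Compare 2*sigmoid(x) - 1 with tanh(x/2) at a few points
for x in [-5.0, -1.0, 0.0, 0.5, 3.0]:
    lhs = 2.0 * sigmoid(x) - 1.0
    rhs = math.tanh(x / 2.0)
    print('x=%5.2f  2*sigmoid(x)-1=%.6f  tanh(x/2)=%.6f' % (x, lhs, rhs))

# With the sign flipped the two sides have opposite signs
print(1.0 - 2.0 * sigmoid(1.0), math.tanh(0.5))
```

The same identity can be read the other way round as $sigmoid(x)=\frac{1+\tanh(x/2)}{2}$.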

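Weight initialization

Weights must never be initialized to 0: if they are all zero, every hidden unit computes the same output and receives the same gradient, so the units can never become different from one another. We want the initial weights to be small, close to 0, so that the activation function operates in its near-linear region around the origin (tanh is close to y=x there, and sigmoid is close to the line y=1/2+x/4), which is where its gradient is largest. Another desirable property, especially for deep networks, is to preserve the variance of the activations and of the back-propagated gradients from layer to layer; this lets information flow well both forward and backward through the network and reduces the discrepancies between layers. We usually follow the so-called fan-in/fan-out rule from the paper "Understanding the difficulty of training deep feedforward neural networks" (Glorot and Bengio), which samples the weights uniformly from: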

$uniform\left[-\frac{\sqrt 6}{\sqrt{fan_{in}+fan_{out}}},\frac{\sqrt 6}{\sqrt{fan_{in}+fan_{out}}}\right]\quad for \quad tanh\\ uniform\left[-4*\frac{\sqrt 6}{\sqrt{fan_{in}+fan_{out}}},4*\frac{\sqrt 6}{\sqrt{fan_{in}+fan_{out}}}\right]\quad for \quad sigmoid$
where $fan_{in}$ is the number of input units of the layer and $fan_{out}$ is the number of its output (hidden) units.
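As a sketch of the rule above (assuming numpy; `glorot_uniform` is a hypothetical helper name, not part of the original code), sampling the 784×500 input-to-hidden weight matrix for MNIST could look like this. The HiddenLayer class later in this post does the same thing inline.

```python
import numpy as np


def glorot_uniform(rng, n_in, n_out, sigmoid=False):
    """Sample an (n_in, n_out) weight matrix uniformly from
    [-sqrt(6/(n_in+n_out)), sqrt(6/(n_in+n_out))], 4x wider for sigmoid."""
    bound = np.sqrt(6.0 / (n_in + n_out))
    if sigmoid:
        bound *= 4.0  # the sigmoid range is four times wider than the tanh range
    return rng.uniform(low=-bound, high=bound, size=(n_in, n_out))


rng = np.random.RandomState(1234)
W_tanh = glorot_uniform(rng, 784, 500)               # for a tanh hidden layer
W_sig = glorot_uniform(rng, 784, 500, sigmoid=True)  # for a sigmoid hidden layer
print(np.sqrt(6.0 / (784 + 500)), np.abs(W_tanh).max(), np.abs(W_sig).max())
```

For 784 inputs and 500 hidden units the tanh bound is roughly 0.068, so every initial weight is indeed small and centered on 0.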

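Learning rate

The simplest choice is a constant learning rate: try a few log-spaced values $(10^{-1},10^{-2},\cdots)$ and narrow the search to the region where the validation error is lowest.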

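Another good approach is to decrease the learning rate over time, for example with

$\frac{\mu_0}{1+d*t}$

where $\mu_0$ is the initial learning rate, $d$ is the decrease constant that controls how fast the learning rate drops (usually $10^{-3}$ or smaller), and $t$ is the iteration number.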
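A minimal sketch of the $\frac{\mu_0}{1+d*t}$ schedule in plain Python. `decayed_learning_rate` is a hypothetical name, and $\mu_0=0.01$, $d=10^{-3}$ are example values only; the test_mlp() function below keeps the learning rate constant.

```python
def decayed_learning_rate(mu0, d, t):
    """Learning rate after t iterations: mu0 / (1 + d * t)."""
    return mu0 / (1.0 + d * t)


# With d = 1e-3 the rate halves after 1000 iterations
for t in [0, 100, 1000, 10000]:
    print(t, decayed_learning_rate(0.01, 1e-3, t))
```

To use such a schedule in Theano one could, as an assumption beyond this post, keep the learning rate in a `theano.shared` variable and change it with `set_value` between epochs instead of baking a Python constant into the compiled function.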

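Number of hidden units

This hyperparameter depends heavily on the dataset: the more complex the input distribution, the more hidden units the network needs to model it. (Can this be read as "a larger dataset does not by itself mean the network has to be more complex"?) Unless we use a regularization method (early stopping or an L1/L2 penalty), a plot of generalization error against the number of hidden units is typically U-shaped: too few units underfit, too many overfit.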

Regularization penalty

The typical choices are L1/L2 penalties, with the weight $\lambda$ taken from values such as $10^{-2},10^{-3},\cdots$
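A small numpy sketch of how the penalties enter the cost, mirroring the `L1`/`L2` attributes of the MLP class below (sum of absolute values and sum of squares over all weight matrices, biases not penalized). `regularized_cost` is a hypothetical helper and the weights are toy values.

```python
import numpy as np


def regularized_cost(nll, weights, l1_reg, l2_reg):
    """nll + l1_reg * L1 + l2_reg * L2, summing over every weight matrix."""
    L1 = sum(np.abs(W).sum() for W in weights)  # L1 norm
    L2 = sum((W ** 2).sum() for W in weights)   # squared L2 norm
    return nll + l1_reg * L1 + l2_reg * L2


W_hidden = np.array([[1.0, -2.0]])  # toy hidden-layer weights
W_out = np.array([[3.0]])           # toy softmax-layer weights
# L1 = 1 + 2 + 3 = 6, L2 = 1 + 4 + 9 = 14, so the cost is 0.5 + 0.06 + 0.014
print(regularized_cost(0.5, [W_hidden, W_out], l1_reg=0.01, l2_reg=0.001))
```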

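Implementation

Imports

Nothing much to say here. We import the Theano modules plus the modules for decompressing and reading the dataset (gzip, cPickle), handling file paths (os) and timing (timeit):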

# -*- coding:utf-8 -*-
#import modules
import theano
import theano.tensor as T
import numpy as np
import cPickle,gzip
import os
import timeit

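Loading the dataset

Nothing special here either; practically every blog post on Theano handwritten-digit classification reads the data with this same code: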

#load the dataset; dataset is the path to mnist.pkl.gz
def load_data(dataset):
    if not os.path.isfile(dataset):
        raise IOError('dataset not found: %s' % dataset)
    with gzip.open(dataset,'rb') as f:
        #three (images, labels) tuples: training, validation and test set
        train_set,valid_set,test_set=cPickle.load(f)
    #put one (x, y) set into shared variables so Theano can keep it on the GPU
    def shared_dataset(data_xy,borrow=True):
        data_x,data_y=data_xy
        shared_x=theano.shared(np.asarray(data_x,dtype=theano.config.floatX),borrow=borrow)
        #labels are stored as floatX (GPU shared data must be float) and cast back to int32 for indexing
        shared_y=theano.shared(np.asarray(data_y,dtype=theano.config.floatX),borrow=borrow)
        return shared_x,T.cast(shared_y,'int32')

    #return (x, y) pairs for the training, validation and test sets
    train_set_x,train_set_y=shared_dataset(train_set)
    valid_set_x,valid_set_y=shared_dataset(valid_set)
    test_set_x,test_set_y=shared_dataset(test_set)
    rval=[(train_set_x,train_set_y),(valid_set_x,valid_set_y),(test_set_x,test_set_y)]
    return rval

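Classifier

Note that the last layer of the MLP is a softmax, while every hidden layer computes its output by multiplying the previous layer's output by a weight matrix, adding a bias and applying the activation (see the per-layer formula in the preface). So we define two kinds of layers: a softmax layer and a HiddenLayer.

`softmax` layer

Just copy the definition from the previous post: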

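#final layer: softmax (multiclass logistic regression)
class LogisticRegression(object):
    def __init__(self,input,n_in,n_out):
        #shared weight matrix, initialized to zeros
        self.W=theano.shared(value=np.zeros((n_in,n_out),dtype=theano.config.floatX),
                            name='W',
                            borrow=True)
        #shared bias vector, initialized to zeros
        self.b=theano.shared(value=np.zeros((n_out,),dtype=theano.config.floatX),
                            name='b',
                            borrow=True)
        #softmax: class probabilities P(y|x) for every row of the input
        self.p_y_given_x=T.nnet.softmax(T.dot(input,self.W)+self.b)
        #predicted class = index of the largest probability
        self.y_pred=T.argmax(self.p_y_given_x,axis=1)
        self.params=[self.W,self.b]#model parameters
        self.input=input#model input

    #mean negative log-likelihood of the true labels y
    def negative_log_likelihood(self,y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]),y])

    #fraction of misclassified examples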
    def errors(self, y):

        # check if y has same dimension of y_pred
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        # check if y is of the correct datatype
        if y.dtype.startswith('int'):
            # the T.neq operator returns a vector of 0s and 1s, where 1
            # represents a mistake in prediction
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
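To see what the indexing in `negative_log_likelihood` and the comparison in `errors` do, here is the same computation in numpy on a made-up minibatch of 3 examples and 4 classes (the probabilities are invented for illustration). `log(p)[arange(n), y]` picks, for every row i, the log-probability of that row's true label `y[i]`.

```python
import numpy as np

# p_y_given_x for 3 examples and 4 classes (each row sums to 1)
p = np.array([[0.7, 0.1, 0.1, 0.1],
              [0.2, 0.5, 0.2, 0.1],
              [0.25, 0.25, 0.25, 0.25]])
y = np.array([0, 1, 3])  # true labels

# Vectorized version, as in negative_log_likelihood
picked = np.log(p)[np.arange(y.shape[0]), y]  # [log 0.7, log 0.5, log 0.25]
nll = -np.mean(picked)

# Same value with an explicit loop
nll_loop = -sum(np.log(p[i, y[i]]) for i in range(len(y))) / len(y)

# As in errors: argmax prediction vs label (ties go to the first index)
pred = np.argmax(p, axis=1)  # [0, 1, 0]
error = np.mean(pred != y)   # third example is wrong, so 1/3
print(nll, nll_loop, pred, error)
```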

`HiddenLayer` layer

Since the MLP's loss is defined by the softmax layer, the HiddenLayer only has to compute the values of the hidden units:

#hidden layer of the MLP: output = activation(dot(input, W) + b)
class HiddenLayer(object):
    def __init__(self,rng,input,n_in,n_out,W=None,b=None,activation=T.tanh):
        self.input=input
        if W is None:
            #uniform fan-in/fan-out initialization (Glorot and Bengio)
            W_values=np.asarray(rng.uniform(low=- np.sqrt(6./(n_in+n_out)),
                                           high= np.sqrt(6./(n_in+n_out)),
                                           size=(n_in,n_out)),dtype=theano.config.floatX)
            if activation==T.nnet.sigmoid:
                W_values *= 4#4x wider range for sigmoid
            W=theano.shared(value=W_values,name='W',borrow=True)
        if b is None:
            b_values=np.zeros((n_out,),dtype=theano.config.floatX)
            b=theano.shared(value=b_values,name='b',borrow=True)

        self.W=W
        self.b=b

        lin_output=T.dot(input,self.W)+self.b#linear part, before the activation
        self.output=(lin_output if activation is None else activation(lin_output))
        self.params=[self.W,self.b]

Combining them into an `MLP`

A single-hidden-layer MLP is built by stacking the two layers: the HiddenLayer's output is fed to the softmax layer as its input, and the parameters of both layers are concatenated into the parameters of the MLP.

#multilayer perceptron: one tanh hidden layer followed by a softmax layer
class MLP(object):
    def __init__(self,rng,input,n_in,n_hidden,n_out):
        self.hiddenLayer=HiddenLayer(rng=rng,
                                     input=input,
                                     n_in=n_in,
                                     n_out=n_hidden,
                                     activation=T.tanh)
        self.logRegressitionLayer=LogisticRegression(input=self.hiddenLayer.output,
                                                    n_in=n_hidden,
                                                    n_out=n_out)
        #regularization terms: L1 norm and squared L2 norm of the weights (biases are not penalized)
        self.L1=(abs(self.hiddenLayer.W).sum()+abs(self.logRegressitionLayer.W).sum())
        self.L2=((self.hiddenLayer.W**2).sum()+(self.logRegressitionLayer.W**2).sum())
        #the cost and the error rate are delegated to the softmax layer
        self.negative_log_likelihood=(self.logRegressitionLayer.negative_log_likelihood)
        self.errors=self.logRegressitionLayer.errors
        self.params=self.hiddenLayer.params+self.logRegressitionLayer.params#parameters of both layers together
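A quick worked count of how many trainable values this MLP has for MNIST, using the sizes from test_mlp() below (784 inputs, 500 hidden units, 10 classes). `mlp_param_count` is a hypothetical helper.

```python
def mlp_param_count(n_in, n_hidden, n_out):
    """Number of trainable values in a one-hidden-layer MLP."""
    hidden = n_in * n_hidden + n_hidden  # W and b of HiddenLayer
    softmax = n_hidden * n_out + n_out   # W and b of LogisticRegression
    return hidden, softmax, hidden + softmax


print(mlp_param_count(28 * 28, 500, 10))  # (392500, 5010, 397510)
```

Almost all of the parameters sit in the 784×500 input-to-hidden weight matrix, which is also the matrix that dominates the L1/L2 penalties.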

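Training

Next comes training. In short: compute the gradients, update the parameters, and stop early once the validation error stops improving. All of the following code goes inside the test_mlp() function: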

def test_mlp(learning_rate=0.01,L1_reg=0.00,L2_reg=0.0001,n_epochs=1000,
            dataset='mnist.pkl.gz',batch_size=20,n_hidden=500):

First load the data and compute the number of minibatches:

    #load the data
    datasets = load_data(dataset)
    train_set_x,train_set_y=datasets[0]
    valid_set_x,valid_set_y=datasets[1]
    test_set_x,test_set_y=datasets[2]
    #number of minibatches in each set
    n_train_batches=train_set_x.get_value(borrow=True).shape[0]//batch_size
    n_valid_batches=valid_set_x.get_value(borrow=True).shape[0]//batch_size
    n_test_batches=test_set_x.get_value(borrow=True).shape[0]//batch_size

Then create symbolic variables for the data and the labels, and instantiate a classifier:

    #build the model
    print('Building model......')
    index=T.iscalar()#minibatch index
    x=T.matrix('x')#input images, one per row
    y=T.ivector('y')#labels

    rng=np.random.RandomState(1234)
    #create the classifier
    classifier=MLP(rng=rng,input=x,n_in=28*28,n_hidden=n_hidden,n_out=10)

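Define the regularized cost (softmax negative log-likelihood + $\lambda_1L1+\lambda_2L2$), differentiate it with respect to the parameters (the weights and biases of both the softmax layer and the HiddenLayer), and build the gradient-descent updates: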

    #regularized cost
    cost=(classifier.negative_log_likelihood(y)+L1_reg*classifier.L1+L2_reg*classifier.L2)
    #gradient of the cost with respect to every parameter
    gparams=[T.grad(cost,param) for param in classifier.params]
    #plain SGD: param <- param - learning_rate * gradient
    updates=[(param,param-learning_rate*gparam) for param,gparam in zip(classifier.params,gparams)]

Next, define three functions for training, validating and testing the model:

    #train: one SGD step on minibatch `index`, returns its cost
    train_model=theano.function(inputs=[index],
                               outputs=cost,
                               updates=updates,
                               givens={
                                   x:train_set_x[index*batch_size:(index+1)*batch_size],
                                   y:train_set_y[index*batch_size:(index+1)*batch_size]
                               })
    #validate: error rate on validation minibatch `index`
    valid_model=theano.function(inputs=[index],
                               outputs=classifier.errors(y),
                               givens={
                                   x:valid_set_x[index*batch_size:(index+1)*batch_size],
                                   y:valid_set_y[index*batch_size:(index+1)*batch_size]
                               })
    #test: error rate on test minibatch `index`
    test_model=theano.function(inputs=[index],
                              outputs=classifier.errors(y),
                              givens={
                                  x:test_set_x[index*batch_size:(index+1)*batch_size],
                                  y:test_set_y[index*batch_size:(index+1)*batch_size]
                              })
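A small worked example of the minibatch bookkeeping above. It assumes the standard split sizes of `mnist.pkl.gz` (50,000 training, 10,000 validation and 10,000 test images); with `batch_size=20` this gives the 2500 training minibatches that appear in the training log further down. `minibatch_slice` is a hypothetical helper that reproduces the row range used in the `givens` of the three functions.

```python
# Split sizes of mnist.pkl.gz (assumed: 50,000 / 10,000 / 10,000 images)
batch_size = 20
n_train, n_valid, n_test = 50000, 10000, 10000

n_train_batches = n_train // batch_size  # 2500
n_valid_batches = n_valid // batch_size  # 500
n_test_batches = n_test // batch_size    # 500


def minibatch_slice(index, batch_size):
    """Rows fed to x and y for minibatch `index`, as in the givens above."""
    return slice(index * batch_size, (index + 1) * batch_size)


last = minibatch_slice(n_train_batches - 1, batch_size)
print(n_train_batches, n_valid_batches, n_test_batches)
print(last.start, last.stop)  # 49980 50000
```

Because of the integer division `//`, any examples left over when a set size is not a multiple of `batch_size` are simply never used.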

Train with early stopping:

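    #early-stopping parameters
    patiences=10000#run at least this many updates
    patiences_increase=2#factor by which the patience is extended on a significant improvement
    improvement_threshold=0.995#a relative improvement of this much counts as significant
    validation_frequency=min(n_train_batches,patiences//2)#validate every this many updates
    best_validation_loss=np.inf#lowest validation error so far
    best_iter=0#update at which it was reached
    test_score=0.#test error of the best model
    start_time=timeit.default_timer()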

    epoch=0
    done_looping=False
    while(epoch<n_epochs) and (not done_looping):
        epoch=epoch+1
        for minibatch_index in range(n_train_batches):
            minibatch_avg_cost=train_model(minibatch_index)
            #total number of updates so far (never reset between epochs)
            iter=(epoch-1)*n_train_batches+minibatch_index
            if (iter+1)%validation_frequency==0:
                validation_loss=[valid_model(i) for i in range(n_valid_batches)]
                this_validation_loss=np.mean(validation_loss)
                print(
                    'epoch %i, minibatch %i/%i, validation error %f %%' %
                    (
                        epoch,
                        minibatch_index + 1,
                        n_train_batches,
                        this_validation_loss * 100.
                    )
                )
                if this_validation_loss<best_validation_loss:
                    if this_validation_loss<best_validation_loss*improvement_threshold:
                        patiences=max(patiences,iter*patiences_increase)
                    best_validation_loss=this_validation_loss
                    best_iter=iter
                    #evaluate the new best model on the test set
                    test_losses=[test_model(i) for i in range(n_test_batches)]
                    test_score=np.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))
            if patiences<iter:
                done_looping=True
                break
    end_time=timeit.default_timer()
    print(('Optimization complete. Best validation score of %f %% '
           'obtained at iteration %i, with test performance %f %%') %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))

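To recap the early-stopping algorithm: n_epochs is the hard upper limit on the number of epochs. During training we keep a patience value, patiences. Every minibatch is one parameter update, so the update counter iter keeps increasing across epochs and is never reset. Every validation_frequency updates we measure the validation error. If the model improved and the improvement is significant (the new error is below the best error times improvement_threshold), the patience becomes max(patiences, iter*patiences_increase). Once the model stops improving, or improves only marginally, the patience is no longer extended, and as soon as the number of updates exceeds the patience, the loop is terminated.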
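To make the patience logic concrete, here is a plain-Python replay of it on made-up lists of validation errors, one value per update (that is, validating after every update, `validation_frequency=1`). `early_stopping` is a hypothetical helper; its update rules are copied from the loop above.

```python
def early_stopping(val_losses, patience=10000, patience_increase=2,
                   improvement_threshold=0.995, validation_frequency=1):
    """Replay the patience rule of test_mlp(). val_losses[it] is the
    validation error the model would show at update `it`.
    Returns (best_loss, best_iter, iteration at which training stops)."""
    best_loss = float('inf')
    best_iter = 0
    for it, loss in enumerate(val_losses):
        if (it + 1) % validation_frequency == 0:
            if loss < best_loss:
                # a significant improvement extends the patience
                if loss < best_loss * improvement_threshold:
                    patience = max(patience, it * patience_increase)
                best_loss = loss
                best_iter = it
        if patience < it:
            return best_loss, best_iter, it
    return best_loss, best_iter, len(val_losses) - 1


print(early_stopping([0.5, 0.4, 0.3] + [0.3] * 10, patience=4))
print(early_stopping([0.9, 0.8, 0.7, 0.6, 0.5, 0.4] + [0.4] * 20, patience=4))
```

In the first call the error stops improving after update 2, the patience stays at 4, and training stops at update 5. In the second call the steady improvements push the patience up to 10 (update 5 times 2), so training continues until update 11.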

Now run the training (but don't start it just yet; read on first):

if __name__=='__main__':
    test_mlp()

Here are the last lines of my training log (the numbers are error rates):

......
epoch 1000, minibatch 2500/2500, validation error 1.700000 %
Optimization complete. Best validation score of 1.690000 % obtained at iteration 2367500, with test performance 1.650000 %

Then a problem came up: I never saved the model, so how am I going to test it? I tried adding a save step to test_mlp() above:

                    print(('epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))
                    #save the best model (this attempt fails)
                    with open('best_model_MLP.pkl', 'wb') as f:
                        cPickle.dump(classifier, f)
            if patiences<iter:
                done_looping=True
                break

Dang, it threw an error:

TypeError: can't pickle instancemethod objects

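The cause: `MLP` stores bound methods of the softmax layer as attributes (`self.negative_log_likelihood` and `self.errors`), and Python 2's pickle cannot serialize bound methods (`instancemethod` objects). Being a Python newbie, I dodged fixing that and looked for another way to save the model.

I thought and thought, and fine: I'll save the two layer objects, which hold the parameters, instead of the whole MLP.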

                    print(('epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))
                    #save the best model: pickle the two layers instead of the MLP object
                    with open('best_model_MLP.pkl','wb') as save_file:
                        model=[classifier.hiddenLayer,classifier.logRegressitionLayer]
                        cPickle.dump(model,save_file)
            if patiences<iter:
                done_looping=True
                break

It actually worked, hahahaha *hic* o(╯□╰)o
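Pickling the layer objects works, but it also pickles their symbolic Theano graphs (`input`, `output`). A lighter alternative, my suggestion rather than what this post did, is to pickle only the numeric parameter values, i.e. the numpy arrays returned by `get_value()`, and restore them later with `set_value()`. The sketch below uses random numpy arrays as stand-ins for those values so it runs without Theano; `save_params` and `load_params` are hypothetical helper names.

```python
import os
import pickle
import tempfile

import numpy as np


def save_params(values, path):
    """Pickle a list of plain numpy arrays (protocol 2 is readable by Python 2 too)."""
    with open(path, 'wb') as f:
        pickle.dump([np.asarray(v) for v in values], f, protocol=2)


def load_params(path):
    """Load the list of numpy arrays written by save_params."""
    with open(path, 'rb') as f:
        return pickle.load(f)


# Stand-ins for hidden W, hidden b, softmax W, softmax b
rng = np.random.RandomState(1234)
params = [rng.randn(784, 500), np.zeros(500), rng.randn(500, 10), np.zeros(10)]
path = os.path.join(tempfile.mkdtemp(), 'best_model_params.pkl')
save_params(params, path)
restored = load_params(path)
print([v.shape for v in restored])
```

With the real model one would save `[p.get_value() for p in classifier.params]` and later run `p.set_value(v)` for each pair in `zip(classifier_test.params, load_params(path))`; the order of `classifier.params` is hidden W, hidden b, softmax W, softmax b.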

Testing

Once the model is saved, of course it's time to test it.

Load it:

with open('best_model_MLP.pkl','rb') as f:
    classifier=cPickle.load(f)

Create a new MLP, which must be configured exactly like the one that was trained:

x=T.matrix('x')
n_hidden=500
classifier_test=MLP(rng=np.random.RandomState(1234),input=x,n_in=28*28,n_hidden=n_hidden,n_out=10)

Then overwrite the weights and biases of this freshly initialized MLP with set_value:

classifier_test.hiddenLayer.W.set_value(classifier[0].W.get_value())
classifier_test.hiddenLayer.b.set_value(classifier[0].b.get_value())

classifier_test.logRegressitionLayer.W.set_value(classifier[1].W.get_value())
classifier_test.logRegressitionLayer.b.set_value(classifier[1].b.get_value())

Take one example from the test set:

dataset='mnist.pkl.gz'
datasets=load_data(dataset)
test_set_x,test_set_y=datasets[2]
test_set_x=test_set_x.get_value()
test_data=test_set_x[10:11]

As with softmax in the previous post, compile a Theano function that outputs y_pred and check the prediction:

predict_model=theano.function(inputs=[x],outputs=classifier_test.logRegressitionLayer.y_pred)
predicted_value=predict_model(test_data)
print(predicted_value)

Well, what do you know: no errors, and it produced a result. To be rigorous, let's also display the image:

from skimage import io
import matplotlib.pyplot as plt
img= np.ceil(test_data*255)
img_res=np.asarray(img.reshape(28,28),dtype=np.int32)
io.imshow(img_res)
plt.show()


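Completely correct, and the other examples I tried were right as well. A confession: to get a saved model I retrained for only 2 epochs, so the error rates below are much higher than in the full run above:

Building model......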
epoch 1, minibatch 2500/2500, validation error 9.620000 %
epoch 1, minibatch 2500/2500, test error of best model 10.090000 %
epoch 2, minibatch 2500/2500, validation error 8.610000 %
epoch 2, minibatch 2500/2500, test error of best model 8.740000 %
Optimization complete. Best validation score of 8.610000 % obtained at iteration 5000, with test performance 8.740000 %
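For reference, the prediction that `predict_model` computes can be reproduced in plain numpy: a tanh hidden layer, then a softmax, then argmax. This is a sketch with tiny made-up weights (2 inputs, 2 hidden units, 2 classes) so the result can be checked by hand; `mlp_predict` is a hypothetical helper.

```python
import numpy as np


def mlp_predict(x, W_h, b_h, W_s, b_s):
    """Forward pass of the one-hidden-layer MLP in numpy.
    Returns (class probabilities, predicted labels)."""
    h = np.tanh(np.dot(x, W_h) + b_h)                     # hidden layer
    logits = np.dot(h, W_s) + b_s                         # softmax input
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(logits)
    p = e / e.sum(axis=1, keepdims=True)                  # softmax
    return p, np.argmax(p, axis=1)


# Toy weights: the output layer swaps the two hidden units
x = np.array([[1.0, 0.0], [0.0, 1.0]])
W_h, b_h = np.eye(2), np.zeros(2)
W_s, b_s = np.array([[0.0, 1.0], [1.0, 0.0]]), np.zeros(2)
probs, labels = mlp_predict(x, W_h, b_h, W_s, b_s)
print(labels)  # [1 0]
```

With the trained model one would pass `test_data` together with `classifier[0].W.get_value()`, `classifier[0].b.get_value()`, `classifier[1].W.get_value()` and `classifier[1].b.get_value()`. Since softmax is monotonic, the argmax of the probabilities is the same as the argmax of the logits.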