[Deep Learning] 9: Face recognition on the olivettifaces database with a CNN

Preface

Convolutional neural networks, thanks to their strong feature-extraction ability, shine in pattern recognition. This post shows how to use a CNN to recognize faces from the olivettifaces database, an entry-level image-recognition demo. If you do not yet have a CNN background, I strongly recommend first reading the post Convolutional Neural Networks (CNN) Principles; if you already know CNNs, dive right in!

-----------------------------------------------------------------------------------------------------------------------------------------

Notes:

1. A detailed introduction to the olivettifaces face database is available through this link; this post will not repeat it, but the necessary basics are given below;

2. I use Python 3.5 with the PyCharm IDE and two deep-learning frameworks: TensorFlow and Keras. (Keras runs on top of TensorFlow; its code simply looks more concise. Installing Keras is also straightforward, so it is not covered here);

3. The TensorFlow-based implementation is more involved, so this post mainly walks through the Keras version; the source code for both is attached at the end;

4. All code for this post has been uploaded: it is available here, the real thing, no tricks.
-----------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------

I. A brief introduction to the olivettifaces face database

(figure: the olivettifaces.gif composite image containing all 400 face photos)
1. olivettifaces is a relatively small face database compiled at New York University. It contains 40 subjects with 10 images each, packed into one large image holding 400 faces.

2. Grayscale pixel values lie in [0, 255]. The full image is 1190*942 pixels, arranged as 20 rows by 20 columns of faces, so each face crop is (1190/20)*(942/20) ≈ 57*47 pixels.

3. The program requires h5py to be installed first: python -m pip install h5py
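
To sanity-check this layout, here is a minimal sketch (not part of the original program) that assumes olivettifaces.gif is in the working directory; it loads the composite image and slices out the face in the top-left cell:

import numpy as np
from PIL import Image

# load the 400-face composite image; its shape should be (1190, 942)
img = np.asarray(Image.open('olivettifaces.gif'), dtype='float64') / 255
print(img.shape)

# cut out the face in row 0, column 0: 57 rows by 47 columns of pixels
first_face = img[0*57:1*57, 0*47:1*47]
print(first_face.shape)    # expected: (57, 47)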

-----------------------------------------------------------------------------------------------------------------------------------------

II. Walkthrough of the code

1. Reading the data and assigning labels

# Read the whole composite image and set the corresponding labels
def get_load_data(dataset_path):
    img = Image.open(dataset_path)
    # Normalize the data. asarray converts to an np.ndarray without copying the underlying memory
    img_ndarray = np.asarray(img, dtype = 'float64')/255
    # 400 pictures, size: 57*47 = 2679  
    faces_data = np.empty((400, 2679))
    for row in range(20):  
       for column in range(20):
           # flatten collapses the multi-dimensional array into 1-D
           faces_data[row*20+column] = np.ndarray.flatten(img_ndarray[row*57:(row+1)*57, column*47:(column+1)*47])

    # Set the image labels
    label = np.empty(400)
    for i in range(40):
        label[i*10:(i+1)*10] = i
    label = label.astype(np.int)

    # Split the dataset: for each person, images 1-8 go to training, image 9 to validation, image 10 to testing; so train: 320, valid: 40, test: 40
    train_data = np.empty((320, 2679))
    train_label = np.empty(320)
    valid_data = np.empty((40, 2679))
    valid_label = np.empty(40)
    test_data = np.empty((40, 2679))
    test_label = np.empty(40)
    for i in range(40):
        train_data[i*8:i*8+8] = faces_data[i*10:i*10+8] # training data
        train_label[i*8:i*8+8] = label[i*10 : i*10+8]   # training labels
        valid_data[i] = faces_data[i*10+8]   # validation data
        valid_label[i] = label[i*10+8]       # validation labels
        test_data[i] = faces_data[i*10+9]    # test data
        test_label[i] = label[i*10+9]        # test labels
    train_data = train_data.astype('float32')
    valid_data = valid_data.astype('float32')
    test_data = test_data.astype('float32')
       
    result = [(train_data, train_label), (valid_data, valid_label), (test_data, test_label)]
    return result

Given the image path, this function reads the image data; the comments above should make each step clear. A few points worth highlighting (see the usage sketch after this list):

  1. Setting the labels: every 10 consecutive images share the same label;
  2. Splitting the dataset: of each person's 10 photos, the first 8 are used for training, the 9th for (internal) validation and the 10th for (external) testing; the split is likewise done by array index;
  3. The function returns three tuples: the training set, the validation set and the test set;
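
For intuition, a small usage sketch (assuming olivettifaces.gif sits next to the script) showing the shapes that come back:

(train_x, train_y), (valid_x, valid_y), (test_x, test_y) = get_load_data('olivettifaces.gif')
print(train_x.shape, train_y.shape)   # (320, 2679) (320,)
print(valid_x.shape, valid_y.shape)   # (40, 2679)  (40,)
print(test_x.shape,  test_y.shape)    # (40, 2679)  (40,)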

2. Building the CNN

# The CNN model
def get_set_model(lr=0.005,decay=1e-6,momentum=0.9):
    model = Sequential()
    # Conv layer 1 + pooling layer 1
    if K.image_data_format() == 'channels_first':
        model.add(Conv2D(nb_filters1, kernel_size=(3, 3), input_shape = (1, img_rows, img_cols)))
    else:
        model.add(Conv2D(nb_filters1, kernel_size=(2, 2), input_shape = (img_rows, img_cols, 1)))
    model.add(Activation('tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Conv layer 2 + pooling layer 2
    model.add(Conv2D(nb_filters2, kernel_size=(3, 3)))
    model.add(Activation('tanh'))  
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))  

    # Fully-connected layer + classifier layer
    model.add(Flatten())  
    model.add(Dense(1000))       #Full connection
    model.add(Activation('tanh'))  
    model.add(Dropout(0.5))  
    model.add(Dense(40))
    model.add(Activation('softmax'))  

    # Configure the SGD optimizer
    sgd = SGD(lr=lr, decay=decay, momentum=momentum, nesterov=True)  
    model.compile(loss='categorical_crossentropy', optimizer=sgd)
    return model  

Doesn't the Keras framework look concise? Its main building blocks are all contained in the following import statements:

from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD    # gradient-descent optimizer
  • Sequential: initializes the model
  • Dense: fully-connected layer
  • Flatten: flattens the feature maps into one long vector
  • SGD: from keras.optimizers, the stochastic gradient-descent optimizer
  • Dropout, Activation, Conv2D and MaxPooling2D need no further introduction; for their parameters please refer to Convolutional Neural Networks (CNN) Principles
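
A quick way to inspect the network built by get_set_model is to print its summary. This small sketch assumes the global constants img_rows, img_cols, nb_filters1 and nb_filters2 from the full source below have already been defined:

model = get_set_model()
model.summary()    # prints each layer with its output shape and parameter count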

3. Training and saving the weights

# Train the model and save the weights
def get_train_model(model,X_train, Y_train, X_val, Y_val):
    model.fit(X_train, Y_train, batch_size = batch_size, epochs = epochs,  
          verbose=1, validation_data=(X_val, Y_val))
    # Save the weights
    model.save_weights('model_weights.h5', overwrite=True)  
    return model  
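
Note that model.fit is given one-hot labels here, because the model is compiled with categorical_crossentropy; the integer labels returned by get_load_data therefore have to be converted first. A minimal sketch of that step, matching what the full program below does with np_utils.to_categorical:

from keras.utils import np_utils

Y_train = np_utils.to_categorical(y_train, 40)   # (320,) -> (320, 40)
Y_val = np_utils.to_categorical(y_val, 40)       # (40,)  -> (40, 40)
get_train_model(model, X_train, Y_train, X_val, Y_val)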

4. Testing with the saved weights

# Test: load the saved weights and evaluate
def get_test_model(model,X,Y):
    model.load_weights('model_weights.h5')  
    score = model.evaluate(X, Y, verbose=0)
    return score  
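
Because the model is compiled with a loss but no metrics, the score returned by model.evaluate here is just the loss value. If you would rather have Keras report accuracy directly, one variant (an assumption on my part, not what the original code does) is to add a metric when compiling:

# variant: compile with an accuracy metric so evaluate returns [loss, accuracy]
sgd = SGD(lr=0.005, decay=1e-6, momentum=0.9, nesterov=True)
model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
score = model.evaluate(X_test, Y_test, verbose=0)
print('test loss:', score[0], 'test accuracy:', score[1])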

-----------------------------------------------------------------------------------------------------------------------------------------
III. Keras source code and results

# -*- coding:utf-8 -*-
# -*- author:zzZ_CMing  CSDN address:https://blog.csdn.net/zzZ_CMing
# -*- 2018/06/05;11:41
# -*- python3.5
"""
olivetti Faces is a relatively small face database compiled at New York University: 40 subjects, 10 images each, packed into one large image of 400 faces.
Grayscale pixel values lie in [0, 255]. The full image is 1190*942 pixels, 20 rows by 20 columns of faces, so each face is (1190/20)*(942/20) ≈ 57*47.
The program requires h5py: python -m pip install h5py
Blog: https://blog.csdn.net/zzZ_CMing, more machine-learning source code there
"""
import numpy as np
from PIL import Image
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD    # gradient-descent optimizer
from keras.utils import np_utils
from keras import backend as K

# Read the whole composite image and set the corresponding labels
def get_load_data(dataset_path):
    img = Image.open(dataset_path)
    # Normalize the data. asarray converts to an np.ndarray without copying the underlying memory
    img_ndarray = np.asarray(img, dtype = 'float64')/255
    # 400 pictures, size: 57*47 = 2679  
    faces_data = np.empty((400, 2679))
    for row in range(20):  
       for column in range(20):
           # flatten collapses the multi-dimensional array into 1-D
           faces_data[row*20+column] = np.ndarray.flatten(img_ndarray[row*57:(row+1)*57, column*47:(column+1)*47])

    # Set the image labels
    label = np.empty(400)
    for i in range(40):
        label[i*10:(i+1)*10] = i
    label = label.astype(np.int)

    # Split the dataset: for each person, images 1-8 go to training, image 9 to validation, image 10 to testing; so train: 320, valid: 40, test: 40
    train_data = np.empty((320, 2679))
    train_label = np.empty(320)
    valid_data = np.empty((40, 2679))
    valid_label = np.empty(40)
    test_data = np.empty((40, 2679))
    test_label = np.empty(40)
    for i in range(40):
        train_data[i*8:i*8+8] = faces_data[i*10:i*10+8] # training data
        train_label[i*8:i*8+8] = label[i*10 : i*10+8]   # training labels
        valid_data[i] = faces_data[i*10+8]   # validation data
        valid_label[i] = label[i*10+8]       # validation labels
        test_data[i] = faces_data[i*10+9]    # test data
        test_label[i] = label[i*10+9]        # test labels
    train_data = train_data.astype('float32')
    valid_data = valid_data.astype('float32')
    test_data = test_data.astype('float32')
       
    result = [(train_data, train_label), (valid_data, valid_label), (test_data, test_label)]
    return result

# The CNN model
def get_set_model(lr=0.005,decay=1e-6,momentum=0.9):
    model = Sequential()
    # Conv layer 1 + pooling layer 1
    if K.image_data_format() == 'channels_first':
        model.add(Conv2D(nb_filters1, kernel_size=(3, 3), input_shape = (1, img_rows, img_cols)))
    else:
        model.add(Conv2D(nb_filters1, kernel_size=(2, 2), input_shape = (img_rows, img_cols, 1)))
    model.add(Activation('tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2)))

    # Conv layer 2 + pooling layer 2
    model.add(Conv2D(nb_filters2, kernel_size=(3, 3)))
    model.add(Activation('tanh'))  
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))  

    # Fully-connected layer + classifier layer
    model.add(Flatten())  
    model.add(Dense(1000))       #Full connection
    model.add(Activation('tanh'))  
    model.add(Dropout(0.5))  
    model.add(Dense(40))
    model.add(Activation('softmax'))  

    # Configure the SGD optimizer
    sgd = SGD(lr=lr, decay=decay, momentum=momentum, nesterov=True)  
    model.compile(loss='categorical_crossentropy', optimizer=sgd)
    return model  

# Train the model and save the weights
def get_train_model(model,X_train, Y_train, X_val, Y_val):
    model.fit(X_train, Y_train, batch_size = batch_size, epochs = epochs,  
          verbose=1, validation_data=(X_val, Y_val))
    # Save the weights
    model.save_weights('model_weights.h5', overwrite=True)  
    return model  

# Test: load the saved weights and evaluate
def get_test_model(model,X,Y):
    model.load_weights('model_weights.h5')  
    score = model.evaluate(X, Y, verbose=0)
    return score  



# [start]
epochs = 35          # number of training epochs
batch_size = 40      # 40 samples per batch, so one epoch has 320/40 = 8 batches
img_rows, img_cols = 57, 47         # size of each face image
nb_filters1, nb_filters2 = 20, 40   # number of filters (output channels) of the two conv layers

if __name__ == '__main__':  
    # Split each person's 10 images 8:1:1 into training, validation and test sets
    (X_train, y_train), (X_val, y_val),(X_test, y_test) = get_load_data('olivettifaces.gif')
    
    if K.image_data_format() == 'channels_first':    # 1 is the channel depth of the image
        X_train = X_train.reshape(X_train.shape[0],1,img_rows,img_cols)
        X_val = X_val.reshape(X_val.shape[0], 1, img_rows, img_cols)  
        X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)  
        input_shape = (1, img_rows, img_cols)
    else:
        X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)  
        X_val = X_val.reshape(X_val.shape[0], img_rows, img_cols, 1)  
        X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)  
        input_shape = (img_rows, img_cols, 1)
    
    print('X_train shape:', X_train.shape)
    # convert class vectors to binary class matrices  
    Y_train = np_utils.to_categorical(y_train, 40)
    Y_val = np_utils.to_categorical(y_val, 40)
    Y_test = np_utils.to_categorical(y_test, 40)

    # Train and save the weights
    model = get_set_model()
    get_train_model(model, X_train, Y_train, X_val, Y_val)
    score = get_test_model(model, X_test, Y_test)

    # Test: load the weights, get the accuracy and the predicted classes
    model.load_weights('model_weights.h5')
    classes = model.predict_classes(X_test, verbose=0)  
    test_accuracy = np.mean(np.equal(y_test, classes))
    print("last accuarcy:", test_accuracy)
    for i in range(0,40):
        if y_test[i] != classes[i]:
            print(y_test[i], 'was misclassified as', classes[i])
    

(figure: training log and the final test accuracy printed by the program)

All code for this post has been uploaded: it is available here, the real thing, no tricks.
-----------------------------------------------------------------------------------------------------------------------------------------

IV. TensorFlow source code

Disclaimer: this is code by a predecessor found online for the olivettifaces face database (it is built on Theano); my respects to the original author. It is fairly old code and some calls no longer match current library versions; I have tidied it up, so it should run and produce results. Only the source is attached here; interested readers can study it via the link above.

4.1: Training program

Create a file train_CNN.py, place olivettifaces.gif in the same directory, and put the following code in train_CNN.py:

# -*- coding:utf-8 -*-
"""
This program is developed with python + numpy + theano + PIL. It applies a LeNet5-like CNN model to the olivettifaces face database
to perform face recognition; the model's error drops below 5%.
The program is only a toy implement from my own learning process; the model may be overfitting, and with so few samples there is no way to verify that.
Still, the program is meant to lay out the concrete steps of building a CNN model, especially for image recognition:
from obtaining an image database to implementing a CNN model tailored to it. I think it is a useful reference for that workflow.
@author:wepon(http://2hwp.com)
Article explaining this code: http://blog.csdn.net/u012162613/article/details/43277187
"""
import os
import sys
import time

import numpy as np
from PIL import Image

import theano
import theano.tensor as T
from theano.tensor.signal.pool import pool_2d
from theano.tensor.nnet import conv

"""
Function that loads the image data; dataset_path is the path to the olivettifaces image.
After loading olivettifaces, it is split into three datasets: train_data, valid_data and test_data.
The function returns train_data, valid_data and test_data together with the corresponding labels.
"""


def get_data(dataset_path):
    img = Image.open(dataset_path)
    img_ndarray = np.asarray(img, dtype='float64') / 256
    faces = np.empty((400, 2679))
    for row in range(20):
        for column in range(20):
            faces[row * 20 + column] = np.ndarray.flatten(
                img_ndarray[row * 57:(row + 1) * 57, column * 47:(column + 1) * 47])

    label = np.empty(400)
    for i in range(40):
        label[i * 10:i * 10 + 10] = i
    label = label.astype(np.int)

    # Split into training, validation and test sets with the sizes below
    train_data = np.empty((320, 2679))
    train_label = np.empty(320)
    valid_data = np.empty((40, 2679))
    valid_label = np.empty(40)
    test_data = np.empty((40, 2679))
    test_label = np.empty(40)

    for i in range(40):
        train_data[i * 8:i * 8 + 8] = faces[i * 10:i * 10 + 8]
        train_label[i * 8:i * 8 + 8] = label[i * 10:i * 10 + 8]
        valid_data[i] = faces[i * 10 + 8]
        valid_label[i] = label[i * 10 + 8]
        test_data[i] = faces[i * 10 + 9]
        test_label[i] = label[i * 10 + 9]

    # Define the datasets as shared variables so the data can be copied to the GPU and the program can be GPU-accelerated.
    def shared_dataset(data_x, data_y, borrow=True):
        shared_x = theano.shared(np.asarray(data_x,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(np.asarray(data_y,
                                               dtype=theano.config.floatX),
                                 borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')


    train_set_x, train_set_y = shared_dataset(train_data, train_label)
    valid_set_x, valid_set_y = shared_dataset(valid_data, valid_label)
    test_set_x, test_set_y = shared_dataset(test_data, test_label)
    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval


# Classifier, i.e. the last layer of the CNN: logistic regression (softmax)
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out):
        self.W = theano.shared(value=np.zeros(
            (n_in, n_out),dtype=theano.config.floatX),
            name='W',borrow=True)

        self.b = theano.shared(value=np.zeros(
                (n_out,),dtype=theano.config.floatX),
            name='b',borrow=True)

        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type))

        if y.dtype.startswith('int'):
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()


# Fully-connected layer, the one right before the classifier
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out,
                 W=None, b=None,activation=T.tanh):

        self.input = input

        if W is None:
            W_values = np.asarray(
                rng.uniform(
                    low=-np.sqrt(6. / (n_in + n_out)),
                    high=np.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)),dtype=theano.config.floatX)

            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)

        if b is None:
            b_values = np.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)

        self.W = W
        self.b = b

        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output))

        # parameters of the model
        self.params = [self.W, self.b]


# Convolution + subsampling layer (conv + maxpooling)
class LeNetConvPoolLayer(object):
    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        assert image_shape[1] == filter_shape[1]
        self.input = input

        fan_in = np.prod(filter_shape[1:])
        fan_out = (filter_shape[0] * np.prod(filter_shape[2:]) /
                   np.prod(poolsize))

        # initialize weights with random weights
        W_bound = np.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(
            np.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX),borrow=True)

        # the bias is a 1D tensor -- one bias per output feature map
        b_values = np.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)

        # Convolution
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            image_shape=image_shape,
            filter_shape = filter_shape,)

        # Subsampling
        pooled_out = pool_2d(
            input=conv_out,
            ws=poolsize,
            ignore_border=True)

        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))

        # store parameters of this layer
        self.params = [self.W, self.b]


# Function that saves the trained parameters
def save_params(param1, param2, param3, param4):
    import pickle
    write_file = open('params.pkl', 'wb')
    pickle.dump(param1, write_file, -1)
    pickle.dump(param2, write_file, -1)
    pickle.dump(param3, write_file, -1)
    pickle.dump(param4, write_file, -1)
    write_file.close()


"""
The basic building blocks of the CNN are defined above. The function below applies the CNN to the olivettifaces dataset; the model is based on LeNet.
The optimization algorithm is minibatch SGD, which is why many arguments below carry batch_size, e.g. image_shape=(batch_size, 1, 57, 47).
Tunable parameters:
batch_size (note that the computation of n_train_batches, n_valid_batches and n_test_batches all depends on it)
nkerns=[5, 10], i.e. the number of kernels in the first and second conv layers
n_out, the number of output neurons of the fully-connected HiddenLayer (if you change it, change the classifier's n_in as well)
And, very importantly, the learning rate learning_rate.
"""


def main(learning_rate=0.05, n_epochs=200,
         dataset='olivettifaces.gif',
        nkerns=[5, 10], batch_size=40):

    # Random number generator used to initialize the parameters
    rng = np.random.RandomState(23455)
    # Load the data, split into training, validation and test sets
    datasets = get_data(dataset)
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]

    # Compute the number of batches in each dataset
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches //= batch_size
    n_valid_batches //= batch_size
    n_test_batches //= batch_size

    # Define a few symbolic variables; x represents the face data and is the input of layer0
    index = T.lscalar()
    x = T.matrix('x')
    y = T.ivector('y')

    ######################
    # Build the CNN model:
    # input+layer0(LeNetConvPoolLayer)+layer1(LeNetConvPoolLayer)+layer2(HiddenLayer)+layer3(LogisticRegression)
    ######################
    print('... building the model')

    # Reshape matrix of rasterized images of shape (batch_size, 57 * 47)
    # to a 4D tensor, compatible with our LeNetConvPoolLayer
    # (57, 47) is the size of  images.
    layer0_input = x.reshape((batch_size, 1, 57, 47))

    # First conv + maxpool layer
    # after convolution: (57-5+1, 47-5+1) = (53, 43)
    # after maxpooling: (53/2, 43/2) = (26, 21), because the border is ignored
    # 4D output tensor is thus of shape (batch_size, nkerns[0], 26, 21)
    layer0 = LeNetConvPoolLayer(rng,
        input=layer0_input,
        image_shape=(batch_size, 1, 57, 47),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    # Second conv + maxpool layer; its input is the output of the previous layer, i.e. (batch_size, nkerns[0], 26, 21)
    # after convolution: (26-5+1, 21-5+1) = (22, 17)
    # after maxpooling: (22/2, 17/2) = (11, 8), because the border is ignored
    # 4D output tensor is thus of shape (batch_size, nkerns[1], 11, 8)
    layer1 = LeNetConvPoolLayer(rng,
        input=layer0.output,
        image_shape=(batch_size, nkerns[0], 26, 21),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    # The fully-connected HiddenLayer takes input of shape (batch_size, num_pixels): each sample's feature maps from layer0 and layer1 are flattened into one long vector,
    # and with batch_size samples the input is (batch_size, num_pixels), one sample per row.
    # So the previous layer's output (batch_size, nkerns[1], 11, 8) is reshaped to (batch_size, nkerns[1] * 11 * 8) with flatten.
    layer2_input = layer1.output.flatten(2)
    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 11 * 8,
        n_out=2000,  # number of output neurons of the fully-connected layer; user-defined, tune as needed
        activation=T.tanh
    )

    # Classifier
    layer3 = LogisticRegression(input=layer2.output, n_in=2000, n_out=40)  # n_in equals the fully-connected layer's output size, n_out equals the 40 classes

    ###############
    # Define the basic pieces of the optimization: the cost function, the train/validation/test models, and the parameter update rule (gradient descent)
    ###############
    # Cost function
    cost = layer3.negative_log_likelihood(y)

    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    # All parameters
    params = layer3.params + layer2.params + layer1.params + layer0.params
    # Gradients of each parameter
    grads = T.grad(cost, params)
    # Parameter update rule
    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]
    # train_model updates the parameters during training with minibatch SGD
    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )

    ###############
    # Training stage: search for the best parameters.
    ###############
    print('... training')
    # In LeNet5: batch_size=500, n_train_batches=50000/500=100, patience=10000
    # For olivettifaces: batch_size=40, n_train_batches=320/40=8, so patience can be set to about 800; tune it as needed, a larger value also works
    patience = 800
    patience_increase = 2
    improvement_threshold = 0.99
    validation_frequency = min(n_train_batches, patience / 2)

    best_validation_loss = np.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()

    epoch = 0
    done_looping = False

    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(int(n_train_batches)):

            iter = (epoch - 1) * n_train_batches + minibatch_index

            if iter % 100 == 0:
                print('training @ iter = ', iter)

            cost_ij = train_model(minibatch_index)

            if (iter + 1) % validation_frequency == 0:

                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in range(int(n_valid_batches))]
                this_validation_loss = np.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))

                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:

                    # improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * \
                            improvement_threshold:
                        patience = max(patience, iter * patience_increase)

                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter
                    save_params(layer0.params, layer1.params, layer2.params, layer3.params)  # save the parameters

                    # test it on the test set
                    test_losses = [
                        test_model(i)
                        for i in range(int(n_test_batches))
                    ]
                    test_score = np.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))

            if patience <= iter:
                done_looping = True
                break

    end_time = time.clock()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    print(('The code for file ' +
           os.path.split(__file__)[1] +
           ' ran for %.2fm' % ((end_time - start_time) / 60.)), file=sys.stderr)


if __name__ == '__main__':
    main()

The training program produces a .pkl file containing the saved training parameters.
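
For reference, those parameters can be read back with pickle in the same order they were dumped; load_params in the test program below does exactly this:

import pickle

with open('params.pkl', 'rb') as f:
    layer0_params = pickle.load(f)
    layer1_params = pickle.load(f)
    layer2_params = pickle.load(f)
    layer3_params = pickle.load(f)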
4.2: Test program

In the same directory, create a file use_CNN.py with the following code:

# -*-coding:utf8-*-#
"""
What this program does:
in train_CNN_olivettifaces.py we trained the model and saved its parameters; here those saved parameters are used to initialize the CNN model.
This gives a ready-to-use CNN system: feed it a face image and it predicts the face's class.
@author:wepon(http://2hwp.com)
Article explaining this code: http://blog.csdn.net/u012162613/article/details/43277187
"""

import os
import sys
import pickle

import numpy
from PIL import Image

import theano
import theano.tensor as T
from theano.tensor.signal.pool import pool_2d
from theano.tensor.nnet import conv


# Load the previously saved training parameters
# layer0_params~layer3_params each contain W and b: layer*_params[0] is W, layer*_params[1] is b
def load_params(params_file):
    f = open(params_file, 'rb')
    layer0_params = pickle.load(f)
    layer1_params = pickle.load(f)
    layer2_params = pickle.load(f)
    layer3_params = pickle.load(f)
    f.close()
    return layer0_params, layer1_params, layer2_params, layer3_params


# Load the image and return the face data as numpy arrays together with the corresponding labels
def load_data(dataset_path):
    img = Image.open(dataset_path)
    img_ndarray = numpy.asarray(img, dtype='float64') / 256

    faces = numpy.empty((400, 2679))
    for row in range(20):
        for column in range(20):
            faces[row * 20 + column] = numpy.ndarray.flatten(
                img_ndarray[row * 57:(row + 1) * 57, column * 47:(column + 1) * 47])

    label = numpy.empty(400)
    for i in range(40):
        label[i * 10:i * 10 + 10] = i
    label = label.astype(numpy.int)

    return faces, label


"""
In train_CNN_olivettifaces, LeNetConvPoolLayer, HiddenLayer and LogisticRegression are initialized randomly;
below they are redefined as versions that can be initialized from saved parameters.
"""


class LogisticRegression(object):
    def __init__(self, input, params_W, params_b, n_in, n_out):
        self.W = params_W
        self.b = params_b
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type)
            )
        if y.dtype.startswith('int'):
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()


class HiddenLayer(object):
    def __init__(self, input, params_W, params_b, n_in, n_out,
                 activation=T.tanh):
        self.input = input
        self.W = params_W
        self.b = params_b

        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output)
        )
        self.params = [self.W, self.b]


# Convolution + subsampling layer (conv + maxpooling)
class LeNetConvPoolLayer(object):
    def __init__(self, input, params_W, params_b, filter_shape, image_shape, poolsize=(2, 2)):
        assert image_shape[1] == filter_shape[1]
        self.input = input
        self.W = params_W
        self.b = params_b
        # Convolution
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )
        # Subsampling
        pooled_out = pool_2d(
            input=conv_out,
            ws=poolsize,
            ignore_border=True
        )
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
        self.params = [self.W, self.b]


"""
Initializing the CNN with the previously saved parameters yields a trained CNN model, which is then used to classify images.
Note: nkerns must match the model that was trained. dataset is the path of the image to test; params_file is the path of the parameter file saved during training.
"""


def use_CNN(dataset='olivettifaces.gif', params_file='params.pkl', nkerns=[5, 10]):
    # Load the test images; here the whole olivettifaces.gif (all samples) is read, giving faces and label
    faces, label = load_data(dataset)
    face_num = faces.shape[0]  # number of face images

    # Load the saved parameters
    layer0_params, layer1_params, layer2_params, layer3_params = load_params(params_file)

    x = T.matrix('x')  # the variable x holds the input face data and feeds layer0

    ######################
    # Initialize each layer's W and b with the loaded parameters
    ######################
    layer0_input = x.reshape((face_num, 1, 57, 47))
    layer0 = LeNetConvPoolLayer(
        input=layer0_input,
        params_W=layer0_params[0],
        params_b=layer0_params[1],
        image_shape=(face_num, 1, 57, 47),
        filter_shape=(nkerns[0], 1, 5, 5),
        poolsize=(2, 2)
    )

    layer1 = LeNetConvPoolLayer(
        input=layer0.output,
        params_W=layer1_params[0],
        params_b=layer1_params[1],
        image_shape=(face_num, nkerns[0], 26, 21),
        filter_shape=(nkerns[1], nkerns[0], 5, 5),
        poolsize=(2, 2)
    )

    layer2_input = layer1.output.flatten(2)
    layer2 = HiddenLayer(
        input=layer2_input,
        params_W=layer2_params[0],
        params_b=layer2_params[1],
        n_in=nkerns[1] * 11 * 8,
        n_out=2000,
        activation=T.tanh
    )

    layer3 = LogisticRegression(input=layer2.output, params_W=layer3_params[0], params_b=layer3_params[1], n_in=2000,
                                n_out=40)

    # Define a theano.function with x as input and layer3.y_pred (the predicted class) as output
    f = theano.function(
        [x],  # the function's inputs must be a list, even if there is only one input
        layer3.y_pred
    )

    # Predicted classes pred
    pred = f(faces)

    # Compare the predicted classes pred with the true labels and print the misclassified images
    for i in range(face_num):
        if pred[i] != label[i]:
            print('picture: %i is person %i, but mis-predicted as person %i' % (i, label[i], pred[i]))


if __name__ == '__main__':
    use_CNN()

"""一点笔记,对theano.function的理解,不一定正确,后面深入理解了再回头看看
在theano里面,必须通过function定义输入x和输出,然后调用function,才会开始计算,比如在use_CNN里面,在定义layer0时,即使将faces作为输入,将layer1~layer3定义好后,也无法直接用layer3.y_pred来获得所属类别。
因为在theano中,layer0~layer3只是一种“图”关系,我们定义了layer0~layer3,也只是创建了这种图关系,但是如果没有funtion,它是不会计算的。
这也是为什么要定义x的原因:
    x = T.matrix('x')
然后将变量x作为layer0的输入。
最后,定义一个function:
f = theano.function(
        [x],    #funtion 的输入必须是list,即使只有一个输入
        layer3.y_pred
    )
将x作为输入,layer3.y_pred作为输出。
当调用f(faces)时,就获得了预测值
"""

Run it and you will get the results; I will not go into further detail here. If there are mistakes, please point them out.
-----------------------------------------------------------------------------------------------------------------------------------------

All code for this post has been uploaded: it is available here, the real thing, no tricks.
-----------------------------------------------------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------------------------------------------------
