Preface:
Convolutional neural networks, with their powerful feature-extraction ability, shine in pattern recognition. This post shows how to use a CNN to recognize the olivettifaces face database, which makes a nice entry-level image-recognition demo. If you do not yet have a grounding in CNNs, I strongly recommend first studying my post Convolutional Neural Network (CNN) Principles; if you already do, dive right in!
---
Notes:
1. A detailed introduction to the olivettifaces face database is available through this link, so this post will not repeat it; the necessary brief descriptions are still provided where needed.
2. I am on Python 3.5 with the PyCharm IDE, using two deep-learning frameworks: TensorFlow and Keras. (Keras itself runs on top of TensorFlow, but Keras code is more concise, and installing Keras is simple, so that is not covered here.)
3. The pure-TensorFlow implementation is fairly involved, so this post mainly walks through the Keras version; the source for both is attached at the end.
4. All code for this post has been uploaded: the location is here. Genuinely useful, no tricks.
---
I. A brief introduction to the olivettifaces face database
1. olivettifaces is a small face database, originally collected at the Olivetti Research Laboratory (AT&T) and distributed by New York University as a single mosaic image: 40 subjects with 10 images each, making one large picture of 400 faces.
2. Pixel grayscale values lie in [0, 255]. The whole mosaic is 1190*942 pixels, arranged in 20 rows and 20 columns of faces, so each face is about (1190/20)*(942/20) ≈ 57*47 pixels.
3. The program first requires h5py:
python -m pip install h5py
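To make the grid arithmetic in point 2 concrete, here is a small sketch of my own (the constants mirror the post; face_slice is a hypothetical helper, not part of the original code) that maps a face's (row, column) position in the 20x20 mosaic to its pixel bounds:

```python
# Illustrative sketch (not from the original post): mapping a face's (row, column)
# position in the 20x20 mosaic to its pixel slice in the 1190x942 image.
FACE_H, FACE_W = 57, 47   # height and width of one face crop
ROWS, COLS = 20, 20       # the mosaic holds 20 * 20 = 400 faces

def face_slice(row, column):
    """Return the (top, bottom, left, right) pixel bounds of one face."""
    return (row * FACE_H, (row + 1) * FACE_H,
            column * FACE_W, (column + 1) * FACE_W)

# The 400 crops fit inside the 1190x942 mosaic (with a small unused border).
assert ROWS * FACE_H <= 1190 and COLS * FACE_W <= 942
```

This is the same indexing that the loading code below performs with `img_ndarray[row*57:(row+1)*57, column*47:(column+1)*47]`.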
---
II. Code walkthrough
1. Reading the data; assigning labels
# Read the whole mosaic image and assign a label to each face
def get_load_data(dataset_path):
    img = Image.open(dataset_path)
    # Normalize the data. asarray converts to np.ndarray reusing the original memory
    img_ndarray = np.asarray(img, dtype='float64') / 255
    # 400 pictures, size: 57*47 = 2679
    faces_data = np.empty((400, 2679))
    for row in range(20):
        for column in range(20):
            # flatten collapses a multi-dimensional array into one dimension
            faces_data[row*20+column] = np.ndarray.flatten(img_ndarray[row*57:(row+1)*57, column*47:(column+1)*47])
    # Assign labels
    label = np.empty(400)
    for i in range(40):
        label[i*10:(i+1)*10] = i
    label = label.astype(int)
    # Split: per person, the first 8 images train, the 9th validates, the 10th tests; so train: 320, valid: 40, test: 40
    train_data = np.empty((320, 2679))
    train_label = np.empty(320)
    valid_data = np.empty((40, 2679))
    valid_label = np.empty(40)
    test_data = np.empty((40, 2679))
    test_label = np.empty(40)
    for i in range(40):
        train_data[i*8:i*8+8] = faces_data[i*10:i*10+8]   # training data
        train_label[i*8:i*8+8] = label[i*10:i*10+8]       # training labels
        valid_data[i] = faces_data[i*10+8]                # validation data
        valid_label[i] = label[i*10+8]                    # validation labels
        test_data[i] = faces_data[i*10+9]                 # test data
        test_label[i] = label[i*10+9]                     # test labels
    train_data = train_data.astype('float32')
    valid_data = valid_data.astype('float32')
    test_data = test_data.astype('float32')
    result = [(train_data, train_label), (valid_data, valid_label), (test_data, test_label)]
    return result
Given the image path, the function reads the image's data; the comments above explain what each step does, but a few points deserve emphasis:
- Assigning labels: every 10 consecutive images share one label;
- Splitting the dataset: of each person's 10 images, the first 8 are used for training, the 9th for validation, and the 10th for testing, again sliced by pixel index;
- The function returns three tuples: the training set, the validation set, and the test set.
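The same 8/1/1 split can be sketched with plain numpy slicing on dummy data of the same layout (my own illustration; the 4-column stand-in replaces the real 2679 pixel features):

```python
import numpy as np

# Sketch of the 8/1/1 split on dummy data with the post's layout:
# 400 rows (40 people x 10 images each), one feature row per image.
faces_data = np.arange(400 * 4).reshape(400, 4).astype('float32')  # 4 stands in for 2679
label = np.repeat(np.arange(40), 10)

train_idx = np.concatenate([np.arange(i*10, i*10+8) for i in range(40)])
valid_idx = np.arange(8, 400, 10)   # every person's 9th image
test_idx  = np.arange(9, 400, 10)   # every person's 10th image

train_data, train_label = faces_data[train_idx], label[train_idx]
valid_data, valid_label = faces_data[valid_idx], label[valid_idx]
test_data,  test_label  = faces_data[test_idx],  label[test_idx]
```

The explicit loop in the post's code produces exactly these shapes: 320 training, 40 validation, and 40 test rows.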
2. Building the CNN
# CNN body
def get_set_model(lr=0.005, decay=1e-6, momentum=0.9):
    model = Sequential()
    # Conv 1 + pool 1
    if K.image_data_format() == 'channels_first':
        model.add(Conv2D(nb_filters1, kernel_size=(3, 3), input_shape=(1, img_rows, img_cols)))
    else:
        model.add(Conv2D(nb_filters1, kernel_size=(3, 3), input_shape=(img_rows, img_cols, 1)))
    model.add(Activation('tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Conv 2 + pool 2
    model.add(Conv2D(nb_filters2, kernel_size=(3, 3)))
    model.add(Activation('tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # Fully connected layer + classifier
    model.add(Flatten())
    model.add(Dense(1000))  # full connection
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(40))
    model.add(Activation('softmax'))
    # Configure the SGD optimizer
    sgd = SGD(lr=lr, decay=decay, momentum=momentum, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd)
    return model
The Keras framework really is concise; its vocabulary here is contained entirely in these imports:
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD  # gradient-descent optimizer
- Sequential: model initialization
- Dense: fully connected layer
- Flatten: flattens multi-dimensional output into a vector
- SGD: an optimizer
- Dropout, Activation, Conv2D, MaxPooling2D need no introduction; for the meaning of their parameters, see the Convolutional Neural Network (CNN) Principles post.
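As a sanity check on the architecture, the feature-map sizes can be traced by hand, assuming 'valid' padding (Keras' Conv2D default) and the 3x3 kernels used above; conv_out and pool_out are my own throwaway helpers, not part of the post's code:

```python
# Rough sanity check: trace the feature-map sizes through the conv/pool stack
# for a 57x47 input, assuming 'valid' padding (the Keras Conv2D default).
def conv_out(size, kernel):   # a 'valid' convolution shrinks each side by kernel - 1
    return size - kernel + 1

def pool_out(size, pool):     # max pooling floors the division
    return size // pool

h, w = 57, 47
h, w = conv_out(h, 3), conv_out(w, 3)   # Conv2D 3x3  -> 55 x 45
h, w = pool_out(h, 2), pool_out(w, 2)   # MaxPool 2x2 -> 27 x 22
h, w = conv_out(h, 3), conv_out(w, 3)   # Conv2D 3x3  -> 25 x 20
h, w = pool_out(h, 2), pool_out(w, 2)   # MaxPool 2x2 -> 12 x 10
flat = 40 * h * w                       # nb_filters2 = 40 feature maps after conv 2
```

So the Flatten layer hands the Dense(1000) layer a 4800-dimensional vector per sample.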
3. Training and saving the weights
# Training: fit the model and save its weights
def get_train_model(model, X_train, Y_train, X_val, Y_val):
    model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs,
              verbose=1, validation_data=(X_val, Y_val))
    # Save the weights
    model.save_weights('model_weights.h5', overwrite=True)
    return model
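One detail worth noting before fit() is called: because the model is compiled with categorical_crossentropy, the integer labels must first be one-hot encoded (the full source below uses keras' np_utils.to_categorical for this). A minimal numpy equivalent, for illustration only:

```python
import numpy as np

# Minimal numpy equivalent of keras' np_utils.to_categorical (illustration only):
# turn integer class labels into one-hot row vectors.
def to_one_hot(y, num_classes):
    out = np.zeros((len(y), num_classes), dtype='float32')
    out[np.arange(len(y)), y] = 1.0
    return out
```

For this dataset, num_classes is 40 (one class per person).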
4. Testing with the saved weights
# Testing: load the saved weights and evaluate
def get_test_model(model, X, Y):
    model.load_weights('model_weights.h5')
    score = model.evaluate(X, Y, verbose=0)
    return score
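evaluate() returns the loss, but the full source below ultimately measures accuracy as the mean of elementwise label matches. A tiny sketch of that computation with made-up predictions:

```python
import numpy as np

# Accuracy as the mean of elementwise label matches, with made-up predictions:
y_true = np.array([3, 7, 7, 12])
y_pred = np.array([3, 7, 1, 12])
accuracy = np.mean(np.equal(y_true, y_pred))   # 3 of 4 correct
```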
---
III. Keras source and results
# -*- coding:utf-8 -*-
# -*- author:zzZ_CMing  CSDN address:https://blog.csdn.net/zzZ_CMing
# -*- 2018/06/05;11:41
# -*- python3.5
"""
olivetti Faces is a small face database distributed by New York University: 40 subjects with 10 images
each, combined into one large mosaic of 400 faces. Pixel grayscale values lie in [0, 255].
The mosaic is 1190*942 pixels, 20 rows by 20 columns, so each face is about (1190/20)*(942/20) = 57*47.
The program requires h5py: python -m pip install h5py
Blog: https://blog.csdn.net/zzZ_CMing, with more machine-learning source code
"""
import numpy as np
from PIL import Image
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D
from keras.optimizers import SGD  # gradient-descent optimizer
from keras.utils import np_utils
from keras import backend as K


# Read the whole mosaic image and assign a label to each face
def get_load_data(dataset_path):
    img = Image.open(dataset_path)
    # Normalize the data. asarray converts to np.ndarray reusing the original memory
    img_ndarray = np.asarray(img, dtype='float64') / 255
    # 400 pictures, size: 57*47 = 2679
    faces_data = np.empty((400, 2679))
    for row in range(20):
        for column in range(20):
            # flatten collapses a multi-dimensional array into one dimension
            faces_data[row*20+column] = np.ndarray.flatten(img_ndarray[row*57:(row+1)*57, column*47:(column+1)*47])
    # Assign labels
    label = np.empty(400)
    for i in range(40):
        label[i*10:(i+1)*10] = i
    label = label.astype(int)
    # Split: per person, the first 8 images train, the 9th validates, the 10th tests; so train: 320, valid: 40, test: 40
    train_data = np.empty((320, 2679))
    train_label = np.empty(320)
    valid_data = np.empty((40, 2679))
    valid_label = np.empty(40)
    test_data = np.empty((40, 2679))
    test_label = np.empty(40)
    for i in range(40):
        train_data[i*8:i*8+8] = faces_data[i*10:i*10+8]   # training data
        train_label[i*8:i*8+8] = label[i*10:i*10+8]       # training labels
        valid_data[i] = faces_data[i*10+8]                # validation data
        valid_label[i] = label[i*10+8]                    # validation labels
        test_data[i] = faces_data[i*10+9]                 # test data
        test_label[i] = label[i*10+9]                     # test labels
    train_data = train_data.astype('float32')
    valid_data = valid_data.astype('float32')
    test_data = test_data.astype('float32')
    result = [(train_data, train_label), (valid_data, valid_label), (test_data, test_label)]
    return result


# CNN body
def get_set_model(lr=0.005, decay=1e-6, momentum=0.9):
    model = Sequential()
    # Conv 1 + pool 1
    if K.image_data_format() == 'channels_first':
        model.add(Conv2D(nb_filters1, kernel_size=(3, 3), input_shape=(1, img_rows, img_cols)))
    else:
        model.add(Conv2D(nb_filters1, kernel_size=(3, 3), input_shape=(img_rows, img_cols, 1)))
    model.add(Activation('tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    # Conv 2 + pool 2
    model.add(Conv2D(nb_filters2, kernel_size=(3, 3)))
    model.add(Activation('tanh'))
    model.add(MaxPooling2D(pool_size=(2, 2)))
    model.add(Dropout(0.25))
    # Fully connected layer + classifier
    model.add(Flatten())
    model.add(Dense(1000))  # full connection
    model.add(Activation('tanh'))
    model.add(Dropout(0.5))
    model.add(Dense(40))
    model.add(Activation('softmax'))
    # Configure the SGD optimizer
    sgd = SGD(lr=lr, decay=decay, momentum=momentum, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd)
    return model


# Training: fit the model and save its weights
def get_train_model(model, X_train, Y_train, X_val, Y_val):
    model.fit(X_train, Y_train, batch_size=batch_size, epochs=epochs,
              verbose=1, validation_data=(X_val, Y_val))
    # Save the weights
    model.save_weights('model_weights.h5', overwrite=True)
    return model


# Testing: load the saved weights and evaluate
def get_test_model(model, X, Y):
    model.load_weights('model_weights.h5')
    score = model.evaluate(X, Y, verbose=0)
    return score


# [start]
epochs = 35                        # number of training epochs
batch_size = 40                    # 40 samples per batch, so 320/40 = 8 batches per epoch
img_rows, img_cols = 57, 47        # size of each face image
nb_filters1, nb_filters2 = 20, 40  # number of kernels (output channels) in the two conv layers

if __name__ == '__main__':
    # Split each person's 10 images 8:1:1 into training, validation, and test sets
    (X_train, y_train), (X_val, y_val), (X_test, y_test) = get_load_data('olivettifaces.gif')
    if K.image_data_format() == 'channels_first':  # the 1 is the image channel depth
        X_train = X_train.reshape(X_train.shape[0], 1, img_rows, img_cols)
        X_val = X_val.reshape(X_val.shape[0], 1, img_rows, img_cols)
        X_test = X_test.reshape(X_test.shape[0], 1, img_rows, img_cols)
        input_shape = (1, img_rows, img_cols)
    else:
        X_train = X_train.reshape(X_train.shape[0], img_rows, img_cols, 1)
        X_val = X_val.reshape(X_val.shape[0], img_rows, img_cols, 1)
        X_test = X_test.reshape(X_test.shape[0], img_rows, img_cols, 1)
        input_shape = (img_rows, img_cols, 1)
    print('X_train shape:', X_train.shape)
    # convert class vectors to binary class matrices
    Y_train = np_utils.to_categorical(y_train, 40)
    Y_val = np_utils.to_categorical(y_val, 40)
    Y_test = np_utils.to_categorical(y_test, 40)
    # Train and save the weights
    model = get_set_model()
    get_train_model(model, X_train, Y_train, X_val, Y_val)
    score = get_test_model(model, X_test, Y_test)
    # Test: reload the weights, get the accuracy and the predicted classes
    model.load_weights('model_weights.h5')
    classes = model.predict_classes(X_test, verbose=0)
    test_accuracy = np.mean(np.equal(y_test, classes))
    print("last accuracy:", test_accuracy)
    for i in range(0, 40):
        if y_test[i] != classes[i]:
            print(y_test[i], 'was misclassified as', classes[i])
---
IV. Theano source code
Note: this is an earlier author's code for the olivettifaces face database, shared here with respect. It is quite old, and some calls no longer match current function signatures; I have tidied it up, so it should run and produce results. Only the source is attached here; interested readers can study it through the link above.
4.1: The training program
Create a file train_CNN.py in the same directory as olivettifaces.gif, and put the following code in it:
# -*- coding:utf-8 -*-
"""
Built with python + numpy + theano + PIL, this program applies a LeNet5-style CNN to the olivettifaces
face database to perform face recognition; the model's error drops below 5%.
It is only a toy implement from my own learning; the model may overfit, and with so few samples this
cannot really be verified. The aim is to lay out the concrete steps of building a CNN for image
recognition, from obtaining an image database to fitting a CNN model to it, and as a walkthrough of
that workflow I think it has reference value.
@author:wepon(http://2hwp.com)
Article explaining this code: http://blog.csdn.net/u012162613/article/details/43277187
"""
import os
import sys
import time
import numpy as np
from PIL import Image
import theano
import theano.tensor as T
from theano.tensor.signal.pool import pool_2d
from theano.tensor.nnet import conv


"""
Loads the image data; dataset_path is the path to the olivettifaces image.
After loading, the data is divided into train_data, valid_data and test_data,
and the function returns those three sets together with their labels.
"""
def get_data(dataset_path):
    img = Image.open(dataset_path)
    img_ndarray = np.asarray(img, dtype='float64') / 256
    faces = np.empty((400, 2679))
    for row in range(20):
        for column in range(20):
            faces[row * 20 + column] = np.ndarray.flatten(
                img_ndarray[row * 57:(row + 1) * 57, column * 47:(column + 1) * 47])
    label = np.empty(400)
    for i in range(40):
        label[i * 10:i * 10 + 10] = i
    label = label.astype(int)
    # Split into training, validation and test sets of the following sizes
    train_data = np.empty((320, 2679))
    train_label = np.empty(320)
    valid_data = np.empty((40, 2679))
    valid_label = np.empty(40)
    test_data = np.empty((40, 2679))
    test_label = np.empty(40)
    for i in range(40):
        train_data[i * 8:i * 8 + 8] = faces[i * 10:i * 10 + 8]
        train_label[i * 8:i * 8 + 8] = label[i * 10:i * 10 + 8]
        valid_data[i] = faces[i * 10 + 8]
        valid_label[i] = label[i * 10 + 8]
        test_data[i] = faces[i * 10 + 9]
        test_label[i] = label[i * 10 + 9]

    # Define the datasets as shared variables so they can be copied into GPU memory for acceleration.
    def shared_dataset(data_x, data_y, borrow=True):
        shared_x = theano.shared(np.asarray(data_x,
                                            dtype=theano.config.floatX),
                                 borrow=borrow)
        shared_y = theano.shared(np.asarray(data_y,
                                            dtype=theano.config.floatX),
                                 borrow=borrow)
        return shared_x, T.cast(shared_y, 'int32')

    train_set_x, train_set_y = shared_dataset(train_data, train_label)
    valid_set_x, valid_set_y = shared_dataset(valid_data, valid_label)
    test_set_x, test_set_y = shared_dataset(test_data, test_label)
    rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
            (test_set_x, test_set_y)]
    return rval
# The classifier, i.e. the last layer of the CNN: logistic regression (softmax)
class LogisticRegression(object):
    def __init__(self, input, n_in, n_out):
        self.W = theano.shared(value=np.zeros(
            (n_in, n_out), dtype=theano.config.floatX),
            name='W', borrow=True)
        self.b = theano.shared(value=np.zeros(
            (n_out,), dtype=theano.config.floatX),
            name='b', borrow=True)
        self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
        self.y_pred = T.argmax(self.p_y_given_x, axis=1)
        self.params = [self.W, self.b]

    def negative_log_likelihood(self, y):
        return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])

    def errors(self, y):
        if y.ndim != self.y_pred.ndim:
            raise TypeError(
                'y should have the same shape as self.y_pred',
                ('y', y.type, 'y_pred', self.y_pred.type))
        if y.dtype.startswith('int'):
            return T.mean(T.neq(self.y_pred, y))
        else:
            raise NotImplementedError()
# The fully connected layer, just before the classifier
class HiddenLayer(object):
    def __init__(self, rng, input, n_in, n_out,
                 W=None, b=None, activation=T.tanh):
        self.input = input
        if W is None:
            W_values = np.asarray(
                rng.uniform(
                    low=-np.sqrt(6. / (n_in + n_out)),
                    high=np.sqrt(6. / (n_in + n_out)),
                    size=(n_in, n_out)), dtype=theano.config.floatX)
            if activation == theano.tensor.nnet.sigmoid:
                W_values *= 4
            W = theano.shared(value=W_values, name='W', borrow=True)
        if b is None:
            b_values = np.zeros((n_out,), dtype=theano.config.floatX)
            b = theano.shared(value=b_values, name='b', borrow=True)
        self.W = W
        self.b = b
        lin_output = T.dot(input, self.W) + self.b
        self.output = (
            lin_output if activation is None
            else activation(lin_output))
        # parameters of the model
        self.params = [self.W, self.b]
# Convolution + subsampling layer (conv + maxpooling)
class LeNetConvPoolLayer(object):
    def __init__(self, rng, input, filter_shape, image_shape, poolsize=(2, 2)):
        assert image_shape[1] == filter_shape[1]
        self.input = input
        fan_in = np.prod(filter_shape[1:])
        fan_out = (filter_shape[0] * np.prod(filter_shape[2:]) /
                   np.prod(poolsize))
        # initialize weights with random weights
        W_bound = np.sqrt(6. / (fan_in + fan_out))
        self.W = theano.shared(
            np.asarray(
                rng.uniform(low=-W_bound, high=W_bound, size=filter_shape),
                dtype=theano.config.floatX), borrow=True)
        # the bias is a 1D tensor -- one bias per output feature map
        b_values = np.zeros((filter_shape[0],), dtype=theano.config.floatX)
        self.b = theano.shared(value=b_values, borrow=True)
        # convolution
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            image_shape=image_shape,
            filter_shape=filter_shape)
        # subsampling
        pooled_out = pool_2d(
            input=conv_out,
            ws=poolsize,
            ignore_border=True)
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
        # store parameters of this layer
        self.params = [self.W, self.b]
# Function that saves the trained parameters
def save_params(param1, param2, param3, param4):
    import pickle
    write_file = open('params.pkl', 'wb')
    pickle.dump(param1, write_file, -1)
    pickle.dump(param2, write_file, -1)
    pickle.dump(param3, write_file, -1)
    pickle.dump(param4, write_file, -1)
    write_file.close()
"""
上面定义好了CNN的一些基本构件,下面的函数将CNN应用于olivettifaces这个数据集,CNN的模型基于LeNet。
采用的优化算法是批量随机梯度下降算法,minibatch SGD,所以下面很多参数都带有batch_size,比如image_shape=(batch_size, 1, 57, 47)
可以设置的参数有:
batch_size,但应注意n_train_batches、n_valid_batches、n_test_batches的计算都依赖于batch_size
nkerns=[5, 10]即第一二层的卷积核个数可以设置
全连接层HiddenLayer的输出神经元个数n_out可以设置,要同时更改分类器的输入n_in
另外,还有一个很重要的就是学习速率learning_rate.
"""
def main(learning_rate=0.05, n_epochs=200,
         dataset='olivettifaces.gif',
         nkerns=[5, 10], batch_size=40):
    # random number generator used to initialize the parameters
    rng = np.random.RandomState(23455)
    # load the data as training, validation and test sets
    datasets = get_data(dataset)
    train_set_x, train_set_y = datasets[0]
    valid_set_x, valid_set_y = datasets[1]
    test_set_x, test_set_y = datasets[2]
    # number of batches in each set
    n_train_batches = train_set_x.get_value(borrow=True).shape[0]
    n_valid_batches = valid_set_x.get_value(borrow=True).shape[0]
    n_test_batches = test_set_x.get_value(borrow=True).shape[0]
    n_train_batches //= batch_size
    n_valid_batches //= batch_size
    n_test_batches //= batch_size
    # a few symbolic variables; x holds the face data and is the input to layer0
    index = T.lscalar()
    x = T.matrix('x')
    y = T.ivector('y')

    ######################
    # Build the CNN model:
    # input + layer0 (LeNetConvPoolLayer) + layer1 (LeNetConvPoolLayer) + layer2 (HiddenLayer) + layer3 (LogisticRegression)
    ######################
    print('... building the model')
    # Reshape matrix of rasterized images of shape (batch_size, 57 * 47)
    # to a 4D tensor, compatible with our LeNetConvPoolLayer
    # (57, 47) is the size of images.
    layer0_input = x.reshape((batch_size, 1, 57, 47))
    # First conv + maxpool layer
    # after convolution: (57-5+1, 47-5+1) = (53, 43)
    # after maxpooling: (53/2, 43/2) = (26, 21), since the border is ignored
    # 4D output tensor is thus of shape (batch_size, nkerns[0], 26, 21)
    layer0 = LeNetConvPoolLayer(rng,
                                input=layer0_input,
                                image_shape=(batch_size, 1, 57, 47),
                                filter_shape=(nkerns[0], 1, 5, 5),
                                poolsize=(2, 2))
    # Second conv + maxpool layer; its input is the previous output, (batch_size, nkerns[0], 26, 21)
    # after convolution: (26-5+1, 21-5+1) = (22, 17)
    # after maxpooling: (22/2, 17/2) = (11, 8), since the border is ignored
    # 4D output tensor is thus of shape (batch_size, nkerns[1], 11, 8)
    layer1 = LeNetConvPoolLayer(rng,
                                input=layer0.output,
                                image_shape=(batch_size, nkerns[0], 26, 21),
                                filter_shape=(nkerns[1], nkerns[0], 5, 5),
                                poolsize=(2, 2))
    # The fully connected HiddenLayer takes input of shape (batch_size, num_pixels): the feature maps each
    # sample produces through layer0 and layer1 are stretched into one long vector per sample,
    # so the previous output (batch_size, nkerns[1], 11, 8) is flattened to (batch_size, nkerns[1] * 11 * 8)
    layer2_input = layer1.output.flatten(2)
    layer2 = HiddenLayer(
        rng,
        input=layer2_input,
        n_in=nkerns[1] * 11 * 8,
        n_out=2000,  # number of output neurons of the fully connected layer; set as you see fit
        activation=T.tanh
    )
    # the classifier: n_in equals the fully connected layer's output, n_out equals the 40 classes
    layer3 = LogisticRegression(input=layer2.output, n_in=2000, n_out=40)
    ###############
    # Define the components of the optimization: cost function; train, validation and test models;
    # and the parameter update rule (gradient descent)
    ###############
    # cost function
    cost = layer3.negative_log_likelihood(y)
    test_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: test_set_x[index * batch_size: (index + 1) * batch_size],
            y: test_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    validate_model = theano.function(
        [index],
        layer3.errors(y),
        givens={
            x: valid_set_x[index * batch_size: (index + 1) * batch_size],
            y: valid_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    # all parameters
    params = layer3.params + layer2.params + layer1.params + layer0.params
    # gradient of each parameter
    grads = T.grad(cost, params)
    # parameter update rule
    updates = [
        (param_i, param_i - learning_rate * grad_i)
        for param_i, grad_i in zip(params, grads)
    ]
    # train_model updates the parameters by minibatch SGD during training
    train_model = theano.function(
        [index],
        cost,
        updates=updates,
        givens={
            x: train_set_x[index * batch_size: (index + 1) * batch_size],
            y: train_set_y[index * batch_size: (index + 1) * batch_size]
        }
    )
    ###############
    # Training phase: search for the best parameters.
    ###############
    print('... training')
    # In LeNet5, batch_size=500, n_train_batches=50000/500=100 and patience=10000;
    # for olivettifaces, batch_size=40 and n_train_batches=320/40=8, so patience can be
    # set to about 800; tune it to your situation, and a larger value does no harm
    patience = 800
    patience_increase = 2
    improvement_threshold = 0.99
    validation_frequency = min(n_train_batches, patience / 2)
    best_validation_loss = np.inf
    best_iter = 0
    test_score = 0.
    start_time = time.clock()
    epoch = 0
    done_looping = False
    while (epoch < n_epochs) and (not done_looping):
        epoch = epoch + 1
        for minibatch_index in range(int(n_train_batches)):
            iter = (epoch - 1) * n_train_batches + minibatch_index
            if iter % 100 == 0:
                print('training @ iter = ', iter)
            cost_ij = train_model(minibatch_index)
            if (iter + 1) % validation_frequency == 0:
                # compute zero-one loss on validation set
                validation_losses = [validate_model(i) for i
                                     in range(int(n_valid_batches))]
                this_validation_loss = np.mean(validation_losses)
                print('epoch %i, minibatch %i/%i, validation error %f %%' %
                      (epoch, minibatch_index + 1, n_train_batches,
                       this_validation_loss * 100.))
                # if we got the best validation score until now
                if this_validation_loss < best_validation_loss:
                    # improve patience if loss improvement is good enough
                    if this_validation_loss < best_validation_loss * \
                            improvement_threshold:
                        patience = max(patience, iter * patience_increase)
                    # save best validation score and iteration number
                    best_validation_loss = this_validation_loss
                    best_iter = iter
                    # save the parameters
                    save_params(layer0.params, layer1.params, layer2.params, layer3.params)
                    # test it on the test set
                    test_losses = [
                        test_model(i)
                        for i in range(int(n_test_batches))
                    ]
                    test_score = np.mean(test_losses)
                    print(('     epoch %i, minibatch %i/%i, test error of '
                           'best model %f %%') %
                          (epoch, minibatch_index + 1, n_train_batches,
                           test_score * 100.))
            if patience <= iter:
                done_looping = True
                break
    end_time = time.clock()
    print('Optimization complete.')
    print('Best validation score of %f %% obtained at iteration %i, '
          'with test performance %f %%' %
          (best_validation_loss * 100., best_iter + 1, test_score * 100.))
    # "print >> sys.stderr" is Python 2 syntax; under Python 3 use the file argument
    print('The code for file ' +
          os.path.split(__file__)[1] +
          ' ran for %.2fm' % ((end_time - start_time) / 60.), file=sys.stderr)


if __name__ == '__main__':
    main()
Training produces a params.pkl file that stores the trained parameters.
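A detail the test program relies on: pickle.load returns objects in exactly the order they were dumped, which is how load_params recovers the layer0 through layer3 parameters from params.pkl. A self-contained sketch with stand-in string values instead of real weight arrays:

```python
import pickle
import io

# pickle.load returns objects in the same order they were dumped; stand-in
# values replace the real shared-variable weights here.
buf = io.BytesIO()
for params in (['W0', 'b0'], ['W1', 'b1'], ['W2', 'b2'], ['W3', 'b3']):
    pickle.dump(params, buf, -1)   # -1 selects the highest pickle protocol
buf.seek(0)
layer0, layer1, layer2, layer3 = (pickle.load(buf) for _ in range(4))
```

The real code uses a file opened in 'wb'/'rb' mode instead of the in-memory buffer.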
4.2: The test program
In the same directory, create a file use_CNN.py and write the following code:
# -*-coding:utf8-*-#
"""
What this program does:
train_CNN_olivettifaces.py trained the model and saved its parameters; here those saved parameters
initialize the CNN, yielding a usable CNN system: feed it a face image and it predicts the class.
@author:wepon(http://2hwp.com)
Article explaining this code: http://blog.csdn.net/u012162613/article/details/43277187
"""
import os
import sys
import pickle
import numpy
from PIL import Image
import theano
import theano.tensor as T
from theano.tensor.signal.pool import pool_2d
from theano.tensor.nnet import conv


# Read back the saved training parameters.
# layer0_params~layer3_params each contain W and b: layer*_params[0] is W, layer*_params[1] is b
def load_params(params_file):
    f = open(params_file, 'rb')
    layer0_params = pickle.load(f)
    layer1_params = pickle.load(f)
    layer2_params = pickle.load(f)
    layer3_params = pickle.load(f)
    f.close()
    return layer0_params, layer1_params, layer2_params, layer3_params


# Read the image; return the face data as numpy.array plus the corresponding labels
def load_data(dataset_path):
    img = Image.open(dataset_path)
    img_ndarray = numpy.asarray(img, dtype='float64') / 256
    faces = numpy.empty((400, 2679))
    for row in range(20):
        for column in range(20):
            faces[row * 20 + column] = numpy.ndarray.flatten(
                img_ndarray[row * 57:(row + 1) * 57, column * 47:(column + 1) * 47])
    label = numpy.empty(400)
    for i in range(40):
        label[i * 10:i * 10 + 10] = i
    label = label.astype(int)
    return faces, label
"""
train_CNN_olivettifaces中的LeNetConvPoolLayer、HiddenLayer、LogisticRegression是随机初始化的
下面将它们定义为可以用参数来初始化的版本
"""
class LogisticRegression(object):
def __init__(self, input, params_W, params_b, n_in, n_out):
self.W = params_W
self.b = params_b
self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
self.y_pred = T.argmax(self.p_y_given_x, axis=1)
self.params = [self.W, self.b]
def negative_log_likelihood(self, y):
return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
def errors(self, y):
if y.ndim != self.y_pred.ndim:
raise TypeError(
'y should have the same shape as self.y_pred',
('y', y.type, 'y_pred', self.y_pred.type)
)
if y.dtype.startswith('int'):
return T.mean(T.neq(self.y_pred, y))
else:
raise NotImplementedError()
class HiddenLayer(object):
def __init__(self, input, params_W, params_b, n_in, n_out,
activation=T.tanh):
self.input = input
self.W = params_W
self.b = params_b
lin_output = T.dot(input, self.W) + self.b
self.output = (
lin_output if activation is None
else activation(lin_output)
)
self.params = [self.W, self.b]
# Convolution + subsampling layer (conv + maxpooling)
class LeNetConvPoolLayer(object):
    def __init__(self, input, params_W, params_b, filter_shape, image_shape, poolsize=(2, 2)):
        assert image_shape[1] == filter_shape[1]
        self.input = input
        self.W = params_W
        self.b = params_b
        # convolution
        conv_out = conv.conv2d(
            input=input,
            filters=self.W,
            filter_shape=filter_shape,
            image_shape=image_shape
        )
        # subsampling
        pooled_out = pool_2d(
            input=conv_out,
            ws=poolsize,
            ignore_border=True
        )
        self.output = T.tanh(pooled_out + self.b.dimshuffle('x', 0, 'x', 'x'))
        self.params = [self.W, self.b]
"""
用之前保存下来的参数初始化CNN,就得到了一个训练好的CNN模型,然后使用这个模型来测图像
注意:n_kerns跟之前训练的模型要保持一致。dataset是你要测试的图像的路径,params_file是之前训练时保存的参数文件的路径
"""
def use_CNN(dataset='olivettifaces.gif', params_file='params.pkl', nkerns=[5, 10]):
# 读取测试的图像,这里读取整个olivettifaces.gif,即全部样本,得到faces、label
faces, label = load_data(dataset)
face_num = faces.shape[0] # 有多少张人脸图
# 读入参数
layer0_params, layer1_params, layer2_params, layer3_params = load_params(params_file)
x = T.matrix('x') # 用变量x表示输入的人脸数据,作为layer0的输入
######################
# 用读进来的参数初始化各层参数W、b
######################
layer0_input = x.reshape((face_num, 1, 57, 47))
layer0 = LeNetConvPoolLayer(
input=layer0_input,
params_W=layer0_params[0],
params_b=layer0_params[1],
image_shape=(face_num, 1, 57, 47),
filter_shape=(nkerns[0], 1, 5, 5),
poolsize=(2, 2)
)
layer1 = LeNetConvPoolLayer(
input=layer0.output,
params_W=layer1_params[0],
params_b=layer1_params[1],
image_shape=(face_num, nkerns[0], 26, 21),
filter_shape=(nkerns[1], nkerns[0], 5, 5),
poolsize=(2, 2)
)
layer2_input = layer1.output.flatten(2)
layer2 = HiddenLayer(
input=layer2_input,
params_W=layer2_params[0],
params_b=layer2_params[1],
n_in=nkerns[1] * 11 * 8,
n_out=2000,
activation=T.tanh
)
layer3 = LogisticRegression(input=layer2.output, params_W=layer3_params[0], params_b=layer3_params[1], n_in=2000,
n_out=40)
# 定义theano.function,让x作为输入,layer3.y_pred(即预测的类别)作为输出
f = theano.function(
[x], # funtion 的输入必须是list,即使只有一个输入
layer3.y_pred
)
# 预测的类别pred
pred = f(faces)
# 将预测的类别pred与真正类别label对比,输出错分的图像
for i in range(face_num):
if pred[i] != label[i]:
print('picture: %i is person %i, but mis-predicted as person %i' % (i, label[i], pred[i]))
if __name__ == '__main__':
use_CNN()
"""一点笔记,对theano.function的理解,不一定正确,后面深入理解了再回头看看
在theano里面,必须通过function定义输入x和输出,然后调用function,才会开始计算,比如在use_CNN里面,在定义layer0时,即使将faces作为输入,将layer1~layer3定义好后,也无法直接用layer3.y_pred来获得所属类别。
因为在theano中,layer0~layer3只是一种“图”关系,我们定义了layer0~layer3,也只是创建了这种图关系,但是如果没有funtion,它是不会计算的。
这也是为什么要定义x的原因:
x = T.matrix('x')
然后将变量x作为layer0的输入。
最后,定义一个function:
f = theano.function(
[x], #funtion 的输入必须是list,即使只有一个输入
layer3.y_pred
)
将x作为输入,layer3.y_pred作为输出。
当调用f(faces)时,就获得了预测值
"""
Run it and you will get the results; no further commentary here. If anything is wrong, please point it out.
---