Training ResNet50 on the CIFAR-10 dataset with TensorFlow + Keras + Estimator

Although I have been using TensorFlow all along, I had only done straightforward training with it and never got particularly deep into the software. A while back, Google engineer Wang Tiezhen introduced the conveniences of Estimator in a CSDN course, especially its power for distributed neural network training, and that finally convinced me it was something worth having. So I spent quite a long time working out how to use it, all the more since recent versions of Keras include support for converting a model into an Estimator, and I made some attempts of my own. Frankly, what makes TensorFlow hard on beginners is that its usage is so flexible that the code people write comes in endless varieties, which is nothing short of torture for a newcomer.

Normally I do everything directly in Keras, but the flashiest current workflow is to use TensorFlow's Dataset for data input, Keras for building the model, and Estimator for training. Keras can train by itself, model.fit() is all it takes, but Estimator is more flexible, particularly for distributed training, where you can add a few lines of code and distribute the job without changing the model or the training code. So I gave it a try, and here I share my humble code.
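To give a sense of those "few extra lines", here is a minimal sketch of distributing an Estimator over multiple GPUs with the TF 1.x API of that era; the MirroredStrategy call, num_gpus=2, and the tiny stand-in model are all illustrative assumptions, not part of my code below.

import tensorflow as tf

# Sketch only (TF 1.x era API): mirror training across 2 GPUs.
# MirroredStrategy and num_gpus=2 are assumptions for illustration.
strategy = tf.contrib.distribute.MirroredStrategy(num_gpus=2)
config = tf.estimator.RunConfig(train_distribute=strategy)

# Any compiled Keras model will do; a tiny stand-in model here:
demo_model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(32, 32, 3)),
    tf.keras.layers.Dense(10, activation='softmax'),
])
demo_model.compile(optimizer='adam', loss='categorical_crossentropy')

# The only change versus single-device training is the config argument.
estimator = tf.keras.estimator.model_to_estimator(
    keras_model=demo_model, config=config)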

Let me point out once more that my ResNet50 code does not use Keras's applications module; it is based on my homework code from Andrew Ng's CNN course. I feel this post may be of some help to beginners, and the homework code is easier to follow, so I hope the experts won't laugh at me. My thanks as well to Andrew Ng's team for their hard work. The code below was tested successfully on Google Colab.

# -*- coding: utf-8 -*-
"""Untitled0.ipynb

Automatically generated by Colaboratory.

Original file is located at
    https://colab.research.google.com/github/lingyuhaunti/tensorflow/blob/master/estimator.ipynb
"""
import numpy as np
import time
import tensorflow as tf
from tensorflow.python import keras
from tensorflow.python.keras import layers
from tensorflow.python.keras.layers import Input, Add, Dense, Activation, ZeroPadding2D, BatchNormalization, Flatten, Conv2D, AveragePooling2D, MaxPooling2D, GlobalMaxPooling2D
from tensorflow.python.keras.models import Model, load_model
from tensorflow.python.keras.preprocessing import image
from tensorflow.python.keras.utils import layer_utils
from tensorflow.python.keras.utils.data_utils import get_file
from tensorflow.python.keras.applications.imagenet_utils import preprocess_input
from tensorflow.python.keras.datasets import cifar10
import pydot
from IPython.display import SVG
from tensorflow.python.keras.utils.vis_utils import model_to_dot
from tensorflow.python.keras.utils import plot_model
from tensorflow.python.keras.initializers import glorot_uniform
import scipy.misc
from matplotlib.pyplot import imshow
# %matplotlib inline
from tensorflow.python.keras import backend as K
K.set_image_data_format('channels_last')
K.set_learning_phase(1)

def identity_block(X, f, filters, stage, block):
  # ResNet identity block: three convolutions whose output is added back
  # to the unchanged input (the shortcut) before the final ReLU.
  conv_name_base = 'res' + str(stage) + block + '_branch'
  bn_name_base = 'bn' + str(stage) + block + '_branch'
  
  F1, F2, F3 = filters
  
  X_shortcut = X
  
  X = Conv2D(filters = F1, kernel_size = (1, 1), strides = (1, 1), padding = 'valid', name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
  X = Activation('relu')(X)
  
  X = Conv2D(filters = F2, kernel_size = (f, f), strides = (1, 1), padding = 'same', name = conv_name_base+'2b', kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
  X = Activation('relu')(X)
  
  X = Conv2D(filters = F3, kernel_size = (1, 1), strides = (1, 1), padding = 'valid', name = conv_name_base+'2c', kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis = 3, name = bn_name_base+'2c')(X)
  
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)
  
  return X

def convolutional_block(X, f, filters, stage, block, s = 2):
  # ResNet convolutional block: like the identity block, except the shortcut
  # also passes through a strided 1x1 convolution so the shapes match.
  conv_name_base = 'res' + str(stage) + block +'_branch'
  bn_name_base = 'bn' + str(stage) + block + '_branch'
  
  F1, F2, F3 = filters
  
  X_shortcut = X
  
  X = Conv2D(F1, (1, 1), strides = (s, s), name = conv_name_base + '2a', kernel_initializer = glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis = 3, name = bn_name_base + '2a')(X)
  X = Activation('relu')(X)
  
  X = Conv2D(F2, (f, f), strides = (1, 1), name = conv_name_base + '2b', padding = 'same', kernel_initializer = glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis = 3, name = bn_name_base + '2b')(X)
  X = Activation('relu')(X)
  
  X = Conv2D(F3, (1, 1), strides = (1, 1), name = conv_name_base + '2c', padding = 'valid', kernel_initializer = glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis = 3, name = bn_name_base + '2c')(X)
  
  X_shortcut = Conv2D(F3, (1,1), strides=(s, s), name=conv_name_base+'1', padding='valid', kernel_initializer = glorot_uniform(seed=0))(X_shortcut)
  X_shortcut = BatchNormalization(axis = 3, name=bn_name_base+'1')(X_shortcut)
  
  X = Add()([X, X_shortcut])
  X = Activation('relu')(X)
  
  return X

def ResNet50(input_shape = (32, 32, 3), classes = 10):
  # Full ResNet50: conv1 stem, residual stages 2-5, then a softmax classifier.
  X_input = Input(input_shape)
  
  X = ZeroPadding2D((3, 3))(X_input)
  
  X = Conv2D(64, (7, 7), strides=(2, 2), name = 'conv1', kernel_initializer=glorot_uniform(seed=0))(X)
  X = BatchNormalization(axis = 3, name = 'bn_conv1')(X)
  X = Activation('relu')(X)
  X = MaxPooling2D((3, 3), strides=(2,2))(X)
  
  X = convolutional_block(X, f=3, filters=[64, 64, 256], stage=2, block ='a', s=1)
  X = identity_block(X, 3, [64, 64, 256], stage = 2, block = 'b')
  X = identity_block(X, 3, [64, 64, 256], stage = 2, block = 'c')
  
  X = convolutional_block(X, f=3, filters=[128, 128, 512], stage=3, block='a', s=2)
  X = identity_block(X, f=3, filters=[128, 128, 512], stage = 3, block = 'b')
  X = identity_block(X, f=3, filters=[128, 128, 512], stage = 3, block = 'c')
  X = identity_block(X, f=3, filters=[128, 128, 512], stage = 3, block = 'd')
  
  X = convolutional_block(X, f=3, filters=[256, 256, 1024], stage=4, block='a', s=2)
  X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block='b')
  X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block='c')
  X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block='d')
  X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block='e')
  X = identity_block(X, f=3, filters=[256, 256, 1024], stage=4, block='f')

  X = convolutional_block(X, f=3, filters=[512, 512, 2048], stage=5, block='a', s=2)
  X = identity_block(X, f=3, filters=[512, 512, 2048], stage=5, block='b')
  X = identity_block(X, f=3, filters=[512, 512, 2048], stage=5, block='c')
  
  X = Flatten()(X)
  X = Dense(classes, activation='softmax', name='fc'+str(classes), kernel_initializer = glorot_uniform(seed=0))(X)
  
  model = Model(inputs = X_input, outputs = X, name='ResNet50')
  
  return model

def input_fn():
  # Demo of a Dataset-based input_fn with random data; it is not used below,
  # where tf.estimator.inputs.numpy_input_fn feeds the real CIFAR-10 arrays.
  X = np.random.random((1, 32, 32, 3))
  Y = np.random.random((1, 10))
  dataset = tf.data.Dataset.from_tensor_slices((X, Y))
  dataset = dataset.repeat(10)
  dataset = dataset.batch(128)
  return dataset

model = ResNet50(input_shape = (32, 32, 3), classes = 10)

model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

(X_train_orig, Y_train_orig), (X_test_orig, Y_test_orig) = cifar10.load_data()

X_train = X_train_orig/255.
X_test = X_test_orig/255.

Y_train =  keras.utils.to_categorical(Y_train_orig,10)
Y_test =  keras.utils.to_categorical(Y_test_orig,10)

print ("number of training examples = " + str(X_train.shape[0]))
print ("number of test examples = " + str(X_test.shape[0]))
print ("X_train shape: " + str(X_train.shape))
print ("Y_train shape: " + str(Y_train.shape))
print ("X_test shape: " + str(X_test.shape))
print ("Y_test shape: " + str(Y_test.shape))

keras_estimator = tf.keras.estimator.model_to_estimator(keras_model = model)

train_input_fn = tf.estimator.inputs.numpy_input_fn(
  x={model.input_names[0]: X_train.astype(np.float32)},
  y=Y_train.astype(np.float32),
  num_epochs = 10,
  batch_size = 128,
  shuffle=True)

time_start = time.time()
keras_estimator.train(input_fn=train_input_fn, steps=390)
time_end = time.time()
print (time_end - time_start)

test_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={model.input_names[0]: X_test.astype(np.float32)},
    y=Y_test.astype(np.float32),
    num_epochs=1,
    shuffle=True)

eva = keras_estimator.evaluate(input_fn=test_input_fn)
print(eva)

In the code above, the model is built with Keras. identity_block and convolutional_block implement the two kinds of residual units in ResNet50, each made up of three convolutional layers. I generate the model with the Model function; if you are interested you can also try model = Sequential() and then stack layers onto it with add(), as sketched below, but I won't go into the details here.
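For reference, a minimal sketch of that Sequential style; note it only expresses purely linear stacks, so it cannot build ResNet's skip connections (those need the functional Model API used above), and the layer sizes here are arbitrary.

from tensorflow.python.keras.models import Sequential
from tensorflow.python.keras.layers import Conv2D, Flatten, Dense

# Sketch: a plain linear stack assembled with add(); layer choices arbitrary.
seq_model = Sequential()
seq_model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(32, 32, 3)))
seq_model.add(Flatten())
seq_model.add(Dense(10, activation='softmax'))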

The network I built takes 32*32*3 images, mainly because the CIFAR-10 training set is 50000*32*32*3, meaning 50,000 images of size 32*32*3, and what we need to do is feed those 50,000 images and their 50,000 labels into the network for training. When using Keras before, X_train and Y_train were passed separately, but an Estimator's input is a dataset, so the two have to be bundled together and fed in as one. Normally, merging matrices requires matching dimensions, and combining 50000*32*32*3 with 50000*10 that way is obviously hopeless, so instead we call the estimator module's tf.estimator.inputs.numpy_input_fn(), and this function takes care of reading both in together.
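The same bundling can also be done directly with the Dataset API mentioned at the start; here is a minimal sketch, reusing the X_train, Y_train and model defined in the code above (the shuffle buffer, repeat count and batch size are arbitrary choices):

def cifar_input_fn():
  # from_tensor_slices pairs each image with its label, so features and
  # labels travel through the pipeline together; keyed by the input name.
  dataset = tf.data.Dataset.from_tensor_slices(
      ({model.input_names[0]: X_train.astype(np.float32)},
       Y_train.astype(np.float32)))
  return dataset.shuffle(10000).repeat(10).batch(128)

# Would be used as: keras_estimator.train(input_fn=cifar_input_fn, steps=390)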

Once the data is ready, the model has to be loaded. Keras has a dedicated function for converting a model into an Estimator, but note that the imports at the top must be written as

import tensorflow as tf
from tensorflow import keras   # import keras through tensorflow, not standalone

If you instead write import keras directly, you will get an error along the lines of error 'Model', not in_graph.

Once the model is loaded, training comes next. When the dataset is created you need to specify batch_size and num_epochs, whose meanings are clear enough: the former is how much data is read in at a time, the latter how many times the dataset is traversed.

Finally, with Estimator there is no need to save the model by hand; it automatically saves checkpoints to a folder. You can specify the path, or leave it unspecified, in which case the system defaults to a folder under tmp. As for steps, it tells train() how many batches of batch_size examples to process in total; with 50,000 training images and batch_size=128, the 390 steps above amount to roughly one pass over the data. During training, the loss value is printed automatically.
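If you want the checkpoints somewhere predictable, model_to_estimator accepts a model_dir; the path below is a made-up example, and the steps arithmetic just restates the numbers above:

# Hypothetical checkpoint directory; any writable path would work.
keras_estimator = tf.keras.estimator.model_to_estimator(
    keras_model=model, model_dir='/tmp/resnet50_cifar10')

steps_per_epoch = 50000 // 128          # = 390 full batches per epoch
keras_estimator.train(input_fn=train_input_fn, steps=steps_per_epoch)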

global_step/sec is a rather curious quantity; at first I had no idea what it meant, but after a long time digging around it turned out to be the batch-processing rate, i.e. how many batches are processed per second.

That is my experimental code; on Colab you can watch the loss value go down. More updates to come……
