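Binary Image Classification with a Convolutional Neural Network: Model Training and Prediction, a Complete Walkthrough (1)
Preface
(1) There is plenty of material on deep learning with convolutional neural networks, but a real project involves data preprocessing, model construction, model invocation, and more. I am a beginner myself, and many of the problems I hit along the way took a lot of searching to solve. I try to write out the key steps in detail here, as a reference for beginners like me. The workflow for more than two classes is essentially the same.
(2) I do not cover the basic theory of deep learning. Everything here starts from hands-on practice, so that you can get something working.
(3) The main topics are: data preprocessing (including creating TFRecord files, reading them, and feeding them to the network), building the model and setting its parameters, and using the trained model.
(4) The code below was written on Windows 10 with an RTX 3090 GPU and a tensorflow-gpu environment. It also runs on a CPU if no GPU is available. The Python version is 3.8.
(5) I am a non-professional beginner, so there are surely pitfalls left in this code. Corrections from more experienced readers are welcome and will help me improve.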
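I. Goal: how do we perform binary classification of images?
(1) We have a pile of images that only need to be sorted into two classes, one labeled dis and the other labeled undis. The images are JPEG files named dis_***.jpg or undis_***.jpg, so splitting the file name on the underscore tells us which class an image belongs to.
(2) The data is stored in two folders: train_da holds the images used for model training, and test_da holds the images used to test the trained model (it can also be regarded as the real data that needs classifying).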
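The naming rule above means the label can be recovered with a single split on the underscore. A minimal sketch, assuming file names follow the dis_***.jpg / undis_***.jpg pattern exactly; the helper name label_from_filename and the sample file names are made up for illustration:

```python
import os
from collections import Counter

# Same mapping the article uses later when building the TFRecord files.
LABEL_TO_INDEX = {'dis': 0, 'undis': 1}


def label_from_filename(path):
    """Return the class index encoded in the file name prefix (hypothetical helper)."""
    prefix = os.path.basename(path).split('_')[0]
    if prefix not in LABEL_TO_INDEX:
        raise ValueError('unexpected file name: {}'.format(path))
    return LABEL_TO_INDEX[prefix]


# Illustrative file names only; real data lives in train_da / test_da.
sample_paths = ['train_da/dis_001.jpg', 'train_da/undis_001.jpg', 'train_da/undis_002.jpg']
labels = [label_from_filename(p) for p in sample_paths]
class_counts = Counter(labels)
print(labels, dict(class_counts))
```

Raising on an unknown prefix is a design choice of this sketch: a silently missing label would otherwise surface much later, during one-hot encoding.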
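II. Main content
1. Data preprocessing
To improve read/write performance, we store the data in TFRecord format: one TFRecord file for the images themselves and one for the corresponding labels.
The functions below create and read the TFRecord data: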
# Creating and reading TFRecord files for the sample data
import math
import os
import random
import time

import matplotlib.pyplot as plt
import tensorflow as tf
import xlwt

def preprocess_image(_image):
    '''Decode a JPEG image and apply a simple normalization.
    The resize target must match the input size of the network defined later.'''
    _image = tf.image.decode_jpeg(_image, channels=3)
    _image = tf.image.resize(_image, [256, 256])
    _image /= 255.0  # normalize to [0,1] range
    return _image

def load_and_preprocess_image(_path):
    '''Load a JPEG image from disk and preprocess it.'''
    _image = tf.io.read_file(_path)
    _image = preprocess_image(_image)
    return _image

def img_parse(x):
    '''Parse one serialized image tensor when reading the image TFRecord.'''
    result = tf.io.parse_tensor(x, out_type=tf.float32)
    result = tf.reshape(result, [256, 256, 3])
    return result

def label_parse(y):
    '''Parse one serialized label tensor when reading the label TFRecord.'''
    result = tf.io.parse_tensor(y, out_type=tf.int32)
    result = tf.reshape(result, [2])
    return result
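def tfrecord_data_create(_data_path, _img_tfrecord_abspath, _label_tfrecord_abspath):
    '''Create the TFRecord files for the training samples.
    _data_path: absolute path of the folder holding the JPEG images
    _img_tfrecord_abspath: absolute path where the image TFRecord is saved
    _label_tfrecord_abspath: absolute path where the label TFRecord is saved'''
    # get_all_file_abspath_with_specific_format_under_one_folder is the author's own helper
    # (not shown in this article); it is expected to return a list of absolute paths matching the pattern.
    all_imgs_path = get_all_file_abspath_with_specific_format_under_one_folder(_data_path, '*.jpg')
    random.shuffle(all_imgs_path)
    label2index_dict = {'dis': 0, 'undis': 1}
    # The class name is the part of the file name before the first underscore
    all_labels = [label2index_dict.get(os.path.basename(a_img).split('_')[0]) for a_img in all_imgs_path]
    all_labels = tf.one_hot(indices=all_labels, depth=2, on_value=1, off_value=0, axis=-1)
    # Images: path -> decoded float tensor -> serialized bytes -> TFRecord
    ds_image = tf.data.Dataset.from_tensor_slices(all_imgs_path)
    ds_image = ds_image.map(load_and_preprocess_image)
    ds_image = ds_image.map(tf.io.serialize_tensor)
    ds_image_record = tf.data.experimental.TFRecordWriter(_img_tfrecord_abspath)
    ds_image_record.write(ds_image)
    # Labels: one-hot int32 tensor -> serialized bytes -> TFRecord, in the same order as the images
    ds_label = tf.data.Dataset.from_tensor_slices(tf.cast(all_labels, tf.int32))
    ds_label = ds_label.map(tf.io.serialize_tensor)
    ds_label_record = tf.data.experimental.TFRecordWriter(_label_tfrecord_abspath)
    ds_label_record.write(ds_label)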
def tfrecord_data_read_decode(_img_tfrecord_abspath, _label_tfrecord_abspath, _ratio=0.2):
    '''Read and decode the TFRecord files of the training samples.
    _img_tfrecord_abspath: absolute path of the image TFRecord
    _label_tfrecord_abspath: absolute path of the label TFRecord
    _ratio: fraction of the samples used for validation, 20% by default'''
    AUTOTUNE = tf.data.experimental.AUTOTUNE
    img_ds = tf.data.TFRecordDataset(_img_tfrecord_abspath)
    img_length = sum(1 for _ in img_ds)  # count the records by iterating once
    img_ds = img_ds.map(img_parse, num_parallel_calls=AUTOTUNE)
    label_ds = tf.data.TFRecordDataset(_label_tfrecord_abspath)
    label_length = sum(1 for _ in label_ds)
    label_ds = label_ds.map(label_parse, num_parallel_calls=AUTOTUNE)
    if img_length != label_length:
        # Raise instead of printing, so the caller never receives None
        raise ValueError('Fatal error: the image and label TFRecord files contain different numbers of records!')
    img_label_ds = tf.data.Dataset.zip((img_ds, label_ds))
    # Split into training and validation data; take/skip guarantees the two parts do not overlap
    verify_ds_num = int(img_length*_ratio)
    train_ds_num = img_length - verify_ds_num
    print('Total samples: {}, training samples: {}, validation samples: {}.'.format(img_length, train_ds_num, verify_ds_num))
    train_ds = img_label_ds.skip(verify_ds_num)
    verify_ds = img_label_ds.take(verify_ds_num)
    # Count the samples per class; my project needs this, remove it if yours does not
    dis_num = 0
    undis_num = 0
    for label in label_ds.take(img_length):
        dis_value = label.numpy()[0]  # position 0 of the one-hot label is the dis class
        if dis_value == 1:
            dis_num += 1
        else:
            undis_num += 1
    return train_ds, train_ds_num, verify_ds, verify_ds_num, dis_num, undis_num
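# The function returns train_ds, train_ds_num, verify_ds, verify_ds_num, dis_num, undis_num so that the result plugs directly into the training function below.
# The six return values mean: train_ds: the training data
# train_ds_num: the number of training samples
# verify_ds: the validation data
# verify_ds_num: the number of validation samples
# dis_num and undis_num: the number of samples in each of the two classes (training and validation data together)
# Some readers may ask why the JPEG images and the labels are not written into a single TFRecord file, which would be more convenient. I agree it would be; I thought of it but did not manage to make it work. Advice is welcome.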
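On the question of storing image and label together: one common approach is to wrap both in a tf.train.Example, so that each record carries the encoded JPEG bytes and an integer label. The sketch below is not the code used in this article. The feature keys 'image' and 'label', the helper names make_example and parse_example, and the tiny synthetic image are all assumptions for illustration. It stores the raw JPEG bytes rather than the decoded float tensor and decodes them at read time.

```python
import os
import tempfile

import tensorflow as tf


def make_example(jpeg_bytes, label_index):
    """Pack one encoded JPEG and its integer label into a tf.train.Example."""
    feature = {
        'image': tf.train.Feature(bytes_list=tf.train.BytesList(value=[jpeg_bytes])),
        'label': tf.train.Feature(int64_list=tf.train.Int64List(value=[label_index])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))


def parse_example(serialized):
    """Decode one record back into (image, one-hot label)."""
    spec = {
        'image': tf.io.FixedLenFeature([], tf.string),
        'label': tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.image.decode_jpeg(parsed['image'], channels=3)
    image = tf.image.resize(image, [256, 256]) / 255.0
    label = tf.one_hot(parsed['label'], depth=2, dtype=tf.int32)
    return image, label


# Synthetic stand-in for the bytes of a real JPEG file (tf.io.read_file would supply these).
fake_jpeg = tf.io.encode_jpeg(tf.zeros([8, 8, 3], dtype=tf.uint8)).numpy()

record_path = os.path.join(tempfile.mkdtemp(), 'combined.tfrecord')
with tf.io.TFRecordWriter(record_path) as writer:
    for label_index in (0, 1):
        writer.write(make_example(fake_jpeg, label_index).SerializeToString())

# Reading yields (image, label) pairs directly, so no Dataset.zip is needed.
combined_ds = tf.data.TFRecordDataset(record_path).map(parse_example)
decoded = [(img.shape.as_list(), lbl.numpy().tolist()) for img, lbl in combined_ds]
print(decoded)
```

With this layout the image and its label can never fall out of step, which removes the need for the length check between two separate files.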
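2. Building the model, feeding the data, training, and saving
(1) I put model construction, data feeding, training, and saving into a single function so that it is easy to call.
(2) There are two functions: model_train_save trains a model for the first time, and model_retrain_save continues training from the checkpoint saved by an earlier run.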
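def model_train_save(train_result_save_folder, checkpoint_file_folder, tensorboard_file_folder, save_file_name,
                     train_data, train_data_number, verify_data, verify_data_number, dis_num, undis_num,
                     private_optimizers='adam', batch_size=16, lr=5, loops=5, isDraw=1):
    '''Train a model for the first time.
    train_result_save_folder: absolute path of the folder for the training results: (1) the trained model (.h5), (2) a plot of the training/validation accuracy and loss curves, (3) the key training parameters with the accuracy and loss values. See the second image.
    checkpoint_file_folder: absolute path of the folder for the checkpoint file (.ckpt), which keeps the best result seen during training
    tensorboard_file_folder: absolute path of the folder for the TensorBoard files, for later inspection
    save_file_name: common base name for every saved file: the three files in train_result_save_folder, the checkpoint in checkpoint_file_folder, and the TensorBoard files in tensorboard_file_folder. They share one name and differ only in extension.
    train_data: this and the next five parameters are described at the end of the "Data preprocessing" section
    train_data_number
    verify_data
    verify_data_number
    dis_num
    undis_num
    private_optimizers='adam': the optimizer; adam is the default and a common general-purpose choice
    batch_size=16: batch size
    lr=5: initial learning rate, given as an exponent; 5 means 10 to the power of -5
    loops=5: number of training epochs
    isDraw=1: whether to plot the training/validation accuracy and loss curves; plotting is on by default
    '''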
    # Read the data in batches
    AUTOTUNE = tf.data.experimental.AUTOTUNE
    # cache speeds up data access; here the cache is written to a file on disk
    train_data = train_data.cache(r'D:\Train_Cache_Temporary_Data\Train_Cache_Temporary_Data')
    # Shuffle the training data
    train_data = train_data.shuffle(buffer_size=train_data_number)
    # Repeat; with no argument the data repeats indefinitely, so a batch is always available
    train_data = train_data.repeat()
    # Batch and prefetch (one prefetch at the end of the pipeline is enough)
    train_data = train_data.batch(batch_size).prefetch(buffer_size=AUTOTUNE)
    # The validation data only needs batching, with no shuffling or repeating
    verify_data = verify_data.batch(batch_size)
    # Build the model, either by transfer learning or from scratch. ResNet50 is the transfer-learning example here; other pretrained models work the same way
    resnet_base_structure = tf.keras.applications.resnet50.ResNet50(input_shape=(256, 256, 3), weights='imagenet', include_top=False)
    # include_top=False drops ResNet50's fully connected and classification layers, so we can add our own
    resnet_base_structure.trainable = True
    # For transfer learning, the model is built as follows:
    transf_model = tf.keras.Sequential([
        resnet_base_structure,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1024, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(2, 'softmax')])
    # For a network built from scratch, the model is as follows. The number of conv layers, kernel size, stride, padding, and activation all use the most common settings; tune them based on your experiments
    private_model = tf.keras.Sequential([
        tf.keras.layers.Conv2D(filters=64, kernel_size=3, strides=1, padding='same', input_shape=(256, 256, 3), activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters=64, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters=128, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        # tf.keras.layers.Conv2D(filters=256, kernel_size=3, strides=1, padding='same', activation='relu'),
        # tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(filters=512, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters=512, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        # tf.keras.layers.Conv2D(filters=512, kernel_size=3, strides=1, padding='same', activation='relu'),
        # tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(filters=512, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        tf.keras.layers.Conv2D(filters=512, kernel_size=3, strides=1, padding='same', activation='relu'),
        tf.keras.layers.BatchNormalization(),
        # tf.keras.layers.Conv2D(filters=512, kernel_size=3, strides=1, padding='same', activation='relu'),
        # tf.keras.layers.BatchNormalization(),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(1024, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(512, activation='relu'),
        tf.keras.layers.Dropout(0.5),
        tf.keras.layers.Dense(2, 'softmax')])
    # Compute the initial learning rate from its exponent
    lr_value = float(math.pow(10, -1*lr))
    # Select the optimizer
    if private_optimizers == 'sgd':
        private_optimizers_result = tf.keras.optimizers.SGD(learning_rate=lr_value, momentum=0.9)
    elif private_optimizers == 'adagrad':
        private_optimizers_result = tf.keras.optimizers.Adagrad(learning_rate=lr_value)
    elif private_optimizers == 'adadelta':
        private_optimizers_result = tf.keras.optimizers.Adadelta(learning_rate=lr_value)
    elif private_optimizers == 'rmsprop':
        private_optimizers_result = tf.keras.optimizers.RMSprop(learning_rate=lr_value)
    elif private_optimizers == 'adam':
        private_optimizers_result = tf.keras.optimizers.Adam(learning_rate=lr_value)
    elif private_optimizers == 'adamax':
        private_optimizers_result = tf.keras.optimizers.Adamax(learning_rate=lr_value)
    else:
        # Without this branch an unknown name would fail later with a NameError
        raise ValueError('Unsupported optimizer: {}'.format(private_optimizers))
    print('______Learning rate: {}, optimizer: {}.'.format(lr_value, private_optimizers))
    # Compile the model. This is the transfer-learning case; for the network built from scratch, call private_model.compile instead
    transf_model.compile(optimizer=private_optimizers_result, loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['acc'])
    # Compute the number of training and validation steps per epoch
    steps_train = train_data_number//batch_size
    steps_verify = verify_data_number//batch_size
    # Build the base name for all saved files; optimizer, learning rate, and batch size go into the name so they are visible at a glance
    save_name = 'Trained_model_' + save_file_name + '_Op=' + private_optimizers
    save_name = save_name + "_Lr=" + str(lr) + '_Bs=' + str(batch_size)
    # Checkpoint callback: keeps the model with the lowest validation loss
    checkpoint_file_name = save_name + '.ckpt'
    checkpoint_file_abspath = os.path.join(checkpoint_file_folder, checkpoint_file_name)
    cp_callback_function = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_file_abspath, monitor='val_loss', save_best_only=True, save_weights_only=False)
    # TensorBoard callback
    tensorboard_file_abspath = os.path.join(tensorboard_file_folder, save_name)
    tensorboard_callback_function = tf.keras.callbacks.TensorBoard(log_dir=tensorboard_file_abspath, histogram_freq=1)
    # Learning-rate decay callback.
    # U_lr_decay_ratio and U_learn_rate_reduce_patience_epochs are module-level constants that must be defined elsewhere (not shown in this article)
    lr_decay_ratio = U_lr_decay_ratio
    lr_reduce_callback_cunction = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=lr_decay_ratio, patience=U_learn_rate_reduce_patience_epochs, mode='auto', verbose=1)
    # Early-stopping callback
    early_stopping_function = tf.keras.callbacks.EarlyStopping(monitor='val_loss', min_delta=0.00001, patience=10, verbose=1, mode='auto', baseline=None, restore_best_weights=False)
    # Start training, again using the transfer-learning model as the example
    if dis_num <= undis_num:
        # The weights compensate for class imbalance. A large gap between the two class counts hurts training badly, and class weights reduce that effect
        weights = {0: round(undis_num/dis_num), 1: 1}
    else:
        weights = {0: 1, 1: round(dis_num/undis_num)}
    start_time = time.time()
    history = transf_model.fit(train_data, epochs=loops, steps_per_epoch=steps_train,
                               validation_data=verify_data, validation_steps=steps_verify,
                               class_weight=weights,
                               callbacks=[cp_callback_function, tensorboard_callback_function, lr_reduce_callback_cunction, early_stopping_function])
    print('Training took {}s.'.format(round((time.time() - start_time), 2)))
    save_path = os.path.join(train_result_save_folder, (save_name + '.png'))
    if isDraw == 1:
        draw_history(history.epoch,
                     history.history.get('acc'),
                     history.history.get('loss'),
                     history.history['val_acc'],
                     history.history['val_loss'],
                     savepath=save_path)
    # Evaluate the loss and accuracy of the trained model (the original called an undefined name, model)
    evaluation_loss, evaluation_accuracy = transf_model.evaluate(verify_data, verbose=1)
    # save_model_parameter exists only so this block can be folded in the editor
    save_model_parameter = True
    if save_model_parameter:
        # Save the model
        model_save_path = os.path.join(train_result_save_folder, (save_name + '.h5'))
        transf_model.save(model_save_path)
        # Save the training parameters and results to an Excel file
        xls_file_path = os.path.join(train_result_save_folder, (save_name + '.xls'))
        _file = xlwt.Workbook(encoding='utf-8')
        _sheet = _file.add_sheet('TrainedResult', cell_overwrite_ok=True)
        _sheet.write(0, 0, 'Train Data Number')
        _sheet.write(0, 1, train_data_number)
        _sheet.write(0, 2, 'Verify Data Number')
        _sheet.write(0, 3, verify_data_number)
        _sheet.write(1, 0, 'Epochs')
        _sheet.write(1, 1, loops)
        _sheet.write(1, 2, 'Shuffle Number')
        _sheet.write(1, 3, train_data_number)
        _sheet.write(2, 0, 'Batch Size')
        _sheet.write(2, 1, batch_size)
        _sheet.write(3, 0, 'Initial learn Rate')
        _sheet.write(3, 1, lr_value)
        _sheet.write(3, 2, 'Learning rate decay ratio')
        _sheet.write(3, 3, lr_decay_ratio)
        _sheet.write(4, 0, 'Model Eval_loss')
        _sheet.write(4, 1, evaluation_loss)
        _sheet.write(4, 2, 'Model Eval_accuracy')
        _sheet.write(4, 3, evaluation_accuracy)
        _sheet.write(5, 0, 'Epoch')
        _sheet.write(5, 1, 'Train Accuracy')
        _sheet.write(5, 2, 'Train Loss')
        _sheet.write(5, 3, 'Verify Accuracy')
        _sheet.write(5, 4, 'Verify Loss')
        _sheet.write(5, 5, 'Learn rate')
        # Early stopping may end training before `loops` epochs, so iterate over the epochs actually run
        for i_xls in range(len(history.epoch)):
            _sheet.write(i_xls + 6, 0, i_xls + 1)
            _sheet.write(i_xls + 6, 1, history.history.get('acc')[i_xls])
            _sheet.write(i_xls + 6, 2, history.history['loss'][i_xls])
            _sheet.write(i_xls + 6, 3, history.history.get('val_acc')[i_xls])
            _sheet.write(i_xls + 6, 4, history.history['val_loss'][i_xls])
            # The 'lr' entry is logged by the ReduceLROnPlateau callback
            _sheet.write(i_xls + 6, 5, float(history.history.get('lr')[i_xls]))
        _file.save(xls_file_path)
    return evaluation_loss, evaluation_accuracy
# At this point model construction, data feeding, and model training are all complete.
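Several small calculations inside model_train_save are easy to check by hand: the learning rate derived from its exponent, the steps per epoch, and the class weights. The sketch below reproduces them in plain Python. The function name training_arithmetic and the sample counts (800 training images, 200 validation images, 150 dis versus 850 undis) are invented for illustration.

```python
import math


def training_arithmetic(train_n, verify_n, dis_num, undis_num, batch_size=16, lr=5):
    """Recompute the derived values that model_train_save prepares before fit()."""
    lr_value = float(math.pow(10, -1 * lr))      # lr is an exponent: 5 means 10 ** -5
    steps_train = train_n // batch_size          # full batches per training epoch
    steps_verify = verify_n // batch_size        # full batches per validation pass
    # The rarer class gets a weight equal to the rounded imbalance ratio.
    if dis_num <= undis_num:
        weights = {0: round(undis_num / dis_num), 1: 1}
    else:
        weights = {0: 1, 1: round(dis_num / undis_num)}
    return lr_value, steps_train, steps_verify, weights


# Invented sample counts, for illustration only.
lr_value, steps_train, steps_verify, weights = training_arithmetic(800, 200, 150, 850)
print(lr_value, steps_train, steps_verify, weights)
```

Note that floor division drops the last partial batch: with 200 validation images and a batch size of 16, validation_steps is 12, so 8 images are left out of the per-epoch validation metrics.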
# The code above also uses one plotting function, draw_history, defined as follows:
def draw_history(epochs, train_acc, train_loss, validation_acc, validation_loss, savepath=None):
    '''Plot how accuracy and loss change during training.'''
    plt.figure(figsize=(12, 4))
    # Left panel: accuracy
    plt.subplot(1, 2, 1)
    plt.plot(epochs, train_acc, label='Train process')
    plt.plot(epochs, validation_acc, label='Validation process')
    plt.title('Train History of Accuracy for train and validation process')
    plt.ylabel('Accuracy')
    plt.xlabel('Epoch')
    plt.legend()
    # Right panel: loss
    plt.subplot(1, 2, 2)
    plt.plot(epochs, train_loss, label='Train process')
    plt.plot(epochs, validation_loss, label='Validation process')
    plt.title('Train History of Loss for train and validation process')
    plt.ylabel('Loss')
    plt.xlabel('Epoch')
    plt.legend()
    # plt.show()
    if savepath:
        plt.savefig(savepath)
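The second image (the contents of the results folder) goes here:
The Excel file looks like this when opened:
The PNG plot looks like this: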
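If xlwt is not available, the per-epoch table can also be written with Python's built-in csv module. This is an alternative to the .xls export above, not the method used in this article. The dictionary below is a hand-made stand-in for history.history with invented numbers, and the function name history_to_csv is hypothetical. Iterating over the recorded epochs, rather than the requested epoch count, keeps the export safe when early stopping ends training early.

```python
import csv
import io

# Hand-made stand-in for history.history; values are invented for illustration.
fake_history = {
    'acc': [0.71, 0.80, 0.86],
    'loss': [0.62, 0.45, 0.33],
    'val_acc': [0.69, 0.77, 0.81],
    'val_loss': [0.64, 0.50, 0.41],
}


def history_to_csv(history_dict, out_file):
    """Write one row per completed epoch to an open text file."""
    writer = csv.writer(out_file)
    writer.writerow(['Epoch', 'Train Accuracy', 'Train Loss', 'Verify Accuracy', 'Verify Loss'])
    # Use the number of recorded epochs, which may be fewer than requested.
    for i in range(len(history_dict['loss'])):
        writer.writerow([i + 1, history_dict['acc'][i], history_dict['loss'][i],
                         history_dict['val_acc'][i], history_dict['val_loss'][i]])


buffer = io.StringIO()  # swap for open(path, 'w', newline='') to write a real file
history_to_csv(fake_history, buffer)
csv_rows = buffer.getvalue().splitlines()
print(csv_rows)
```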
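To continue training from the checkpoint saved by an earlier run, the complete function for loading the model, feeding the data, and training is as follows: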
def model_retrain_save(train_result_save_folder, trained_model_path,
                       checkpoint_file_folder, tensorboard_file_folder, retrained_save_file_name,
                       train_data, train_data_number, verify_data, verify_data_number, dis_num, undis_num,
                       private_optimizers='adam', batch_size=16, lr=5, loops=5, isDraw=1):
    '''
    All parameters have exactly the same meaning as in model_train_save.
    The only new one is trained_model_path, the absolute path of the checkpoint file saved by the previous training run.
    '''
    # Read the data in batches. Cache before shuffle/repeat: caching after repeat() would try to cache an endless dataset.
    # The original used an undefined shuffle_number; the training sample count is used as the shuffle buffer size instead
    AUTOTUNE = tf.data.experimental.AUTOTUNE
    train_data = train_data.cache()
    train_data = train_data.shuffle(buffer_size=train_data_number).repeat().batch(batch_size)
    train_data = train_data.prefetch(buffer_size=AUTOTUNE)
    verify_data = verify_data.batch(batch_size)
    # Load the previously saved model
    trained_model = tf.keras.models.load_model(trained_model_path)
    steps_train = train_data_number//batch_size
    steps_verify = verify_data_number//batch_size
    lr_value = float(math.pow(10, -1*lr))
    if private_optimizers == 'sgd':
        private_optimizers_result = tf.keras.optimizers.SGD(learning_rate=lr_value, momentum=0.9)
    elif private_optimizers == 'adagrad':
        private_optimizers_result = tf.keras.optimizers.Adagrad(learning_rate=lr_value)
    elif private_optimizers == 'adadelta':
        private_optimizers_result = tf.keras.optimizers.Adadelta(learning_rate=lr_value)
    elif private_optimizers == 'rmsprop':
        private_optimizers_result = tf.keras.optimizers.RMSprop(learning_rate=lr_value)
    elif private_optimizers == 'adam':
        private_optimizers_result = tf.keras.optimizers.Adam(learning_rate=lr_value)
    elif private_optimizers == 'adamax':
        private_optimizers_result = tf.keras.optimizers.Adamax(learning_rate=lr_value)
    else:
        raise ValueError('Unsupported optimizer: {}'.format(private_optimizers))
    trained_model.compile(optimizer=private_optimizers_result, loss=tf.keras.losses.CategoricalCrossentropy(), metrics=['acc'])
    # set the name for saved files
    save_name = 'Retrained_model_' + retrained_save_file_name + '_Op=' + private_optimizers
    save_name = save_name + "_Lr=" + str(lr) + '_Bs=' + str(batch_size)
    # set the checkpoint parameters
    checkpoint_file_name = save_name + '.ckpt'
    checkpoint_file_abspath = os.path.join(checkpoint_file_folder, checkpoint_file_name)
    cp_callback_function = tf.keras.callbacks.ModelCheckpoint(filepath=checkpoint_file_abspath, monitor='val_loss',
                                                              save_best_only=True, save_weights_only=False)
    # set the tensorboard parameter
    tensorboard_file_abspath = os.path.join(tensorboard_file_folder, save_name)
    tensorboard_callback_function = tf.keras.callbacks.TensorBoard(log_dir=tensorboard_file_abspath, histogram_freq=1)
    # set the Learning rate decay callback function
    # (U_lr_decay_ratio and U_learn_rate_reduce_patience_epochs are module-level constants defined elsewhere, not shown in this article)
    lr_decay_ratio = U_lr_decay_ratio
    lr_reduce_callback_cunction = tf.keras.callbacks.ReduceLROnPlateau(monitor='val_loss', factor=lr_decay_ratio, patience=U_learn_rate_reduce_patience_epochs, mode='auto', verbose=1)
    if dis_num <= undis_num:
        weights = {0: round(undis_num/dis_num), 1: 1}
    else:
        weights = {0: 1, 1: round(dis_num/undis_num)}
    start_time = time.time()
    history = trained_model.fit(train_data, epochs=loops, steps_per_epoch=steps_train, validation_data=verify_data, validation_steps=steps_verify,
                                class_weight=weights, callbacks=[cp_callback_function, tensorboard_callback_function, lr_reduce_callback_cunction])
    print('Training took {}s.'.format(round((time.time() - start_time), 2)))
    save_path = os.path.join(train_result_save_folder, (save_name + '.png'))
    if isDraw == 1:
        draw_history(history.epoch, history.history.get('acc'), history.history.get('loss'), history.history['val_acc'], history.history['val_loss'], savepath=save_path)
    evaluation_loss, evaluation_accuracy = trained_model.evaluate(verify_data, verbose=1)
    # save_model_parameter exists only so this block can be folded in the editor
    save_model_parameter = True
    if save_model_parameter:
        # Save the model
        model_save_path = os.path.join(train_result_save_folder, (save_name + '.h5'))
        trained_model.save(model_save_path)
        # Save the training parameters and results to an Excel file
        xls_file_path = os.path.join(train_result_save_folder, (save_name + '.xls'))
        _file = xlwt.Workbook(encoding='utf-8')
        _sheet = _file.add_sheet('TrainedResult', cell_overwrite_ok=True)
        _sheet.write(0, 0, 'Train Data Number')
        _sheet.write(0, 1, train_data_number)
        _sheet.write(0, 2, 'Verify Data Number')
        _sheet.write(0, 3, verify_data_number)
        _sheet.write(1, 0, 'Epochs')
        _sheet.write(1, 1, loops)
        _sheet.write(1, 2, 'Shuffle Number')
        # The original wrote an undefined shuffle_number; the shuffle buffer equals the training sample count
        _sheet.write(1, 3, train_data_number)
        _sheet.write(2, 0, 'Batch Size')
        _sheet.write(2, 1, batch_size)
        _sheet.write(3, 0, 'Initial learn Rate')
        _sheet.write(3, 1, lr_value)
        _sheet.write(3, 2, 'Learning rate decay ratio')
        _sheet.write(3, 3, lr_decay_ratio)
        _sheet.write(4, 0, 'Model Eval_loss')
        _sheet.write(4, 1, evaluation_loss)
        _sheet.write(4, 2, 'Model Eval_accuracy')
        _sheet.write(4, 3, evaluation_accuracy)
        _sheet.write(5, 0, 'Epoch')
        _sheet.write(5, 1, 'Train Accuracy')
        _sheet.write(5, 2, 'Train Loss')
        _sheet.write(5, 3, 'Verify Accuracy')
        _sheet.write(5, 4, 'Verify Loss')
        _sheet.write(5, 5, 'Learn rate')
        # Iterate over the epochs actually recorded in the history
        for i_xls in range(len(history.epoch)):
            _sheet.write(i_xls + 6, 0, i_xls + 1)
            _sheet.write(i_xls + 6, 1, history.history.get('acc')[i_xls])
            _sheet.write(i_xls + 6, 2, history.history['loss'][i_xls])
            _sheet.write(i_xls + 6, 3, history.history.get('val_acc')[i_xls])
            _sheet.write(i_xls + 6, 4, history.history['val_loss'][i_xls])
            _sheet.write(i_xls + 6, 5, float(history.history.get('lr')[i_xls]))
        _file.save(xls_file_path)
    return evaluation_loss, evaluation_accuracy
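3. Using the functions above in practice
Suppose all the training data is stored in the traind_ds folder. We (1) create the TFRecord data, (2) read the TFRecord data, and (3) feed the data to the network and train the model.
traind_ds = r''  # absolute path of the training data folder
(1) Create the TFRecord data
# Absolute paths of the TFRecord files for the JPEG images and for their labels
img_tfrecord_abspath = r''
label_tfrecord_abspath = r''
tfrecord_data_create(traind_ds, img_tfrecord_abspath, label_tfrecord_abspath)
(2) Read the TFRecord data
train_ds, train_ds_num, verify_ds, verify_ds_num, dis_num, undis_num = tfrecord_data_read_decode(img_tfrecord_abspath, label_tfrecord_abspath, _ratio=0.2)
# _ratio=0.2 means the training samples make up 80% of all samples and the validation samples 20%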
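The split inside tfrecord_data_read_decode takes the first int(N * _ratio) records for validation and skips them for training, so the two subsets cannot overlap. Below is a plain-Python sketch of the same bookkeeping, with a list standing in for the tf.data pipeline. The function name split_by_ratio and the 1000-item list are illustrative, not part of the article's code.

```python
def split_by_ratio(items, ratio=0.2):
    """Mimic Dataset.take / Dataset.skip on a list: the first part validates, the rest trains."""
    verify_num = int(len(items) * ratio)
    verify_part = items[:verify_num]   # like img_label_ds.take(verify_num)
    train_part = items[verify_num:]    # like img_label_ds.skip(verify_num)
    return train_part, verify_part


samples = list(range(1000))            # stand-in for 1000 (image, label) records
train_part, verify_part = split_by_ratio(samples, ratio=0.2)
print(len(train_part), len(verify_part))
```

Because the file order was shuffled once when the TFRecord files were created, taking the leading records for validation still gives a random sample of both classes.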
(3) Feed the data to the network and train
train_result_save_folder = r''  # absolute path of the folder for the training results
checkpoint_file_folder = r''  # absolute path of the folder for the checkpoint file
tensorboard_file_folder = r''  # absolute path of the folder for the TensorBoard files
save_file_name = 'first_training'
# Pass the values returned by tfrecord_data_read_decode in step (2)
model_train_save(train_result_save_folder, checkpoint_file_folder, tensorboard_file_folder, save_file_name,
                 train_ds, train_ds_num, verify_ds, verify_ds_num, dis_num, undis_num,
                 private_optimizers='adam', batch_size=16, lr=5, loops=20, isDraw=1)
# Suppose the checkpoint from this run was saved at: checkpoint_file_abspath = r'E:\.....\*****.ckpt'
# where *****.ckpt is the checkpoint file name
# To retrain the model starting from that best result, call:
train_result_save_folder = r''  # absolute path of the folder for the training results
trained_model_path = r'E:\.....\*****.ckpt'
checkpoint_file_folder = r''  # absolute path of the folder for the checkpoint file
tensorboard_file_folder = r''  # absolute path of the folder for the TensorBoard files
retrained_save_file_name = 'first_retraining'
model_retrain_save(train_result_save_folder, trained_model_path,
                   checkpoint_file_folder, tensorboard_file_folder, retrained_save_file_name,
                   train_ds, train_ds_num, verify_ds, verify_ds_num, dis_num, undis_num,
                   private_optimizers='adam', batch_size=16, lr=5, loops=15, isDraw=1)
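Summary
(1) The functions are long and may contain the occasional mistake. If something fails at run time, look at the specific error and work from there.
(2) I originally wanted to include using the model for classification and prediction as well, but the post was getting too long, so that part goes into the next article.
(3) I am a non-professional beginner, and suggestions for improvement are welcome.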