I started learning TensorFlow a while ago. I skimmed two books but barely practiced, and the whole time I felt TensorFlow was a bit messy. Over the last two days I wrote a small network, mainly to build myself a basic template: whenever I have a new idea or task, I can tweak the template instead of rewriting everything from scratch.
For future convenience, there are three features I want to implement:
1. Build the network structure with tf.slim. Books and much older code still do everything step by step, so defining a single convolutional layer takes four or five lines; with slim, one line is enough, and many newer network implementations I've seen use slim.
2. Add TensorBoard support. Visualization helps a lot with understanding the network and also makes training more convenient (I have to admit TensorBoard once cost me dearly).
3. Add saving, i.e. persistence. I never really understood the persistence chapter when reading, so for now I'll skip the theory; as long as it works.
After some debugging I finished these "simple" goals, though only by referring to other people's code. I'll just say TensorFlow really isn't very beginner-friendly, or maybe my learning path was wrong.
The tf.slim module
slim is a high-level wrapper over TensorFlow's low-level API and can greatly reduce the amount of code needed to build a network. For example, a convolutional layer in plain TensorFlow might look like:
input = ...
with tf.name_scope('conv1_1') as scope:
    kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32,
                                             stddev=1e-1), name='weights')
    conv = tf.nn.conv2d(input, kernel, [1, 1, 1, 1], padding='SAME')
    biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32),
                         trainable=True, name='biases')
    bias = tf.nn.bias_add(conv, biases)
    conv1 = tf.nn.relu(bias, name=scope)
With slim, the same layer is just:
input = ...
net = slim.conv2d(input, 128, [3, 3], scope='conv1_1')
The arguments to conv2d are:
input: the input tensor
128: the depth of the output tensor, i.e. how many convolution kernels the layer uses, which is also how many feature maps it produces
[3, 3]: the spatial size of the convolution kernel
scope: the name of the layer
Another very handy feature is:
with slim.arg_scope([slim.conv2d], padding='SAME',
                    weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                    weights_regularizer=slim.l2_regularizer(0.0005)):
With slim.arg_scope, the keyword arguments that follow become the default arguments for every function listed in the brackets; individual calls inside the scope can still override them.
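Conceptually, arg_scope behaves a lot like applying functools.partial to a whole group of functions at once. A minimal pure-Python sketch of the idea (the conv2d here is a stand-in that just records its arguments, not the real slim op):

```python
from contextlib import contextmanager
from functools import partial

def conv2d(net, depth, kernel, padding='VALID', stddev=1.0):
    # Stand-in for slim.conv2d: just records the settings it was called with.
    return {'depth': depth, 'kernel': kernel, 'padding': padding, 'stddev': stddev}

@contextmanager
def arg_scope_like(**defaults):
    # Yield a version of conv2d with the given defaults baked in --
    # roughly what slim.arg_scope does for every op in its list.
    yield partial(conv2d, **defaults)

with arg_scope_like(padding='SAME', stddev=0.01) as conv:
    a = conv(None, 64, [3, 3])                   # inherits the scope defaults
    b = conv(None, 64, [3, 3], padding='VALID')  # per-call override still wins

print(a['padding'], b['padding'])  # SAME VALID
```

The real arg_scope additionally nests (inner scopes inherit and can override outer ones), which is exactly how the network below uses it.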
So my network is:
def model2(inputs):
    with slim.arg_scope([layers.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.truncated_normal_initializer(0.0, 0.001),
                        reuse=tf.AUTO_REUSE,
                        normalizer_fn=slim.batch_norm):
        with slim.arg_scope([slim.conv2d, layers_lib.max_pool2d], padding='SAME'):
            net = layers.conv2d(inputs, 32, [1, 1], stride=1, scope='conv2d_1_1x1')
            net = layers.conv2d(net, 64, [3, 3], stride=2, scope='conv2d_2_3x3')
            net = layers.max_pool2d(net, [3, 3], stride=2, scope='max_pool2d_3_3x3')
            net = layers.conv2d(net, 128, [1, 1], stride=1, scope='conv2d_4_1x1')
            net = layers.conv2d(net, 256, [3, 3], stride=2, scope='conv2d_5_3x3')
            net = layers.conv2d(net, 256, [3, 3], stride=1, scope='conv2d_5a_3x3')
            net = layers.max_pool2d(net, [3, 3], stride=2, scope='max_pool2d_6_3x3')
            net = layers.conv2d(net, 512, [3, 3], stride=2, scope='conv2d_7_3x3')
            # net = tf.reshape(net, [batch_size, -1])
            # net = layers.conv2d(net, 1, [1, 1], stride=2, scope='conv2d_7_1x1')
            net = slim.fully_connected(net, 256, scope='fc_7')
            net = slim.fully_connected(net, 10, scope='fc_8')
            return net
The structure is simple; it's my first network and first training run, so a simple one will do. The input inputs is a tensor and the output net is too, with shape [batch_size, 1, 1, 10]. Treating the network as a function like this is how many networks are implemented; it looks much cleaner than dumping all the layer declarations and definitions in one place.
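You can check the [batch_size, 1, 1, 10] claim by hand. With 'SAME' padding, a stride-s conv or pool maps a spatial size of n to ceil(n/s), so a quick sanity check in plain Python:

```python
import math

def same_out(size, stride):
    # Output spatial size of a conv/pool layer with 'SAME' padding.
    return math.ceil(size / stride)

size = 32  # CIFAR-10 images are 32x32
# Strides of the eight conv/pool layers in model2, in order:
for stride in [1, 2, 2, 1, 2, 1, 2, 2]:
    size = same_out(size, stride)

print(size)  # 1 -> the 512-channel feature map is 1x1 before the FC layers
```

slim.fully_connected applied to a 4-D tensor transforms only the last dimension, which is why the final output keeps the [batch_size, 1, 1, 10] shape instead of being flattened to [batch_size, 10].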
Two things are worth noting here:
1) When I first ran it, I kept getting errors saying some variable either already existed or had never been declared. I didn't know the cause; following the error messages I added the reuse = tf.AUTO_REUSE argument and also added
tf.reset_default_graph()
to the code. I'm still not entirely sure why this is needed; maybe a passing expert can tell me. (My guess: re-running the script in the same interactive session leaves the previous run's variables in the default graph, so the second run collides with them; resetting the graph clears them, and AUTO_REUSE tolerates the ones that remain.)
2) Not long after training started, the loss stopped moving entirely. It should at least fluctuate, so I guessed vanishing gradients and added a single argument:
normalizer_fn = slim.batch_norm
Just this one line, enabling batch normalization with all default parameters, made the network fully trainable, and fast, too.
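At its core, what slim.batch_norm does is normalize each feature/channel to zero mean and unit variance over the batch, then apply a learned scale and shift. A minimal NumPy sketch of that transform (gamma and beta are the learned parameters, shown here at their default initial values):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    # Normalize over the batch axis per feature, then scale and shift.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mean) / np.sqrt(var + eps)
    return gamma * x_hat + beta

x = np.random.randn(128, 8) * 50 + 10  # activations with a bad scale and offset
y = batch_norm(x)
print(y.mean(axis=0).round(3))  # ~0 for every feature
print(y.std(axis=0).round(2))   # ~1 for every feature
```

One caveat with the real slim.batch_norm: during training it updates moving averages through ops collected in tf.GraphKeys.UPDATE_OPS, which normally need to be run alongside the train op, and an is_training flag should distinguish training from inference. Relying on the defaults, as this post does, can come back to bite at inference time.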
I used the CIFAR-10 dataset.
The figure above shows training without BN: after only a few hundred steps (one step is one batch), the loss stops changing entirely. My guess is vanishing gradients.
The figure above shows training with BN: after nearly 5 hours of training and close to 10k steps, accuracy has already reached 92%. BN really is an impressive technique.
The number of epochs I configured was a bit large for my weak machine, so I stopped before training finished and never measured accuracy on the test set. It doesn't matter; I only wanted a template.
TensorBoard visualization
I wanted to learn TensorFlow's visualization features as soon as I heard about them. For any subject or problem, visualization is a great way both to deepen your own understanding and to help others understand, but I took some detours at the start.
First, visualization requires adding code that specifies which quantities you want to visualize. I initially assumed things could be visualized directly; who knew extra code was needed.
Then there was a path problem. After launching tensorboard as the tutorials describe, it kept reporting no active events (something like that; in any case the page was blank, with no data). I tried many fixes, and in the end it was a Windows issue with path separators. It's best to pass an absolute path, and on Windows use forward slashes in the path, not backslashes:
tensorboard --logdir=F:/TensorFlow_exp/mnist/log1
These are small problems, but very annoying when you can't solve them.
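A related gotcha when the same Windows paths appear inside Python code (e.g. in saver.save or the FileWriter log_dir): backslashes start escape sequences in ordinary string literals, so forward slashes or raw strings are safer. The paths below are made-up examples to show the effect:

```python
p1 = 'F:\new_logs\test'   # backslashes: \n and \t silently become newline and tab!
p2 = 'F:/new_logs/test'   # forward slashes are interpreted literally
p3 = r'F:\new_logs\test'  # a raw string also keeps the backslashes intact

print(len(p1), len(p2), len(p3))  # 14 16 16
```

This is why every path in the code below uses forward slashes even though it runs on Windows.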
To use TensorBoard, you first need summary operations. The most common one is tf.summary.scalar(), which records a scalar so it can be displayed in TensorBoard. Usage:
tf.summary.scalar('accuracy', accuracy)
# or
tf.summary.scalar("cost_function", cost)
This records the two most common quantities, accuracy and loss. There is also tf.summary.histogram() for recording distributions of values, among others; see the TensorBoard documentation for details.
You can define many such summary ops, then merge them and create a FileWriter to append data to an event file; TensorBoard reads this file to do the visualization. As with defining the network structure, these steps only define the operations: during training you still have to keep writing data to the event file, which is what the add_summary() method is for, called as training runs. The overall framework looks roughly like this:
# Define the loss and add it to the scalars; loss then shows up under SCALARS in TensorBoard
with tf.name_scope('cross_entropy'):
    total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_, labels=y))
    tf.summary.scalar("loss", total_loss)
# Define accuracy and add it to the summaries as well
with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
        correct_pred = tf.equal(tf.argmax(y_, 1), tf.argmax(y, 1))
    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
tf.summary.scalar('accuracy', accuracy)
# Merge all summaries for convenience
merged_summary_op = tf.summary.merge_all()
# Define two FileWriters, one for training data and one for test data;
# this step already runs inside `with tf.Session() as sess`
train_writer = tf.summary.FileWriter(log_dir + '/train', sess.graph)
test_writer = tf.summary.FileWriter(log_dir + '/test')
# Per-batch handling: every 80 batches, evaluate on the test set and write the summary
# to test_writer (fetching the train/test data is not shown here).
# Every 100 batches, save the model plus some runtime metadata; in practice saving
# is fairly slow, which is why it only happens at intervals.
# All other steps just run a normal training step.
if batch % 80 == 0:
    summary, acc = sess.run([merged_summary_op, accuracy], feed_dict={x: X_test, y: y_test})
    test_writer.add_summary(summary, i * total_batch + batch)
    print('Accuracy at step %s: %s' % (i * total_batch + batch, acc))
else:
    if batch % 100 == 0:
        run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
        run_metadata = tf.RunMetadata()
        # options and run_metadata must be passed to sess.run,
        # otherwise run_metadata stays empty
        _, summary = sess.run([optimizer, merged_summary_op],
                              feed_dict={x: batch_x, y: batch_y},
                              options=run_options, run_metadata=run_metadata)
        train_writer.add_run_metadata(run_metadata, 'step%d' % (i * total_batch + batch))
        train_writer.add_summary(summary, i * total_batch + batch)
        saver.save(sess, 'F:/TensorFlow_exp/cifar10/model.ckpt', i * total_batch + batch)
    else:
        _, summary = sess.run([optimizer, merged_summary_op], feed_dict={x: batch_x, y: batch_y})
        train_writer.add_summary(summary, i * total_batch + batch)
The framework above should be enough. At first I ignored the test accuracy; later I saw someone else's code doing it nicely and borrowed the idea, so TensorBoard now shows training and test accuracy at the same time, which is very convenient.
Saving (persistence)
Once a model is trained, it has to be saved for inference. Sometimes training has to be interrupted halfway; without persistence, the hard-won model sitting in memory is simply gone. Another important point is that we often don't need to train a network ourselves: take one pretrained on ImageNet and fine-tune it. To be honest, I don't understand this kind of fine-tuning yet; I may write it up once I do. So persistence matters; here I'll only cover the simple case of saving in ckpt format.
It's actually simple, essentially just two lines:
saver = tf.train.Saver()
# declare a Saver, then use it for persistence
# then, inside a session:
saver.save(sess, 'F:/TensorFlow_exp/cifar10_reg/model.ckpt')
Saving produces four files, which can later be used to restore the model. I'm still not entirely clear on restoring; once I've figured it out, I'll make a note of it.
That's the main content, fairly brief, since I'm not very familiar with TensorFlow myself; I'll keep learning. One more thing: after adding the test set, I found that the network above overfits severely.
As the figure shows, training-set accuracy (orange) passes 0.9 after 6k steps, but test-set accuracy never exceeds 0.7 and even trends slowly downward: textbook overfitting. Adding regularization to the network parameters didn't help either. The cause is still unclear; I'll update this if I figure it out.
The full code follows for reference.
from tensorflow.contrib.framework.python.ops import arg_scope
from tensorflow.contrib.layers.python.layers import layers
from tensorflow.contrib.layers.python.layers import layers as layers_lib
from tensorflow.contrib.layers.python.layers import regularizers
from tensorflow.contrib.slim.nets import resnet_v2
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.python.layers.core import Flatten
import tensorflow.contrib.slim as slim
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt
import _pickle as pickle
import seaborn
tf.reset_default_graph()
def model(inputs):
    with slim.arg_scope([slim.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                        weights_regularizer=slim.l2_regularizer(0.0005)):
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        net = slim.max_pool2d(net, [2, 2], scope='pool1')
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        net = slim.max_pool2d(net, [2, 2], scope='pool2')
        return net
def model2(inputs):
    with slim.arg_scope([layers.conv2d, slim.fully_connected],
                        activation_fn=tf.nn.relu,
                        weights_initializer=tf.truncated_normal_initializer(0.0, 0.001),
                        reuse=tf.AUTO_REUSE,
                        normalizer_fn=slim.batch_norm,
                        weights_regularizer=slim.l2_regularizer(0.0005)):
        with slim.arg_scope([slim.conv2d, layers_lib.max_pool2d], padding='SAME'):
            net = layers.conv2d(inputs, 32, [1, 1], stride=1, scope='conv2d_1_1x1')
            net = layers.conv2d(net, 64, [3, 3], stride=2, scope='conv2d_2_3x3')
            net = layers.max_pool2d(net, [3, 3], stride=2, scope='max_pool2d_3_3x3')
            net = layers.conv2d(net, 128, [1, 1], stride=1, scope='conv2d_4_1x1')
            net = layers.conv2d(net, 256, [3, 3], stride=2, scope='conv2d_5_3x3')
            net = layers.conv2d(net, 256, [3, 3], stride=1, scope='conv2d_5a_3x3')
            net = layers.max_pool2d(net, [3, 3], stride=2, scope='max_pool2d_6_3x3')
            net = layers.conv2d(net, 512, [3, 3], stride=2, scope='conv2d_7_3x3')
            # net = tf.reshape(net, [batch_size, -1])
            # net = layers.conv2d(net, 1, [1, 1], stride=2, scope='conv2d_7_1x1')
            net = slim.fully_connected(net, 256, scope='fc_7')
            net = slim.fully_connected(net, 10, scope='fc_8')
            return net
# g = tf.Graph()
learning_rate = 1e-3
training_iters = 100
batch_size = 100
display_step = 5
n_features = 3072 # 32*32*3
n_classes = 10
#tf.reset_default_graph()
def unpickle(filename):
    '''Unpack a pickled CIFAR-10 data batch.'''
    with open(filename, 'rb') as f:
        d = pickle.load(f, encoding='latin1')
    return d

def onehot(labels):
    '''One-hot encode a label vector.'''
    n_sample = len(labels)
    n_class = max(labels) + 1
    onehot_labels = np.zeros((n_sample, n_class))
    onehot_labels[np.arange(n_sample), labels] = 1
    return onehot_labels
# Training data
data1 = unpickle('F:/TensorFlow_exp/cifar10-TensorFlow-tensorboard/cifar10-TensorFlow-tensorboard/cifar-10-batches-py/data_batch_1')
data2 = unpickle('F:/TensorFlow_exp/cifar10-TensorFlow-tensorboard/cifar10-TensorFlow-tensorboard/cifar-10-batches-py/data_batch_2')
data3 = unpickle('F:/TensorFlow_exp/cifar10-TensorFlow-tensorboard/cifar10-TensorFlow-tensorboard/cifar-10-batches-py/data_batch_3')
data4 = unpickle('F:/TensorFlow_exp/cifar10-TensorFlow-tensorboard/cifar10-TensorFlow-tensorboard/cifar-10-batches-py/data_batch_4')
data5 = unpickle('F:/TensorFlow_exp/cifar10-TensorFlow-tensorboard/cifar10-TensorFlow-tensorboard/cifar-10-batches-py/data_batch_5')
X_train = np.concatenate((data1['data'], data2['data'], data3['data'], data4['data'], data5['data']), axis=0)
y_train = np.concatenate((data1['labels'], data2['labels'], data3['labels'], data4['labels'], data5['labels']), axis=0)
y_train = onehot(y_train)
# Test data
test = unpickle('F:/TensorFlow_exp/cifar10-TensorFlow-tensorboard/cifar10-TensorFlow-tensorboard/cifar-10-batches-py/test_batch')
X_test = test['data'][:5000, :]
y_test = onehot(test['labels'])[:5000, :]
del test
print('Training dataset shape:', X_train.shape)
print('Training labels shape:', y_train.shape)
print('Testing dataset shape:', X_test.shape)
print('Testing labels shape:', y_test.shape)
with tf.name_scope('input'):
    x = tf.placeholder(tf.float32, [None, n_features])
    y = tf.placeholder(tf.float32, [None, n_classes])
with tf.name_scope('input_reshape'):
    x4d = tf.reshape(x, [-1, 32, 32, 3])
y_ = model2(x4d)
#y_ = resnet_v2(x4d)
#loss = slim.losses.softmax_cross_entropy(y_,y)
# reg_loss = slim.add_n(slim.losses.get_regularization_losses())
with tf.name_scope('cross_entropy'):
    total_loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=y_, labels=y))
    tf.summary.scalar("loss", total_loss)
    # total_loss = slim.losses.get_total_loss()
    tf.summary.scalar('losses/total_loss', total_loss)
# train_op = slim.learning(total_loss, tf.train.GradientDescentOptimizer(learning = 0.001))
# Evaluate the model
# y_cur = tf.reshape(y_, [batch_size, -1])
y_cur = Flatten()(y_)
with tf.name_scope('accuracy'):
    with tf.name_scope('correct_prediction'):
        correct_pred = tf.equal(tf.argmax(y_cur, 1), tf.argmax(y, 1))
    with tf.name_scope('accuracy'):
        accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))
tf.summary.scalar('accuracy', accuracy)
with tf.name_scope('train'):
    optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(total_loss)
merged_summary_op = tf.summary.merge_all()
init = tf.global_variables_initializer()
log_dir = 'F:/TensorFlow_exp/cifar10_reg/log'
saver = tf.train.Saver()
with tf.Session() as sess:
    sess.run(init)
    # summary_writer = tf.summary.FileWriter(log_dir, graph=tf.get_default_graph())
    train_writer = tf.summary.FileWriter(log_dir + '/train', sess.graph)
    test_writer = tf.summary.FileWriter(log_dir + '/test')
    # slim.learning.train(train_op, log_dir, number_of_steps = 1000,
    #                     save_summaries_secs = 30, save_interval_secs = 60)
    c = []
    total_batch = int(X_train.shape[0] / batch_size)
    for i in range(training_iters):
        for batch in range(total_batch):
            batch_x = X_train[batch*batch_size : (batch+1)*batch_size, :]
            batch_y = y_train[batch*batch_size : (batch+1)*batch_size, :]
            if batch % 80 == 0:
                summary, acc = sess.run([merged_summary_op, accuracy], feed_dict={x: X_test, y: y_test})
                test_writer.add_summary(summary, i * total_batch + batch)
                print('Accuracy at step %s: %s' % (i * total_batch + batch, acc))
            else:
                if batch % 100 == 0:
                    run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
                    run_metadata = tf.RunMetadata()
                    # options and run_metadata must be passed to sess.run,
                    # otherwise run_metadata stays empty
                    _, summary = sess.run([optimizer, merged_summary_op],
                                          feed_dict={x: batch_x, y: batch_y},
                                          options=run_options, run_metadata=run_metadata)
                    train_writer.add_run_metadata(run_metadata, 'step%d' % (i * total_batch + batch))
                    train_writer.add_summary(summary, i * total_batch + batch)
                    saver.save(sess, 'F:/TensorFlow_exp/cifar10_reg/model.ckpt', i * total_batch + batch)
                else:
                    _, summary = sess.run([optimizer, merged_summary_op], feed_dict={x: batch_x, y: batch_y})
                    train_writer.add_summary(summary, i * total_batch + batch)
    train_writer.close()
    test_writer.close()
    print("Optimization Finished!")
    # Test
    test_acc = sess.run(accuracy, feed_dict={x: X_test, y: y_test})
    print("Testing Accuracy:", test_acc)
    plt.plot(c)
    plt.xlabel('Iter')
    plt.ylabel('Cost')
    plt.title('lr=%f, ti=%d, bs=%d, acc=%f' % (learning_rate, training_iters, batch_size, test_acc))
    plt.tight_layout()
    plt.savefig('cnn-tf-cifar10-%s.png' % test_acc, dpi=200)