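How do you learn TensorFlow?
There are many excellent TensorFlow tutorials, including Google's official ones.
This article walks you through what you need to do to train a model in TensorFlow. If you want to learn more, or need to study a topic that is not fully explained here, see the reference tutorials linked at the end of the article.
- Load the dataset
- Understand the example model
- Train the model
Load the dataset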
import tensorflow as tf
import numpy as np
import math
import timeit
import matplotlib.pyplot as plt
%matplotlib inline

from cs231n.data_utils import load_CIFAR10

def get_CIFAR10_data(num_training=49000, num_validation=1000, num_test=10000):
    """
    Load the CIFAR-10 dataset from disk and perform preprocessing to prepare
    it for the two-layer neural net classifier. These are the same steps as
    we used for the SVM, but condensed to a single function.
    """
    # Load the raw CIFAR-10 data
    cifar10_dir = 'cs231n/datasets/cifar-10-batches-py'
    X_train, y_train, X_test, y_test = load_CIFAR10(cifar10_dir)

    # Subsample the data
    mask = range(num_training, num_training + num_validation)
    X_val = X_train[mask]
    y_val = y_train[mask]
    mask = range(num_training)
    X_train = X_train[mask]
    y_train = y_train[mask]
    mask = range(num_test)
    X_test = X_test[mask]
    y_test = y_test[mask]

    # Normalize the data: subtract the mean image
    mean_image = np.mean(X_train, axis=0)
    X_train -= mean_image
    X_val -= mean_image
    X_test -= mean_image

    return X_train, y_train, X_val, y_val, X_test, y_test

X_train, y_train, X_val, y_val, X_test, y_test = get_CIFAR10_data()
print('Train data shape: ', X_train.shape)
print('Train labels shape: ', y_train.shape)
print('Validation data shape: ', X_val.shape)
print('Validation labels shape: ', y_val.shape)
print('Test data shape: ', X_test.shape)
print('Test labels shape: ', y_test.shape)
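Understand the example model
Some useful utilities
Remember that our image data is initially N x H x W x C, where:
- N is the number of data points.
- H is the height of each image in pixels.
- W is the width of each image in pixels.
- C is the number of channels (usually 3: R, G, B).
This layout suits operations like a 2D convolution, which need to know where pixels sit relative to one another in space. When we feed image data into a fully connected affine layer, however, we want each data example to be represented by a single vector, and keeping the channels, rows, and columns of the data separate is no longer useful.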
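As a minimal sketch of the flattening step described above, the NumPy snippet below collapses each image into one vector before it would reach an affine layer. The batch of four zero-filled images is a made-up placeholder, not data from the article.

```python
import numpy as np

# Hypothetical mini-batch: 4 images, 32x32 pixels, 3 channels (N x H x W x C).
images = np.zeros((4, 32, 32, 3), dtype=np.float32)

# Keep the batch dimension and collapse H, W, and C into one vector per image.
flat = images.reshape(images.shape[0], -1)

print(flat.shape)  # (4, 3072), since 32 * 32 * 3 = 3072
```

The example model later in the article does the same thing inside the graph with tf.reshape(h1, [-1, 5408]).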
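The example model itself
The first step in training your own model is defining its architecture.
Here is an example of a convolutional neural network defined in TensorFlow. Try to understand what each line is doing, remembering that each layer is built on top of the previous one. We have not trained anything yet; for now, we want you to understand how everything is set up.
In this example you will see a 2D convolutional layer (Conv2d), a ReLU activation, and a fully connected layer (Linear). You will also see the hinge loss function and the Adam optimizer.
Make sure you understand why the parameters of the linear layer are 5408 and 10.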
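The arithmetic behind 5408 can be checked with a few lines of Python. This is a sketch that assumes the settings of the example model (32x32 input, 7x7 filter, stride 2, 'VALID' padding, 32 filters); the helper name conv_output_size is made up for illustration.

```python
def conv_output_size(input_size, filter_size, stride):
    """Spatial output size of a convolution with no padding ('VALID')."""
    return (input_size - filter_size) // stride + 1

# 32x32 input, 7x7 filter, stride 2, as in the example model.
side = conv_output_size(input_size=32, filter_size=7, stride=2)
num_filters = 32
flat_features = side * side * num_filters

print(side, flat_features)  # 13 5408
```

The convolution produces a 13 x 13 x 32 activation volume, which flattens to 5408 features per image. The 10 is the number of CIFAR-10 classes.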
We first set up the variables, then define the network model.
# clear old variables
tf.reset_default_graph()

X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

def simple_model(X,y):
    Wconv1 = tf.get_variable("Wconv1", shape=[7, 7, 3, 32])
    bconv1 = tf.get_variable("bconv1", shape=[32])
    W1 = tf.get_variable("W1", shape=[5408, 10])
    b1 = tf.get_variable("b1", shape=[10])

    # define our graph (e.g. two_layer_convnet)
    a1 = tf.nn.conv2d(X, Wconv1, strides=[1,2,2,1], padding='VALID') + bconv1
    h1 = tf.nn.relu(a1)
    h1_flat = tf.reshape(h1,[-1,5408])
    y_out = tf.matmul(h1_flat,W1) + b1
    return y_out

y_out = simple_model(X,y)

# define our loss
total_loss = tf.losses.hinge_loss(tf.one_hot(y,10),logits=y_out)
mean_loss = tf.reduce_mean(total_loss)

# define our optimizer
optimizer = tf.train.AdamOptimizer(5e-4) # select optimizer and set learning rate
train_step = optimizer.minimize(mean_loss)
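TensorFlow supports many other layer types, loss functions, and optimizers; you will experiment with them below. Here is the official documentation for these APIs (it will also help if any of the parameters used above were unclear).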
- Layers, Activations, Loss functions : https://www.tensorflow.org/api_guides/python/nn
- Optimizers: https://www.tensorflow.org/api_guides/python/train#Optimizers
- BatchNorm: https://www.tensorflow.org/api_docs/python/tf/layers/batch_normalization
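Now that we have defined the graph of operations above, we need to create a tf.Session object before we can execute TensorFlow graphs by feeding them input data and computing results. A session encapsulates the control and state of the TensorFlow runtime. For more information, see the TensorFlow Getting Started guide.
Optionally, we can also specify a device context such as /cpu:0 or /gpu:0. For documentation on this behavior, see the TensorFlow guide on the subject.
When you run the code below, you should see a validation loss of roughly 0.4 to 0.6 and an accuracy of 0.30 to 0.35.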
def run_model(session, predict, loss_val, Xd, yd,
              epochs=1, batch_size=64, print_every=100,
              training=None, plot_losses=False):
    # have tensorflow compute accuracy
    correct_prediction = tf.equal(tf.argmax(predict,1), y)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # shuffle indices
    train_indicies = np.arange(Xd.shape[0])
    np.random.shuffle(train_indicies)

    training_now = training is not None

    # setting up variables we want to compute (and optimizing)
    # if we have a training function, add that to things we compute
    # (use the loss_val argument rather than the global mean_loss)
    variables = [loss_val,correct_prediction,accuracy]
    if training_now:
        variables[-1] = training

    # counter
    iter_cnt = 0
    for e in range(epochs):
        # keep track of losses and accuracy
        correct = 0
        losses = []
        # make sure we iterate over the dataset once
        for i in range(int(math.ceil(Xd.shape[0]/batch_size))):
            # generate indices for the batch
            start_idx = (i*batch_size)%Xd.shape[0]
            idx = train_indicies[start_idx:start_idx+batch_size]

            # create a feed dictionary for this batch
            feed_dict = {X: Xd[idx,:],
                         y: yd[idx],
                         is_training: training_now }
            # get batch size
            actual_batch_size = yd[idx].shape[0]

            # have tensorflow compute loss and correct predictions
            # and (if given) perform a training step
            loss, corr, _ = session.run(variables,feed_dict=feed_dict)

            # aggregate performance stats
            losses.append(loss*actual_batch_size)
            correct += np.sum(corr)

            # print every now and then
            if training_now and (iter_cnt % print_every) == 0:
                print("Iteration {0}: with minibatch training loss = {1:.3g} and accuracy of {2:.2g}"\
                      .format(iter_cnt,loss,np.sum(corr)/actual_batch_size))
            iter_cnt += 1
        total_correct = correct/Xd.shape[0]
        total_loss = np.sum(losses)/Xd.shape[0]
        print("Epoch {2}, Overall loss = {0:.3g} and accuracy of {1:.3g}"\
              .format(total_loss,total_correct,e+1))
        if plot_losses:
            plt.plot(losses)
            plt.grid(True)
            plt.title('Epoch {} Loss'.format(e+1))
            plt.xlabel('minibatch number')
            plt.ylabel('minibatch loss')
            plt.show()
    return total_loss,total_correct

with tf.Session() as sess:
    with tf.device("/cpu:0"): # "/cpu:0" or "/gpu:0"
        sess.run(tf.global_variables_initializer())
        print('Training')
        run_model(sess,y_out,mean_loss,X_train,y_train,1,64,100,train_step,True)
        print('Validation')
        run_model(sess,y_out,mean_loss,X_val,y_val,1,64)
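Train the model
In this section, we specify a model for you to construct. The goal here is not good performance (that comes next), but getting comfortable with the TensorFlow documentation and with configuring your own model.
Using the code provided above as a guide, and with the help of the TensorFlow documentation, specify a model with the following architecture:
- 7x7 convolutional layer with 32 filters and a stride of 1
- ReLU activation layer
- Spatial batch normalization layer (trainable parameters, with scale and centering)
- 2x2 max pooling layer with a stride of 2
- Affine layer with 1024 output units
- ReLU activation layer
- Affine layer from 1024 input units to 10 outputs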
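To make the spatial batch normalization item in the list above more concrete, here is a minimal NumPy sketch of its training-time forward pass. It illustrates the idea and is not the TensorFlow implementation: the function name, the eps value, and the 26x26 spatial size (which assumes 'VALID' padding on the 7x7 convolution) are all assumptions made for this example.

```python
import numpy as np

def spatial_batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Training-time spatial batch norm for x of shape (N, H, W, C)."""
    # Statistics are computed per channel, over the batch and both spatial axes.
    mean = x.mean(axis=(0, 1, 2), keepdims=True)
    var = x.var(axis=(0, 1, 2), keepdims=True)
    x_hat = (x - mean) / np.sqrt(var + eps)
    # gamma (scale) and beta (centering) are the trainable parameters.
    return gamma * x_hat + beta

rng = np.random.default_rng(0)
# Hypothetical conv output: 8 images, 26x26 spatial size, 32 channels.
x = rng.normal(loc=3.0, scale=2.0, size=(8, 26, 26, 32))
out = spatial_batchnorm_forward(x, gamma=np.ones(32), beta=np.zeros(32))

print(out.shape)
print(out.mean(axis=(0, 1, 2))[:3])  # close to 0 for every channel
print(out.std(axis=(0, 1, 2))[:3])   # close to 1 for every channel
```

At test time a real layer uses running averages of the mean and variance instead of batch statistics, which is why the model takes an is_training placeholder.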
# clear old variables
tf.reset_default_graph()

# define our input (e.g. the data that changes every batch)
# The first dim is None, and gets set automatically based on batch size fed in
X = tf.placeholder(tf.float32, [None, 32, 32, 3])
y = tf.placeholder(tf.int64, [None])
is_training = tf.placeholder(tf.bool)

# define model
def complex_model(X,y,is_training):
    pass

y_out = complex_model(X,y,is_training)
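To make sure you are doing the right thing, use the following tool to check the dimensionality of your output (it should be 64 x 10, since our batches have size 64 and the final affine layer should output 10 values, one for each of our 10 classes):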
# Now we're going to feed a random batch into the model
# and make sure the output is the right size
x = np.random.randn(64, 32, 32,3)
with tf.Session() as sess:
    with tf.device("/cpu:0"): # "/cpu:0" or "/gpu:0"
        tf.global_variables_initializer().run()

        ans = sess.run(y_out,feed_dict={X:x,is_training:True})
        %timeit sess.run(y_out,feed_dict={X:x,is_training:True})
        print(ans.shape)
        print(np.array_equal(ans.shape, np.array([64, 10])))
You should see the following from the run above:
(64, 10)
True
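Now that you have seen how to define a model and run a forward pass of some data through it, let's walk through how you would train for one full epoch on the training data (using the complex model you created above).
Make sure you understand how each TensorFlow function used below corresponds to what you implemented in your own custom neural network implementation.
First, set up an RMSprop optimizer (using a 1e-3 learning rate) and a cross-entropy loss function. See the TensorFlow documentation for more information.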
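To connect these two pieces to a hand-written implementation, the NumPy sketch below shows a softmax cross-entropy loss and one textbook RMSprop update. It illustrates the math only and is not the TensorFlow code the exercise asks for; the function names and the decay and eps defaults are assumptions chosen for this example, and TensorFlow's own optimizer may differ in such details.

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy for logits of shape (N, K) and integer labels of shape (N,)."""
    # Subtract the row-wise max before exponentiating for numerical stability.
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return -log_probs[np.arange(labels.shape[0]), labels].mean()

def rmsprop_step(w, grad, cache, lr=1e-3, decay=0.9, eps=1e-8):
    """One textbook RMSprop update; returns the new weights and running average."""
    cache = decay * cache + (1.0 - decay) * grad ** 2
    w = w - lr * grad / (np.sqrt(cache) + eps)
    return w, cache

# An untrained 10-class model with all-zero logits predicts uniformly.
logits = np.zeros((64, 10))
labels = np.zeros(64, dtype=np.int64)
loss = softmax_cross_entropy(logits, labels)
print(loss)  # ln(10), about 2.303

# One update on a single made-up weight and gradient.
w_new, cache_new = rmsprop_step(np.array([1.0]), np.array([2.0]), np.zeros(1))
print(w_new, cache_new)
```

The ln(10) value is a useful sanity check: a freshly initialized 10-class classifier should start near a cross-entropy loss of 2.3.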
# Inputs
#     y_out: is what your model computes
#     y: is your TensorFlow variable with label information
# Outputs
#     mean_loss: a TensorFlow variable (scalar) with numerical loss
#     optimizer: a TensorFlow optimizer
# This should be ~3 lines of code!
mean_loss = None
optimizer = None
pass

# batch normalization in tensorflow requires this extra dependency
extra_update_ops = tf.get_collection(tf.GraphKeys.UPDATE_OPS)
with tf.control_dependencies(extra_update_ops):
    train_step = optimizer.minimize(mean_loss)
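Below we create a session and train the model for one epoch. You should see a loss of 1.4 to 2.0 and an accuracy of 0.4 to 0.5. There will be some variation due to random seeds and differences in initialization.
sess = tf.Session()
sess.run(tf.global_variables_initializer())
print('Training')
run_model(sess,y_out,mean_loss,X_train,y_train,1,64,100,train_step)
Let's see the train and test code in action. Feel free to use these methods when evaluating the models you develop below. You should see a loss of 1.3 to 2.0 and an accuracy of 0.45 to 0.55.
print('Validation')
run_model(sess,y_out,mean_loss,X_val,y_val,1,64)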
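Training tips
For each network architecture you try, you should tune the learning rate and the regularization strength. When doing so, keep a few important things in mind:
- If the parameters are working well, you should see improvement within a few hundred iterations.
- Remember the coarse-to-fine approach to hyperparameter tuning: start by testing a wide range of hyperparameters for just a few training iterations to find the combinations that work at all.
- Once you have found some sets of parameters that seem to work, search more finely around them. You may need to train for more epochs.
- You should use the validation set for hyperparameter search.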
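The coarse-to-fine approach in the tips above can be sketched as sampling learning rates on a log scale, first over a wide range and then over a narrow one. The ranges, the sample counts, and the best_lr value below are illustrative assumptions, not recommendations from the article.

```python
import numpy as np

def sample_log_uniform(low, high, size, rng):
    """Draw values whose base-10 exponent is uniform between log10(low) and log10(high)."""
    return 10.0 ** rng.uniform(np.log10(low), np.log10(high), size=size)

rng = np.random.default_rng(0)

# Coarse stage: a wide range, each candidate trained for only a few iterations.
coarse_lrs = sample_log_uniform(1e-5, 1e-1, size=10, rng=rng)

# Fine stage: a narrower range around a value that looked promising.
# best_lr is a placeholder; in practice it comes from validation results.
best_lr = 1e-3
fine_lrs = sample_log_uniform(best_lr / 3, best_lr * 3, size=10, rng=rng)

print(np.sort(coarse_lrs))
print(np.sort(fine_lrs))
```

Each sampled value would be passed to the optimizer and scored with run_model on the validation set, never on the test set.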
New architectures
[2]: DenseNets, where the inputs to previous layers are concatenated together.