This guide gets you started programming in the low-level TensorFlow APIs (TensorFlow Core), showing you how to:
(1) manage your own TensorFlow program (a tf.Graph) and TensorFlow runtime (a tf.Session), instead of relying on Estimators to manage them
(2) run TensorFlow operations, using a tf.Session
(3) use high-level components (datasets, layers, and feature_columns) in this low-level environment
(4) build your own training loop, instead of using the one provided by Estimators
We recommend using the higher-level APIs to build models when possible. Still, knowing TensorFlow Core is valuable for the following reasons:
(1) experimentation and debugging are both more straightforward when you can use low-level TensorFlow operations directly
(2) it gives you a mental model of how things work internally when using the higher-level APIs
The central unit of data in TensorFlow is the tensor. A tensor consists of a set of primitive values shaped into an array of any number of dimensions. A tensor's rank is its number of dimensions, while its shape is a tuple of integers specifying the array's length along each dimension.
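For example, tensors of rank 0, 1, and 2 can be sketched as follows (the variable names are just for illustration):

```python
import tensorflow as tf

# Rank 0: a scalar with shape ()
scalar = tf.constant(3.0)
# Rank 1: a vector of length 3, with shape (3,)
vector = tf.constant([1.0, 2.0, 3.0])
# Rank 2: a 2x3 matrix, with shape (2, 3)
matrix = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])

scalar_rank = len(scalar.shape)               # 0 dimensions
matrix_shape = tuple(matrix.shape.as_list())  # (2, 3)
```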
You might think of TensorFlow Core programs as consisting of two discrete sections:
(1) building the computational graph (a tf.Graph)
(2) running the computational graph (using a tf.Session)
A computational graph is a series of TensorFlow operations arranged into a graph. The graph is composed of two types of objects:
(1) Operations (or "ops"): the nodes of the graph. Operations describe calculations that consume and produce tensors
(2) Tensors: the edges in the graph. These represent the values that will flow through the graph. Most TensorFlow functions return tf.Tensors
tf.Tensors do not have values; they are just handles to elements in the computation graph.
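A minimal graph built from two constants illustrates this (a sketch assuming TensorFlow 1.x graph mode):

```python
import tensorflow as tf

# These statements only build graph nodes; no arithmetic happens yet.
a = tf.constant(3.0, dtype=tf.float32)
b = tf.constant(4.0)  # also tf.float32 implicitly
total = a + b

# In graph mode this prints tensor handles, not the values 3.0, 4.0, 7.0.
print(a)
print(b)
print(total)
```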
Notice that printing the tensors does not output the values 3.0, 4.0, and 7.0 as you might expect. The above statements only build the computation graph. These tf.Tensor objects just represent the results of the operations that will be run.
Each operation in a graph is given a unique name. This name is independent of the names the objects are assigned to in Python.
TensorFlow provides a utility called TensorBoard. One of TensorBoard's many capabilities is visualizing a computation graph.
First you save the computation graph to a TensorBoard summary file as follows:
###
writer = tf.summary.FileWriter("/home/mao/tensorflow_gpu/Notebooks/events")
writer.add_graph(tf.get_default_graph())
To evaluate tensors, instantiate a tf.Session object, informally known as a session. A session encapsulates the state of the TensorFlow runtime and runs TensorFlow operations. If a tf.Graph is like a .py file, a tf.Session is like the Python executable.
When you request the output of a node with Session.run, TensorFlow backtracks through the graph and runs all the nodes that provide input to the requested output node.
Some TensorFlow functions return tf.Operations instead of tf.Tensors. The result of calling run on an Operation is None. You run an operation to cause a side effect, not to retrieve a value. Examples of this include the initialization and training ops demonstrated later.
A graph can be parameterized to accept external inputs, known as placeholders. A placeholder is a promise to provide a value later, like a function argument.
The preceding three lines are a bit like a function in which we define two input parameters (x and y) and then an operation on them.
Also note that the feed_dict argument can be used to overwrite any tensor in the graph. The only difference between placeholders and other tf.Tensors is that placeholders throw an error if no value is fed to them.
Placeholders work for simple experiments, but Datasets are the preferred method of streaming data into a model.
To get a runnable tf.Tensor from a Dataset you must first convert it to a tf.data.Iterator, and then call the Iterator's get_next method.
The simplest way to create an Iterator is with the make_one_shot_iterator method.
If the Dataset depends on stateful operations you may need to initialize the iterator before using it.
A trainable model must modify the values in the graph to get new outputs with the same input. Layers are the preferred way to add trainable parameters to a graph.
Layers package together both the variables and the operations that act on them. For example, a densely connected layer performs a weighted sum across all inputs for each output and applies an optional activation function. The connection weights and biases are managed by the layer object.
The following code creates a Dense layer that takes a batch of input vectors and produces a single output value for each. To apply a layer to an input, call the layer as if it were a function.
###
x = tf.placeholder(tf.float32, shape=[None, 3])
linear_model = tf.layers.Dense(units=1)
y = linear_model(x)
The layer inspects its input to determine sizes for its internal variables. So here we must set the shape of the x placeholder so that the layer can build a weight matrix of the correct size.
The layer contains variables that must be initialized before they can be used. While it is possible to initialize variables individually, you can easily initialize all the variables in a TensorFlow graph as follows:
###
init = tf.global_variables_initializer()
sess.run(init)
Calling tf.global_variables_initializer only creates and returns a handle to a TensorFlow operation. That op will initialize all the global variables when we run it with tf.Session.run.
Also note that global_variables_initializer only initializes variables that existed in the graph when the initializer was created, so the initializer should be one of the last things added during graph construction.
For each layer class (like tf.layers.Dense), TensorFlow also supplies a shortcut function (like tf.layers.dense). The only difference is that the shortcut-function versions create and run the layer in a single call.
###
x = tf.placeholder(tf.float32, shape=[None, 3])
y = tf.layers.dense(x, units=1)
init = tf.global_variables_initializer()
sess.run(init)
While convenient, this approach allows no access to the tf.layers.Layer object. This makes introspection and debugging more difficult, and layer reuse impossible.
The easiest way to experiment with feature columns is using the tf.feature_column.input_layer function. This function only accepts dense columns as inputs, so to view the result of a categorical column you must wrap it in a tf.feature_column.indicator_column.
###
features = {
    'sales': [[5], [10], [8], [9]],
    'department': ['sports', 'sports', 'gardening', 'gardening']
}
department_column = tf.feature_column.categorical_column_with_vocabulary_list(
    'department', ['sports', 'gardening'])
department_column = tf.feature_column.indicator_column(department_column)
columns = [
    tf.feature_column.numeric_column('sales'),
    department_column
]
inputs = tf.feature_column.input_layer(features, columns)
Running the inputs tensor will parse the features into a batch of vectors.
Feature columns can have internal state, like layers, so they often need to be initialized. Categorical columns use lookup tables internally, and these require a separate initialization op, tf.tables_initializer.
###
var_init = tf.global_variables_initializer()
table_init = tf.tables_initializer()
sess = tf.Session()
sess.run([var_init, table_init])
Once the internal state has been initialized, you can run inputs like any other tf.Tensor:
###
print(sess.run(inputs))
This shows how the feature columns have packed the input vectors, with the one-hot "department" as the first two indices and "sales" as the third:
[[ 1. 0. 5.]
[ 1. 0. 10.]
[ 0. 1. 8.]
[ 0. 1. 9.]]
To optimize a model, you first need to define the loss. We'll use the mean squared error (MSE), a standard loss for regression problems.
While you could do this manually with lower-level math operations, the tf.losses module provides a set of common loss functions.
TensorFlow provides optimizers implementing standard optimization algorithms. These are implemented as subclasses of tf.train.Optimizer. They incrementally change each variable in order to minimize the loss. The simplest optimization algorithm is gradient descent, implemented by tf.train.GradientDescentOptimizer. It modifies each variable according to the magnitude of the derivative of the loss with respect to that variable.
Calling the optimizer's minimize method builds all the graph components necessary for the optimization and returns a training operation. When run, the training op will update variables in the graph.
Since train is an op, not a tensor, it doesn't return a value when run.