Chapter 9 Up and Running with TensorFlow

Reading notes on O'Reilly's Hands-On Machine Learning with Scikit-Learn and TensorFlow.

TensorFlow first defines in Python a graph of computations to perform, and then takes that graph and runs it efficiently using optimized C++ code.

It is possible to break up the graph into several chunks and run them in parallel across multiple CPUs or GPUs. TensorFlow also supports distributed computing, so you can train colossal neural networks on humongous training sets in a reasonable amount of time by splitting the computations across hundreds of servers.

9.1 Installation

Create a virtual environment with the virtualenv command, then activate it:

$ cd $ML_PATH              # your ML working directory (e.g., $HOME/ml)
$ virtualenv env           # create the environment (skip if it already exists)
$ source env/bin/activate

Install TensorFlow

$ pip3 install --upgrade tensorflow

Test your installation

$ python3 -c 'import tensorflow; print(tensorflow.__version__)'
1.0.0

9.2 Creating Your First Graph and Running It in a Session

Create a graph

import tensorflow as tf
x=tf.Variable(3,name="x")
y=tf.Variable(4,name="y")
f=x*x*y+y+2

A TensorFlow session takes care of placing the operations onto devices such as CPUs and GPUs and running them, and it holds all the variable values.

The following code creates a session, initializes the variables, evaluates f, and then closes the session:

sess=tf.Session()
sess.run(x.initializer)
sess.run(y.initializer)
result=sess.run(f)
print(result)#42
sess.close()

Alternative ways of accomplishing the same thing:

with tf.Session() as sess:
    x.initializer.run()    # equivalent to tf.get_default_session().run(x.initializer)
    y.initializer.run()
    result=f.eval()        # equivalent to tf.get_default_session().run(f)
print(result)

init=tf.global_variables_initializer()# prepare an init node
with tf.Session() as sess:
    init.run()# actually initialize all the variables
    result=f.eval()
print(result)

sess=tf.InteractiveSession()# sets itself as the default session
init.run()
result=f.eval()
print(result)
sess.close()

A TensorFlow program is typically split into two parts: the first part builds a computation graph (this is called the construction phase), and the second part runs it (this is the execution phase). The construction phase typically builds a computation graph representing the ML model and the computations required to train it. The execution phase generally runs a loop that evaluates a training step repeatedly (for example, one step per mini-batch), gradually improving the model parameters.
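
As a tiny illustration of the two phases (my own sketch, not from the book), the following builds a one-node "training step" and then runs it in a loop:

import tensorflow as tf

# construction phase: build the graph, nothing is computed yet
x = tf.Variable(0, name="x")
training_step = tf.assign(x, x + 1)   # stands in for a real training op
init = tf.global_variables_initializer()

# execution phase: run the graph
with tf.Session() as sess:
    sess.run(init)
    for step in range(3):             # e.g., one step per mini-batch
        sess.run(training_step)
    print(x.eval())                   # 3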

9.3 Managing Graphs

Any node you create is automatically added to the default graph:

x1=tf.Variable(1)
x1.graph is tf.get_default_graph()#True

You can manage multiple independent graphs by creating a new Graph and temporarily making it the default graph inside a with block:

graph=tf.Graph()
with graph.as_default():
    x2=tf.Variable(2)
x2.graph is graph#True
x2.graph is tf.get_default_graph()#False

Resetting the default graph:

tf.reset_default_graph()
x1.graph is tf.get_default_graph()#False

9.4 Lifecycle of a Node Value

When you evaluate a node, TensorFlow automatically determines the set of nodes that it depends on and it evaluates these nodes first.

w=tf.constant(3)
x=w+2
y=x+5
z=x*3

with tf.Session() as sess:
    print(y.eval())#10
    print(z.eval())#15

All node values are dropped between graph runs, except variable values, which are maintained by the session across graph runs. A variable starts its life when its initializer is run, and it ends when the session is closed.

It is more efficient to evaluate y and z in a single graph run, so that w and x are not evaluated twice:

with tf.Session() as sess:
    y_val,z_val=sess.run([y,z])
    print(y_val)#10
    print(z_val)#15
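
To see the variable lifetime described above (a small sketch of my own, assuming a fresh graph):

counter = tf.Variable(0, name="counter")
increment = tf.assign_add(counter, 1)

with tf.Session() as sess:
    sess.run(counter.initializer)  # the variable starts its life here
    sess.run(increment)            # first graph run
    sess.run(increment)            # second graph run
    print(counter.eval())          # 2: the value was kept between runs
# the session is closed here, so the variable value is gone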

In single-process TensorFlow, multiple sessions do not share any state, even if they reuse the same graph (each session would have its own copy of every variable).
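
A quick way to check this (my own sketch): the same variable initialized in two sessions gives two independent copies.

w = tf.Variable(10)
set_w = tf.assign(w, 20)

sess1 = tf.Session()
sess2 = tf.Session()
sess1.run(w.initializer)
sess2.run(w.initializer)
sess1.run(set_w)
print(sess1.run(w))  # 20
print(sess2.run(w))  # 10: sess2 has its own copy of w
sess1.close()
sess2.close()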

9.5 Linear Regression with TensorFlow

Using the Normal Equation to compute $\hat{\theta}$:

import numpy as np
from sklearn.datasets import fetch_california_housing

housing =fetch_california_housing()
m,n=housing.data.shape
housing_data_plus_bias= np.c_[np.ones((m,1)),housing.data]

X=tf.constant(housing_data_plus_bias,dtype=tf.float32,name="X")
y=tf.constant(housing.target.reshape(-1,1),dtype=tf.float32,name="y")
# -1 means "unspecified"
XT=tf.transpose(X)
theta=tf.matmul(tf.matmul(tf.matrix_inverse(tf.matmul(XT,X)),XT),y)

with tf.Session() as sess:
    theta_value=theta.eval()
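
As a sanity check (my own addition, not in the book's listing), the same Normal Equation with plain NumPy should give almost the same result, up to float32 precision:

X_np = housing_data_plus_bias
y_np = housing.target.reshape(-1, 1)
theta_numpy = np.linalg.inv(X_np.T.dot(X_np)).dot(X_np.T).dot(y_np)
print(theta_value)   # from the TensorFlow run above
print(theta_numpy)   # should be very close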

9.6 Implementing Gradient Descent

When using Gradient Descent, remember that it is important to first normalize the input feature vectors, or else training may be much slower.

from sklearn.preprocessing import StandardScaler
scaler=StandardScaler()
scaled_housing_data=scaler.fit_transform(housing.data)
scaled_housing_data_plus_bias=np.c_[np.ones((m,1)),scaled_housing_data]

9.6.1 Manually Computing the Gradients

n_epochs=1000
learning_rate=0.01

X=tf.constant(scaled_housing_data_plus_bias,dtype=tf.float32,name="X")
y=tf.constant(housing.target.reshape(-1,1),dtype=tf.float32,name="y")
theta=tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0),name="theta")
#creates a node in the graph that will generate a tensor containing random
#values, given its shape and value range, much like NumPy's rand() function.
y_pred=tf.matmul(X,theta,name="predictions")
error=y_pred-y
mse=tf.reduce_mean(tf.square(error),name="mse")
#Computes the mean of elements across dimensions of a tensor.
#Reduces `input_tensor` along the dimensions given in `axis`.
gradients=2/m*tf.matmul(tf.transpose(X),error)#Equation 4-6
training_op=tf.assign(theta,theta-learning_rate*gradients)
#creates a node that will assign a new value to a variable.

init=tf.global_variables_initializer()

with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        if epoch%100==0:
            print("Epoch",epoch,"MSE=",mse.eval())
        sess.run(training_op)
    best_theta=theta.eval()
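
For reference, the gradients line above implements Equation 4-6, the gradient vector of the MSE cost function:

$\nabla_{\boldsymbol{\theta}}\,\mathrm{MSE}(\boldsymbol{\theta}) = \frac{2}{m}\mathbf{X}^T(\mathbf{X}\boldsymbol{\theta} - \mathbf{y})$

where error = Xθ − y in the code.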

9.6.2 Using autodiff

Symbolic differentiation can automatically find the equations for the partial derivatives, but the resulting code is not necessarily very efficient (see Appendix D, Autodiff).

TensorFlow's autodiff feature can compute the gradients automatically and efficiently. Replace the gradients=... line in Section 9.6.1 with the following line:

gradients = tf.gradients(mse, [theta])[0]

The gradients() function takes an op (in this case mse) and a list of variables (in this case just theta), and it creates a list of ops (one per variable) to compute the gradients of the op with regard to each variable. So the gradients node will compute the gradient vector of the MSE with regard to theta.
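
A tiny check of autodiff (my own sketch, assuming a fresh graph): the gradient of a^3 with respect to a at a = 3 should be 3a^2 = 27.

a = tf.Variable(3.0)
f = a * a * a                      # f = a^3
grad = tf.gradients(f, [a])[0]     # df/da = 3a^2
with tf.Session() as sess:
    sess.run(a.initializer)
    print(grad.eval())             # 27.0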

9.6.3 Using an Optimizer

Replace the gradients=... and training_op=... lines in Section 9.6.1 with the following lines:

optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)
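
Note that minimize() both computes and applies the gradients; a rough sketch (my addition) of the equivalent two-step form:

grads_and_vars = optimizer.compute_gradients(mse)   # list of (gradient, variable) pairs
training_op = optimizer.apply_gradients(grads_and_vars)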

To use a momentum optimizer instead:

optimizer = tf.train.MomentumOptimizer(learning_rate=learning_rate,
                                       momentum=0.9)

9.7 Feeding Data to the Training Algorithm

Placeholder nodes don't actually perform any computation; they just output the data you tell them to output at runtime. They are typically used to pass the training data to TensorFlow during training. If you don't specify a value at runtime for a placeholder, you get an exception.

A=tf.placeholder(tf.float32,shape=(None,3))#None means "any size"
B=A+5
with tf.Session() as sess:
    B_val_1=B.eval(feed_dict={A:[[1,2,3]]})
    #pass a feed_dict to the eval() method that specifies the value of A.
    B_val_2=B.eval(feed_dict={A:[[4,5,6],[7,8,9]]})
print(B_val_1)
print(B_val_2)
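
Since B = A + 5 element-wise, the printed results should be roughly (my own run, not shown in the notes):

[[6. 7. 8.]]
[[ 9. 10. 11.]
 [12. 13. 14.]]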

Mini-batch Gradient Descent:

n_epochs=1000
learning_rate=0.01
batch_size=100
n_batches=int(np.ceil(m/batch_size))

X=tf.placeholder(dtype=tf.float32,shape=(None,n+1),name="X")
y=tf.placeholder(dtype=tf.float32,shape=(None,1),name="y")
theta=tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0),name="theta")
y_pred=tf.matmul(X,theta,name="predictions")
error=y_pred-y
mse=tf.reduce_mean(tf.square(error),name="mse")
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
training_op = optimizer.minimize(mse)

init=tf.global_variables_initializer()
def fetch_batch(epoch,batch_index,batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # not shown in the book
    indices = np.random.randint(m, size=batch_size)  # not shown
    X_batch = scaled_housing_data_plus_bias[indices] # not shown
    y_batch = housing.target.reshape(-1, 1)[indices] # not shown
    return X_batch,y_batch
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch,y_batch=fetch_batch(epoch,batch_index,batch_size)
            sess.run(training_op,feed_dict={X:X_batch,y:y_batch})
        if epoch%100==0:
            #evaluate the MSE on the last mini-batch of the epoch
            print("Epoch",epoch,"MSE=",mse.eval(feed_dict={X:X_batch,y:y_batch}))
    best_theta=theta.eval()

We don't need to pass the value of X and y when evaluating theta since it does not depend on either of them.

9.8 Saving and Restoring Models

TensorFlow makes saving and restoring a model very easy. Just create a Saver node at the end of the construction phase (after all variable nodes are created); then, in the execution phase, call its save() method whenever you want to save the model, passing it the session and the path of the checkpoint file:

[...]
theta = tf.Variable(tf.random_uniform([n + 1, 1], -1.0, 1.0), name="theta")
[...]
init = tf.global_variables_initializer()
saver = tf.train.Saver()

with tf.Session() as sess:
    sess.run(init)
    for epoch in range(n_epochs):
        if epoch % 100 == 0: # checkpoint every 100 epochs
            save_path = saver.save(sess, "/tmp/my_model.ckpt")
        sess.run(training_op)
    best_theta = theta.eval()
    save_path = saver.save(sess, "/tmp/my_model_final.ckpt")

Restoring a model is just as easy: create a Saver at the end of the construction phase just like before, but at the beginning of the execution phase, instead of initializing the variables using the init node, call the restore() method of the Saver object:

with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")
    [...]

By default a Saver saves and restores all variables under their own names, but if you need more control, you can specify which variables to save or restore and which names to use. For example, the following Saver will save or restore only the theta variable under the name weights:

saver = tf.train.Saver({"weights": theta})
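
By default the saver also writes the graph structure to a .meta file, so you can restore a model without rebuilding the graph by hand; a sketch (my addition, assuming the checkpoint saved above):

saver = tf.train.import_meta_graph("/tmp/my_model_final.ckpt.meta")  # loads the graph structure
theta = tf.get_default_graph().get_tensor_by_name("theta:0")
with tf.Session() as sess:
    saver.restore(sess, "/tmp/my_model_final.ckpt")                  # restores the variable values
    print(theta.eval())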

9.9 Visualizing the Graph and Training Curves Using TensorBoard

from datetime import datetime
now=datetime.utcnow().strftime("%Y%m%d%H%M%S")
root_logdir="tf_logs"
logdir="{}/run-{}/".format(root_logdir,now)

n_epochs=1000
learning_rate=0.01
batch_size=100
n_batches=int(np.ceil(m/batch_size))

X=tf.placeholder(dtype=tf.float32,shape=(None,n+1),name="X")
y=tf.placeholder(dtype=tf.float32,shape=(None,1),name="y")
theta=tf.Variable(tf.random_uniform([n+1,1],-1.0,1.0),name="theta")
y_pred=tf.matmul(X,theta,name="predictions")
error=y_pred-y
mse=tf.reduce_mean(tf.square(error),name="mse")
gradients=2/m*tf.matmul(tf.transpose(X),error)#Equation 4-6
training_op=tf.assign(theta,theta-learning_rate*gradients)

#creates a node that will evaluate the MSE value and write it
#to a TensorBoard-compatible binary log string called a summary. 
mse_summary=tf.summary.scalar('MSE',mse)
#write summaries to logfiles in the log directory
file_writer=tf.summary.FileWriter(logdir,tf.get_default_graph())

init=tf.global_variables_initializer()
def fetch_batch(epoch,batch_index,batch_size):
    np.random.seed(epoch * n_batches + batch_index)  # not shown in the book
    indices = np.random.randint(m, size=batch_size)  # not shown
    X_batch = scaled_housing_data_plus_bias[indices] # not shown
    y_batch = housing.target.reshape(-1, 1)[indices] # not shown
    return X_batch,y_batch
with tf.Session() as sess:
    sess.run(init)
    
    for epoch in range(n_epochs):
        for batch_index in range(n_batches):
            X_batch,y_batch=fetch_batch(epoch,batch_index,batch_size)
            if batch_index % 10 == 0:
                summary_str = mse_summary.eval(feed_dict={X: X_batch, y: y_batch})
                step = epoch * n_batches + batch_index
                file_writer.add_summary(summary_str, step)
            sess.run(training_op,feed_dict={X:X_batch,y:y_batch})
        if epoch%100==0:
            #evaluate the MSE on the last mini-batch of the epoch
            print("Epoch",epoch,"MSE=",mse.eval(feed_dict={X:X_batch,y:y_batch}))
    best_theta=theta.eval()
    file_writer.close()

Start the TensorBoard web server, which listens on port 6006. Use the full path to the log directory if TensorBoard fails to find your runs:

$ tensorboard --logdir C:\ProgramData\Anaconda3\envs\tensorflow\HML\tf_logs\

9.10 Name Scopes

Create name scopes to group related nodes. Here error and mse are defined within the name scope loss:

with tf.name_scope("loss") as scope:
    error=y_pred-y
    mse=tf.reduce_mean(tf.square(error),name="mse")
print(error.op.name)
#The error tensor is not given an explicit name, so it takes the name of the subtraction op
#loss/sub
print(mse.op.name)
#loss/mse

9.11 Modularity

A Rectified Linear Unit (ReLU) computes a linear function of the inputs and outputs the result if it is positive, and 0 otherwise.

Equation 9-1. Rectified linear unit
$h_{\mathbf{w},b}(\mathbf{X}) = \max(\mathbf{X} \cdot \mathbf{w} + b, 0)$
Create a graph that adds the output of five ReLUs. The first ReLU contains nodes named “weights”, “bias”, “z”, and “relu” (plus many more nodes with their default name, such as “MatMul”); the second ReLU contains nodes named “weights_1”, “bias_1”, and so on; the third ReLU contains nodes named “weights_2”, “bias_2”, and so on.

def relu(X):
    w_shape=(int(X.get_shape()[1]),1)
    w=tf.Variable(tf.random_normal(w_shape),name="weights")
    b=tf.Variable(0.0,name="bias")
    z=tf.add(tf.matmul(X,w),b,name="z")
    return tf.maximum(z,0.,name="relu")

n_features=3
X=tf.placeholder(tf.float32,shape=(None,n_features),name="X")
relus=[relu(X) for i in range(5)]
output=tf.add_n(relus, name="output")
#compute the sum of a list of tensors

When you create a node, TensorFlow checks whether its name already exists, and if it does it appends an underscore followed by an index to make the name unique.
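
For example (a small sketch of my own):

a = tf.constant(1, name="value")
b = tf.constant(2, name="value")   # name already taken
print(a.op.name)  # value
print(b.op.name)  # value_1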

Moving all the content of the relu() function inside a name scope makes the graph much clearer. Only the name scopes are suffixed with _1, _2, and so on; the node names "weights", "bias", and "z" stay the same under the different name scopes.

def relu(X):
    with tf.name_scope("relu"):
        [...]
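
Spelled out (combining the relu() body above with the name scope), this is roughly:

def relu(X):
    with tf.name_scope("relu"):
        w_shape=(int(X.get_shape()[1]),1)
        w=tf.Variable(tf.random_normal(w_shape),name="weights")
        b=tf.Variable(0.0,name="bias")
        z=tf.add(tf.matmul(X,w),b,name="z")
        return tf.maximum(z,0.,name="relu")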

9.12 Sharing Variables

If you want to share a variable between various components of your graph, there are several options:

  • One simple option is to create it first, then pass it as a parameter to the functions that need it:
def relu(X,threshold):
    with tf.name_scope("relu"):
        w_shape=(int(X.get_shape()[1]),1)
        w=tf.Variable(tf.random_normal(w_shape),name="weights")
        b=tf.Variable(0.0,name="bias")
        z=tf.add(tf.matmul(X,w),b,name="z")
        return tf.maximum(z,threshold,name="max")

n_features=3
threshold=tf.Variable(0.0,name="threshold")
X=tf.placeholder(tf.float32,shape=(None,n_features),name="X")
relus=[relu(X,threshold) for i in range(5)]
output=tf.add_n(relus, name="output")
  • Another option is to create a Python dictionary containing all the variables in the model and pass it around to every function.
  • Another option is to create a class for each module (e.g., a ReLU class using a class variable to handle the shared parameter).
  • Yet another option is to set the shared variable as an attribute of the relu() function upon the first call:
def relu(X):
    with tf.variable_scope("relu"):
        if not hasattr(relu,"threshold"):
            relu.threshold= tf.Variable(0.0,name="threshold")
        w_shape=(int(X.get_shape()[1]),1)
        w=tf.Variable(tf.random_normal(w_shape),name="weights")
        b=tf.Variable(0.0,name="bias")
        z=tf.add(tf.matmul(X,w),b,name="z")
        return tf.maximum(z,relu.threshold,name="max")
  • Finally, you can use the get_variable() function to create the shared variable if it does not exist yet, or reuse it if it already exists. Whether it creates or reuses is controlled by the reuse attribute of the current variable_scope(), which is False by default.
#create
with tf.variable_scope("relu"):
    threshold=tf.get_variable("threshold",shape=(),
                              initializer=tf.constant_initializer(0.0))
#reuse
with tf.variable_scope("relu", reuse=True):
    threshold = tf.get_variable("threshold")

or

#reuse
with tf.variable_scope("relu") as scope:
    scope.reuse_variables()
    threshold = tf.get_variable("threshold")

Combining creation and reuse, we have:

import tensorflow as tf
n_features=3
def relu(X):
    with tf.variable_scope("relu", reuse=True):
        threshold = tf.get_variable("threshold") # reuse existing variable
        print(threshold.op.name)
        w_shape=(int(X.get_shape()[1]),1)
        w=tf.Variable(tf.random_normal(w_shape),name="weights")
        print(w.op.name)
        b=tf.Variable(0.0,name="bias")
        z=tf.add(tf.matmul(X,w),b,name="z")
        return tf.maximum(z, threshold, name="max")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
with tf.variable_scope("relu"): # create the variable
    threshold = tf.get_variable("threshold", shape=(),initializer=tf.constant_initializer(0.0))
relus = [relu(X) for relu_index in range(5)]
output = tf.add_n(relus, name="output")

output:

relu/threshold
relu_1/weights
relu/threshold
relu_2/weights
relu/threshold
relu_3/weights
relu/threshold
relu_4/weights
relu/threshold
relu_5/weights

A cleaner solution is to move the creation of threshold into relu():

import tensorflow as tf
n_features=3
def relu(X):
    threshold = tf.get_variable("threshold", shape=(),
                                initializer=tf.constant_initializer(0.0))
    print(threshold.op.name)
    w_shape=(int(X.get_shape()[1]),1)
    w=tf.Variable(tf.random_normal(w_shape),name="weights")
    print(w.op.name)
    b=tf.Variable(0.0,name="bias")
    z=tf.add(tf.matmul(X,w),b,name="z")
    return tf.maximum(z, threshold, name="max")
X = tf.placeholder(tf.float32, shape=(None, n_features), name="X")
relus = []
for relu_index in range(5):
    with tf.variable_scope("relu", reuse=(relu_index >= 1)) as scope:
        relus.append(relu(X))
output = tf.add_n(relus, name="output")

output:

relu/threshold
relu/weights
relu/threshold
relu_1/weights
relu/threshold
relu_2/weights
relu/threshold
relu_3/weights
relu/threshold
relu_4/weights