1. Graph operators
1.1 collection
tf.add_to_collection(name, value)
: stores value in the collection with the given name
tf.get_collection(key)
: returns the list of values in the collection with the given name
tf.add_to_collection('losses', cross_entropy_mean)
# The total loss is defined as the cross entropy loss plus all of the weight
# decay terms (L2 loss).
return tf.add_n(tf.get_collection('losses'), name='total_loss')
tf.add_n(inputs)
: Adds all input tensors element-wise.
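The collection pattern above can be imitated in plain Python to see what the three calls do. This is only a toy sketch, not TensorFlow's implementation: the `_collections` dict and the helper functions below are hypothetical stand-ins that copy the documented behavior (named lists of values, element-wise sum), and the loss values are made up.

```python
from collections import defaultdict

# Hypothetical stand-in for the graph's collections: name -> list of values.
_collections = defaultdict(list)

def add_to_collection(name, value):
    """Append value to the collection called name."""
    _collections[name].append(value)

def get_collection(key):
    """Return a copy of the list of values stored under key."""
    return list(_collections[key])

def add_n(inputs):
    """Element-wise sum of equally sized lists of numbers."""
    return [sum(column) for column in zip(*inputs)]

add_to_collection('losses', [0.5, 1.0])    # made-up cross entropy term
add_to_collection('losses', [0.25, 0.25])  # made-up weight decay term
total_loss = add_n(get_collection('losses'))
print(total_loss)
```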
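Common numerical operations
- absolute value:
tf.abs
- maximum:
tf.reduce_max
- tf.maximum: usage tf.maximum(a, b), returns the element-wise maximum of a and b
- tf.minimum: usage tf.minimum(a, b), returns the element-wise minimum of a and b
- tf.argmax: usage tf.argmax(a, dimension), returns the index of the largest value of a along the given dimension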
import tensorflow as tf
a = [1, 5, 3]
f1 = tf.maximum(a, 3)
f2 = tf.minimum(a, 3)
f3 = tf.argmax(a, 0)
f4 = tf.argmin(a, 0)
with tf.Session() as sess:
    print(sess.run(f1))  # or print(f1.eval())
    print(sess.run(f2))
    print(sess.run(f3))
    print(sess.run(f4))
#### Results
[3 5 3]
[1 3 3]
1
0
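The same results can be checked without a session by using NumPy, whose functions of the same names behave the same way for this input. This is a NumPy stand-in for illustration, not the TensorFlow ops themselves.

```python
import numpy as np

a = np.array([1, 5, 3])
f1 = np.maximum(a, 3)       # element-wise maximum against the scalar 3
f2 = np.minimum(a, 3)       # element-wise minimum against the scalar 3
f3 = np.argmax(a, axis=0)   # index of the largest entry
f4 = np.argmin(a, axis=0)   # index of the smallest entry
f5 = np.max(a)              # reduction over all entries, like tf.reduce_max
print(f1, f2, f3, f4, f5)
```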
condition
tf.where
tf.where(
condition,
x=None,
y=None,
name=None
)
# The condition tensor acts as a mask that chooses, based on the value at each element, whether the corresponding element / row in the output should be taken from x (if true) or y (if false).
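The masking behavior described above can be tried with `np.where`, which follows the same convention. This is a NumPy sketch with made-up arrays, not the TensorFlow op.

```python
import numpy as np

condition = np.array([True, False, True, False])
x = np.array([1, 2, 3, 4])
y = np.array([10, 20, 30, 40])

# Take the element from x where condition is True, from y where it is False.
selected = np.where(condition, x, y)

# With only a condition, the positions of the True entries are returned.
indices = np.where(condition)[0]
print(selected, indices)
```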
LOW LEVEL API
1. Introduction
1.1 data -> tensor
- rank: number of dimensions
- shape: a tuple of integers specifying the array’s length along each dimension
- A tensor consists of a set of primitive values shaped into an array of any number of dimensions
- TensorFlow uses numpy arrays to represent tensor values.
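Since tensor values are represented as NumPy arrays, rank and shape can be inspected directly on an array. A small sketch with made-up values:

```python
import numpy as np

scalar = np.array(3.0)                                  # rank 0, shape ()
vector = np.array([1.0, 2.0, 3.0])                      # rank 1, shape (3,)
matrix = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # rank 2, shape (2, 3)

for value in (scalar, vector, matrix):
    # ndim is the rank, shape is the length along each dimension.
    print(value.ndim, value.shape)
```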
1.2 computation
- building the computational graph (tf.Graph)
- running the computational graph (tf.Session)
- Graph: A computational graph is a series of TensorFlow operations arranged into a graph
- Operations(“ops”): nodes of graph
- Tensors: edges in the graph
- tf.Tensors do not have values, they are just handles to elements in the computation graph
1.3 Session
- multiple tensors can be passed to tf.Session.run; the run method transparently handles any combination of tuples or dictionaries
sess = tf.Session()
print(sess.run({'ab':(a, b), 'total':total}))
# {'total': 7.0, 'ab': (3.0, 4.0)}
- a tf.Tensor has a single, consistent value during one run call; a new run call may produce a new value
vec = tf.random_uniform(shape=(3,))  # produces a new random 3-element vector on each run call
out1 = vec + 1
out2 = vec + 2
print(sess.run(vec))
print(sess.run(vec))
print(sess.run((out1, out2)))
## results
[ 0.52917576 0.64076328 0.68353939]
[ 0.66192627 0.89126778 0.06254101]
(
array([ 1.88408756, 1.87149239, 1.84057522], dtype=float32),
array([ 2.88408756, 2.87149239, 2.84057522], dtype=float32)
)
1.4 Feeding
- the feed_dict argument can be used to overwrite any tensor in the graph
- The only difference between placeholders and other tf.Tensors is that placeholders throw an error if no value is fed to them.
a = tf.constant(1)
b = tf.constant(2)
total = a + b
sess.run(total, feed_dict={a: 4})
>> 6
1.5 layers
2. Tensor
- tf.Variable
- tf.constant
- tf.placeholder
- tf.SparseTensor
2.1 shape
zeros = tf.zeros(my_matrix.shape[1])
2.2 data type
float_tensor = tf.cast(tf.constant([1, 2, 3]), dtype=tf.float32)
3. Variable
- represent shared, persistent state manipulated by your program.
- A tf.Variable represents a tensor whose value can be changed by running ops on it.
- Unlike tf.Tensor objects, a tf.Variable exists outside the context of a single session.run call.
3.1 creating
- defaults: dtype tf.float32, initializer tf.glorot_uniform_initializer
my_variable = tf.get_variable("my_variable", [1, 2, 3])
# initialize with a specified dtype and initializer
my_int_variable = tf.get_variable("my_int_variable", [1, 2, 3], dtype=tf.int32,
initializer=tf.zeros_initializer)
# initialize using specified value
other_variable = tf.get_variable("other_variable", dtype=tf.int32,
initializer=tf.constant([23, 42]))
3.2 variable collections
- named lists of tensors or other objects
- every tf.Variable gets placed in the following two collections
  - tf.GraphKeys.GLOBAL_VARIABLES: variables that can be shared across multiple devices
  - tf.GraphKeys.TRAINABLE_VARIABLES: variables for which TensorFlow will calculate gradients
- to keep a variable out of training, add it to tf.GraphKeys.LOCAL_VARIABLES or pass trainable=False
my_local = tf.get_variable("my_local", shape=(),
collections=[tf.GraphKeys.LOCAL_VARIABLES])
# or
my_non_trainable = tf.get_variable("my_non_trainable",
shape=(),
trainable=False)
- use own collection
# no need to explicitly create a collection
tf.add_to_collection("my_collection_name", my_local)
# retrieve a list of all the variables
tf.get_collection("my_collection_name")
3.3 initializing
- tf.global_variables_initializer
- initializing all variables in the tf.GraphKeys.GLOBAL_VARIABLES collection
3.4 using, assigning
- treat it like a normal tf.Tensor
- To assign a value to a variable, use the methods assign, assign_add
v = tf.get_variable("v", shape=(), initializer=tf.zeros_initializer())
assignment = v.assign_add(1)
sess.run(tf.global_variables_initializer())
sess.run(assignment)  # or, with a default session, assignment.op.run() or assignment.eval()
3.5 sharing
- Implicitly wrapping tf.Variable objects within tf.variable_scope objects.
def conv_relu(input, kernel_shape, bias_shape):
    # Create variable named "weights".
    weights = tf.get_variable("weights", kernel_shape,
                              initializer=tf.random_normal_initializer())
    # Create variable named "biases".
    biases = tf.get_variable("biases", bias_shape,
                             initializer=tf.constant_initializer(0.0))
    conv = tf.nn.conv2d(input, weights,
                        strides=[1, 1, 1, 1], padding='SAME')
    return tf.nn.relu(conv + biases)

def my_image_filter(input_images):
    with tf.variable_scope("conv1"):
        # Variables created here will be named "conv1/weights", "conv1/biases".
        relu1 = conv_relu(input_images, [5, 5, 32, 32], [32])
    with tf.variable_scope("conv2"):
        # Variables created here will be named "conv2/weights", "conv2/biases".
        return conv_relu(relu1, [5, 5, 32, 32], [32])
- reusing
def my_variable_sharing():
    # Variable sharing: use either option 1 or option 2, not both in the same graph.
    # Option 1: call scope.reuse_variables() before the second use.
    with tf.variable_scope("model") as scope:
        output1 = my_image_filter(input1)
        scope.reuse_variables()
        output2 = my_image_filter(input2)
    # Option 2: reopen the captured scope with reuse=True.
    with tf.variable_scope("model") as scope:
        output1 = my_image_filter(input1)
    with tf.variable_scope(scope, reuse=True):
        output2 = my_image_filter(input2)
4. Graphs and Sessions
- tf.Operation: node
- tf.Tensor: edge
4.1 Naming operation
tf.Tensor objects are implicitly named after the tf.Operation that produces the tensor as output. A tensor name has the form "<OP_NAME>:<i>" where:
- "<OP_NAME>" is the name of the operation that produces it.
- "<i>" is an integer representing the index of that tensor among the operation's outputs.
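The naming rule can be made concrete by splitting a tensor name back into its two parts. The helper `split_tensor_name` is hypothetical (it is not a TensorFlow function) and the names passed to it are just examples of the "<OP_NAME>:<i>" form.

```python
def split_tensor_name(tensor_name):
    """Split '<OP_NAME>:<i>' into (op_name, output_index)."""
    # rsplit from the right so that only the trailing ':<i>' is separated.
    op_name, index = tensor_name.rsplit(":", 1)
    return op_name, int(index)

print(split_tensor_name("conv1/weights:0"))
print(split_tensor_name("two/v_1:0"))
```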
4.2 Tensor-like objects
- operations take one or more tf.Tensor objects as arguments
- these functions will accept a tensor-like object in place of a tf.Tensor, and implicitly convert it to a tf.Tensor using the tf.convert_to_tensor method
- Tensor-like objects
  - tf.Variable
  - numpy.ndarray
  - list
  - scalar Python types
- TensorFlow will create a new tf.Tensor each time you use the same tensor-like object. If the object is large and you use it multiple times, you may run out of memory.
  - manually call tf.convert_to_tensor on the tensor-like object once and use the returned tf.Tensor instead.
4.3 Session
to do
5. Save
to do
TensorFlow cookbook notes, Chap1-2
Chap1
Tensor
- primary data structure
zero_tsr = tf.zeros([row_dim, col_dim])
ones_tsr = tf.ones([row_dim, col_dim])
# Create a constant filled tensor. Use the following
filled_tsr = tf.fill([row_dim, col_dim], 42)
constant_tsr = tf.constant([1,2,3])
- declare as variables or feed as placeholders
- sequence tensor
  - tf.linspace includes the stop value
  - tf.range excludes the limit value
linear_tsr = tf.linspace(start=0.0, stop=1.0, num=3)
integer_seq_tsr = tf.range(start=6, limit=15, delta=3)
y_vals = np.repeat(10., 100)
x_vals = np.random.normal(1, 0.1, 100)
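The difference between the two sequence tensors above (stop included, limit excluded) can be seen with the NumPy counterparts `np.linspace` and `np.arange`. This is a NumPy sketch, not the TensorFlow ops.

```python
import numpy as np

# Three evenly spaced values from 0 to 1; the stop value is part of the result.
linear = np.linspace(0.0, 1.0, num=3)

# Start at 6 and step by 3 while staying below 15; the limit is not included.
integer_seq = np.arange(6, 15, 3)
print(linear, integer_seq)
```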
- random tensor
- uniform distribution
- normal distribution
randunif_tsr = tf.random_uniform([row_dim, col_dim],
minval=0, maxval=1)
randnorm_tsr = tf.random_normal([row_dim, col_dim],
mean=0.0, stddev=1.0)
- random entries of arrays
shuffled_output = tf.random_shuffle(input_tensor)
cropped_output = tf.random_crop(input_tensor, crop_size)
Placeholders and Variables
- Variables are the parameters of the algorithm and TensorFlow keeps track of how to change these to optimize the algorithm.
- Placeholders are objects that allow you to feed in data of a specific type and shape and depend on the results of the computational graph, such as the expected outcome of a computation.
- Placeholders are just holding the position for data to be fed into the graph. Placeholders get data from a feed_dict argument in the session. To put a placeholder in the graph, we must perform at least one operation on the placeholder.
my_var = tf.Variable(tf.zeros([row_dim, col_dim]))
x_data = tf.placeholder(tf.float32, shape=(3, 5))
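# add1 is an op built on x_data; its definition is not shown in these notes
for x_val in x_vals:
    print(sess.run(add1, feed_dict={x_data: x_val}))
# leave the number of columns unspecified
x_data = tf.placeholder(tf.float32, shape=(3, None))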
notes
- use tf.get_variable instead of tf.Variable in a work environment
  - it will make it way easier to refactor your code if you need to share variables at any time, e.g. in a multi-gpu setting
  - tf.Variable will always create a new variable, whereas tf.get_variable gets an existing variable with those parameters from the graph and, if it does not exist, creates a new one
  - default: xavier initializer
W = tf.get_variable("W", shape=[784, 256], initializer=tf.contrib.layers.xavier_initializer())
with tf.variable_scope("one"):
    a = tf.get_variable("v", [1])  # a.name == "one/v:0"
with tf.variable_scope("one"):
    b = tf.get_variable("v", [1])  # ValueError: Variable one/v already exists
with tf.variable_scope("one", reuse=True):
    c = tf.get_variable("v", [1])  # c.name == "one/v:0"
with tf.variable_scope("two"):
    d = tf.get_variable("v", [1])  # d.name == "two/v:0"
    e = tf.Variable(1, name="v", expected_shape=[1])  # e.name == "two/v_1:0"
assert(a is c)  # Assertion is true, they refer to the same object.
assert(a is d)  # AssertionError: they are different objects
assert(d is e)  # AssertionError: they are different objects
Matrices
tf.diag
tf.convert_to_tensor
identity_matrix = tf.diag([1.0, 1.0, 1.0])
A = tf.truncated_normal([2, 3])
B = tf.fill([2,3], 5.0)
C = tf.random_uniform([3,2])
D = tf.convert_to_tensor(np.array([[1., 2., 3.],[-3., -7.,
-1.],[0., 5., -2.]]))
print(sess.run(identity_matrix))
- tf.matmul(A, B): matrix multiplication (the inner dimensions must match, e.g. tf.matmul(A, C) for the tensors above)
- tf.transpose(C): transpose
- tf.matrix_determinant()
- tf.matrix_inverse()
- tf.self_adjoint_eig(): eigenvalues and eigenvectors
  - outputs the eigenvalues in the first row
  - the eigenvectors in the remaining rows
Operations
- add, sub, mul, div, mod
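  - div(): returns the same type as the inputs, so for integer inputs it really returns the floor of the division
  - truediv(): casts integers to floats before dividing
  - floordiv(): the quotient rounded down to the nearest integer (still a float for float inputs)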
print(sess.run(tf.div(3,4)))
0
print(sess.run(tf.truediv(3,4)))
0.75
print(sess.run(tf.floordiv(3.0,4.0)))
0.0
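Python 3's own operators show the same three behaviors as the outputs above, which may help to remember them. This is plain Python used as an analogy, not the TensorFlow ops.

```python
int_floor = 3 // 4         # integers in, integer out (compare div on integers)
true_quotient = 3 / 4      # always a float (compare truediv)
float_floor = 3.0 // 4.0   # rounded down, but still a float (compare floordiv)
print(int_floor, true_quotient, float_floor)
```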
- customize
def custom_polynomial(value):
    return tf.subtract(3 * tf.square(value), value) + 10
print(sess.run(custom_polynomial(11)))
362
Activation function
- relu: rectified linear unit
- relu6
# max(0,x)
print(sess.run(tf.nn.relu([-3., 3., 10.])))
[ 0. 3. 10.]
# min(max(0,x), 6)
print(sess.run(tf.nn.relu6([-3., 3., 10.])))
[ 0. 3. 6.]
- sigmoid
- not zero-centered, so the data should be zero-meaned before use
print(sess.run(tf.nn.sigmoid([-1., 0., 1.])))
[ 0.26894143 0.5 0.7310586 ]
- hyperbolic tangent
- range between -1 and 1
# (exp(x) - exp(-x)) / (exp(x) + exp(-x))
print(sess.run(tf.nn.tanh([-1., 0., 1.])))
[-0.76159418 0. 0.76159418 ]
- softsign, softplus, ELU
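For reference, the activation functions listed above can be written out in NumPy from their usual definitions. This is a sketch for checking values by hand, not the `tf.nn` implementations; the ELU here assumes the common form with scale 1.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)                      # max(0, x)

def relu6(x):
    return np.minimum(np.maximum(0.0, x), 6.0)     # min(max(0, x), 6)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))                # 1 / (1 + exp(-x))

def softsign(x):
    return x / (1.0 + np.abs(x))                   # x / (1 + |x|)

def softplus(x):
    return np.log1p(np.exp(x))                     # log(exp(x) + 1)

def elu(x):
    return np.where(x < 0, np.exp(x) - 1.0, x)     # exp(x) - 1 if x < 0 else x

values = np.array([-3.0, 3.0, 10.0])
print(relu(values), relu6(values))
print(sigmoid(np.array([-1.0, 0.0, 1.0])), np.tanh(np.array([-1.0, 0.0, 1.0])))
```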
Chap2
Session
import tensorflow as tf
sess = tf.Session()
- the same TensorFlow script can be re-run if we reset the graph first
from tensorflow.python.framework import ops
ops.reset_default_graph()
sess = tf.Session()
Loss Function
- L2 norm
- it is very curved near the target
- algorithms can use this fact to converge to the target more slowly the closer they get
- tf.nn.l2_loss() computes half the squared L2 norm, sum(t ** 2) / 2
l2_y_vals = tf.square(target - x_vals)
l2_y_out = sess.run(l2_y_vals)
- L1 norm
- The L1 norm is better for outliers than the L2 norm because it is not as steep for larger values
- L1 norm is not smooth at the target and this can result in algorithms not converging well
l1_y_vals = tf.abs(target - x_vals)
- Pseudo-Huber
- a continuous and smooth approximation to the Huber loss
- attempts to take the best of the L1 and L2 norms by being convex near the target and less steep for extreme values
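The notes give no code for the Pseudo-Huber loss, so here is a NumPy sketch based on its standard formula, delta^2 * (sqrt(1 + ((target - x) / delta)^2) - 1). The function name `pseudo_huber`, the `delta` default and the sample points are assumptions chosen for illustration.

```python
import numpy as np

def pseudo_huber(target, prediction, delta=1.0):
    """Pseudo-Huber loss: quadratic near the target, close to linear far away."""
    residual = target - prediction
    return delta ** 2 * (np.sqrt(1.0 + (residual / delta) ** 2) - 1.0)

x_vals = np.linspace(-1.0, 1.0, 5)
target = 0.0
l2_y_vals = np.square(target - x_vals)   # L2 loss for comparison
l1_y_vals = np.abs(target - x_vals)      # L1 loss for comparison
ph_y_vals = pseudo_huber(target, x_vals, delta=1.0)
print(l2_y_vals, l1_y_vals, ph_y_vals)
```

Near the target the value behaves like half the squared residual, and for large residuals it grows roughly like delta times the absolute residual, which is the "best of both" behavior described above.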
- Hinge loss
hinge_y_vals = tf.maximum(0., 1. - tf.multiply(target, x_vals))
hinge_y_out = sess.run(hinge_y_vals)
- cross entropy
# binary (two-class) cross entropy: loss = -y log(p) - (1 - y) log(1 - p)
# multi-class form from cs231n: loss = -sum_i y_i log P(i|x)
# with a one-hot label y this reduces to -log P(true class|x)
xentropy_y_vals = - tf.multiply(target, tf.log(x_vals)) - tf.multiply((1. -
target), tf.log(1. - x_vals))
xentropy_y_out = sess.run(xentropy_y_vals)
- sigmoid cross entropy
- weighted cross entropy
- softmax cross entropy
- sparse softmax cross entropy
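The sigmoid, softmax and sparse softmax variants listed above all take unscaled logits rather than probabilities. The NumPy sketch below writes them out from their standard definitions; the function names are chosen here for illustration and are not the `tf.nn` functions, and the weighted variant is left out.

```python
import numpy as np

def sigmoid_cross_entropy(logits, targets):
    # Numerically stable form: max(x, 0) - x * z + log(1 + exp(-|x|)).
    return (np.maximum(logits, 0.0) - logits * targets
            + np.log1p(np.exp(-np.abs(logits))))

def softmax_cross_entropy(logits, one_hot_targets):
    # Subtract the row maximum before exponentiating to avoid overflow.
    shifted = logits - np.max(logits, axis=-1, keepdims=True)
    log_probs = shifted - np.log(np.sum(np.exp(shifted), axis=-1, keepdims=True))
    return -np.sum(one_hot_targets * log_probs, axis=-1)

def sparse_softmax_cross_entropy(logits, class_index):
    # Same loss, but the target is a class index instead of a one-hot row.
    one_hot = np.eye(logits.shape[-1])[class_index]
    return softmax_cross_entropy(logits, one_hot)

logits = np.array([[2.0, 1.0, 0.1]])
dense = softmax_cross_entropy(logits, np.array([[1.0, 0.0, 0.0]]))
sparse = sparse_softmax_cross_entropy(logits, np.array([0]))
print(dense, sparse)
```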
- metrics
- stable: whether smooth near target
- robust: whether sensitive to outliers
- batches
- specific loss function expects batches of data
my_output_expanded = tf.expand_dims(my_output, 0)
y_target_expanded = tf.expand_dims(y_target, 0)
Back Propagation
- minimize loss function
- other optimizers: tf.train.MomentumOptimizer(), tf.train.AdagradOptimizer()
my_opt = tf.train.GradientDescentOptimizer(0.05)
train_step = my_opt.minimize(xentropy)
for i in range(1400):
    rand_index = np.random.choice(100)
    rand_x = [x_vals[rand_index]]
    rand_y = [y_vals[rand_index]]
Batch and Stochastic Training
- mean loss
tf.reduce_mean()
loss = tf.reduce_mean(tf.square(my_output - y_target))
- record loss
loss_batch = []
for i in range(100):
    rand_index = np.random.choice(100, size=batch_size)
    rand_x = np.transpose([x_vals[rand_index]])
    rand_y = np.transpose([y_vals[rand_index]])
    sess.run(train_step, feed_dict={x_data: rand_x, y_target: rand_y})
    if (i+1)%5==0:
        print('Step #' + str(i+1) + ' A = ' + str(sess.run(A)))
        temp_loss = sess.run(loss, feed_dict={x_data: rand_x, y_target: rand_y})
        print('Loss = ' + str(temp_loss))
        loss_batch.append(temp_loss)
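The batch loop above can be followed end to end in NumPy by writing the gradient step by hand. The sketch assumes a one-parameter model `my_output = A * x` with the mean squared loss from these notes; the learning rate, batch size, starting value and seed are assumptions, and the explicit gradient replaces what the optimizer would compute.

```python
import numpy as np

rng = np.random.RandomState(0)           # fixed seed so the run is repeatable
x_vals = rng.normal(1.0, 0.1, 100)
y_vals = np.repeat(10.0, 100)

A = 0.0                                  # assumed starting value
learning_rate = 0.02                     # assumed
batch_size = 20                          # assumed
loss_batch = []
for i in range(100):
    rand_index = rng.choice(100, size=batch_size)
    rand_x = x_vals[rand_index]
    rand_y = y_vals[rand_index]
    # Gradient of mean((A * x - y)^2) with respect to A.
    error = A * rand_x - rand_y
    gradient = np.mean(2.0 * error * rand_x)
    A -= learning_rate * gradient
    if (i + 1) % 5 == 0:
        # Record the batch loss every fifth step.
        loss_batch.append(np.mean(np.square(A * rand_x - rand_y)))

print('A = ' + str(A))
```

Averaging the gradient over a batch is what smooths the recorded losses compared with feeding a single random point per step.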
Evaluation
- regression: an aggregate measure of the distance between predictions and actual targets
- classification: a measure of how close we are to the truth from our predictions
- split train and validation
train_indices = np.random.choice(len(x_vals), round(len(x_vals)*0.8), replace=False)
test_indices = np.array(list(set(range(len(x_vals))) - set(train_indices)))
- prediction operation
y_prediction = tf.squeeze(tf.round(tf.nn.sigmoid(tf.add(x_data, A))))
correct_prediction = tf.equal(y_prediction, y_target)
accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
acc_value_test = sess.run(accuracy, feed_dict={x_data: [x_vals_test], y_target: [y_vals_test]})
acc_value_train = sess.run(accuracy, feed_dict={x_data: [x_vals_train], y_target: [y_vals_train]})
print('Accuracy on train set: ' + str(acc_value_train))
print('Accuracy on test set: ' + str(acc_value_test))
Accuracy on train set: 0.925
Accuracy on test set: 0.95
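The prediction and accuracy operations above amount to a few array steps. The NumPy sketch below mirrors them on four made-up points; the helper name `accuracy`, the inputs and the value of `A` are assumptions for illustration only.

```python
import numpy as np

def accuracy(x, targets, A):
    """Round sigmoid(x + A) to 0/1 and compare with the targets."""
    probabilities = 1.0 / (1.0 + np.exp(-(x + A)))
    predictions = np.round(probabilities)
    correct = (predictions == targets).astype(np.float32)
    return np.mean(correct)

x = np.array([-2.0, -0.5, 0.5, 2.0])
targets = np.array([0.0, 0.0, 1.0, 0.0])   # the last point is deliberately mislabeled
acc = accuracy(x, targets, A=0.0)
print('Accuracy: ' + str(acc))
```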
Others
1. broadcasting
- Broadcasting is the process of making arrays with different shapes have compatible shapes for arithmetic operations.
|1 2 3| + |7 8 9|
|4 5 6|
------------
|1 2 3| + |7 8 9| = |8 10 12|
|4 5 6| |7 8 9| |11 13 15|
-------------
|7| ==> |7 7 7|
|8| |8 8 8|
|9| |9 9 9|
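The two diagrams above can be reproduced with NumPy, which applies the same broadcasting rules. `np.broadcast_to` is used only to make the implicit expansion of the column visible.

```python
import numpy as np

matrix = np.array([[1, 2, 3], [4, 5, 6]])
row = np.array([7, 8, 9])
row_sum = matrix + row                       # the row is repeated for each row of matrix

column = np.array([[7], [8], [9]])
expanded = np.broadcast_to(column, (3, 3))   # each entry is repeated along its row
print(row_sum)
print(expanded)
```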
Miscellaneous
Thoughts
- activation function
- loss function
- optimize method(gradient)
- update method(batch, mini)
Scope
- place manual layer within a named scope so that it is identifiable and collapsible/expandable on the computational graph
with tf.name_scope('Custom_Layer') as scope:
    custom_layer1 = custom_layer(mov_avg_layer)
image
- 4 dimensions
- image number, height, width, and channel
conv2d
- takes a piecewise product of the window and a filter we specify
- an input tensor of shape [batch, in_height, in_width, in_channels]
- a filter / kernel tensor of shape [filter_height, filter_width, in_channels, out_channels]
conv2d(
input,
filter,
strides,
padding,
use_cudnn_on_gpu=True,
data_format='NHWC',
name=None
)
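How the input shape, filter shape, strides and padding combine into an output shape can be worked out with a little arithmetic. The helper `conv2d_output_shape` below is hypothetical (not part of TensorFlow) and assumes the NHWC layout and the usual 'SAME' / 'VALID' size rules: ceil(in / stride) for 'SAME' and ceil((in - filter + 1) / stride) for 'VALID'.

```python
import math

def conv2d_output_shape(input_shape, filter_shape, strides, padding):
    """Output shape of a 2-D convolution for NHWC inputs."""
    batch, in_height, in_width, in_channels = input_shape
    filter_height, filter_width, filter_in_channels, out_channels = filter_shape
    if in_channels != filter_in_channels:
        raise ValueError("input and filter disagree on in_channels")
    _, stride_h, stride_w, _ = strides
    if padding == 'SAME':
        # Zero padding keeps the size at in / stride, rounded up.
        out_height = math.ceil(in_height / stride_h)
        out_width = math.ceil(in_width / stride_w)
    elif padding == 'VALID':
        # No padding: only window positions that fit entirely are used.
        out_height = math.ceil((in_height - filter_height + 1) / stride_h)
        out_width = math.ceil((in_width - filter_width + 1) / stride_w)
    else:
        raise ValueError("padding must be 'SAME' or 'VALID'")
    return [batch, out_height, out_width, out_channels]

print(conv2d_output_shape([1, 28, 28, 32], [5, 5, 32, 32], [1, 1, 1, 1], 'SAME'))
print(conv2d_output_shape([1, 28, 28, 32], [5, 5, 32, 32], [1, 1, 1, 1], 'VALID'))
```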
squeeze
- drops the extra dimensions of our image that are of size 1
- matrix multiplication only operates on two-dimensional matrices
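The effect of squeezing can be seen on a small array with NumPy's `np.squeeze`, which follows the same idea. The 4 x 4 single-channel "image" is made up for illustration.

```python
import numpy as np

conv_output = np.zeros((1, 4, 4, 1))         # [batch, height, width, channels]
squeezed = np.squeeze(conv_output)           # drops every dimension of size 1
product = np.matmul(squeezed, np.eye(4))     # now an ordinary 2-D matrix product
print(conv_output.shape, squeezed.shape, product.shape)
```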
crop
- randomly cropping an image
cropped_image = tf.random_crop(my_image, [height // 2, width // 2, 3])