TensorFlow control flow 1: a first look at control flow

Article link: https://blog.csdn.net/zhenhailiu/article/details/81154516

Control flow

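Control flow means arranging program elements in a particular order to determine the order in which a program executes. Put simply, the if..else/while/case constructs of everyday languages such as C++, Java, and Python are control flow: these statements determine the path a program takes at run time. TensorFlow computation graphs have such constructs too.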

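TensorFlow provides several low-level APIs for embedding control flow in a computation graph. The code snippets below give a brief introduction to these APIs; for details, please refer to the API documentation.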

tf.cond()

Signature: tf.cond(pred, true_fn=None, false_fn=None, strict=False, name=None, fn1=None, fn2=None) Docstring: Return true_fn() if the predicate pred is true else false_fn(). (deprecated arguments)

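Like a conditional statement in an ordinary programming language, tf.cond executes one of two branches depending on whether the input predicate is true. Take the following code as an example: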

import tensorflow as tf
x = tf.constant(2)
y = tf.constant(5)
def f1(): return tf.multiply(x, 17)
def f2(): return tf.add(y, 23)
r = tf.cond(tf.less(x, y), f1, f2)
with tf.Session() as session:
  # No variables are defined here, so nothing needs to be initialized
  # (tf.initialize_all_variables() is deprecated in any case).
  print(r.eval())

Here x < y, so the op sequence built by f1 is executed, and the printed result is 34.

tf.while_loop()

Signature: tf.while_loop(cond, body, loop_vars, shape_invariants=None, parallel_iterations=10, back_prop=True, swap_memory=False, name=None, maximum_iterations=None, return_same_structure=False) Docstring: Repeat body while the condition cond is true.

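Like a while loop in a programming language, tf.while_loop either continues or exits the loop depending on the result of the condition expression. Take the following snippet, which uses TensorFlow to compute the third Fibonacci number, as an example: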

import tensorflow as tf
def cond(a,b,i):
    print("from cond")
    return i<3
def body(a,b,i):
    print("from body")
    return (b,a+b,i+1)
r1,r2,r3=tf.while_loop(cond, body,[1,1,1])
with tf.Session() as sess:
    writer = tf.summary.FileWriter("logs/", sess.graph)
    tf.global_variables_initializer().run()
    print(r1.eval())
    #g=tf.get_default_graph()._as_graph_def()
    #print(g)

The output is:

from cond
from body
2

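Interestingly, cond and body are each executed only once here. As later posts will explain, tf.while_loop calls cond and body a single time to build the computation graph, rather than calling them on every iteration.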
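To make the "build once, run many times" idea concrete, here is a small pure-Python toy. It is only a sketch of the idea and not how TensorFlow is implemented: the names Sym, _value, and toy_while_loop are made up for this illustration. cond and body are called once on symbolic placeholders, which record expressions; the loop then evaluates those recorded expressions repeatedly.

```python
class Sym:
    """Hypothetical symbolic value: records an expression instead of computing it."""

    def __init__(self, fn):
        # fn maps a tuple of concrete loop values to a concrete result
        self.fn = fn

    def __add__(self, other):
        return Sym(lambda env: self.fn(env) + _value(other, env))

    def __lt__(self, other):
        return Sym(lambda env: self.fn(env) < _value(other, env))


def _value(x, env):
    """Evaluate x against concrete loop values; plain Python values pass through."""
    return x.fn(env) if isinstance(x, Sym) else x


def toy_while_loop(cond, body, loop_vars):
    # Build phase: cond and body are each called exactly once, on placeholders.
    placeholders = [Sym(lambda env, k=k: env[k]) for k in range(len(loop_vars))]
    cond_expr = cond(*placeholders)
    body_exprs = body(*placeholders)
    # Run phase: only the recorded expressions are evaluated, once per iteration.
    env = tuple(loop_vars)
    while _value(cond_expr, env):
        env = tuple(_value(e, env) for e in body_exprs)
    return env


calls = {"cond": 0, "body": 0}


def cond(a, b, i):
    calls["cond"] += 1
    return i < 3


def body(a, b, i):
    calls["body"] += 1
    return (b, a + b, i + 1)


result = toy_while_loop(cond, body, [1, 1, 1])
print(result)  # final values of (a, b, i)
print(calls)   # how many times the Python functions were actually called
```

The loop runs for two iterations, yet each Python function is entered only once, which matches the single "from cond" / "from body" pair printed by the TensorFlow snippet.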

tf.case()

Signature: tf.case(pred_fn_pairs, default=None, exclusive=False, strict=False, name='case') Docstring: Create a case operation.

Like case in C/C++, tf.case decides which branch to execute based on conditions. Take the following code as an example:

import tensorflow as tf
x = tf.constant(2)
y = tf.constant(5)
z = tf.constant(8)
def f1(): return tf.constant(17)
def f2(): return tf.constant(23)
def f3(): return tf.constant(-1)
r = tf.case({tf.less(x, y): f1, tf.greater(x, z): f2},default=f3, exclusive=True)
with tf.Session() as sess:
    print(r.eval())

Here x < y is true, so the op sequence built by f1 is executed.

The final output is 17.
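The selection rule of the snippet above can be sketched in plain Python. This is only a model of the semantics as I understand them from the API documentation, not TensorFlow code: toy_case is a hypothetical helper, and its predicates are ordinary Python booleans rather than tensors.

```python
def toy_case(pred_fn_pairs, default, exclusive=False):
    """Hypothetical model of tf.case: run the branch whose predicate is true."""
    true_fns = [fn for pred, fn in pred_fn_pairs if pred]
    if exclusive and len(true_fns) > 1:
        # With exclusive=True, more than one true predicate is treated as an error.
        raise ValueError("more than one predicate is true")
    # Without exclusive, the first true predicate in the list wins.
    return true_fns[0]() if true_fns else default()


x, y, z = 2, 5, 8


def f1(): return 17
def f2(): return 23
def f3(): return -1


r = toy_case([(x < y, f1), (x > z, f2)], default=f3, exclusive=True)
print(r)

# No predicate is true, so the default branch runs.
r_default = toy_case([(x > y, f1), (x > z, f2)], default=f3, exclusive=True)
print(r_default)
```

One difference worth keeping in mind: the toy evaluates its predicates eagerly in Python, while tf.case builds graph ops and the choice is made when the graph runs.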

tf.control_dependencies()

Before starting the discussion, let us look at a piece of code:

i=0
i=i+1
print(i)

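This code prints 1.

The compiler and the hardware recognize that print(i) reads i after i has been assigned, and they guarantee that the instructions for i=i+1 have completed by the time print(i) executes.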

A seemingly equivalent piece of TensorFlow code looks like this:

import tensorflow as tf
a = tf.Variable(0)
assign_op = tf.assign_add(a, 1)
b = tf.identity(a)  # the TF way of reading a variable
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # a must be initialized before it is read
    print(b.eval())

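This code prints 0.

The result may come as a surprise. A traditional compiler and hardware detect read-after-write dependencies when compiling, scheduling, and executing instructions; the TensorFlow API does not. Nothing makes b depend on the assign op, so evaluating b never runs it. TensorFlow provides an API that lets the user specify dependencies between ops explicitly.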

The correct code is:

import tensorflow as tf
a = tf.Variable(0)
assign_op = tf.assign_add(a, 1)
with tf.control_dependencies([assign_op]):
    b = tf.identity(a)  # the TF way of reading a variable
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())  # a must be initialized before it is read
    print(b.eval())

This code prints 1.
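The difference between the two TensorFlow snippets can be imitated with a tiny pure-Python graph executor. This is a rough sketch under my own assumptions, not TensorFlow's executor: Op and run are hypothetical names, and the "variable" is just a dict entry. Fetching an op runs only the ops it depends on, through data inputs or control inputs.

```python
class Op:
    """Hypothetical graph node with data inputs and control inputs."""

    def __init__(self, name, fn, inputs=(), control_inputs=()):
        self.name = name
        self.fn = fn
        self.inputs = list(inputs)
        self.control_inputs = list(control_inputs)


def run(fetch, log):
    """Run fetch and everything it depends on; nothing else is executed."""
    done = {}

    def visit(op):
        if op in done:
            return done[op]
        for c in op.control_inputs:
            visit(c)  # must finish first, but its value is not passed on
        args = [visit(i) for i in op.inputs]
        done[op] = op.fn(*args)
        log.append(op.name)
        return done[op]

    return visit(fetch)


state = {"a": 0}  # stands in for the variable a


def assign_add():
    state["a"] += 1
    return state["a"]


def read():
    return state["a"]


assign_op = Op("assign_add", assign_add)
b_plain = Op("read", read)  # no link to assign_op
b_dep = Op("read_after_assign", read, control_inputs=[assign_op])

log_plain = []
value_plain = run(b_plain, log_plain)
print(value_plain, log_plain)

log_dep = []
value_dep = run(b_dep, log_dep)
print(value_dep, log_dep)
```

In the first run the assign op is never visited because nothing the fetched op depends on leads to it; in the second run the control input forces it to execute before the read.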

tf.group()

Signature: tf.group(*inputs, **kwargs) Docstring: Create an op that groups multiple operations.

When this op finishes, all ops in inputs have finished. This op has no output.

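tf.group is not control flow in the strict sense; it is better described as a helper function. It is generally used to bundle several ops that have side effects. A typical use is inside optimizers. One training step of an optimizer usually has three stages: compute the gradients, apply some processing to the gradients, and finally use the processed gradients to update the model weights. The train_op that an optimizer returns is usually the result of calling tf.group: this op does nothing itself and merely bundles the ops that update the weights. Take the following code as an example.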

import tensorflow as tf
a = tf.Variable(3)
b = tf.Variable(4)
c = tf.Variable(5)
d =tf.assign_add(a,1)
e =tf.assign_add(b,1)
f =tf.assign_add(c,1)
ops = tf.group(d,e,f)
with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    print("a before ops:%s"%a.eval())
    print("b before ops:%s"%b.eval())
    print("c before ops:%s"%c.eval())
    print("run the group op")
    sess.run(ops)
    print("a after ops:%s"%a.eval())
    print("b after ops:%s"%b.eval())
    print("c after ops:%s"%c.eval())

After ops is run, a, b, and c have each increased by 1; in other words, d, e, and f were each executed once.

The final output is:

a before ops:3
b before ops:4
c before ops:5
run the group op
a after ops:4
b after ops:5
c after ops:6

tf.tuple()

Signature: tf.tuple(tensors, name=None, control_inputs=None) Docstring: Group tensors together.

This creates a tuple of tensors with the same values as the tensors argument, except that the value of each tensor is only returned after the values of all tensors have been computed.

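tf.tuple is not control flow in the strict sense either; it is also just a helper function. tf.tuple synchronizes the computation of several tensors. It acts like a synchronization barrier: results are returned only after every op in front of the barrier has completed.

Without control flow, a TensorFlow computation graph is a plain DAG. A DAG may have several topological orderings, which means that the execution order of two ops with no upstream/downstream relationship is undetermined. Imagine a model with two weight variables, w1 and w2, where computing the gradient of y with respect to w1 requires the value of w2. By the time that gradient is computed, w2 may already have been updated, so the updated value of w2 is used. Of course, this non-determinism can be viewed as a form of regularization, and it increases parallelism, since in most cases deep learning finds a local optimum anyway. In general, an optimizer can specify the synchronization strategy for weight updates. With strict synchronization, tf.tuple is called on the gradient tensors, which guarantees that whenever any one gradient is used, the other gradients have already been computed, so the situation above cannot arise.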
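The ordering argument can be checked on a four-op toy graph in plain Python. This is only a sketch: the op names (grad_w1, grad_w2, update_w1, update_w2) and the helper all_topological_orders are invented for the illustration, and the "barrier" is modeled simply by making every update depend on every gradient.

```python
def all_topological_orders(deps):
    """deps maps each node to the set of nodes that must run before it."""
    nodes = sorted(deps)

    def extend(prefix, placed):
        if len(prefix) == len(nodes):
            yield list(prefix)
            return
        for n in nodes:
            # A node may be scheduled once all of its dependencies are placed.
            if n not in placed and deps[n] <= placed:
                yield from extend(prefix + [n], placed | {n})

    return list(extend([], frozenset()))


def w2_updated_too_early(order):
    """True if w2 is updated before grad_w1 (which reads w2) has run."""
    return order.index("update_w2") < order.index("grad_w1")


# No synchronization: each update only waits for its own gradient.
free = all_topological_orders({
    "grad_w1": set(),
    "grad_w2": set(),
    "update_w1": {"grad_w1"},
    "update_w2": {"grad_w2"},
})

# Barrier: every update waits for all gradients.
synced = all_topological_orders({
    "grad_w1": set(),
    "grad_w2": set(),
    "update_w1": {"grad_w1", "grad_w2"},
    "update_w2": {"grad_w1", "grad_w2"},
})

print(len(free), sum(w2_updated_too_early(o) for o in free))
print(len(synced), sum(w2_updated_too_early(o) for o in synced))
```

Without the barrier there are six valid schedules, and in one of them w2 is updated before the gradient that reads it is computed. With the barrier only four schedules remain and none of them has that problem, at the cost of less freedom for parallel execution.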

Take the following code as an example:

import tensorflow as tf
a = tf.Variable(3)
b = tf.Variable(4)
c = tf.Variable(5)
counter=tf.Variable(0)
add =tf.assign_add(counter,1)
d =a+1
with tf.control_dependencies([add]):
    e =b+1
f=c+1
dd,ee,ff = tf.tuple([d,e,f])

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
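    print("eval counter before eval dd:%s"%counter.eval())
    print("eval dd:%s"%dd.eval())
    print("eval counter after eval dd:%s"%counter.eval())

Running this code prints:

eval counter before eval dd:0
eval dd:4
eval counter after eval dd:1

dd is 4 and counter has increased by 1; in other words, add was executed. tf.tuple makes dd wait for e as well, and e carries a control dependency on add.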

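Some subtle points

When control flow is combined with tf.Variable and control_dependencies, the results can look rather subtle at first glance.

Again, here are some code snippets. I only present the examples for now, without detailed explanation; in upcoming posts I will explain them from the perspective of the source code and implementation.

1. Control dependencies and while_loop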

import tensorflow as tf

i = tf.get_variable("ii", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
n = tf.constant(10)
outcounter = tf.get_variable("counter1", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
incounter=tf.get_variable("counter2", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
outop = tf.assign_add(outcounter, 1, name="addcounter1")


def cond(a):
    return a < n


def body(a):
    innerop = tf.assign_add(incounter, 1, name="addcounter2")
    with tf.control_dependencies([outop,innerop]):
        a = a + 1
        return a


a = tf.while_loop(cond, body, [i])
#print(a)
#print(outop)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(a.eval())
    print(incounter.eval())
    print(outcounter.eval())

Running the code above prints 10, 10, 1 to the console, corresponding to a, incounter, and outcounter. Why are incounter and outcounter not equal? No explanation for now.

2. Control dependencies, while_loop, and cond

import tensorflow as tf

i = tf.get_variable("ii", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
n = tf.constant(10)
outcounter = tf.get_variable("counter1", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
incounter=tf.get_variable("counter2", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
outop = tf.assign_add(outcounter, 1, name="addcounter1")
pred = tf.constant(False)

def update_x_t(a):
    with tf.control_dependencies([outop]):
        return a+1
def update_x_f(a):
    with tf.control_dependencies([outop]):
        return a+1


def cond(a):
    # innerop

    return a < n


def body(a):
    innerop = tf.assign_add(incounter, 1, name="add2")
    with tf.control_dependencies([innerop]):
        a = a + 1
        y = tf.cond(pred,lambda:update_x_t(a),lambda:update_x_f(a))
        return a+y


a = tf.while_loop(cond, body, [i])
print(a)
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(a.eval())
    print(incounter.eval())
    print(outcounter.eval())

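Running the code above prints 21, 3, 0 to the console, corresponding to a, incounter, and outcounter. Why is outcounter 0? Only control dependencies that come from inside the while loop containing the cond (while loops can be nested) take effect; those from outside do not.

3. Control dependencies, while_loop, and tf.Variable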

import tensorflow as tf 
i = tf.get_variable("ii", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
n = tf.constant(10)
outcounter = tf.get_variable("counter1", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
incounter=tf.get_variable("counter2", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
outop = tf.assign_add(outcounter, 1, name="addcounter1")
b = tf.get_variable("b", dtype=tf.int32, shape=[], initializer=tf.zeros_initializer)
def variable():
    with tf.control_dependencies([outop]):
        return tf.Variable(lambda:b+1)
def variable2():
    def depadd():
        with tf.control_dependencies([outcounter.initializer]):
            with tf.control_dependencies([tf.assign_add(outcounter, 1, name="addcounter1")]):
                return b+1
    return tf.Variable(depadd)


def cond(a):
    return  a< n
def body(a):
    innerop=tf.assign_add(incounter,1,name="addcounter2")
    with tf.control_dependencies([innerop]):
        a = a + 1
        # Choose variable or variable2 (both defined above) here; the results differ.
        var=variable2()
        #or
        #var=variable()
        return a+var


a= tf.while_loop(cond, body, [i])
with tf.Session() as sess:
    tf.global_variables_initializer().run()
    print(a.eval())
    print(incounter.eval())
    print(outcounter.eval())

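Running the code above prints 10, 5, 1 to the console, corresponding to a, incounter, and outcounter. That is with the function variable2; with the function variable, the output is 10, 5, 0. Why the difference? No explanation for now.

In upcoming posts I will introduce the principles behind TensorFlow control flow and walk through some of the source code, answering the questions above along the way.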