Variable Sharing in TensorFlow

白马负金羁

Article link: https://blog.csdn.net/baimafujinji/article/details/50485557

TensorFlow has two ops for working with variables, tf.Variable() and tf.get_variable(). The constructor of the former is declared as follows:

__init__(
    initial_value=None,
    trainable=True,
    collections=None,
    validate_shape=True,
    caching_device=None,
    name=None,
    variable_def=None,
    dtype=None,
    expected_shape=None,
    import_scope=None
)

The latter is declared as follows:

get_variable(
    name,
    shape=None,
    dtype=None,
    initializer=None,
    regularizer=None,
    trainable=True,
    collections=None,
    caching_device=None,
    partitioner=None,
    validate_shape=True,
    use_resource=None,
    custom_getter=None,
    constraint=None
)

As shown above, tf.get_variable() accepts many parameters, but the three used most often are the ones in the following parameter list:

tf.get_variable(<name>, <shape>, <initializer>)

Here, name is the variable's name, shape is its dimensions, and initializer specifies how the variable is initialized. The available initializers include:

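tf.constant_initializer: initializes with a constant value
tf.random_normal_initializer: normal distribution
tf.truncated_normal_initializer: truncated normal distribution
tf.random_uniform_initializer: uniform distribution
tf.zeros_initializer: all zeros
tf.ones_initializer: all ones
tf.uniform_unit_scaling_initializer: uniformly distributed random values that do not change the order of magnitude of the output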
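To make the difference between the normal and the truncated normal initializers concrete, here is a minimal pure-Python sketch. It is not TensorFlow's implementation. It assumes the documented behavior of TensorFlow's truncated normal distribution, in which samples lying more than two standard deviations from the mean are discarded and redrawn. The function name `truncated_normal_samples` is hypothetical.

```python
import random


def truncated_normal_samples(n, mean=0.0, stddev=1.0, seed=None):
    """Draw n samples from a normal distribution, redrawing any sample
    that falls more than two standard deviations from the mean."""
    rng = random.Random(seed)
    samples = []
    while len(samples) < n:
        x = rng.gauss(mean, stddev)
        # Keep only values inside [mean - 2*stddev, mean + 2*stddev].
        if abs(x - mean) <= 2.0 * stddev:
            samples.append(x)
    return samples


values = truncated_normal_samples(1000, mean=0.0, stddev=1.0, seed=0)
print(min(values), max(values))  # both lie within [-2, 2]
```

A plain normal initializer would occasionally produce values outside this range; the truncated variant never does.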

Consider an example:

import tensorflow as tf

# Three variables, each created with a different initializer.
a1 = tf.get_variable(name='a1', shape=[2,3], initializer=tf.random_normal_initializer(mean=0, stddev=1))
a2 = tf.get_variable(name='a2', shape=[1], initializer=tf.constant_initializer(1))
a3 = tf.get_variable(name='a3', shape=[2,3], initializer=tf.ones_initializer())

with tf.Session() as sess:
    # Variables must be initialized before they can be read.
    sess.run(tf.global_variables_initializer())
    print(sess.run(a1))
    print(sess.run(a2))
    print(sess.run(a3))

The code above produces the following output:

$ python 003.py

[[-0.62331879 -0.81951612 -0.08562502]
 [-1.11456835 -0.39181954 -1.27018476]]
[ 1.]
[[ 1.  1.  1.]
 [ 1.  1.  1.]]

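Another difference you may notice is that for tf.Variable() the name may be omitted, in which case the system picks one automatically. For tf.get_variable() the name must be specified, so the system will not choose one for you. This seemingly minor difference reflects a fundamental distinction between the two. With tf.Variable, if a naming conflict is detected, the system resolves it on its own. With tf.get_variable(), the system does not resolve the conflict and raises an error instead. For example: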

import tensorflow as tf

w_1 = tf.Variable(3, name="w_1")
w_2 = tf.Variable(1, name="w_1")

print(w_1.name)
print(w_2.name)

If you run the program, you will get output similar to the following:

$ python 001.py
w_1:0
w_1_1:0
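The renaming seen in this output can be modeled with a short pure-Python sketch. This is only an illustration of the idea of appending a numeric suffix to a name that is already taken, not TensorFlow's actual code, and `make_unique_namer` is a hypothetical helper. The trailing `:0` in the printed names is the output index of the tensor and is not modeled here.

```python
def make_unique_namer():
    """Return a function that mimics how repeated names get a numeric suffix."""
    used = {}  # name -> how many times it has been requested so far

    def unique_name(name):
        count = used.get(name, 0)
        used[name] = count + 1
        if count == 0:
            # First request: the name is free and is used unchanged.
            return name
        candidate = "%s_%d" % (name, count)
        # Skip suffixed names that are themselves already taken.
        while candidate in used:
            count += 1
            candidate = "%s_%d" % (name, count)
        used[name] = count + 1
        used[candidate] = 1
        return candidate

    return unique_name


unique_name = make_unique_namer()
print(unique_name("w_1"))  # w_1
print(unique_name("w_1"))  # w_1_1
print(unique_name("w_1"))  # w_1_2
```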

Doing the same thing with get_variable, however, raises an error.

import tensorflow as tf

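# initializer: Initializer for the variable if one is created.
# A fully defined shape is given so that the name clash is the only error raised.
w_1 = tf.get_variable(name="w_1", shape=[], initializer=tf.zeros_initializer())
w_2 = tf.get_variable(name="w_1", shape=[], initializer=tf.ones_initializer())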

The error message is:

ValueError: Variable w_1 already exists, disallowed. 
Did you mean to set reuse=True in VarScope?

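So with get_variable, two different variables cannot have the same name. The exception is when you define a variable_scope, which makes it possible to request the same name again. Consider the following code: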

import tensorflow as tf

with tf.variable_scope("scope1"):
    w1 = tf.get_variable("w1", shape=[])
    w2 = tf.Variable(0.0, name="w2")
with tf.variable_scope("scope1", reuse=True):
    w1_p = tf.get_variable("w1", shape=[])
    w2_p = tf.Variable(1.0, name="w2")

print(w1 is w1_p)
print(w2 is w2_p)

The code above produces the following output:

$ python 004.py
True
False

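The essential difference between get_variable() and Variable is therefore that tf.Variable() creates a new object every time, so reuse=True has no effect on it. With get_variable(), if the variable object has already been created and the scope allows reuse, that existing object is returned; if no such object exists yet, a new one is created. (In all other respects the two are used in the same way.)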
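The lookup-or-create behavior described above can be sketched as a small name registry in plain Python. This is a toy model under simplifying assumptions, not TensorFlow's variable store: `ToyVariableStore` and its dictionary-based "variables" are hypothetical, and the scope is passed as a plain string rather than through a context manager.

```python
class ToyVariableStore:
    """A toy stand-in for the name -> variable registry behind get_variable."""

    def __init__(self):
        self._vars = {}

    def get_variable(self, scope, name, reuse=False, initial_value=0.0):
        # The scope name is prefixed to the variable name.
        full_name = "%s/%s" % (scope, name) if scope else name
        if reuse:
            # Reuse mode: the variable must already exist.
            if full_name not in self._vars:
                raise ValueError("Variable %s does not exist" % full_name)
            return self._vars[full_name]
        # Creation mode: the name must still be free.
        if full_name in self._vars:
            raise ValueError("Variable %s already exists" % full_name)
        self._vars[full_name] = {"name": full_name, "value": initial_value}
        return self._vars[full_name]


store = ToyVariableStore()
w1 = store.get_variable("scope1", "w1")
w1_p = store.get_variable("scope1", "w1", reuse=True)
print(w1 is w1_p)  # True

try:
    store.get_variable("scope1", "w1")  # same name, reuse not set
except ValueError as err:
    print(err)  # Variable scope1/w1 already exists
```

In this sketch, as in TensorFlow, asking for a variable in reuse mode before it has been created is also an error.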

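Sometimes certain structures in a neural network need to share the same set of variables, which is why TensorFlow introduced a variable sharing mechanism. tf.get_variable and tf.variable_scope are the two main functions TensorFlow provides for sharing variables.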

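As demonstrated earlier, one way tf.get_variable differs from tf.Variable is that it has a checking mechanism: it checks whether an existing variable with the same name has been set up as a shared variable. If it has not, TensorFlow raises an error when it reaches the second variable with that name.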

Take the following code as an example:

import numpy as np
import tensorflow as tf

def my_function():
    weights = tf.Variable(tf.random_normal([1, 5], mean=0, stddev=1), name="weights")
    biases = tf.Variable(tf.ones([1,5]), name="biases")
    return  weights + biases


result1 = my_function()
result2 = my_function()

sess = tf.Session()

init = tf.global_variables_initializer()  
sess.run(init)  

print(sess.run(result1))
print(sess.run(result2))

The function defines two variables, weights and biases. Calling it twice directly causes no error, but it creates two separate sets of variables:

[[ 0.36753052  1.73602903  2.14643383 -0.56646061  1.92033052]]
[[ 1.55050373  3.71504855  1.55975127  0.54990262  0.7442745 ]]

If tf.Variable is changed to tf.get_variable, calling the function twice directly does cause a problem:

import numpy as np
import tensorflow as tf

def my_function():
    weights = tf.get_variable(name = "weights", shape=[1, 5], 
                              initializer=tf.random_normal_initializer(mean=0, stddev=1))
    biases = tf.get_variable(name = "biases", shape=[1,5], initializer=tf.ones_initializer())
    return weights + biases


result1 = my_function()
result2 = my_function()

sess = tf.Session()

init = tf.global_variables_initializer()  
sess.run(init)  

print(sess.run(result1))
print(sess.run(result2))

The code above fails with the following error message:

ValueError: Variable weights already exists, disallowed. 
Did you mean to set reuse=True in VarScope?

To solve this problem, use TensorFlow's tf.variable_scope function. Its main purpose is to share variables within a scope. For example:

import numpy as np
import tensorflow as tf

def my_function():
    weights = tf.get_variable(name = "weights", shape=[1, 5], 
                              initializer=tf.random_normal_initializer(mean=0, stddev=1))
    biases = tf.get_variable(name = "biases", shape=[1,5], initializer=tf.ones_initializer())
    return weights + biases

with tf.variable_scope('my_scope1'):
    result1 = my_function()

with tf.variable_scope('my_scope1', reuse = True):
    result2 = my_function()
    
sess = tf.Session()

init = tf.global_variables_initializer()  
sess.run(init)  

print(sess.run(result1))
print(sess.run(result2))

The program prints two identical results, which shows that the variables are shared.

[[ 2.36001348  1.99717593  2.47400618  0.1312393   2.0834167 ]]
[[ 2.36001348  1.99717593  2.47400618  0.1312393   2.0834167 ]]

The variable_scope part above can also be written as follows (the effect is the same):

with tf.variable_scope('my_scope1') as scope1:
    result1 = my_function()

with tf.variable_scope(scope1, reuse = True):
    result2 = my_function()

Alternatively, it can be written as:

with tf.variable_scope("my_scope") as my_scope:
    result1 = my_function()
    my_scope.reuse_variables() # or 
    #tf.get_variable_scope().reuse_variables()
    result2 = my_function()

(End of article)