[root@dl3 cifar10]# python cifar10_multi_gpu_train.py --num_gpus=2
Traceback (most recent call last):
File "cifar10_multi_gpu_train.py", line 274, in <module>
tf.app.run()
File "/usr/lib/python2.7/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "cifar10_multi_gpu_train.py", line 270, in main
train()
File "cifar10_multi_gpu_train.py", line 211, in train
variables_averages_op = variable_averages.apply(tf.trainable_variables())
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/moving_averages.py", line 367, in apply
colocate_with_primary=True)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 113, in create_slot
return _create_slot_var(primary, val, "", validate_shape, None, None)
File "/usr/lib/python2.7/site-packages/tensorflow/python/training/slot_creator.py", line 66, in _create_slot_var
validate_shape=validate_shape)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 1065, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 962, in get_variable
use_resource=use_resource, custom_getter=custom_getter)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 367, in get_variable
validate_shape=validate_shape, use_resource=use_resource)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 352, in _true_getter
use_resource=use_resource)
File "/usr/lib/python2.7/site-packages/tensorflow/python/ops/variable_scope.py", line 682, in _get_single_variable
"VarScope?" % name)
ValueError: Variable conv1/weights/ExponentialMovingAverage/ does not exist, or was not created with tf.get_variable(). Did you mean to set reuse=None in VarScope?
You can find the answer to your problem in Issue 6220.
You need to put `with tf.variable_scope(tf.get_variable_scope()):` in front of the loop which runs over your devices, so the loop becomes:

    with tf.variable_scope(tf.get_variable_scope()):
      for i in xrange(FLAGS.num_gpus):
        with tf.device('/gpu:%d' % i):
          # tower construction goes here
The explanation is given in the link; here is the quote:
When you do tf.get_variable_scope().reuse_variables() you set the current scope to reuse variables. If you call the optimizer in such scope, it's trying to reuse slot variables, which it cannot find, so it throws an error. If you put a scope around, the tf.get_variable_scope().reuse_variables() only affects that scope, so when you exit it, you're back in the non-reusing mode, the one you want.
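The mechanics can be sketched without TensorFlow. The toy classes below are hypothetical (not the real `tf.variable_scope` API) and only mimic the relevant behavior: `reuse_variables()` flips a flag on the current scope, and entering a wrapping scope pushes a fresh context, so the flag set inside the device loop does not leak out to the code that creates the moving-average slot variables:

```python
class ToyScope:
    """Minimal stand-in for a TF variable scope (illustration only)."""
    def __init__(self):
        self.reuse = False          # fresh scopes start in non-reuse mode

    def reuse_variables(self):
        self.reuse = True           # like tf.get_variable_scope().reuse_variables()

_stack = [ToyScope()]               # scope stack; the top is the "current" scope

def current_scope():
    return _stack[-1]

class enter_scope:
    """Context manager mimicking `with tf.variable_scope(...)`."""
    def __enter__(self):
        _stack.append(ToyScope())   # new context: reuse flag starts False
        return current_scope()

    def __exit__(self, *exc):
        _stack.pop()                # leaving the scope restores the outer context
        return False

# Without a wrapping scope, reuse sticks for everything that follows the loop:
current_scope().reuse_variables()
leaks = current_scope().reuse       # True -- this is what breaks the optimizer

# Reset the stack for the wrapped version.
_stack = [ToyScope()]

# With a wrapping scope around the device loop, reuse is confined to it:
with enter_scope():
    for gpu in range(2):
        current_scope().reuse_variables()   # share weights across towers
confined = current_scope().reuse    # False -- back in non-reuse mode outside
```

Here `leaks` ends up `True` while `confined` ends up `False`, which is exactly the difference between calling the optimizer in a reusing scope (the error above) and calling it after the wrapping scope has been exited.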
Hope that helps, let me know if I should clarify more.