# Calculate the gradients for each model tower.
tower_grads = []
with tf.variable_scope(tf.get_variable_scope()):
  for i in xrange(FLAGS.num_gpus):
    with tf.device('/gpu:%d' % i):
      with tf.name_scope('%s_%d' % (TOWER_NAME, i)) as scope:
        # Calculate the loss for one tower of the CIFAR model.
        # This function constructs the entire CIFAR model but
        # shares the variables across all towers.
        loss = tower_loss(scope)

        # Reuse variables for the next tower.
        tf.get_variable_scope().reuse_variables()

        # Retain the summaries from the final tower.
        summaries = tf.get_collection(tf.GraphKeys.SUMMARIES, scope)

        # Calculate the gradients for the batch of data on this
        # CIFAR tower.
        grads = opt.compute_gradients(loss, gate_gradients=0)

        # Keep track of the gradients across all towers.
        tower_grads.append(grads)

# We must calculate the mean of each gradient. Note that this is the
# synchronization point across all towers.
grads = average_gradients(tower_grads)
train_op = opt.apply_gradients(grads, global_step=global_step)
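The `average_gradients` helper is not defined in this snippet. A minimal sketch of what it does — averaging each variable's gradient across the towers — using NumPy arrays in place of tensors (the variable name `'w'` and the shapes are illustrative assumptions, not part of the original code):

```python
import numpy as np

def average_gradients(tower_grads):
    """Average (gradient, variable) pairs across towers.

    tower_grads: list over towers; each element is a list of
    (grad, var) pairs in the same variable order, as returned by
    opt.compute_gradients on each tower.
    """
    average_grads = []
    # zip(*tower_grads) groups the i-th (grad, var) pair of every tower.
    for grad_and_vars in zip(*tower_grads):
        grads = [g for g, _ in grad_and_vars]
        # Mean over the tower dimension; the variable is shared across
        # towers, so take it from the first tower.
        mean_grad = np.mean(grads, axis=0)
        average_grads.append((mean_grad, grad_and_vars[0][1]))
    return average_grads

# Two towers, one shared "variable" named 'w':
tower_grads = [
    [(np.array([1.0, 2.0]), 'w')],
    [(np.array([3.0, 4.0]), 'w')],
]
print(average_gradients(tower_grads))  # grad averages to [2.0, 3.0]
```

The real TensorFlow version does the same per-variable mean with tensor ops, but the bookkeeping is identical.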
The gradient is first computed for each tower, and then the towers' gradients are combined. When TensorFlow evaluates the graph, it sees that the averaging op depends on every tower's gradients, so it triggers the computations on all the GPUs at once — which is exactly what gives us multi-GPU parallelism.
Once all the gradients have been computed, apply_gradients performs the update.
This setup requires all GPUs to share the model parameters. Because transferring data to and from GPUs is relatively slow, all model parameters are stored and updated on the CPU (the green box in the tutorial's diagram). A fresh copy of the parameters is transferred to each GPU when a new batch of data is to be processed.
The GPUs operate synchronously: all gradients are accumulated from the GPUs and averaged, and the model parameters are updated with the gradient averaged across all model replicas.
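The update cycle described above — parameters live on the CPU, each GPU receives a fresh copy, computes gradients on its own shard of the batch, and the averaged gradient updates the master copy — can be sketched without TensorFlow. Pure NumPy, with a made-up mean-squared-error loss and hand-picked shards (everything here is an illustrative assumption):

```python
import numpy as np

# "CPU-resident" master parameters.
params = np.array([0.0, 0.0])
lr = 0.1

def tower_grad(params, batch):
    # Hypothetical loss: mean squared error to the batch targets,
    # whose gradient is 2 * (params - mean of targets).
    return 2.0 * (params - batch.mean(axis=0))

# One synchronous step: ship a fresh copy of params to each "GPU",
# compute a gradient per shard, average them, update the master copy.
shards = [np.array([[1.0, 1.0]]), np.array([[3.0, 3.0]])]
grads = [tower_grad(params.copy(), shard) for shard in shards]  # per GPU
mean_grad = np.mean(grads, axis=0)      # synchronization point
params -= lr * mean_grad                # update happens on the "CPU"
print(params)                           # -> [0.4 0.4]
```

Every replica sees the same updated parameters at the start of the next step, which is what keeps the towers consistent.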
Refer:
https://github.com/normanheckscher/mnist-multi-gpu