When training with multiple GPUs, the global input batch has size batch_size * num_gpus (each GPU processes batch_size examples), which can greatly reduce training time.
In TensorFlow, a specific GPU can be selected with tf.device(). For example, to use GPU 0:
gpu_ind = 0
with tf.device("/gpu:{}".format(gpu_ind)):
    # ops created inside this block are placed on GPU 0
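In a multi-GPU setup the same formatting pattern produces one device string per GPU, which can then be passed to tf.device() in a loop. A minimal sketch, assuming num_gpus = 2 (an illustrative value, not from OpenSeq2Seq); the strings are built with plain formatting, so this part runs without TensorFlow:

```python
# Build one device string per GPU for data-parallel towers.
num_gpus = 2  # illustrative value

devices = ["/gpu:{}".format(i) for i in range(num_gpus)]
print(devices)  # ['/gpu:0', '/gpu:1']

# In training code, each per-GPU model replica would then be built under
# `with tf.device(devices[i]):` inside a loop over range(num_gpus).
```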
The following walks through multi-GPU model training. The code is adapted from OpenSeq2Seq: https://github.com/NVIDIA/OpenSeq2Seq
The multi-GPU model is defined in OpenSeq2Seq/model/model_base.py.
First, define the input placeholders, then split them into per-GPU feeds:
# placeholders for feeding data
self.x = tf.placeholder(tf.int32, [self.global_batch_size, None])
self.x_length = tf.placeholder(tf.int32, [self.global_batch_size])
self.y = tf.placeholder(tf.int32, [self.global_batch_size, None])
self.y_length = tf.placeholder(tf.int32, [self.global_batch_size])
# below we follow data parallelism for multi-GPU training
# actual per GPU data feeds
xs = tf.split(value=self.x, num_or_size_splits=num_gpus, axis=0)
x_lengths = tf.split(value=self.x_length, num_or_size_splits=num_gpus, axis=0)
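The split above cuts the global batch along axis 0 into num_gpus equal chunks, one per GPU. A minimal NumPy sketch of the same idea, using np.split in place of tf.split so it runs without TensorFlow; batch_size = 4, num_gpus = 2, and the padded length of 5 are illustrative values, not from OpenSeq2Seq:

```python
import numpy as np

num_gpus = 2            # illustrative value
batch_size = 4          # per-GPU batch size
global_batch_size = batch_size * num_gpus

# Fake token-id batch: one row per example, padded to length 5.
x = np.arange(global_batch_size * 5).reshape(global_batch_size, 5)

# Mirrors tf.split(value=..., num_or_size_splits=num_gpus, axis=0):
# each chunk becomes the feed for one GPU.
xs = np.split(x, num_gpus, axis=0)
for i, x_i in enumerate(xs):
    print("gpu", i, "feed shape:", x_i.shape)  # each is (batch_size, 5)
```

Note that this requires the global batch size to be divisible by num_gpus, which is why the placeholders above are declared with a fixed first dimension of self.global_batch_size.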