Running TensorFlow on GPUs
First, you can set log_device_placement to True in the session config to see which device each operation runs on:
import tensorflow as tf
a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print sess.run(c)
The output looks like this:
. . .
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:08:00.0
. . .
b: /job:localhost/replica:0/task:0/gpu:0
a: /job:localhost/replica:0/task:0/gpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0
. . .
[[ 22.  28.]
 [ 49.  64.]]
. . .
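As an aside (this helper is not used in the original article), you can list every device TensorFlow can see before building any graph, which is handy for checking which gpu:N names exist on your machine:

from tensorflow.python.client import device_lib

# Prints one line per visible device, e.g. /cpu:0 and /gpu:0.
for device in device_lib.list_local_devices():
    print device.name, device.device_type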
If you would rather choose the device yourself instead of letting the system pick one, open a device context with tf.device. When a system has several GPUs, the lowest-numbered GPU is selected automatically; to run on a different one, wrap the relevant operations in tf.device:
import tensorflow as tf

with tf.device('/gpu:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print sess.run(c)
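One caveat: if the device you request does not exist on the machine (say there is no /gpu:2), running the graph fails with an error. A minimal variant of the example above (my own addition, using the standard allow_soft_placement option) lets TensorFlow fall back to an available device instead:

import tensorflow as tf

with tf.device('/gpu:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
# allow_soft_placement moves ops to an existing device when the
# requested one is unavailable, instead of raising an error.
sess = tf.Session(config=tf.ConfigProto(allow_soft_placement=True,
                                        log_device_placement=True))
print sess.run(c)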
If we have several GPUs and want them to work on a problem in parallel, that is quite simple too: just wrap each piece of the graph in its own with tf.device(...) block. Here is an example (where, to keep things simple, both GPUs do the same work):
import tensorflow as tf

c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
    sum = tf.add_n(c)
# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print sess.run(sum)
The output is:
. . .
Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c
/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K40c
/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K40c
/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K40c
. . .
. . .
Const_3: /job:localhost/replica:0/task:0/gpu:3
I tensorflow/core/common_runtime/simple_placer.cc:289] Const_3: /job:localhost/replica:0/task:0/gpu:3
Const_2: /job:localhost/replica:0/task:0/gpu:3
I tensorflow/core/common_runtime/simple_placer.cc:289] Const_2: /job:localhost/replica:0/task:0/gpu:3
MatMul_1: /job:localhost/replica:0/task:0/gpu:3
I tensorflow/core/common_runtime/simple_placer.cc:289] MatMul_1: /job:localhost/replica:0/task:0/gpu:3
Const_1: /job:localhost/replica:0/task:0/gpu:2
I tensorflow/core/common_runtime/simple_placer.cc:289] Const_1: /job:localhost/replica:0/task:0/gpu:2
Const: /job:localhost/replica:0/task:0/gpu:2
I tensorflow/core/common_runtime/simple_placer.cc:289] Const: /job:localhost/replica:0/task:0/gpu:2
MatMul: /job:localhost/replica:0/task:0/gpu:2
I tensorflow/core/common_runtime/simple_placer.cc:289] MatMul: /job:localhost/replica:0/task:0/gpu:2
AddN: /job:localhost/replica:0/task:0/cpu:0
I tensorflow/core/common_runtime/simple_placer.cc:289] AddN: /job:localhost/replica:0/task:0/cpu:0
[[  44.   56.]
 [  98.  128.]]
. . .
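One practical note before the larger example (this is my own aside, not from the original article): by default TensorFlow reserves nearly all of the memory on every visible GPU for the process. A small sketch of the standard ConfigProto options that rein this in:

import tensorflow as tf

config = tf.ConfigProto(log_device_placement=True)
# Grow GPU memory usage on demand instead of grabbing it all up front.
config.gpu_options.allow_growth = True
# Or cap the process at a fixed fraction of each GPU's memory:
# config.gpu_options.per_process_gpu_memory_fraction = 0.4
sess = tf.Session(config=config)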
Of course, we can also try something a bit more complex:
import numpy as np
import tensorflow as tf
import datetime
Create two random matrices with the NumPy package:
A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')
n = 10
These two lists will hold the results:
c1 = []
c2 = []
Define a matpow() function:
def matpow(M, n):
    if n < 1:  # Abstract cases where n < 1
        return M
    else:
        return tf.matmul(M, matpow(M, n-1))
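As a quick sanity check (my own example, not in the original article), with a small matrix the recursion unrolls into a chain of tf.matmul ops. Note that since the base case returns M rather than the identity matrix, matpow(M, n) actually computes M^(n+1); this off-by-one does not affect the timing comparison below.

M = tf.constant([[2.0, 0.0], [0.0, 2.0]])
p = matpow(M, 2)  # builds tf.matmul(M, tf.matmul(M, M)), i.e. M^3
with tf.Session() as sess:
    print sess.run(p)  # [[ 8.  0.]
                       #  [ 0.  8.]]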
Now for the main part. First, do the computation on a single GPU:
with tf.device('/gpu:0'):
    a = tf.constant(A)
    b = tf.constant(B)
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))
with tf.device('/cpu:0'):
    sum = tf.add_n(c1)

t1_1 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(sum)
t2_1 = datetime.datetime.now()
Then on two GPUs: store A on gpu:0 and compute A^n there, store B on gpu:1 and compute B^n there, and finally add the two results on the CPU:
with tf.device('/gpu:0'):
    # compute A^n and store the result in c2
    a = tf.constant(A)
    c2.append(matpow(a, n))
with tf.device('/gpu:1'):
    # compute B^n and store the result in c2
    b = tf.constant(B)
    c2.append(matpow(b, n))
with tf.device('/cpu:0'):
    sum = tf.add_n(c2)  # addition of all elements in c2, i.e. A^n + B^n

t1_2 = datetime.datetime.now()
with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    # Runs the op.
    sess.run(sum)
t2_2 = datetime.datetime.now()
Finally, print the running times (note that each measurement also includes session creation and graph setup, not just the matrix computation):
print "Single GPU computation time: " + str(t2_1-t1_1) print "Multi GPU computation time: " + str(t2_2-t1_2)
Link to the original English article:
http://www.jorditorres.org/first-contact-with-tensorflow/