Running TensorFlow on GPUs

This article shows how to use GPUs for computation in TensorFlow: observing where operations are placed by setting log_device_placement, pinning operations to a specific GPU with tf.device, and running work on several GPUs in parallel. The examples demonstrate both manual and automatic placement of computations on different GPUs.


 

First, you can set log_device_placement=True in the session config to see which device each part of the computation runs on:

 

import tensorflow as tf

a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
c = tf.matmul(a, b)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))

 

The output looks like this:

 

. . .

Device mapping:

/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus id: 0000:08:00.0

. . .

b: /job:localhost/replica:0/task:0/gpu:0

a: /job:localhost/replica:0/task:0/gpu:0

MatMul: /job:localhost/replica:0/task:0/gpu:0

[[ 22.  28.]
 [ 49.  64.]]

 

 

If you don't want the system to choose the device for you, you can set it yourself by creating a device context with tf.device. When a system has several GPUs, the GPU with the lowest ID is selected automatically; if we want the computation to run on a different GPU instead, we use tf.device:

 

import tensorflow as tf

with tf.device('/gpu:2'):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)

sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(c))
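Note that if the machine does not actually have a /gpu:2, the session above fails with an InvalidArgumentError. A minimal workaround, assuming TensorFlow 1.x, is to enable soft placement so TensorFlow falls back to an available device (allow_soft_placement is a standard ConfigProto field, though the original article does not use it):

sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True,   # fall back to an existing device if /gpu:2 is absent
    log_device_placement=True))
print(sess.run(c))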

 

 

If we have several GPUs and want them to work on a problem in parallel, it's actually quite simple: just prefix each piece of work with with tf.device():. For example, you can configure the GPUs as follows (the author is being lazy here and gives both GPUs the same job):

 

import tensorflow as tf

c = []
for d in ['/gpu:2', '/gpu:3']:
    with tf.device(d):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
        c.append(tf.matmul(a, b))

with tf.device('/cpu:0'):
    sum = tf.add_n(c)

# Creates a session with log_device_placement set to True.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(sum))

 

The resulting output:

 

. . .

Device mapping:

/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c

/job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K40c

/job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K40c

/job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K40c

. . .

. . .

Const_3: /job:localhost/replica:0/task:0/gpu:3

I tensorflow/core/common_runtime/simple_placer.cc:289] Const_3: /job:localhost/replica:0/task:0/gpu:3

Const_2: /job:localhost/replica:0/task:0/gpu:3

I tensorflow/core/common_runtime/simple_placer.cc:289] Const_2: /job:localhost/replica:0/task:0/gpu:3

MatMul_1: /job:localhost/replica:0/task:0/gpu:3

I tensorflow/core/common_runtime/simple_placer.cc:289] MatMul_1: /job:localhost/replica:0/task:0/gpu:3

Const_1: /job:localhost/replica:0/task:0/gpu:2

I tensorflow/core/common_runtime/simple_placer.cc:289] Const_1: /job:localhost/replica:0/task:0/gpu:2

Const: /job:localhost/replica:0/task:0/gpu:2

I tensorflow/core/common_runtime/simple_placer.cc:289] Const: /job:localhost/replica:0/task:0/gpu:2

MatMul: /job:localhost/replica:0/task:0/gpu:2

I tensorflow/core/common_runtime/simple_placer.cc:289] MatMul: /job:localhost/replica:0/task:0/gpu:2

AddN: /job:localhost/replica:0/task:0/cpu:0

I tensorflow/core/common_runtime/simple_placer.cc:289] AddN: /job:localhost/replica:0/task:0/cpu:0

[[  44.   56.]
 [  98.  128.]]

. . .

Of course, we can also try a slightly more complex example:


import numpy as np
import tensorflow as tf
import datetime

Use the numpy package to create two random matrices:

A = np.random.rand(10000, 10000).astype('float32')
B = np.random.rand(10000, 10000).astype('float32')

n = 10

These two lists will store the results:

c1 = []
c2 = []

Define a matpow() function:

def matpow(M, n):
    # Computes M^n as a chain of (n - 1) matmul ops in the graph (n >= 1).
    if n <= 1:
        return M
    return tf.matmul(M, matpow(M, n - 1))
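Because TensorFlow 1.x builds a static graph, this recursion simply unrolls into a chain of matmul nodes. A minimal iterative sketch that builds the same chain (matpow_iter is an illustrative name, not part of the original article):

def matpow_iter(M, n):
    # Chains (n - 1) tf.matmul ops, just like the recursive version.
    result = M
    for _ in range(n - 1):
        result = tf.matmul(result, M)
    return result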

Now for the main part. First, compute everything on a single GPU:

with tf.device('/gpu:0'):
    a = tf.constant(A)
    b = tf.constant(B)
    c1.append(matpow(a, n))
    c1.append(matpow(b, n))

with tf.device('/cpu:0'):
    sum = tf.add_n(c1)

t1_1 = datetime.datetime.now()

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    sess.run(sum)
t2_1 = datetime.datetime.now()

Then compute on two GPUs: store A on gpu:0 and compute A^n there, store B on gpu:1 and compute B^n there, and finally add the two results on the CPU:

with tf.device('/gpu:0'):
    #compute A^n and store result in c2
    a = tf.constant(A)
    c2.append(matpow(a, n))
 
with tf.device('/gpu:1'):
    #compute B^n and store result in c2
    b = tf.constant(B)
    c2.append(matpow(b, n))

with tf.device('/cpu:0'):
    sum = tf.add_n(c2) #Addition of all elements in c2, i.e. A^n + B^n
    t1_2 = datetime.datetime.now()

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    # Runs the op.
    sess.run(sum)
t2_2 = datetime.datetime.now()

Finally, compare the running times:

print "Single GPU computation time: " + str(t2_1-t1_1)
print "Multi GPU computation time: " + str(t2_2-t1_2)





 

Link to the original English article:

http://www.jorditorres.org/first-contact-with-tensorflow/
