Using GPUs


片刻小哥哥


Category: TensorFlow. Tags: Using GPUs, Chinese documentation, ApacheCN, TensorFlow

Article link: https://blog.csdn.net/u010859707/article/details/73251648


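Supported devices

On a typical system there are multiple computing devices. In TensorFlow, the supported device types are CPU and GPU. They are represented as strings. For example:

"/cpu:0": The CPU of your machine.
"/gpu:0": The GPU of your machine, if you have one.
"/gpu:1": The second GPU of your machine, and so on.

If a TensorFlow operation has both CPU and GPU implementations, the GPU devices are given priority when the operation is assigned to a device. For example, matmul has both CPU and GPU kernels. On a system with devices cpu:0 and gpu:0, gpu:0 will be selected to run matmul.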
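
The priority rule can be sketched in plain Python. This is only a toy model of the rule as described here, not TensorFlow's actual placement algorithm; the function `pick_device` and its arguments are hypothetical names introduced for illustration.

```python
def pick_device(kernel_types, available_devices):
    """Return the first available device the op has a kernel for, preferring GPUs.

    kernel_types: device types the op is implemented for, e.g. {"cpu", "gpu"}.
    available_devices: device strings on the machine, e.g. ["/cpu:0", "/gpu:0"].
    Hypothetical helper: a toy model, not a TensorFlow API.
    """
    def device_type(device):
        # "/gpu:0" -> "gpu"
        return device.lstrip("/").split(":")[0]

    # Try GPU devices first, then CPU devices, keeping their listed order.
    for preferred in ("gpu", "cpu"):
        for device in available_devices:
            if device_type(device) == preferred and preferred in kernel_types:
                return device
    raise ValueError("no available device has a kernel for this op")


# matmul has both CPU and GPU kernels, so the GPU wins when one is present.
print(pick_device({"cpu", "gpu"}, ["/cpu:0", "/gpu:0"]))
# An op with only a CPU kernel stays on the CPU.
print(pick_device({"cpu"}, ["/cpu:0", "/gpu:0"]))
```

The lowest-numbered GPU is chosen here simply because it is listed first.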

Logging device placement

To find out which devices your operations and tensors are assigned to, create the session with the log_device_placement configuration option set to True.

    import tensorflow as tf  # the examples use the TensorFlow 1.x Session API

    # Creates a graph.
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    c = tf.matmul(a, b)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))
            
 
      

You should see the following output:

    Device mapping:
    /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
    id: 0000:05:00.0
    b: /job:localhost/replica:0/task:0/gpu:0
    a: /job:localhost/replica:0/task:0/gpu:0
    MatMul: /job:localhost/replica:0/task:0/gpu:0
    [[ 22.  28.]
     [ 49.  64.]]
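
The printed matrix can be checked by hand. The sketch below multiplies the same 2x3 and 3x2 matrices in plain Python, without TensorFlow; the helper `matmul` is a hypothetical name used only for this illustration.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists (rows of x times columns of y)."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


# The six values 1.0..6.0 laid out row by row with shape [2, 3] and shape [3, 2].
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

print(matmul(a, b))  # [[22.0, 28.0], [49.0, 64.0]]
```

For instance, the top-left entry is 1*1 + 2*3 + 3*5 = 22.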
            
 
      

Manual device placement

If you would like a particular operation to run on a device of your choice instead of the one selected automatically, you can use with tf.device to create a device context, so that all operations within that context have the same device assignment.

    # Creates a graph.
    with tf.device('/cpu:0'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
    # c is created outside the device context, so it is placed automatically.
    c = tf.matmul(a, b)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))
            
 
      

You will see that a and b are now assigned to cpu:0, while MatMul, which has no explicit device, still runs on gpu:0.

    Device mapping:
    /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
    id: 0000:05:00.0
    b: /job:localhost/replica:0/task:0/cpu:0
    a: /job:localhost/replica:0/task:0/cpu:0
    MatMul: /job:localhost/replica:0/task:0/gpu:0
    [[ 22.  28.]
     [ 49.  64.]]
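
When a graph has many ops, it can help to turn the placement log into a lookup table. The sketch below parses the op lines from the output of this example; the helper `parse_placements` is hypothetical and assumes each op line has the form `name: /job:.../device:index`, as it does in this log.

```python
def parse_placements(log_text):
    """Map op name -> short device string (e.g. "cpu:0") from placement log lines."""
    placements = {}
    for line in log_text.splitlines():
        name, sep, device = line.strip().partition(": /job:")
        # Skip lines that are not "op: /job:..." entries, such as the
        # "Device mapping:" header and the physical device descriptions.
        if not sep:
            continue
        # Keep only the last path component, e.g. "gpu:0".
        placements[name] = device.rsplit("/", 1)[-1]
    return placements


log = """Device mapping:
/job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus
id: 0000:05:00.0
b: /job:localhost/replica:0/task:0/cpu:0
a: /job:localhost/replica:0/task:0/cpu:0
MatMul: /job:localhost/replica:0/task:0/gpu:0"""

print(parse_placements(log))
```

The resulting dictionary makes it easy to confirm that the constants were pinned to the CPU while MatMul was not.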
            
 
      

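Allowing GPU memory growth

By default, TensorFlow maps nearly all of the GPU memory of all GPUs visible to the process (subject to CUDA_VISIBLE_DEVICES). This is done to use the relatively precious GPU memory resources on the devices more efficiently by reducing memory fragmentation.

In some cases it is desirable for the process to allocate only a subset of the available memory, or to grow its memory usage only as needed. TensorFlow provides two configuration options on the session to control this.

The first is the allow_growth option, which attempts to allocate only as much GPU memory as runtime allocations require: it starts out allocating very little memory, and as sessions get run and more GPU memory is needed, the GPU memory region used by the TensorFlow process is extended. Note that memory is not released, since that can lead to even worse memory fragmentation. To turn this option on, set it in the ConfigProto:

    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    session = tf.Session(config=config, ...)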

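The second is the per_process_gpu_memory_fraction option, which determines the fraction of the total memory that each visible GPU should allocate. For example, you can tell TensorFlow to allocate only 40% of the total memory of each GPU by:

    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.4
    session = tf.Session(config=config, ...)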

This is useful if you want to truly bound the amount of GPU memory available to the TensorFlow process.
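
As a rough worked example of what the fraction means, the sketch below computes the per-GPU budget for a fraction of 0.4. The device memory sizes are made-up illustrative values, not measurements, and `memory_budget_mib` is a hypothetical helper rather than a TensorFlow function.

```python
def memory_budget_mib(total_mib, fraction):
    """Return the memory budget, in MiB, for a given per-process fraction."""
    return total_mib * fraction


# Hypothetical total memory per visible GPU, in MiB (illustrative values only).
gpus = {"/gpu:0": 12288, "/gpu:1": 6144}

for device, total in gpus.items():
    budget = memory_budget_mib(total, 0.4)
    print(f"{device}: {budget:.0f} MiB of {total} MiB")
```

Because the fraction applies to each visible GPU separately, GPUs with different capacities end up with different absolute budgets.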

Using a single GPU on a multi-GPU system

If you have more than one GPU in your system, the GPU with the lowest ID will be selected by default. If you would like to run on a different GPU, you need to specify the preference explicitly:

    # Creates a graph.
    with tf.device('/gpu:2'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(c))
            
 
      

If the device you have specified does not exist, you will get an InvalidArgumentError:

    InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b':
    Could not satisfy explicit device specification '/gpu:2'
       [[Node: b = Const[dtype=DT_FLOAT, value=Tensor<type: float shape: [3,2]
       values: 1 2 3...>, _device="/gpu:2"]()]]
            
 
      

If you would like TensorFlow to automatically choose an existing and supported device to run the operations when the specified one does not exist, you can set allow_soft_placement to True in the configuration option when creating the session.

    # Creates a graph.
    with tf.device('/gpu:2'):
        a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a')
        b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b')
        c = tf.matmul(a, b)
    # Creates a session with allow_soft_placement and log_device_placement set
    # to True.
    sess = tf.Session(config=tf.ConfigProto(
        allow_soft_placement=True, log_device_placement=True))
    # Runs the op.
    print(sess.run(c))
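
The difference between strict and soft placement can be modelled in a few lines of plain Python. This is a toy model rather than TensorFlow's placement code: `resolve_device` and its arguments are hypothetical, and the fallback order (first GPU if any, otherwise the first available device) is an assumption made for the illustration.

```python
def resolve_device(requested, available, allow_soft_placement=False):
    """Return the device an op would run on in this toy model."""
    if requested in available:
        return requested
    if not allow_soft_placement:
        # Mirrors the error case in the text: the explicit request cannot be met.
        raise ValueError(
            "Could not satisfy explicit device specification %r" % requested
        )
    # Soft placement: fall back to an existing device, preferring GPUs
    # (assumed fallback order, chosen to match the GPU-first rule).
    gpus = [d for d in available if d.startswith("/gpu:")]
    return gpus[0] if gpus else available[0]


available = ["/cpu:0", "/gpu:0"]

# With soft placement the request for a missing GPU falls back to an existing one.
print(resolve_device("/gpu:2", available, allow_soft_placement=True))

# Without it, the same request fails.
try:
    resolve_device("/gpu:2", available)
except ValueError as err:
    print("error:", err)
```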
            
 
      

Using multiple GPUs

If you would like to run TensorFlow on multiple GPUs, you can construct your model in a multi-tower fashion, where each tower is assigned to a different GPU. For example:

    # Creates a graph.
    c = []
    for d in ['/gpu:2', '/gpu:3']:
        with tf.device(d):
            a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
            b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
            c.append(tf.matmul(a, b))
    with tf.device('/cpu:0'):
        # Named total rather than sum to avoid shadowing the Python built-in.
        total = tf.add_n(c)
    # Creates a session with log_device_placement set to True.
    sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
    # Runs the op.
    print(sess.run(total))
            
 
      

You will see the following output.

    Device mapping:
    /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus
    id: 0000:02:00.0
    /job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus
    id: 0000:03:00.0
    /job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus
    id: 0000:83:00.0
    /job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus
    id: 0000:84:00.0
    Const_3: /job:localhost/replica:0/task:0/gpu:3
    Const_2: /job:localhost/replica:0/task:0/gpu:3
    MatMul_1: /job:localhost/replica:0/task:0/gpu:3
    Const_1: /job:localhost/replica:0/task:0/gpu:2
    Const: /job:localhost/replica:0/task:0/gpu:2
    MatMul: /job:localhost/replica:0/task:0/gpu:2
    AddN: /job:localhost/replica:0/task:0/cpu:0
    [[  44.   56.]
     [  98.  128.]]
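
The tower pattern itself does not depend on TensorFlow: each device computes its own partial result, and one device combines them. The sketch below mimics that flow in plain Python to show where the final values come from; `matmul`, `add_n`, and the dictionary of per-device results are illustrative stand-ins, not TensorFlow calls.

```python
def matmul(x, y):
    """Multiply two matrices given as nested lists."""
    return [
        [sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
        for i in range(len(x))
    ]


def add_n(matrices):
    """Element-wise sum of equally shaped matrices."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*matrices)]


a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

# One "tower" per GPU: each device computes the same product independently.
towers = {device: matmul(a, b) for device in ["/gpu:2", "/gpu:3"]}

# The partial results are then combined in a single place (cpu:0 in the text).
total = add_n(list(towers.values()))
print(total)  # [[44.0, 56.0], [98.0, 128.0]]
```

Each tower contributes [[22, 28], [49, 64]], so the sum over two towers is exactly twice that matrix.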
            
 
      

The cifar10 tutorial is a good example demonstrating how to do training with multiple GPUs.
