When to use TF_GPU_THREAD_MODE

Author: 松烟入墨

Tags: tensorflow, deep learning

Original link: https://blog.csdn.net/ccy_8491/article/details/121986245

The TensorFlow source code contains the following comment:

  // Possible values:
  //   * global: PluggableDevice uses threads shared with CPU in the main
  //       compute thread-pool. This is currently the default.
  //   * gpu_private: PluggableDevice uses threads dedicated to this device.
  //   * gpu_shared: All PluggableDevices share a dedicated thread pool.
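To make the three modes concrete, here is a simplified model of which thread pool each device would use. It is only a sketch under stated assumptions: the function `describe_thread_pools` and the pool names are hypothetical, the size of the global pool is taken to be the number of CPU cores (as discussed below), and the dedicated pools are assumed to be sized by `TF_GPU_THREAD_COUNT`. It does not call TensorFlow.

```python
def describe_thread_pools(mode, num_gpus, cpu_cores, gpu_thread_count):
    """Hypothetical, simplified model: return {pool_name: (num_threads, devices)}."""
    gpus = [f"GPU:{i}" for i in range(num_gpus)]
    if mode == "global":
        # Default: GPU work is scheduled on the main compute pool shared with CPU ops.
        return {"global": (cpu_cores, ["CPU"] + gpus)}
    if mode == "gpu_private":
        # Each GPU gets its own dedicated pool; CPU ops keep the global pool.
        pools = {"global": (cpu_cores, ["CPU"])}
        for g in gpus:
            pools[f"gpu_private_{g}"] = (gpu_thread_count, [g])
        return pools
    if mode == "gpu_shared":
        # All GPUs share one dedicated pool, separate from the CPU's pool.
        return {"global": (cpu_cores, ["CPU"]), "gpu_shared": (gpu_thread_count, gpus)}
    raise ValueError(f"Invalid gpu_thread_mode: {mode}")


if __name__ == "__main__":
    for mode in ("global", "gpu_private", "gpu_shared"):
        print(mode, describe_thread_pools(mode, num_gpus=2, cpu_cores=8, gpu_thread_count=2))
```

In this model, only `global` puts the GPUs in the same pool as the CPU; `gpu_private` and `gpu_shared` differ only in whether the GPUs also compete with each other.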

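I used to understand what this comment said, but not why these options exist. After reading the guide "Optimize TensorFlow GPU performance with the TensorFlow Profiler" (TensorFlow Core), I have a rough idea of the reasoning.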

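In multi-GPU setups (TensorFlow's native distribution strategies or other distributed frameworks), the default global mode makes the CPU and the GPUs share a single thread pool, and by default TensorFlow sizes that pool to the number of CPU cores. The more GPUs there are, the more the CPU and the GPUs, and the GPUs among themselves, compete for these threads. When one GPU occupies too many threads, the other GPUs have no thread available to schedule their work and stall, which degrades performance.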
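The contention described above can be illustrated with a toy scheduling model. This is an assumption-laden sketch, not how TensorFlow's scheduler actually works: each pool is modeled as a FIFO queue served by a fixed number of workers, all tasks are submitted at time 0, and the function names and task durations are made up for illustration.

```python
import heapq


def fifo_start_times(durations, num_threads):
    """Start time of each task in a FIFO pool with num_threads workers."""
    free_at = [0] * num_threads  # time at which each worker becomes free
    heapq.heapify(free_at)
    starts = []
    for d in durations:
        t = heapq.heappop(free_at)  # the earliest-free worker takes the next task
        starts.append(t)
        heapq.heappush(free_at, t + d)
    return starts


def first_start_by_device(tasks, shared, threads):
    """tasks: list of (device, duration) in submission order.

    shared=True models one pool for all devices (like "global");
    shared=False models one pool per device (like "gpu_private").
    """
    if shared:
        starts = fifo_start_times([d for _, d in tasks], threads)
        result = {}
        for (dev, _), s in zip(tasks, starts):
            result.setdefault(dev, s)  # keep only the first task per device
        return result
    by_dev = {}
    for dev, d in tasks:
        by_dev.setdefault(dev, []).append(d)
    return {dev: fifo_start_times(ds, threads)[0] for dev, ds in by_dev.items()}


# GPU 0 enqueues four long tasks before GPU 1 enqueues one short task.
tasks = [("gpu0", 10)] * 4 + [("gpu1", 1)]
print(first_start_by_device(tasks, shared=True, threads=2))   # gpu1 waits behind gpu0
print(first_start_by_device(tasks, shared=False, threads=2))  # gpu1 starts immediately
```

With a shared 2-thread pool, gpu1's task cannot start until time 20 because gpu0's tasks hold both workers; with per-device pools it starts at time 0.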

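To avoid this, set TF_GPU_THREAD_MODE to gpu_private and combine it with TF_GPU_THREAD_COUNT. Each GPU then gets its own thread pool, TF_GPU_THREAD_COUNT limits the number of threads in each of those pools, and the devices no longer compete for the same threads.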
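A minimal sketch of applying this from a Python training script. The assumption is that these environment variables must be in place before TensorFlow creates its GPU devices, so the safest choice is to set them before `import tensorflow`; the thread count of 2 is only an example value to tune for your workload.

```python
import os

# Set before TensorFlow initializes its GPU devices (i.e. before importing it).
os.environ["TF_GPU_THREAD_MODE"] = "gpu_private"  # one dedicated pool per GPU
os.environ["TF_GPU_THREAD_COUNT"] = "2"           # example value: threads per GPU pool

try:
    import tensorflow as tf
    print(tf.config.list_physical_devices("GPU"))
except ImportError:
    print("TensorFlow is not installed; the variables are set for any child process.")
```

Equivalently, the variables can be exported in the shell when launching the job, e.g. `TF_GPU_THREAD_MODE=gpu_private TF_GPU_THREAD_COUNT=2 python train.py`, where `train.py` stands for your own (hypothetical) training script.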
