windows下cuda的安装过程推荐:http://www.jianshu.com/p/c245d46d43f0
使用TensorFlow-gpu+cuda,确实能让计算效率提高10倍,我在装完后时出现的问题:
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cublas64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cudnn64_5.dll locallyI c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library cufft64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library nvcuda.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\dso_loader.cc:128] successfully opened CUDA library curand64_80.dll locally
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1050
major: 6 minor: 1 memoryClockRate (GHz) 1.455
pciBusID 0000:01:00.0
Total memory: 2.00GiB
Free memory: 1.73GiB
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:906] DMA: 0
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:916] 0: Y
I c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1050, pci bus id: 0000:01:00.0)
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_device.cc:586] Could not identify NUMA node of /job:localhost/replica:0/task:0/gpu:0, defaulting to 0. Your kernel may not have been built with NUMA support.
什么是 NUMA support?
在网上找到这个:http://www.cnblogs.com/cenalulu/p/4358802.html,这个numa也会在mysql上作怪:
Jeremy Cole大神的两篇文章
- The MySQL “swap insanity” problem and the effects of the NUMA architecture
- A brief update on NUMA and MySQL
大神解释的非常详尽,有兴趣的读者可以直接看原文。博主这里做一个简单的总结:
- CPU规模因摩尔定律指数级发展,而总线发展缓慢,导致多核CPU通过一条总线共享内存成为瓶颈
- 于是NUMA出现了,CPU平均划分为若干个Chip(不多于4个),每个Chip有自己的内存控制器及内存插槽
- CPU访问自己Chip上所插的内存时速度快,而访问其他CPU所关联的内存(下文称Remote Access)的速度相较慢三倍左右
- 于是Linux内核默认使用CPU亲和的内存分配策略,使内存页尽可能的和调用线程处在同一个Core/Chip中
- 由于内存页没有动态调整策略,使得大部分内存页都集中在CPU 0上
- 又因为Reclaim策略优先淘汰/Swap本Chip上的内存,使得大量有用内存被换出
- 当被换出页被访问时问题就以数据库响应时间飙高甚至阻塞的形式出现了
总之问题可能出在操作系统上,如果用Linux可能就没这问题,另有大神:http://www.infocool.net/kb/WWW/201612/235061.html
里翻出TensorFlow源码,找到这警告的原因:
定位到 tensorflow/tensorflow/core/common_runtime/gpu/gpu_device.cc
根据注释的说明,只要我们不是使用多GPU,这个警告应该是可以忽略的,所以我们目前也不需要担心了。
另外可以用gpu-z看看gup和显存的使用情况: