TensorFlow installation, testing, and runtime: common problems


1. Choosing the cuDNN version when running configure for a TensorFlow source build:

The default is 6.0. If you enter 5.0 here, the prompt is simply repeated. If the installed cuDNN version is 5.1, for example, enter 5.
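The rule above (answer with the major version only) can be sketched as a tiny helper. This is an illustration, not part of the configure script; cudnn_configure_answer is a name made up for this example, and it assumes the prompt wants just the major number, as observed for cuDNN 5.1.

```python
def cudnn_configure_answer(installed_version):
    """Return the text to type at the cuDNN prompt for an installed version.

    Assumption: the prompt accepts the major version only, as the note
    above reports for cuDNN 5.1 (answer: 5).
    """
    major = installed_version.strip().split(".")[0]
    if not major.isdigit():
        raise ValueError("unrecognized cuDNN version: %r" % installed_version)
    return major


print(cudnn_configure_answer("5.1"))
```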

2. Warnings from the GPU build:

2017-07-16 19:51:37.361838: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-16 19:51:37.361861: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-16 19:51:37.361865: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2017-07-16 19:51:37.361868: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2017-07-16 19:51:37.361872: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2017-07-16 19:51:37.398036: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:893] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2017-07-16 19:51:37.398215: I tensorflow/core/common_runtime/gpu/gpu_device.cc:940] Found device 0 with properties:
name: GeForce GTX 760
major: 3 minor: 0 memoryClockRate (GHz) 1.137
pciBusID 0000:01:00.0
Total memory: 1.95GiB
Free memory: 1.66GiB
2017-07-16 19:51:37.398228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:961] DMA: 0
2017-07-16 19:51:37.398232: I tensorflow/core/common_runtime/gpu/gpu_device.cc:971] 0:   Y
2017-07-16 19:51:37.398240: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1030] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 760, pci bus id: 0000:01:00.0)

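These warnings only go away when you install a GPU build compiled from source with bazel (with those CPU instruction sets enabled). They are harmless and can be ignored for now.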
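If the log noise is distracting, one option is to raise TensorFlow's C++ log threshold through the TF_CPP_MIN_LOG_LEVEL environment variable before importing TensorFlow. A minimal sketch, assuming your TensorFlow version honors this variable; quiet_tensorflow_logs is a helper name made up for this example. This only hides the messages; it does not make the binary use SSE/AVX/FMA.

```python
import os


def quiet_tensorflow_logs(level="2"):
    """Set TF_CPP_MIN_LOG_LEVEL; call this before `import tensorflow`.

    "0" shows everything, "1" hides INFO, "2" also hides WARNING,
    "3" also hides ERROR.
    """
    if level not in ("0", "1", "2", "3"):
        raise ValueError("level must be one of '0', '1', '2', '3'")
    os.environ["TF_CPP_MIN_LOG_LEVEL"] = level
    return level


quiet_tensorflow_logs("2")
# import tensorflow as tf  # import only after the variable is set
```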

3. Some functions are renamed or deprecated as new versions are released:

WARNING:tensorflow:From /root/tensorflow/local/lib/python2.7/site-packages/tensorflow/python/util/tf_should_use.py:170: initialize_all_variables (from tensorflow.python.ops.variables) is deprecated and will be removed after 2017-03-02.
Instructions for updating:
Use `tf.global_variables_initializer` instead.
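For code that has to run on both old and new releases, a small shim can prefer the new name and fall back to the deprecated one. This is a sketch; variables_initializer_op is a hypothetical helper, and it takes the tensorflow module as an argument so the choice is explicit.

```python
def variables_initializer_op(tf_module):
    """Return the op that initializes all variables, preferring the new API."""
    new_api = getattr(tf_module, "global_variables_initializer", None)
    if new_api is not None:
        return new_api()
    # Fallback for old releases that only have the deprecated name.
    return tf_module.initialize_all_variables()


# Typical use (requires TensorFlow to be installed):
# import tensorflow as tf
# init = variables_initializer_op(tf)
# sess.run(init)
```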


Common problems:

1. Error: The directory or its parent directory is not owned by the current user
You may see the following error when installing Virtualenv:
The directory '/Users/valiantliu/Library/Caches/pip/http' or its parent directory is not owned by the current user and the cache has been disabled. Please check the permissions and owner of that directory. If executing pip with sudo, you may want sudo's -H flag.
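This happens when pip is run with sudo but without the -H flag: pip's cache directory sits in your home directory and is not owned by the user pip is running as (root). As the message itself suggests, add sudo's -H flag when installing Virtualenv:
sudo -H pip install --upgrade virtualenv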
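To see whether pip's cache directory really belongs to the current user, the ownership check can be sketched in a few lines. owned_by_current_user is a name made up for this example; the cache path is the macOS location from the message above, and the check is POSIX only.

```python
import os


def owned_by_current_user(path):
    """Return True if `path` exists and its owner is the current user (POSIX only)."""
    try:
        return os.stat(path).st_uid == os.getuid()
    except FileNotFoundError:
        return False


# macOS cache location taken from the error message; adjust for other systems.
cache_dir = os.path.expanduser("~/Library/Caches/pip")
print(cache_dir, owned_by_current_user(cache_dir))
```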

2. Warning: You are using pip version 7.1.2, however version 8.1.2 is available.
You may see the following warning when installing packages with pip:
You are using pip version 7.1.2, however version 8.1.2 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
Just upgrade pip as the message suggests. Note that this may require root privileges:
sudo pip install --upgrade pip

3. Error: Library not loaded: @rpath/libcudart.7.5.dylib
After installation, when you test the install with:
import tensorflow as tf

you may see the following error:
ImportError: dlopen(/Users/valiantliu/tensorflow/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so, 10): Library not loaded: @rpath/libcudart.7.5.dylib
Referenced from: /Users/valiantliu/tensorflow/lib/python2.7/site-packages/tensorflow/python/_pywrap_tensorflow.so
Reason: image not found

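This usually means the CUDA 7.5 libraries are not installed (libcudart is the CUDA runtime library). See steps 3) and 4) of the "Installing dependencies" section to install the required dependencies.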
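Before reinstalling, it can help to confirm whether the library file is present at all. The sketch below just looks for the file in a list of directories; find_library_file is a hypothetical helper, and the search locations (/usr/local/cuda/lib plus DYLD_LIBRARY_PATH) are assumptions about a typical macOS CUDA install.

```python
import os


def find_library_file(filename, search_dirs):
    """Return the first existing path to `filename` in `search_dirs`, else None."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.exists(candidate):
            return candidate
    return None


# Assumed search locations; DYLD_LIBRARY_PATH is the macOS loader search variable.
dirs = ["/usr/local/cuda/lib"]
dirs += os.environ.get("DYLD_LIBRARY_PATH", "").split(os.pathsep)
print(find_library_file("libcudart.7.5.dylib", [d for d in dirs if d]))
```

A result of None suggests the library is missing from those directories, or installed somewhere else.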

4. Error: Segmentation fault: 11
After installation, when you test the install with:
import tensorflow as tf

you may see the following error:
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcublas.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcudnn.dylib locally
I tensorflow/stream_executor/dso_loader.cc:108] successfully opened CUDA library libcufft.dylib locally
Segmentation fault: 11

5. Error: CUDA driver version is insufficient for CUDA runtime version
When running deviceQuery to check the setup, you may see the following error:
Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

cudaGetDeviceCount returned 35
-> CUDA driver version is insufficient for CUDA runtime version
Result = FAIL

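This usually means the system's CUDA driver is too old. Install the latest driver by following step 3) of the "Installing dependencies" section.
After installing it, run deviceQuery again. Output similar to the following means CUDA is now supported: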
Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 1 CUDA Capable device(s)

Device 0: "GeForce GTX 775M"
CUDA Driver Version / Runtime Version 7.5 / 7.5
CUDA Capability Major/Minor version number: 3.0
Total amount of global memory: 2048 MBytes (2147024896 bytes)
( 7) Multiprocessors, (192) CUDA Cores/MP: 1344 CUDA Cores
GPU Max Clock rate: 797 MHz (0.80 GHz)
Memory Clock rate: 2500 Mhz
Memory Bus Width: 256-bit
L2 Cache Size: 524288 bytes
Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096)
Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers
Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers
Total amount of constant memory: 65536 bytes
Total amount of shared memory per block: 49152 bytes
Total number of registers available per block: 65536
Warp size: 32
Maximum number of threads per multiprocessor: 2048
Maximum number of threads per block: 1024
Max dimension size of a thread block (x,y,z): (1024, 1024, 64)
Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535)
Maximum memory pitch: 2147483647 bytes
Texture alignment: 512 bytes
Concurrent copy and kernel execution: Yes with 2 copy engine(s)
Run time limit on kernels: Yes
Integrated GPU sharing Host Memory: No
Support host page-locked memory mapping: Yes
Alignment requirement for Surfaces: Yes
Device has ECC support: Disabled
Device supports Unified Addressing (UVA): Yes
Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0
Compute Mode:
< Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 7.5, CUDA Runtime Version = 7.5, NumDevs = 1, Device0 = GeForce GTX 775M
Result = PASS
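The key line in this output is "CUDA Driver Version / Runtime Version": the driver version must be at least the runtime version. A small sketch that pulls both numbers out of saved deviceQuery output and compares them; driver_supports_runtime is a name made up for this example, and the regular expression assumes the line format shown above.

```python
import re

_VERSION_LINE = re.compile(
    r"CUDA Driver Version / Runtime Version\s+(\d+)\.(\d+)\s*/\s*(\d+)\.(\d+)"
)


def driver_supports_runtime(device_query_output):
    """Return True/False from the version line, or None if the line is absent."""
    match = _VERSION_LINE.search(device_query_output)
    if match is None:
        return None
    driver = (int(match.group(1)), int(match.group(2)))
    runtime = (int(match.group(3)), int(match.group(4)))
    # Tuples compare element by element, so (7, 5) >= (7, 5) is True.
    return driver >= runtime


sample = "CUDA Driver Version / Runtime Version 7.5 / 7.5"
print(driver_supports_runtime(sample))
```

When deviceQuery fails outright, as in the error case above, the version line is missing and the helper returns None.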

6. Error: failed call to cuInit: CUDA_ERROR_NO_DEVICE
When running the Python examples, you may see the following error, which is usually caused by an outdated CUDA driver:
E tensorflow/stream_executor/cuda/cuda_driver.cc:509] failed call to cuInit: CUDA_ERROR_NO_DEVICE
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:158] retrieving CUDA diagnostic information for host: liuxiaos-iMac.local
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:165] hostname: liuxiaos-iMac.local
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:189] libcuda reported version is: Not found: was unable to find libcuda.so DSO loaded into this program
I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:193] kernel reported version is: Invalid argument: expected %d.%d or %d.%d.%d form for driver version; got ""

The fix is to upgrade the CUDA driver.

7. Error: AttributeError: 'GFile' object has no attribute 'Size'
If you see the following error when running the example models/image/mnist/convolutional.py:
Traceback (most recent call last):
File "models/image/mnist/convolutional.py", line 326, in
tf.app.run()
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "models/image/mnist/convolutional.py", line 132, in main
train_data_filename = maybe_download('train-images-idx3-ubyte.gz')
File "models/image/mnist/convolutional.py", line 72, in maybe_download
size = f.Size()
AttributeError: 'GFile' object has no attribute 'Size'

To fix it, edit models/image/mnist/convolutional.py.
Find:
size = f.Size()

Change it to:
size = f.size()
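If the same script has to run against TensorFlow versions with either spelling, one option is to look the method up at run time instead of hard-coding the name. A sketch only; gfile_size is a hypothetical helper and works on any object exposing size() or Size().

```python
def gfile_size(f):
    """Return the file size using whichever method name the object has."""
    method = getattr(f, "size", None) or getattr(f, "Size", None)
    if method is None:
        raise AttributeError("object has neither size() nor Size()")
    return method()


# In convolutional.py this would replace the failing line:
# size = gfile_size(f)
```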

8. Error: Couldn't open CUDA library libcuda.1.dylib
If you see the following error:
Couldn't open CUDA library libcuda.1.dylib

This happens because the library name that CUDA installs by default differs from the name TensorFlow tries to load. Create a symlink with the following command:
ln -sf /usr/local/cuda/lib/libcuda.dylib /usr/local/cuda/lib/libcuda.1.dylib
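The same link can be created from Python, for example inside a setup script. This is a sketch of the ln -sf behavior with one deliberate difference: it refuses to link to a target that does not exist. link_library is a name made up for this example.

```python
import os


def link_library(target, link_name):
    """Create or replace a symlink at link_name pointing to target.

    Similar to `ln -sf`, but raises if the target file is missing.
    """
    if not os.path.exists(target):
        raise FileNotFoundError(target)
    if os.path.islink(link_name) or os.path.exists(link_name):
        os.remove(link_name)
    os.symlink(target, link_name)
    return link_name


# Paths from the command above; writing under /usr/local usually needs root.
# link_library("/usr/local/cuda/lib/libcuda.dylib",
#              "/usr/local/cuda/lib/libcuda.1.dylib")
```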

9. Error: PermissionError: [Errno 13] Permission denied: '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/external/__init__.py'

If you see the following error during installation:
Exception:
Traceback (most recent call last):
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/basecommand.py", line 215, in main
status = self.run(options, args)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/commands/install.py", line 342, in run
prefix=options.prefix_path,
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/req/req_set.py", line 784, in install
**kwargs
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/req/req_install.py", line 851, in install
self.move_wheel_files(self.source_dir, root=root, prefix=prefix)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/req/req_install.py", line 1064, in move_wheel_files
isolated=self.isolated,
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/wheel.py", line 377, in move_wheel_files
clobber(source, dest, False, fixer=fixer, filter=filter)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pip/wheel.py", line 323, in clobber
shutil.copyfile(srcfile, destfile)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/shutil.py", line 115, in copyfile
with open(dst, 'wb') as fdst:
PermissionError: [Errno 13] Permission denied: '/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/external/__init__.py'

You can try installing with the following command:
sudo pip3 install tensorflow-gpu