1. Switching TensorFlow between CPU and GPU
config = tf.ConfigProto(device_count={'GPU': 0})  # 0 = expose no GPUs (run on CPU); 1 = allow one GPU
with tf.Session(config=config) as sess:
2. can’t open CUDA library libcupti.so.8.0.
Run `locate libcupti*` to find where the library is installed, then add that directory to the environment variable:
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH
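To make the change survive new shells, the same export line can go into ~/.bashrc. A quick sanity check that the CUPTI directory really is on the search path (the paths below are the same ones from the export above):

```shell
# Add the CUDA and CUPTI library directories for this shell session
export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:/usr/local/cuda-8.0/extras/CUPTI/lib64:$LD_LIBRARY_PATH

# Print one entry per line and confirm the CUPTI directory is present
echo "$LD_LIBRARY_PATH" | tr ':' '\n' | grep CUPTI
```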
3. could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
I hit this problem with a TensorFlow built from source; the version installed via pip does not have it, so unless you really need a source build, the fix is simply to switch back to the pip-installed TensorFlow.
Solution 2: add the following lines to your .py file. This is a fix someone posted on GitHub, aimed mainly at cases caused by GPU memory; it did not work for me.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand instead of grabbing it all at once
sess = tf.Session(config=config)
Or:
config = tf.ConfigProto()
config.gpu_options.per_process_gpu_memory_fraction = 0.4  # cap this process at 40% of GPU memory
session = tf.Session(config=config, ...)
Solution 3: run it with root privileges. This is what solved my problem.
$ su
# python test.py
4. ImportError: cannot import name string_int_label_map_pb2
https://github.com/tensorflow/models/blob/master/object_detection/g3doc/installation.md
# From tensorflow/models/
protoc object_detection/protos/*.proto --python_out=.
5. Multi-GPU training in TensorFlow
The docstring in /models/slim/deployment/model_deploy.py explains:
DeploymentConfig parameters:
* num_clones: Number of model clones to deploy in each replica.
* clone_on_cpu: True if clones should be placed on CPU.
* replica_id: Integer. Index of the replica for which the model is deployed. Usually 0 for the chief replica.
* num_replicas: Number of replicas to use.
* num_ps_tasks: Number of tasks for the ps job. 0 to not use replicas.
* worker_job_name: A name for the worker job.
* ps_job_name: A name for the parameter server job.
num_clones is the parameter that controls single-machine multi-GPU training; set it like this:
deploy_config = model_deploy.DeploymentConfig(num_clones=4, clone_on_cpu=False)
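Spelled out with every parameter from the list above (a hypothetical single-machine, 4-GPU setup; the values are illustrative, and `model_deploy` is imported from the slim deployment directory):

```python
from deployment import model_deploy  # lives under models/slim/deployment/

# Hypothetical setup: one machine, 4 GPUs, one model clone per GPU
deploy_config = model_deploy.DeploymentConfig(
    num_clones=4,        # one clone per GPU
    clone_on_cpu=False,  # place clones on GPU, not CPU
    replica_id=0,        # index of this replica (0 = chief)
    num_replicas=1,      # single machine, so a single replica
    num_ps_tasks=0)      # 0 = no parameter-server job
```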
6. Selecting which GPUs TensorFlow uses from the terminal
$ CUDA_VISIBLE_DEVICES=2,3 python your_script.py
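The same selection can be made from inside the script, as long as the variable is set before TensorFlow is imported (a common alternative; the GPU indices here are just examples):

```python
import os

# Must run before `import tensorflow` -- TensorFlow reads this at import time
os.environ['CUDA_VISIBLE_DEVICES'] = '2,3'

# Inside this process, the two visible GPUs are renumbered as 0 and 1
print(os.environ['CUDA_VISIBLE_DEVICES'])  # → 2,3
```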