Installing the driver, CUDA, and cuDNN
Follow the official TensorFlow instructions: https://www.tensorflow.org/install/gpu
# Add NVIDIA package repositories
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/7fa2af80.pub
sudo dpkg -i cuda-repo-ubuntu1804_10.1.243-1_amd64.deb
sudo apt-get update
wget http://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt install ./nvidia-machine-learning-repo-ubuntu1804_1.0.0-1_amd64.deb
sudo apt-get update
# Install NVIDIA driver
sudo apt-get install --no-install-recommends nvidia-driver-430
# Reboot. Check that GPUs are visible using the command: nvidia-smi
# Install development and runtime libraries (~4GB)
sudo apt-get install --no-install-recommends \
cuda-10-1 \
libcudnn7=7.6.4.38-1+cuda10.1 \
libcudnn7-dev=7.6.4.38-1+cuda10.1
# Install TensorRT. Requires that libcudnn7 is installed above.
sudo apt-get install -y --no-install-recommends libnvinfer6=6.0.1-1+cuda10.1 \
libnvinfer-dev=6.0.1-1+cuda10.1 \
libnvinfer-plugin6=6.0.1-1+cuda10.1
These commands give a quick and simple installation.
TensorFlow environment issues
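The "reboot and check with nvidia-smi" step can also be scripted. The following is a minimal Python sketch, not part of the official instructions. The helper name `driver_visible` is hypothetical, and the sketch assumes only that `nvidia-smi` is on the PATH once the driver is installed and that it exits with status 0 when it can reach the driver.

```python
import shutil
import subprocess


def driver_visible(command="nvidia-smi"):
    """Return True if `command` is on PATH and exits with status 0.

    Hypothetical helper for illustration; `command` defaults to nvidia-smi.
    """
    if shutil.which(command) is None:
        # The binary is missing, so the driver package is probably not installed.
        return False
    result = subprocess.run(
        [command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


if __name__ == "__main__":
    print("driver visible:", driver_visible())
```

A False result right after installation usually just means the reboot has not happened yet.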
pip install tensorflow-gpu==1.15
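For TensorFlow 1.15 and earlier, the GPU and CPU builds are separate pip packages, so the GPU build has to be requested explicitly. After installing it this way, the GPU is reported with device type XLA_GPU, which is TensorFlow's own XLA acceleration path. NVIDIA's StyleGAN code does not accept that device type, so for StyleGAN TensorFlow has to be installed with conda instead.
The relevant NVIDIA optimization code: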
import re
from tensorflow.python.client import device_lib

def _get_compute_cap(device):
    # physical_device_desc is a text description that includes the compute capability
    caps_str = device.physical_device_desc
    m = re.search('compute capability: (\\d+).(\\d+)', caps_str)
    major = m.group(1)
    minor = m.group(2)
    return (major, minor)

def _get_cuda_gpu_arch_string():
    # Only devices of type 'GPU' pass this filter; XLA_GPU devices are not supported
    gpus = [x for x in device_lib.list_local_devices() if x.device_type == 'GPU']
    if len(gpus) == 0:
        raise RuntimeError('No GPU devices found')
    (major, minor) = _get_compute_cap(gpus[0])
    return 'sm_%s%s' % (major, minor)
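To see what these two functions compute without a GPU at hand, the sketch below applies the same idea to a sample description string. The string is an assumed example of the format TensorFlow 1.x puts in `physical_device_desc`, and `arch_from_desc` is a hypothetical helper, not part of NVIDIA's code. It also escapes the dot in the pattern and raises a clear error when the field is missing, two small deviations from the original.

```python
import re


def arch_from_desc(desc):
    """Turn a device description into an nvcc arch string such as 'sm_70'."""
    m = re.search(r"compute capability: (\d+)\.(\d+)", desc)
    if m is None:
        # Descriptions without this field (an XLA_GPU entry, for example) end up here.
        raise ValueError("no compute capability in: %r" % desc)
    return "sm_%s%s" % (m.group(1), m.group(2))


# Assumed sample of a TF 1.x description string for a regular GPU device.
sample = ("device: 0, name: Tesla V100-SXM2-16GB, "
          "pci bus id: 0000:00:1e.0, compute capability: 7.0")
print(arch_from_desc(sample))  # sm_70
```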
Code for listing the available GPUs:
def get_available_gpus():
    """
    code from http://stackoverflow.com/questions/38559755/how-to-get-current-available-gpus-in-tensorflow
    """
    from tensorflow.python.client import device_lib as _device_lib
    local_device_protos = _device_lib.list_local_devices()
    return [x.name for x in local_device_protos]
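The names returned by this function encode the device type, so an XLA_GPU-only setup shows up directly in them. Below is a minimal sketch, assuming names of the form `/device:GPU:0` or `/device:XLA_GPU:0` as TensorFlow 1.x reports them. `group_by_type` is a hypothetical helper and the sample list is made up for illustration.

```python
from collections import defaultdict


def group_by_type(device_names):
    """Group names like '/device:XLA_GPU:0' by their device type."""
    groups = defaultdict(list)
    for name in device_names:
        # '/device:GPU:0' splits into ['/device', 'GPU', '0'].
        parts = name.split(":")
        device_type = parts[-2] if len(parts) >= 3 else "UNKNOWN"
        groups[device_type].append(name)
    return dict(groups)


# Made-up sample of what get_available_gpus() might return.
names = ["/device:CPU:0", "/device:XLA_CPU:0", "/device:XLA_GPU:0"]
groups = group_by_type(names)
if "GPU" not in groups:
    print("only XLA devices found; code that filters on 'GPU' will see none")
```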
conda install tensorflow-gpu=1.15
This method installs CUDA and cuDNN automatically, provided the driver is already installed.