Installing CUDA 10.0 + cuDNN 7.6.5 + TensorFlow 2.0 on Ubuntu 18.04

My first CUDA installation was maddening: all sorts of inexplicable bugs and plenty of pitfalls. For example, after installing CUDA and rebooting, the desktop would no longer come up and I got a black screen; the mouse and keyboard stopped working; or everything seemed fine but TensorFlow-GPU refused to install... After working through a pile of online tutorials, I found that the official guide is still the one that actually works.

First post of the new year: here are my installation notes for Ubuntu 18.04 + CUDA 10.0 + cuDNN 7.6.5 + TensorFlow 2.0, hopefully saving you some of those pitfalls.

The overall workflow is: install the NVIDIA driver -> install CUDA[1] -> install cuDNN[2] -> install tensorflow-gpu and test it.

Table of contents:

  1. Ubuntu installation and updates

  2. Installing the NVIDIA driver

  3. Installing CUDA

  4. Installing cuDNN

  5. Installing TensorFlow 2.0 GPU and testing


1. Ubuntu Installation and Updates

Start with some basic updates on the freshly installed Ubuntu 18.04 system. The OS installation itself is not covered here; it is straightforward and there are plenty of tutorials online.

    sudo apt-get update      # refresh the package lists
    sudo apt-get upgrade     # upgrade the installed packages
    sudo apt-get install vim

2. Installing the NVIDIA Driver

2.1 Disabling the Nouveau Driver

Note: there are two ways to install CUDA on Linux: Package Manager Installation (.deb) and Runfile Installation (.run). This guide uses the first one, which is also the officially recommended method. If you install CUDA from the deb package you can skip this step; I have verified that the deb install works without it. If you install CUDA with the runfile, you must manually disable the Nouveau driver that ships with the system:

    lsmod | grep nouveau    # make sure this command produces no output

    vim /etc/modprobe.d/blacklist-nouveau.conf
    # add the following two lines:
    #######################################################
    blacklist nouveau
    options nouveau modeset=0
    #######################################################
    # after saving, rebuild the initramfs and reboot:
    sudo update-initramfs -u
    sudo reboot
    # run the command again; no output means the blacklist took effect
    lsmod | grep nouveau

2.2 Installing a Suitable NVIDIA Driver[3]

    # remove any existing NVIDIA driver and its dependencies, then reboot
    sudo apt-get remove --purge nvidia*
    sudo apt autoremove
    sudo reboot

    # add the graphics-drivers PPA and install a recent driver
    sudo add-apt-repository ppa:graphics-drivers/ppa
    sudo apt update
    ubuntu-drivers devices
    sudo apt install nvidia-driver-440
    # to prevent compatibility problems caused by automatic driver updates, pin the driver version:
    sudo apt-mark hold nvidia-driver-440
    # nvidia-driver-440 set on hold.

Afterwards, open "Software & Updates" and check the "Additional Drivers" tab: the newly installed nvidia-driver-440 should be listed; select it. Run sudo reboot, then run nvidia-smi: if it prints the usual table with the driver version and GPU status, the driver is ready. You can also confirm that the kernel module is loaded:

    lsmod | grep nvidia    # output listing nvidia modules means the driver is installed; no output indicates a problem

Alternatively, you can download the installer from NVIDIA's website and install the driver manually[4].

    # ways to monitor GPU usage in real time:
    watch -n 1 nvidia-smi       # refresh every 1 second
    watch -n 0.01 nvidia-smi    # or refresh every 0.01 seconds
    # gpustat works as well
    pip install gpustat
    gpustat -i 1 -P

3. Installing CUDA

From Baidu Baike: CUDA (Compute Unified Device Architecture) is a computing platform launched by the GPU vendor NVIDIA[5]. It is a general-purpose parallel computing[6] architecture that enables GPUs[7] to solve complex computational problems.

There are two ways to install CUDA on Linux: Package Manager Installation (.deb) and Runfile Installation (.run). This guide uses the first one, which is also the officially recommended method.

Note that CUDA has strict requirements on the system environment (supported distribution, kernel, gcc and glibc versions); the requirement tables for CUDA 10.0 and other releases are listed in the corresponding Online Documentation[8].

3.1 Pre-installation Checks

Before installing CUDA, make sure the environment is ready; otherwise you can run into confusing bugs that are hard to track down. Quoting the official guide:

Some actions must be taken before the CUDA Toolkit and Driver can be installed on Linux:

  • Verify the system has a CUDA-capable GPU.

  • Verify the system is running a supported version of Linux.

  • Verify the system has gcc installed.

  • Verify the system has the correct kernel headers and development packages installed.

  • Download the NVIDIA CUDA Toolkit.

  • Handle conflicting installation methods.

3.1.1 Verify that you have a CUDA-capable GPU

    lspci | grep -i nvidia | grep VGA

3.1.2 Verify your Linux version

    uname -m && cat /etc/*release
    uname -a
    # The x86_64 line indicates you are running on a 64-bit system.

3.1.3 Verify the gcc version

    gcc --version
    # gcc (Ubuntu 7.4.0-1ubuntu1~18.04.1) 7.4.0

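If gcc is missing or the check fails, the compiler toolchain can be installed from the Ubuntu repositories before continuing; a minimal sketch:

    sudo apt-get install build-essential    # installs gcc, g++ and make
    gcc --version                           # re-check the version afterwards
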
3.1.4 Install the kernel headers matching your kernel version

Check the kernel version:

    uname -r
    # 5.0.0-37-generic

This is the version of the kernel headers and development packages that must be installed prior to installing the CUDA Drivers.

Install the matching kernel headers:

    sudo apt-get install linux-headers-$(uname -r)
3.1.5 Choose an installation method

Download the appropriate installer package (this guide uses the officially recommended Deb package method)[9]:

The CUDA Toolkit can be installed using either of two different installation mechanisms: distribution-specific packages (RPM and Deb packages), or a distribution-independent package (runfile packages).

(1) The distribution-independent package has the advantage of working across a wider set of Linux distributions, but does not update the distribution's native package management system.

(2) The distribution-specific packages interface with the distribution's native package management system. It is recommended to use the distribution-specific packages, where possible.
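
Whichever method you pick, it is worth checking the downloaded installer against the checksum published on the CUDA download page before installing it; a small sketch, using the Deb file name from section 3.2 below:

    # compare the printed hash with the MD5 checksum listed on the download page
    md5sum cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb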

3.1.6 Remove previously installed CUDA and driver components to avoid conflicts

If this is a fresh Ubuntu installation, skip this part and go straight to section 3.2.

If the previous installation used RPM/Deb packages:

    sudo apt-get --purge remove <package_name>
    sudo apt autoremove
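
As a concrete illustration (the package name below is hypothetical; list what is actually installed first and substitute the real names):

    dpkg -l | grep -E "cuda|nvidia"                 # see which CUDA/driver packages are present
    sudo apt-get --purge remove cuda-toolkit-9-0    # example package name, replace with yours
    sudo apt autoremove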

If the previous installation used the runfile installer:

    sudo /usr/bin/nvidia-uninstall
    sudo /usr/local/cuda-X.Y/bin/uninstall_cuda_X.Y.pl

3.2 Installation

First make sure the corresponding .deb file has been downloaded, then run:

    sudo dpkg -i cuda-repo-ubuntu1804-10-0-local-10.0.130-410.48_1.0-1_amd64.deb
    sudo apt-key add /var/cuda-repo-<version>/7fa2af80.pub    # use the key path printed by the previous command; in my case:
    # sudo apt-key add /var/cuda-repo-10-0-local-10.0.130-410.48/7fa2af80.pub
    sudo apt-get update
    sudo apt-get install cuda-toolkit-10-0    # note: cuda-toolkit-10-0, not cuda, because the driver was already installed in step 2

3.3 Post-installation Steps

Some manual setup is needed after installation before CUDA works properly.

    export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
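
The export above only lasts for the current shell session. To make it permanent, and to let programs find the CUDA shared libraries at run time, a common approach is to append the variables to ~/.bashrc; a minimal sketch (the LD_LIBRARY_PATH line is mainly needed for runfile installs, but it does no harm with the Deb install used here):

    echo 'export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}' >> ~/.bashrc
    echo 'export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
    source ~/.bashrc    # reload the shell configuration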

    
    
    nvcc -V    # check that CUDA was installed correctly
    # OUTPUT:
    # nvcc: NVIDIA (R) Cuda compiler driver
    # Copyright (c) 2005-2018 NVIDIA Corporation
    # Built on Sat_Aug_25_21:08:01_CDT_2018
    # Cuda compilation tools, release 10.0, V10.0.130

It is also a good idea to turn off automatic system updates, so that a working environment does not suddenly break:

    sudo vi /etc/apt/apt.conf.d/10periodic
    # change the settings to:
    APT::Periodic::Update-Package-Lists "0";
    APT::Periodic::Download-Upgradeable-Packages "0";
    APT::Periodic::AutocleanInterval "0";

This can also be configured from the desktop: System Settings => Software & Updates => Updates.
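
In the same spirit as pinning the driver in section 2.2, the CUDA packages themselves can be put on hold so that a routine apt upgrade does not replace them; an optional sketch:

    sudo apt-mark hold cuda-toolkit-10-0    # keep apt from upgrading the toolkit
    apt-mark showhold                       # list packages currently on hold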

4. Installing cuDNN[10]

NVIDIA cuDNN is a GPU-accelerated library of primitives for deep neural networks. You first need to register an NVIDIA developer account and download the cuDNN package that matches your CUDA version: link[11].

For CUDA 10.0, for example, I downloaded cudnn-10.0-linux-x64-v7.6.5.32.tgz and unpacked it:

    tar -zxvf cudnn-10.0-linux-x64-v7.6.5.32.tgz
    sudo cp cuda/include/cudnn.h /usr/local/cuda/include
    sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64    # -P preserves the symbolic links
    sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*

Verify that the installation succeeded:

    cat /usr/local/cuda/include/cudnn.h | grep CUDNN_MAJOR -A 2
    # output:
    # #define CUDNN_MAJOR 7
    # #define CUDNN_MINOR 6
    # #define CUDNN_PATCHLEVEL 5
    # --
    # #define CUDNN_VERSION (CUDNN_MAJOR * 1000 + CUDNN_MINOR * 100 + CUDNN_PATCHLEVEL)
    # #include "driver_types.h"

A better option is to install cuDNN from the Debian packages, because they ship with samples that can be used to verify that cuDNN works. Download the following three files first:

    # install the three packages in order
    sudo dpkg -i libcudnn7_7.6.5.32-1+cuda10.0_amd64.deb
    sudo dpkg -i libcudnn7-dev_7.6.5.32-1+cuda10.0_amd64.deb
    sudo dpkg -i libcudnn7-doc_7.6.5.32-1+cuda10.0_amd64.deb
    # after installing, verify with the bundled sample:
    cp -r /usr/src/cudnn_samples_v7/ $HOME
    cd $HOME/cudnn_samples_v7/mnistCUDNN
    make clean && make
    ./mnistCUDNN
    # Test passed!

Alternatively, cudatoolkit and cuDNN can be installed with conda, as long as the NVIDIA driver is already in place.

    conda install cudatoolkit=10.0
    conda install -c anaconda cudnn
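
To confirm which versions conda actually resolved inside the environment, you can list them afterwards, for example:

    conda list | grep -E "cudatoolkit|cudnn"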

5. Installing and Testing TensorFlow 2.0 GPU

    # install conda
    wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh && bash Miniconda3-latest-Linux-x86_64.sh
    source ~/.bashrc
    conda create -y -n tf2 python=3.7
    conda activate tf2
    pip install --upgrade pip
    pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
    pip install tensorflow-gpu
    pip install catboost
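
One caveat about the pip step above: an unpinned pip install tensorflow-gpu grabs whatever release is newest, and newer TensorFlow releases are built against newer CUDA versions. Since this guide sets up CUDA 10.0, it is safer to pin the 2.0 release explicitly; for example:

    pip install tensorflow-gpu==2.0.0    # TensorFlow 2.0.x is built against CUDA 10.0 / cuDNN 7.6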

Test:

    import tensorflow as tf
    print(tf.__version__)
    print("Num GPUs Available: ", len(tf.config.experimental.list_physical_devices('GPU')))
    """
    2.0.0
    Num GPUs Available: 2
    """
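
The same check can be run straight from the shell, which is convenient right after activating the environment; for example:

    python -c "import tensorflow as tf; print(tf.config.experimental.list_physical_devices('GPU'))"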

    
    
    """
    Test program
    Source: https://github.com/dragen1860/TensorFlow-2.x-Tutorials/blob/master/08-ResNet/main.py
    """
    import os

    os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # or os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

    import tensorflow as tf
    import numpy as np
    from tensorflow import keras

    tf.random.set_seed(22)
    np.random.seed(22)
    os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'
    assert tf.__version__.startswith('2.')

    (x_train, y_train), (x_test, y_test) = keras.datasets.fashion_mnist.load_data()
    x_train, x_test = x_train.astype(np.float32) / 255., x_test.astype(np.float32) / 255.
    # [b, 28, 28] => [b, 28, 28, 1]
    x_train, x_test = np.expand_dims(x_train, axis=3), np.expand_dims(x_test, axis=3)

    # one hot encode the labels. convert back to numpy as we cannot use a combination of numpy
    # and tensors as input to keras
    y_train_ohe = tf.one_hot(y_train, depth=10).numpy()
    y_test_ohe = tf.one_hot(y_test, depth=10).numpy()

    print(x_train.shape, y_train.shape)
    print(x_test.shape, y_test.shape)


    # 3x3 convolution
    def conv3x3(channels, stride=1, kernel=(3, 3)):
        return keras.layers.Conv2D(
            channels,
            kernel,
            strides=stride,
            padding='same',
            use_bias=False,
            kernel_initializer=tf.random_normal_initializer())


    class ResnetBlock(keras.Model):

        def __init__(self, channels, strides=1, residual_path=False):
            super(ResnetBlock, self).__init__()

            self.channels = channels
            self.strides = strides
            self.residual_path = residual_path

            self.conv1 = conv3x3(channels, strides)
            self.bn1 = keras.layers.BatchNormalization()
            self.conv2 = conv3x3(channels)
            self.bn2 = keras.layers.BatchNormalization()

            if residual_path:
                self.down_conv = conv3x3(channels, strides, kernel=(1, 1))
                self.down_bn = tf.keras.layers.BatchNormalization()

        def call(self, inputs, training=None):
            residual = inputs

            x = self.bn1(inputs, training=training)
            x = tf.nn.relu(x)
            x = self.conv1(x)
            x = self.bn2(x, training=training)
            x = tf.nn.relu(x)
            x = self.conv2(x)

            # this module can be added into self.
            # however, module in for can not be added.
            if self.residual_path:
                residual = self.down_bn(inputs, training=training)
                residual = tf.nn.relu(residual)
                residual = self.down_conv(residual)

            x = x + residual
            return x


    class ResNet(keras.Model):

        def __init__(self, block_list, num_classes, initial_filters=16, **kwargs):
            super(ResNet, self).__init__(**kwargs)

            self.num_blocks = len(block_list)
            self.block_list = block_list

            self.in_channels = initial_filters
            self.out_channels = initial_filters
            self.conv_initial = conv3x3(self.out_channels)

            self.blocks = keras.models.Sequential(name='dynamic-blocks')

            # build all the blocks
            for block_id in range(len(block_list)):
                for layer_id in range(block_list[block_id]):

                    if block_id != 0 and layer_id == 0:
                        block = ResnetBlock(self.out_channels,
                                            strides=2,
                                            residual_path=True)
                    else:
                        if self.in_channels != self.out_channels:
                            residual_path = True
                        else:
                            residual_path = False
                        block = ResnetBlock(self.out_channels,
                                            residual_path=residual_path)

                    self.in_channels = self.out_channels
                    self.blocks.add(block)

                self.out_channels *= 2

            self.final_bn = keras.layers.BatchNormalization()
            self.avg_pool = keras.layers.GlobalAveragePooling2D()
            self.fc = keras.layers.Dense(num_classes)

        def call(self, inputs, training=None):
            out = self.conv_initial(inputs)
            out = self.blocks(out, training=training)
            out = self.final_bn(out, training=training)
            out = tf.nn.relu(out)
            out = self.avg_pool(out)
            out = self.fc(out)
            return out


    def main():
        num_classes = 10
        batch_size = 128
        epochs = 2

        # build model and optimizer
        model = ResNet([2, 2, 2], num_classes)
        model.compile(optimizer=keras.optimizers.Adam(0.001),
                      loss=keras.losses.CategoricalCrossentropy(from_logits=True),
                      metrics=['accuracy'])
        model.build(input_shape=(None, 28, 28, 1))
        print("Number of variables in the model :", len(model.variables))
        model.summary()

        # train
        model.fit(x_train,
                  y_train_ohe,
                  batch_size=batch_size,
                  epochs=epochs,
                  validation_data=(x_test, y_test_ohe),
                  verbose=1)

        # evaluate on test set
        scores = model.evaluate(x_test, y_test_ohe, batch_size, verbose=1)
        print("Final test loss and accuracy :", scores)


    if __name__ == '__main__':
        main()

Monitor GPU usage:

    watch -n 0.01 nvidia-smi

Test CatBoost on the GPU:

    from catboost.datasets import titanic
    import numpy as np
    from sklearn.model_selection import train_test_split
    from catboost import CatBoostClassifier, Pool, cv
    from sklearn.metrics import accuracy_score

    train_df, test_df = titanic()
    null_value_stats = train_df.isnull().sum(axis=0)
    null_value_stats[null_value_stats != 0]
    train_df.fillna(-999, inplace=True)
    test_df.fillna(-999, inplace=True)

    X = train_df.drop('Survived', axis=1)
    y = train_df.Survived
    X_train, X_validation, y_train, y_validation = train_test_split(X, y, train_size=0.75, random_state=42)
    X_test = test_df
    categorical_features_indices = np.where(X.dtypes != np.float)[0]

    model = CatBoostClassifier(
        task_type="GPU",
        custom_metric=['Accuracy'],
        random_seed=666,
        logging_level='Silent'
    )
    model.fit(
        X_train, y_train,
        cat_features=categorical_features_indices,
        eval_set=(X_validation, y_validation),
        logging_level='Verbose',  # you can comment this out for no text output
        plot=True
    )

Monitor GPU usage:

    watch -n 0.01 nvidia-smi

REFERENCES

[1] Installing CUDA: https://developer.nvidia.com/cuda-toolkit-archive

[2] Installing cuDNN: https://developer.nvidia.com/rdp/cudnn-download

[3] Installing a suitable NVIDIA driver: http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux

[4] Manual driver download from NVIDIA: https://www.geforce.cn/drivers

[5] NVIDIA: https://baike.baidu.com/item/NVIDIA

[6] Parallel computing: https://baike.baidu.com/item/并行计算/113443

[7] GPU: https://baike.baidu.com/item/GPU

[8] Online Documentation: https://developer.nvidia.com/cuda-toolkit-archive

[9] CUDA 10.0 download archive (Ubuntu 18.04, Deb local installer): https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=deblocal

[10] Installing cuDNN: https://developer.nvidia.com/rdp/cudnn-download

[11] cuDNN download: https://developer.nvidia.com/rdp/cudnn-download

[12] Official NVIDIA CUDA Installation Guide for Linux: https://docs.nvidia.com/cuda/archive/10.0/cuda-installation-guide-linux/index.html

[13] CUDA Quick Start Guide (PDF): https://developer.download.nvidia.com/compute/cuda/10.0/Prod/docs/sidebar/CUDA_Quick_Start_Guide.pdf

[14] CUDA Installation Guide for Linux (PDF): https://developer.download.nvidia.com/compute/cuda/10.0/Prod/docs/sidebar/CUDA_Installation_Guide_Linux.pdf

[15] Official cuDNN installation guide: https://docs.nvidia.com/deeplearning/sdk/cudnn-install/index.html#install-linux

[16] [How To] Install Latest NVIDIA Drivers In Linux: http://www.linuxandubuntu.com/home/how-to-install-latest-nvidia-drivers-in-linux

