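0. OpenEuler kernel:
OpenEuler ships a 4.19 kernel, which is closest to the 4.18 kernel of CentOS 8, so this guide treats it as el8 when choosing packages. The listing below maps CentOS releases to their kernel versions.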
CentOS release | Timestamp | Kernel source package |
---|---|---|
6.4 | 2013-06-20 15:50 | kernel-2.6.32-358.el6.src.rpm |
6.5 | 2013-12-21 14:05 | kernel-2.6.32-431.el6.src.rpm |
6.6 | 2015-07-31 16:17 | kernel-2.6.32-504.el6.src.rpm |
6.7 | 2016-01-21 13:22 | kernel-2.6.32-573.el6.src.rpm |
7.0.1406 | 2015-04-07 15:36 | kernel-3.10.0-123.el7.src.rpm |
7.1.1503 | 2015-11-13 13:01 | kernel-3.10.0-229.el7.src.rpm |
7.2.1511 | 2016-02-16 16:15 | kernel-3.10.0-327.el7.src.rpm |
7.3.1611 | 2017-02-20 22:21 | kernel-3.10.0-514.el7.src.rpm |
7.4.1708 | 2018-02-26 14:32 | kernel-3.10.0-693.el7.src.rpm |
7.5.1804 | 2018-05-09 20:39 | kernel-3.10.0-862.el7.src.rpm |
7.6.1810 | 2018-12-02 14:34 | kernel-3.10.0-957.el7.src.rpm |
7.7.1908 | 2019-09-15 01:00 | kernel-3.10.0-1062.el7.src.rpm |
7.8.2003 | 2020-06-17 17:55 | kernel-3.10.0-1127.el7.src.rpm |
7.9.2009 | 2020-11-09 22:01 | kernel-3.10.0-1160.el7.src.rpm |
8.0.1905 | 2020-09-09 07:43 | kernel-4.18.0-80.el8.src.rpm |
8.1.1911 | 2020-04-13 08:20 | kernel-4.18.0-147.el8.src.rpm |
8.2.2004 | 2020-06-15 12:42 | kernel-4.18.0-193.el8.src.rpm |
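The mapping above can be turned into a quick check of the running kernel. This is a minimal sketch: the function name `el_family_for_kernel` is hypothetical, and treating 4.19 as el8 is this guide's assumption, not an official compatibility statement.

```shell
#!/bin/sh
# Map a kernel release string to the Enterprise Linux family whose
# packages are most likely to fit, following the listing in this guide.
# Assumption: a 4.19 kernel (OpenEuler) is treated like el8 (4.18).
el_family_for_kernel() {
    case "$1" in
        2.6.32*)       echo el6 ;;
        3.10.*)        echo el7 ;;
        4.18.*|4.19.*) echo el8 ;;
        *)             echo unknown ;;
    esac
}

# Report the family for the kernel that is currently running.
el_family_for_kernel "$(uname -r)"
```

If the result is `unknown`, the el8 packages used in the following steps may not match the system.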
1. Detect the GPU model and the driver it needs
(1) Add the ELRepo repository
sudo rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
sudo yum install https://www.elrepo.org/elrepo-release-8.el8.elrepo.noarch.rpm
2. Install the NVIDIA driver detection tool
sudo yum install nvidia-detect
nvidia-detect -v
The output looks like this:
Probing for supported NVIDIA devices...
[8086:3ea0] Intel Corporation UHD Graphics 620 (Whiskey Lake)
[10de:1d13] NVIDIA Corporation Device
This device requires the current 440.64 NVIDIA driver kmod-nvidia
WARNING: Xorg log file /var/log/Xorg.0.log does not exist
WARNING: Unable to determine Xorg ABI compatibility
WARNING: The driver for this device does not support the current Xorg version
An Intel display controller was also detected
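To use the recommended driver version in a script, it can be pulled out of the `nvidia-detect` output. The sketch below assumes the line is worded exactly as in the sample output above; other `nvidia-detect` versions may phrase it differently. The function name `required_driver_version` is hypothetical.

```shell
#!/bin/sh
# Read nvidia-detect output on stdin and print the driver version from
# the "This device requires the current X NVIDIA driver" line.
# Prints nothing if no such line is present.
required_driver_version() {
    sed -n 's/.*requires the current \([0-9][0-9.]*\) NVIDIA driver.*/\1/p'
}

# Usage on a machine where nvidia-detect is installed:
#   nvidia-detect -v | required_driver_version
```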
CUDA version versus driver version:
(The original screenshot, 2020-11-29 00-06-43.png, was not preserved. The CUDA Toolkit release notes list the minimum driver version for each CUDA release.)
TensorFlow and CUDA compatibility on Linux (GPU builds)
Version | Python version | Compiler | Build tools | cuDNN | CUDA |
---|---|---|---|---|---|
tensorflow-2.3.0 | 3.5-3.8 | GCC 7.3.1 | Bazel 3.1.0 | 7.6 | 10.1 |
tensorflow-2.2.0 | 3.5-3.8 | GCC 7.3.1 | Bazel 2.0.0 | 7.6 | 10.1 |
tensorflow-2.1.0 | 2.7, 3.5-3.7 | GCC 7.3.1 | Bazel 0.27.1 | 7.6 | 10.1 |
tensorflow-2.0.0 | 2.7, 3.3-3.7 | GCC 7.3.1 | Bazel 0.26.1 | 7.4 | 10.0 |
tensorflow_gpu-1.15.0 | 2.7, 3.3-3.7 | GCC 7.3.1 | Bazel 0.26.1 | 7.4 | 10.0 |
tensorflow_gpu-1.14.0 | 2.7, 3.3-3.7 | GCC 4.8 | Bazel 0.24.1 | 7.4 | 10.0 |
tensorflow_gpu-1.13.1 | 2.7, 3.3-3.7 | GCC 4.8 | Bazel 0.19.2 | 7.4 | 10.0 |
tensorflow_gpu-1.12.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.11.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.10.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.15.0 | 7 | 9 |
tensorflow_gpu-1.9.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.11.0 | 7 | 9 |
tensorflow_gpu-1.8.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.10.0 | 7 | 9 |
tensorflow_gpu-1.7.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.9.0 | 7 | 9 |
tensorflow_gpu-1.6.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.9.0 | 7 | 9 |
tensorflow_gpu-1.5.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.8.0 | 7 | 9 |
tensorflow_gpu-1.4.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.5.4 | 6 | 8 |
tensorflow_gpu-1.3.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.5 | 6 | 8 |
tensorflow_gpu-1.2.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.5 | 5.1 | 8 |
tensorflow_gpu-1.1.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.2 | 5.1 | 8 |
tensorflow_gpu-1.0.0 | 2.7, 3.3-3.6 | GCC 4.8 | Bazel 0.4.2 | 5.1 | 8 |
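For scripted checks, part of the table can be encoded as a lookup. This is a sketch covering only the 1.5 through 2.3 rows. It assumes patch releases share the requirements of the x.y row in the table, which the table itself does not state. The function name `tf_gpu_requirements` is hypothetical.

```shell
#!/bin/sh
# Print the CUDA and cuDNN versions listed in the table for a given
# TensorFlow version. Returns non-zero for versions not encoded here.
# Assumption: any patch release x.y.* matches the x.y row of the table.
tf_gpu_requirements() {
    case "$1" in
        2.1.*|2.2.*|2.3.*)
            echo "CUDA 10.1, cuDNN 7.6" ;;
        1.13.*|1.14.*|1.15.*|2.0.*)
            echo "CUDA 10.0, cuDNN 7.4" ;;
        1.5.*|1.6.*|1.7.*|1.8.*|1.9.*|1.10.*|1.11.*|1.12.*)
            echo "CUDA 9, cuDNN 7" ;;
        *)
            echo "not covered by this sketch" >&2
            return 1 ;;
    esac
}

# Example: look up the requirements for TensorFlow 2.3.0.
tf_gpu_requirements 2.3.0
```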
3. Driver download page:
https://www.nvidia.cn/geforce/drivers/
Latest driver version at the time of writing: 455.45 (file version 455.45.01), released 2020-11-17:
https://cn.download.nvidia.cn/XFree86/Linux-x86_64/455.45.01/NVIDIA-Linux-x86_64-455.45.01.run
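When a different driver version is needed, the download URL can be built from the version string. The pattern below is inferred from the single URL shown above and is an assumption; it may not hold for every release. The function name `nvidia_driver_url` is hypothetical.

```shell
#!/bin/sh
# Build a driver download URL from a version string, following the
# pattern of the 455.45.01 URL in this guide (assumed, not verified
# for other versions).
nvidia_driver_url() {
    version=$1
    echo "https://cn.download.nvidia.cn/XFree86/Linux-x86_64/${version}/NVIDIA-Linux-x86_64-${version}.run"
}

# Usage (downloads only, installs nothing):
#   wget "$(nvidia_driver_url 455.45.01)"
```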
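4. Resolve the driver conflict
Installing the official NVIDIA driver conflicts with the nouveau driver that ships with the system, so nouveau must be disabled. First check whether it is loaded: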
lsmod | grep nouveau
Edit /etc/modprobe.d/blacklist.conf to stop the nouveau module from loading; create the file if it does not exist. Do this as root, because ordinary users cannot create .conf files under /etc.
echo -e "blacklist nouveau\noptions nouveau modeset=0">/etc/modprobe.d/blacklist.conf
Alternatively, create or edit /etc/modprobe.d/blacklist.conf directly:
vi /etc/modprobe.d/blacklist.conf
Enter the following:
blacklist nouveau
options nouveau modeset=0
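The same two lines can be written with `printf`, which avoids the `echo -e` form (in shells such as dash, `echo` does not accept `-e` and prints it literally). This is a minimal sketch: the function name `write_nouveau_blacklist` and its optional directory argument are hypothetical, added so the function can be tried against a scratch directory first. Like the `echo` command above, it overwrites any existing blacklist.conf.

```shell
#!/bin/sh
# Write the nouveau blacklist into <dir>/blacklist.conf.
# <dir> defaults to /etc/modprobe.d, which requires root.
# Note: this replaces the file's previous contents.
write_nouveau_blacklist() {
    conf_dir=${1:-/etc/modprobe.d}
    printf 'blacklist nouveau\noptions nouveau modeset=0\n' \
        > "$conf_dir/blacklist.conf"
}

# Usage as root:
#   write_nouveau_blacklist
```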
5. Rebuild the initramfs image
mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
dracut /boot/initramfs-$(uname -r).img $(uname -r)
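The blacklist only takes effect after a reboot with the rebuilt initramfs. To confirm that nouveau is gone, the `lsmod` output can be checked for a module line that starts with `nouveau`; a plain `grep nouveau` also matches other modules that merely list nouveau in their "Used by" column. A small sketch, with the hypothetical helper name `nouveau_loaded`:

```shell
#!/bin/sh
# Read lsmod output on stdin; succeed (exit 0) only if a module named
# exactly "nouveau" is listed in the first column.
nouveau_loaded() {
    grep -q '^nouveau[[:space:]]'
}

# Usage after rebooting:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded"
#   else
#       echo "nouveau is not loaded"
#   fi
```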
6. Install the driver:
sudo sh ./NVIDIA-Linux-x86_64-440.100.run
nvidia-smi
The output looks like this:
NVIDIA-SMI 440.100 Driver Version: 440.100 CUDA Version: 10.2
To uninstall the driver, for example before reinstalling it:
sudo sh ./NVIDIA-Linux-x86_64-440.100.run --uninstall
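The driver version and CUDA version can be extracted from the `nvidia-smi` header line shown above. Note that the CUDA version reported there is the highest version the driver supports, not the version of an installed toolkit. The sketch below assumes the header wording matches the sample line; `parse_smi_header` is a hypothetical name.

```shell
#!/bin/sh
# Read nvidia-smi output on stdin and print "<driver> <cuda>" taken
# from the header line containing "Driver Version:" and "CUDA Version:".
parse_smi_header() {
    sed -n 's/.*Driver Version: *\([0-9.]*\).*CUDA Version: *\([0-9.]*\).*/\1 \2/p'
}

# Usage on a machine with the driver installed:
#   nvidia-smi | parse_smi_header
```

For the driver version alone, `nvidia-smi --query-gpu=driver_version --format=csv,noheader` avoids parsing the banner.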
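7. Install CUDA
Some references recommend installing CUDA first to avoid conflicts during installation.
Download the CUDA installer from the official site, https://developer.nvidia.com/cuda-downloads , and make sure it matches your own system and driver version. The commands in this step use the runfile installer.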
Previous releases: https://developer.nvidia.com/cuda-toolkit-archive
# If the file fetched by wget is tiny (about 32 bytes), download it again
wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run
wget https://developer.download.nvidia.cn/compute/cuda/10.2/Prod/patches/1/cuda_10.2.1_linux.run
wget https://developer.download.nvidia.cn/compute/cuda/10.2/Prod/patches/2/cuda_10.2.2_linux.run
sudo sh ./cuda_10.2.89_440.33.01_linux.run
sudo sh ./cuda_10.2.1_linux.run
sudo sh ./cuda_10.2.2_linux.run
sudo /usr/local/cuda-10.2/bin/nvcc --version
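The 32-byte symptom mentioned above can be caught before running an installer by checking the file size. A minimal sketch: the helper name `download_looks_complete` and the 1 MiB default threshold are assumptions chosen for illustration, not values from NVIDIA.

```shell
#!/bin/sh
# Succeed only if <file> exists and is at least <min_bytes> long.
# <min_bytes> defaults to 1 MiB (an arbitrary, hypothetical threshold;
# the real installers are far larger).
download_looks_complete() {
    file=$1
    min_bytes=${2:-1048576}
    [ -f "$file" ] || return 1
    # Arithmetic expansion strips any padding that wc may add.
    size=$(($(wc -c < "$file")))
    [ "$size" -ge "$min_bytes" ]
}

# Usage:
#   download_looks_complete cuda_10.2.89_440.33.01_linux.run \
#       || echo "download is too small, fetch it again"
```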
Download directory for each cuDNN version: https://developer.nvidia.com/rdp/cudnn-archive#a-collapse51b
Do not click the links there; copy the link address instead.
Verify CUDA:
nvcc -V
cat /usr/local/cuda-10.2/version.txt
cd /usr/local/cuda-10.2/samples/1_Utilities/deviceQuery
sudo make
./deviceQuery
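The bare `nvcc -V` call above only works if the CUDA bin directory is on `PATH`. The sketch below adds the CUDA 10.2 directories used in this guide to `PATH` and `LD_LIBRARY_PATH` without duplicating entries. The function name `add_cuda_to_env` is hypothetical, and the install prefix is taken from the paths in this guide.

```shell
#!/bin/sh
# Prepend <cuda_home>/bin to PATH and <cuda_home>/lib64 to
# LD_LIBRARY_PATH, skipping entries that are already present.
# <cuda_home> defaults to /usr/local/cuda-10.2 as used in this guide.
add_cuda_to_env() {
    cuda_home=${1:-/usr/local/cuda-10.2}
    case ":$PATH:" in
        *":$cuda_home/bin:"*) ;;
        *) PATH="$cuda_home/bin:$PATH" ;;
    esac
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$cuda_home/lib64:"*) ;;
        *) LD_LIBRARY_PATH="$cuda_home/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
    export PATH LD_LIBRARY_PATH
}

add_cuda_to_env
```

Placing the function and the call in a shell startup file such as ~/.bashrc makes the setting persist across sessions.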
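If an error reports that tmpfs is out of space, remount /dev/shm with a larger size:
# Unmount tmpfs:
umount /dev/shm
# If the unmount fails because the mount is busy, kill the processes using it:
fuser -km /dev/shm
# Unmount tmpfs again:
umount /dev/shm
# Mount tmpfs with a 5120 MB limit:
mount -t tmpfs -o size=5120m tmpfs /dev/shm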
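Before and after remounting, the free space on /dev/shm can be checked with `df`. A small sketch, with the hypothetical helper name `available_kb`; it assumes the mount point path contains no spaces.

```shell
#!/bin/sh
# Print the available space, in KiB, of the filesystem holding <path>.
# -P forces the POSIX one-line-per-filesystem format, -k reports KiB.
available_kb() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Usage:
#   available_kb /dev/shm
```

A mount made this way does not survive a reboot.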
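8. Install cuDNN
Download and install cuDNN:
The cuDNN build must match the installed CUDA version. There is no need to choose a platform, and it is best to pick the newest cuDNN release available for that CUDA version.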
tar -zxvf cudnn-10.2-linux-x64-v8.0.4.30.tgz
sudo cp cuda/include/* /usr/local/cuda/include/
sudo cp cuda/include/* /usr/local/cuda-10.2/include/
sudo cp -P cuda/lib64/* /usr/local/cuda/lib64/
sudo cp -P cuda/lib64/* /usr/local/cuda-10.2/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn*.h /usr/local/cuda/lib64/libcudnn*
sudo chmod a+r /usr/local/cuda-10.2/include/cudnn*.h /usr/local/cuda-10.2/lib64/libcudnn*
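To confirm which cuDNN version ended up in the include directory, the version macros can be read from the headers. In cuDNN 8 these macros live in cudnn_version.h; older releases keep them in cudnn.h. The sketch below tries both. The function name `cudnn_version` is hypothetical, and the default path is the /usr/local/cuda location used above.

```shell
#!/bin/sh
# Print the cuDNN version as MAJOR.MINOR.PATCH, read from the headers
# in <include_dir> (default: /usr/local/cuda/include).
# Returns non-zero if neither header is found.
cudnn_version() {
    include_dir=${1:-/usr/local/cuda/include}
    header="$include_dir/cudnn_version.h"
    [ -f "$header" ] || header="$include_dir/cudnn.h"
    [ -f "$header" ] || return 1
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$header"
}

# Usage:
#   cudnn_version
#   cudnn_version /usr/local/cuda-10.2/include
```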
References:
https://blog.csdn.net/xiaoyw71/article/details/89402146