CentOS 7 + CUDA 10.0

Original article: https://blog.csdn.net/Among12345/article/details/93500512

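Note: do not attempt this inside a virtual machine. It will not work, because the VM has no physical graphics card.

To make it work you need a physical machine, so I decided to set up a dual-boot system alongside the existing Windows 10 installation.

Installing the system: CentOS 7

I installed a dual-boot system for local testing. The graphics card is a GeForce 940MX, which supports CUDA; you can check whether your card supports CUDA at https://developer.nvidia.com/cuda-gpus

First, free up enough disk space:

In Windows Disk Management, pick a disk with plenty of free space, right-click it, choose "Shrink Volume", and set the amount to release; about 50 GB is usually enough.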

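Error 1: the installer drops to a dracut:/# prompt and cannot continue, after printing a long series of timeout messages.

Solution:

>At the prompt below the error messages, do the following:
>dracut:/# cd dev
>dracut:/# ls
This lists all the device nodes.
>Find sdbx, where x is a number; this is the partition of your USB stick.
>dracut:/# reboot
>After rebooting, press e on the install menu entry.
>Change vmlinuz initrd=initrd.img inst.stage2=hd:LABEL=CentOS\x207\x20x86_64.check quiet to vmlinuz initrd=initrd.img inst.stage2=hd:/dev/sdbx quiet (replace sdbx with your USB partition), then press Ctrl+x.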

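When I tried this, however, the only sdb entries in my dev directory were sdb, sdb1 and sdb2. I tried all three, and each time the installer reported that it could not find the img file.
At first I thought my machine simply could not install Linux. Most guides say the default is sdb4, which my dev directory did not have, but it did have sdc4, so I used that instead:
>Change vmlinuz initrd=initrd.img inst.stage2=hd:LABEL=CentOS\x207\x20x86_64.check quiet to

vmlinuz initrd=initrd.img inst.stage2=hd:/dev/sdc4 quiet, then press Ctrl+x and the CentOS installer screen appears.

Reference: https://www.cnblogs.com/Lenbrother/articles/6251555.html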
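Because the correct partition differed from what most guides assume, it can help to write out the candidate boot lines before rebooting into the installer. The following is only a minimal sketch: `stage2_line` is a hypothetical helper name, and the device names passed to it are the examples from the text above, not values that will necessarily exist on another machine.

```shell
#!/bin/sh
# Hypothetical helper: print the installer boot line for one candidate partition.
stage2_line() {
    printf 'vmlinuz initrd=initrd.img inst.stage2=hd:%s quiet\n' "$1"
}

# Print one candidate line per partition (example names taken from the text).
for dev in /dev/sdb1 /dev/sdb2 /dev/sdc4; do
    stage2_line "$dev"
done
```

Each printed line is what would be typed after pressing e on the install menu entry; only one of them will match the USB stick.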

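Kernel:

Install the basic build dependencies

yum -y install gcc kernel-devel kernel-headers

Check the kernel version

Kernel version: ls /boot | grep vmlinu

Source package version: rpm -aq | grep kernel-devel

Both were 3.10.0-957.21.3.el7.x86_64 (the package being kernel-devel-3.10.0-957.21.3.el7.x86_64). The two versions must match, otherwise the driver build cannot find the kernel sources.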
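The version comparison above can be automated. This is a sketch under assumptions: `kernel_devel_matches` is a hypothetical helper, and it simply compares package names of the form `kernel-devel-<release>` against the output of `uname -r`.

```shell
#!/bin/sh
# Hypothetical helper: succeed if any listed package is the kernel-devel
# build for the given kernel release.
kernel_devel_matches() {
    release="$1"
    shift
    for pkg in "$@"; do
        if [ "$pkg" = "kernel-devel-$release" ]; then
            return 0
        fi
    done
    return 1
}

# Compare the running kernel with the installed kernel-devel packages.
# Errors are silenced so the sketch also runs on machines without rpm.
if kernel_devel_matches "$(uname -r)" $(rpm -q kernel-devel 2>/dev/null); then
    echo "kernel-devel matches the running kernel"
else
    echo "no matching kernel-devel for $(uname -r)"
fi
```

If the check fails right after a `yum -y update`, the likely cause is that the machine is still running the old kernel, and a reboot into the new one may be all that is needed.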

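Disable the nouveau driver

Edit /usr/lib/modprobe.d/dist-blacklist.conf to prevent the nouveau module from loading.

Comment out the nvidiafb line:

#blacklist nvidiafb

Then add the following lines:

blacklist nouveau

options nouveau modeset=0

To check whether nouveau is loaded (once the initramfs has been rebuilt and the machine rebooted, this should print nothing):

lsmod | grep nouveau

Rebuild the initramfs image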

[root@localhost ~]# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@localhost ~]# dracut /boot/initramfs-$(uname -r).img $(uname -r)
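Before rebooting, it may be worth confirming that the blacklist edits actually landed in the file, since a typo there leaves nouveau active. The sketch below assumes the file path used above; `nouveau_blacklisted` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the given modprobe config file
# contains both active (uncommented) nouveau lines from the steps above.
nouveau_blacklisted() {
    grep -Eq '^[[:space:]]*blacklist[[:space:]]+nouveau[[:space:]]*$' "$1" 2>/dev/null &&
        grep -Eq '^[[:space:]]*options[[:space:]]+nouveau[[:space:]]+modeset=0[[:space:]]*$' "$1" 2>/dev/null
}

if nouveau_blacklisted /usr/lib/modprobe.d/dist-blacklist.conf; then
    echo "nouveau is blacklisted in the config file"
else
    echo "blacklist lines not found"
fi
```

This only inspects the config file; whether the module is really gone still has to be confirmed with lsmod after the reboot.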

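Install the NVIDIA driver on CentOS 7

Check your graphics card model: lspci | grep VGA

lspci | grep -i nvidia

Install Development Tools

yum groupinstall "Development Tools"

Switch to the command line (on my machine Alt+F1/F2 did nothing)

systemctl isolate multi-user.target

[To return to the graphical interface: # startx]

Update CentOS (optional)

yum -y update

Prepare the NVIDIA driver package

The NVIDIA driver package can be downloaded from:

https://www.nvidia.com/Download/index.aspx

It is best to look up the driver for your own card model and system, since several versions are usually supported. The version I downloaded is NVIDIA-Linux-x86_64-410.104.run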

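[root@localhost ~]# # first cd into the folder that contains the downloaded driver
[root@localhost ~]# chmod +x NVIDIA-Linux-x86_64-410.104.run
[root@localhost ~]# ./NVIDIA-Linux-x86_64-410.104.run

Adjust the version number to match your own file; the procedure is the same.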

Error 2: ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed. If you know the correct kernel source files are installed, you may specify the kernel source path with the '--kernel-source-path' command line option.

Solution:

./NVIDIA-Linux-x86_64-390.67.run --kernel-source-path=/usr/src/kernels/3.10.0-862.3.2.el7.x86_64/

./NVIDIA-Linux-x86_64-430.26.run --no-opengl-files

Adjust the version numbers (driver and kernel) to match your own; the procedure is the same.
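Rather than typing the kernel version by hand, the source path can be derived from the running kernel. This is a sketch under the assumption that kernel-devel installs its tree under /usr/src/kernels/ with the same name that `uname -r` prints, as in the command above; `kernel_source_path` is a hypothetical helper, and the .run file name is the one mentioned earlier in this post.

```shell
#!/bin/sh
# Hypothetical helper: kernel source directory for a kernel release
# (defaults to the running kernel), following the layout used above.
kernel_source_path() {
    printf '/usr/src/kernels/%s/\n' "${1:-$(uname -r)}"
}

# Print the installer invocation instead of running it; adjust the
# .run file name to the driver version you downloaded.
echo "./NVIDIA-Linux-x86_64-410.104.run --kernel-source-path=$(kernel_source_path)"
```

Printing the command first makes it easy to confirm that the directory really exists before starting the installer.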

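The installer now starts normally.

Every step asks for a choice, and a wrong choice leads to all sorts of errors, so I am recording them here:

(This part follows https://blog.csdn.net/fu6543210/article/details/80104491)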

1. the distribution-provided pre-install script failed.....:

Choose continue to install

A progress bar then shows "building kernel modules"; wait a while.

2. the target kernel has CONFIG_MODULE_SIG set......

Choose sign the kernel module

3. the nvidia kernel module with an existing key pair .....

Choose generate a new one

4. ...was successfully signed with a newly generated key pair, would you like to delete...

Choose no

5. The next screen offers only one option; choose OK

6. The one after that: again choose OK

7. ...the signed kernel module failed to load, because the kernel does not trust any key...

Choose install signed kernel module

8. Next screen: choose OK

9. The one after that: choose yes

10. Next screen: choose OK

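Whether or not you reboot, go back to the graphical interface and check that the driver is installed by entering on the command line

$ nvidia-smi

This is where the problem appeared: instead of the driver and GPU information, the terminal showed
"NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running."

Solution: (this part follows https://blog.csdn.net/smcaa/article/details/86482872)

The driver is clearly installed, so why does this error show up when we query it after a reboot? The reason is that the kernel did not load the NVIDIA driver module. When Secure Boot is enabled, the kernel (Ubuntu 18.04 in the referenced article) has to load kernel modules through a key (password) verification step at boot, and the NVIDIA driver module was not loaded that way, so the driver cannot be queried. The simplest, bluntest fix is to disable Secure Boot in the firmware settings. On ThinkPad T-series laptops, press F1 at power-on to enter the BIOS setup, find the Secure Boot option, press Enter, select Disabled, press F10 to save and exit, and reboot. Running $ nvidia-smi in a terminal then shows the driver and GPU information.

At this point the driver is installed successfully!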
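Whether Secure Boot is the culprit can be checked from the running system before going into the firmware setup. The sketch below assumes the `mokutil` tool is available (it may not be installed by default); `secure_boot_enabled` is a hypothetical helper that only interprets the text `mokutil --sb-state` prints.

```shell
#!/bin/sh
# Hypothetical helper: interpret the text printed by `mokutil --sb-state`.
secure_boot_enabled() {
    case "$1" in
        *"SecureBoot enabled"*) return 0 ;;
        *) return 1 ;;
    esac
}

# mokutil may not be installed; in that case the state stays unknown.
if command -v mokutil >/dev/null 2>&1; then
    state=$(mokutil --sb-state 2>&1) || true
    if secure_boot_enabled "$state"; then
        echo "Secure Boot is enabled"
    else
        echo "Secure Boot is not reported as enabled: $state"
    fi
else
    echo "mokutil not found; check Secure Boot in the firmware setup"
fi
```

If Secure Boot turns out to be disabled already, the nvidia-smi failure probably has a different cause, such as nouveau still being loaded.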

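A server running Ubuntu hit the same problem; solution:

Installing CUDA

Download the CUDA run package (other package types work too): https://developer.nvidia.com/cuda-toolkit-archive

chmod +x cuda_10.0.130_410.48_linux.run

./cuda_10.0.130_410.48_linux.run

After paging through the long license text (displayed with more), make your selections as described in https://blog.csdn.net/zbqhc/article/details/73277750, as follows:

---------------------

All of the above is installer output.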

Errors

Missing recommended library: libGLU.so
Missing recommended library: libX11.so
Missing recommended library: libXi.so
Missing recommended library: libXmu.so

Solution: https://stackoverflow.com/questions/22360771/missing-recommended-library-libglu-so
---------------------
I did not actually resolve these missing-library warnings, but the installation check still passed. They can probably be ignored, and you can go straight to the next step.
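To see which of the four libraries are really absent, the linker cache can be inspected. This is a sketch: `missing_libs` is a hypothetical helper that looks for the unversioned name (for example `libX11.so`, not `libX11.so.6`) in `ldconfig -p` style output. On CentOS 7 the unversioned links normally come from -devel packages, probably mesa-libGLU-devel, libX11-devel, libXi-devel and libXmu-devel; treat those package names as an assumption to verify with yum.

```shell
#!/bin/sh
# Hypothetical helper: print each library name that has no unversioned
# entry ("libfoo.so (") in an `ldconfig -p` style listing.
missing_libs() {
    listing="$1"
    shift
    for lib in "$@"; do
        case "$listing" in
            *"$lib ("*) ;;
            *) echo "$lib" ;;
        esac
    done
}

# Report which of the libraries named in the warnings are not in the cache.
missing_libs "$(ldconfig -p 2>/dev/null)" libGLU.so libX11.so libXi.so libXmu.so
```

An empty result means all four unversioned libraries are visible to the linker.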

Add to ~/.bashrc

export PATH="$PATH:/usr/local/cuda-10.0/bin"
export LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

It is best to reboot afterwards.
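Because ~/.bashrc is read by every new shell, plain appends can pile up duplicate entries. One way to keep the variables clean is sketched below; `append_path` is a hypothetical helper, and the CUDA directories are the ones used above.

```shell
#!/bin/sh
# Hypothetical helper: append a directory to a colon-separated list
# unless it is already present; prints the resulting list.
append_path() {
    case ":$1:" in
        *":$2:"*) printf '%s\n' "$1" ;;
        *) printf '%s\n' "${1:+$1:}$2" ;;
    esac
}

PATH=$(append_path "$PATH" /usr/local/cuda-10.0/bin)
LD_LIBRARY_PATH=$(append_path "${LD_LIBRARY_PATH:-}" /usr/local/cuda-10.0/lib64)
export PATH LD_LIBRARY_PATH
```

Sourcing this repeatedly leaves both variables unchanged after the first time.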

Check whether CUDA was installed successfully
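One quick check is to ask the compiler for its version once the PATH change is active. The sketch below assumes `nvcc` prints a line of the form "Cuda compilation tools, release 10.0, V10.0.130"; `cuda_release` is a hypothetical helper that extracts the release number from that output.

```shell
#!/bin/sh
# Hypothetical helper: extract the release number from `nvcc --version`
# output read on stdin.
cuda_release() {
    sed -n 's/.*release \([0-9][0-9.]*\),.*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "CUDA release: $(nvcc --version | cuda_release)"
else
    echo "nvcc not found on PATH"
fi
```

This only confirms that the toolkit is on the PATH; building and running one of the bundled CUDA samples is a more thorough test that the GPU is usable.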