Installing the NVIDIA GPU driver. Official download page: https://www.nvidia.cn/Download/index.aspx?lang=cn
yum -y install gcc kernel-devel kernel-headers
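kernel-devel provides the header files and kernel configuration files needed to build and develop kernel modules.
kernel-headers contains the C header files that define the interface between the kernel and user-space programs.
Check that the running kernel version matches the installed kernel-devel version.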
[root@lmri209 data209]# ll /boot | grep vmlinu
-rwxr-xr-x. 1 root root 6769256 7月 12 16:21 vmlinuz-0-rescue-e378c0b8d7e4472ab18035149df9a0e9
-rwxr-xr-x. 1 root root 6769256 10月 20 2020 vmlinuz-3.10.0-1160.el7.x86_64
(base) [root@lmri211 ~]# rpm -aq |grep kernel-devel
kernel-devel-3.10.0-1160.71.1.el7.x86_64
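Make sure the kernel-devel package installed through yum matches the running kernel version exactly; otherwise the driver build later on will fail.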
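The version comparison can also be scripted. The following is a minimal sketch, not part of the original procedure; the function name kernel_devel_matches is hypothetical. It takes the kernel release and a list of package names as plain strings, so it can be tried on sample values without touching the system.

```shell
# Succeeds (exit status 0) only when the package list, one name per
# line, contains kernel-devel for exactly the given kernel release.
# kernel_devel_matches is a hypothetical helper name.
kernel_devel_matches() {
    local release="$1"
    local packages="$2"
    printf '%s\n' "$packages" | grep -Fxq "kernel-devel-${release}"
}

# Typical use on the machine itself:
#   kernel_devel_matches "$(uname -r)" "$(rpm -q kernel-devel)" \
#       || echo "kernel-devel does not match $(uname -r)"
```

For example, release 3.10.0-1160.el7.x86_64 checked against kernel-devel-3.10.0-1160.71.1.el7.x86_64 fails, because the package carries the extra 71.1 update level.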
[root@lmri209 lib]# sed -i 's/^\(blacklist nvidiafb.*\)$/#\1/' /lib/modprobe.d/dist-blacklist.conf # comment out the "blacklist nvidiafb" line
[root@lmri209 lib]# echo -e "blacklist nouveau\noptions nouveau modeset=0" | sudo tee -a /lib/modprobe.d/dist-blacklist.conf # append the lines "blacklist nouveau" and "options nouveau modeset=0" to /lib/modprobe.d/dist-blacklist.conf
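Because tee -a appends unconditionally, running the step twice leaves duplicate lines in the file. Below is a sketch of an idempotent variant, assuming GNU sed; blacklist_nouveau is a hypothetical helper name, and the sed pattern assumes the stock file lists the framebuffer driver as a bare "blacklist nvidiafb" line.

```shell
# Comments out "blacklist nvidiafb" and adds the two nouveau lines to
# the given modprobe config file, without duplicating them on re-runs.
# blacklist_nouveau is a hypothetical helper name.
blacklist_nouveau() {
    local conf="$1"
    local line
    # An already commented line starts with '#', so it is not matched again.
    sed -i 's/^\(blacklist nvidiafb.*\)$/#\1/' "$conf"
    for line in "blacklist nouveau" "options nouveau modeset=0"; do
        # Append only when the exact line is not present yet.
        grep -Fxq "$line" "$conf" || printf '%s\n' "$line" >> "$conf"
    done
}

# Typical use (as root):
#   blacklist_nouveau /lib/modprobe.d/dist-blacklist.conf
```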
Rebuild the initramfs image and set the default boot target.
[root@lmri209 lib]# mv /boot/initramfs-$(uname -r).img /boot/initramfs-$(uname -r).img.bak
[root@lmri209 lib]# dracut /boot/initramfs-$(uname -r).img $(uname -r)
[root@lmri209 lib]# systemctl set-default multi-user.target
Removed symlink /etc/systemd/system/default.target.
Created symlink from /etc/systemd/system/default.target to /usr/lib/systemd/system/multi-user.target. # sets the default target: after boot the system comes up in a text-mode terminal with no graphical user interface (GUI)
[root@lmri209 lib]# reboot # reboot
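The mv in the session above leaves /boot without an initramfs for the running kernel until dracut finishes. A slightly more defensive variant is sketched below, on the assumption that copying instead of moving is acceptable: the original image stays in place and is copied to .bak only once, so a second run cannot replace the pristine backup. backup_initramfs is a hypothetical helper name.

```shell
# Copies initramfs-<release>.img to initramfs-<release>.img.bak inside
# the given boot directory, unless a backup already exists.
# backup_initramfs is a hypothetical helper name.
backup_initramfs() {
    local boot_dir="$1"
    local release="$2"
    local image="${boot_dir}/initramfs-${release}.img"
    [ -f "$image" ] || { echo "missing: $image" >&2; return 1; }
    # Keep the first backup: never overwrite an existing .bak file.
    [ -e "${image}.bak" ] || cp -p "$image" "${image}.bak"
}

# Typical use (as root); --force lets dracut overwrite the live image:
#   backup_initramfs /boot "$(uname -r)" \
#       && dracut --force "/boot/initramfs-$(uname -r).img" "$(uname -r)"
```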
After the reboot, verify that nouveau is disabled.
[root@lmri209 ~]# lsmod | grep nouveau # no output means nouveau is not loaded
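An empty grep result is easy to misread, so the same check can be wrapped to give an explicit yes/no answer. This is a minimal sketch; nouveau_loaded is a hypothetical helper name, and it reads lsmod-style text on stdin so that it can be tested with sample input.

```shell
# Reads lsmod-style text on stdin and succeeds only if a module named
# exactly "nouveau" appears in the first column.
# nouveau_loaded is a hypothetical helper name.
nouveau_loaded() {
    awk '$1 == "nouveau" { found = 1 } END { exit !found }'
}

# Typical use:
#   if lsmod | nouveau_loaded; then
#       echo "nouveau is still loaded"
#   else
#       echo "nouveau is disabled"
#   fi
```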
cd into the directory that holds the installer, make the file executable, and run it.
[root@lmri209 ~]# cd ~
[root@lmri209 ~]# chmod +x NVIDIA-Linux-x86_64-535.183.06.run
[root@lmri209 ~]# ./NVIDIA-Linux-x86_64-535.183.06.run
Q. ERROR: Unable to find the kernel source tree for the currently running kernel. Please make sure you have installed the kernel source files for your kernel and that they are properly configured; on Red Hat Linux systems, for example, be sure you have the 'kernel-source' or 'kernel-devel' RPM installed.
Q1.
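cat /var/log/nvidia-installer.log shows no obvious error in the log file.
Comparing versions points to a kernel problem: most likely the running kernel does not match the installed kernel-headers or kernel-devel version.
[root@lmri209 ~]# uname -r # show the running kernel version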
3.10.0-1160.el7.x86_64
[root@lmri209 ~]# rpm -qa | grep kernel # list the installed kernel packages
abrt-addon-kerneloops-2.1.11-60.el7.centos.x86_64
kernel-3.10.0-1160.el7.x86_64
kernel-headers-3.10.0-1160.119.1.el7.x86_64
kernel-tools-libs-3.10.0-1160.el7.x86_64
kernel-tools-3.10.0-1160.el7.x86_64
kernel-devel-3.10.0-1160.119.1.el7.x86_64
[root@lmri209 ~]# yum -y install kernel-devel-$(uname -r) # install the kernel-devel package that matches the running kernel
[root@lmri209 ~]# yum -y install kernel-headers-$(uname -r) # install the kernel-headers package that matches the running kernel
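The mismatch in the rpm listing above can be spotted mechanically. The sketch below filters a package list for kernel-devel and kernel-headers entries whose version differs from a given kernel release; mismatched_kernel_packages is a hypothetical helper name, and the input is expected on stdin, one package name per line.

```shell
# Prints every kernel-devel / kernel-headers package from stdin whose
# version is not exactly the given kernel release.
# mismatched_kernel_packages is a hypothetical helper name.
mismatched_kernel_packages() {
    local release="$1"
    awk -v rel="$release" '
        /^kernel-(devel|headers)-/ {
            if ($0 != "kernel-devel-" rel && $0 != "kernel-headers-" rel)
                print
        }'
}

# Typical use:
#   rpm -qa | mismatched_kernel_packages "$(uname -r)"
```

Fed the listing above with release 3.10.0-1160.el7.x86_64, it prints the kernel-headers and kernel-devel packages at 3.10.0-1160.119.1, which are the two that need a matching counterpart.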
Q2.
cat /var/log/nvidia-installer.log reveals these key error messages:
‘stack-protector enabled but compiler support broken’
‘CONFIG_RETPOLINE=y, but not supported by the compiler. Compiler update recommended.. Stop.’
These errors indicate that the compiler in use does not support some of the kernel configuration options, namely CONFIG_RETPOLINE and stack-protector. The workaround is to comment out CONFIG_RETPOLINE=y and CONFIG_CC_STACKPROTECTOR=y in both /usr/src/kernels/$(uname -r)/include/config/auto.conf and /usr/src/kernels/$(uname -r)/.config.
[root@lmri209 ~]# cat /usr/src/kernels/$(uname -r)/.config | grep CONFIG_RETPOLINE # check CONFIG_RETPOLINE in .config
CONFIG_RETPOLINE=y
[root@lmri209 ~]# cat /usr/src/kernels/$(uname -r)/.config | grep CONFIG_CC_STACKPROTECTOR # check CONFIG_CC_STACKPROTECTOR in .config
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_NONE is not set
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
CONFIG_CC_STACKPROTECTOR_STRONG=y
[root@lmri209 kernels]# cat /usr/src/kernels/$(uname -r)/include/config/auto.conf | grep CONFIG_RETPOLINE # check CONFIG_RETPOLINE in auto.conf
CONFIG_RETPOLINE=y
[root@lmri209 kernels]# cat /usr/src/kernels/$(uname -r)/include/config/auto.conf | grep CONFIG_CC_STACKPROTECTOR # check CONFIG_CC_STACKPROTECTOR in auto.conf
CONFIG_CC_STACKPROTECTOR=y
# CONFIG_CC_STACKPROTECTOR_NONE is not set
# CONFIG_CC_STACKPROTECTOR_REGULAR is not set
CONFIG_CC_STACKPROTECTOR_STRONG=y
[root@lmri209 kernels]# sed -i 's/^\(CONFIG_RETPOLINE=.*\)$/#\1/' /usr/src/kernels/$(uname -r)/include/config/auto.conf /usr/src/kernels/$(uname -r)/.config
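[root@lmri209 kernels]# sed -i 's/^\(CONFIG_CC_STACKPROTECTOR=.*\)$/#\1/' /usr/src/kernels/$(uname -r)/include/config/auto.conf /usr/src/kernels/$(uname -r)/.config # a single sed call applies the substitution to both files
Note: in these paths, use $(uname -r) for the directory directly under /usr/src/kernels. That directory can hold trees for several kernel versions, and only edits to the tree of the running kernel take effect.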
[root@lmri209 kernels]# ll
总用量 8
drwxr-xr-x 22 root root 4096 7月 18 20:34 3.10.0-1160.119.1.el7.x86_64
drwxr-xr-x 22 root root 4096 7月 19 09:51 3.10.0-1160.el7.x86_64
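The two sed calls above can be folded into one helper that also keeps a copy of each file before editing it. This is a sketch under the assumption that GNU sed is available; comment_out_option is a hypothetical helper name. The "=" right after the option name in the pattern is what keeps longer names such as CONFIG_CC_STACKPROTECTOR_STRONG untouched.

```shell
# Comments out "<OPTION>=..." in every file given after the option name.
# comment_out_option is a hypothetical helper name.
comment_out_option() {
    local option="$1"
    shift
    local file
    for file in "$@"; do
        # -i.bak saves the pre-edit contents as FILE.bak on every run.
        sed -i.bak "s/^\(${option}=.*\)\$/#\1/" "$file"
    done
}

# Typical use (as root), restricted to the running kernel's tree:
#   tree="/usr/src/kernels/$(uname -r)"
#   comment_out_option CONFIG_RETPOLINE "$tree/include/config/auto.conf" "$tree/.config"
#   comment_out_option CONFIG_CC_STACKPROTECTOR "$tree/include/config/auto.conf" "$tree/.config"
```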
Q3. Warning: Compiler version check failed
cat /var/log/nvidia-installer.log shows the following:
Warning: Compiler version check failed: The major and minor number of the compiler used to compile the kernel: gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC) does not match the compiler used here: cc (conda-forge gcc 12.3.0-13) 12.3.0 Copyright (C) 2022 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. It is recommended to set the CC environment variable to the compiler that was used to compile the kernel. To skip the test and silence this warning message, set the IGNORE_CC_MISMATCH environment variable to "1". However, mixing compiler versions between the kernel and kernel modules can result in subtle bugs that are difficult to diagnose. *** Failed CC version check. ***
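The compiler in use does not match. According to the message, the kernel was built with gcc 4.8.5, but the gcc currently being picked up is 12.3.0 (from conda-forge). The gcc shipped with the system is normally in /usr/bin.
[root@lmri209 kernels]# cd /usr/bin
[root@lmri209 bin]# ll | grep gcc # check whether gcc exists in this directory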
-rwxr-xr-x 2 root root 768608 9月 30 2020 gcc
-rwxr-xr-x 1 root root 27088 9月 30 2020 gcc-ar
-rwxr-xr-x 1 root root 27088 9月 30 2020 gcc-nm
-rwxr-xr-x 1 root root 27088 9月 30 2020 gcc-ranlib
-rwxr-xr-x 2 root root 768608 9月 30 2020 x86_64-redhat-linux-gcc
[root@lmri209 bin]# export CC=/usr/bin/gcc # use this gcc as the compiler for the current shell session only
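The installer compares only the major and minor compiler numbers, so a candidate compiler can be checked before rerunning it. The sketch below extracts major.minor from a version banner; gcc_major_minor is a hypothetical helper name. It takes the first x.y.z it finds, so it should be given a compiler banner, not a longer string that contains other version numbers first.

```shell
# Prints the major.minor part of the first x.y.z version number found
# in the given text. gcc_major_minor is a hypothetical helper name.
gcc_major_minor() {
    printf '%s\n' "$1" \
        | grep -oE '[0-9]+\.[0-9]+\.[0-9]+' \
        | head -n 1 \
        | cut -d. -f1,2
}

# Typical use: compare the candidate compiler with the 4.8 reported
# for the kernel in the installer warning.
#   gcc_major_minor "$(/usr/bin/gcc --version | head -n 1)"
```

Applied to the two banners quoted in the warning, it yields 4.8 for the kernel's compiler and 12.3 for the conda-forge one, which is the mismatch the installer complains about.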