Fixing an Ubuntu desktop that fails to appear because of a newer NVIDIA driver

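After running apt-get upgrade on Ubuntu and rebooting, a PC with an RTX 3090 GPU no longer showed its desktop. I hit several pitfalls while fixing this, so I am writing them down for future reference.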

My first idea was to roll back to an older GPU driver, so I removed the installed versions:

     sudo apt-get --purge remove "cuda*"
     sudo apt-get --purge remove "*nvidia*"

Then I installed an older CUDA 10 release from its deb packages, but it did not work even after a reboot; nvidia-smi always failed with:

     Failed to initialize NVML: Driver/library version mismatch
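
One way to see which side is out of step is to read the version of the kernel module that is currently loaded and compare it with the user-space library on disk. The following is only a minimal sketch: the helper name first_driver_version is made up for illustration, and it assumes the driver version is the first dotted number in the text, which is how /proc/driver/nvidia/version is usually laid out. The library path in the usage comment is also an assumption and may differ on your system.

```shell
# Print the first dotted version number (e.g. 450.80.02) found on stdin.
# The helper name is illustrative; it is not part of any NVIDIA tool.
first_driver_version() {
    grep -oE '[0-9]+\.[0-9]+(\.[0-9]+)?' | head -n 1
}

# Usage on a machine where the driver is loaded:
#   first_driver_version < /proc/driver/nvidia/version   # loaded kernel module
#   ls /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.*        # user-space library (assumed path)
```

If the two versions differ, the kernel module that is loaded is not the one the installed libraries expect.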

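This probably meant the driver did not match the Linux kernel currently in use. Installing the prebuilt deb packages was not enough; the kernel modules (.ko) had to be compiled from source against the running kernel, so I switched to the .run installer: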
 

wget https://developer.download.nvidia.com/compute/cuda/10.2/Prod/local_installers/cuda_10.2.89_440.33.01_linux.run

chmod +x cuda_10.2.89_440.33.01_linux.run
./cuda_10.2.89_440.33.01_linux.run

The installation succeeded, but after rebooting the desktop still would not start, and switching to a text console showed an error.

Installing an older driver version instead failed at the very end of every attempt with:

    ERROR: Unable to load the 'nvidia-drm' kernel module

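I tried the usual suggestions found online, such as disabling Secure Boot in the BIOS, upgrading the kernel, and fixing mismatches between the kernel and its source, and none of them helped. Finally I tried NVIDIA-Linux-x86_64-450.80.02.run, the .run installer for driver 450.80.02 (the version bundled with CUDA 11.0), and it installed successfully on the first attempt. This shows that a relatively new GPU needs a relatively new driver: an old driver cannot even be installed, let alone run.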

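Since the driver has to correspond to CUDA 11 or later anyway, it is better to install CUDA 11.1.1 directly (RTX 30-series GPUs seem to need 11.1.1 or later to work properly). At the time of writing, however, it is best to avoid the newest CUDA 11.3 or 11.4, because tools such as PyTorch do not support them yet. Blindly installing the highest version is not a good idea; use what is sufficient.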
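
The "new enough, but not the newest" rule can be checked mechanically. Below is a minimal sketch, assuming GNU sort with the -V option (present on Ubuntu). version_ge is a hypothetical helper, and the 11.1 threshold in the usage comment is simply the version mentioned above, not an official requirement.

```shell
# Succeed if version $1 is greater than or equal to version $2.
# Relies on GNU "sort -V" (version sort); version_ge is an illustrative name.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Usage (nvcc reports only major.minor, e.g. "release 11.1"):
#   cuda=$(nvcc --version | grep -oE 'release [0-9.]+' | cut -d' ' -f2)
#   version_ge "$cuda" 11.1 && echo "CUDA $cuda is new enough"
```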

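With the driver version settled, the gdm desktop still did not appear after boot. Some people online say gdm3 does not work well with the latest NVIDIA drivers, so I installed the lightdm display manager and the Unity desktop: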

     sudo apt-get install lightdm unity

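During installation, choose lightdm rather than gdm3 as the default display manager when prompted (to switch later, run dpkg-reconfigure lightdm). After rebooting, the desktop still did not appear: the Ubuntu logo kept animating indefinitely and the desktop never came up.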
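
Before digging further, it can help to confirm which display manager is actually recorded as the default. The sketch below reads /etc/X11/default-display-manager, the same file the lightdm service checks before it starts; default_dm is a made-up helper name, and the optional path argument exists only to make the function easy to try on a copy of the file.

```shell
# Print the name of the display manager recorded in the given file.
# Defaults to /etc/X11/default-display-manager; default_dm is an illustrative name.
default_dm() {
    file=${1:-/etc/X11/default-display-manager}
    [ -r "$file" ] || return 1
    basename "$(cat "$file")"
}

# Usage:
#   default_dm    # prints the recorded display manager, e.g. lightdm or gdm3
```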

 

Check the service status:

root@ubuntu-rtx3090:~# systemctl status lightdm
● lightdm.service - Light Display Manager
   Loaded: loaded (/lib/systemd/system/lightdm.service; indirect; vendor preset: enabled)
   Active: failed (Result: exit-code) since Wed 2021-08-25 19:15:17 CST; 7min ago
     Docs: man:lightdm(1)
  Process: 1246 ExecStart=/usr/sbin/lightdm (code=exited, status=1/FAILURE)
  Process: 1243 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=exited, status=0/SUCCESS)
 Main PID: 1246 (code=exited, status=1/FAILURE)

8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: lightdm.service: Service hold-off time over, scheduling restart.
8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: lightdm.service: Scheduled restart job, restart counter is at 5.
8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: Stopped Light Display Manager.
8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: lightdm.service: Start request repeated too quickly.
8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: lightdm.service: Failed with result 'exit-code'.
8月 25 19:15:17 ubuntu-rtx3090 systemd[1]: Failed to start Light Display Manager.

apt policy lightdm
lightdm:
  Installed: 1.26.0-0ubuntu1
  Candidate: 1.26.0-0ubuntu1
  Version table:
 *** 1.26.0-0ubuntu1 500
        500 http://mirrors.aliyun.com/ubuntu bionic/universe amd64 Packages
        100 /var/lib/dpkg/status
        
root@ubuntu-rtx3090:~# lightdm --test-mode --debug
Failed to load configuration from /etc/lightdm/lightdm.conf: Key file does not start with a group

root@ubuntu-rtx3090:~# lightdm --show-config
Failed to load configuration from /etc/lightdm/lightdm.conf: Key file does not start with a group

 

The message "Failed to load configuration from /etc/lightdm/lightdm.conf: Key file does not start with a group" points at /etc/lightdm/lightdm.conf. Opening it showed that it contained a single line:

     greeter-session=unity-greeter

The file is only valid once the Seat group header is added:

    [Seat:*]
    greeter-session=unity-greeter
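
This kind of mistake is easy to detect before restarting the service. Here is a minimal sketch of such a check; starts_with_group is a hypothetical helper, and it only tests the one condition behind the error above (the first meaningful line must be a [Group] header), not the full key-file syntax.

```shell
# Succeed if the first non-blank, non-comment line of the file is a [Group] header.
# An empty file is accepted. starts_with_group is an illustrative name.
starts_with_group() {
    first=$(grep -vE '^[[:space:]]*(#|$)' "$1" | head -n 1)
    case $first in
        ''|\[*\]*) return 0 ;;   # nothing to parse, or a "[Group]" header
        *)         return 1 ;;   # a key=value line came first
    esac
}

# Usage:
#   starts_with_group /etc/lightdm/lightdm.conf || echo "missing [Seat:*] header"
```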

After that, lightdm --show-config prints the configuration normally:

root@ubuntu-rtx3090:~# lightdm --show-config
   [Seat:*]
A  allow-guest=false
C  greeter-wrapper=/usr/lib/lightdm/lightdm-greeter-session
D  guest-wrapper=/usr/lib/lightdm/lightdm-guest-session
G  user-session=unity
F  greeter-show-manual-login=true
I  greeter-session=unity-greeter
F  all-guest=false
H  xserver-command=X -core

   [LightDM]
B  backup-logs=false

Sources:
A  /usr/share/lightdm/lightdm.conf.d/50-disable-guest.conf
B  /usr/share/lightdm/lightdm.conf.d/50-disable-log-backup.conf
C  /usr/share/lightdm/lightdm.conf.d/50-greeter-wrapper.conf
D  /usr/share/lightdm/lightdm.conf.d/50-guest-wrapper.conf
E  /usr/share/lightdm/lightdm.conf.d/50-ubuntu.conf
F  /usr/share/lightdm/lightdm.conf.d/50-unity-greeter.conf
G  /usr/share/lightdm/lightdm.conf.d/50-unity.conf
H  /usr/share/lightdm/lightdm.conf.d/50-xserver-command.conf
I  /etc/lightdm/lightdm.conf

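The output above also shows the precedence among lightdm's configuration files: /etc/lightdm/lightdm.conf has the highest priority, and its settings override those in all earlier files, because lightdm reads the files in the order A -> I.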
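
The "last file wins" behaviour can be imitated with a few lines of shell. This is a deliberately simplified sketch, not how lightdm itself parses its files: effective_value is a hypothetical helper, and it ignores section headers, so it only suits keys that appear in a single section as in the output above.

```shell
# Print the value that wins for KEY when the given files are read in order.
# Simplification: section headers are ignored; the last assignment seen wins.
effective_value() {
    key=$1
    shift
    awk -F= -v k="$key" '
        $1 == k { v = substr($0, length(k) + 2) }   # keep the latest value
        END     { if (v != "") print v }
    ' "$@"
}

# Usage:
#   effective_value greeter-session \
#       /usr/share/lightdm/lightdm.conf.d/*.conf /etc/lightdm/lightdm.conf
```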

Restart lightdm with sudo systemctl restart lightdm, and the service is now healthy:

root@ubuntu-rtx3090:~# systemctl status lightdm
● lightdm.service - Light Display Manager
   Loaded: loaded (/lib/systemd/system/lightdm.service; indirect; vendor preset: enabled)
   Active: active (running) since Wed 2021-08-25 19:50:22 CST; 3min 1s ago
     Docs: man:lightdm(1)
  Process: 1088 ExecStartPre=/bin/sh -c [ "$(basename $(cat /etc/X11/default-display-manager 2>/dev/null))" = "lightdm" ] (code=exited, status=0/SUCCESS)
 Main PID: 1096 (lightdm)
    Tasks: 6 (limit: 4915)
   CGroup: /system.slice/lightdm.service
           ├─1096 /usr/sbin/lightdm
           ├─1115 /usr/lib/xorg/Xorg -core :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch
           └─1564 lightdm --session-child 12 19

8月 25 19:50:21 ubuntu-rtx3090 systemd[1]: Starting Light Display Manager...
8月 25 19:50:22 ubuntu-rtx3090 systemd[1]: Started Light Display Manager.
8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet(lightdm-greeter:setcred): (null): pam_sm_setcred
8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet5(lightdm-greeter:setcred): (null): pam_sm_setcred
8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_unix(lightdm-greeter:session): session opened for user lightdm by (uid=0)
8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet(lightdm-greeter:session): (null): pam_sm_open_session
8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet(lightdm-greeter:session): pam_kwallet: open_session called without kwallet_key
8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet5(lightdm-greeter:session): (null): pam_sm_open_session
8月 25 19:50:23 ubuntu-rtx3090 lightdm[1220]: pam_kwallet5(lightdm-greeter:session): pam_kwallet5: open_session called without kwallet5_key


root@ubuntu-rtx3090:~# lightdm --test-mode --debug
[+0.00s] DEBUG: Logging to /var/log/lightdm/lightdm.log
[+0.00s] DEBUG: Starting Light Display Manager 1.26.0, UID=0 PID=2573
[+0.00s] DEBUG: Loading configuration dirs from /var/lib/snapd/desktop/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /usr/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-disable-guest.conf
[+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-disable-log-backup.conf
[+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-greeter-wrapper.conf
[+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-guest-wrapper.conf
[+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-ubuntu.conf
[+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-unity-greeter.conf
[+0.00s] DEBUG:   [Seat:*] contains unknown option all-guest
[+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-unity.conf
[+0.00s] DEBUG: Loading configuration from /usr/share/lightdm/lightdm.conf.d/50-xserver-command.conf
[+0.00s] DEBUG: Loading configuration dirs from /usr/local/share/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration dirs from /etc/xdg/lightdm/lightdm.conf.d
[+0.00s] DEBUG: Loading configuration from /etc/lightdm/lightdm.conf
[+0.00s] DEBUG: Registered seat module local
[+0.00s] DEBUG: Registered seat module xremote
[+0.00s] DEBUG: Registered seat module unity
[+0.00s] DEBUG: Using D-Bus name org.freedesktop.DisplayManager
[+0.01s] DEBUG: Monitoring logind for seats
[+0.01s] DEBUG: New seat added from logind: seat0
[+0.01s] DEBUG: Seat seat0: Loading properties from config section Seat:*
[+0.01s] DEBUG: Seat seat0: Starting
[+0.01s] DEBUG: Seat seat0: Creating greeter session
[+0.01s] DEBUG: Seat seat0: Creating display server of type x
[+0.01s] DEBUG: Using VT 7
[+0.01s] DEBUG: Seat seat0: Starting local X display on VT 7
[+0.01s] DEBUG: XServer 1: Logging to /var/log/lightdm/x-1.log
[+0.01s] DEBUG: XServer 1: Writing X server authority to /var/run/lightdm/root/:1
[+0.01s] DEBUG: XServer 1: Launching X Server
[+0.01s] DEBUG: Launching process 2578: /usr/bin/X -core :1 -seat seat0 -auth /var/run/lightdm/root/:1 -nolisten tcp vt7 -novtswitch
[+0.01s] DEBUG: XServer 1: Waiting for ready signal from X server :1
[+0.01s] DEBUG: Acquired bus name org.freedesktop.DisplayManager
[+0.01s] DEBUG: Registering seat with bus path /org/freedesktop/DisplayManager/Seat0
[+0.01s] DEBUG: Loading users from org.freedesktop.Accounts
[+0.01s] DEBUG: User /org/freedesktop/Accounts/User1000 added
Failed to use bus name org.freedesktop.DisplayManager, do you have appropriate permissions?

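However, the unity-greeter login screen still did not appear. With gdm3 as the display manager, systemctl status gdm3 likewise showed the service starting normally, yet the greeter login window never came up. As with lightdm, the boot stayed stuck at the same animated Ubuntu logo.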

I spent a long time on this, including installing lightdm-gtk-greeter, configuring it in lightdm.conf and forcing greeter-show-manual-login=true, but the login screen still did not appear:

[Seat:*]
greeter-session=lightdm-gtk-greeter
greeter-show-manual-login=true
allow-guest=false

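I wondered whether the greeter windows of both gdm3 and lightdm simply could not be displayed under the latest GPU driver and Linux kernel. What would happen if I skipped the login screen and let the system log in to the desktop automatically? So I added one line to /etc/lightdm/lightdm.conf (my login user name is ubuntu):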

     autologin-user=ubuntu

After another reboot, the long-missed Unity desktop finally appeared!
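
For reference, the smallest configuration that combines the two fixes (the Seat header and the autologin line) can be generated as follows. This is a sketch under assumptions: render_autologin_conf is a made-up helper, and the greeter name is the unity-greeter value from the original file; adjust both to your setup.

```shell
# Emit a minimal lightdm.conf that logs the given user in automatically.
# render_autologin_conf is an illustrative name; the greeter is assumed to be unity-greeter.
render_autologin_conf() {
    printf '[Seat:*]\n'
    printf 'greeter-session=unity-greeter\n'
    printf 'autologin-user=%s\n' "$1"
}

# Usage (back up the existing file first):
#   render_autologin_conf ubuntu | sudo tee /etc/lightdm/lightdm.conf
```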

Testing showed that the following settings make no difference, whether present or not:

autologin-guest=false 
autologin-user-timeout=0
autologin-session=lightdm-autologin

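Because fixing this kind of problem may require upgrading the kernel, here as an appendix are the commands for installing a specific kernel version and choosing which kernel boots, together with the output of one such install: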

uname -r
lsb_release -a
# List the kernel images that are currently installed
dpkg --get-selections | grep linux-image
# List the kernel image versions available in the configured repositories
apt-cache search linux | grep linux-image
# Install a specific version of the kernel image and kernel headers
apt-get install linux-headers-5.4.0-81-generic linux-image-5.4.0-81-generic

Building module:
cleaning build area...
'make' -j24 NV_EXCLUDE_BUILD_MODULES='' KERNEL_UNAME=5.4.0-81-generic IGNORE_CC_MISMATCH='' modules.....
Signing module:
 - /var/lib/dkms/nvidia/450.80.02/5.4.0-81-generic/x86_64/module/nvidia-modeset.ko
 - /var/lib/dkms/nvidia/450.80.02/5.4.0-81-generic/x86_64/module/nvidia-drm.ko
 - /var/lib/dkms/nvidia/450.80.02/5.4.0-81-generic/x86_64/module/nvidia.ko
 - /var/lib/dkms/nvidia/450.80.02/5.4.0-81-generic/x86_64/module/nvidia-uvm.ko
Secure Boot not enabled on this system.
cleaning build area...

DKMS: build completed.

nvidia.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-81-generic/updates/dkms/

nvidia-uvm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-81-generic/updates/dkms/

nvidia-modeset.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-81-generic/updates/dkms/

nvidia-drm.ko:
Running module version sanity check.
 - Original module
   - No original module exists within this kernel
 - Installation
   - Installing to /lib/modules/5.4.0-81-generic/updates/dkms/

depmod...

DKMS: install completed.
   ...done.
Processing triggers for linux-image-5.4.0-81-generic (5.4.0-81.91~18.04.1) ...
/etc/kernel/postinst.d/dkms:
 * dkms: running auto installation service for kernel 5.4.0-81-generic
   ...done.
/etc/kernel/postinst.d/initramfs-tools:
update-initramfs: Generating /boot/initrd.img-5.4.0-81-generic
/etc/kernel/postinst.d/zz-update-grub:
Sourcing file `/etc/default/grub'                    ### update-grub runs automatically
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.4.0-81-generic
Found initrd image: /boot/initrd.img-5.4.0-81-generic
Found linux image: /boot/vmlinuz-5.4.0-72-generic
Found initrd image: /boot/initrd.img-5.4.0-72-generic
Found linux image: /boot/vmlinuz-5.4.0-53-generic
Found initrd image: /boot/initrd.img-5.4.0-53-generic
Adding boot menu entry for EFI firmware configuration
done

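# List the kernel entries in the boot menu
grep menuentry /boot/grub/grub.cfg
# Change the boot order: the newest installed kernel is the default;
# to boot an older one, edit the GRUB configuration
vi /etc/default/grub
# Apply the configuration
update-grub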
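
To pick an older kernel you need the exact titles of the boot menu entries. The sketch below extracts them from grub.cfg; grub_titles is a hypothetical helper, and it assumes the titles are wrapped in single quotes, as they are in the files Ubuntu generates.

```shell
# List menu entry and submenu titles from a grub.cfg, one per line.
# Assumes titles are single-quoted; grub_titles is an illustrative name.
grub_titles() {
    awk -F"'" '/^[ \t]*(menuentry|submenu) / { print $2 }' "${1:-/boot/grub/grub.cfg}"
}

# Usage:
#   grub_titles
```

An entry inside a submenu is then selected in /etc/default/grub by joining the submenu title and the entry title with ">", for example GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 5.4.0-72-generic". The exact titles depend on your own grub.cfg, so copy them from the listing rather than from this example.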

Arnold-FY-Chen
