ESXi 8.0: assigning a GPU to a VM fails with "Module 'DevicePowerOn' power on failed", and how to fix it

Error:

Module 'DevicePowerOn' power on failed.

vmkernel.log:

2024-09-13T15:14:17.520Z In(182) vmkernel: cpu91:2102143)PCIPassthru: 4686: pcipDevInfo(0x4313bac015f0) allocated for 0000:4e:00.0

2024-09-13T15:14:17.521Z In(182) vmkernel: cpu0:2097565)PCIEHP: 1573: 0000:4c:01.0: hotplug slot:0x2: num reads=1 slot status=0x108.

2024-09-13T15:14:17.521Z In(182) vmkernel: cpu0:2097565)PCIEHP: 1497: 0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter removed.

2024-09-13T15:14:17.521Z In(182) vmkernel: cpu0:2097565)PCIEHP: 1049: 0000:4c:01.0: Disabling hotplug slot:0x2

2024-09-13T15:14:17.521Z In(182) vmkernel: cpu15:2097563)PCIEHP: 1573: 0000:4c:01.0: hotplug slot:0x2: num reads=0 slot status=0x0.

2024-09-13T15:14:19.266Z In(182) vmkernel: cpu2:2098149)igbn: igbn_CheckRxHang:1414: vmnic1: false hang detected on RX queue 0

2024-09-13T15:14:19.843Z In(182) vmkernel: cpu0:2097564)PCIEHP: 1573: 0000:4c:01.0: hotplug slot:0x2: num reads=1 slot status=0x148.

2024-09-13T15:14:19.843Z In(182) vmkernel: cpu0:2097564)PCIEHP: 1478: 0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter inserted.

2024-09-13T15:14:19.843Z In(182) vmkernel: cpu15:2097563)PCIEHP: 1573: 0000:4c:01.0: hotplug slot:0x2: num reads=0 slot status=0x0.

2024-09-13T15:14:19.945Z In(182) vmkernel: cpu0:2097564)PCIEHP: 983: 0000:4c:01.0: Enabling hotplug slot:0x2

2024-09-13T15:14:19.945Z In(182) vmkernel: cpu0:2097564)PCIEHP: 638: 0000:4c:01.0: hotplug slot: 0x2: Prior device 0000:4e:00.0 was yanked

2024-09-13T15:14:19.945Z Wa(180) vmkwarning: cpu0:2097564)WARNING: PCIEHP: 641: 0000:4c:01.0: hotplug slot: 0x2: Device insertion detected while prior device 0000:4e:00.0 removal is still pending
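
The telling pattern in the log above is that the passthrough device 0000:4e:00.0 is reported as "Adapter removed" and then "Adapter inserted" by PCIEHP right after PCIPassthru allocates it. As a minimal sketch (not an official VMware tool), the following Python snippet scans vmkernel.log lines for those events. The regular expression is modeled only on the lines shown above, and the names `HOTPLUG_EVENT` and `find_hotplug_events` are made up for this illustration.

```python
import re

# Pattern modeled on the PCIEHP lines in the log excerpt above, e.g.
# "... PCIEHP: 1497: 0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter removed."
# Assumption: other ESXi builds may word these messages differently.
HOTPLUG_EVENT = re.compile(
    r"PCIEHP: \d+: (?P<port>[0-9a-f:.]+): hotplug slot:\s*0x[0-9a-f]+ "
    r"\((?P<device>[0-9a-f:.]+)\) Adapter (?P<event>removed|inserted)"
)


def find_hotplug_events(lines):
    """Return (device, event) tuples for every PCIEHP remove/insert line."""
    events = []
    for line in lines:
        match = HOTPLUG_EVENT.search(line)
        if match:
            events.append((match["device"], match["event"]))
    return events


# Three lines copied from the excerpt above; the igbn line is unrelated noise.
sample = [
    "2024-09-13T15:14:17.521Z In(182) vmkernel: cpu0:2097565)PCIEHP: 1497: "
    "0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter removed.",
    "2024-09-13T15:14:19.266Z In(182) vmkernel: cpu2:2098149)igbn: "
    "igbn_CheckRxHang:1414: vmnic1: false hang detected on RX queue 0",
    "2024-09-13T15:14:19.843Z In(182) vmkernel: cpu0:2097564)PCIEHP: 1478: "
    "0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter inserted.",
]

events = find_hotplug_events(sample)
for device, event in events:
    print(f"{device}: adapter {event}")
```

If the device you are passing through shows up in this list at the moment the VM powers on, the hotplug handling described later in this post is a likely suspect.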

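Fixes I tried first:

  • Enable Above 4G decoding in the BIOS
  • Use EFI boot
  • Enable passthrough for the GPU
  • Set these advanced configuration parameters:
    • pciPassthru.use64bitMMIO="TRUE"
    • The second parameter needs a simple calculation. Count the high-end PCI devices you plan to pass through to the VM, multiply that number by 16, then round up to the next power of two. For example, with two passthrough devices: 2 * 16 = 32, and rounding up to the next power of two gives 64. For one device, use 32. Use the result for the second setting:
    • pciPassthru.64bitMMIOSizeGB="64"
    • (If there is no power-on error but the VM cannot boot into the OS and shuts itself down, try increasing this value. After fixing the error above, my test with four A100 cards needed 512 before the VM would boot.)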
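
The sizing rule above can be sketched in Python. This is only an illustration of the arithmetic as described here: the function names `mmio_size_gb` and `advanced_settings` are invented, and "next power of two" is read as strictly greater than devices * 16, which is an assumption inferred from the two worked examples (2 devices give 64, 1 device gives 32).

```python
def mmio_size_gb(num_devices: int) -> int:
    """Multiply the device count by 16, then take the next power of two.

    Assumption: "next" means strictly greater, so 32 becomes 64,
    matching the two-device example in the text.
    """
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    target = num_devices * 16
    size = 1
    while size <= target:
        size *= 2
    return size


def advanced_settings(num_devices: int) -> dict:
    """Build the two advanced parameters as key/value strings."""
    return {
        "pciPassthru.use64bitMMIO": "TRUE",
        "pciPassthru.64bitMMIOSizeGB": str(mmio_size_gb(num_devices)),
    }


for count in (1, 2, 4):
    print(count, advanced_settings(count))
```

Treat the result as a starting point rather than a guarantee: the rule gives 128 for four devices, while the four-A100 setup in this post only booted with 512.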

        

None of that helped.

I then found a report of the same error, with the following fix:

1. Enable SSH and the ESXi Shell on the ESXi host.

2. Run:

esxcli system settings kernel set -s enablePCIEHotplug -v FALSE

Then reboot the host. After the reboot, run the following command to verify that PCIe hotplug is disabled:

esxcli system settings kernel list -o enablePCIEHotplug

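If the setting is reported as FALSE, hotplug is disabled.

The VM should now power on successfully. Remember to set up passthrough for the PCI device again, and to remove from the VM configuration any PCI device that was not recognized earlier.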

References:

1. Virtual Machine On VMware ESXi Hypervisor Will Stop Responding or Fail to Power On When Configured With the NVIDIA A40/A10 PCIe Graphics Accelerator As a "Passthrough" Device (broadcom.com)

2. How to Enable Compute Accelerators on vSphere 6.5 for Machine Learning and Other HPC Workloads - Virtualize Applications (vmware.com)
