
ESXi 8.0: assigning a GPU to a VM fails with: Module "DevicePowerOn" power on failed. Solution...

Posted by hqy on 2025-01-10 17:38:36


Error:

Module "DevicePowerOn" power on failed.

vmkernel.log:

2024-09-13T15:14:17.520Z In(182) vmkernel: cpu91:2102143)PCIPassthru: 4686: pcipDevInfo(0x4313bac015f0) allocated for 0000:4e:00.0

2024-09-13T15:14:17.521Z In(182) vmkernel: cpu0:2097565)PCIEHP: 1573: 0000:4c:01.0: hotplug slot:0x2: num reads=1 slot status=0x108.

2024-09-13T15:14:17.521Z In(182) vmkernel: cpu0:2097565)PCIEHP: 1497: 0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter removed.

2024-09-13T15:14:17.521Z In(182) vmkernel: cpu0:2097565)PCIEHP: 1049: 0000:4c:01.0: Disabling hotplug slot:0x2

2024-09-13T15:14:17.521Z In(182) vmkernel: cpu15:2097563)PCIEHP: 1573: 0000:4c:01.0: hotplug slot:0x2: num reads=0 slot status=0x0.

2024-09-13T15:14:19.266Z In(182) vmkernel: cpu2:2098149)igbn: igbn_CheckRxHang:1414: vmnic1: false hang detected on RX queue 0

2024-09-13T15:14:19.843Z In(182) vmkernel: cpu0:2097564)PCIEHP: 1573: 0000:4c:01.0: hotplug slot:0x2: num reads=1 slot status=0x148.

2024-09-13T15:14:19.843Z In(182) vmkernel: cpu0:2097564)PCIEHP: 1478: 0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter inserted.

2024-09-13T15:14:19.843Z In(182) vmkernel: cpu15:2097563)PCIEHP: 1573: 0000:4c:01.0: hotplug slot:0x2: num reads=0 slot status=0x0.

2024-09-13T15:14:19.945Z In(182) vmkernel: cpu0:2097564)PCIEHP: 983: 0000:4c:01.0: Enabling hotplug slot:0x2

2024-09-13T15:14:19.945Z In(182) vmkernel: cpu0:2097564)PCIEHP: 638: 0000:4c:01.0: hotplug slot: 0x2: Prior device 0000:4e:00.0 was yanked

2024-09-13T15:14:19.945Z Wa(180) vmkwarning: cpu0:2097564)WARNING: PCIEHP: 641: 0000:4c:01.0: hotplug slot: 0x2: Device insertion detected while prior device 0000:4e:00.0 removal is still pending
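The pattern worth noticing in this log is that the hotplug slot reports the passthrough device (0000:4e:00.0) as removed and then re-inserted while the removal is still pending. A minimal Python sketch for spotting that sequence in a saved copy of vmkernel.log follows. The marker strings and the address regex are taken only from the lines shown above, and the names `BDF_RE`, `MARKERS`, `summarize_hotplug` and `hotplug_race_suspected` are hypothetical helpers, not part of any VMware tool. Other ESXi builds may word these messages differently.

```python
import re

# PCI address in segment:bus:device.function form, e.g. 0000:4e:00.0
BDF_RE = re.compile(r"\b[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]\b")

# Substrings copied from the log excerpt above, mapped to short labels.
# Assumption: these are the only PCIEHP messages we care about.
MARKERS = (
    ("removal is still pending", "insert_while_removal_pending"),
    ("Adapter removed", "removed"),
    ("Adapter inserted", "inserted"),
)


def summarize_hotplug(lines):
    """Return (label, device_address) tuples for PCIEHP lines of interest."""
    events = []
    for line in lines:
        if "PCIEHP" not in line:
            continue
        for needle, label in MARKERS:
            if needle in line:
                addrs = BDF_RE.findall(line)
                # In the lines above, the last address is the device in the
                # slot; the first one is the hotplug-capable port.
                events.append((label, addrs[-1] if addrs else None))
                break
    return events


def hotplug_race_suspected(events):
    """True if an insertion was logged while a removal was still pending."""
    return any(label == "insert_while_removal_pending" for label, _ in events)


# Three lines copied from the excerpt above.
SAMPLE = [
    "2024-09-13T15:14:17.521Z In(182) vmkernel: cpu0:2097565)PCIEHP: 1497: "
    "0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter removed.",
    "2024-09-13T15:14:19.843Z In(182) vmkernel: cpu0:2097564)PCIEHP: 1478: "
    "0000:4c:01.0: hotplug slot:0x2 (0000:4e:00.0) Adapter inserted.",
    "2024-09-13T15:14:19.945Z Wa(180) vmkwarning: cpu0:2097564)WARNING: PCIEHP: 641: "
    "0000:4c:01.0: hotplug slot: 0x2: Device insertion detected while prior "
    "device 0000:4e:00.0 removal is still pending",
]

if __name__ == "__main__":
    evts = summarize_hotplug(SAMPLE)
    for label, dev in evts:
        print(label, dev)
    print("hotplug race suspected:", hotplug_race_suspected(evts))
```

To check a real log, pass the lines of the file instead of `SAMPLE`, for example `summarize_hotplug(open(path))`, where `path` points at a copy of vmkernel.log.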

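Fixes I tried:

  • Enable Above 4G in the BIOS

  • Configure the VM to boot with EFI

  • Enable passthrough for the GPU

  • Configure the advanced parameters:

    • pciPassthru.use64bitMMIO="TRUE"

    • The second parameter requires a simple calculation. Count the high-end PCI devices you plan to pass through to the VM, multiply that number by 16, and round up to the next power of two (strictly above the product). For example, with two passthrough devices: 2 * 16 = 32, and rounding up to the next power of two gives 64. For one device, use 32. Use this value for the second setting:

    • (If the power-on error does not appear but the VM cannot boot into the OS and shuts itself down, try increasing this value. After resolving the error above, my test with four A100 cards needed 512 before the VM would boot.)

    • pciPassthru.64bitMMIOSizeGB="64"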
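The sizing rule above can be sketched in a few lines of Python. This is only an illustration of the rule as stated (devices times 16, then the next power of two strictly above the product); the function names `mmio_size_gb` and `vmx_lines` are hypothetical. Treat the result as a starting value: as noted above, four A100 cards needed 512 in practice, while this rule yields 128.

```python
def mmio_size_gb(num_devices: int, gb_per_device: int = 16) -> int:
    """Starting value for pciPassthru.64bitMMIOSizeGB per the rule above."""
    if num_devices < 1:
        raise ValueError("num_devices must be at least 1")
    product = num_devices * gb_per_device
    # 1 << bit_length() is the smallest power of two strictly greater than
    # the product, which matches the examples: 16 -> 32 and 32 -> 64.
    return 1 << product.bit_length()


def vmx_lines(num_devices: int) -> list[str]:
    """Render the two advanced parameters in the form shown in the list above."""
    return [
        'pciPassthru.use64bitMMIO="TRUE"',
        f'pciPassthru.64bitMMIOSizeGB="{mmio_size_gb(num_devices)}"',
    ]


if __name__ == "__main__":
    for n in (1, 2, 4):
        print(n, "device(s):", mmio_size_gb(n), "GB")
    print("\n".join(vmx_lines(2)))
```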

        

None of this helped.

I then found a report of the same error. The fix is as follows:

1. Enable SSH and the ESXi Shell on the ESXi host.

2. Run:

esxcli system settings kernel set -s enablePCIEHotplug -v FALSE

Then reboot the host. After the reboot, run the following command to verify that PCIe hotplug has been disabled:

esxcli system settings kernel list -o enablePCIEHotplug

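If the value shown is FALSE, hotplug is disabled.

Powering on the VM should now succeed. Remember to re-enable passthrough for the PCI device, and remove the previously unrecognized PCI device from the VM configuration.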

References:

1. Virtual Machine On VMware ESXi Hypervisor Will Stop Responding or Fail to Power On When Configured With the NVIDIA A40/A10 PCIe Graphics Accelerator As a "Passthrough" Device (broadcom.com)

2. How to Enable Compute Accelerators on vSphere 6.5 for Machine Learning and Other HPC Workloads - Virtualize Applications (vmware.com)



Permalink: https://www.kinber.cn/post/4588.html (authorization required to repost)
