Unable to run vGPU manager on Dell R740 and ESX

We currently receive almost daily reports from customers whose GRID deployments are stuck on the new Dell R740.

Symptom:

After installing the vGPU manager VIB file, the vGPU manager does not run properly.

nvidia-smi fails, even though the nvidia module loads successfully and the GPUs are visible to lspci:

[root@esx65:~] nvidia-smi

Failed to initialize NVML: Unknown Error

[root@esx65:~]

[root@esx65:~] vmkload_mod nvidia

Module nvidia loaded successfully

[root@esx65:~] lspci | grep -i nvidia

0000:3d:00.0 Display controller: NVIDIA Corporation NVIDIATesla  M10 [vmgfx0]

0000:3e:00.0 Display controller: NVIDIA Corporation NVIDIATesla  M10 [vmgfx1]

0000:3f:00.0 Display controller: NVIDIA Corporation NVIDIATesla  M10 [vmgfx2]

0000:40:00.0 Display controller: NVIDIA Corporation NVIDIATesla  M10 [vmgfx3]

[root@esx65:~]

If you run the dmesg command, you may see errors related to IOMMU. These indicate a problem with the IOMMU settings in the system BIOS (SBIOS).
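To narrow the dmesg output down to the relevant lines, a filter along the following lines may help. This is only a minimal sketch: the function name filter_gpu_messages and the search terms are assumptions for illustration, and the exact message text depends on the ESXi and driver versions in use.

```shell
# Hypothetical helper: keep only log lines that mention IOMMU or NVIDIA.
# The search terms are an assumption; adjust them to the messages you see.
filter_gpu_messages() {
    grep -i -E 'iommu|nvidia'
}

# Usage on the ESXi host (commented out so the snippet can be sourced safely):
# dmesg | filter_gpu_messages
```

The match is case-insensitive, so lines containing "IOMMU", "iommu", "NVIDIA" or "nvidia" are all kept.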

Solution:

The default IOMMU settings in the SBIOS need to be modified so that the MMIO Base value is less than 16 TB.
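As a quick sanity check on the value you plan to enter in the BIOS, the 16 TB rule can be written down as a small shell test. This is a sketch under assumptions: the helper name mmio_base_below_limit is hypothetical, the value is entered by hand in whole terabytes (it is not read from the host), and the 12 TB used in the example is purely illustrative.

```shell
# Hypothetical helper: succeed (exit status 0) if the MMIO base, given in
# whole terabytes, is below the 16 TB limit stated above.
mmio_base_below_limit() {
    limit_tb=16
    [ "$1" -lt "$limit_tb" ]
}

# Example with an illustrative value of 12 TB (an assumption, not from the post).
if mmio_base_below_limit 12; then
    echo "MMIO base OK"
else
    echo "MMIO base too high"
fi
```

A value of exactly 16 fails the check, because the base has to be strictly below 16 TB.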

For a detailed explanation of why this is necessary, see:

https://kb.vmware.com/s/article/2142307

About the Author

sschaber

GRID Solution Architect and owner of this Blog site.
