We currently get almost daily complaints from customers whose GRID deployments are stuck on the new Dell R740.
Symptom:
After installing the vGPU manager VIB file, the vGPU manager doesn’t run properly.
nvidia-smi is not working:
[root@esx65:~] nvidia-smi
Failed to initialize NVML: Unknown Error
[root@esx65:~]
[root@esx65:~] vmkload_mod nvidia
Module nvidia loaded successfully
[root@esx65:~] lspci | grep -i nvidia
0000:3d:00.0 Display controller: NVIDIA Corporation NVIDIATesla M10 [vmgfx0]
0000:3e:00.0 Display controller: NVIDIA Corporation NVIDIATesla M10 [vmgfx1]
0000:3f:00.0 Display controller: NVIDIA Corporation NVIDIATesla M10 [vmgfx2]
0000:40:00.0 Display controller: NVIDIA Corporation NVIDIATesla M10 [vmgfx3]
[root@esx65:~]
If you run the dmesg command, you may see an IOMMU-related error.
This indicates that there is an issue with the IOMMU settings in the SBIOS.
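Since dmesg output on a busy host can be long, it may help to filter it down to the relevant lines. The following is only a minimal sketch: the function name filter_gpu_msgs is made up for illustration, and the search terms are an assumption about which lines are worth reading, not the exact text of any particular error message.

```shell
# Hypothetical helper: keep only kernel-log lines that mention
# the NVIDIA driver or the IOMMU (case-insensitive match).
filter_gpu_msgs() {
    grep -i -E 'nvidia|iommu'
}

# Usage on the ESXi host:
#   dmesg | filter_gpu_msgs
```

If the filter prints nothing, that only means no line contained those words. It does not prove the BIOS settings are fine.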
Solution:
The default settings for IOMMU need to be modified.
With the new settings, the MMIO Base value needs to be less than 16TB.
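To make the 16TB rule concrete, here is a small sketch of the arithmetic. It assumes the BIOS shows the MMIO Base as a whole number of TB and that TB means binary terabytes (2^40 bytes); the function name mmio_base_ok is hypothetical and not part of any NVIDIA, Dell, or VMware tool.

```shell
# Hypothetical helper: succeeds (exit 0) if an MMIO base given in
# whole TB is strictly below the 16 TB limit, fails otherwise.
mmio_base_ok() {
    base_tb=$1
    limit_tb=16
    [ "$base_tb" -lt "$limit_tb" ]
}

# 16 TB in bytes, assuming binary units: 16 * 2^40 = 2^44
limit_bytes=$((16 * 1024 * 1024 * 1024 * 1024))
printf '16 TB = %d bytes\n' "$limit_bytes"

# Usage (the argument is whatever value your BIOS currently shows):
#   mmio_base_ok 8 && echo "below 16 TB" || echo "too high"
```

In other words, a base below 16TB is an address that fits in 44 bits.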
For a detailed explanation of why this is necessary, you can have a look here:
Thanks Simon! Saved me some time tonight!
Agreed. I already talked to Dell a few months ago, but it seems they are not willing to change the default.