The goal of this action was to upgrade a cluster with NVIDIA graphics cards from ESXi 6.7 to 7.0.3. Before the upgrade everything worked fine: you were able to live vMotion VMs with vGPU to another host.
Problem:
Everything worked from the VM side, but I was not able to live vMotion VMs between hosts.
Steps:
- Upgrade ESXi
- Upgrade the NVIDIA driver
- Start the VMs
- vMotion failed between hosts
Solution:
This change must be applied on every host in the cluster that you want to live vMotion machines between.
How to see if ECC is enabled or disabled:
nvidia-smi -q | grep "Ecc Mode" -A2
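On my hosts the output looked roughly like this for each GPU (exact spacing and values depend on your card and driver version, so treat this as an illustration):

Ecc Mode
    Current                           : Enabled
    Pending                           : Enabled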

To disable ECC
If the current mode shows Enabled you need to disable ECC, and you do this with this command:
nvidia-smi -e 0
A reboot of the host is required!
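After the reboot you can run the same check again to confirm the change took effect; both values should now show Disabled (the output below is just an illustration). If a host has several cards and you only want to change one of them, nvidia-smi also accepts -i to target a specific GPU index, for example nvidia-smi -i 0 -e 0.

nvidia-smi -q | grep "Ecc Mode" -A2

Ecc Mode
    Current                           : Disabled
    Pending                           : Disabled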
And do not forget one of the hosts like I did! 🙂
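To avoid missing a host you can check the whole cluster in one go by looping over the hosts via SSH. A minimal sketch, assuming SSH is enabled on the ESXi hosts and that the hostnames below are placeholders for your own:

# Check ECC mode on every host in the cluster (hostnames are examples)
for host in esx01 esx02 esx03; do
  echo "== $host =="
  ssh root@$host 'nvidia-smi -q | grep "Ecc Mode" -A2'
done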
That's it!
//Roger