Update: VMware has posted a KB on this issue, and there is a really good article by Cormac Hogan on the matter. I have also posted a PowerCLI script to resolve the issue.
Yesterday I was alerted to a change in the heartbeat method in VMware 5.5 U2. In U2 and vSphere 6, hosts now use ATS (Atomic Test and Set) on VAAI-enabled arrays to do heartbeats. Some arrays are experiencing outages due to this change. It’s not clear to me exactly which arrays are affected, other than that IBM has posted an article on it. It seems to cause one of the following symptoms: the host disconnects from vCenter, or storage disconnects from the host. As you can see, the second of these is a critical problem, potentially creating an all-paths-down situation.
The fix suggested by IBM disables the ATS heartbeat method and returns the host to the pre-U2 method. It’s my understanding that this is an advanced setting that can be applied without a reboot. I have also been told that once the advanced setting is created, it can be applied via host profile or PowerCLI.
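For readers wondering what the change might look like from the console, here is a minimal sketch. It assumes the advanced option is `/VMFS3/UseATSForHBOnVMFS5`, the name used in VMware's KB rather than anything stated in this post, so confirm it with VMware support first. The `run` helper and `DRY_RUN` variable are hypothetical additions for illustration: by default the script only prints the `esxcli` commands instead of executing them.

```shell
#!/bin/sh
# Sketch only: the option name is an assumption taken from VMware's KB,
# not from this post. Confirm with VMware support before changing anything.
ATS_HB_OPTION="/VMFS3/UseATSForHBOnVMFS5"

# DRY_RUN=1 (the default) only prints the commands.
# Set DRY_RUN=0 on an ESXi host to actually execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Show the current value of the option.
run esxcli system settings advanced list -o "$ATS_HB_OPTION"

# Set the integer value to 0, which disables ATS heartbeating on VMFS5
# datastores and falls back to the pre-U2 heartbeat method.
run esxcli system settings advanced set -i 0 -o "$ATS_HB_OPTION"
```

Setting the value back to 1 with the same `set` command should restore the U2 default. I would still only run this with `DRY_RUN=0` after VMware has confirmed it is the right course of action for your array.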
It is very early in the process. By all accounts, you should open a VMware ticket to get their advice on how to deal with this issue. They are working on the problem and should produce a KB with more information when possible. I personally would not apply this setting unless you are experiencing the issue as identified by VMware. I wish I had more information, but it has not happened in my environment.
If you are experiencing this issue, please post a comment with more information. I will update the article once the KB is posted.
2 Replies to “Change in VMware 5.5 U2 ATS can cause storage outages!”
I don’t see the setting you mention in the 5.5 U2 host profile to disable ATS for heartbeat operations only.
I am not sure I understand how to accomplish it via the CLI.
Thanks for reading. I would be very careful about applying the settings in the IBM document without VMware review and approval. It’s very early in this bug, and the exact scope is not yet known. My tests have shown that the only way to apply the setting at this time is via the console, but again, don’t apply it without VMware support confirming the course of action.