Recently I ran into an issue where esxtop was showing a virtual machine that was ballooning even though my ESXi host was in the high memory state (see this article for state information). A quick check of the guest operating system found the balloon driver running, but no more memory than usual was in use (this virtual machine used a fairly static 3GB of RAM). I wanted to understand and reproduce this situation, so I loaded up my home lab with two Linux virtual machines with 40GB of RAM each (my ESXi hosts have 32GB each). I took the following steps:
- Power on each virtual machine and bring up top
- SSH to the ESXi host, run esxtop, switch to the memory view (M), filter to virtual machines only (V), and add only the fields D, J, K, Q
- Monitor the normal load of the virtual machines and ESXi host for 10 minutes
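The monitoring step above can also be done non-interactively with esxtop's batch mode, which is handy for capturing a 10-minute baseline you can review later (the output filename is my own choice; the flags themselves are standard esxtop options):

```shell
# Capture a 10-minute baseline from the ESXi host: batch mode (-b),
# one sample every 10 seconds (-d 10), 60 iterations (-n 60).
# The CSV can then be opened in a spreadsheet or fed to perfmon/esxplot.
esxtop -b -d 10 -n 60 > esxtop-baseline.csv
```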
During this time I found that one virtual machine was using 3GB of RAM and the other 0.5GB. So I started to apply pressure. Knowing that I needed to create a soft memory state to force ballooning, I added 27GB of synthetic memory usage to a single guest operating system (using the Linux stress tool). The host quickly moved into the hard state, and ballooning and compression began. After two minutes I stopped the 27GB stress run and allowed the virtual machine to return to 3GB used (3.5GB used on the ESXi host). The screenshot below was taken after 10 minutes with no memory pressure:
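For reference, a sketch of the stress invocation I used to generate the synthetic load (the worker count and per-worker size are an assumption that works out to roughly 27GB; adjust to taste):

```shell
# Allocate ~27GB inside the guest: 27 VM workers of 1GB each.
# --vm-keep keeps re-dirtying the allocated memory instead of
# freeing and reallocating it; --timeout ends the run after 120s.
stress --vm 27 --vm-bytes 1G --vm-keep --timeout 120
```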
As you can see, esxtop is still showing ballooning even though top in the guest operating system showed memory had returned to the original value of 3GB:
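Another way to cross-check this from inside the guest (assuming VMware Tools or open-vm-tools is installed) is the balloon statistic reported by vmware-toolbox-cmd, which shows how much memory the balloon driver is currently holding:

```shell
# Report the amount of guest memory currently held by the balloon
# driver (in MB). A non-zero value here while the host is under no
# memory pressure matches the "phantom ballooning" seen in esxtop.
vmware-toolbox-cmd stat balloon
```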
So we had a problem: the guest was no longer ballooning, but esxtop was still showing ballooning. From this I can assume that balloon memory is not reclaimed until the guest requests it again. I found that if I initiated a 25GB memory request, the ballooning metrics in esxtop were cleared. So ballooning without an active soft state can indicate that a guest has over-provisioned RAM and that a soft state once existed.
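Triggering the reclaim just means making a guest memory request large enough to collide with the inflated balloon; a sketch of the 25GB request, again using stress (sizes are my assumption of an amount that forces the balloon to deflate):

```shell
# Briefly request ~25GB in the guest: 25 workers of 1GB each.
# This forces the balloon driver to release its pages, after which
# the MCTLSZ value in esxtop drops back to zero.
stress --vm 25 --vm-bytes 1G --vm-keep --timeout 60
```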
What about vMotion?
I wanted to test the effect of this phantom ballooning on vMotion. As expected, the metric is completely cleared after a vMotion and is not set again unless a soft state is reached on the destination ESXi host.