If you have been running VMware View 4.0 or greater, chances are that you have vSphere 4.0 U2 or vSphere 4.1 deployed in your VDI infrastructure. You may still be on VMware Infrastructure 3.5 U5, but I am hoping that at this stage you have already been through the upgrade.
If you haven’t yet upgraded, you will experience the behaviour I explain below when you move to vSphere 4.0 or greater. If you have already upgraded your ESX hosts and they run Nehalem-based Xeon 5500 series CPUs, you may or may not have noticed that after the upgrade vCenter and esxtop show the hosts consuming more physical memory than before, even though the number of virtual desktops has not changed.
The behaviour is not exclusive to VDI infrastructures; however, in VDI, especially when 32-bit Windows XP is in use, there is normally a significant amount of TPS (Transparent Page Sharing) across all virtual desktops. Because the drop in the TPS ratio across all hosts is so large, the behaviour is much more evident in VDI infrastructures.
I must admit that I may be out of my depth here compared to bloggers such as Duncan Epping and Frank Denneman, who have written a book that goes deep into vSphere resource and memory management (the HA and DRS Technical Deepdive).
Simply put, with vSphere 4.0 a few changes were introduced to the memory management techniques. By default, TPS now only works with 2MB memory pages (large memory pages) when trying to find identical memory blocks. Only when the host is under memory pressure will TPS break those large pages into smaller 4KB pages to search for additional identical blocks. This happens because vSphere relies on hardware-assisted memory virtualization, EPT (Intel) or RVI (AMD), to eliminate the need for shadow page tables and reduce kernel overhead, and guest memory on these CPUs is backed with large pages by default.
VMware published KB 1020524 about this behaviour – Transparent Page Sharing is not utilized under normal workloads on Nehalem based Xeon 5500 series CPUs.
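If you want to check whether your hosts are still running with the default behaviour (large pages enabled), you can read the relevant advanced setting on each host. A quick check, assuming classic ESX with esxcfg-advcfg available in the service console (on ESXi, look under Configuration > Advanced Settings > Mem in the vSphere Client):

# Read the current value of the large page allocation setting (1 = enabled, the default)
esxcfg-advcfg -g /Mem/AllocGuestLargePage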
I was recently at a customer who, after the upgrade to vSphere 4.1, noticed this memory consumption behaviour and asked me what was happening. I thought it would be interesting to share.
The customer claimed that before the upgrade the memory consumption for each host in the View cluster (8 hosts) was around 70%, and that it is now approximately 90%.
The esxtop screenshot above shows the situation after the upgrade. A few things to observe here:
- Memory is in the high state (6% or more free memory – no memory contention)
- MEMCTL (ballooning) is enabled
- ZIP (memory compression) is in use and almost 1GB is currently compressed
- SWAP is occurring and 637MB is currently swapped to disk
It is interesting to note that normally, without memory contention, we should not see ZIP and SWAP at all. I did a little research and found that when TPS is not sharing pages upfront, it leads to memory contention that would not otherwise have happened. When a large page is broken down into small 4KB pages, TPS will only try to collapse those smaller blocks with other small blocks; it cannot share them across all the virtual desktops whose large pages have not yet been broken. This leads to ballooning/swapping and possibly thrashing, as working set memory accidentally gets swapped out.
After disabling large page support for the guest OS, by setting the host advanced parameter Mem.AllocGuestLargePage to 0, the customer immediately started to see memory consumption decrease to the previous numbers, approximately 70%.
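For reference, this is roughly what we changed. The vSphere Client path is Host > Configuration > Advanced Settings > Mem; from the command line the equivalent would be something like the following (a sketch, again assuming classic ESX with esxcfg-advcfg in the service console):

# Stop backing guest memory with 2MB large pages on this host (0 = disabled, default is 1)
esxcfg-advcfg -s 0 /Mem/AllocGuestLargePage

If you have more than a couple of hosts in the View cluster, you will obviously want to apply the same value to every host rather than just one.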
The resxtop screenshot above shows the situation after the parameter change. A few things to observe here:
- ZIP (memory compression) is reducing
- Host SWAP is reducing
Instead of the host-wide setting, it is also possible to turn off large page support for an individual virtual machine by setting monitor_control.disable_mmu_largepages = TRUE in the .VMX file.
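The entry goes into the virtual machine's configuration file while the VM is powered off (or via Edit Settings > Options > Advanced > General > Configuration Parameters in the vSphere Client); note that .VMX values are normally quoted:

# Per-VM alternative to the host-wide setting
monitor_control.disable_mmu_largepages = "TRUE"

This is useful if you only want to bring back sharing for a subset of desktops rather than for everything running on the host.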
Performance Hit when Disabling Large Pages support
Disabling Large Pages support will likely produce a CPU performance hit in your hosts. Duncan Epping discussed and tested this in his article Re: Large Pages (@gabvirtualworld @frankdenneman @forbesguthrie).
In Frank’s article he says: “Using Large pages shows a different memory usage level, but there is nothing to worry about. If memory demand exceeds the availability of memory, the VMkernel will resort to share-before-swap and compress-before-swap. Resulting in collapsed pages and reducing the memory pressure.”
Most VDI environments are memory bound, not CPU bound; therefore, if you know that an additional 15% to 20% CPU hit is not a biggie, you may decide to disable large page support. The fact that only small pages are used will help to increase the memory consolidation ratio without hitting hard walls such as ZIP and SWAP.
My customer was experiencing SWAP and ZIP and did not have visibility into the percentage of memory in use. They decided to keep large pages disabled, given that their CPU utilisation was as low as 50%.
Even more Memory Consolidation
I previously wrote an article on how to Increase VDI consolidation ratio with TPS tuning on ESX (read the article for full context). Basically, by reducing Mem.ShareScanTime from the default of 60 minutes to a lower number, virtual machines will have their physical memory pages scanned more frequently and more identical blocks may be found. As always, be aware of the CPU hit when tuning advanced memory parameters.
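As a sketch of the kind of tuning described in that article (30 is just an illustrative value, not a recommendation; the lower the number, the more CPU the page scanning consumes):

# Scan each VM's full memory for sharing opportunities every 30 minutes instead of the default 60
esxcfg-advcfg -s 30 /Mem/ShareScanTime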