“Common knowledge is knowledge that is known by everyone or nearly everyone, usually with reference to the community in which the term is used.” – Source: Wikipedia.
It is typical to have sets of common knowledge that guide best practices. However, common knowledge isn’t always correct and people tend to embrace common knowledge as the only and absolute truth.
I have been longing to write this article for a long time, and I have finally found the time (I’m stranded at SFO airport) and the resources to show that common knowledge is often not correct.
In the VDI context, Windows XP has been used as the guest OS for a long time. Over this period vendors, architects and customers were able to familiarize themselves with the behavior of vSphere TPS (Transparent Page Sharing) with Windows XP. Common knowledge puts Windows XP TPS savings at ~40% in VDI environments.
Windows 7 came along and with it a new feature called ASLR (Address Space Layout Randomization). Address space randomization hinders some types of security attacks by making it more difficult for an attacker to predict target addresses.
ASLR is an important security feature when combined with NX (DEP), and disabling one or the other makes Windows 7 VMs easier to exploit through security vulnerabilities.
While it is true that on Windows ASLR requires some code page contents to be modified, common knowledge tells us that the effects on TPS are empirical. This means it is not possible to predict the amount of TPS savings at a given point in time, even if we are trying to calculate the average. Because of that, consultants, vendors and architects have been using safety numbers of around 10% to 15% as ballpark figures for designing VDI solutions with Windows 7 and vSphere.
With the OS change there is a loss of approximately 30 percentage points in TPS savings. In business terms this means that VDI infrastructures will run ~30% fewer sessions before the hypervisor starts to run into contention (memory compression or host swap).
There are many blog references regarding ASLR and how it impacts TPS. There are also many references on how to disable the Windows ASLR feature by setting the following registry value to 0.
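To make the capacity impact concrete, here is a rough sketch of the arithmetic. It assumes a simplified model in which TPS removes a fixed percentage of every desktop's memory footprint, and it uses an illustrative host with 48GB RAM and desktops with 1GB RAM. The function name and the model are my own assumptions for illustration, not a sizing formula from VMware.

```python
def desktops_per_host(host_ram_mb, desktop_ram_mb, tps_savings_pct):
    """Estimate desktops that fit before contention.

    Assumption: TPS removes tps_savings_pct percent of each desktop's
    footprint. Integer math avoids floating point rounding surprises.
    """
    footprint_x100 = desktop_ram_mb * (100 - tps_savings_pct)
    return host_ram_mb * 100 // footprint_x100


HOST_RAM_MB = 48 * 1024    # illustrative host size
DESKTOP_RAM_MB = 1024      # illustrative desktop size

xp = desktops_per_host(HOST_RAM_MB, DESKTOP_RAM_MB, 40)
for pct in (10, 15):
    win7 = desktops_per_host(HOST_RAM_MB, DESKTOP_RAM_MB, pct)
    drop = 100 * (xp - win7) / xp
    print(f"{pct}% savings: {win7} desktops vs {xp} at 40% ({drop:.0f}% fewer)")
```

Under these assumptions, planning with 10% to 15% savings instead of 40% costs roughly a third of the sessions per host, which is in line with the ~30% figure above.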
\HKLM\SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management\MoveImages
The MoveImages value is described in “Windows Internals”, but that doesn’t make it a documented feature supported by Microsoft.
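If you want to check whether someone has already changed this setting in a desktop image, a read-only audit is enough. The sketch below uses Python's standard winreg module and only reads the value; the function name is hypothetical, and on non-Windows systems it simply returns None.

```python
import sys

# Registry location quoted above, relative to HKEY_LOCAL_MACHINE.
KEY_PATH = r"SYSTEM\CurrentControlSet\Control\Session Manager\Memory Management"
VALUE_NAME = "MoveImages"


def read_move_images():
    """Return the MoveImages value, or None if it is not set.

    Read-only: this never modifies the registry. Returns None on
    platforms without a Windows registry.
    """
    if sys.platform != "win32":
        return None
    import winreg  # only available on Windows

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, KEY_PATH) as key:
            value, _value_type = winreg.QueryValueEx(key, VALUE_NAME)
            return value
    except FileNotFoundError:
        # Key or value is absent, so the OS default applies.
        return None


if __name__ == "__main__":
    print("MoveImages =", read_move_images())
```

A result of None means the value has not been set and the Windows default behavior applies; a result of 0 means the tweak described above has been applied.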
Is it a good practice to disable ASLR?
Although I have not had a great deal of time to measure the difference with and without ASLR, the short answer is No. Unless you are pushing very high levels of memory overcommit in a 32-bit desktop VDI environment, you have a lot more to lose than to gain from disabling ASLR. On 64-bit platforms the lost opportunity to share pages is much smaller because of the use of large memory pages.
Having set the baseline for this discussion, I would like to mention that I have not seen any solid test or research that quantifies the savings provided by transparent page sharing with Windows 7 virtual desktops with ASLR enabled (the default).
To create valid tests it is important that a baseline is always maintained. In this case, the host used during the tests was a Cisco UCS B200 with 48GB RAM and two quad-core CPUs at 2.526GHz. The virtual desktops were running Windows 7 with 1GB RAM.
The workload was generated using LoginVSI with the Default Heavy profile. Due to the nature of the LoginVSI workload, the results presented below are representative of virtual desktops that run similar applications on the same host, such as task workers in a call center or branch office.
To maintain host affinity during the tests, a DRS group (Must run on) was created and pinned to the server host. The next step was to add virtual desktops to the DRS group and watch the VMs being VMotion(ed) to the chosen host. DRS was set to aggressive to make sure that VMs not belonging to the DRS group were properly evacuated.
Out of 100 VMs, a total of 54 were VMotion(ed) to the server host before the host started to run into memory contention. At this point the total memory saving due to TPS was 13760MB, or 28%.
The next step was to watch TPS behavior over the next hour, during which LoginVSI was continuously running workload tests. At no point during the tests did I notice the memory footprint increase due to ASLR functionality (memory randomization).
As time went by, TPS started to collapse more small memory blocks and reduced the total memory footprint in use. After one hour the savings due to TPS were 16364MB, or 33%. The savings provided by TPS would probably allow me to add another four or five virtual desktops. See stats below.
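The percentages above can be reproduced with a few lines of arithmetic. The sketch below assumes the savings are expressed as a fraction of the host's 48GB of physical RAM; that basis is my assumption, but it matches both figures quoted above.

```python
HOST_RAM_MB = 48 * 1024  # host used in the tests: 48GB RAM


def tps_savings_pct(saved_mb, host_ram_mb=HOST_RAM_MB):
    """TPS savings as a percentage of host physical memory (assumed basis)."""
    return 100 * saved_mb / host_ram_mb


for label, saved_mb in (("at contention", 13760), ("after one hour", 16364)):
    pct = tps_savings_pct(saved_mb)
    print(f"{label}: {saved_mb} MB saved = {pct:.0f}% of host RAM")
```

The same helper can be pointed at the saved-memory figure from your own hosts to compare against the 10% to 15% planning numbers.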
At this point I was convinced that Windows ASLR does impact how TPS collapses memory blocks, but not in the way common knowledge tells us. For this constant and similar workload the savings averaged 30%.
VDI is a very cost-sensitive solution. The higher the consolidation ratio, the more cost effective the solution becomes. So why not go further and see what kind of awesomeness TPS is likely to deliver if the settings are tweaked a little bit?
In the past I have written “TPS, Large Memory Pages and your VDI environment” and “Increase VDI consolidation ratio with TPS tuning on ESX”. I recommend tweaking Mem.AllocGuestLargePage and Mem.ShareScanTime for higher consolidation rates. After the changes, vSphere started to collapse more memory blocks, and more aggressively.
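As a rough illustration, the two settings can be changed from the host console with esxcfg-advcfg. The sketch below only builds the command lines so they can be reviewed before being run on a single test host. The values shown (0 to disable large page backing, 10 for the scan time) are illustrative assumptions and not numbers taken from the tests in this article, and the helper name is hypothetical.

```python
# Illustrative values only (assumptions): pick your own after testing.
TPS_TUNING = {
    "/Mem/AllocGuestLargePage": 0,   # 0 = do not back guest memory with large pages
    "/Mem/ShareScanTime": 10,        # lower than the default = scan for shareable pages sooner
}


def advcfg_commands(settings):
    """Build 'esxcfg-advcfg -s <value> <option>' commands for review."""
    return [f"esxcfg-advcfg -s {value} {option}" for option, value in settings.items()]


for command in advcfg_commands(TPS_TUNING):
    print(command)
```

Printing the commands rather than executing them keeps the change a deliberate manual step on one host.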
Collapsing memory blocks more aggressively does put extra overhead on the CPU. However, most VDI deployments are memory bound rather than CPU bound, and so is this one. Therefore, using more CPU was not really an issue here.
As time went by, the savings provided by TPS freed up a total of 11267MB RAM, which would probably allow me to increase the consolidation ratio from 54 to 64 virtual desktops per host (17%).
I ran those tests a while back and unfortunately I misplaced the screenshot that would show the PSHARE/MB stats. I thought the results were remarkable and decided to share them anyway.
Windows 7 ASLR does impact vSphere’s ability to collapse memory blocks via Transparent Page Sharing. However, the numbers attained during the tests are better than common knowledge suggests. With a constant workload generated via LoginVSI, TPS achieved a 30% average memory saving without any vSphere tweaking.
With the vSphere host tweaks described in “TPS, Large Memory Pages and your VDI environment” and “Increase VDI consolidation ratio with TPS tuning on ESX”, TPS savings should average roughly 40% for this specific workload.
Every workload is different, and even similar workloads may behave differently. I recommend that anyone planning to implement the host changes mentioned above run tests on a single host before changing entire vSphere clusters.
If you are architecting a VDI solution on vSphere you may now decide to go further than 10%-15%, but as always, run your own tests before implementing the solution. No architecture or calculation is a substitute for a good PILOT.