
Nov 01 2010

Increase VDI consolidation ratio with TPS tuning on ESX

In ESX, the VMkernel scans guest physical memory pages randomly at a base scan rate governed by Mem.ShareScanTime, which specifies the desired time to scan a virtual machine’s entire guest memory. The default value for Mem.ShareScanTime is 60 minutes.

This means that each virtual machine will have its physical memory pages scanned within a 60-minute timeframe. The hypervisor does this to find identical pages across the host’s physical memory. Identical pages are page sharing opportunities for Transparent Page Sharing (TPS) and are merged in memory to reduce the host memory footprint.
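To get a feel for what the scan time means in terms of work, the sketch below converts Mem.ShareScanTime into an average per-VM scan rate. It assumes 4 KB small pages and a 1.5 GB desktop, and the function name is purely illustrative. The real VMkernel rate is also bounded by other settings, so treat the numbers as a rough illustration only.

```python
PAGE_SIZE_KB = 4  # assumption: guest memory is backed by 4 KB small pages


def pages_per_second(vm_memory_mb: float, share_scan_time_min: float) -> float:
    """Average pages/sec needed to cover a VM's memory once per scan period."""
    total_pages = vm_memory_mb * 1024 / PAGE_SIZE_KB
    return total_pages / (share_scan_time_min * 60)


if __name__ == "__main__":
    # Compare the 60-minute default with a 10-minute setting for a 1.5 GB VM.
    for minutes in (60, 10):
        rate = pages_per_second(1536, minutes)
        print(f"ShareScanTime={minutes:>2} min -> ~{rate:.0f} pages/sec per VM")
```

Under these assumptions, going from 60 to 10 minutes means roughly six times as many pages scanned per second for each virtual machine.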

The 60-minute default is a value chosen by VMware to keep the CPU impact of scanning minimal. However, is this number optimal for server workloads, virtual desktop workloads, or both?

 

VDI workloads fundamentally differ from server workloads. I would like to highlight a couple of key points:

 

Boot storms: because all virtual desktops run the same guest OS, memory contents during a boot storm are even more alike than during a boot storm of mixed server workloads. However, because ESX is configured to scan for common pages over a period of 60 minutes, TPS will not provide major benefits during a boot storm, and the host may need to swap memory to disk if all guests use their allocated memory at the same time.

Guest OS and applications: just as during a boot storm, there are a large number of memory page commonalities across all virtual desktops that will not necessarily be picked up by TPS in time to be merged.

User interaction: there is a coherent and predictable usage pattern for virtual desktops. Yes, users are more alike than you imagine, and they will likely fire up and use applications around the same time. As an example, an employee gets into the office at 8:50am, powers on his computer/laptop/thin client and connects to his virtual desktop session. If you set up VDI to log off users automatically to save power, the user will log in and start Outlook. While Outlook is downloading messages from the server, the user goes to the cafeteria to get a coffee and chat a little with colleagues, and they all get back to their workstations at the same time to fire up their financial application.

The interaction pattern is very similar for most users, and this pattern offers a great deal of common memory pages. However, if TPS is not fast enough to take advantage of these commonalities, the hypervisor loses the opportunity to reduce the memory footprint.

 

In general, virtual desktop environments are memory constrained, not CPU constrained, and any memory savings could improve the consolidation ratio. For the reasons above I have been testing with Mem.ShareScanTime set to 10 minutes. Initially I thought that there would be a large VMkernel CPU overhead, but it proved to be negligible.

Using the setting above I was able to achieve TPS savings on the order of 75% for Windows XP guests running idle.

Because most virtual desktops were idle during the tests, this is not a rigorous exercise or a scientific benchmark. During the tests I observed that TPS savings increased by 30% on average just by increasing the number of memory scan cycles per virtual desktop.

The image below depicts 12 virtual desktops, each running with 1.5GB RAM, achieving 13.7GB of TPS savings. This is a 74.4% saving.

 

clip_image002

 

The image below depicts 12 virtual desktops, each running with 1.5GB RAM, achieving 16.6GB of TPS savings. This is a 90.4% saving.

 

image
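As a sanity check on these percentages, the sketch below works backwards from the reported savings to the memory total they imply, assuming each percentage is simply saved memory divided by a total. The function name is illustrative only.

```python
def implied_total_gb(saved_gb: float, savings_pct: float) -> float:
    """Memory total that would yield savings_pct for the given saved_gb."""
    return saved_gb / (savings_pct / 100.0)


if __name__ == "__main__":
    nominal_gb = 12 * 1.5  # 12 desktops with 1.5 GB each
    # (saved GB, reported percentage) pairs taken from the two screenshots.
    for saved, pct in ((13.7, 74.4), (16.6, 90.4)):
        total = implied_total_gb(saved, pct)
        print(f"{saved} GB at {pct}% implies ~{total:.1f} GB total "
              f"(nominal {nominal_gb:.1f} GB)")
```

Both figures imply a total of roughly 18.4 GB, slightly above the nominal 12 × 1.5 GB = 18 GB, so the percentages are consistent with each other.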

I suggest that you run your own tests on one of your VDI hosts and compare the results with the other hosts in the cluster. Mem.ShareScanTime is set per host and can be changed through the vCenter Advanced Settings or the rCLI.

 

clip_image004
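For the command-line route, a minimal sketch follows. It only builds the command lines rather than running them. It assumes the `esxcfg-advcfg` tool on the ESX service console with its `-g` (get) and `-s` (set) switches; with the remote CLI the equivalent tool is `vicfg-advcfg`, which also needs connection options such as `--server`. The helper names and the host name are hypothetical, and the exact syntax should be verified against your ESX version.

```python
from typing import List, Optional

OPTION_PATH = "/Mem/ShareScanTime"  # advanced option in the path form the CLI expects


def _base(tool: str, server: Optional[str]) -> List[str]:
    """Tool name plus an optional --server connection option (remote CLI)."""
    return [tool] + (["--server", server] if server else [])


def build_get_command(tool: str = "esxcfg-advcfg",
                      server: Optional[str] = None) -> List[str]:
    """Command line that reads the current scan time."""
    return _base(tool, server) + ["-g", OPTION_PATH]


def build_set_command(minutes: int, tool: str = "esxcfg-advcfg",
                      server: Optional[str] = None) -> List[str]:
    """Command line that sets the scan time, in minutes."""
    if minutes <= 0:
        raise ValueError("scan time must be a positive number of minutes")
    return _base(tool, server) + ["-s", str(minutes), OPTION_PATH]


if __name__ == "__main__":
    print(" ".join(build_get_command()))
    print(" ".join(build_set_command(10)))
    # Hypothetical host name, remote CLI variant:
    print(" ".join(build_set_command(10, "vicfg-advcfg", "esx01.example.com")))
```

Reading the value before and after the change makes it easy to confirm that the test host really differs from the other hosts in the cluster.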

 

It is also possible to change the maximum page scan rate, expressed in MB/sec per GHz of host CPU, with Mem.ShareScanGHz. This would allow you to increase the number of pages processed even further. I have not run tests changing Mem.ShareScanGHz.
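To see whether this cap would matter for a given scan time, the sketch below compares the ceiling implied by Mem.ShareScanGHz with the rate needed to cover all guest memory once per scan period. The value of 4 MB/sec per GHz (which I believe is the default, but verify it on your host) and the 20 GHz of host CPU are assumptions for illustration, and the real VMkernel algorithm is certainly more involved than this simple division.

```python
def max_scan_rate_mb_s(share_scan_ghz: float, host_ghz: float) -> float:
    """Upper bound on host-wide page scanning, in MB/sec."""
    return share_scan_ghz * host_ghz


def required_scan_rate_mb_s(total_vm_memory_mb: float,
                            share_scan_time_min: float) -> float:
    """Rate needed to cover all guest memory once per scan period."""
    return total_vm_memory_mb / (share_scan_time_min * 60)


if __name__ == "__main__":
    # Assumed values: 4 MB/sec per GHz and a host with 20 GHz of total CPU.
    cap = max_scan_rate_mb_s(share_scan_ghz=4, host_ghz=20)
    # 12 desktops with 1.5 GB each, scanned once every 10 minutes.
    need = required_scan_rate_mb_s(12 * 1536, share_scan_time_min=10)
    print(f"cap={cap:.1f} MB/s, needed={need:.2f} MB/s, capped={need > cap}")
```

With these assumed numbers the required rate stays well below the cap, which would suggest Mem.ShareScanGHz only becomes relevant on hosts with much more guest memory or a much shorter scan time.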

If you decide to run your own tests I would be interested in your results for comparison.

10 comments

3 pings


  1. Marco Broeken

    I will test this with 140 desktops

    plz wait for results 🙂

  2. Matt Liebowitz

    I really like this idea but wonder how often it will actually be helpful. With modern processors and ESX 4.x, TPS is only actually used once you over commit memory. In both of the screenshots above the memory in the host was not over committed. So if that were a modern processor without large pages disabled then TPS wouldn’t even start sharing memory and you would see no memory sharing benefits.

    In my experience most environments do not over commit memory in production with the exception of VDI workloads. So your example and use case does make sense. I just want to make sure everyone understands that if you don’t over commit memory then TPS will have no effect.

  3. Andre Leibovici

    Great to hear Marco,
    Let me know how your test goes.

  4. Andre Leibovici

    Matt, you are spot on on blade types. Most VDI environments are overcommitted anyway, so it would make sense to reduce scan time. Now, every workload is different and I recommend testing prior to making the change at full scale.

  5. Riste Ristevski

    @Marco Broeken

    How was your test, would really like to know how your test went. I have 120 VDI in my setup.

    Thanks

  6. Andre Leibovici

    @Riste Ristevski
    I actually didn’t have time to run the tests yet. As soon as I can I will run it and publish the result.

  7. Riste Ristevski

    I have had this setting running on all 5 of my hosts since my last post and I like the results

  8. Andre Leibovici

    @Riste Ristevski
    Thanks for the feedback. Good to know that the article helped you.

  9. Riste Ristevski

    Everything looks good. Haven’t had any issues

  10. Riste Ristevski

    I have 229 virtual computers 13 Servers between 4 locations

  1. Page Sharing ? VMware vSphere - chmv | chmv

    […] Memory management. […]

  2. myvirtualcloud.net » A VMware View POC with XtremIO

    […] the past I have written about TPS, Large Memory Pages and your VDI environment and Increase VDI consolidation ratio with TPS tuning on ESX. I recommend tweaking Mem.AllocGuestLargePage and Mem.ShareScanTime for higher consolidation […]

  3. Virtual Server Consolidation Ratio – KVM VPS

    […] Increase VDI consolidation ratio with TPS … – I really like this idea but wonder how often it will actually be helpful. With modern processors and ESX 4.x, TPS is only actually used once you over commit memory. […]
