
Aug 03 2011

The biggest Linked Clone “IO” Split Study – Part 1/2


Note: Part 2 of this article can be found here.

In my article Get hold of VDI IOPs, Read/Write Ratios and Storage Tiering I discussed the importance of understanding the virtual desktop IO pattern in VDI deployments. In the same article I briefly discussed the IO split between Replicas and Linked Clones.

The main idea is that at moment 'zero', immediately after the creation of a linked clone, the Replica disk is responsible for 100% of the Read IO and the Linked Clone (delta) disk is responsible for 100% of the Write IO.

When Windows boots for the first time, is customized, and users start to use the desktop, the Linked Clone disk will see increasing Read IO, not only Writes.

The concept is easy to understand: data that has been committed to Linked Clone disks will eventually be read back. Please note that Replica disks will always serve 100% Read IO.

When I read performance benchmarks stating that a virtual desktop requires 30 IOPS, I ask:

  • What percentage of those 30 IOs are Reads and what percentage are Writes?
  • Of the Read IOs, how many hit the Replica disk and how many hit the Linked Clone disk?
  • If Persistent Disks (the old UDD, used for the User Profile) are in use, how many IOs are hitting them?
These are difficult questions to answer without running a pilot. In fact, most of the pilots I have seen do not even get to this level of detail. However, without these answers it is impossible to properly size and architect storage arrays for performance without adding extra fat to the solution (fat = $$$).

Technologies such as DRAM Cache and EMC FAST Cache help enormously to reduce IO contention. However, those technologies should be taken into consideration only after you have determined the real number of IOs per storage tier.

Storage tiers are used by VMware View to separate disk types in a VDI environment, and they often provide different RAID types for different performance or redundancy objectives. VMware View allows storage tiering for Replica, Linked Clone and Persistent disks.

The picture below illustrates a virtual desktop scenario where an overall Read/Write ratio of 60/40 breaks down into different ratios across each disk type.

 

[Image: Read/Write ratios per disk type]
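
To make the split concrete, here is a minimal Python sketch of how an overall 60/40 Read/Write ratio decomposes into per-tier IOPS once you know what fraction of the Reads hits the Replica versus the Linked Clone. The split fractions below are illustrative assumptions, not measured values:

```python
# Hypothetical decomposition of a per-desktop IOPS figure into storage tiers.
# The read-share fractions are assumptions for illustration only.
TOTAL_IOPS_PER_DESKTOP = 30         # the oft-quoted benchmark figure
READ_RATIO, WRITE_RATIO = 0.6, 0.4  # overall 60/40 Read/Write mix

REPLICA_READ_SHARE = 0.7            # assumed share of Reads served by the Replica
LINKED_CLONE_READ_SHARE = 0.3       # assumed share of Reads served by the delta

reads = TOTAL_IOPS_PER_DESKTOP * READ_RATIO
writes = TOTAL_IOPS_PER_DESKTOP * WRITE_RATIO

per_tier = {
    "Replica Read": reads * REPLICA_READ_SHARE,
    "Linked Clone Read": reads * LINKED_CLONE_READ_SHARE,
    "Linked Clone Write": writes,   # the Replica never receives Writes
}

for tier, iops in per_tier.items():
    print(f"{tier:>18}: {iops:.1f} IOPS")
```

As the Linked Clone delta grows over time, the read share shifts away from the Replica, which is exactly the trend this study sets out to measure.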

 

These are the fundamentals behind a project I decided to execute myself. I wanted to find out how much the IO pattern changes over time as virtual desktops are utilized and Linked Clone disks grow in size.

 

Architecture

 

First and foremost, I need to explain the architecture I am using to simulate and run the tests. The most important architectural definition is the guest OS: the number of Read/Write IOs differs between Windows XP and Windows 7, and between 32-bit and 64-bit versions.

I have chosen Windows 7 64bit as I believe that’s what new deployments should be using.

The amount of vRAM and the number of vCPUs may also alter the IO pattern. I have chosen 2GB RAM and 1 vCPU as I believe this is the most common configuration out there.

  • Windows 7 64bit
  • 2 GB vRAM
  • 1 vCPU
  • 40GB Disk (Approximately 10GB used in the NTFS partition)

The applications installed on the Parent VM also help shape the IO profile of the virtual desktop, regardless of whether they are in use. For these tests I chose to install only Microsoft Office 2007 64bit.

To make sure analysis and measurements are valid and accurate, I created dedicated datastores, each hosting a single Linked Clone virtual desktop. This configuration is important to avoid IO intrusion from other virtual machines.

Another important piece was to remove the Windows user profile IO from the Linked Clone disk. This was accomplished by creating a Persistent Disk (UDD) and placing it on a dedicated datastore.

Ideally I would also have created a Disposable disk for Windows temporary files; however, VMware View does not allow placement of Disposable disks in a datastore other than the linked clone datastore. For the desired end results there was no benefit in creating a Disposable disk.

The image below demonstrates how the test linked clone virtual machine was configured and identifies the three datastores used. Each datastore uses a different LUN.

[Image: test linked clone virtual machine configuration across three datastores]

 

In VMware View the Datastore selection screen was set to:

[Image: VMware View datastore selection screen]

 

Analysis Tools

 

Three tools were used to collect and analyze the IO: vCenter Performance Monitor, Veeam Monitor and Unisphere Analyzer.

 

Replica Creation Statistics

 

Replica disk creation is a one-off operation and should not cause an IO burden unless it is performed during production hours in an IO-constrained environment.

For the Parent VM with the configuration described above, an average total of 10,654 Write IOs was required to generate the Replica. The average peak was 412 IOPS.

[Image: Replica creation Write IO chart]

 

Creating Replica disks is a write-intensive operation; however, it is also read-intensive on the original Parent VM. I did not collect the number of Read IOs at the Parent VM for this test, but I recommend keeping Parent VMs in a storage pool that can provide the same number of IOs as is required to create the replica disk.
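
As a rough sanity check, the total of 10,654 Write IOs and the 412 IOPS average peak also bound how long the copy must have run. A minimal sketch, assuming (optimistically) that the peak rate were sustained for the entire copy:

```python
# Lower bound on replica creation time. The peak rate is not sustained
# in practice, so the real duration is longer than this estimate.
TOTAL_WRITE_IOS = 10_654
PEAK_WRITE_IOPS = 412

min_duration_s = TOTAL_WRITE_IOS / PEAK_WRITE_IOPS
print(f"Replica creation takes at least ~{min_duration_s:.0f} seconds")  # ~26 s
```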

 

Virtual Desktop Customization and Boot Statistics

 

The initial objective of this study was not to identify IO during boot time; however, as I was already monitoring, why not? I have seen several boot storm studies, but I never found any of them conclusive because none went to the level of detail I am covering here.

Please keep in mind that this is the IO for a virtual desktop with the configuration specified above; the IO for desktops with different configurations will always differ. The aim of this article (more like a whitepaper) is to demonstrate how IOs should be used to help size VDI solutions. However, the numbers below can be used as a baseline.

 

PowerOn, Customization and 1st boot generated an average total of:

  • 2192 Replica Read IO
  • 549 Linked Clone Write IO
  • 203 Linked Clone Read IO

Average Peak IOs happened at:

  • 640 Replica Read IO
  • 93 Linked Clone Write IO
  • 51 Linked Clone Read IO

An average total of 2,944 IOs (2,192 + 549 + 203) is required to have a linked clone virtual desktop ready for use. In a View Composer environment these values apply to all virtual desktops.

Attention! This is not the so-called bootstorm; it happens only once in the virtual machine's lifetime, as long as the virtual desktop is not deleted after use.

 

[Image: PowerOn, Customization and 1st boot IO chart]

 

After PowerOn, Customization and 1st boot, the virtual machine was shut down. A 2nd PowerOn is used here to identify the real number of IOs required to boot the virtual desktop without any customization process. Attention! This is the so-called bootstorm, and it will happen every time the virtual desktop is powered on again.

The 2nd boot generated an average total of:

  • 839 Replica Read IO
  • 108 Linked Clone Write IO
  • 83 Linked Clone Read IO

The average Peak IOs happened at:

  • 472 Replica Read IO
  • 53 Linked Clone Write IO
  • 51 Linked Clone Read IO

An average total of 1,030 IOs was necessary to get the virtual desktop ready for use after a reboot.

[Image: 2nd boot IO chart]

 

You should now know exactly what your storage array will be required to provide, from an IO perspective, for each storage tier.

For 100 virtual desktops booting at the same time the array would need to be able to respond to approximately:

  • 47,200 Replica Read IO
  • 5,300 Linked Clone Write IO
  • 5,100 Linked Clone Read IO
* The numbers above assume no storage latency.

In practice it is not possible to boot all virtual desktops at the same time: with default settings, VMware View performs only five simultaneous PowerOn operations. This value can be changed.
If you accept the default five simultaneous PowerOn operations, the array must serve 2,360 Read IOs from the Replica tier, 255 Read IOs from the Linked Clone tier and 265 Write IOs from the Linked Clone tier.
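
The same arithmetic can be expressed as a small Python sketch; the per-desktop figures are the measured 2nd boot peaks from above, and the concurrency is the variable you control in VMware View:

```python
# Measured average peak IO per desktop during the 2nd boot (see above).
PEAK_PER_DESKTOP = {
    "replica_read": 472,
    "linked_clone_write": 53,
    "linked_clone_read": 51,
}

def boot_storm_load(concurrent_desktops: int) -> dict:
    """Peak IO per tier the array must absorb for N desktops booting at once."""
    return {tier: iops * concurrent_desktops
            for tier, iops in PEAK_PER_DESKTOP.items()}

print(boot_storm_load(100))  # all 100 at once (theoretical, assumes no latency)
print(boot_storm_load(5))    # VMware View's default of 5 simultaneous PowerOns
```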

As virtual desktops boot, they become ready for use and users start to log in. For this reason, the calculation of the total number of IOs required is a hybrid of the number of desktops that may boot simultaneously, plus the number of desktops in the logon process, plus the desktops in a steady idle state.

If desktop pools are configured to refresh or delete desktops after use, that should also be part of the calculation.
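
A hedged sketch of that hybrid calculation follows; the booting figure is the sum of the 2nd boot peaks measured above, while the logon and steady-state IOPS per desktop are placeholder assumptions to replace with your own measurements:

```python
# Hybrid sizing model: total demand is each state's desktop population
# multiplied by that state's per-desktop IOPS. Logon and steady-state
# figures below are assumptions, not measured values.
STATE_IOPS = {
    "booting": 576,  # 472 + 53 + 51, the measured 2nd boot peaks
    "logon":    50,  # assumed per-desktop IOPS during logon
    "steady":   15,  # assumed per-desktop IOPS at steady idle state
}

def total_iops(booting: int, logging_on: int, steady: int) -> float:
    """Aggregate IOPS demand across the three desktop states."""
    population = {"booting": booting, "logon": logging_on, "steady": steady}
    return sum(STATE_IOPS[state] * count for state, count in population.items())

# Example: 5 desktops booting, 20 in logon, 75 at steady idle state.
print(total_iops(booting=5, logging_on=20, steady=75))
```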

 

Persistent Disk and User Profile Statistics

 

To make the IO assessment accurate, the Windows user profile was redirected to another .vmdk through the use of a Linked Clone Persistent Disk. The Persistent Disk was also stored in a dedicated datastore for better IO tracking.

Because user profiles will often have different sizes and IO requirements, I decided to collect data only for Virtual Desktop Customization and Boot Statistics. User login is not under consideration.

Based on the virtual desktop configuration described above, the numbers below are consistent across different desktops.

 

PowerOn, Customization and 1st boot generated an average total of:

  • 64 Persistent Disk Write IO
  • 51 Persistent Disk Read IO

Average Peak IOs happened at:

  • 43 Persistent Disk Write IO
  • 20 Persistent Disk Read IO
In a production environment you may choose not to use Persistent Disks; in that case the IOs demonstrated here are not applicable. However, upon user logon these IOs will be generated against the linked clone disk instead of the persistent disk.

If Roaming Profiles are in use, the IOs are still applicable and will be generated against the linked clone disk.

 

Summarizing the Numbers

The best way to understand how many IOs are required for PowerOn, Customization and 1st boot is to find the averaged maximum IO per datastore, because each storage tier will require different performance.

For this specific virtual desktop configuration, the baseline for the PowerOn, Customization and 1st boot process is:

[Image: spreadsheet of average total and peak IOs per storage tier for PowerOn, Customization and 1st boot]

The same numbers from the picture above can be presented as percentages per storage tier. I think this spreadsheet gives great visibility into what is happening with the virtual machine during the creation process.

The bottom part of the spreadsheet demonstrates the five default simultaneous PowerOn operations allowed by VMware View.

 

[Image: spreadsheet of per-tier IO percentages, including the five default simultaneous PowerOn operations]

 

Now you understand the importance of sizing storage tiers independently.
Don’t blindly accept the same old 15 IOPS per VM anymore.

 

This is Part 1 of 2 of this article. Part 2 discusses IO trending as the Linked Clone delta disk grows. I personally have not seen any reports or white papers on this subject; however, I may be wrong. If you know of such documentation, please forward it to me or let me know.

Your comments are very much appreciated.

Note: Part 2 of this article can be found here.


Permanent link to this article: http://myvirtualcloud.net/?p=2084

16 comments

5 pings


  1. Matt Cowger

    Andre,

I think we are missing something here… people don't say to expect 15 IOPS for a VDI VM because that's what they see during the boot, they say to expect that because it's the average over a longer time than the boot.

    The measurement time frame you included for your analysis was only during that second (boot storm, 9AM-I-just-got-to-work) IO workload, not the average over the day. Technologies like FAST Cache and regular old DRAM cache are intended to absorb that quick spike in load, while when you do *disk* sizing you use the amortized performance over time (which tends to be the lower number like 15 IOPs).

    Maybe you should split up your analysis between boot time and average (post boot) time. Not sure what you would use to generate user-style workload over time, but I’m sure we could work something out.

    Thoughts?

  2. Andre Leibovici

    @Matt Cowger
    Thanks for your comment.

    You are absolutely right about the 15 IOPs being the averaged utilization over a longer period. My intention is to educate people about finding out the correct number of IO per storage tier while catering for customizations and bootstorms properly.

For part 2 of this article I have been using a workload generator that simulates user activity (Word, Excel with reads and writes). I still need to put all the numbers and thoughts together, but it will be sooner rather than later.

FAST Cache and regular old DRAM cache definitely improve performance, but I preferred to leave them out of the equation for now. I may write another article running the same numbers from a storage array (spindle/block) utilization perspective.

  3. Pieterjan Heyse

    Andre,

glad to finally see an article on this subject. In January I asked a VMware technician about this subject and I never got a definitive answer. SSD is nice to offload some of the SAN load to the hosts, but over time more reads go over to the SAN (persistent disk & linked clone) and the SSD will be of less use; more importantly, the SAN will see more IOs.

    My question was, how fast does this happen, when should we refresh, and now you are giving us the answer, thanks!

  4. Barrie

    Hi Andre

Don't forget to throw in some anti-virus for good measure (not the VMware VMsafe API type, as not all vendors do that yet), configured for both read and write on-access.

    This will come close to doubling the required IOps in my experience.

  5. Nicholas Schoonover

    Great info, thanks for sharing. I don’t see any mention of the Windows image’s pagefile, but I’m assuming that it was left as default and explains a lot of the write activity. Many tuning recommendations that I’ve come across recommend disabling pagefile or making it a much smaller fixed size vs default auto-sized at the very least.

  6. Andre Leibovici

    @Nicholas Schoonover
    Thanks for your comment.

I left the Windows pagefile as default. While changing the Windows pagefile could have an impact on the number of IOs, keep in mind that Windows is write-intensive by nature.

    The objective of this article is not to find the number of IOs, but to educate about finding the right numbers in your environment.

    Andre

  7. Andre Leibovici

    @Barrie
    Thanks for the note.
    You are absolutely right, having AV could drastically change the numbers. AVs on average will do 4 writes for every 1 read IO.

    I may include AV in a future article.

    Andre

  8. Andre Leibovici

    @Pieterjan Heyse
    Thanks for the feedback.

Remember that if your array makes use of cache, chances are that the entire Replica disk will be served from cache instead of from SSD.

    Andre

  9. Rob

It's a good read and really justifies why SSD should be used for the replica as well :)

  10. Ray Brighenti

    Great read!

    Out of interest did you have Superfetch disabled in the Win7 image?

    Ray

  11. Andre Leibovici

    @Ray Brighenti
Yes, Windows 7 Superfetch was disabled for the tests. Actually, I used the PowerShell script VMware provides to optimize Windows 7 images for VDI.

    Andre

  12. gogogo5

    Hi Andre – firstly, great blog with loads of useful information here.

    Was reading this article but wanted to ensure my understanding of your calculations is correct. Under the Boot Statistics section you say:

    For 100 virtual desktops booting at the same time the array would need to be able to respond to approximately:

    4720 Replica Read IO
    530 Linked Clone Write IO
    510 Linked Clone Read IO

    But for 100 virtual desktops would it not be 472 x 100 = 47,200 IO? (and therefore 5300 Linked Clone Write IO and 5100 Linked Clone Read IO). This is assuming you could boot all 100 concurrently as you stated.

    Cheers
    gogogo5

  13. Andre Leibovici

    @gogogo5
    You are absolutely correct, good catch!
    I have amended the results on the article.

    Thanks,
    Andre

  14. Mike Bello

    Hi Andre,

    Thank you for sharing your research. What is the size of a single IO?

    Mike

  15. Andre Leibovici

    @Mike, the size of the IO varies according to the OS and application. Typical user workload in Windows will have IO size between 4KB and 64KB. Some backup tools and sequential operations may use 128KB and even 256KB.

  16. michaelbello

    Andre,

    Thanks for the answer. Let me follow up. You mention 839 Replica Read IO for the boot in your note. It cannot be 4MB only, can it? I would expect around 1GB of data pumped into the RAM during the boot up. Am I wrong?

    Mike

  1. - Cliff Davies

    […] For other configurations please refer to The biggest Linked Clone “IO” Split Study – Part 1/2 […]

  2. Twitted by VirtuallyUseful

    […] This post was Twitted by VirtuallyUseful […]

  3. Ample RAM for your VM. | Demitasse

    […] second part of his linked clone IO study and it makes interesting reading.  The first part is here and the second here they are well worth the […]

  4. myvirtualcloud.net » VMFS File Locking in VMware View 5.1 [WhitePaper]

    […] If you want to better understand the IO behavior of Linked Clones I recommend reading my article entitled  The biggest Linked Clone “IO” Split Study. […]

  5. myvirtualcloud.net » View Storage Accelerator and View Storage Tiering [Unsupported]

    […] in the past that during SteadyState operations the replica disks are not utilized very much (here and […]
