
Jul 20 2010

Improve VDI performance with IO Length Trending


In a VDI environment, each CPU cycle, megabyte of memory, or disk IO that can be saved represents a considerable performance improvement when the workload is characterized by hundreds or thousands of desktops. Disk alignment is a topic that has been discussed to exhaustion, but using the correct block/cluster size for your workload is something that can drastically improve your VDI disk IO performance.

Have you ever analysed your VDI IO length trend?
Your answer is most likely NO, but you should.

I will start the discussion by saying that, to obtain the best performance from the IO stack, all storage layers must be properly aligned and must use analogous cluster/block sizes. There is a good article from Duncan Epping on the subject, and in its comments you will find the following remark from Vaughn Stewart:

Storage arrays from other vendors store data in other block, or chunk, sizes. Say your array stores data in a 64KB block. In this configuration, if the GOS partition is misaligned, then should a VM make a 4KB read request, we will read a 64KB block. As I’ve stated before, most data reads aren’t that small, so let’s consider a 1MB read request. In this case the array would retrieve 1MB plus an additional 64KB block, and the overhead on the array is around 1%. So if we consider my premise that many non-busy VMs make requests in the 32KB to 128KB range, the overhead with a misaligned 64KB block would be between 200% and 50%.
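To make this concrete, here is a quick back-of-the-envelope sketch of my own (not from Vaughn's comment) modeling how many array blocks a misaligned read drags in. By this simple model the misaligned 1MB read carries about 6% overhead, since 64/1024 is roughly 6.25%. Real arrays cache and coalesce IOs, so treat the numbers as rough arithmetic only.

```python
# Rough model of read amplification for a misaligned guest IO on an array
# that stores data in fixed-size blocks. Illustrative only: real arrays
# cache and coalesce IOs, so this is back-of-the-envelope math.

def blocks_touched(offset_kb, length_kb, block_kb):
    """Number of array blocks spanned by a read at the given offset/length."""
    first = offset_kb // block_kb
    last = (offset_kb + length_kb - 1) // block_kb
    return last - first + 1

def overhead_pct(offset_kb, length_kb, block_kb=64):
    """Extra data read, as a percentage of the data actually requested."""
    read_kb = blocks_touched(offset_kb, length_kb, block_kb) * block_kb
    return 100.0 * (read_kb - length_kb) / length_kb

# A misaligned 4KB request (e.g. at offset 62KB) touches two whole 64KB blocks:
print(overhead_pct(62, 4))     # 3100.0 -> 128KB read to satisfy 4KB
# Larger requests amortize the waste:
print(overhead_pct(62, 1024))  # 6.25   -> only ~64KB extra on a 1MB read
```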


I have performed IO length trending analysis on different VDI environments to understand how virtual desktops issue IOs. What I found contradicts much of what I have been reading. Some documentation, including VMware's, mentions that Windows XP mostly issues 64KB IO requests, but that is not what I observed.

Three different samples collected from Windows XP 32-bit using the vscsiStats tool showed the majority of IOs issued with a 4KB length. That is not hard to understand, as Microsoft's The Default Cluster Size for the NTFS and FAT File Systems article states the following:

Drive size (logical volume)      Cluster size          Sectors
---------------------------------------------------------------
  512 MB or less                 512 bytes             1
  513 MB – 1,024 MB (1 GB)       1,024 bytes (1 KB)    2
1,025 MB – 2,048 MB (2 GB)       2,048 bytes (2 KB)    4
2,049 MB and larger              4,096 bytes (4 KB)    8
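For convenience, here is a tiny helper of my own (a sketch encoding the KB table above; later Windows versions extend the table for much larger volumes) that returns the XP-era default NTFS cluster size for a given volume size:

```python
# Default NTFS cluster size as a function of volume size, per the
# Microsoft KB table above (Windows XP era defaults).

def default_ntfs_cluster_bytes(volume_mb):
    if volume_mb <= 512:
        return 512
    if volume_mb <= 1024:
        return 1024
    if volume_mb <= 2048:
        return 2048
    return 4096

# Any desktop disk of 2GB or more defaults to 4KB clusters, which is
# consistent with the 4KB IOs dominating the samples below:
print(default_ntfs_cluster_bytes(20 * 1024))  # 4096
```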


The results collected were as follows:

[Figures: vscsiStats IO length histograms for the three Windows XP samples]

Disk alignment is important, but using the correct block size for your VDI needs is essential to get the best IO performance from your storage array. I recommend performing an IO length trending analysis using the vscsiStats tool, or the tuning tool provided by your array vendor, to determine the best cluster size for your VDI workload.
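On classic ESX the collection itself is driven from the console: vscsiStats -l lists the running VMs' world group IDs, vscsiStats -s -w <worldGroupID> starts collection, vscsiStats -p ioLength prints the histogram, and vscsiStats -x stops it. Below is a minimal sketch of how I would summarize a histogram once exported to CSV; the two-column frequency/bucket-limit layout is my assumption about the export, so adjust the parsing to whatever your vscsiStats version actually emits.

```python
# Minimal sketch: find the dominant IO length bucket in a vscsiStats
# ioLength histogram exported to CSV. Assumes data rows of the form
# "frequency,bucket limit (bytes)"; header and min/max/mean lines are skipped.
import csv

def dominant_bucket(path):
    buckets = []  # (bucket_limit_bytes, frequency)
    with open(path) as f:
        for row in csv.reader(f):
            try:
                freq, limit = int(row[0]), int(row[1])
            except (ValueError, IndexError):
                continue  # not a data row
            buckets.append((limit, freq))
    total = sum(freq for _, freq in buckets)
    limit, freq = max(buckets, key=lambda b: b[1])
    print("most common bucket: <= %d bytes (%.1f%% of %d IOs)"
          % (limit, 100.0 * freq / total, total))

dominant_bucket("ioLength-reads.csv")  # hypothetical export file name
```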

One size fits all doesn't apply here, even though that is exactly what most storage vendors will try to fit you into.

In my examples the majority of IO lengths are 4KB. If the array volume was created with, let's say, 64KB blocks, each IO request would require the storage array to read an extra 60KB to satisfy a 4KB request. Seems like a waste of reads/writes, doesn't it?

The good thing is that most arrays will allow you to specify the volume block size at creation time, based on the RAID type you are using.

Windows 7 might have a completely different set of IO lengths, and that's what I will sample next.


Permanent link to this article: http://myvirtualcloud.net/?p=988

14 comments

2 pings


  1. Richard Powers

    Greetings,

    Thanks for the article

This is something I’ve always wondered about. What role does VMFS block size play in performance on block storage? Also, you said that most arrays allow you to change the block size. Would you get the same result from changing the block size on the VM?

    Thanks

    -Rich

  2. PiroNet

Many arrays work with a 64KB block size. VMFS also uses 64KB sub-blocks even though you format the VMFS datastore with an 8MB block size. So I guess your VDI has to follow the same logic and format the NTFS partitions with a 64KB AUS. This rule also applies to non-VDI workloads, such as Windows Server 200x VMs…

    I’m wondering what the guys at PQR have to say about that!?

  3. Andre Leibovici

    @PiroNet
Well pointed out!

Small files use sub-blocks instead of file blocks. A sub-block is 1/16th the size of a file block on 1MB file block-size volumes, and 1/128th the size of a file block on 8MB volumes. That translates into 64KB sub-blocks in both cases.

It would be interesting to see the results of the same trend analysis with all layers aligned on 64KB blocks (partition disk alignment, partition block-size alignment, VMFS alignment, VMFS block size alignment, array block-size volume alignment).

  4. John Martin

I was confused about this too, as the documentation seemed a little ambiguous. In the end I did a bunch of experiments on NetApp's Sydney lab infrastructure and found that a 4KiB block request from the guest resulted in a 4KiB block request at the back end. Misalignment did have an impact on reads, but the impact of misalignment on writes was greater overall (I covered this in fair detail here: http://j.mp/bGdXii).

For traditional logical arrays, block alignment all the way through from the application to the storage can have a reasonable impact. On a NetApp array, as long as you're aligned on 4K boundaries you're fine, though if you do a majority of 8K read requests (or multiples thereof), there is a performance optimisation called extents that might be worth investigating.

  5. Chad Sakac

    Disclosure – EMCer here.

Andre – alignment matters with all arrays. And the IO size through the VMware subsystem is generally the IO size of the guest. There's a small amount of coalescing through the VMFS stack, but not much. For all intents and purposes, the IO sizes hitting the array are the IO sizes of the guests – no more, no less.

    I’ve been going back and forth with various peers at various vendors. I’m starting to think we all ought to standardize on a 1MB alignment as a best practice default.

We all generally do well with a multiple of 4K as the alignment value – and each array has a “sweet spot” – but of course, aligning is something that is really best done as a one-time task (realigning is a huge PITA). Following the best practices of any one vendor and then changing vendors (and then having “non-optimal” alignment) sucks.

1MB is a large value, but giving up 1MB of usable space on a volume is not a big deal. 1MB meets most use cases across storage vendors, which would be a good thing.

    Vaughn and I are going back and forth between EMC and NetApp on this – and I have broached it with others as well. Will see if we can make some headway to make it common across vendors.

    Good post – thank you for putting it out there!

  6. Andre Leibovici

    @Chad Sakac
I will skip the discussion on the initial offset disk alignment, as I believe we are all aligned on that.

It appears that at the VMFS level the hypervisor fetches 1MB, and not 64KB as mentioned before. If this number is correct then I agree that the Guest OS (NTFS) and the array should use 1MB block sizes for best performance. Are you able to confirm the size of the sub-blocks?

    @John Martin
It's also interesting to put your comment and Chad's side by side:
    “4KiB block request from the guest resulted in a 4KiB block request at the back end”
    and “IO size through the VMware subsystem is generally the IO size of the guest”.

Interesting to see how that translates to:
GuestOS > VMFS sub-blocks > Array
4KB > 1MB > 4KB

It appears to me that if the majority of blocks are in the 4KB region, the storage array will provide the best performance using 4KB blocks.

What I am still confused about is how the array gets the same IO size as issued by the guest if VMFS sub-blocks fetch 1MB.

  7. PiroNet

PiroNet wrote: “VMFS also uses 64KB sub-blocks even though you format the VMFS datastore with an 8MB block size. So I guess your VDI has to follow the same logic and format the NTFS partitions with a 64KB AUS.”

I was under the impression that sub-blocks play a role within the guest's VMDKs, but that's not the case; I had misunderstood sub-blocks.

If a file system on a guest's VMDK reads 4KB of data, the host will read 4KB of data out of the VMFS datastore, and not a whole block of xMB or a sub-block of 64KB!

Sub-blocks are used for files smaller than 1MB and for directories managed by VMFS, not within the guest's file system. Examples of such files and directories are the .LOG, .VMX and .NVRAM files, the VM's own directory, etc…

    I’ve just posted an article about that at http://deinoscloud.wordpress.com/2010/07/26/understanding-vmfs-block-size-and-file-size/

  8. Chad Sakac

    Andre – thanks…

    Perhaps it helps to think of it via an example:

You have a VM with an NTFS volume with a default allocation of 4K. It's a SQL Server VM. SQL Server with an OLTP workload generally uses an IO size of 8K. That VMDK is sitting on a VMFS volume with the default 1MB VMFS allocation size. That in turn is sitting on an array with a 64K stripe size and a 512-byte block size. (This is a common example.)

While at the NTFS layer any **file allocation** will be in 4K sizes or multiples thereof, most of the IOs will be 8K against existing files (when you do a backup, most of the IO sizes would tend towards 256K or even larger).

Those 8K IOs would make their way down through the ESX IO stack (in the form of little SCSI block commands). When they get to VMFS, unless they are a NEW allocation for a file (which would result in VMFS-level file allocations in 1MB IO sizes), the IO size would still (in the VAST majority of cases) be 8K (the exceptions are the rare cases where VMFS coalesces multiple write IOs).

Onward down to the storage layer, the IOs would come into the array as 8K IOs. In the worst case, this results in a partial stripe write, which means reading a 64K stripe and updating it with the new data. Most arrays do a lot more coalescing (grouping IOs) and use various methods to eliminate RAID parity write penalties. In the end, that IO is done at the block size (usually 512 bytes plus non-data content like soft-error CRC info).

Arrays implement the idea of parity differently (for example, in the NetApp case everything is a function of the WAFL filesystem allocation size of 4K – with an underlying block size that is also 512 bytes plus non-data content), but the overall flow I described is pretty consistent.

When I started in the storage business this was confusing to me. Long and short – don't confuse file allocation size with IO size, or with block size :-)

    Hope that helps!

  9. John Martin

I was halfway through a comment about the same length as Chad's when a colleague forwarded me the following link to a blog post, written within the last 24 hours, that explains this and other stuff really well. It's better than anything else I've read on this subject.

    http://deinoscloud.wordpress.com/2010/07/26/understanding-vmfs-block-size-and-file-size/

I sometimes wonder if this kind of thing is because of some kind of zeitgeist, coincidence, or the internet version of reticular activation.

  10. Andre Leibovici

    @Chad Sakac
    Awesome explanation and thanks for taking the time to write that.

    It seems that this is the penalty associated with Thin Provisioning – “Unless they were a NEW allocation of a file (which would result in VMFS-level file allocations in 1MB IO sizes)”.

  11. idivad

Another great post. I came across it while doing some research to understand an issue I'm having; however, I'm still not sure if this post applies to the issue I'm seeing in my environment. My block sizes are the same in both datastores…

Here's a link to my post on the VMware forum:
    VM image take up more space on my iSCSI than local disk
    http://communities.vmware.com/thread/417836

Thought I'd drop a comment here to see if someone has an idea regarding my issue.

  12. Josh Townsend

    Andre – wondering if you’ve done any follow-up work on this for Windows 7? This EMC document suggests an 8k cluster size for Windows 7: http://www.emc.com/collateral/software/white-papers/h8043-windows-virtual-desktop-view-wp.pdf

  13. Andre Leibovici

    @Josh Townsend
Unfortunately I have not. However, I know the team working on EMC reference architectures and I trust them. If you don't have the time to run your own tests (recommended!), follow the EMC guidelines.

    Andre

  14. John Martin

    @idivad

When you did the copy, did you use a unix command like cp? If so, you may have caused what was a sparse file on the source to become non-sparse on the destination. It looks like that's what happened.

If so, check out the Wikipedia entry on sparse files… and if that solves your problem, consider making a small donation to the Wikimedia Foundation.

    Regards
    John

  1. thevirtualnews.com

    myvirtualcloud.net » Improve VDI performance with IO Length Trending…


  2. Storage Basics – Part IX: Alternate IOPS Formula

    […] Improve VDI Performance with IO Length Trending – read down through the comments (Chad Sakac has a good one here: http://myvirtualcloud.net/?p=988&cpage=1#comment-3620). […]
