One of the major concerns in a VDI solution is the end-user experience. Poor user experience = Low user acceptance. I have discussed this paradigm in my article VDI USER EXPERIENCE and USER ACCEPTANCE.
Behind the scenes, hidden in the supporting physical infrastructure, a number of factors can contribute to poor user experience. The most common pain points are network and storage.
Networking is commonly an issue when dealing with remote offices/branch offices. However, storage may well be the most common bottleneck in a VDI implementation if not designed properly.
In the past I have discussed storage IO patterns, read/write skews, boot storms, login storms, dedicated replica datastores, IO splits, etc. I particularly recommend reading The biggest Linked Clone “IO” Split Study for detailed information on IO behavior in a VDI environment.
The graph below demonstrates the number of read and write IOs on Linked Clone disks. The old saying “VDI is write intensive” can be observed in this graph.
IOPS and RAID type define the storage layout and the number of disks required. The RAID type sets the number of spindles required to support the workload based on the amount of IOPS and the read/write ratio – especially write IO, given that RAID adds a write penalty that depends on the RAID type chosen. RAID5 adds a write penalty of 4, whilst RAID10 adds a write penalty of 2.
VM IO = VM Read IO + (VM Write IO * RAID Penalty)
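To make the arithmetic concrete, here is a short sketch of the formula above in Python. All workload numbers are illustrative assumptions (500 desktops at 10 IOPS each with a 20/80 read/write split, and roughly 180 IOPS per spindle); plug in your own measurements.

```python
import math

# Sketch of the back-end IO formula above. Workload and per-disk IOPS
# figures are assumptions for illustration, not measured values.

def backend_iops(read_io, write_io, raid_penalty):
    """Front-end read/write IOPS -> back-end (spindle) IOPS for a RAID penalty."""
    return read_io + write_io * raid_penalty

def spindles_needed(read_io, write_io, raid_penalty, iops_per_disk=180):
    """Spindle count, assuming ~180 IOPS per 15k disk (adjust as needed)."""
    return math.ceil(backend_iops(read_io, write_io, raid_penalty) / iops_per_disk)

desktops = 500
read_io = desktops * 2    # 20% of 10 IOPS per desktop
write_io = desktops * 8   # 80% of 10 IOPS per desktop

print(spindles_needed(read_io, write_io, raid_penalty=4))  # RAID5  -> 95 disks
print(spindles_needed(read_io, write_io, raid_penalty=2))  # RAID10 -> 50 disks
```

Note how the same 5,000 front-end IOPS nearly doubles the spindle count under RAID5 versus RAID10 – the write penalty dominates the sizing.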
What practices can be used to offload write IOs from the storage array? IOs can be reduced at the source, de-duplicated, single-instanced or served from cache.
Storage vendors have been providing solutions that help reduce the number of IOs hitting the spindles, thereby driving down the number of disks required. Those solutions range from flash drives for caching to automated storage tiering, flash RAM and others. The objective of this article is to discuss techniques that help reduce or prevent IOs from hitting the drives (spindles).
1. Windows Customization
Windows customization not only helps to reduce memory footprint and CPU cycles, but also reduces the number of IOs. There are a number of resources available to help you customize master images.
- Mastering VDI Templates updated for Windows7 and PCoIP
- VMware View Optimization Guide for Windows 7
- Quest vWorkspace Desktop Optimizer
- Additionally, I would recommend reading an article I wrote
2. Application Virtualization
Application virtualization tools such as ThinApp, App-V or XenApp are a superb way to offload IOs from the storage array; mostly read IO. Application virtualization also provides operational benefits that include ease of management, easier application upgrades, and easier application rollouts.
Many of the application virtualization products also allow for single instancing. This means that each application exists only once on storage, as opposed to being installed separately in every virtual desktop.
Single instancing by itself does not reduce the number of IOs required to serve applications, since all users will still be accessing that single instance. However, most storage arrays have the ability to cache heavily accessed blocks in DRAM. Single-instancing applications allows storage arrays to keep the application in DRAM cache or extended cache, serving data much faster.
Picture (A) demonstrates storage cache utilization without application virtualization; picture (B) demonstrates utilization of storage cache with application single instancing.
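As a rough sketch of why single instancing matters for caching, the arithmetic below (all numbers are assumptions: a 32GB array cache, a 2GB hot application working set, 200 desktops) shows that per-desktop application copies quickly dwarf array cache, while a single-instanced copy fits comfortably:

```python
# Back-of-the-envelope sketch with assumed numbers: does the hot
# application working set fit in array DRAM cache?

def hot_set_gb(app_hot_gb, desktops, single_instance):
    # With single instancing every desktop reads the same blocks;
    # without it, each desktop hits its own private copy of those blocks.
    return app_hot_gb if single_instance else app_hot_gb * desktops

cache_gb = 32      # assumed array DRAM cache
app_hot_gb = 2     # assumed hot application working set
desktops = 200

for single_instance in (False, True):
    hot = hot_set_gb(app_hot_gb, desktops, single_instance)
    print(single_instance, hot, hot <= cache_gb)
# -> without single instancing: 400 GB hot set, does not fit in cache
# -> with single instancing:      2 GB hot set, fits in cache
```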
3. .vswp offload to local storage
When designing VDI solutions on vSphere, one of the many ways to reduce shared storage consumption is to allow VM Swap files (.vswp) placement on host local storage. A .vswp file is automatically created by ESXi when the desktop is Powered On, and deleted when the VM is Powered Off.
The benefits of offloading the .vswp file to local storage are a reduced shared storage footprint and the offload of read and write IOs from shared storage to local storage.
When offloading .vswp files to local storage it is important to make sure that the host's local storage is capable of providing the number of IOs required to support the virtual desktops on that host. Flash drives are recommended for .vswp files on local storage.
If you are interested in how to implement .vswp offload read Save [VDI] Storage using VM Swap File Host Placement.
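The shared-storage capacity reclaimed is easy to estimate. ESXi sizes each .vswp file as the VM's configured memory minus its memory reservation; the desktop count and memory sizes below are assumptions for illustration.

```python
# Sketch of the shared-storage capacity reclaimed by .vswp offload.
# ESXi sizes each .vswp as configured memory minus the memory reservation.

def vswp_gb(configured_mem_gb, mem_reservation_gb=0):
    return max(configured_mem_gb - mem_reservation_gb, 0)

desktops = 1000  # assumed pool size
print(desktops * vswp_gb(2))      # 2000 GB of shared storage offloaded
print(desktops * vswp_gb(2, 1))   # 1000 GB if each VM reserves 1 GB of memory
```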
4. Antivirus IO Offloading (vShield Endpoint)
vSphere introspection capabilities through vShield Endpoint considerably reduce CPU cycles, memory consumption and storage IO. vShield Endpoint plugs directly into vSphere and consists of a virtual appliance (delivered by VMware partners), a driver for virtual machines to offload file events, and VMware Endpoint security (EPSEC) to link the first two components at the hypervisor layer.
Both McAfee and Symantec solutions required that a separate instance of the AV agent run in each virtual machine. TrendMicro Deep Security required one instance of its virtual appliance per host. Numbers and graph from Tolly Test Report #211101.
Note: McAfee MOVE does not use introspection but achieves similar results by using the network layer to offload AV operations.
5. Storage Level Caching and Write Buffering
Read caching makes sure highly active data is served from flash drives or RAM. EMC FAST Cache technology dynamically absorbs unpredicted spikes, buffering the most accessed blocks in DRAM and flash drives. The movement of data is dynamic, near real-time, and in 64KB sub-slice chunks, which is ideal for bursty data. Other storage vendors have different technologies that achieve similar results.
The biggest storage problem is write IO, aka the write penalty. Write buffering can be done in a couple of different ways – at the host level or at the storage level. EMC FAST Cache also helps to accommodate write IO in an extended cache when the spindles are too busy to deal with the write IOs. This technique alleviates the latency responsible for poor end-user experience and, AFAIK, is the only shared-storage-side caching solution that also alleviates write IOs.
Tests with persistent desktops demonstrate that 9 out of 10 IOs may be served from cache using FAST Cache. If you are interested in finding out more about EMC FAST Cache technology, read this detailed paper.
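Taking that 9-out-of-10 figure at face value, the effect on spindle load is easy to sketch (the front-end IOPS number below is an assumption):

```python
# If ~9 out of 10 IOs are absorbed by cache, only the misses reach the
# spindles. The 10,000 front-end IOPS figure is assumed for illustration.

def spindle_iops(front_iops, cache_hit_ratio):
    """IOPS that still reach the disks after cache absorbs the hits."""
    return front_iops * (1 - cache_hit_ratio)

print(round(spindle_iops(10_000, 0.9)))  # ~1000 IOPS left for the spindles
```

A 90% hit ratio reduces the spindle-facing load by an order of magnitude, which feeds directly back into the spindle-count formula from earlier in this article.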
6. Host Level Caching and De-duplication
A few start-ups have been creating industry furor with their technology and ability to boost VDI performance beyond what shared storage arrays can provide today. One of the products I have been paying attention to is Atlantis ILIO.
Atlantis ILIO promises up to a 90% reduction in IO load on the storage infrastructure. ILIO achieves those numbers by re-sequencing read/write operations from small random IOs into large sequential ones, processing IO in real time locally from host memory, and instantly characterizing IO based on Windows NTFS file system characteristics.
ILIO is a software virtual appliance that sits on each host, or in a “top-of-rack” configuration where hosts connect to a single Atlantis ILIO virtual appliance. The storage is then presented to the hosts via NFS or iSCSI.
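The re-sequencing idea can be illustrated with a toy sketch (my own illustration, not Atlantis code): buffer small random writes in memory, absorb rewrites of the same block for free, and flush everything in offset order as one large sequential batch.

```python
# Toy illustration of write re-sequencing: many small random writes are
# coalesced in memory and issued as one sequentially ordered batch.

class CoalescingBuffer:
    def __init__(self, flush_threshold=8):
        self.pending = {}                 # offset -> data; rewrites overwrite in place
        self.flush_threshold = flush_threshold
        self.flushed_batches = []         # each entry is one sequential batch

    def write(self, offset, data):
        self.pending[offset] = data       # rewriting a hot block costs no extra IO
        if len(self.pending) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # One large batch, sorted by offset, instead of many random IOs
        self.flushed_batches.append(sorted(self.pending.items()))
        self.pending.clear()

buf = CoalescingBuffer(flush_threshold=4)
for offset in (40, 8, 8, 16, 0):          # five small random writes, one repeated
    buf.write(offset, b"x")

print(len(buf.flushed_batches))                 # 1 batch issued, not 5 IOs
print([o for o, _ in buf.flushed_batches[0]])   # [0, 8, 16, 40] -> sequential order
```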
The base ILIO appliance uses 22GB of RAM per host and supports up to 65 virtual desktops. Additional desktops may be added at a 150MB RAM price tag each. Because of the RAM consumed, the VDI consolidation ratio is reduced, increasing the costs associated with RAM, hosts, and VDI licensing, depending on the solution in use.
To be cost-effective, the Atlantis ILIO licensing and host RAM costs, plus any additional hosts, blade enclosures, and VDI licenses required, would have to cost substantially less than adding spindles to the storage infrastructure. I'm yet to do this calculation.
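The RAM side of that calculation is straightforward using the figures above: 22GB base per host for up to 65 desktops, plus 150MB for each additional desktop. The desktop counts below are assumptions.

```python
# RAM footprint sketch from the figures above: 22 GB base per host covers
# up to 65 desktops; each additional desktop adds 150 MB.

def ilio_ram_gb(desktops_per_host):
    extra_desktops = max(desktops_per_host - 65, 0)
    return 22 + extra_desktops * 0.150

print(ilio_ram_gb(65))              # 22 GB for the base 65 desktops
print(round(ilio_ram_gb(100), 2))   # 27.25 GB for 100 desktops per host
```

That per-host RAM is capacity no longer available for desktops, which is exactly the consolidation-ratio hit described above.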
I have done some ILIO lab tests that look promising from a technology and performance standpoint. Does it make sense from a $/performance standpoint? I will be publishing the results in my next blog post.
For most VDI deployments the first five options will considerably reduce the number of IOs, consequently improving storage response times, reducing latency and increasing consolidation ratios. Option number 6 can be investigated in extreme cases where the number of IOPS per virtual desktop is very high.
For the past few weeks I have been collecting data from my VDI Read/Write Ratio Challenge. Please, take a minute to help us understand the overall IO pattern of VDI deployments.