For the past few weeks I have been extensively testing and validating the ILIO technology developed by Atlantis Computing. I have briefly mentioned ILIO in my articles How to offload Write IOs from VDI deployments and EMC FAST Cache effectiveness with VDI.
Atlantis Computing is a privately held startup. ILIO is a storage virtual appliance with remarkable capabilities, allowing administrators to boost storage performance for virtual desktop infrastructures. Key benefits are Storage Optimization, Inline IO De-duplication, IO Optimization and Caching.
According to SearchVirtualStorage, Atlantis Computing claims its first version of Atlantis ILIO could offload up to 70% of the writes and 90% of the reads of the VDI IO workload hitting the storage array, and that Version 2.0 improved write offload performance by an additional 20%.
I have reasoned many times about the impact of write IO on storage arrays for VDI deployments and How to offload Write IOs from VDI deployments. If ILIO is able to offload up to 70% of the write IO, then storage requirements would either be smaller, or storage arrays would be able to sustain more virtual desktops within the same capacity/performance threshold. Below is a simple theoretical illustration of the above. It’s theoretical because there are a number of variables that could affect the numbers upwards and/or downwards.
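To make the theoretical illustration concrete, here is a quick sketch of the offload arithmetic. The desktop count, per-desktop IOPS and read/write mix are assumed example numbers, not measurements from my tests:

```python
# Rough illustration (hypothetical numbers): how write/read offload
# percentages translate into backend IOPS that still reach the array.
def backend_iops(total_iops, write_pct, write_offload, read_offload):
    """Return the IOPS that still hit the array after offloading."""
    writes = total_iops * write_pct
    reads = total_iops - writes
    return writes * (1 - write_offload) + reads * (1 - read_offload)

# Example: 200 desktops at 10 IOPS each, 80% writes (a common VDI mix),
# with the claimed 70% write / 90% read offload.
total = 200 * 10
remaining = backend_iops(total, write_pct=0.8, write_offload=0.7, read_offload=0.9)
print(remaining)  # 520.0 -> the array sees only ~26% of the original load
```

With those assumed numbers, the same five-disk backend could in theory sustain roughly four times as many desktops at the same utilization.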
If you follow my blog articles you probably know that I have delved into the technology and run multiple tests to validate Atlantis’ claims. Yes, I have… I will share a little bit of what I have found. I will also discuss what the good use cases are for the technology.
I usually start my tests in my home lab and then when I have a defined testing methodology I expand the tests to another set of hardware when required.
The first validation test included VMware vSphere 5.0 and VMware View 5.0 with two Windows 7 Linked-Clone virtual desktops running in parallel with exact same configuration and resource limits.
For storage I used my Iomega IX4, which is not hardware you would want to run VMs on. The IX4 has only four SATA hard drives, and in my case they are only 5400 RPM.
The workload was generated by IOMeter with a configuration that mimics a VDI workload. IO Size = 8K, 100% Random IO, 20% Reads, 80% Writes, and the working set file was set to 1GB.
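For reference, the access pattern produced by that IOMeter profile can be sketched as follows. This is only an illustration of the pattern generation (8K transfers, 100% random offsets, 20% reads / 80% writes over a 1GB working set); IOMeter itself issues real disk IO:

```python
import random

IO_SIZE = 8 * 1024          # 8K transfer size
WORKING_SET = 1 * 1024 ** 3  # 1GB working set file

def next_io(rng):
    """Return one (operation, offset) pair following the test profile."""
    op = "read" if rng.random() < 0.20 else "write"
    # 100% random, aligned to the 8K transfer size.
    offset = rng.randrange(WORKING_SET // IO_SIZE) * IO_SIZE
    return op, offset

rng = random.Random(42)
ops = [next_io(rng)[0] for _ in range(10000)]
print(ops.count("write") / len(ops))  # ~0.80
```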
See what happens in the video below!
This video is available in HD, and I recommend watching it in HD.
Ok, now that I’ve got you excited about the technology let’s understand how ILIO works.
ILIO sits in the storage IO path and creates a VMDK on LUNs or NFS exports presented by the hypervisor. This VMDK file is then mounted by the ILIO appliance and presented as an NFS export back to the vSphere hosts. (ILIO can be deployed per host or in a top-of-the-rack architecture, but I will not get into deployment models here.)
IO Caching is done in RAM, precluding IOs from hitting the array when possible.
ILIO understands NTFS file systems and can determine whether a given IO is part of a Windows DLL, a temporary file or the Windows swap file, and will treat them differently. As an example, a DLL that is constantly accessed may be cached for further use, whereas IO that is part of the Windows swap file may not be put in the RAM cache.
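Conceptually, that content-aware behavior amounts to classifying each IO by the file it belongs to and picking a cache policy. The real ILIO logic is proprietary; this hypothetical sketch just encodes the examples from the paragraph above:

```python
# Hypothetical sketch of content-aware caching: classify an IO by the
# NTFS file it belongs to and choose a cache policy. The categories and
# policies below are illustrative, not ILIO's actual rules.
def cache_policy(path):
    p = path.lower()
    if p.endswith(".dll"):           # hot shared libraries: keep in RAM cache
        return "cache"
    if p.endswith("pagefile.sys"):   # Windows swap: bypass the RAM cache
        return "bypass"
    if "\\temp\\" in p:              # temporary files: short-lived, don't cache
        return "bypass"
    return "default"

print(cache_policy(r"C:\Windows\System32\kernel32.dll"))  # cache
print(cache_policy(r"C:\pagefile.sys"))                   # bypass
```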
ILIO also has an IO coalescing feature that attempts to create large sequential IO blocks to allow efficient disk writes. Most intelligent storage arrays have some sort of IO coalescing mechanism. I’ll get back to that.
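The idea behind IO coalescing can be sketched in a few lines: writes with adjacent offsets are merged into one large sequential IO before being sent to the array. This is a simplified illustration, not ILIO's implementation:

```python
# Simplified sketch of write coalescing: sort pending writes by offset
# and merge contiguous ones into a single large sequential IO.
def coalesce(writes):
    """writes: list of (offset, data) tuples, in any order."""
    merged = []
    for offset, data in sorted(writes):
        if merged and merged[-1][0] + len(merged[-1][1]) == offset:
            last_off, last_data = merged[-1]
            merged[-1] = (last_off, last_data + data)  # extend the run
        else:
            merged.append((offset, data))
    return merged

# Four scattered 8K writes collapse into two sequential IOs.
w = [(16384, b"B" * 8192), (0, b"A" * 8192), (8192, b"A" * 8192), (65536, b"C" * 8192)]
print([(off, len(data)) for off, data in coalesce(w)])  # [(0, 24576), (65536, 8192)]
```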
The last key feature is Inline IO De-duplication, which performs de-duplication in real time before IO transactions reach the storage fabric. Because the de-duplication occurs before the IO hits the storage array, the load on the array's storage processors and spindles is reduced. At the same time, because no duplicate data is written to disk, post-process de-duplication is not required, and the associated IOPS cost is never incurred.
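The mechanics of inline de-duplication can be illustrated with a toy block store: each block is fingerprinted before the write, and only blocks with unseen content generate a physical write. Again, this is a sketch of the general technique, not ILIO's code:

```python
import hashlib

# Minimal sketch of inline block de-duplication: fingerprint each block
# before it is written; only blocks with new content reach the backing store.
class DedupStore:
    def __init__(self):
        self.blocks = {}  # fingerprint -> data (what actually hits disk)
        self.index = []   # logical block sequence -> fingerprint

    def write(self, data):
        fp = hashlib.sha256(data).hexdigest()
        is_new = fp not in self.blocks
        if is_new:
            self.blocks[fp] = data   # only unique content is stored
        self.index.append(fp)        # duplicates just add an index entry
        return is_new                # True if a physical write occurred

store = DedupStore()
physical = sum(store.write(b) for b in [b"win7-dll", b"user-doc", b"win7-dll"])
print(physical, len(store.index))  # 2 physical writes for 3 logical writes
```

In a VDI farm full of near-identical Windows images, the hit rate on that fingerprint index is what lets inline de-duplication cut write IO so sharply.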
Moving to a more Robust Test Platform
To create valid tests it is important that a baseline is always maintained. In this case, during tests, the host used was a Cisco UCS B200 with 48GB RAM and two quad-core CPUs at 2.526GHz. 200 linked-clone virtual desktops running Windows 7 with 1GB RAM each were created.
The workload was generated using LoginVSI with the default Heavy profile. LoginVSI is a workload simulator that targets CPU and RAM utilization more than storage IO. However, it is still one of the best tools around for simulating a real VDI workload. Other tools such as View Planner and RAWC can also be used for this purpose.
ILIO was configured in a top of the rack deployment model with a single ILIO appliance serving hosts and all 200 virtual desktops.
The storage array used was an EMC VNX5500 with FAST Cache enabled. The array was dedicated to ILIO and no other workload was running during the tests. A RAID group/LUN was configured with only five 15K SAS disk drives, and this same pool of disks served the tests both with and without ILIO.
The same workload was generated for all tests, with and without the ILIO appliance. In fact, I ran the workload with ILIO multiple times in order to tune the ILIO appliance to work efficiently with the EMC VNX5500 array used during the test. The default mode used by ILIO is good for slow SAN/NAS, because the internal scheduler will sample IO service time and decide how much scatter/gather should be done. But the VNX5500 with FAST Cache is definitely not a slow array – so while ILIO was good at eliminating peaks when latency crept upward, during steady state, when latency is good, the scheduler decides it is best not to perform any IO coalescing. This behavior can be observed in the graph below (ILIO).
With help from Atlantis CTO Chetan (@chetan_) I was able to tune ILIO. The tuning enabled a feature called write scatter/gather on the ILIO appliance – this allows ILIO to coalesce IO on ILIO’s vScaler component, which provides intelligent NTFS caching and IO processing. IO coalescing can be performed at different levels in ILIO – primarily in the ILIO IO scheduler, but also on the vScaler.
When the scheduler was performing the coalescing, we found that the VNX5500 with FAST Cache did not really have latency problems (duh!), so the scheduler was being lazy and sending IOs without merging or gathering them.
With the tuning complete I started to see aggressive IO coalescing, and the number of IOs hitting the storage array dropped drastically, from an average of 1785 IOPS to 242 IOPS (ILIO_NEWCONF).
By enabling scatter/gather on the ILIO vScaler, all writes are forced into coalescing and are re-arranged on the vScaler before being sent to the SAN. This means that ILIO tends to keep some uncommitted IOs in memory and therefore needs to be protected by some form of non-volatile cache. Typically a Fusion-io card, NVRAM card, SLC SSD or a good MLC SSD is used. ILIO deployments with slower SANs, or in environments with high storage latency, would not require the use of non-volatile cache devices.
For non-persistent desktops it is perhaps not that important to be crash consistent should ILIO or the underlying hardware fail, since administrators can simply recreate the desktops. In a persistent desktop scenario, one ILIO appliance per host may provide better availability. High availability can also be provided through the use of vSphere Fault Tolerance; however, this scenario would also require additional underlying physical hardware.
The graphs below demonstrate the change in IO size after the IO optimization techniques are applied. Since the majority of IO time is spent physically positioning and seeking on the disk, large sequential writes can greatly reduce disk access time.
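Some back-of-the-envelope arithmetic shows why large writes matter so much on spinning disk. The seek, rotational and transfer figures below are assumed typical values for a 15K SAS drive, not measurements from these tests:

```python
# Why one large write beats many small ones on spinning disk.
# Assumed typical 15K-SAS figures (not measured values):
SEEK_MS = 3.5           # average seek time
ROTATE_MS = 2.0         # average rotational latency (half a 4ms revolution)
TRANSFER_MB_S = 150.0   # sustained sequential transfer rate

def write_time_ms(total_kb, io_size_kb):
    """Time to write total_kb using IOs of io_size_kb each."""
    ios = total_kb / io_size_kb
    transfer_ms = total_kb / 1024 / TRANSFER_MB_S * 1000
    # Each IO pays the positioning cost; the data transfer is the same total.
    return ios * (SEEK_MS + ROTATE_MS) + transfer_ms

small = write_time_ms(512, 8)    # 64 random 8K writes
large = write_time_ms(512, 512)  # one coalesced 512K write
print(round(small, 1), round(large, 1))  # ~355ms vs ~9ms for the same data
```

With these assumptions, coalescing 64 scattered 8K writes into one 512K IO cuts the disk time by roughly a factor of 40, which is consistent with the drop in disk utilization shown below.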
From all the results probably the below Disk Utilization % is the most impressive. The blue represents AVG and MAX % utilization without ILIO; and the green represents AVG and MAX % utilization with ILIO using the configuration tuned for the VNX5500 with FAST Cache. The graph below demonstrates the true power of ILIO optimizing, caching and coalescing IOs before they hit the storage array.
Important Note: Numbers and graphs in this article are a representation of the delta and volatile part of Linked Clone virtual desktops. Replica disks were placed in a dedicated datastore outside ILIO appliance.
Each ILIO appliance requires 22GB of RAM for approximately 65 concurrent VDI users. Additional users come at a cost of 150MB per user, up to a maximum of 200 users per appliance. A full appliance for 200 users requires approximately 42GB of RAM.
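Those figures translate into a simple sizing formula; the function below just encodes the numbers quoted above:

```python
# Sizing sketch based on the figures quoted above: 22GB covers the first
# ~65 users, each additional user costs 150MB, up to 200 users per appliance.
def ilio_ram_gb(users, base_gb=22, base_users=65, per_user_mb=150, max_users=200):
    if users > max_users:
        raise ValueError("one appliance supports at most 200 users")
    extra_users = max(0, users - base_users)
    return base_gb + extra_users * per_user_mb / 1024

print(round(ilio_ram_gb(65), 1))   # 22.0
print(round(ilio_ram_gb(200), 1))  # 41.8 -> the ~42GB quoted above
```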
For a top-of-the-rack architecture, multiple virtual servers may be dedicated to ILIO (commonly entire physical servers are dedicated). If an ILIO appliance per host is the chosen architecture, additional RAM per host is required, reducing the total consolidation ratio. Either way, ILIO will use a considerable amount of RAM – but not much CPU, according to my tests.
I have performed lab tests to determine ILIO’s capabilities, but I have never extensively used the solution in production environments, so I cannot speak to its reliability and stability. That said, during my tests I did not run into any issues.
ILIO provides a synergistic complement to traditional storage arrays, offloading IOs and helping organizations leverage existing investments while allowing for higher VDI consolidation. For newer storage arrays that support features like automated storage tiering or make use of solid-state drives, the decision to implement a solution like ILIO to reduce the impact of IOs can be a little more complex, and comes down to $$. Overall, adding ILIO to the architecture can provide a viable alternative to reduce the number of drives and offload backend IOs. It’s great to see new products launched to benefit virtual infrastructures – is it right for your architecture? The choice is up to you.
Below you can find the raw data for all the information published in this article.