My colleague Kees Baggerman published a brilliant post on how Nutanix Shadow Cloning not only improves performance for Horizon Linked Clones and XenDesktop MCS, but also hugely improves performance for VMware AppVolumes. I recommend reading his article here.
In his single-VM test, Kees Baggerman saw an improvement of almost 360% for Adobe Reader through the use of Shadow Cloning and data locality.
I would like to shed some light on how important these performance improvements are in the context of distributed storage architectures.
The performance improvements seen by Kees were due to Nutanix being able to analyze the IO access pattern for the AppVolume vmdk and determine that the vmdk was serving only read requests. Based on that, Nutanix creates local copies of the vmdk, in RAM or SSD, on each server accessing the AppVolume stack.
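To make the idea concrete, here is a minimal sketch of that kind of access-pattern check. It is a toy model, not Nutanix code: the `VdiskStats` class, the `is_shadow_clone_candidate` function and the `min_readers` threshold are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class VdiskStats:
    """Hypothetical per-vmdk counters (not an actual Nutanix structure)."""
    reader_nodes: set = field(default_factory=set)
    write_count: int = 0

    def record(self, node_id: int, is_write: bool) -> None:
        # Track which nodes read the vmdk and how many writes it received.
        if is_write:
            self.write_count += 1
        else:
            self.reader_nodes.add(node_id)


def is_shadow_clone_candidate(stats: VdiskStats, min_readers: int = 2) -> bool:
    # Assumed rule: the vmdk has only ever been read, and by several nodes.
    return stats.write_count == 0 and len(stats.reader_nodes) >= min_readers
```

A vmdk read by nodes 1, 2 and 3 and never written would qualify under this rule; a single write would disqualify it.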
If you look at the picture below, where I have a large cluster running VDI with hundreds or thousands of virtual desktops, you will notice that all desktops access the same AppVolume vmdk (e.g. MS Office 2013) hosted and managed by node number 1.
In a distributed storage architecture the vmdk is distributed across servers (the methodology for how the vmdk is distributed varies according to the solution), but there is commonly a single server actively serving the vmdk. In this case, server number 1 is actively serving the vmdk.
With every desktop reading from that one server, these remote reads can not only saturate the network with unnecessary traffic but also overload the server actively serving the vmdk, exhausting its resources.
What kills network performance is not latency but congestion. Congestion can come in many forms: microbursts (for example, a 200 Mb burst arriving within 10 ms, which is equivalent to 20 Gbps of traffic on a 10G port for those 10 ms, resulting in traffic being dropped if the switches do not have enough buffer), or a misbehaving NIC sending PAUSE frames and slowing down the network.
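The microburst figure can be checked with a little arithmetic: 200 megabits arriving within 10 ms is an instantaneous rate of 20 Gbps, twice what a 10G port can forward. The helper names below are made up for this sketch, and it assumes the port drains at full line rate for the whole window.

```python
def burst_rate_gbps(burst_megabits: float, window_ms: float) -> float:
    """Instantaneous rate of a burst. Megabits per millisecond equals Gbps."""
    return burst_megabits / window_ms


def excess_megabits(burst_megabits: float, window_ms: float, port_gbps: float) -> float:
    """Data the port cannot forward within the window (must be buffered or dropped)."""
    drained = port_gbps * window_ms  # Gbps * ms gives megabits
    return max(0.0, burst_megabits - drained)


rate = burst_rate_gbps(200.0, 10.0)            # 20.0 Gbps
excess = excess_megabits(200.0, 10.0, 10.0)    # 100.0 megabits
buffer_megabytes = excess / 8                  # 12.5 megabytes
```

Under these assumptions the switch would need about 12.5 MB of buffer for that one port to absorb the burst without dropping traffic.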
Caching is also extensively used by distributed storage systems. A cache can use SSD or RAM, and it applies ingest and eviction policies according to data access and capacity allocation. Once data is evicted from the cache, it needs to be read once again from server number 1 over the network when required. VDI environments commonly have a very random IO pattern, causing data eviction to be very frequent, especially if no data de-duplication is available. Performance of virtual desktops suffers when the active dataset can no longer fit in the premium storage tiers, and that’s how data de-duplication helps performance.
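The effect of eviction can be illustrated with a small simulation. This is only a sketch: it assumes a plain LRU policy, which real products may not use, and `count_remote_reads` is a hypothetical helper that counts how often a block has to be fetched again over the network.

```python
import random
from collections import OrderedDict


def count_remote_reads(accesses, cache_blocks):
    """Count cache misses (remote reads) for an LRU cache of the given size."""
    cache = OrderedDict()
    remote_reads = 0
    for block in accesses:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            remote_reads += 1              # miss: fetch from the serving node
            cache[block] = True
            if len(cache) > cache_blocks:
                cache.popitem(last=False)  # evict the least recently used block
    return remote_reads


if __name__ == "__main__":
    # Random reads over 1000 blocks, replayed against three cache sizes.
    rng = random.Random(0)
    trace = [rng.randrange(1000) for _ in range(20000)]
    for size in (100, 500, 1000):
        print(size, count_remote_reads(trace, size))
```

Once the cache is large enough to hold the whole active dataset, only the first read of each block goes over the network; below that size, evicted blocks keep being fetched again.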
Shadow Cloning does not allow data eviction to happen. This approach completely eliminates the need to re-load data over the network. The data is only migrated on read, so as not to flood the network and to allow for efficient cache utilization. If the AppVolume vmdk is modified, the Shadow Clones are dropped and the process starts over.
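The behaviour described above (copy on first read, serve locally afterwards, drop the clones on modification) can be sketched as a toy model. The `ShadowClonedVdisk` class and its fields are hypothetical and say nothing about how Nutanix implements this internally; for simplicity every reading node is treated as remote from the node that owns the vmdk.

```python
class ShadowClonedVdisk:
    """Toy model: one authoritative vmdk plus per-node local clones."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # authoritative copy on the owning node
        self.clones = {}             # node_id -> {block_id: data}
        self.network_reads = 0       # blocks fetched over the network

    def read(self, node_id, block_id):
        local = self.clones.setdefault(node_id, {})
        if block_id not in local:
            # Migrate on read: only blocks actually requested cross the network.
            self.network_reads += 1
            local[block_id] = self.blocks[block_id]
        return local[block_id]

    def write(self, block_id, data):
        # A modification drops every clone; they are rebuilt by later reads.
        self.blocks[block_id] = data
        self.clones.clear()
```

In this model a node that reads the same block a thousand times causes one network read, not a thousand, and the count only goes up again after the vmdk is modified.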
Hopefully now you understand how AppVolumes and Shadow Cloning provide better performance and resource utilization. Now, visualize a VDI environment without Shadow Cloning with 1000+ users, and the chaos of providing consistent access to a multitude of AppVolumes stacks across the network amid cache ingestions, evictions and network congestion.
In a future article I will introduce you to AppVolumes stack replication to different sites and datacenters. Nutanix is one of the few solutions able to do vmdk-based backups and replication, instead of virtual machine based ones.
This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.