One of the Nutanix features that drastically improves VDI performance and the end-user experience is Shadow Clones. I have briefly discussed Shadow Clones in the past, but before explaining how Shadow Clones work, let's recap how VMware View Linked Clones work (the same applies to XenDesktop MCS).
To deploy linked clone desktops, the administrator first creates what is called the Parent VM or Master Image. This is a virtual machine running the Windows guest OS, customized according to recommendations for virtual environments.
After the Parent VM is assigned to a desktop pool in View, a clone of this VM is created, called the Replica. In older VMware View releases the replica VM was created in each datastore hosting virtual desktops for the desktop pool (figure 1).
From VMware View 4.5 onwards, administrators gained the ability to specify a single datastore to host replica disks for the entire vSphere cluster. This feature was originally created to enable a small subset of SSDs to provide high-performance access to replica disks (figure 2), at a time when flash SSD storage was very expensive.
After the replica disk is created, VMware View starts creating linked clones. A linked clone is an empty disk that grows over time according to data block changes requested by the guest OS. This disk is also called the Delta disk because it accumulates all the delta changes, and it may grow to the maximum size set for the parent VM (this is where de-duplication is essential, but that's a topic for another article). The replica disk is read-only and used as the primary disk for all deployed desktops; write I/O and changed data blocks are written to and read from the delta disks.
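The read/write split between the shared replica and the per-desktop delta disk can be sketched conceptually as follows. This is not VMware or Nutanix code, just a minimal copy-on-write model; the class and block layout are illustrative assumptions.

```python
# Conceptual sketch (not VMware/Nutanix code): how a linked clone resolves
# I/O between its private delta disk and the shared, read-only replica disk.

class LinkedClone:
    def __init__(self, replica):
        self.replica = replica  # shared, read-only base disk: {block_id: data}
        self.delta = {}         # per-desktop delta disk, starts empty and grows

    def write(self, block_id, data):
        # All writes land in the delta disk; the replica is never modified.
        self.delta[block_id] = data

    def read(self, block_id):
        # Blocks changed by the guest OS are read from the delta disk;
        # everything else falls through to the shared replica.
        if block_id in self.delta:
            return self.delta[block_id]
        return self.replica[block_id]

# Illustrative block contents only.
replica = {0: "boot", 1: "winlogon", 2: "pagefile"}
desktop = LinkedClone(replica)
desktop.write(2, "pagefile-changed")
print(desktop.read(0))  # served from the shared replica
print(desktop.read(2))  # served from this desktop's delta disk
```

Note that hundreds of desktops share the same `replica` object for reads, which is exactly why the replica becomes the hot spot discussed next.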
As I mentioned, it used to be a recommended practice to allocate Tier 1 storage for replica disks because all virtual desktops in the cluster use them as the base image. With hundreds or even thousands of desktops reading from the replica disk, the storage solution could quickly become a bottleneck if some form of caching was not used.
Enter converged infrastructures
Converged infrastructures provide highly distributed storage, aggregating local HDDs and SSDs to deliver a high-performing scale-out architecture.
Although most converged storage solutions provide the ability to create multiple datastores, it is commonly accepted that a single large datastore is presented to all hosts, aggregating local HDDs and SSDs across every host in the cluster. The picture below demonstrates what this architecture looks like from a logical perspective.
While the above is true, at the physical layer things are a little more complex, and every converged storage solution handles data distribution in a different way. However, due to data distribution, high availability, and replication factor requirements, every VM disk will have data blocks distributed across three, four, or more servers in the cluster. The picture below demonstrates the physical architecture, with data distributed across nodes and disks.
The problem with such a granular and distributed architecture is that every server needs to go out to the network to gather replica disk data blocks, and no single server holds the entire replica disk. This places an enormous burden on the network, increases latency due to network hops, and, most importantly, may clog the three or four servers that maintain the distributed replica disk with requests.
Enter Shadow Clones
Nutanix Shadow Clones allow distributed caching of a particular disk or VM data in 'multi-reader' scenarios. This works for any multi-reader workload (e.g. deployment servers, repositories, etc.).
Data or I/O locality is critical for the highest possible VM performance and is a key construct of the Nutanix Distributed File System (NDFS). With Shadow Clones, NDFS monitors disk access trends. When all requests to a disk are read I/O, the disk (in our case, the replica) is marked as immutable. Once the disk has been marked immutable, it can be cached locally by each Nutanix node making read requests to it. In the background, every CVM receives the map of where the immutable disk's blocks reside. If the data is local to the node, great; if not, it is automatically retrieved without relying on the original CVM maintaining the replica disk, eliminating possible service degradation caused by multiple nodes requesting the data from the original CVM.
This method allows VMs on each node to read the replica disk locally from the Nutanix Extended Cache (SSD and RAM). In the VDI case, this means each node can cache the replica disk and serve all read requests locally, drastically improving the end-user experience.
VM data is only migrated when there is a read I/O request, so as not to flood the network and to allow for efficient cache utilization. If the replica disk is modified, the Shadow Clones are dropped and the process starts over. Shadow Clones are disabled by default and can be enabled or disabled using the following NCLI command:
ncli cluster edit-params enable-shadow-clones=true
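The lifecycle described above (monitor access, mark a read-only disk immutable, cache it on each reading node, and drop the shadow copies as soon as a write arrives) can be sketched as a simple state machine. This is a conceptual model only, not NDFS source; the class name and the read-count threshold are assumptions for illustration, not the real NDFS heuristic.

```python
# Conceptual sketch (not NDFS source): the Shadow Clone lifecycle.

class ShadowCloneTracker:
    def __init__(self, disk_id, read_threshold=3):
        self.disk_id = disk_id
        self.read_threshold = read_threshold  # assumed heuristic, not the real NDFS policy
        self.remote_reads = 0
        self.immutable = False
        self.local_caches = set()             # nodes currently holding a shadow clone

    def read(self, node_id):
        if not self.immutable:
            self.remote_reads += 1
            if self.remote_reads >= self.read_threshold:
                # Access pattern looks read-only: mark the disk immutable.
                self.immutable = True
        if self.immutable:
            # The reading node caches the disk and serves future reads locally.
            self.local_caches.add(node_id)
            return f"local read on {node_id}"
        return "remote read via owning CVM"

    def write(self, node_id):
        # Any write invalidates the shadow clones; monitoring starts over.
        self.immutable = False
        self.remote_reads = 0
        self.local_caches.clear()

tracker = ShadowCloneTracker("replica-disk")
for node in ("node-1", "node-2", "node-3", "node-1"):
    tracker.read(node)
print(tracker.immutable, sorted(tracker.local_caches))
tracker.write("node-1")  # e.g. the base image is updated
print(tracker.immutable, tracker.local_caches)
```

The key point the sketch captures is that invalidation is automatic: a single write returns the disk to the monitored state, so administrators never manage the caches themselves.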
Below we show an example of how Shadow Clones work and allow for distributed caching:
The architecture above is dramatically simpler and more scalable than the traditional architecture, as the solution scales out without degrading performance.
Some of the benefits of Shadow Clones are:
- Replica disk data is always served locally to the host
- Does not require the use of CBRC (Content Based Read Cache) and is not limited to 2GB of RAM
- Reduced overhead on the storage network (IP network), as read I/O is serviced locally, ensuring the lowest possible network latency for both write I/O and virtual machine traffic
- During boot storms, login storms and antivirus scans all replica data can be served locally, and NO read I/O is forced to be served by a single storage controller. This not only improves read performance but also makes more I/O available for write operations, which generally make up >=65% of I/O in VDI environments
- The solution can scale while maintaining linear performance
- When the base image is updated, NDFS detects that the file has been written to and automatically creates a new snapshot, which is replicated out to all nodes
- The feature is enabled once and does not require ongoing configuration or maintenance.
Performance and Benchmark
By now you are probably wondering what the real performance improvements are and how they can be measured. Interestingly, some people have suggested that 10Gb networks are very fast and will not add latency, or that data locality is not really necessary. In my view that's normal behavior, given that this is a new and disruptive technology not available in other converged solutions on the market. (I'm pleased Nutanix has a patent on it.)
Nutanix has extensively tested Shadow Clones using VDI and server workloads. Here is an example:
400 Medium Workload Desktops – Shadow Clones Disabled
- During testing with 400 desktops using LoginVSI, VSImax was not reached, with a baseline of 5765ms and an average VSIindex of 4985ms.
- The weighted response times were consistently below 2000ms, with the Zip High Compression task having the highest response times, as expected.
- Logon times ranged from ~20-30ms for the first 200 sessions and from ~24-40ms for the second 200 sessions.
400 Medium Workload Desktops – Shadow Clones Enabled
- During testing with 400 desktops using LoginVSI, VSImax was not reached, with a baseline of 5676ms and an average VSIindex of 3397ms.
- The weighted response times were consistently below 1500ms, with the Zip High Compression task having the highest response times, as expected.
- Logon times ranged from ~20-30ms for all 400 sessions.
Data locality with Shadow Clones radically improves overall performance and the end-user experience. If you are interested in additional details about the performance tests and how LoginVSI measures performance, please refer to the Nutanix Citrix XenDesktop on Microsoft Hyper-V reference architecture.
Thanks to Dwayne Lessner for reviewing the accuracy of this article.
Updated Shadow Clones tests:
This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net