
Mar 03 2014

Nutanix Shadow Clones Explained and Benchmarked

One of the Nutanix features that drastically improve VDI performance and the end-user experience is Shadow Clones. I have briefly discussed Shadow Clones in the past. However, before explaining how Shadow Clones work, let’s recap how VMware View Linked Clones work (the same applies to XenDesktop MCS).

To deploy linked-clone desktops, the administrator first creates what is called the Parent VM or Master Image. This is a virtual machine running the Windows guest OS, customized following the recommendations for virtual environments.

After the Parent VM is assigned to a desktop pool in View, a clone of this VM is created. This clone is called the Replica. In older VMware View releases the replica VM would be created in each datastore hosting virtual desktops for the desktop pool (figure 1).

From VMware View 4.5 onwards, administrators gained the ability to specify a single datastore to host replica disks for the entire vSphere cluster. This feature was originally created to enable a small set of SSDs to provide high-performance access to replica disks (figure 2), at a time when flash SSD storage was very expensive.

 

(Figure 1)

 

(Figure 2)

 

After the replica disk is created, VMware View starts creating linked clones. A linked clone is an empty disk that grows over time according to the data block changes requested by the guest OS. This disk is also called a Delta disk because it accumulates all the delta changes, and it may grow up to the maximum size set for the Parent VM (this is where de-duplication is essential, but that’s a topic for another article). The replica disk is read-only and is used as the primary disk for all deployed desktops; write I/O and data block changes are written to and read from the delta disks.
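To make the read/write split concrete, here is a minimal Python sketch of the copy-on-write behavior described above (purely illustrative; the class, block granularity and sample data are my own, not VMware’s implementation):

    # Illustrative linked-clone I/O: reads fall through to the shared
    # read-only replica unless the block was previously written; writes
    # always land in the per-desktop delta disk.
    class LinkedClone:
        def __init__(self, replica_blocks):
            self.replica = replica_blocks  # read-only base image, shared by all clones
            self.delta = {}                # block_id -> data; grows over time

        def read(self, block_id):
            # Changed blocks come from the delta disk, untouched ones from the replica.
            return self.delta.get(block_id, self.replica[block_id])

        def write(self, block_id, data):
            # All writes are redirected to the delta disk; the replica is never modified.
            self.delta[block_id] = data

    # All clones share one replica; each keeps its own private delta.
    base = {0: b"bootloader", 1: b"windows"}
    vm = LinkedClone(base)
    vm.write(1, b"patched")
    assert vm.read(0) == b"bootloader" and vm.read(1) == b"patched"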

As I mentioned, it used to be a recommended practice to allocate Tier 1 storage for replica disks because all virtual desktops in the cluster use them as the base image. With hundreds or even thousands of desktops reading from the replica disk, the storage solution could quickly become a bottleneck if some form of caching was not used.

 

Enter converged infrastructures

Converged infrastructures provide highly distributed storage, aggregating local HDDs and SSDs to deliver a high-performing scale-out architecture.
 

 

Although most converged storage solutions provide the ability to create multiple datastores, it is common practice to present a single large datastore to all hosts, aggregating the local HDDs and SSDs across every host in the cluster. The picture below depicts what this architecture looks like from a logical perspective.

(Figure: logical view of a single datastore spanning all hosts in the cluster)

While the above is true, at the physical layer things are a little more complex, and every converged storage solution handles data distribution in a different way. However, due to data distribution, high availability, and replication factor requirements, every VM disk will have data blocks distributed across three, four, or more servers in the cluster. The picture below depicts the physical architecture, with data distributed across nodes and disks.

(Figure: physical view of data blocks distributed across nodes and disks)

The problem with such a granular and distributed architecture is that every server needs to go out to the network to gather replica disk data blocks; no single server holds the entire replica disk. This places an enormous burden on the network, increases latency due to network hops, and, most importantly, may clog up the three or four servers that hold the distributed replica disk with requests.
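To see why, consider a toy model of hash-based block placement (entirely my own sketch, with made-up node names and a hypothetical replication factor; real converged solutions each use their own placement logic):

    # Toy model: each block of the replica disk lives on REPLICATION_FACTOR
    # nodes, so many reads issued by any single node need a network hop.
    import hashlib

    NODES = ["node-a", "node-b", "node-c", "node-d"]
    REPLICATION_FACTOR = 2  # hypothetical; actual factors vary per product

    def placement(block_id):
        # Deterministic hash-based placement, purely for illustration.
        h = int(hashlib.md5(str(block_id).encode()).hexdigest(), 16)
        start = h % len(NODES)
        return [NODES[(start + i) % len(NODES)] for i in range(REPLICATION_FACTOR)]

    def read_block(local_node, block_id):
        holders = placement(block_id)
        if local_node in holders:
            return "local"                  # served from local disks
        return f"remote ({holders[0]})"     # extra hop, loads a remote node

    # With 4 nodes and a factor of 2, roughly half of all reads are remote:
    remote = sum(1 for b in range(1000) if read_block("node-a", b).startswith("remote"))
    print(f"{remote / 10:.0f}% of node-a's replica reads crossed the network")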

 

Enter Shadow Clones

Nutanix Shadow Clones provide distributed caching of a particular disk, or of VM data, that is in a ‘multi-reader’ scenario. This works for any multi-reader workload (e.g. deployment servers, software repositories, etc.).

Data or I/O locality is critical for the highest possible VM performance and is a key construct of the Nutanix Distributed File System (NDFS). With Shadow Clones, NDFS monitors disk access trends. When all of the requests are read I/O, the disk, in our case the replica, is marked as immutable. Once the disk has been marked immutable, it can be cached locally by each Nutanix node making read requests to it. In the background, every CVM receives the map of where the immutable disk’s blocks live. If the disk data is local to the node, great; if not, the node automatically retrieves the data without relying on the original CVM maintaining the replica disk, thus eliminating any possible service degradation caused by multiple access requests hitting the original CVM to copy the data.

This method allows the VMs on each node to read the replica disk locally from the Nutanix Extent Cache (SSD and RAM). In the VDI case, this means each node can cache the replica disk, and all read requests are served locally, drastically improving the end-user experience.
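The lifecycle described above (monitor access trends, mark the disk immutable, cache blocks per node on first read, drop the shadows on any write) can be sketched roughly as follows. This is illustrative Python of my own, not Nutanix’s actual NDFS code, and the immutability trigger is simplified to an explicit call:

    # Rough sketch of the Shadow Clones behavior described above.
    class ShadowClonedDisk:
        def __init__(self, owner_node, blocks):
            self.owner = owner_node        # CVM that owns the original vDisk
            self.blocks = dict(blocks)     # authoritative block data
            self.immutable = False         # set once access trends are read-only
            self.local_caches = {}         # node -> {block_id: data} shadow copies

        def mark_immutable(self):
            # NDFS does this automatically when it sees only read I/O;
            # the explicit call here stands in for that heuristic.
            self.immutable = True

        def read(self, node, block_id):
            if not self.immutable:
                return self.blocks[block_id]     # normal path via the owner CVM
            cache = self.local_caches.setdefault(node, {})
            if block_id not in cache:
                # Fetched once on demand, then served locally from SSD/RAM.
                cache[block_id] = self.blocks[block_id]
            return cache[block_id]

        def write(self, block_id, data):
            # Any modification drops all shadow clones; the cycle restarts.
            self.local_caches.clear()
            self.immutable = False
            self.blocks[block_id] = data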

VM data will only be migrated when there is a read I/O request, so as not to flood the network and to allow for efficient cache utilization. In the case where the replica disk is modified, the Shadow Clones are dropped and the process starts over. Shadow Clones are disabled by default and can be enabled (or disabled, by setting the value to false) using the following NCLI command:

ncli cluster edit-params enable-shadow-clones=true

Below is an example of how Shadow Clones work and allow for distributed caching:

(Figure: each node caching the immutable replica disk locally via Shadow Clones)

The architecture above is dramatically simpler and more scalable than the traditional architecture, as the solution can keep scaling out without degrading performance.

Some of the benefits of Shadow Clones are:

  1. Replica disk data is always served locally to the host.
  2. Does not require the use of CBRC (Content Based Read Cache) and is not limited to CBRC’s 2GB of RAM.
  3. Reduced overhead on the storage network (IP network), as read I/O is serviced locally, which ensures the lowest possible latency on the network for both write I/O and virtual machine traffic.
  4. During boot storms, login storms and antivirus scans, all replica data can be served locally and NO read I/O is forced to be served by a single storage controller. This not only improves read performance but frees up I/O for write operations, which are generally >=65% of I/O in VDI environments (see the back-of-envelope sketch after this list).
  5. The solution can scale while maintaining linear performance.
  6. When the base image is updated, Nutanix NDFS detects that the file has been written to and automatically creates a new snapshot, which is replicated out to all nodes.
  7. The feature is enabled once and does not require ongoing configuration or maintenance.
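As a back-of-envelope illustration of point 4, the numbers below are hypothetical (the storm-time IOPS per desktop is an assumption of mine, not a Nutanix sizing figure) and simply show how much read I/O stays local during a boot or login storm:

    # Hypothetical back-of-envelope math for the boot/login-storm benefit.
    desktops = 400           # matches the benchmark below
    iops_per_desktop = 50    # assumed storm-time IOPS per desktop
    read_ratio = 0.35        # the text puts writes at >= 65% in VDI

    reads_kept_local = desktops * iops_per_desktop * read_ratio
    # Without Shadow Clones these ~7,000 read IOPS converge on the few
    # controllers holding the replica; with them, each node serves its own share.
    print(f"~{reads_kept_local:.0f} read IOPS served locally")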

 

Performance and Benchmark

By now you are probably wondering what the real performance improvements are and how they can be measured. Interestingly, some people have suggested that 10Gb networks are very fast and will not add latency, or that data locality is not really necessary. In my view that’s a normal reaction, given that this is a new and disruptive technology that is not available in other converged solutions on the market (I’m pleased Nutanix has a patent on it).

Nutanix has extensively tested Shadow Clones using VDI and server workloads. Here is an example:

 

400 Medium Workload Desktops – Shadow Clones Disabled

  • During the testing with 400 desktops using LoginVSI, VSImax was not reached, with a baseline of 5765ms and an average VSIindex of 4985ms.
  • The weighted response times were consistently below 2000ms, with the Zip High Compression task having the highest response times, as expected.
  • Logon times ranged from ~20-30ms for the first 200 sessions and from ~24-40ms for the second 200 sessions.

 

(Figure: LoginVSI response times with Shadow Clones disabled)

 

400 Medium Workload Desktops – Shadow Clones Enabled

  • During the testing with 400 desktops using LoginVSI, VSImax was not reached, with a baseline of 5676ms and an average VSIindex of 3397ms.
  • The weighted response times were consistently below 1500ms, with the Zip High Compression task having the highest response times, as expected.
  • Logon times ranged from ~20-30ms for all 400 sessions.

(Figure: LoginVSI response times with Shadow Clones enabled)

Data locality with Shadow Clones radically improves the overall performance and end-user experience; in the runs above, the average VSIindex dropped from 4985ms to 3397ms, roughly a 32% improvement. If you are interested in additional details about the performance test and how LoginVSI measures performance, please refer to the Nutanix Citrix XenDesktop on Microsoft Hyper-V reference architecture.

 

Thanks to Dwayne Lessner for reviewing the accuracy of this article.

 

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net


Permanent link to this article: http://myvirtualcloud.net/?p=5979

10 comments


  1. Sven

Those pictures from a LoginVSI test result don’t look good to me. The VSI baseline should be somewhere between 800-2200ms, usually around 1500ms with Windows 7 VDI desktops. Then the VSI index average should increase over time as more sessions are added to the host. Like this:
    http://foonet.be/wp-content/uploads/2013/05/12-login-vsi-40-analyzer-vsimax.png

  2. Dan

I can’t think of many places where a 24-40ms logon time would be considered a bad user experience.

  3. Steven Poitras

@Sven – I agree with your comment; however, I wouldn’t draw full conclusions from the graph, as without the ZHC test the baseline was sub-1000ms. Compression tests push the average up substantially on a CPU-constrained system.

    All of the tests were run in benchmark mode and we didn’t remove ZHC from the tests like some others :)

Also, we were running 400 desktops on a single 2U unit, which will obviously push the CPU substantially.

  4. Lieven

What is the reason why Shadow Clones are disabled by default? Are there any possible negative impacts of enabling Shadow Clones?

  5. Andre Leibovici

    Thanks @Poitras.

@Sven, what is also very interesting is how latency doesn’t creep up as LoginVSI increases the workload. This is a combination of our scale-out, data locality, and RAM and SSD caching technology. This is the linear behavior expected as a Nutanix cluster scales out.

  6. Andre Leibovici

@Lieven, the Shadow Clones feature was introduced in 3.5 and I believe engineering preferred to ship it disabled. There’s no reason why it should not be enabled by default, and with the next NOS release it will be enabled by default.

  7. Thomas Brown

    Once you enable this feature, does Nutanix know which VM to replicate automatically or do we have to tell it to replicate the replica VM?

    Also, you say CBRC is not required but is there a reason why you wouldn’t enable it for this? It seems like 2GB of cache in the RAM would still be faster than local storage if only by a small amount.

  8. Andre Leibovici

@Thomas, yes, there is no administrator interaction; Nutanix automatically creates and manages Shadow Clones.

For your CBRC question I recommend reading ‘CBRC-like Functionality For Any VDI Solution with Nutanix’ http://myvirtualcloud.net/?p=5856 where I explain why the Nutanix Content and Extent caches do a better job than CBRC.

  9. Virtual_blends

ZHC (ZipHighCompression) is a mandatory measurement in LoginVSI, and it is CPU intensive. I agree with Sven: there is no hockey-stick line, and the average and baseline response times are not really ‘impressive’.

    http://www.loginvsi.com/documentation/index.php?title=VSImax

    1. Andre Leibovici

@virtual_blends, if you read Steven Poitras’ response you will see that this test was done with 400 desktops on a single 2U unit; it was therefore not targeting a performance demonstration, but rather the benefits of data locality and Shadow Clones.

Also, I recommend reading “Is Hyper-Converged faster than SAN?” http://myvirtualcloud.net/?p=6315 to understand how hyper-converged differs from a SAN and how it allows linear scalability.

In regards to performance, I will soon blog about, and point you to, a Citrix Validated Design using LoginVSI that will answer all your questions. The real truth is that as CPU performance improves, so does Nutanix performance.
