Dec 19 2014

VMware AppVolumes and Nutanix Shadow Cloning Marriage

My colleague Kees Baggerman published a brilliant post on how Nutanix Shadow Cloning not only improves performance for Horizon Linked Clones and XenDesktop MCS, but also hugely improves performance for VMware AppVolumes. I recommend reading his article here.

In his single VM test Kees Baggerman saw an improvement of almost 360% for Adobe Reader through the use of Shadow Cloning and data locality.

I would like to shed some light on how important these performance improvements are in the context of distributed storage architectures.

The performance improvements seen by Kees were due to Nutanix being able to analyze the IO access pattern for the AppVolume vmdk and determine that the vmdk was serving only read requests. Based on that, Nutanix creates local copies of the vmdk, in RAM or SSD, on each server accessing the AppVolume stack.

If you look at the picture below, where I have a large cluster running VDI with hundreds or thousands of virtual desktops, you will notice that all desktops access the same AppVolume vmdk (e.g. MS Office 2013) hosted and managed by node number 1.

In a distributed storage architecture the vmdk is distributed across servers (the methodology for how the vmdk is distributed varies according to the solution), but there is commonly a single server actively serving the vmdk. In this case server number 1 is actively serving the vmdk.

Every read from a desktop on another node therefore crosses the network to server number 1. These operations can not only saturate the network with unnecessary traffic but also overload the server actively serving the vmdk, throttling its resources.

 

Screen Shot 2014-12-18 at 10.55.34 PM

 

What kills network performance is not latency but congestion. Congestion can come in many forms: microbursts (a 200Mb burst within 10ms, which equates to 20Gbps of traffic on a 10G port for that 10ms, resulting in a large amount of traffic being dropped if the switches do not have enough buffer), or, for example, a misbehaving NIC sending PAUSE frames and slowing down the network.
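To make the microburst arithmetic concrete, here is a minimal Python sketch. It assumes the burst is 200 megabits delivered within 10 ms to a single 10 Gbps port; the function names are my own, purely for illustration.

```python
def burst_rate_gbps(burst_megabits: float, window_ms: float) -> float:
    """Instantaneous rate of a burst delivered within a time window.

    1 Mb per ms is the same as 1 Gb per s, so no scaling factor is needed.
    """
    return burst_megabits / window_ms


def excess_megabits(burst_megabits: float, window_ms: float, port_gbps: float) -> float:
    """Data that arrives faster than the port can drain during the window.

    This excess has to sit in switch buffers, otherwise it is dropped.
    """
    drained_megabits = port_gbps * window_ms  # Gbps * ms = Mb
    return max(0.0, burst_megabits - drained_megabits)


rate = burst_rate_gbps(200, 10)          # equivalent line rate of the burst
excess = excess_megabits(200, 10, 10)    # megabits the 10G port cannot forward in time
print(f"burst rate: {rate} Gbps, excess: {excess} Mb ({excess / 8} MB)")
```

In this example the port can only drain half of the burst during the 10 ms window, so the other half must be buffered or dropped.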

Caching is also extensively used by distributed storage systems. A cache can use SSD or RAM, and it applies ingest and eviction policies according to data access and capacity allocation. Once data is evicted from cache it must be read again from server number 1 over the network when required. VDI environments commonly have a very random IO pattern, causing data eviction to be very frequent, especially if no data de-duplication is available. Performance of virtual desktops suffers when the active dataset can no longer fit in premium storage tiers, and that is how data de-duplication helps performance.
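The effect of eviction on network traffic can be illustrated with a small simulation. This is only a sketch under my own assumptions: a plain LRU policy, uniformly random block reads, and made-up cache and dataset sizes. Real products use more sophisticated ingest and eviction policies.

```python
import random
from collections import OrderedDict


class LRUCache:
    """Toy local cache; every miss stands for a read over the network."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.blocks = OrderedDict()
        self.remote_reads = 0

    def read(self, block_id: int) -> None:
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)   # mark as most recently used
            return
        self.remote_reads += 1                  # miss: fetch from the serving node
        self.blocks[block_id] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)     # evict the least recently used block


def simulate(capacity: int, working_set: int, reads: int, seed: int = 0) -> int:
    """Return the number of remote reads for a random access pattern."""
    rng = random.Random(seed)
    cache = LRUCache(capacity)
    for _ in range(reads):
        cache.read(rng.randrange(working_set))
    return cache.remote_reads


fits = simulate(capacity=1000, working_set=1000, reads=20000)
too_small = simulate(capacity=250, working_set=1000, reads=20000)
print(f"remote reads when the dataset fits: {fits}")
print(f"remote reads when it does not fit:  {too_small}")
```

When the active dataset fits, each block is fetched over the network at most once. When it does not, most reads turn into network reads again.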

Shadow Cloning does not allow data eviction to happen. This approach completely eliminates the need for communication over the network to re-load data. The data is only migrated on read, so as not to flood the network and to allow for efficient cache utilization. If the AppVolume vmdk is modified, the Shadow Clones are dropped and the process starts over.
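The behavior described above can be sketched as a toy model. This is not Nutanix's implementation: the class, the `reader_threshold` trigger and the node names are hypothetical, chosen only to show copy-on-read, no eviction, and dropping the clones on write.

```python
class SharedVdisk:
    """Toy model of a read-only vmdk served by a single owner node."""

    def __init__(self, owner, blocks, reader_threshold=2):
        self.owner = owner
        self.blocks = dict(blocks)                # block_id -> data
        self.reader_threshold = reader_threshold  # assumed trigger, not a product value
        self.remote_readers = set()
        self.clones = {}                          # node -> {block_id: data}
        self.network_reads = 0

    @property
    def cloning_active(self):
        return len(self.remote_readers) >= self.reader_threshold

    def read(self, node, block_id):
        if node == self.owner:
            return self.blocks[block_id]
        self.remote_readers.add(node)
        if self.cloning_active:
            local = self.clones.setdefault(node, {})
            if block_id not in local:
                self.network_reads += 1           # first read: copy the block locally
                local[block_id] = self.blocks[block_id]
            return local[block_id]                # later reads never leave the node
        self.network_reads += 1                   # no clone yet: read from the owner
        return self.blocks[block_id]

    def write(self, block_id, data):
        self.blocks[block_id] = data
        self.clones.clear()                       # clones are dropped on modification
        self.remote_readers.clear()               # and the detection starts over


vdisk = SharedVdisk("node1", {0: b"office-2013"})
for _ in range(3):
    for node in ("node2", "node3"):
        vdisk.read(node, 0)
print("network reads for 6 desktop reads:", vdisk.network_reads)
```

Because the local copies are never evicted in this model, repeated reads of the same block stop generating network traffic once each node holds its own copy.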

Hopefully now you understand how AppVolumes and Shadow Cloning provide better performance and resource utilization. Now, visualize a VDI environment without Shadow Cloning with 1000+ users and the chaos to provide access to a multitude of AppVolumes stacks across the network on a consistent basis with cache ingestions, evictions and the network congestion.

 

In an upcoming article I will introduce you to AppVolumes stack replication to different sites and datacenters. Nutanix is one of the few solutions able to do vmdk-based backups and replication, rather than working only at the virtual machine level.

 

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.

Permanent link to this article: http://myvirtualcloud.net/?p=6825

Dec 18 2014

Nutanix Metro Availability Operations – Part 3

In the first part of this Nutanix Metro Availability series I discussed the datacenter failure recovery operation (failover) to a secondary site. In the second part I talked about the operational procedure to resume normal operations after a successful failover. In this third and last part I discuss the operational procedure to recover an entire datacenter to a new Nutanix cluster in a new site.

If you missed the announcement of NOS 4.1, please refer to All Nutanix 4.1 Features Overview in One Page (Beyond Marketing).

 

Datacenter Failure Recovery to a new Cluster – Operation

The example below follows a datacenter failure recovery (failover) as described in my first two articles.

I had two sites that were replicating distinct containers to each other (bi-directional). After a network or datacenter outage the Metro Availability peering was automatically broken to allow each surviving cluster to operate independently. However, in this case, let's assume that site 1 was completely lost due to flooding or another natural disaster.

In this case site 2 has all the data for site 1, and site 1 is completely down. The administrator decides to move the entire workload belonging to site 1 to site 3. (Please note that the administrator may choose to temporarily run the workload from site 1 in site 2 until it's time to move to site 3.)

Just like the other metro cluster operations, re-establishing operations in a brand new site is just a couple of clicks away. After racking, stacking and configuring the new cluster in site 3, the administrator needs to establish a connection between sites 2 and 3. This can easily be done via the PRISM UI.

The first picture demonstrates the scenario described, where site 2 is hosting the workload from site 1 (blue) and its own workload (green). Now data must be migrated to a completely new cluster in a new site (1).

 

(1)

Screen Shot 2014-12-18 at 9.30.03 PM

The next step is to enable replication of the container (blue) between sites 2 and 3 (2).

(2)

Screen Shot 2014-12-18 at 9.30.23 PM

The replication, if bidirectional, will start to synchronize the data in the container (3) between both sites. Please note that Metro Availability replication works in conjunction with all NOS data management features such as compression, de-duplication, shadow clones, automated tiering and others. Metro Availability also offers compression over the wire to reduce the amount of bandwidth required.

(3)

Screen Shot 2014-12-18 at 9.30.32 PM

 

Once the replication is complete, the next step is to promote the blue container (4) in site 3. The container promotion tells the Nutanix cluster in site 3 to now run the virtual machines that belonged to site 1. When that is done the virtual machines will automatically restart on the new cluster (5) and operations will be resumed. The promotion step is a one-click manual procedure, but it can also be fully automated with some basic scripting or run-book automation tools. Please note that this scenario assumes stretch clusters and stretch VLANs are in use.

(4)

Screen Shot 2014-12-18 at 9.30.41 PM

(5)

Screen Shot 2014-12-18 at 9.31.04 PM
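For readers who want to script the procedure, the ordering of the steps above can be captured in a small run-book skeleton. This is a sketch only: the step names are my own labels, and the actions are placeholder callables. No Nutanix API or CLI call is shown or implied; each placeholder would have to be replaced with whatever automation tooling is available in your environment.

```python
from enum import Enum


class Step(Enum):
    """Steps of the recovery, in the order described in this article."""
    CONNECT_SITES = 1        # establish the connection between sites 2 and 3
    ENABLE_REPLICATION = 2   # enable replication of the blue container
    WAIT_FOR_SYNC = 3        # wait until the container is synchronized
    PROMOTE_CONTAINER = 4    # promote the blue container in site 3
    VERIFY_VMS_RUNNING = 5   # confirm the virtual machines restarted


class Runbook:
    """Runs one callable per step and stops at the first failure."""

    def __init__(self, actions):
        missing = [step.name for step in Step if step not in actions]
        if missing:
            raise ValueError(f"no action defined for: {missing}")
        self.actions = actions
        self.completed = []

    def run(self):
        for step in Step:  # Enum iteration follows definition order
            if not self.actions[step]():
                raise RuntimeError(f"{step.name} failed; later steps were not run")
            self.completed.append(step)
        return self.completed


# Hypothetical placeholders: each returns True to signal success.
actions = {step: (lambda: True) for step in Step}
done = Runbook(actions).run()
print([step.name for step in done])
```

The point of the skeleton is that promotion never runs unless synchronization has been confirmed first.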

 

I have been in the technology and infrastructure space for a long time and have managed very large datacenters. I have never seen a solution that allows efficient data migration with failover and failback operations in such a simple and elegant manner.

If you are interested, read the first two parts of this series: Nutanix Metro Availability Operations – Part 1 and Nutanix Metro Availability Operations – Part 2.

 

Permanent link to this article: http://myvirtualcloud.net/?p=6816

Dec 10 2014

Nutanix Data-at-Rest Encryption Demo Video

Nutanix clusters are deployed in a variety of customer environments requiring different levels of security, including sensitive/classified environments. These customers typically harden IT products deployed in their datacenters based on very specific guidelines, and are mandated to procure products that have obtained industry standard certifications.

Data-at-rest encryption is one such key criterion that customers use to evaluate a product when procuring IT solutions to meet their project requirements. Nutanix data-at-rest encryption satisfies regulatory requirements for government agencies, banking, financial, healthcare and other G2000 enterprise customers who consider data security products and solutions.

The data-at-rest encryption feature is being released with NOS 4.1. It allows Nutanix customers to encrypt storage using a strong encryption algorithm and only allows access to this data (decryption) when presented with the correct credentials, in compliance with regulatory requirements for data-at-rest encryption. Nutanix data-at-rest encryption leverages FIPS 140-2 Level-2 validated self-encrypting drives, and it is future-proof because it uses the open standard protocols KMIP and TCG.

The video below demonstrates how easy and simple it is to enable and manage Nutanix Encryption.

 

 

For more information read Simply Secure by Amit Jain, or my article New Nutanix Data-at-Rest Encryption (How it works).

 

Permanent link to this article: http://myvirtualcloud.net/?p=6812
