Aug 30 2015

Danger! Beware of potential Data Loss through Bit Rot…

Not all Hyperconverged Solutions are created equal. It’s all very well for vendors to discuss data resiliency and data recovery, but when they claim to support Tier One workloads yet give no real consideration or protection to customers’ data, there is a problem for the customer. One such example is VSAN. Unfortunately for customers, VSAN does not have the core software-based data protection mechanisms in place to prevent data loss due to bit rot. We all know that a data loss event in any organization is significant, impactful and, in this day and age, not tolerated. In this article I intend to outline the process of bit rot/data loss and the differences in features between Nutanix and VSAN.

Bit rot is the deterioration of the integrity of data stored on storage media; it is also known as data rot or silent corruption. Most disks, disk controllers and file systems are subject to a small degree of unrecoverable failure. With ever-growing disk capacities and data sets, and the increasing amount of data stored on magnetic and flash media, the likelihood of data decay and other forms of uncorrected and undetected data corruption increases.

Different techniques can mitigate the risk of such underlying failures, such as increasing redundancy and implementing integrity checking and self-repairing algorithms. The ZFS file system was designed to address many of these data corruption issues. EMC Isilon OneFS also has a service called MediaScan that periodically checks for and resolves drive bit errors across the cluster. The Nutanix NDFS file system likewise includes data protection and recovery mechanisms.

A NetApp study found that the risk of losing data through bit rot events is thousands of times higher than predicted by “MTBF” failure models.

The problem that bit rot poses to distributed storage systems, where multiple copies of the data exist, is that these systems may write or replicate a bad copy of the data, making all copies unusable. In some cases a good copy of the data could be overwritten with a bad one. There are two main methods to detect and correct bit rot: the first is to perform disk scrubs, which is something every reputable array vendor does; the second involves the use of redundant copies and checksumming to verify data integrity.


“A checksum or hash sum is a small-size datum from a block of digital data for the purpose of detecting errors which may have been introduced during its transmission or storage. It is usually applied to an installation file after it is received from the download server. By themselves checksums are often used to verify data integrity, but should not be relied upon to also verify data authenticity.”
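To make the idea concrete, here is a minimal sketch of checksum-based corruption detection using Python’s standard-library CRC32 (an illustrative example, not the checksum algorithm any particular product uses):

```python
import zlib

def checksum(block: bytes) -> int:
    """Compute a CRC32 checksum for a block of data."""
    return zlib.crc32(block)

data = b"some stored data block"
stored_crc = checksum(data)              # computed and saved at write time

# Later, on read, recompute and compare against the stored value:
assert checksum(data) == stored_crc      # data is intact

corrupted = b"some stored dbta block"    # a single flipped byte
assert checksum(corrupted) != stored_crc # corruption is detected
```

Note that a checksum only detects the corruption; repairing it requires a redundant good copy, which is exactly why distributed systems pair checksums with replicas.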



Every Nutanix node has a process called Stargate that, amongst many other things, is responsible for processing checksums. While data is being written, a checksum is computed and stored as part of its metadata. Any time the data is read, the checksum is recomputed to ensure the data is valid. If the checksum and data don’t match, a replica of the data is read and replaces the invalid copy.
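The read-path repair logic can be sketched as follows. This is illustrative pseudo-logic only, not actual Stargate code; the function and parameter names are my own:

```python
import zlib

def crc(block: bytes) -> int:
    return zlib.crc32(block)

def read_with_repair(replicas: list, index: int, meta_crc: int) -> bytes:
    """Read one replica (a list of bytearrays); if its checksum does not
    match the checksum stored in metadata, repair it from a good copy."""
    data = bytes(replicas[index])
    if crc(data) == meta_crc:
        return data                       # normal case: data is valid
    # Checksum mismatch: locate a valid replica and overwrite the bad copy.
    for i, rep in enumerate(replicas):
        if i != index and crc(bytes(rep)) == meta_crc:
            replicas[index][:] = rep      # replace the invalid copy in place
            return bytes(rep)
    raise IOError("no valid replica found")
```

For example, if replica 1 suffers a flipped bit, a read against it returns the data from the good replica and silently restores replica 1 to a valid state.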

The response from each replica carries the checksums of the updated data block, so each replica can verify that everyone wrote exactly the same data. The Stargate service issuing a WriteOp can then store the resulting checksums within the metadata entry, which permits a disk scrubber to later verify them.


Disk Scrubber (Curator)

Another important service is Curator, which performs continuous, non-deterministic fault recovery. Besides being responsible for data replication, Curator continuously monitors data integrity by verifying checksums of random data groups across the entire cluster.

The disk scrubbing activity runs at low priority across all disks in the cluster. Any corrupted data results in the data replica being marked as bad, triggering re-replication from a good replica. So even if a disk sector goes bad after a successful I/O, Stargate’s scrubber operation will detect it and create new replicas as necessary.
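A single scrub pass can be sketched like this. Again, this is an illustrative model of the technique, not Curator’s implementation; the data structures and names are hypothetical:

```python
import random
import zlib

def scrub_pass(extent_store: dict, metadata: dict, sample: int = 4) -> list:
    """One low-priority scrub pass: verify the checksums of a random sample
    of stored extents and return the ids of any found to be corrupted.
    A real system would then re-replicate those extents from good copies."""
    ids = list(extent_store)
    bad = []
    for eid in random.sample(ids, min(sample, len(ids))):
        if zlib.crc32(extent_store[eid]) != metadata[eid]:
            bad.append(eid)  # mark replica bad; triggers re-replication
    return bad
```

Running such passes continuously at low priority means corruption is caught in the background, before a later read or re-replication could propagate the bad copy.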


Nutanix & VSAN – The Variances

In summary, the Nutanix distributed file system has a number of features that ensure checksums are computed and data integrity is prioritized, guaranteeing that customer data is safe. VSAN in its current release, 6.0, does not yet provide software-based checksums to protect against bit rot, which should be concerning for organizations adopting distributed storage architectures.

In fairness, VSAN does provide limited support for hardware-based checksums, but this depends on the controller being used, and it was difficult to find data on this while searching the web or the HCL. According to some blog posts only two controllers are supported, and the What’s New VSAN 6.0 document has the following mention: “Support for hardware-based checksum – Limited support for controller-based checksums for detecting corruption issues and ensuring data integrity (Refer to VSAN HCL for certified controllers).” I hear that VSAN will be introducing software-based checksums in 2016.

[UPDATE] VSAN 6.1 was announced today (31/8/2015), and still doesn’t provide software-based checksum.

We may be competitors, but data loss prevention, protection and integrity are important considerations for those recommending or purchasing an HCI system, and we should all be clear on what these differences are in order to make informed choices.

Not all Hyperconverged Solutions are created equal!


This article was first published by Andre Leibovici (@andreleibovici) at

Permanent link to this article:

Aug 24 2015

VMware Horizon View on Acropolis Hypervisor

One of the submissions for the Nutanix Coding Challenge is quite interesting from a VDI point of view. The team at EZDC Automation used the self-service engine from vCloud Automation Center (vRA) to automate the creation and customization of desktops using the Acropolis APIs, making them available for use in VMware Horizon View. In this case Horizon View is handling the connection brokering for unmanaged desktops, but both persistent and floating use-cases can be successfully deployed.

While Horizon View ships with VMware vSphere, I can see organizations running mixed environments and placing desktops on different architectures based on cost, SLA and performance requirements.

Openstack + cloud-init provisioning anyone?!





Aug 22 2015

Nutanix All Flash, Only When Required – VM Pinning

Nutanix continues to innovate at an incredible pace, with new features released every few months. All that is possible only because the data and management fabrics are completely independent from the hypervisor, not in-kernel. This enforced detachment has also allowed Nutanix to implement features such as non-disruptive upgrades for the entire stack, including drivers, firmware and the hypervisors themselves. BIOS automated upgrades anyone?!

Anyhow, most hyperconverged solutions utilize a combination of HDDs and SSDs to deliver both capacity and performance for workloads at acceptable costs. However, Nutanix was also the first vendor to deliver All Flash hyperconverged clusters, providing sub-millisecond latency across entire datasets for latency sensitive workloads.

I would actually argue that the large majority of workloads don’t require All Flash clusters, because applications’ active working sets are often small and fit inside the SSDs available in hybrid clusters. Nutanix provides a simple way to identify application working set sizes and the read source tier: RAM, SSD or HDD.

Moreover, by using Data Reduction and Data Avoidance techniques, such as in-line deduplication and VAAI offloads, Nutanix demands even less SSD capacity to host multiple VM datasets on the hot tier. My colleague Josh Odgers has a very good post on Advanced Storage Performance Monitoring with Nutanix.

However, SSD is a more expensive resource and keeping all VMs in SSD at all times is not often economically viable.




All Flash Only When Required






As you have probably seen, Nutanix recently announced a feature that will allow flash pinning even on hybrid nodes. This is not yet released, but when it is, it will allow individual VMs or virtual disks to be pinned to the flash tier.

In a cluster running a SQL database workload alongside other workloads, the database’s large working set may be too large to fit into the hot tier and could potentially spill into the cold tier. For extremely latency sensitive workloads this could seriously affect read/write performance.

The new VM Pinning feature will give administrators the ability to tell Nutanix clusters that a particular disk or VM belongs to a latency-sensitive, mission-critical application, and that its data blocks should never be down-migrated from SSD to the cold tier to free up space in the SSD tier.

The pinning process is non-binding, not requiring the full VM/disk to be pinned to SSD, and will allow administrators to tune how much of a disk should be pinned to the SSD tier.
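Conceptually, pinning changes which data the tiering engine may down-migrate when the flash tier fills up. The sketch below illustrates that decision logic only; the feature is unreleased, so the function, fields and thresholds here are entirely hypothetical, not Nutanix code:

```python
def down_migration_candidates(vdisks: list, ssd_free_pct: float,
                              low_watermark: float = 0.25) -> list:
    """Pick vdisks eligible for SSD-to-HDD down-migration when free SSD
    capacity drops below the low watermark, skipping anything pinned to
    the flash tier. Coldest (least recently accessed) data moves first."""
    if ssd_free_pct >= low_watermark:
        return []  # enough free SSD capacity; nothing needs to move
    unpinned = [v for v in vdisks if not v.get("pinned", False)]
    return sorted(unpinned, key=lambda v: v["last_access"])
```

Under this model, a pinned SQL vdisk is never selected for down-migration even under SSD pressure; space is reclaimed from colder, unpinned data instead.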



Could this be the end of the All Flash?

No, not at all… there are some ginormous workloads out there with immense active working sets and very low latency requirements that definitely require the All Flash treatment; and we have many of them running on Nutanix today, especially in the healthcare and banking verticals. But for the large majority of workloads in mainstream enterprises, the current hybrid approach plus VM pinning will be more than enough to guarantee the required performance and SLAs. I would venture to say that’s it for All Flash solutions when dealing with such workloads.


There’s a lot more to be said about this new feature, and how the reservation process works in tandem with the entire cluster activity. I will soon publish a video demonstration of the feature.


