Erasure coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media. Erasure coding is used extensively in data centers because it offers significantly higher reliability than data replication at a much lower storage overhead. Erasure coding is broadly applicable, but it is especially relevant in large clusters that hold mission-critical data and are configured for RF3 resiliency.
Erasure coding has traditionally been implemented using RAID groups on disks; however, those are commonly bottlenecked by a single disk, constrained by disk geometry, and generally waste space on hot spares. Nutanix EC is done across nodes instead of disks, optimizing availability with faster rebuilds and using the entire cluster, through map-reduce processes, to compute block parities.
Nutanix EC is easy to enable: a single click, and the work starts in the background.
How does it work?
Each Nutanix container has a defined replication factor (RF) for data resiliency and availability, either 2 or 3. Once data is cold, the data and its copies are thinned down by computing parity across a set of data blocks. This process runs as a distributed background job.
Since only cold data is erasure coded, hot data remains in the RF state. This is a good thing because if there is a node failure, then hot data is simply read from RF copies elsewhere on the cluster, without any in-flight rebuild penalty.
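The parity idea described above can be illustrated with a minimal sketch. It uses single XOR parity, the simplest erasure code, purely as an illustration; this article does not state which code Nutanix uses, and the function names `xor_parity` and `rebuild_missing` are hypothetical.

```python
from functools import reduce


def xor_parity(blocks):
    """Return the byte-wise XOR of a list of equally sized byte blocks."""
    if not blocks:
        raise ValueError("need at least one block")
    size = len(blocks[0])
    if any(len(b) != size for b in blocks):
        raise ValueError("all blocks must be the same size")
    # zip(*blocks) yields one tuple of byte values per byte position.
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


def rebuild_missing(surviving_blocks, parity):
    """Recover the single missing block from the survivors plus the parity."""
    return xor_parity(list(surviving_blocks) + [parity])


# Illustrative strip: four data blocks, each imagined on a different node.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = xor_parity(data)

# Simulate losing the node that holds the third block.
survivors = data[:2] + data[3:]
recovered = rebuild_missing(survivors, parity)
assert recovered == data[2]
```

With a single parity block, the strip survives the loss of any one block. Tolerating two simultaneous failures, as RF3 does, needs a second, independent parity, which plain XOR cannot provide.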
Important: EC works in concert with deduplication and compression. All three data reduction methods are complementary to each other.
How efficient is Erasure Coding?
The examples below walk you through various configurations of cluster size, the strips possible, and the resulting savings. The purple-gray nodes are nodes that are avoided when creating the erasure strip (parity), so that they can be used for rebuild if a node fails. The EC engine balances capacity savings against the cost and time of a rebuild. Three-node clusters, while technically possible, are not recommended, since no rebuild nodes are available.
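The savings arithmetic can be sketched as follows. This is a rough model under stated assumptions, not the EC engine's actual sizing logic: it assumes a strip needs RF - 1 parity blocks to tolerate the same number of failures as the replication factor it replaces, and the strip widths used are illustrative. The function names are hypothetical.

```python
def ec_overhead(data_blocks, parity_blocks):
    """Raw capacity consumed per unit of usable data for one strip."""
    if data_blocks < 1 or parity_blocks < 0:
        raise ValueError("invalid strip shape")
    return (data_blocks + parity_blocks) / data_blocks


def savings_vs_replication(rf, data_blocks):
    """Fraction of raw capacity saved by EC compared with plain RF copies.

    Assumption: parity_blocks = rf - 1, so the strip tolerates as many
    failures as the replication factor it replaces.
    """
    parity_blocks = rf - 1
    return 1 - ec_overhead(data_blocks, parity_blocks) / rf


# Illustrative strip widths; wider strips save more but cost more to rebuild.
for rf in (2, 3):
    for width in (2, 3, 4):
        overhead = ec_overhead(width, rf - 1)
        saved = savings_vs_replication(rf, width)
        print(f"RF{rf}, {width} data blocks: {overhead:.2f}x raw, {saved:.1%} saved")
```

In this model, a strip of 4 data blocks plus 1 parity consumes 1.25x raw capacity instead of the 2x that RF2 needs, and 4 data blocks plus 2 parities consume 1.5x instead of the 3x that RF3 needs.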
This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net