Sooner or Later, Storage Drives, Disk or SSD, Will Fail!

Storage arrays and HCI systems come in all shapes and sizes, with vastly different architectures tailored to a variety of use cases. One thing they all have in common is that, sooner or later, their storage drives, whether disks or SSDs, will fail. Entire drives may become unavailable, or they may silently lose a few sectors to what are known as Latent Sector Errors (LSEs). LSEs and silent corruptions injected by faulty hardware or software are increasingly common despite built-in drive ECC.

A storage system must remain available and continue to serve data in the face of drive failures and LSEs. In our view, any enterprise storage system must tolerate the failure of at least two drives. In practice, this most often manifests as the failure of one entire drive plus an LSE on a second drive, discovered during the rebuild of the first.
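
To get a feel for why an LSE during a rebuild is a realistic scenario, here is a rough back-of-the-envelope sketch. The numbers are assumptions for illustration, not figures from this article: an unrecoverable read error rate of 1 per 10^14 bits (a value often quoted on hard disk datasheets), a hypothetical rebuild that must read seven surviving 10 TB drives in full, and errors that are independent of each other, which real drives do not strictly obey.

```python
import math


def prob_unrecoverable_read(bytes_read: float, errors_per_bit: float) -> float:
    """Probability of hitting at least one unrecoverable read error
    while reading `bytes_read` bytes, assuming independent bit errors."""
    bits = bytes_read * 8
    # 1 - (1 - p)^bits, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits * math.log1p(-errors_per_bit))


# Hypothetical rebuild scenario (assumed values, not from the article).
SURVIVING_DRIVES = 7
DRIVE_BYTES = 10e12          # 10 TB per drive
ERRORS_PER_BIT = 1e-14       # assumed datasheet-style error rate

rebuild_risk = prob_unrecoverable_read(SURVIVING_DRIVES * DRIVE_BYTES, ERRORS_PER_BIT)
print(f"Chance of at least one unreadable sector during rebuild: {rebuild_risk:.1%}")
```

Under these assumed figures the rebuild is very likely to run into at least one unreadable sector, which is why protection against only a single failure leaves data exposed during a rebuild.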

Datrium DVX relies on Erasure Coding (EC) during normal operation and exposes no controls to disable EC or change the level of data protection. It does so without sacrificing performance for workloads heavy on small random writes and overwrites.
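
For readers new to the term, the simplest possible erasure code is single XOR parity, sketched below. This is only an illustration of the general idea and not a description of the DVX implementation: single parity survives the loss of just one block, whereas tolerating two failures requires a second, independent parity (for example a Reed-Solomon style code). The block contents and the helper name `xor_blocks` are made up for the example.

```python
from functools import reduce


def xor_blocks(blocks):
    """Return the byte-wise XOR of a list of equally sized byte blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three data blocks, as if striped across three drives (made-up contents).
data = [b"AAAA", b"BBBB", b"CCCC"]

# The parity block is stored on a fourth drive.
parity = xor_blocks(data)

# Simulate losing the second drive: only blocks 0 and 2 plus parity survive.
survivors = [data[0], data[2], parity]

# XOR of all survivors reproduces the missing block.
rebuilt = xor_blocks(survivors)
print(rebuilt == data[1])
```

If a second block in the same stripe were unreadable at the same time, this single-parity scheme could not recover either one, which is the gap that dual-failure protection is meant to close.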

The technical paper below describes the data protection modes in detail and explains how DVX, with built-in Erasure Coding, achieves 1.8M IOPS for 4K random writes on a system configured with 10 hard disk-based Data Nodes, which exceeds the performance of most all-flash arrays.

This article was first published by Andre Leibovici (@andreleibovici)

