Disaster Recovery Seeding is a Pain!

Learning new technology is exciting, and I recently came across a Datrium feature that I believe no one else in primary storage delivers today. It solves a cumbersome problem for organizations with large amounts of data that need to be seeded for disaster recovery purposes.

Seeding DR datacenters can be an arduous and lengthy procedure even for the most advanced storage platforms, especially when the amount of data is large relative to the bandwidth available between datacenters. Simply put, all data from site A must travel to site B before snapshot replication can begin and incremental data can start flowing.

In the past, the simplest way to achieve that was to place legacy disk arrays (SAN) side-by-side, replicate the data, physically move the array to the DR site and reinitiate replication from the last snapshot. This method is hardly ideal because it requires transporting large, heavy and delicate equipment.

In the case of hyperconverged (HCI) platforms, this problem is exacerbated because the data lives on many different servers. Seeding an HCI platform this way can require transporting dozens of servers between sites.

Data replication over-the-wire (WAN) is a more popular option, but depending on the amount of data to be replicated and the bandwidth available, the initial seeding process can take days, weeks or even years. Networking vendors introduced de-duplication over-the-wire, which reduces the amount of data transmitted and therefore the time spent on the initial full DR seeding. However, that worked only for small and similar datasets due to the massive amount of metadata and computing necessary to maintain coherence on both sides of the network.

 


 

More recently we started to see storage vendors implementing native de-duplication over-the-wire, replacing specialized WAN deduplication appliances; Datrium also offers the technology.

That said, the amount of data in enterprises has grown exponentially over the last decade, and even with de-duplication over-the-wire, starting the migration of terabytes or petabytes of data is a monumental task even for links with ample bandwidth. A mere 100 TB transmitted over a 200 Mbps WAN takes about 46 days; 1 petabyte takes over a year. AWS, aware of these challenges, created Snowball, a solution that copies your data on-premises onto a device that is then shipped to their datacenters.
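To make the arithmetic behind those figures explicit, here is a small Python sketch. It assumes decimal units (1 TB = 10^12 bytes, 1 Mbps = 10^6 bits per second) and a link that is fully and continuously used, which is optimistic in practice. The function name and the reduction_ratio parameter are mine, for illustration only.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12    # bytes, decimal units (assumption)
PB = 10**15    # bytes, decimal units (assumption)
MBPS = 10**6   # bits per second


def transfer_days(data_bytes: float, link_bps: float, reduction_ratio: float = 1.0) -> float:
    """Days needed to push data_bytes over a link running at link_bps.

    reduction_ratio is a hypothetical data-reduction factor:
    2.0 means only half of the bits actually cross the wire.
    """
    bits_on_wire = data_bytes * 8 / reduction_ratio
    return bits_on_wire / link_bps / SECONDS_PER_DAY


print(round(transfer_days(100 * TB, 200 * MBPS), 1))  # 46.3 days
print(round(transfer_days(1 * PB, 200 * MBPS)))       # 463 days
```

Even a 2:1 reduction on the wire (reduction_ratio=2.0) only halves these numbers, which is why the initial seed is the hard part.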

 

The Datrium Way

With Datrium, data is ALWAYS globally deduplicated, compressed and erasure coded, and blocks are uniquely and logically aggregated. Because of that, the system has a full understanding of which data blocks exist and which are missing on both datacenter sites, primary and DR. Administrators can load any data onto the DR site, even old data from old backups, and Datrium recognizes the differences. From there, the system transfers only the missing or different data, and when the initial seeding is complete, the regular snapshot replication cycle begins.
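The general idea can be illustrated with a toy content-addressed comparison in Python. This is not Datrium's implementation. It is a minimal sketch that assumes fixed-size blocks and SHA-256 fingerprints, and BLOCK_SIZE and the function names are hypothetical.

```python
import hashlib
from typing import Dict

BLOCK_SIZE = 4096  # hypothetical fixed block size, for illustration only


def fingerprints(data: bytes, block_size: int = BLOCK_SIZE) -> Dict[str, bytes]:
    """Map each block's SHA-256 digest to the block contents (duplicates collapse)."""
    blocks = {}
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        blocks[hashlib.sha256(block).hexdigest()] = block
    return blocks


def blocks_to_send(primary: bytes, dr_seed: bytes) -> Dict[str, bytes]:
    """Return only the primary-site blocks whose fingerprints the DR site lacks."""
    already_there = fingerprints(dr_seed).keys()
    return {fp: blk for fp, blk in fingerprints(primary).items()
            if fp not in already_there}


# Primary holds four distinct blocks; the DR site was seeded from a stale copy
# that has the first two blocks plus one block that no longer exists on primary.
primary = b"".join(bytes([i]) * BLOCK_SIZE for i in range(4))
stale_seed = primary[:2 * BLOCK_SIZE] + bytes([9]) * BLOCK_SIZE

missing = blocks_to_send(primary, stale_seed)
print(len(missing))                                # 2
print(sum(len(blk) for blk in missing.values()))   # 8192
```

Because blocks are compared by fingerprint rather than by position or file name, it does not matter how the seed data reached the DR site or how old it is. Only the two blocks the seed lacks would cross the WAN.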

 

Here are the five different ways users can do the initial seeding with Datrium:

 

  1. Side-by-side like “legacy” arrays (or over a fast network link).
  2. Restore from an existing Veeam/Commvault/Cohesity/NetBackup/ARCserve disk backup on a remote/DR site. The backup can even be stale.
  3. Restore from a tape backup on a remote site. The backup could be stale.
  4. Copy from an existing LUN mirror of a legacy array to a Datrium DVX on the DR site. The LUN mirror could also be stale.
  5. Ship a USB drive from the source to the destination.

In all cases, the remote site can be seeded with somewhat stale data, and Datrium replication figures out what’s missing and transfers only that data incrementally. This also means that the primary site does not need to be stopped and, when a backup is already present on the DR site, the primary site is not affected by the initial seeding.

While global deduplication does reduce storage space, there are other benefits too. Technology foundations do matter!

 

Thanks, Sazzala Reddy (@sazzala) and Boris Weissman for review and comments.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

 
