HCI or Not? Understand the Datrium solution in a 3-minute read

As a newer market player with a game-changing architecture that accelerates applications, simplifies infrastructure management, and combines data protection in a single solution, I am frequently asked what precisely Datrium does, and whether Datrium is HCI or not.

Is Datrium HCI or Not? A duality dilemma.

If you view HCI as the physical bundling of compute and storage, and sometimes networking (NFV), in a single box, maybe it's not. However, if you consider HCI to be the seamless, delightful operating experience for data centers that mirrors the public cloud in traits such as pay-as-you-go and simplicity, then yes.

The reality is that Gartner's Integrated Systems Magic Quadrant no longer exists; Gartner created a new Magic Quadrant for HCI that encompasses all systems that behave like the public cloud. I have started to use 'Legacy HCI' to designate the first HCI systems to hit the market, which operate as hardware building blocks, but we are now seeing newcomers, not only Datrium, with modern architectures that provide comparable or better benefits.

This HCI, or Convergence, journey is only getting started, and we have now entered its second generation. We call ourselves OCI (Open Convergence), but Gartner will still place us in the HCI bucket.

What does Datrium do?

The simplest way to describe the extensive range of services provided by the Datrium DVX is to follow the high-level architecture here.

1 – Datrium is tier-1 AllFlash primary storage that leverages concepts similar to HCI: flash sits close to the CPU bus, and data locality ensures the data used by an application is always located on the server where the app is running. Application read and write IOs land on local flash, and all applications benefit from enterprise data services such as inline deduplication, compression, erasure coding, encryption, replication, and snapshots. This storage tier delivers up to 18 million IOPS (4 KB) and 256 GB/s of random-write throughput, scaling from 1 all the way up to 128 hosts.

2 – This tier-1 AllFlash primary storage also provides a single cross-platform namespace that can be used by VMware, KVM, bare-metal containers, and bare-metal applications.

3 – Datrium empowers organizations to leverage their existing investments as part of the solution, allowing both new and existing servers, rack or blade. The only real requirement is that the hypervisor of choice supports the server.

4 – Datrium provides integrated scale-out data protection that is cost-optimized for data retention, where archived data is checked for integrity four times a day. From an IO path perspective, all application writes completed on the AllFlash primary storage are copied and protected with erasure coding in this scale-out data pool, inline deduplication eliminates duplicate data, and snapshot restores to the AllFlash primary storage are instantaneous. This data protection tier scales from 1 up to 10 nodes, allowing up to 1.7 petabytes of usable storage capacity. (I sketch this write path in code right after this walkthrough.)

5 – Datrium also provides native asynchronous replication to a second site or data center. This replication is fully deduplicated and can be set up site-to-site across multiple sources and destinations.

6 – Cloud DVX is a cloud-native solution that provides seamless data archiving from on-premise Datrium to AWS. Thanks to Datrium's universal deduplication, data from multiple applications and sites is stored only once on AWS, and Datrium uses incremental-forever replication. This approach eliminates most of the AWS storage cost, as well as the costly egress traffic for retrieving data back on-prem when required. Cloud DVX is also set to become the multi-site management user interface and the native disaster recovery orchestrator and witness.

7 – Cloud DVX is also the source repository that enables fully orchestrated disaster recovery to VMware Cloud on AWS (VMC). The integrated solution also provides fully deduplicated, orchestrated replication from VMC back to Cloud DVX and then back on-prem. VMware has announced VMC for GCP, Azure, and AWS Gov, and Datrium customers automatically get to utilize the new clouds' services. {Demos at VMworld'18 and Availability later on}

8 – Not strictly part of the product architecture, but together with Red Hat and Pivotal, Datrium allows customers to embrace Kubernetes and containers across multiple virtualization platforms and bare metal, while also leveraging resources from the most prolific cloud services.
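Before moving on, here is a minimal Python sketch of the write path described in item 4. It is my own illustration, not Datrium code: writes are acknowledged from host-local flash, blocks are fingerprinted, and only blocks the data pool has not seen before are copied into it; in the real system that copy is what gets erasure coded.

```python
# Illustration only: fingerprint-based dedup between host flash and the data pool.
import hashlib

BLOCK_SIZE = 4096        # hypothetical block size for the example

local_flash = {}         # host-side cache: fingerprint -> block (ephemeral)
data_pool = {}           # durable scale-out pool: fingerprint -> block (deduplicated)

def write(payload: bytes) -> None:
    """Ack the write from local flash, then copy only unseen blocks to the pool."""
    for i in range(0, len(payload), BLOCK_SIZE):
        block = payload[i:i + BLOCK_SIZE]
        fp = hashlib.sha256(block).hexdigest()
        local_flash[fp] = block              # IO lands on local flash first
        if fp not in data_pool:              # inline dedup: skip known blocks
            data_pool[fp] = block            # real system erasure codes this copy

write(b"A" * 4096 + b"B" * 4096)
write(b"B" * 4096 + b"C" * 4096)             # the "B" block is a duplicate
print(f"{len(data_pool)} unique blocks reached the data pool")   # prints 3, not 4
```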

There are other product nuances that I could discuss in a deep dive, such as the unique end-to-end FIPS encryption, or the amazingly low-priced parts used to achieve performance that other vendors cannot match even with high-end NVMe and latest-generation servers; but I'll leave that for a future chapter.

 

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

 

Hands Down the Ultimate VDI Platform – 2018 Review

Back in 2014, when I left VMware and joined Nutanix, I wrote two articles entitled "Hands Down the Ultimate VDI Platform" (here and here) explaining how hyperconvergence, and more specifically Nutanix, was the perfect platform for running VDI – and in all honesty, that was one of the primary reasons I left the VMware EUC CTO Office. I saw the opportunity to help thousands of organizations struggling with VDI deployments… and I was right: hyperconvergence is pretty much the default deployment model for VDI nowadays.

 

“We are products of our past, but we don’t have to be prisoners of it.”
Rick Warren

 

The hyperconvergence capabilities that radically changed how VDI is deployed included the ability to efficiently scale out deployments and seamlessly ramp up from dozens to hundreds and thousands of users, data locality that lets virtual desktops work primarily from host-attached flash, data services that eliminate duplicate data across hundreds or thousands of virtual desktops, and VM-level replication for persistent desktops.

That was in 2014, and now we are in 2018! It's amazing how new technologies can be created and proliferate in just a few years. Five years ago you were not playing AR games like Pokemon Go, and rockets were not landing vertically.

Today, I am convinced that the technology provided by Datrium is the best solution for on-premise VDI deployments, and I would like to explain what makes it Hands Down the Ultimate VDI Platform. Moving ahead, I may refer to Datrium as OCI (Open Convergence Infrastructure).

 

  1. OCI provides comparable HCI capabilities:

 

  • Scale-Out (Pay as You Go)
  • Data Locality with flash for performance
  • Data Services (Inline deduplication, compression, and erasure coding)
  • VM level replication for persistent desktops

 

2. Open Convergence drastically improves upon and extends HCI capabilities:

 

  • New and Existing Servers

HCI commonly specifies the precise server brand, model, and configuration that must be purchased, making the rollout of virtual desktops always a greenfield initiative, with the associated costs of acquiring new hardware.

OCI does not impose hardware limitations: new and existing servers, rack or blade, can be used as part of the deployment. Moreover, because there is no East-West traffic between servers, older hardware generations do not drag down the performance of newer hardware, unlike with HCI. This approach also directly benefits the Return on Investment and Total Cost of Ownership of the overall solution, making VDI more accessible than ever before.

 

  • Stateful vs. Stateless

HCI protects desktops with copies of the virtual desktop data across servers, creating East-West traffic, and data is persistently stored on each server. For large deployments, due to the high networking traffic across servers, the network may need to be upgraded to a Spine-and-Leaf architecture.

The fundamental premise of OCI is that servers are stateless and data is not persistently stored on them, making all data on host flash ephemeral. A server failure doesn't create data resiliency problems, unlike with HCI.

When a server fails, the data required by its virtual desktops is instantly loaded onto the new target host's flash from a highly scalable persistent data pool. Furthermore, universal deduplication and crypto-hashing ensure that common data across virtual desktops (Windows binaries, application files, etc.) often already exists on the new target host, removing the need to transfer that data in many cases. Because all virtual desktop data is fully deduplicated in the persistent data pool, 1,000 virtual desktops look like a single image from a persistent storage capacity perspective.
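To make the failover behavior concrete, here is a tiny sketch (invented names, not Datrium internals) of the fingerprint comparison that decides which blocks actually have to be read from the data pool when a desktop lands on a new host.

```python
# Illustration only (invented names): decide which blocks must come from the
# persistent data pool when a desktop restarts on a different host.
def blocks_to_fetch(desktop_fingerprints: set, target_host_flash: set) -> set:
    """Fingerprints the target host's flash does not already hold."""
    return desktop_fingerprints - target_host_flash

desktop = {"win_kernel", "office_bin", "user_doc_1", "user_doc_2"}
target_flash = {"win_kernel", "office_bin"}       # shared OS/app blocks already cached
print(blocks_to_fetch(desktop, target_flash))     # only user-specific blocks move
```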

 

  • Multi-Dimensional Growth (Performance vs. Capacity)

VDI workloads are IOPS-heavy, but those IOPS can be delivered by pretty much any HCI vendor using locally attached flash; I don't believe VDI performance is an issue for any vendor today. However, to scale the number of virtual desktops, servers must be added to the cluster, and with each new server comes a load of storage capacity, because HCI vendors specify the hardware vendor, model, and configuration – generally speaking, a combination of flash and HDDs.

Unlike HCI, OCI enables capacity and performance to grow along different dimensions. If performance is needed, add a new server with one or two flash devices; if capacity is required, add another data node – up to 10 data nodes and up to 1.7 petabytes of usable capacity.

It makes no sense to keep adding storage capacity that will never be used by the VDI solution. In some cases, a portion of this extra capacity can be used for user profiles, but it is still overkill.

 

 

  • Non-Persistent, Floating and Instant Clones

When it comes to non-persistent, floating, and instant clones, it makes no sense to run them on servers with persistent storage, replication factors, and RAID overhead.

Unlike HCI, OCI stores only ephemeral data on hosts, without the overhead of RAID or mirroring. Only the virtual desktop data that is unique across all virtual desktops in the deployment is then erasure coded (with N+2 parity, giving resilience equivalent to triple mirroring) into the persistent data pool. As you can imagine, across hundreds or thousands of virtual desktops the data commonality is exceptionally high, and this trait also enables a drastic reduction in the amount of data sent over the network.
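To see why the commonality is so high, here is a toy Python illustration (the numbers and image layout are made up) that hashes the blocks of many clone "images" differing only in a small unique region, then compares logical versus unique block counts.

```python
# Toy illustration: clone images that differ only in a small unique region.
import hashlib

BLOCK = 4096
base_image = bytes(64 * BLOCK)   # shared OS/application content (all zeros here)

def image_block_hashes(desktop_id: int) -> list:
    unique_tail = str(desktop_id).encode().ljust(BLOCK, b".")   # per-desktop delta
    data = base_image + unique_tail
    return [hashlib.sha256(data[i:i + BLOCK]).digest()
            for i in range(0, len(data), BLOCK)]

logical, unique = 0, set()
for desktop_id in range(100):                    # 100 non-persistent clones
    hashes = image_block_hashes(desktop_id)
    logical += len(hashes)
    unique.update(hashes)

print(f"logical blocks: {logical}, unique blocks: {len(unique)}, "
      f"reduction ~{logical / len(unique):.0f}x")
```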

 

  • Persistent Desktops

Many organizations still like to have some of their desktops be persistent across users and sessions.

To be cost-effective, vendors recommend that HCI clusters be configured with two-way mirroring instead of three-way mirroring – a level of failure tolerance comparable to RAID 5 – yet no enterprise that is serious about its data should be using RAID 5 in 2018. According to Wikipedia, Dell posted an advisory against the use of RAID 5 back in 2012.

Unlike HCI, OCI always protects virtual desktops with erasure coding (N+2 parity stripes, with resilience equivalent to distributed three-way mirroring), keeping a backup (or copy) of all deduplicated data in a scalable data pool while the host running the virtual desktop holds just an ephemeral copy. Just as with the non-persistent model, only virtual desktop data that is unique across all virtual desktops is persistently stored in the data pool – universal deduplication operates across hosts and the data pool.

 

  • Scalability

While HCI enabled the scale-out approach for virtual desktops, solving the SAN scalability issues, most HCI vendors recommend a maximum of 16-24 hosts in a single cluster. The outcome is significant management overhead across multiple clusters. Sometimes this management can be aggregated in a user interface that combines the deployment view, but the clusters still need to be managed independently.

Datrium has been externally validated by IOmark with 128 servers and ten data nodes, delivering up to 18M IOPS, 256 GB/s of random-write throughput, and 1.7 petabytes of usable capacity (deduplication and compression are assumed here, but virtual desktops generally have high data-avoidance and deduplication ratios).
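For context, here is some quick back-of-the-envelope math on those figures; the per-desktop IOPS number is purely my own illustrative assumption, not part of the IOmark benchmark.

```python
# Back-of-the-envelope math on the IOmark figures; the per-desktop IOPS number
# is an illustrative assumption of mine, not part of the benchmark.
hosts, data_nodes = 128, 10
total_iops, usable_capacity_pb = 18_000_000, 1.7

iops_per_host = total_iops / hosts                           # ~140K IOPS per host
usable_tb_per_node = usable_capacity_pb * 1000 / data_nodes  # ~170 TB per data node

assumed_iops_per_desktop = 10        # hypothetical steady-state sizing assumption
desktops_at_that_rate = total_iops / assumed_iops_per_desktop

print(f"{iops_per_host:,.0f} IOPS per host, {usable_tb_per_node:.0f} TB usable per data node")
print(f"~{desktops_at_that_rate:,.0f} desktops at {assumed_iops_per_desktop} IOPS each (illustrative)")
```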

Another interesting point is that Datrium will support multiple VMware vSphere clusters as part of the same Datrium domain, presenting a single namespace across VMware clusters.

 

 

(*) There is a caveat that I would like to share: today Datrium supports the VMware and KVM (CentOS and Red Hat) hypervisors, so if you are looking for a different hypervisor solution, we cannot help you today.

 

Conclusion

I have been in the End-User Computing and storage industry for a long time, and I don't take my credibility lightly. I am the first to say that hyperconvergence made a lot of sense and truly helped many organizations. However, when new technologies come to light and start to hit mass adoption, we all must ensure that we are not prisoners of the past and look for alternatives that take us to the future.

Things can change, and maybe in another five or ten years there will be another excellent solution that builds atop OCI and makes things even more awesome. Today, and for the foreseeable future, I am convinced that Datrium and OCI offer the best solution for running virtual desktops, be it Horizon or Citrix, persistent or non-persistent, VDI or session hosts.

 

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net

 

 

 

 

Breaking the Data Gravity Hypothesis… The Data Anti-Gravity

Data Gravity is the term used to describe the hypothesis that Data, like a planet, has mass, and that applications and services are naturally attracted to it – the same effect Gravity has on objects around a planet.

Dave McCrory first coined {link} the term Data Gravity to explain that, because of Latency and Throughput constraints, applications and services will, or should, always execute in proximity to Data — “Latency and Throughput, which act as the accelerators in continuing a stronger and stronger reliance or pull on each other.”

The theory is also used to describe how growing Data mass restrains applications from moving to or from private and public clouds. What some also allude to is that inertia causes the real data mobility problem, and that transferring large amounts of data is still hard because of the speed of light.

Data Gravity – in the Clouds – by McCrory

 

Whichever way the hypothesis is applied, the truth is often in the eye of the beholder; most commonly, a sound technology either has not been developed or has not yet been used in a way that breaks the hypothesis.

This Data Gravity theory is highly applicable to the evolution of datacenters and clouds, both private and public. The rise of host-attached flash devices, and the ability to use them on local compute buses rather than over a network, is a clear indication that applications benefit from proximity to Data.

However, when it comes to application and system mobility, we are still bound by Latency and Throughput, making such data movement hard, particularly when addressing vast Data Lakes. McCrory also determined the key factors preventing Data Gravity mitigation, including Network Bandwidth, Network Latency, Data Contention vs. Data Distribution, Volume of Data vs. Processing, Data Governance/Laws/Security/Provenance, Metadata Creation/Accumulation/Context, and Failure State.

In the case of data movement between clouds, the real puzzle is how to distill and reduce Data to its most essential fundament: a sequence of bits and bytes that never repeat themselves. Also known as Data Deduplication, this technology has been around for many years, but it has always been used in a self-contained manner – data is deduplicated within a container, a drive, a host, a cluster, or on the wire.

If it were possible to deduplicate application data at a global level – across datacenters, across clouds, across Data Lakes, and across systems – we would guarantee a very high level of data availability in every part of the globe, because data would become ubiquitous and universal.

 

How does that work in practice?

An application running in a private datacenter has each data block deduplicated and hashed locally, creating a unique fingerprint. These fingerprints are then compared to the hashes already available on AWS, whether from this same system or from all systems running on AWS across all customers, which are likewise uniquely hashed and fingerprinted. Only the outstanding, unique data is then transferred before the application is migrated from on-premise to AWS, in a fraction of the time and bandwidth required today with traditional mechanics.
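Here is a minimal Python sketch of that flow, with invented names. It is not Datrium's implementation; the "cloud index" simply stands in for the set of hashes already present at the destination.

```python
# Minimal sketch of fingerprint comparison before a migration (invented names).
import hashlib

BLOCK = 4096

def fingerprints(data: bytes) -> dict:
    """Map each block's SHA-256 fingerprint to its payload."""
    return {hashlib.sha256(data[i:i + BLOCK]).hexdigest(): data[i:i + BLOCK]
            for i in range(0, len(data), BLOCK)}

def migrate(app_data: bytes, cloud_index: set) -> int:
    """Ship only the blocks whose fingerprints the destination does not have."""
    local = fingerprints(app_data)
    missing = set(local) - cloud_index
    cloud_index.update(missing)       # stand-in for the actual block transfer
    return len(missing)

cloud_index = set(fingerprints(bytes(32 * BLOCK)))   # destination already holds the common data
app_data = bytes(32 * BLOCK) + b"unique".ljust(BLOCK, b"\0")
print("blocks actually transferred:", migrate(app_data, cloud_index))   # 1
```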

Universal deduplication makes data ubiquitous and universal, common to every possible application and system, while Metadata takes on a vital role: building datasets, enforcing policies, and governing distribution. Metadata is what distinguishes my Data Lake from your Data Lake – my system from your system.

Universal Deduplication solves the puzzle and most of the key issues described by McCrory. The bigger the pool of deduplicated data available in a given location (AWS, Azure, on-premise), the less bandwidth is required, because most of the necessary data is already there. Data Contention and Distribution issues are gone because data is ubiquitous and common to all systems, while Metadata starts playing a vital role. Data Governance becomes a Metadata intricacy, not a data problem. Encryption and Data Security become concerns at the Metadata level, not at the Data level.

While a further in-depth discussion on the matter is warranted, it is clear that unless we look at the problem with different eyes, we will not solve the exponential data growth and data mobility problems. Universal deduplication is a sound way to address the Data Gravity hypothesis. The Data Anti-Gravity.

 

How does that relate to Datrium?

On a smaller scale, that is one of the critical issues Datrium is solving for enterprise data center workloads, especially Virtual Machines and Stateful Containers. In a Datrium solution, data is universally deduplicated across drives, hosts, systems, clusters, links, multi-datacenter deployments, and AWS – for a given customer or enterprise.

A simple example: an IT admin may load stale, legacy data from a backup onto a secondary or DR site – data that was not backed up from the same system, the same dataset, or even with the same backup tool – and the primary site will never re-send data that is already on the destination system. No checkpoints are required, no pre-synchronization – it is a simple universal comparison of metadata hashes and fingerprints at the most basic level.

Another example: when archiving data to AWS S3, or restoring it, only data with unique hash fingerprints is sent, reducing the amount of storage and bandwidth required and, more importantly, addressing another meaningful issue – the artificial Data Gravity created by AWS through its high egress cost for data.
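As a conceptual sketch of content-addressed archiving (not how Cloud DVX is actually implemented; the bucket name and key layout are hypothetical), the block fingerprint can serve as the S3 object key, so a block that any earlier archive already stored is never uploaded again.

```python
# Conceptual sketch, not Cloud DVX internals: the fingerprint is the S3 object
# key, so a block any earlier archive already stored is never uploaded again.
# Bucket name and key layout are hypothetical; requires boto3 and AWS credentials.
import hashlib
import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")
BUCKET = "example-dedup-archive"          # hypothetical bucket

def archive_block(block: bytes) -> bool:
    """Upload the block only if its fingerprint is not already archived."""
    key = "blocks/" + hashlib.sha256(block).hexdigest()
    try:
        s3.head_object(Bucket=BUCKET, Key=key)
        return False                      # already archived: skip upload and egress
    except ClientError as err:
        if err.response["Error"]["Code"] not in ("404", "NoSuchKey"):
            raise
    s3.put_object(Bucket=BUCKET, Key=key, Body=block)
    return True

print("uploaded" if archive_block(b"\x00" * 4096) else "skipped (already archived)")
```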

 

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net
