In the Hybrid Cloud, datasets must follow applications, not the other way around… and many are doing it wrong.

The definition of Hybrid Cloud may differ by implementation type or vendor, but a common description is that a Hybrid Cloud connects two discrete computing platforms sharing the ability to execute and burst the same set of applications and workloads.

These disparate computing platforms typically provide different consumption models (buy vs. rent) that should be chosen based on application and business requirements across service level agreements, performance, availability, cost, geolocation, CAPEX vs. OPEX, and so on.

From a hybrid cloud architectural perspective, the IT team manages the on-prem components (typically the organization’s data centers) while third-party hosting service providers manage the off-prem components (typically cloud-based data centers). This delineation is abstracted away at the fabric level (compute/memory/storage/networking), where deployments behave as if they were a single unified platform or OS. Some authors describe this as the beginning of a Cloud OS.

Whereas computing and memory are largely addressed with conventional resource scheduling (DRS) and live migration techniques, storage properties such as capacity and performance have not been entirely addressed in a Hybrid Cloud context.

Unless applications are developed from the ground-up for high latency and low network bandwidth resiliency or are built for eventual data consistency, storage datasets will frequently need to reside in proximity to computing and applications.


“Data is heavy, and its location governs the optimal location of applications that need it in a hybrid cloud environment.” – Bert Latamore (SiliconANGLE)

One method of overcoming the dataset migration puzzle is to replicate the entire dataset, or part of it, upon an application or virtual machine migration event, frequently gigabytes of data and in some cases terabytes. The disadvantage of this ad-hoc replication is that it demands both long transfer times and large bandwidth bursts. The application can only complete its transition to the destination computing environment once all data has been fully replicated.
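To put the cost of ad-hoc replication in perspective, here is a back-of-the-envelope estimate in Python. The function name and the 70% link-efficiency figure are illustrative assumptions, not measured values:

```python
def replication_time_seconds(dataset_bytes: float, link_mbps: float,
                             efficiency: float = 0.7) -> float:
    """Estimate the time to copy a dataset over a WAN link.

    `efficiency` accounts for protocol overhead and link contention;
    0.7 is an illustrative assumption, not a measured value.
    """
    effective_bytes_per_sec = link_mbps * 1_000_000 / 8 * efficiency
    return dataset_bytes / effective_bytes_per_sec

# A 2 TB dataset over a 1 Gbps link at 70% efficiency
# takes on the order of six hours before the migration can complete.
hours = replication_time_seconds(2e12, 1000) / 3600
```

At those timescales, an application waiting on full replication is effectively pinned to its source site for the better part of a working day.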

Pre-configuring synchronous or asynchronous data replication to the destination target or cloud tenant is a preemptive way to mitigate these migration challenges. Snapshots complete the data synchronization while compute and memory are live-migrated over the network. This technique provides a much faster migration route, but replication must already be in place before the application migration event.
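A minimal sketch of the snapshot-delta idea, assuming block-level diffing for simplicity (real products track changes rather than comparing full block maps):

```python
def incremental_sync(source_blocks: dict, replica_blocks: dict) -> dict:
    """Ship only the blocks that changed since the last snapshot.

    A minimal sketch of asynchronous, snapshot-based replication:
    after a full baseline copy, each sync transfers only the delta,
    so the final cutover at migration time is small and fast.
    """
    delta = {k: v for k, v in source_blocks.items()
             if replica_blocks.get(k) != v}
    replica_blocks.update(delta)
    return delta

# Baseline sync ships everything; the next sync ships one changed block.
replica = {}
incremental_sync({"b0": "AAAA", "b1": "BBBB"}, replica)
delta = incremental_sync({"b0": "AAAA", "b1": "CCCC"}, replica)
```

The trade-off the article describes follows directly: deltas keep cutover fast, but only for destinations where the baseline has already been replicated.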

Configuring replication requires that IT teams know where applications will reside in the future. But since a Hybrid Cloud is a collection of multiple customer and service provider infrastructure stacks, it may be impossible to maintain a full replication mesh for all applications across the Hybrid Cloud. In Hybrid Clouds, applications are transient and may be running anywhere at any given point in time, based on application and business requirements, rendering the preemptive approach a poor option.


In the Hybrid Cloud, datasets must follow applications, not the other way around…

Hybrid Clouds ought to present themselves as a single storage, compute, and networking fabric with a single management plane across on-prem, off-prem, and third-party service providers. While this is a difficult computer science problem to solve with conventional methodologies, live migration and network virtualization encapsulation techniques address the compute and networking layers. The storage fabric, however, with its large datasets and performance requirements, still presents a computational and physics problem that requires mitigation.

Data locality is the key to unlocking application movement between clouds. It enables virtual machines and applications to move their compute and networking components across different parts of a converged cluster using live migration, while datasets move transparently to the destination hosts over time, on an as-needed basis, using idle cycles and spare bandwidth. The end goal of data locality is to ensure that storage datasets always reside on the same computing host as the applications that use them.

In the future, data locality will be a key enabler for applications and datasets to move freely between clouds and infrastructure stacks without the long wait for full dataset replication, and without the need to pre-configure replication and destination targets. Undoubtedly, there are challenges to be studied and solved, such as:

  • What bits of the storage dataset should be moved first?
  • How to retrieve remote data without affecting application performance and availability?
  • How to predict application data requirements and initiate the transfer before it is needed?
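The first question, deciding which bits of a dataset should move first, can be approached with a simple heat-tracking policy. The sketch below is purely illustrative and does not reflect any particular vendor's implementation:

```python
from collections import Counter

class ExtentHeatMap:
    """Track per-extent access frequency so that the hottest extents
    can be migrated to the destination host first, while cold data
    follows lazily during idle cycles or is fetched remotely on demand.

    A sketch of one possible 'what moves first' policy; extent IDs
    and the frequency-only heuristic are illustrative assumptions.
    """

    def __init__(self):
        self.heat = Counter()

    def record_access(self, extent_id: int) -> None:
        self.heat[extent_id] += 1

    def migration_order(self) -> list:
        # Hottest (most frequently accessed) extents first.
        return [extent for extent, _ in self.heat.most_common()]

hm = ExtentHeatMap()
for extent in [3, 1, 3, 2, 3, 1]:
    hm.record_access(extent)
order = hm.migration_order()  # hottest first: [3, 1, 2]
```

A real system would likely add recency weighting and sequential-access detection to answer the third question, prefetching data before it is needed, but the frequency-first ordering captures the core idea.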


Nutanix CEO Dheeraj Pandey wrote: “Whosoever solves the convergence between own-and-rent with a methodical operating system and usability mindset will win the new wars of virtualization — one in which software makes computing location- and spend-agnostic; one in which machines make decisions on how long-term predictable workloads be owned and the rest rented. The data center is the new ‘motherboard.’ Abstracting that motherboard — by decoupling the application — is the next frontier.”



Hybrid Cloud Resource Scheduling

Only when applications and datasets can freely traverse private and public infrastructure stacks will the Hybrid Cloud be fully realized. When that happens, traditional Distributed Resource Scheduling (DRS) will need to get smarter and make decisions that relate to the entire IT forest, not only to its own domain. Cross-cloud harmonization will be vital.

The Nutanix Acropolis Dynamic Scheduling (ADS) implementation already factors in computing, memory, and storage (performance and capacity) to make intelligent application placement decisions. ADS manages applications across nodes to minimize the likelihood of resource (CPU, memory, and storage I/O) contention, and proactively resolves any transient contention that does arise.
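As an illustration of this kind of contention-aware placement, the sketch below scores candidate hosts by their bottleneck headroom after placement. The field names and the min-headroom heuristic are hypothetical; ADS's actual cost model is not public:

```python
def placement_score(host: dict, vm: dict) -> float:
    """Score a candidate host for a VM by the remaining fractional
    headroom after placement across CPU, memory, and storage IOPS.

    The bottleneck dimension (the minimum) dominates the score,
    which discourages placements that would create contention on
    any single resource. Hypothetical model, not Nutanix's.
    """
    headroom = []
    for dim in ("cpu", "mem", "iops"):
        free = host[f"{dim}_free"] - vm[f"{dim}_demand"]
        if free < 0:
            return float("-inf")  # host cannot fit the VM at all
        headroom.append(free / host[f"{dim}_total"])
    return min(headroom)

hosts = [
    {"name": "A", "cpu_free": 8,  "cpu_total": 32, "mem_free": 64,
     "mem_total": 256, "iops_free": 5000,  "iops_total": 50000},
    {"name": "B", "cpu_free": 16, "cpu_total": 32, "mem_free": 128,
     "mem_total": 256, "iops_free": 20000, "iops_total": 50000},
]
vm = {"cpu_demand": 4, "mem_demand": 32, "iops_demand": 2000}
best = max(hosts, key=lambda h: placement_score(h, vm))  # host "B"
```

Host A would fit the VM but leave very little storage I/O headroom; host B keeps comfortable slack on every dimension, so the scheduler prefers it even though both hosts technically have room.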

An ever-evolving part of the Acropolis Distributed Storage Fabric is the way data is intelligently placed throughout the cluster and beyond. With the Hybrid Cloud needing to be highly flexible and scalable, the ability to mix hardware generations is vital, as is placing data in the most efficient and suitable performance tiers. Industry vendors continue to develop this space so that data can move between tiers, clusters, locations, and between public and private infrastructures in a policy-based manner.

The next step in this journey is to enable cross-cloud harmonization and unlock application mobility and the true Hybrid Cloud potential.



Thanks to Josh Odgers and Steve Kaplan for review and commentary.


Disclaimer: Opinions expressed are solely my own and do not necessarily express the views or opinions of my employer.
