Web-Scale IT. It’s Here. It’s Now.

Web-Scale IT is here now, and it allows IT to support the pace and growth of the business seamlessly.

I’m looking forward to the Web-Scale Wednesday event on June 25th, 2014, so I decided to revisit one of my articles about Web-scale properties and explain those properties with more of a business focus.

Web-Scale IT is a way of designing datacenters and architectures that incorporates multi-dimensional concepts such as scalability, consistency, fault tolerance and versioning.

Web-scale describes the tendency of modern architectures to grow at (far-)greater-than-linear rates. Systems that claim to be Web-scale must handle rapid growth efficiently, without bottlenecks that require re-architecting at critical moments.

Web-scale architecture and its properties are nothing new; large web companies like Google, Facebook and Amazon have used them systematically for many years. The major difference is that the technologies that allowed those companies to scale to massive compute environments are now being introduced into mainstream enterprises, with purpose-built virtualization properties, allowing those enterprises to reap the same benefits.

Enterprise IT has ingrained complexity and silos around networking, storage and compute. This complexity makes scaling current environments unpredictable. One might ask oneself the following questions: Do I have enough throughput available to add my next storage array? Do I have enough ports left in my Fibre Channel switch to add my next server?

Enterprise IT should not be about nuts and bolts. It’s easy to get lost in the weeds of technology and forget the greater purpose. It’s not about speeds and feeds; it’s about getting your teams to focus on supporting the business priorities and work together.

Gartner Says By 2017 Web-Scale IT Will Be an Architectural Approach Found Operating in 50 Percent of Global Enterprises

Adding infrastructure such as servers and storage should be a non-event, because IT is about delivering services that matter to the business. Shrink or grow, one node or 20 nodes, it needs to happen at the pace of business.
Today Nutanix OEMs server hardware through Super Micro; tomorrow Nutanix could switch to or introduce a new hardware platform if the economics, performance and form factor made sense, or even support a software-only model if enterprises were not interested in an end-to-end solution for their datacenters.

  • Everything should be in software, running on standard x86 hardware, with no special-purpose machines doing one thing and one thing only. This is where Web-scale intersects with the SDDC (Software Defined Datacenter) for the first time. Zero hardware crutches. Taiwan hardware with pure software-based services. A number of services already take this approach, including SDN (Software Defined Network), Virtual Services and SDS (Software Defined Storage).


The Nutanix platform is a fully distributed system designed to be fault resistant and to eliminate any single points of failure or bottlenecks. The system uses a shared-nothing approach where all components and services are distributed to all nodes within the cluster. Individual components are designed to fail fast to enable quick system recovery.

  • There should be architectural considerations for no single point of failure or bottleneck for management services. Tolerance of failures is key to a stable, scalable distributed system, and the ability to function in the presence of failures is crucial for availability. Techniques like vector clocks, two-phase commit, consensus algorithms, leader elections, eventual consistency, multiple replicas, dynamic flow control, rate limiting, exponential back-offs, optimistic replication, automatic failover, hinted handoff, data scrubbing and checksumming, among others, all contribute to a distributed system's ability to handle failures.
  • Web-scale systems should provide elastic services with an embarrassingly parallel approach to systems building (http://en.m.wikipedia.org/wiki/Embarrassingly_parallel). Parallel approaches enable a non-disruptive approach to traditionally disruptive tasks, such as rolling or forklift upgrades, and keep clusters and all workflows always online.
  • Web-scale systems should be able to be expanded and continue to function normally as one unit, instead of relying on multiple deployments of functional units that are not scalable units by themselves.
  • Web-scale systems should be built from the ground up to expect and tolerate failures while upholding the promised performance and availability guarantees or service level agreements.
  • Web-scale systems should offer both strictly consistent and eventually consistent models, with a clear understanding of the CAP theorem (Consistency, Availability and Partition Tolerance) (http://en.m.wikipedia.org/wiki/CAP_theorem).
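
To make one of the techniques named above more concrete, here is a minimal Python sketch of retrying with capped exponential back-off and jitter. It is an illustration only, not how Nutanix or any particular product implements it: `call_with_backoff`, `FlakyService` and all parameter defaults are hypothetical names and values chosen for this example.

```python
import random
import time


def call_with_backoff(operation, max_attempts=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep, rng=random.random):
    """Retry `operation` with capped exponential back-off and full jitter."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the failure to the caller
            # The delay ceiling doubles on each attempt (0.1s, 0.2s, 0.4s, ...
            # with the defaults) until it reaches max_delay.
            ceiling = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: sleep a random fraction of the ceiling so that many
            # clients retrying at once do not hit the service in lockstep.
            sleep(rng() * ceiling)


class FlakyService:
    """Hypothetical stand-in for a remote node that fails its first calls."""

    def __init__(self, failures_before_success):
        self.remaining_failures = failures_before_success
        self.calls = 0

    def read(self):
        self.calls += 1
        if self.remaining_failures > 0:
            self.remaining_failures -= 1
            raise ConnectionError("node temporarily unavailable")
        return "ok"


if __name__ == "__main__":
    service = FlakyService(failures_before_success=2)
    print(call_with_backoff(service.read), "after", service.calls, "calls")
```

The `sleep` and `rng` parameters exist only so the behaviour can be exercised without real waiting; a caller would normally leave them at their defaults.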




Nutanix Tech Note – System Reliability
Scale Out Shared Nothing Architecture Resiliency by Nutanix
Nutanix Disk Self-Healing: Laser Surgery vs The Scalpel


DevOps is a response to the growing awareness that there is a disconnect between what is traditionally considered development activity and what is traditionally considered operations activity. For the business, DevOps contributes directly to business agility and IT alignment.

Programmatic interfaces allow administrators to create automation workflows using scripting languages or workflow engines like vCenter Orchestrator, vCAC, Puppet, Chef, BMC and others. They are equally important for applications, which can use them to drive VM- and application-centric policies around infrastructure requirements.

Enterprise software vendors are already starting to work with Nutanix to integrate their software stacks with Nutanix APIs, allowing them to drive application and infrastructure requirements such as security, availability, reliability and performance.

  • Web-scale systems should provide programmatic interfaces to allow complete control and automation via HTTP-based services, for intra- and inter-datacenter communication. These APIs must use latency- and loss-tolerant protocols, with support for asynchronous request-response exchanges.
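
The asynchronous request-response pattern mentioned above can be sketched roughly as follows. This is a self-contained Python simulation, not a real API: `TaskService` is a hypothetical in-memory stand-in for an HTTP management endpoint, and the action name, target name, status codes and state strings are assumptions made for the example.

```python
import itertools


class TaskService:
    """Hypothetical in-memory stand-in for an HTTP management API.

    submit() plays the role of a POST that answers 202 Accepted with a task
    id; status() plays the role of a GET on that task.
    """

    def __init__(self, polls_until_done=3):
        self._ids = itertools.count(1)
        self._tasks = {}
        self._polls_until_done = polls_until_done

    def submit(self, action, target):
        task_id = next(self._ids)
        self._tasks[task_id] = {"action": action, "target": target, "polls": 0}
        return {"status_code": 202, "task_id": task_id}

    def status(self, task_id):
        task = self._tasks[task_id]
        task["polls"] += 1
        done = task["polls"] >= self._polls_until_done
        return {"status_code": 200, "state": "SUCCEEDED" if done else "RUNNING"}


def run_and_wait(service, action, target, max_polls=10):
    """Submit a request, then poll until it finishes or we give up."""
    accepted = service.submit(action, target)
    task_id = accepted["task_id"]
    for _ in range(max_polls):
        reply = service.status(task_id)
        if reply["state"] != "RUNNING":
            return reply["state"]
        # A real client would wait between polls, ideally with back-off.
    return "TIMED_OUT"


if __name__ == "__main__":
    print(run_and_wait(TaskService(), "snapshot", "vm-42"))
```

The point of the design is that the caller is never blocked on a long-running operation: the request is acknowledged immediately, and slow or lost replies only cost another poll.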


Service Level Automation for XenDesktop with Nutanix


It is not uncommon to see enterprise software and infrastructure updated only when support contracts are about to expire or when a new feature is needed. IT organizations are sometimes reluctant to make changes because outages have to be planned and manual intervention may be needed throughout the process.

Today Nutanix offers non-disruptive one-click rolling upgrades for all nodes in a cluster, and the native distributed system implementation could also perform non-disruptive third-party software and hypervisor upgrades in the near future.

  • Web-scale systems must provide self-defining (and versioned) objects. In the case of SDS, this means self-defining disk formats with the ability to encode and serialize structured data in efficient yet extensible formats, like protobuf, Avro, et al. This way, upgrades of disk data can be done lazily. Web-scale cannot assume a one-shot data upgrade, given the scale.
  • Web-scale systems should have self-describing (and version-aware) services such that different parts of the distributed system can communicate at different versions, without expecting a one-shot upgrade for all components.
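
A rough Python sketch of the lazy-upgrade idea described above: each stored record carries its own version, and an old record is upgraded only when it is next read. JSON is used here purely to keep the example dependency-free (the formats named above are protobuf and Avro), and the record fields, the version numbers and the `upgrade_v1_to_v2` step are hypothetical.

```python
import json

CURRENT_VERSION = 2


def upgrade_v1_to_v2(record):
    # Hypothetical schema change: v1 stored the size in KiB, v2 stores bytes.
    return {"version": 2, "name": record["name"],
            "size_bytes": record["size_kib"] * 1024}


# Maps a version number to the function that upgrades a record away from it.
UPGRADERS = {1: upgrade_v1_to_v2}


def read_and_lazily_upgrade(store, key):
    """Return the record at `key`, rewriting it in place only if it was stale."""
    stored = json.loads(store[key])
    record = stored
    # Apply upgrade steps one version at a time until the record is current.
    while record["version"] < CURRENT_VERSION:
        record = UPGRADERS[record["version"]](record)
    if record is not stored:
        store[key] = json.dumps(record)  # the upgrade happens on first read
    return record


if __name__ == "__main__":
    store = {"disk0": json.dumps({"version": 1, "name": "disk0", "size_kib": 4})}
    print(read_and_lazily_upgrade(store, "disk0"))
    print(store["disk0"])
```

Because every record names its own version, old and new records can coexist in the same store, and no single maintenance window has to rewrite everything at once.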


When infrastructure services have open APIs and offer full automation of every aspect, they also allow for a common management and analytics platform. Silos of infrastructure commonly add complexity to managing the wealth of performance and analytics data that is generated. With different hardware, different datacenters and different use cases to contend with, it’s all about managing the whole story and seeing problems before they end up on the CIO’s dashboard.

Nutanix has a comprehensive management framework that brings powerful control and simplicity to the management experience. Using a multi-cluster management UI, administrators can get end-to-end visibility, analytics and control of clusters, allowing them to easily deploy, maintain and scale their infrastructure.

  • Analytics software to reduce human interaction. Web-scale infrastructures at large web companies have a ratio of one SRE (site reliability engineer) per 10,000 machines managed. Enterprises are currently at around 1:500. There is a huge gap that only analytics and automation can fill.
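
To put the two ratios above side by side, here is a small Python calculation. The fleet size of 10,000 machines is an assumption picked only to make the arithmetic easy to follow, and `admins_needed` is a hypothetical helper.

```python
import math


def admins_needed(machines, machines_per_admin):
    """Headcount required to cover a fleet at a given machines-per-admin ratio."""
    return math.ceil(machines / machines_per_admin)


fleet = 10_000  # illustrative fleet size, not a figure from any real deployment
enterprise_staff = admins_needed(fleet, 500)     # at 1:500
web_scale_staff = admins_needed(fleet, 10_000)   # at 1:10,000

print(enterprise_staff, "vs", web_scale_staff)   # prints "20 vs 1"
```

At the same fleet size, the 1:500 ratio needs twenty times the staff of the 1:10,000 ratio.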


Nutanix PRISM Central Demo Video (multi-datacenter management)
NOS 4.0 Cluster Health – Slices & Dices


Nutanix chose to build data and control planes from the ground up to be a Web-scale distributed system following all the above properties and guidelines. These guidelines not only guarantee resiliency, scalability, consistency, and tolerance to failures, but also ensure a platform to bootstrap future datacenter innovation.




Hear firsthand from the pioneers and practitioners of web-scale IT and learn why it’s becoming increasingly relevant for the enterprise datacenter during an interactive online event that Nutanix is hosting. The event has an impressive speaker lineup, including executives and technologists from Dell, Apigee, Wikibon, Twitter, Datastax, Citrix, The Register and Veeam. It’s this Wednesday, June 25th, 2014. See the link below for more information.


This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net
