It seems that spreading FUD (fear, uncertainty, and doubt) in the storage industry is in vogue once again, so I want to take the opportunity to dismiss some of the FUD that storage vendors are trying to spread in the software-defined storage market, and in particular about Nutanix.
As the major disruptor and market leader in mainstream enterprise hyper-converged and web-scale computing, Nutanix has prompted competitors to start spreading ‘misguided’ information about our technology.
I respect my readers, and for this reason I will not create a battle card pitting vendor X against vendor Y; that’s just not my style. Rather, I am going to explain what is real and what is not with regard to converged and web-scale infrastructures, and Nutanix.
FUD 1: Web-Scale is not for Common Enterprises
Web-scale is more than a far-fetched concept. Web-scale architecture and its properties are not new; they have been used systematically by web companies like Google, Facebook, and Amazon. The major difference is that the same technologies that allowed those companies to scale to massive compute environments are now being introduced into mainstream enterprises, with purpose-built virtualization properties. The cost efficiencies of web-scale public clouds are coming to enterprises at any scale.
Your organization may not be Google or Facebook, but your enterprise also struggles to maintain business SLA requirements while staying agile, resilient, and responsive to new business demands. There is nothing new here; what is different is how systems and architectures are built for resiliency, serviceability, and scalability.
In a web-scale system there should be no single point of failure or bottleneck in the management services.
There should be provisions that allow datacenter architectures to be expanded and continue to function as one unit, instead of relying on multiple deployments of functional units that are not scalable units by themselves.
Systems should be built from the ground up to expect and tolerate failures while upholding their promised performance and availability guarantees or service level agreements.
Those are all key elements that every enterprise should embrace when acquiring or procuring new systems or services. Today’s businesses cannot afford to deal with traditionally disruptive tasks, such as rolling or forklift upgrades. Systems and workflows must be always online and available.
Simply put, that is what web-scale stands for. There’s no fairy dust; it’s just software that has been purposely built to uphold business SLAs. So, to say that web-scale IT is nonsense is just calling customers dumb.
[Update] The link below will take you to a blog post I published a little while ago where I explain in more detail the properties that make Nutanix a web-scale platform. http://myvirtualcloud.net/?p=6030
FUD 2: Hypervisor-based storage is better
I will try to keep this topic as succinct as possible, since it could be a blog post on its own. I have heard a few people claim that a hypervisor-based approach to software-defined storage is better because of additional performance, kernel integration, and a simpler data path. Let me take each point individually.
Since VMware’s Virtual SAN is the only in-hypervisor solution on the market at this point in time, I will use VSAN as the example, but the same reasoning applies to any solution built or baked into the hypervisor kernel.
If we go back in time you will recall that VMware moved from ESX to ESXi. In an official note published by VMware the following has been stated “The functionality and performance of VMware ESX and ESXi are the same; the difference between the two hypervisors resides in their packaging architecture and operational management. VMware ESXi is the latest hypervisor architecture from VMware. It has an ultra thin footprint with no reliance on a general-purpose OS, setting a new bar for security and reliability.”
At the time many people criticized the move, but it was the right thing to do, particularly because of the reduced footprint and reduced attack surface. Since then, VMware has created virtual appliances for management (vShield Endpoint, VMware VSA, vSphere Management Assistant (vMA)).
Keeping the storage or network stack independent of the hypervisor reduces the hypervisor’s exposure to malicious attacks, given its smaller code footprint.
Another important point is innovation and release cycles. It’s great to have the latest and greatest software features and updates; I generally update my iPhone as soon as a new patch or version is released. But if a feature’s delivery is heavily delayed by the release cycle of the hypervisor, building it in starts not to make sense. In the Nutanix case, NOS 4.0 was just announced, only eight months after NOS 3.5, and the next release should arrive before the end of the year, giving customers access to the latest technologies that help businesses be more agile and efficient.
Finally, I’ll grant that having SDS integrated into the hypervisor gives the hypervisor vendor the ability to execute some smart hand-offs, scheduling, and integrations. However, if the hypervisor doesn’t expose the proper storage APIs to other vendors, be warned that you are headed into full-stack lock-in, since you will not be able to use other hypervisors and solutions.
But really, if the hypervisor weren’t any good at I/O and VMs running on it were somehow slow, wouldn’t that mean the hypervisor is not suitable for business-critical applications, which need lots of low-latency I/O? We all know this is not the case: vSphere is a great hypervisor for business-critical applications because its storage I/O is very efficient. In fact, most hypervisors today do a fairly good job when it comes to storage stack efficiency. The data path from a VM through the VMkernel to storage is very fast, in the low tens of microseconds, and that is for standard storage access; if you use VMDirectPath I/O to bypass the normal VMkernel storage path and provide direct access, it’s even faster.
FUD 3: VSA IO Data Path is Convoluted
I have also heard other vendors and bloggers incorrectly say that Nutanix extends the I/O path, forcing I/O to go through the hypervisor twice: first when it leaves the guest VM, and again when it leaves the Nutanix controller VM.
While I/O does traverse the hypervisor when it leaves the guest VM, it is incorrect to say that VM traffic goes through the hypervisor again to access disk devices, because Nutanix uses VMware VMDirectPath I/O pass-through to give the controller VM direct access to the disk controller and all of its devices, reducing I/O overhead. This also allows the native controller drivers to be used, avoiding any bugs that might be introduced by custom-written drivers for the hypervisor.
You will recall that VMware announced 1 million IOPS on a single VM during VMworld 2011 (here is the article). Since the guest VMs use the network stack in a private vSwitch configuration to communicate with the Nutanix controller, the Nutanix controller has over 1 million IOPS at its disposal. While that is a fantastic number, I doubt that any real application running on a single host can drive that many IOPS and that much throughput. Therefore, saying that the data path is convoluted is pure FUD, as it does not impact performance.
The picture above demonstrates how simple the data path really is. However, if you want to learn more about it, I strongly recommend reading The Nutanix Bible by my colleague Steven Poitras. The Nutanix Bible is a clear statement to the market and community about how transparent Nutanix is about its technology.
FUD 4: Nutanix can only have a maximum of 2TB minus 512b VMDKs
Nutanix supports 62TB VMDKs; it is NOT limited to 2TB minus 512 bytes, as claimed by some competitors trying to spread FUD.
FUD 5: Data Locality is not important since 10-gigabit Ethernet is faster than SSD
Nutanix utilizes a shared-nothing distributed architecture and ensures that data is always replicated across flash, HDD, nodes, and blocks for high availability. A VM may access data from anywhere on a Nutanix cluster, but the Nutanix controllers (CVMs) will, over time, ensure that the active data blocks belonging to a VM are present on the host where the VM is currently running. This process is transparent and occurs in the background using free CPU cycles.
Data locality is a key performance enabler for any workload, keeping important VM data as close as possible to memory and CPU, avoiding multiple hops, and eliminating the dependency on a storage area network.
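The locality preference described above can be sketched in a few lines. This is a hedged illustration only, not Nutanix’s actual implementation; the `Replica` class, node names, and tier labels are invented for the example.

```python
# Illustrative sketch of a locality-aware read path (NOT Nutanix's actual code).
# Assumes each data block is replicated across nodes; reads prefer a replica
# on the node where the VM runs, falling back to a remote replica otherwise.

from dataclasses import dataclass

@dataclass
class Replica:
    node: str   # node hosting this copy of the block
    tier: str   # "ssd" or "hdd"

def pick_replica(replicas, local_node):
    """Prefer a local replica (fastest tier first), then any remote one."""
    local = [r for r in replicas if r.node == local_node]
    if local:
        # Within the local node, prefer the SSD tier over HDD.
        return sorted(local, key=lambda r: r.tier != "ssd")[0]
    # Remote read: works, but background migration can localize the data later.
    return replicas[0]

replicas = [Replica("node-b", "hdd"), Replica("node-a", "ssd")]
print(pick_replica(replicas, "node-a"))  # the local SSD replica wins
```

The point of the sketch is simply that reads never *require* the network when a local copy exists, which is why congestion elsewhere in the cluster does not slow a localized VM down.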
Competitors have started to say that 10-gigabit Ethernet is faster than SSD reads and that data locality therefore isn’t important in distributed storage architectures. Having data local on any system prevents the network from becoming a point of congestion; if everything runs over the network and the network gets congested, your performance suffers. And while it may be true that 10-gigabit Ethernet is faster than SSD reads, they forget that in order to transfer data over 10-gigabit Ethernet, the source host has to read it from a disk as well. The remote access cost is therefore the transfer time plus the read time.
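The transfer-time-plus-read-time argument is simple addition. The latency figures below are rough illustrative assumptions, not measurements from any particular hardware:

```python
# Back-of-the-envelope latency arithmetic; figures are assumptions, not benchmarks.
ssd_read_us = 100       # assumed SSD read latency, microseconds
network_hop_us = 50     # assumed 10GbE round-trip cost, microseconds

local_read_us = ssd_read_us                    # local path: just the SSD read
remote_read_us = network_hop_us + ssd_read_us  # remote host must read its SSD too

print(local_read_us, remote_read_us)
# Whatever the exact numbers, the remote path is always the local path plus a hop.
assert remote_read_us > local_read_us
```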
Simon Ritter, Head of Java Evangelism at Oracle Corporation, published the following results on accessing data in different media, which closely resemble Nutanix’s experience with data locality.
Looking ahead, when Nutanix starts shipping appliances with NVMe PCIe SSDs that can drive 3,000 MB/s (roughly 24 Gbit/s) of sequential reads, you will fully saturate a 10-gigabit Ethernet link long before you can fully utilize the NVMe PCIe SSD.
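The saturation point above is straightforward unit conversion:

```python
# Unit conversion: why a single NVMe SSD can out-run a 10GbE link.
nvme_mb_s = 3000                    # sequential read throughput from the text, MB/s
nvme_gbit_s = nvme_mb_s * 8 / 1000  # megabytes/s -> gigabits/s
link_gbit_s = 10.0                  # one 10-gigabit Ethernet link

print(nvme_gbit_s)                  # 24.0
print(nvme_gbit_s > link_gbit_s)    # True: the network link saturates first
```

In other words, at NVMe speeds the 10GbE link, not the SSD, becomes the bottleneck for remote reads, which is exactly why keeping data local matters more as flash gets faster.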
FUD 6: Nutanix is less than 3 years old with just a few customers
This is one of those FUD items that comes from competitive decks where whoever put the deck together has no idea what they are writing. Nutanix is 4 years old and has been shipping product for over 2 years. As of today Nutanix has hundreds of customers worldwide (closer to 1,000), with many thousands of appliances and nodes shipped. Nutanix customers host tier-1 applications, Big Data, VDI, test/dev, DMZ, branch offices, and other workloads in clusters both small and very large, some exceeding hundreds of nodes.
As I mentioned in my introduction, I am not going to create a competitive scenario here to demonstrate why Nutanix is better than every other software-defined storage solution on the market from resiliency, scalability, serviceability, and feature standpoints. Nutanix created this market and is its incontestable leader. I could have gone on with this FUD list much longer, but these were the items bothering me the most.
All my other blog posts have been about the Nutanix features and goodies and how they work – being transparent about Nutanix technology is very important to us.
Moreover, customer satisfaction with the overall Nutanix experience is just extraordinary, with an incredibly high Net Promoter Score (NPS) of +73 (on a scale of -100 to +100). This is particularly satisfying when you consider that the average technology company scores in the 20s. (Read more here.) I would actually encourage you to ask other vendors for their NPS.
Thanks to Michael Webster for revising and contributing to this article.
Disclaimer: Any views or opinions expressed in this article are my own, not my employer’s. The content published here is not read, reviewed, or approved by Nutanix and does not necessarily represent or reflect the views or opinions of Nutanix or any of its divisions, subsidiaries, or business partners.
This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.