This is one of those topics that often spark heated debates between proponents on both sides of the industry. I work for Nutanix, a hyper-converged platform vendor; therefore you might say I am biased. That’s fine, but I will try to stick to the value of each platform and not only to the question of performance. There’s far more to a platform than performance, but I will also not shy away from answering the question with a simple and straightforward answer.
– Yes, hyper-converged is faster than SAN.
The first part of my answer is in the form of a question. What is the performance measurement and how is it being compared? Is it IOs per second, latency, or a combination of both? What IO and data sizes are being handled, and what is the application workload? What is the mix of IO reads versus writes, and what type of redundancy and protection is being implemented?
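The read/write mix and the protection scheme in particular change the arithmetic dramatically. As a rough illustration (the numbers and the simple write-amplification model below are my own, not a vendor specification), the same front-end workload can generate very different back-end load:

```python
def effective_backend_iops(frontend_iops, read_fraction, write_multiplier):
    """Back-end IOPS generated by a front-end workload, given a read
    fraction and a per-write amplification factor (e.g. roughly 2 for
    two-way replication, 4 for RAID-5-style parity updates).
    Illustrative arithmetic only, ignoring caching and coalescing."""
    reads = frontend_iops * read_fraction
    writes = frontend_iops * (1 - read_fraction)
    return reads + writes * write_multiplier

# Same 10,000 front-end IOPS, very different back-end load:
print(effective_backend_iops(10_000, 0.9, 2))  # read-heavy, two-way replication
print(effective_backend_iops(10_000, 0.3, 2))  # write-heavy, two-way replication
```

This is why a single IOPS figure, quoted without the workload profile behind it, tells you very little about how two platforms compare.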
It is impossible to state that hyper-converged is faster than a SAN without first defining the comparison metrics. IOPS and infrastructure don’t matter in isolation; applications matter, and application response times to end users are what matter most. Being able to consistently provide acceptable performance as application load increases is far more important.
If the comparison is not defined then we can just say ‘yes’, hyper-converged is faster, because it is possible to engineer a platform to deliver whatever performance is required for scaled-out workloads, millions and millions of IOPS. Where hyper-converged may not be faster than SAN is where a single workload or virtual machine requires hundreds of thousands of IOPS, far more than a single hyper-converged unit fully loaded with RAM and SSD is able to deliver, or where latency requirements are tighter than what a single hyper-converged unit can achieve.
So if there is an application workload that isn’t a good fit for hyper-convergence, then SAN will be faster. However, there aren’t many real-life application workloads running on a single host that are able to drive that much performance.
Normally these application types are able to scale out themselves, such as Hadoop, Exchange, and OLTP databases. For extreme cases it is conceivable to design hyper-converged all-flash nodes that would be able to handle the large majority of workloads and still be part of massive scale-out converged clusters.
With this in mind it is plausible to say that SANs are scale-up solutions built for a handful, perhaps hundreds, of physical and virtual applications, while a hyper-converged scale-out platform is built for thousands of virtual applications, yet still delivers consistent performance at low latencies.
Most individual application workloads need a few hundred to a few thousand IOPS, and hyper-convergence can easily deliver that at very low latencies. The difference between SAN and hyper-convergence then becomes whether the infrastructure can continue to deliver acceptable application service levels while scaling linearly and indefinitely. A hyper-converged web-scale platform can do that, whereas a SAN cannot: a SAN must be replaced once the datacenter exceeds its maximum performance or capacity configuration.
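The linear-scaling claim is simple arithmetic: every node added brings its own controller, cache, and disks, so aggregate capability grows with node count. A back-of-the-envelope sketch (the per-node figures are hypothetical, not the specs of any particular appliance):

```python
def cluster_capacity(nodes, iops_per_node=25_000, tb_per_node=20):
    """Aggregate IOPS and raw capacity of a scale-out cluster.
    Both grow linearly with node count because each node adds its
    own controller, cache, and disks. Per-node numbers are made up."""
    return nodes * iops_per_node, nodes * tb_per_node

for n in (4, 8, 16, 32):
    iops, tb = cluster_capacity(n)
    print(f"{n:2d} nodes -> {iops:>9,} IOPS, {tb} TB raw")
```

A dual-controller SAN, by contrast, has a fixed ceiling: adding shelves grows capacity but not controller throughput, so the IOPS column stays flat while the TB column climbs.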
It’s important to remember that behind the shiny lights, colorful plastic, and lion’s share of the IT budget, a SAN is just a bunch of disks attached to a server behind a network. Anybody remember buying a storage array and SAN prior to VMware? VMware amplified the need for SANs with data services such as vMotion and Storage vMotion.
Hyper-converged solutions like Nutanix, vSAN and others can pack 4 SANs into a 2U appliance. If a single SAN shared by every host is fast, then a SAN for every host is even faster.
So, if the question is whether hyper-convergence can run more application workloads and get more productive work done with the same service levels as application load increases, then ‘yes’, hyper-convergence is much faster than SAN.
Is hyper-converged faster than SAN? Probably, but that’s the least of people’s concerns in modern datacenters. Whether hyper-convergence is x microseconds faster or not doesn’t address the pain points caused by legacy 3-tier architectures (management, scalability, etc.).
One of my colleagues met with the application team at a customer who shared some of their frustrations with their current SAN architecture. After brief introductions and an overview of the existing platform, the principal architect said that sometimes his database queries take 30 seconds to complete and other times they take 5 minutes. He doesn’t particularly care if the queries take 5 minutes every time they’re run, but the response time has to be consistent. Without consistency it becomes impossible to troubleshoot application issues.
At the end of the day there are other aspects that make hyper-converged architectures compelling regardless of how ‘fast’ they are: simplicity of management, incremental scaling with linear performance, easy upgrades when new features and improvements appear, etc.
With SAN there is a honeymoon period. When you first provision LUNs from a new, nearly empty array, everything is great. Predictable performance. Little effort. A few dozen LUN allocations later, the reality of being a SAN admin sets in. The life of a SAN admin is one of constant data movement from one array to another to maintain acceptable user experiences.
Why do I care if the new array is faster than the old array if it doesn’t make my life easier when it comes to managing the data it serves to my users?
If hyper-convergence can provide predictable performance, a single large namespace that can scale, and a drastic reduction in the constant data movement endemic to SAN environments, then it’s the clear choice to run the infrastructure, even if SAN performance is marginally better in some use cases. Large SANs are good at solving problems for the small percentage of applications that cannot benefit from hyper-converged architectures and where large scale-up performance is required.
Storage performance should not just focus on situations where things are going well, but also on when things break or get overloaded. When a disk fails you want the rebuild to be fast and the performance to remain predictable. This is unlikely to be the case on many traditional SANs. However, in a hyper-converged web-scale environment the loss of even a 4TB SATA disk results in minimal rebuild time because of the distributed nature of the recovery process.
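The intuition is that a traditional rebuild funnels all writes through one spare disk, while a distributed rebuild spreads the re-replication work across many disks in parallel. A toy calculation (the 100 MB/s per-disk figure and the 32-writer cluster are assumptions for illustration, not measurements):

```python
def rebuild_hours(data_tb, writers, mb_per_sec_each=100):
    """Hours to re-protect data_tb of lost data when `writers`
    disks each absorb an equal slice at mb_per_sec_each MB/s.
    Toy model: ignores read-side limits and background throttling."""
    total_mb = data_tb * 1_000_000
    return total_mb / (writers * mb_per_sec_each) / 3600

# Traditional RAID-style rebuild: one hot spare absorbs everything.
print(f"single writer: {rebuild_hours(4, 1):.1f} h")
# Distributed rebuild: 32 disks across the cluster each take a slice.
print(f"32 writers   : {rebuild_hours(4, 32):.2f} h")
```

The single-writer case takes over eleven hours of degraded operation; with the work spread across 32 disks the same 4TB is re-protected in roughly twenty minutes, and the larger the cluster, the shorter the window.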
Many SANs also have a global cache that is vulnerable to overload conditions. If you overload the cache, the array suddenly bypasses it altogether and writes directly to the underlying disks. This causes a significant impact to performance, sometimes to the point of appearing to freeze the array or applications. This could be the result of a noisy neighbor or simply an overloaded configuration. The distributed nature of a hyper-converged solution avoids this problem altogether, provided it leverages data localization. An impact to a virtual machine or one host will not impact any other hosts or virtual machines. The cache is distributed, so each host has its own independent cache. This helps prevent one virtual machine from monopolizing the resources of all the other virtual machines at an architectural level, and makes the system much more resilient and predictable under load.
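The noisy-neighbor effect on a shared cache is easy to demonstrate with a toy LRU model (this is my own simplified simulation, not how any particular array implements its cache): a well-behaved VM re-reads a small working set, while a neighbor streams through unique blocks and evicts everyone else’s data.

```python
from collections import OrderedDict

def quiet_vm_hit_rate(cache_size, quiet_reads, noisy_per_quiet):
    """Hit rate of a quiet VM re-reading a 50-block working set in an
    LRU cache it shares with a noisy neighbor that issues
    `noisy_per_quiet` unique reads per quiet read. Toy model only."""
    cache, hits = OrderedDict(), 0
    noisy_block = 0
    for i in range(quiet_reads):
        key = f"q{i % 50}"                 # quiet VM's small working set
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # refresh LRU position
        else:
            cache[key] = True
        for _ in range(noisy_per_quiet):   # noisy VM streams unique blocks
            noisy_block += 1
            cache[f"n{noisy_block}"] = True
        while len(cache) > cache_size:     # evict least recently used
            cache.popitem(last=False)
    return hits / quiet_reads

# Per-host cache (hyper-converged style): the quiet VM is isolated.
print(f"isolated per-host cache: {quiet_vm_hit_rate(100, 1000, 0):.0%}")
# Shared global cache (SAN style): the streaming neighbor evicts its data.
print(f"shared global cache    : {quiet_vm_hit_rate(100, 1000, 10):.0%}")
```

In the isolated case the quiet VM’s working set stays cached and almost every read is a hit; in the shared case the neighbor flushes the working set between re-reads and the quiet VM’s hit rate collapses to zero, even though its own behavior never changed.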
Having said all that, it is important to note that hyper-converged solutions are not all built the same. You should do a proper analysis of which solution best suits your organization and your requirements.
This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net