This is a 4 part blog series.
- Nutanix 5.0 Features Overview (Beyond Marketing) – Part 1
- Nutanix 5.0 Features Overview (Beyond Marketing) – Part 2
- Nutanix 5.0 Features Overview (Beyond Marketing) – Part 3
- Nutanix 5.0 Features Overview (Beyond Marketing) – Part 4
Disclaimer: Any future product or roadmap information is intended to outline general product directions, and is not a commitment, promise or legal obligation for Nutanix to deliver any material, code, or functionality. This information should not be used when making a purchasing decision. Further, note that Nutanix has not determined whether separate fees will be charged for any future product enhancements or functionality which may ultimately be made available, and may choose to charge separate fees for the delivery of any product enhancements or functionality which are ultimately made available.
For official information on features and timeframe refer to the official Nutanix Press Release (here).
These are the features introduced with this blog post series:
- Cisco UCS B-Series Blade Servers Support
- Acropolis Affinity and Anti-affinity
- Acropolis Dynamic Scheduling (DRS++)
- REST API 2.0 and 3.0
- Support for XenServer TechPreview
- Network Visualization
- What-if analysis for New workloads and Allocation-based forecasting
- Native Self-Service Portal
- Snapshots – Self Service Restore UI
- Network Partner Integration Framework
- Metro Availability Witness
- VM Flash Mode Improvements
- Acropolis File Services GA (ESXi and AHV)
- Acropolis Block Services (CHAP authentication)
- Oracle VM and Oracle Linux Certified for AHV
- SAP Netweaver stack Certified for AHV
- Prism Search Improvements (support for Boolean expressions)
- I/O Metrics Visualization
- 1-Click Licensing
- LCM – Lifecycle Manager
- Additional Prism Improvements
- AHV Scale Improvements
- AHV CPU and Memory Hot Add (Tech Preview)
- Advanced Compression for Cold Data
- Acropolis Change Block Tracking (CBT) for Backup Vendors
- Predictable Performance with Autonomic QoS
- (New) NCC 3.0 with Prism Integration
- (New) 1-Node Replication Target
- (New) Improved Mixed Workload Support with QoS
- (New) Simplified SATADOM Replacement Workflow
- (New) Mixed Node Support with Adaptive Replica Selection
- (New) Dynamically Decreased Erasure Coding Stripes – Node Removals
- (New) Multi Metadata Disk Support to use available SSDs on the node for metadata
- (New) Erasure Coding (EC) support for changing the Replication Factor (RF) on containers
- (New) Inline Compression for OpLog
Now that we have the legal disclaimer out of the way… let’s get into it!
NCC 3.0 with Prism Integration
Historically, just about every interaction with NCC has required command-line access on a CVM. This was frustrating for system administrators who are not CLI savvy and for customers who prefer a GUI. As of NCC 3.0 in AOS 5.0, NCC is fully integrated with Prism, and many improvements have been added.
- NCC now takes ~5 minutes to run
- Many improvements to existing checks
- Bug fixes and a more robust NCC infrastructure
- New plugins (~15+ plugins in 2.3 + 3.0)
- XenServer support
Many aspects of NCC are now functional via Prism:
- 300+ NCC checks can now be managed through Prism
- An alert is associated with every check
- Checks can be manually executed from the GUI, and results can be downloaded
- The log collector can also be triggered from the GUI
Distributed Storage Fabric (DSF)
1-Node Replication Target
SMB customers need a cost-effective replication solution for branch offices. AOS 5.0 allows a single Nutanix node (NX-1155, 1N2U, 2xSSD + 10xHDD) to be used as a fully onboarded replication target for Nutanix clusters. This is a single-node AHV cluster with FT-1 that doesn’t run VMs, but it integrates with Nutanix cluster replication sources running any of the supported hypervisors.
Improved Mixed Workload Support with QoS
This is one of those deep internal improvements that greatly improve the system’s behavior and performance when running multiple diverse applications with different workload profiles on a single Nutanix node. AOS 5.0 separates read and write I/O queues, ensuring that write-intensive workloads (or write bursts) will not starve out read operations, and vice versa. This is achieved by replacing the admission controller and OpLog queues with a single weighted fair queue, along with priority propagation and disk queue optimizations.
I won’t bother you with the details, since it gets technical very quickly and probably deserves a dedicated article; in short, this feature ensures that I/O priorities are maintained through the entire I/O path, improving performance and I/O reliability when the system is under stress.
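To give a feel for the mechanism, the sketch below is a minimal weighted fair queue in Python. It is purely illustrative (not Nutanix’s actual implementation, and the class names and weights are assumptions): operations from each class are dequeued in proportion to the class weight, so a burst of writes enqueued ahead of reads still yields an interleaved service order.

```python
import heapq
from itertools import count

class WeightedFairQueue:
    """Illustrative weighted fair queue: operations from each class
    (e.g. 'read', 'write') are served in proportion to the class weight,
    so a burst in one class cannot starve the other."""

    def __init__(self, weights):
        self.weights = weights                      # e.g. {"read": 2, "write": 1}
        self.finish = {c: 0.0 for c in weights}     # per-class virtual finish time
        self.heap = []
        self.seq = count()                          # tie-breaker for stable ordering

    def enqueue(self, cls, op, cost=1.0):
        # Each op advances its class's virtual time by cost/weight:
        # heavier-weighted classes advance more slowly, so they are served more often.
        self.finish[cls] += cost / self.weights[cls]
        heapq.heappush(self.heap, (self.finish[cls], next(self.seq), cls, op))

    def dequeue(self):
        _, _, cls, op = heapq.heappop(self.heap)
        return cls, op

# A write burst followed by reads still dequeues interleaved, not writes-first.
q = WeightedFairQueue({"read": 1, "write": 1})
for i in range(3):
    q.enqueue("write", f"w{i}")
for i in range(3):
    q.enqueue("read", f"r{i}")
order = [q.dequeue()[0] for _ in range(6)]
```

With equal weights, `order` alternates between the two classes even though all writes were enqueued first, which is exactly the starvation-avoidance property described above.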
Simplified SATADOM Replacement Workflow
Replacing the host boot disk (SATADOM) used to involve an elaborate manual procedure that was lengthy and had to be performed by a Nutanix systems engineer. AOS 5.0 automates and simplifies the workflow, allowing the system administrator to drive it from within Prism for a nearly one-click experience.
Mixed Node Support with Adaptive Replica Selection
Another important feature that augments cluster balance and performance. AOS 5.0 performs smart placement of data copies based on drive capacity and performance utilization, providing consistent performance levels with optimal resource utilization even with heterogeneous nodes in a cluster (e.g. regular nodes + storage-heavy nodes, or NX1000 + NX3000 nodes).
The smart placement uses disk usage and performance statistics for each disk in the cluster to compute a disk fitness value. This fitness value is a function of the disk’s fullness percentage and its queue length (the number of operations in flight for that disk). The disk for each data write is then selected via a weighted random lottery to prevent herding behavior.
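The fitness-plus-lottery idea can be sketched as follows. The exact fitness function Nutanix uses is not public, so the formula below is a hypothetical stand-in; the point is that emptier, less-busy disks get more lottery tickets, while the randomness stops every write from piling onto the single best disk.

```python
import random

def disk_fitness(fullness_pct, queue_len):
    """Hypothetical fitness: emptier and less-busy disks score higher.
    Illustrative only -- Nutanix's actual function is internal."""
    return (1.0 - fullness_pct / 100.0) / (1.0 + queue_len)

def pick_replica_disk(disks, rng=random):
    """Weighted random lottery over fitness values: each disk's chance of
    being picked is proportional to its fitness, preventing herding onto
    the single 'best' disk."""
    weights = [disk_fitness(d["fullness_pct"], d["queue_len"]) for d in disks]
    return rng.choices(disks, weights=weights, k=1)[0]

disks = [
    {"name": "full",  "fullness_pct": 100, "queue_len": 10},  # weight 0: never picked
    {"name": "empty", "fullness_pct": 10,  "queue_len": 0},
]
choice = pick_replica_disk(disks)
```

In this toy setup the completely full disk has fitness 0, so the lottery always lands on the emptier disk; with two healthy disks the picks would be spread probabilistically between them.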
Dynamically Decreased Erasure Coding Stripes – Node Removals
Erasure Coding (EC) is a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces and stored across a set of different locations or storage media. Each Nutanix container has a defined replication factor (RF) for data resiliency and availability, either RF2 or RF3. Learn more about EC-X here.
Prior to AOS 5.0, if a cluster had EC container(s), removing a node was somewhat restrictive because the EC stripe would be distributed across the cluster: at least 7 nodes are required if the highest EC container RF is 2, and at least 9 nodes if the highest EC container RF is 3. The workaround was to turn off EC for the containers, but converting the data back to non-EC format took a long time, and the cluster had to have enough free space.
AOS 5.0 now maintains EC protection even through node removals, keeping the protection overhead limited. It does this by dynamically decreasing the EC stripe size when nodes are removed from a cluster, and dynamically increasing the EC stripe size when new nodes are added.
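As a rough sketch of the stripe-resizing idea, the toy function below picks a stripe width from the node count. The actual sizing policy is internal to AOS; the assumptions here (one parity block for RF2, two for RF3, data width capped at 4, and headroom nodes matching the RF, which reproduces the 7-node RF2 and 9-node RF3 minimums quoted above) are mine, purely for illustration.

```python
def ec_stripe_width(num_nodes, rf):
    """Illustrative EC stripe sizing (NOT the real AOS policy).
    Assumptions: RF2 uses 1 parity block, RF3 uses 2; data width is
    capped at 4; 'rf' extra nodes are kept as headroom, which matches
    the published minimums (7 nodes for a full RF2 stripe, 9 for RF3).
    Returns (data_blocks, parity_blocks)."""
    parity = 1 if rf == 2 else 2
    data = max(1, min(4, num_nodes - parity - rf))
    return data, parity

# A full 4/1 stripe needs 7 nodes at RF2; shrinking the cluster shrinks the stripe.
full_rf2 = ec_stripe_width(7, rf=2)   # (4, 1)
small_rf2 = ec_stripe_width(5, rf=2)  # (2, 1)
full_rf3 = ec_stripe_width(9, rf=3)   # (4, 2)
```

The point of the sketch is the 5.0 behavior described above: instead of refusing the node removal (or forcing EC off), the stripe width simply shrinks to fit the smaller cluster, and grows back when nodes are added.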
Multi Metadata Disk Support to use available SSDs on the node for metadata
AOS 5.0 now automatically distributes metadata across the available SSDs in a node (up to a maximum of four). This automated distribution helps accommodate read/write pressure during peak events, since the metadata disk was previously also used by other system components. Distributing the read/write load improves IOPS and reduces latencies, removing the single-SSD bottleneck. Another benefit of distributing metadata writes is uniform wear across the SSD media devices.
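One simple way to picture this (not necessarily how AOS implements it; the hashing scheme below is my own illustration) is hashing each metadata key to one of up to four SSDs, so reads, writes, and flash wear spread evenly instead of concentrating on a single metadata disk.

```python
import hashlib

def metadata_ssd_for_key(key, ssds):
    """Illustrative only: spread metadata keys across up to four SSDs by
    hashing, so read/write load and flash wear are distributed evenly.
    AOS 5.0 uses at most four SSDs per node for metadata."""
    active = ssds[:4]
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return active[h % len(active)]

ssds = ["ssd0", "ssd1", "ssd2", "ssd3", "ssd4", "ssd5"]
# Same key always maps to the same SSD; many keys spread over all four active SSDs.
placements = {metadata_ssd_for_key(f"vdisk:{i}", ssds) for i in range(1000)}
```

With 1000 keys, `placements` covers all four active SSDs, while `ssd4`/`ssd5` are never used for metadata, matching the four-disk cap described above.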
Erasure Coding (EC) support for changing the Replication Factor (RF) on containers
With AOS 5.0, EC-X provides the ability to modify the Replication Factor (RF) for containers that have Erasure Coding enabled, giving customers greater flexibility to achieve their desired level of data protection throughout the application life cycle. EC-enabled containers can now go from RF3 to RF2 or vice versa, and the EC encoding automatically changes to match.
Inline Compression for OpLog
In AOS 5.0, random writes are automatically compressed inline before hitting the OpLog. The OpLog is akin to a filesystem journal: it is a staging area built to handle bursts of random writes, coalesce them, and then sequentially drain the data to the extent store.
With inline compression, the Nutanix cluster gains improved space utilization in the OpLog, which in turn lets the OpLog absorb sustained random write bursts for a longer duration.
That’s it… AOS 5.0 is a huge release with major improvements in the areas of performance, reliability, availability, supportability and user experience. Several other smaller features are also part of the release, but they are not meaningful enough to be featured in this blog series.
I would like to acknowledge the huge effort from our PM, R&D, QA, Release Management and Support teams in shipping such a fantastic product release, and their continuous innovation in bringing customers and partners what is, in my opinion, by far the best HCI product on the market today. A big thank you!
Now you must be asking yourself when you will be able to 1-click upgrade your clusters to AOS 5.0. While I don’t control the Release Management train and cannot disclose the exact date, I can say that it will be soon. So, stay tuned!
NEWS UPDATE – A Part 4 post was added to the series – Nutanix 5.0 Features Overview (Beyond Marketing) – Part 4
This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net