Sep 07 2014

New Nutanix Data-at-Rest Encryption (How it works)

By Andre Leibovici and Anshuman Ratnani

 

In a previous article I published the first part of a multi-part feature announcement for NOS 4.1 (Nutanix 4.1 Features Overview (Beyond Marketing) – Part 1). As part of that announcement I disclosed the new data-at-rest encryption feature.

Nutanix clusters are deployed in a variety of customer environments requiring different levels of security, including sensitive/classified environments. These customers typically harden IT products deployed in their datacenters based on very specific guidelines, and are mandated to procure products that have obtained industry standard certifications.

Data-at-rest encryption is one such key criterion that customers use to evaluate products when procuring IT solutions for their projects. Nutanix data-at-rest encryption satisfies regulatory requirements for government agencies, banking, financial, healthcare and other G2000 enterprise customers that evaluate data security products and solutions.

The data-at-rest encryption feature is being released with NOS 4.1. It allows Nutanix customers to encrypt storage using a strong encryption algorithm, permits access to that data (decryption) only when the correct credentials are presented, and is compliant with regulatory requirements for data-at-rest encryption.

Nutanix data-at-rest encryption leverages FIPS 140-2 Level 2 validated self-encrypting drives (SEDs), making the solution future-proof since it uses the open standard protocols KMIP and TCG.

 

“The National Institute of Standards and Technology (NIST) issued the FIPS 140 Publication Series to coordinate the requirements and standards for cryptography modules that include both hardware and software components…. The standard provides four increasing, qualitative levels of security intended to cover a wide range of potential applications and environments. The security requirements cover areas related to the secure design and implementation of a cryptographic module.” – Wikipedia

 

FIPS 140-2 defines four levels of security, simply named “Level 1” to “Level 4”. It does not specify in detail what level of security is required by any particular application.

Level 1
Security Level 1 provides the lowest level of security. Basic security requirements are specified for a cryptographic module (e.g., at least one Approved algorithm or Approved security function shall be used). No specific physical security mechanisms are required in a Security Level 1 cryptographic module beyond the basic requirement for production-grade components.

Level 2
Security Level 2 improves upon the physical security mechanisms of a Security Level 1 cryptographic module by requiring features that show evidence of tampering, including tamper-evident coatings or seals that must be broken to obtain physical access to the plaintext cryptographic keys and critical security parameters (CSPs) within the module, or pick-resistant locks on covers or doors to protect against unauthorized physical access.

 

As an example, FIPS 140-2 is a requirement to achieve compliance with the HIPAA standard to protect healthcare data. Already mandated by the U.S. Department of Defense (DoD) for encryption, FIPS 140-2 is a powerful security standard that reduces risk without increasing costs.

 

The encryption features supported in this first release are:

  • Instantaneous enable/disable of encryption
  • Encryption of data at rest at a cluster-wide level
  • Instantaneous Secure Erase of disk(s) using Crypto Erase
  • Password (PIN) rotation per security policy
  • Ability to enable/disable on-disk encryption with live data
  • Ability to transform the cluster from a secure configuration to a non-secure configuration (and vice-versa)
  • Secure Erase (using Crypto Erase) of a specific partition, which can subsequently be reused to store data from partitions being marked un-secure

 

 

How does Nutanix Data-at-rest encryption work?

To enable Nutanix data-at-rest encryption a third-party key management server is required. At the time of launch only ESXi is supported, and only the SafeNet KeySecure Cryptographic Key Management System is certified, but other key management systems will be supported over time.

Nutanix supports any KMIP 1.0 compliant key management system, although others have not yet been certified. The key management system can even be a VM running on the Nutanix cluster itself, and since Nutanix leverages hardware encryption in the self-encrypting drives, the performance impact on the cluster is minimal.
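
To give a feel for what a KMIP exchange looks like, below is a minimal sketch using the open-source PyKMIP library. It is purely illustrative and is not the Nutanix implementation; the hostname, port and certificate paths are hypothetical.

    # Illustrative only: a generic KMIP 1.0 client exchange using the
    # open-source PyKMIP library. NOT the Nutanix implementation; the
    # hostname, port and certificate paths below are hypothetical.
    from kmip.pie.client import ProxyKmipClient
    from kmip.core import enums

    client = ProxyKmipClient(
        hostname="keysecure.example.com",  # hypothetical KMIP server
        port=5696,                         # standard KMIP port
        cert="/etc/pki/node-cert.pem",     # the node's client certificate
        key="/etc/pki/node-key.pem",
        ca="/etc/pki/kms-ca.pem",          # CA that signed the KMS certificate
    )

    with client:
        # Create a 256-bit AES key on the key management server...
        key_uid = client.create(enums.CryptographicAlgorithm.AES, 256)
        # ...and retrieve it later (e.g. at node boot) to unlock drives.
        key = client.get(key_uid)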

 

 

Nutanix clusters do not come with data-at-rest encryption turned on by default; it has to be enabled by the administrator using the PRISM UI or nCLI. The PRISM UI provides a simple and easy way to manage Key Management Device details and Certificate Authorities.

Each Nutanix node automatically generates an authentication certificate and adds it to the Key Management Device. At this point the nodes also auto-generate and set PINs on their respective FIPS-validated SED drives. The Nutanix controller in each node then adds the PINs to the Key Management Device.
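
Conceptually, the per-node enablement flow looks something like the sketch below. Every function name here is hypothetical and exists only for illustration; the real logic is internal to the Nutanix controller.

    # Hypothetical sketch of the per-node enablement flow described above;
    # the object model and method names are invented for illustration.
    import secrets

    def enable_encryption(node, kms):
        cert = node.generate_certificate()    # self-signed node identity cert
        kms.register_certificate(cert)        # make the KMS trust this node
        for drive in node.sed_drives:
            pin = secrets.token_hex(16)       # auto-generated random PIN
            drive.set_pin(pin)                # lock the SED with the PIN
            kms.store_pin(drive.serial, pin)  # escrow the PIN on the KMS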

Please note that once the PIN is set on an SED, you need the PIN to unlock the device (lose the PIN, lose the data). The PIN can be reset using the Secure Erase primitive to ‘un-secure’ the disk/partition, but all existing data is lost in that case. This is important to understand when moving drives between clusters or nodes.
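
Crypto Erase is instantaneous because the SED simply discards and regenerates its internal data encryption key (DEK) rather than overwriting blocks; once the old DEK is gone, everything previously written is unrecoverable ciphertext. A conceptual sketch, again with hypothetical method names:

    # Conceptual sketch of Crypto Erase (hypothetical method names).
    # The SED never exposes its internal data encryption key (DEK), so
    # regenerating the DEK instantly renders all prior data unreadable.
    def secure_erase(drive, pin):
        drive.authenticate(pin)   # prove ownership with the current PIN
        drive.regenerate_dek()    # old ciphertext is now cryptographically gone
        drive.clear_pin()         # drive returns to the 'unsecured' state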

The ESXi and NTNX boot partitions remain unencrypted – SEDs support selectively encrypting individual disk partitions using the ‘BAND’ feature (a band is a contiguous range of blocks that can be locked independently).

 

 
Important Deployment Considerations

In this first release it is not possible to mix encryption-enabled and non-encrypted nodes in the same Nutanix cluster, because the platform requires special FIPS 140-2 Level 2 SED drives to meet the data-at-rest encryption requirements. Breaking the homogeneity of the cluster would violate the data-at-rest encryption requirement for copies of data stored on non-SED drives. However, both encrypted and non-encrypted clusters can be managed through a single pane of glass using PRISM Central.

Data in-flight is NOT encrypted; that means data being transmitted between virtual machines and the Nutanix CVM is not encrypted. Data is only encrypted once it reaches the SED drives, whether SSDs or HDDs. That said, the Nutanix Controller VM has been exceptionally hardened and is being put through a number of security checks, validations and certifications.

Nutanix Cloud Connect, also introduced in my article Nutanix 4.1 Features Overview (Beyond Marketing) – Part 1, will also support at-rest encryption using Server Side Encryption.

 

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.

Permanent link to this article: http://myvirtualcloud.net/?p=6601

Sep 02 2014

Who runs Horizon 6 more efficiently? Nutanix or VSAN?

One great aspect of working for a cutting-edge storage vendor such as Nutanix is that, for all of the competitive sniping in the marketplace, the technology really does speak for itself.

A question recently asked of me by an individual new to SDS had me thinking about the traits that set solutions apart from each other. Naturally, different solutions have distinctive architectures, features and capabilities, but occasionally those are not enough to make a clear distinction.

I was asked about the differences between Nutanix and other SDS vendors. As this person was totally new to the technology, I could not talk about the fundamental differences between MapReduce and other metadata management methodologies, or the differences between replication factors based on 4MB extents versus large block chunks, or the benefits of using commodity x86 hardware versus application-specific integrated circuits (ASICs), or even between hypervisor-embedded versus user-space implementations.

In this specific case, where I really needed to simplify things, I used the metaphor of comparing SDS solutions to cars: there are different types of cars, each with its own features and functions, and you pick the one that suits you best.

One thing that struck me when I recently came across the new VMware Virtual SAN for Horizon View paper was the vast difference in requirements between VSAN and Nutanix. These differences have a huge impact from a hardware and budgetary perspective. To assist from a customer perspective, I want to comment on a few important differences between the two products.

Before I move forward with this article, I would like to state that I consider VMware vSphere the best overall hypervisor on the market today, but that doesn’t mean every feature is best of breed.

HDD Linked Clones – For linked clones, Virtual SAN recommends the use of 4 x 15,000 RPM SAS disks. Nutanix runs the exact same workload, with better user experience and latencies, using significantly cheaper and lower-performance SATA drives (4 x 7.2K RPM SATA).

Another option for Nutanix is to limit linked clones to the SSD tier while using native VAAI-NAS cloning, thus reducing the number of IOPS required.

 

HDD Full Clones – For full clones, Virtual SAN recommends the use of 12 x 900GB 10K RPM SAS drives. Nutanix runs the exact same workload, with better user experience and latencies, using the same 4 x 7.2K RPM SATA disks.

These benefits relate not only to CAPEX, but also to OPEX, thanks to the reduced power consumption that comes with the smaller footprint. My colleague Martijn Bosschaart wrote an excellent article demonstrating how Nutanix OPEX compares to VBlock and FlexPod with regard to power and cooling. I highly recommend reading it here.

 

HDD Capacity – The VMware Virtual SAN paper recommends 1.2TB of raw capacity for linked clones, which is roughly similar to what Nutanix would require, since de-duplication and management are handled in the virtualization layer via View Composer linked clones. When it comes to full clones, however, the requirement is 10.8TB per node, while Nutanix, thanks to its performance- and capacity-tier de-duplication features, uses only 1.26TB per node.
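
A quick back-of-the-envelope check of that gap, using only the per-node figures quoted above:

    # Sanity check of the full-clone capacity claim (per-node figures
    # quoted from the two reference architectures).
    vsan_full_clone_tb = 10.8      # raw capacity per node, no de-duplication
    nutanix_full_clone_tb = 1.26   # per node, with de-duplication + VAAI clones

    reduction = vsan_full_clone_tb / nutanix_full_clone_tb
    print(f"effective data reduction: ~{reduction:.1f}x")  # ~8.6x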

 

Network Adapter – The Horizon View RA with VMware Virtual SAN recommends the use of jumbo frames. Jumbo frames decrease the CPU utilization of the network stack, in turn increasing the potential consolidation ratio on hosts where CPU utilization becomes a bottleneck. There are benefits to using jumbo frames with Nutanix too, but they are not a requirement.
[Update] Another document from VMware mentions that jumbo frames are not a requirement.

 

IOPS – VMware Virtual SAN relies on CBRC to offload read IOPS from the cluster and network when using linked clones. CBRC is a 100% host-based, RAM-based caching solution that helps reduce the read IOs issued to the storage subsystem, thus improving the scalability of the storage subsystem while being completely transparent to the guest OS.

While CBRC does offload read IOs, it uses a mechanism that bursts CPU and IOPS during the data hashing process. This process runs every time a desktop pool is created or recomposed, and VMware recommends that administrators execute these operations only during maintenance periods. Nutanix provides the same CBRC benefit using in-line performance-tier de-duplication on SSD and in memory, providing a similar microsecond-latency user experience.

Read More: CBRC-like Functionality For Any VDI Solution with Nutanix

 

Memory – The reference architecture mentions that each server commits 70% of the total available memory, probably for cluster HA purposes. The total amount of memory per host used by 100 desktops is 165GB, plus 2GB for CBRC. VMware specifies 256GB of memory per host, which leaves Virtual SAN itself with approximately 12.2GB of memory.
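
The 12.2GB figure falls out of simple arithmetic, assuming the 70% commitment applies to the full 256GB per host:

    # Derivation of the ~12.2GB Virtual SAN memory figure (assuming the
    # 70% memory commitment applies to the full 256GB per host).
    total_gb = 256                   # physical memory per host
    committed_gb = total_gb * 0.70   # 70% commitment -> 179.2GB usable
    desktops_gb = 165                # memory used by 100 desktops
    cbrc_gb = 2                      # CBRC cache

    vsan_gb = committed_gb - desktops_gb - cbrc_gb
    print(f"Virtual SAN memory overhead: ~{vsan_gb:.1f}GB")  # ~12.2GB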

To support linked clones Nutanix requires 16GB of memory, but no additional CBRC allocation (2GB) is needed since Nutanix performs block de-duplication.

To enable block de-duplication at the performance tier Nutanix requires 24GB of RAM, while enabling MapReduce de-duplication for the capacity tier requires 32GB of RAM. Please note that full-clone VMs do not require MapReduce de-duplication when cloned using VAAI.

I would have given Virtual SAN a slight advantage in the memory argument over Nutanix if it were not for the following sentence in the VMware vSphere documentation center: “During prolonged periods of high random write I/O load from multiple virtual machines, the system might experience high memory over commitment. This can make a Virtual SAN cluster unresponsive. Once the load is decreased, the cluster becomes responsive again.” (link)

Nutanix also allows administrators to assign additional RAM for caching, for further performance improvements. In the Nutanix reference architecture a total of 32GB was assigned to the Nutanix Controller VMs, since the vSphere hosts were not overcommitted.

 

Datastore – It is a documented fact that this VMware Virtual SAN release supports a maximum of 2,048 desktops while maintaining data protection. You can still have 3,200 VMs in a VSAN cluster, but only 2,048 of them will be protected. Virtual SAN also has a soft limit of 100 virtual desktops per host.

Nutanix imposes no limit on the number of VMs protected per datastore, and because Nutanix supports multiple datastores per cluster, there is no limit to the number of VMs that can be protected, even taking the VMware limits into consideration.

 

CPU – Nutanix uses 8 vCPUs to run all features, including data replication, de-duplication, compression, backups, snapshots, data tiering, etc. Virtual SAN is said to use a maximum of 10% of the total host CPU. However, a quick look at the VSAN reference architecture demonstrates that it can easily utilize close to 40% of the available host CPU cycles to deliver the required IOPS.

[Update 1] I was corrected by Wade Holmes: in the VSAN reference architecture each host has a total of 46.5GHz available, and 2.5GHz is the average used, which is roughly 5% of the total amount of GHz available and within the advertised 10%.

[Update 2] Following a reader’s comment I re-analyzed the results in the reference architecture. According to the reference architecture, at peak VSAN uses approximately 7GHz per ESXi server during the heavy workload. That is 15.1% of the total CPU, not the 10% advertised. This would all be fine if VMware were not advertising that VSAN never uses more than 10% CPU. Read more in my comment below.
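
The percentage in Update 2 is straightforward to verify from the figures above:

    # Checking the Update 2 claim against the reference architecture figures.
    total_ghz = 46.5      # CPU capacity per host (per Update 1)
    peak_vsan_ghz = 7.0   # observed VSAN peak during the heavy workload

    print(f"{100 * peak_vsan_ghz / total_ghz:.1f}%")  # 15.1%, not 10%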

[Figure: per-host CPU utilization chart from the VSAN reference architecture]

Windows RDS – VMware Horizon 6 enables application remoting via Windows RDS. On Nutanix each Windows RDS server can be cloned in approximately 6 seconds, and the clones are natively de-duplicated; as a matter of fact, Nutanix only operates on metadata during the cloning process, avoiding data duplication. For this reason, creating 1 or 100 RDS servers will not impact performance or capacity.

The Horizon View with VSAN reference architecture does not include RDS application servers, even though they are an inherent part of the solution. The lack of data de-duplication services in VSAN penalizes this type of deployment with large storage capacity requirements and heavy SSD cache staging/de-staging operations.

 

So, does Nutanix run Horizon 6 more efficiently than VSAN?

As you can observe in this simple comparison, SDS solutions are not all made equal; they perform differently under similar conditions and with comparable amounts of assigned resources. In the referenced reference architectures Nutanix provides better resource utilization with lower hardware requirements and an additional 10 virtual desktops per host.

Add to that the linear scale-out approach, the ease of management, the performance delivered by in-memory and SSD caching, the automated tiering, the data locality and the shadow cloning, and Nutanix is the clear platform leader for Horizon View linked clones or full clones, for persistent or floating desktops.

I’ll leave you with this great pictorial.

One of our Nutanix customers was so impressed by the heat and power savings of their Nutanix rack (left) versus a competitor’s rack (right) that they decided to take infrared thermal pictures of their datacenters. Looking forward to what’s next, it only gets better for the customer.

[Figure: infrared thermal image of a Nutanix rack (left) next to a competitor rack (right)]

Read More: How Going Web-Scale is Opening up New Opportunities

The Horizon View with VSAN Reference Architecture referred to in this article can be found here.
The Horizon View with Nutanix Reference Architecture referred to in this article can be found here.

I wrote this article to the best of my knowledge, comparing numbers with the published Horizon View w/ VSAN Reference Architecture. If you feel my article or numbers are not correct, or do not portray how either of the products works or behaves, please feel free to advise and I will update the article accordingly.

 

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.

Permanent link to this article: http://myvirtualcloud.net/?p=6586

Aug 25 2014

Extend Visibility with the Stratusphere APIs

I am a big fan of programmatic interfaces and have written extensively about them while working for Nutanix, EMC and VMware. Programmatic interfaces allow administrators to create custom automation workflows using scripting languages or workflow engines such as vCenter Orchestrator, vCAC, Puppet, Chef, BMC and others. This is the ultimate goal of a true Software-Defined Datacenter: enabling a powerful application ecosystem that makes automated decisions about the use of infrastructure resources in a simple, efficient and easy manner.

When I provided Liquidware Labs with space for a sponsored article, I could not have been happier to learn that they are now fully embracing programmatic interfaces within their platform.

-Andre

 

Sponsored Article

Byline: Kevin Cooke, Product Director, Stratusphere Solutions for Liquidware Labs

Multi-platform and heterogeneous virtual environments can be tricky to manage. Responsibility is often divided, and the silos are very much interdependent. For example, different IT groups may own the delivered application or service, the virtualization layer, and the supporting hardware; and lastly there is the operations group, who are accountable for meeting SLAs and keeping things running smoothly. Knowing where constraints may be present, or staying ahead of the curve, can be a difficult process.

Consider desktop virtualization for a moment. You have the hosts and storage infrastructure, which may be managed by your server and storage team. There’s the hypervisor and allocation of virtual resources, possibly managed by a dedicated virtualization team. Let’s not forget the network team, who provide the pipes, supporting directory services and remote connectivity. Out on the fringe are the folks that manage data center operations, and of course, there are the magicians who provide the desktop VMs, application delivery and image management. And we certainly cannot forget the business units and end users, who expect nothing less than user experience nirvana.

Complicating matters somewhat, each of the above groups has likely adopted its own tools and workflows to best support its area of purview. And while the overall approach itself is not problematic, it does open cracks between the groups that can pose problems when users complain about poor experience.

“Is it the image?” “The hardware?” “The network … who knows?”

“Storage is showing some performance challenges, but is the issue really due to the data store or is it a lack of IOPs?”

Pressure is mounting. Management is requiring a status report, the storage team is under fire and no one has thought to look at the rogue application processes wreaking havoc on vCPU queueing and overflowing vRAM paging.

Of course, having everyone use the same monitoring solution can help to avoid some of the challenges in this scenario, but the reality is most of the discrete IT teams in the thick of this mess already have an existing tool and workflow to support their troubleshooting approach. So how do you preserve these existing methodologies and approaches, while providing information to the most relevant IT teams in a way that supports their approach and choice of tool?

Enter the Stratusphere API and Shared Visibility

Stratusphere UX from Liquidware Labs is a solution that supports monitoring, performance validation and diagnosis of the complexities highlighted in the very common real-world example outlined above. The solution—which includes the Stratusphere Hub, Database, Network Station and Connector ID Key—gathers extremely detailed metrics and information about all aspects of the virtual desktop user experience.

In an environment of 1,000 or so desktops, Stratusphere UX will gather a few million data points per hour. Details about the machines, users, applications, network (pipes and services), as well as the contributing infrastructure, are captured and stored in the Stratusphere Database. This information can be correlated to constraining events and tied to a composite metric that quantifies the user experience. Better still, this information is exposed and easily accessible through the Stratusphere application programming interface (API).

In its most basic use, the Stratusphere API can be used to generate ad-hoc reports and to support the creation of custom-defined tables that can be exported in HTML, CSV and native Microsoft Excel formats. But the real silo-busting power of the Stratusphere API lies in its ability to export information in JavaScript Object Notation (JSON) format, an open-standard text format that is human-readable and easily transmitted between server and web-based applications.
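
As an illustration of how such a JSON export could be consumed, here is a minimal sketch using Python and the requests library. The endpoint path, parameters and credentials are invented for illustration; consult the Stratusphere API documentation for the actual interface.

    # Hypothetical example of pulling JSON metrics from the Stratusphere API.
    # The URL, parameters and credentials are invented for illustration only.
    import requests

    resp = requests.get(
        "https://stratusphere-hub.example.com/api/metrics",  # hypothetical URL
        params={"group": "vdi-users", "format": "json"},     # hypothetical params
        auth=("apiuser", "apipassword"),
    )
    resp.raise_for_status()

    for record in resp.json():  # each record is one exported data point
        print(record)           # or forward it to HP OM, BMC Remedy, etc.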

In the above VDI downtime event—and with the visibility provided via the Stratusphere API—relevant metrics and information about the performance, user experience and state of other areas of the architecture can flow from Stratusphere UX to the native tools each IT group has previously chosen to gain visibility into its own corner of the overall architecture. For example, the JSON-formatted metrics and information can be sent to HP Operations Manager to assist the server, storage and networking groups, while BMC Remedy receives a helpdesk ticket to alert operations about the groups, users and applications affected.

The power of this approach is that it preserves workflow. IT groups are able to work within their existing monitoring tools and can leverage proven methods to troubleshoot and determine how their component of the architecture may be contributing to the poor performance. Further, the use of the Stratusphere API greatly facilitates inter-group trouble ticketing, as a defined composite metric like the VDI User Experience can be leveraged to baseline and correlate activities across functional IT groups.

Stratusphere UX was designed to provide visibility into complex multi-platform and heterogeneous virtual environments. Its user-centric approach to monitoring, performance validation and diagnostics takes much of the complexity out of next-generation desktop workspaces. And with the Stratusphere API, IT groups are able to support these virtual environments in a way that meets business goals, minimizes risk and supports both the organizational and IT changes ahead.

 

http://www.liquidwarelabs.com

 

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.

Permanent link to this article: http://myvirtualcloud.net/?p=6568
