
Sep 10 2012

VDI Architectures using Storage Class Memory


There are so many disruptive VMware technologies in the VDI space that sometimes it is hard to keep up with them all. I am looking forward to a few of them, such as VMware Distributed Storage and Space-Efficient Sparse Virtual Disks.

At the same pace the storage industry is being transformed, and most storage solutions on the market now leverage the performance of flash memory in its many form factors. Some vendors ship Solid State Drives, others PCIe card devices, and others use PCIe cards in their scale-out solutions – but at the end of the day they are all using flash memory.

Because of this transformation of the hardware and software stack, many of the architectures and methodologies used to deploy VMware View, and VDI solutions in general, are also changing.

The use of dedicated Solid State Drives to host VMware View Replica disks is not something that I would recommend anymore for a greenfield deployment. As a reminder, I was once an advocate of Dedicated Replica Datastores and wrote many articles about them (How to offload Write IOs from VDI deployments).

An example of this paradigm shift is the work that my VMware colleagues Tristan Todd and Mac Binesh at the Technical Marketing Team, and Daniel Beveridge at the CTO Office, have been doing with PCIe Flash devices.

The team used a couple of Virident FlashMAX II cards (The Virident FlashMAX II in my Lab) to deploy Linked Clone non-persistent desktops using View Composer. The other well-known vendor for these cards is Fusion-io, but there are others.

Performance varies from vendor to vendor, but in general a single Storage Class Memory (SCM) device in a PCIe form factor is able to ingest hundreds of thousands of IO requests per second. These PCIe cards are normally low capacity (more capacity, more $$$), but in this scenario, where Linked Clone non-persistent desktops are used, capacity is not something to worry about.
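To put rough numbers on this, here is a quick back-of-envelope sketch in Python. Every figure in it (card IOPS, per-desktop IOPS, linked clone footprint) is my own illustrative assumption, not a measurement from the team’s tests:

# Back-of-envelope sanity check for one host backed by a single SCM PCIe card.
# Every figure here is an illustrative assumption, not a number from the tests.

DESKTOPS_PER_HOST = 250        # consolidation target discussed later in this article
IOPS_PER_DESKTOP = 15          # assumed steady-state IOPS per desktop (incl. swap)
GB_PER_LINKED_CLONE = 1.5      # assumed delta + swap footprint per non-persistent clone

CARD_IOPS = 200_000            # assumed sustained mixed IOPS for one PCIe flash card
CARD_CAPACITY_GB = 550         # assumed usable capacity of the card

iops_demand = DESKTOPS_PER_HOST * IOPS_PER_DESKTOP
capacity_demand_gb = DESKTOPS_PER_HOST * GB_PER_LINKED_CLONE

print(f"IOPS demand:     {iops_demand:,} of {CARD_IOPS:,} "
      f"({iops_demand / CARD_IOPS:.1%} of the card)")
print(f"Capacity demand: {capacity_demand_gb:,.0f} GB of {CARD_CAPACITY_GB} GB "
      f"({capacity_demand_gb / CARD_CAPACITY_GB:.0%} of the card)")

Even with conservative assumptions the IOPS budget is barely touched; capacity is the dimension to size, and with thin non-persistent linked clones it stays within a single card.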

 

Here is the interesting part of the tests.

Because of this extraordinary capacity to ingest IO requests, the team started to play with the amount of RAM assigned to each virtual desktop on each host. They reduced the amount of RAM from an initial 2GB down to 512MB, while keeping the Windows page file at the 2GB size.

The expected result is obvious: Windows starts to swap heavily to disk when no physical memory is available.

“To optimally size your paging file you should start all the applications you run at the same time, load typical data sets, and then note the commit charge peak (or look at this value after a period of time where you know maximum load was attained). Set the paging file minimum to be that value minus the amount of RAM in your system (if the value is negative, pick a minimum size to permit the kind of crash dump you are configured for). If you want to have some breathing room for potentially large commit demands, set the maximum to double that number.”
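As a minimal sketch of that sizing rule in Python (the commit charge peak and RAM figures in the example are made up):

# Minimal sketch of the page file sizing rule quoted above.
# The example inputs are made up; plug in your own observed commit charge peak.

def pagefile_size_mb(commit_peak_mb: int, ram_mb: int,
                     crash_dump_min_mb: int = 1024) -> tuple[int, int]:
    """Return (minimum, maximum) page file size in MB per the quoted guidance."""
    minimum = commit_peak_mb - ram_mb
    if minimum <= 0:
        # RAM already covers the peak; fall back to whatever your configured
        # crash dump type needs as a floor.
        minimum = crash_dump_min_mb
    maximum = 2 * minimum          # breathing room for large commit demands
    return minimum, maximum

# Example: a desktop whose commit charge peaks at ~1.8GB but only has 512MB of RAM.
low, high = pagefile_size_mb(commit_peak_mb=1843, ram_mb=512)
print(f"Page file: min {low} MB, max {high} MB")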

What they found, using VMware View Planner, is that they were able to run 250 virtual desktops on a single host assigning only 512MB of RAM to each desktop, while maintaining disk latency under 1.0ms.

[Screenshot: test results table showing RAM per desktop vs. host active memory]

 

These numbers have a major impact on VDI deployment CAPEX. The table above clearly demonstrates that reducing the amount of RAM per desktop also reduces the host’s active RAM (duh!).

Ultimately, it also means that it is possible to drive consolidation upwards, because less memory is required per host, as long as there is enough CPU capacity available to host the desktops. I would not be surprised to soon see hosts supporting 500 desktops with only approximately 180GB of physical RAM.
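For the curious, here is a rough Python sketch of that memory arithmetic. The per-VM overhead, page sharing savings and hypervisor reserve are my own assumptions; the 40% sharing figure is simply chosen so that the 500-desktop case lands near the ~180GB mark:

# Rough host memory arithmetic behind the consolidation numbers above.
# Per-VM overhead, page sharing savings and the hypervisor reserve are
# assumptions chosen for illustration only.

def host_ram_gb(desktops: int, ram_per_desktop_mb: int,
                overhead_per_vm_mb: int = 90, sharing_savings: float = 0.40,
                hypervisor_gb: float = 4.0) -> float:
    """Estimate the physical RAM needed on one host, in GB."""
    configured_gb = desktops * (ram_per_desktop_mb + overhead_per_vm_mb) / 1024
    return configured_gb * (1 - sharing_savings) + hypervisor_gb

for desktops, ram_mb in [(250, 2048), (250, 512), (500, 512)]:
    print(f"{desktops} desktops @ {ram_mb} MB -> "
          f"~{host_ram_gb(desktops, ram_mb):.0f} GB physical RAM")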

What about CBRC or View Storage Accelerator?
I would argue that View Storage Accelerator isn’t of much importance anymore in this architecture. However, if you prefer desktop read IOs from the replica disk to be served in microseconds rather than milliseconds, I still recommend the use of CBRC.
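A tiny illustration of why that still helps: the effective read latency for the replica disk with and without an in-RAM cache. The hit rate and latency figures below are purely illustrative assumptions:

# Effective replica read latency with and without an in-RAM read cache sitting
# in front of the PCIe flash device. Hit rate and latencies are assumptions.

CBRC_HIT_RATE = 0.80          # assumed fraction of replica reads served from RAM
RAM_READ_US = 20              # assumed in-RAM cache service time (microseconds)
FLASH_READ_US = 500           # assumed PCIe flash read latency (microseconds)

with_cbrc = CBRC_HIT_RATE * RAM_READ_US + (1 - CBRC_HIT_RATE) * FLASH_READ_US
without_cbrc = FLASH_READ_US

print(f"Average replica read latency with CBRC:    {with_cbrc:.0f} us")
print(f"Average replica read latency without CBRC: {without_cbrc:.0f} us")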

 

This is an amazing breakthrough, but caution is needed when designing and deploying such a solution.

Fault Domain – In a solution that relies only on local host PCIe devices, the fault domain is the entire host. When running 250 or 500 desktops on a single host, what would be the business impact if that host fails? There are a few possible solutions, such as having spare hosts ready to take on the workload (a rough N+1 sketch follows below), but ultimately it’s a business decision.

Non-Persistent Desktops – An architecture that leverages local host PCIe devices does not provide reliability for persistent desktops, where users may need the ability to deploy their own applications and persist them across sessions. If the PCIe device panics or dies, the user’s desktop would be lost without a proper image backup. Add VMware Mirage technology to the picture and this may no longer be a hurdle.
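On the fault-domain point above, a rough N+1 sketch in Python; the pod size and spare count are assumptions:

# Fault-domain arithmetic: desktops at risk per host failure, and the reserve
# capacity an N+1 design sets aside. Pod size and spare count are assumptions.

DESKTOPS_PER_HOST = 500
HOSTS_IN_POD = 8
SPARE_HOSTS = 1               # assumed N+1 reserve

desktops_served = DESKTOPS_PER_HOST * (HOSTS_IN_POD - SPARE_HOSTS)
reserve_fraction = SPARE_HOSTS / HOSTS_IN_POD

print(f"Desktops served by the pod:         {desktops_served}")
print(f"Desktops affected by one host loss: {DESKTOPS_PER_HOST}")
print(f"Capacity held in reserve:           {reserve_fraction:.0%}")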

 

The VDI architectural landscape is definitely changing, and changing fast. In this new scenario there is no need for Dedicated Replica Datastores anymore. It’s all flash, all local to the hosts, with no intermediary caching.

Some other impressive numbers relate to the ability to quickly provision and recompose desktops using Storage Class Memory. I personally think these numbers are not very important for most VDI deployments, as administrators normally provision or recompose desktops over weekends or overnight. Still, they are impressive numbers!

 

[Screenshot: provisioning and recompose timing results]

 

Finally, if you want to learn more about the architecture tested, the results, and other technologies that may provide similar results, I would urge you to attend session EUC1190 at VMworld. VMworld US is over, but you still have a chance to attend the session at VMworld Barcelona… and in the near future I will probably talk more about how VDI architecture is changing.

This article was first published by Andre Leibovici (@andreleibovici) at myvirtualcloud.net.

 


Permanent link to this article: http://myvirtualcloud.net/?p=3884

5 comments

2 pings


  1. John Nicholson

    I’m curious about these designs, but where is the break-even point vs. just throwing more RAM at the problem?

    RAM’s hitting something like $10 a GB anyways, and a $6k flash card for 250 users works out to $24 per user (about as much as just buying the RAM anyways; rough math is sketched after this comment). Now, if I was using this to exceed densities beyond what I could hit by just stuffing servers with 192GB of RAM this might be a problem, or if I have special concerns like RU/power/cooling efficiencies, but at that point I’m starting to look at ridiculous failure domains, and the vMotions to clear a host reach the territory where I need IB.

    I understand the value for the replicas, but CBRC and my SAN cache hovering around 60-80% read hits are clearing up that problem pretty well.

    Maybe the value for these things is in the linked clones, and I only see that if we do some kind of data reduction or something to make the cost per GB for linked clone data more palatable.
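To put the break-even math from the comment above in one place, here is a quick sketch using the figures John quotes (rough 2012 street prices, purely illustrative):

# Break-even sketch for the comment above: per-user cost of a PCIe flash card
# versus simply buying the RAM. Prices are the rough figures quoted in the
# comment, not vendor quotes.

FLASH_CARD_COST = 6_000        # assumed cost of one PCIe flash card ($)
RAM_COST_PER_GB = 10           # assumed server RAM price ($/GB)
DESKTOPS_PER_HOST = 250
RAM_PER_DESKTOP_GB = 2         # RAM you would otherwise give each desktop

flash_per_user = FLASH_CARD_COST / DESKTOPS_PER_HOST
ram_per_user = RAM_PER_DESKTOP_GB * RAM_COST_PER_GB

print(f"Flash card cost per user: ${flash_per_user:.0f}")   # ~$24
print(f"RAM cost per user:        ${ram_per_user:.0f}")     # ~$20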

  2. Andre Leibovici

    @John Nicholson
    You can throw RAM at the problem and not allow heavy swapping to the disks; however, you would still need fast disks in each local host to handle the normal Windows + application IOPS. This in itself, when increasing the consolidation ratio, will require SSD-type performance.

    If you need to utilize SSDs anyway, and they provide the capacity to handle high-IOPS consolidation, then why not utilize their full potential without also having to add RAM to each host?

    CBRC will only cater for read IOPS. You say that your VDI is 80% reads, which is contrary to 99% of deployments, where 80% of the IOs are write IOs. So in your case (I would recommend you review your numbers) CBRC is removing most of the IO hit, and maybe you don’t need SSDs, but as I said, this type of IO behavior is very unusual in VDI environments.

    Andre

  3. John Nicholson

    My cache hits for reads hover in that range, not my actual mix of reads vs. writes (my mix is generally in the 75% to 80% writes range, which like you said is pretty standard). I was pointing out that I just see diminished value in local SSD for the replica (as it is read-only, and the two tiers of cache with CBRC and the SAN handle its load pretty well).

    Now, looking at the economics of linked clones on random writes, I can start to see a model where these things work, but given the costs it’s still a bit fuzzy vs. just some type of extended write cache for the SAN. You would have to be running some really lean linked clones, where the IOPS/GB ratio requirements were pretty steep, to beat out most shared storage.

    Looking at 1.3 IOPS per GB needed for the linked clone pool (going off pretty much the defaults from your calculator; I need to pull my own stats again here), it’s pretty clear that 15k drives get expensive quickly to scale properly to meet the write requirement, assuming we ignore write cache. For 768 desktops we would need a full 24-drive shelf to hit the 3.2TB usable with the ~2,500 write IOPS I would need (rough math is sketched after this comment). I could achieve the same thing with local SSD and reduce rackspace/power, but honestly cost is going to run reasonably close in terms of hitting the capacity requirement, and I’m going to need to use larger servers to support the 3 x SSDs I would be using to hit that density, and I’ve lost the efficiencies of space that come with a central pool and thin provisioning more aggressively. Considering Skyhawk, Pure, and other SSD vendors are at or below the cost per GB of internal flash, I’m thinking it may be more cost effective to leave linked clones on central SANs still, and maybe just use SSD for swapping, as trying to size internal SSDs has its own headaches if you make any major changes to the linked clone sizing. My cost estimates are based on ~$7 per GB for PCIe flash, ~$10 for Pure, or ~$4 for Skyhawk (then again, there’s added cost in fabric infrastructure, so it’s not all free IOPS).

    Sorry for the rambling post here; like I said, I’m just trying to build a cost model for linked clones on local flash. If they can find a way to do dedupe or compression in vSphere I think it may push things strongly in that direction, but in the meantime the swap use case for a density deployment still looks nice.
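To make the drive-count math in the comment above concrete, here is a rough sketch using the figures quoted there; the per-drive IOPS and usable capacity are my own assumptions, and RAID write penalties and write cache are ignored to keep it simple:

# Sketch of the 15k-drive sizing arithmetic from the comment above.
# Per-drive IOPS and usable capacity are assumptions; RAID write penalties
# and write cache are ignored to keep it simple.
import math

LINKED_CLONE_CAPACITY_GB = 3_200   # usable capacity quoted in the comment
IOPS_PER_GB = 1.3                  # IOPS density quoted in the comment

DRIVE_USABLE_GB = 450              # assumed usable capacity of one 15k drive
DRIVE_IOPS = 180                   # assumed sustained IOPS of one 15k drive

total_iops = LINKED_CLONE_CAPACITY_GB * IOPS_PER_GB

drives_for_capacity = math.ceil(LINKED_CLONE_CAPACITY_GB / DRIVE_USABLE_GB)
drives_for_iops = math.ceil(total_iops / DRIVE_IOPS)

print(f"Front-end IOPS required: {total_iops:.0f}")
print(f"15k drives for capacity: {drives_for_capacity}")
print(f"15k drives for IOPS:     {drives_for_iops}")
print(f"Drives needed:           {max(drives_for_capacity, drives_for_iops)}")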

  4. VJ

    Andre,

    Which tools do you use to generate workloads and reports to evaluate storage? For the last few days I have been trying, but I can’t get the View Planner tool working. It’s not an easy tool to get running.

    The Iometer tool is not helpful for calculating IOPS when evaluating NFS storage like NetApp, because the IOPS data flows over the NIC, so the network and storage traffic are combined together.

    Please advise which tools can be used to evaluate storage, something that’s simple to use.
    Can IO Blaster help?

    VJ

  5. Andre Leibovici

    VJ,

    If you want to generate a VDI look-alike workload I suggest Login VSI or View Planner. Login VSI is more widely utilized.

    -Andre

  1. myvirtualcloud.net » A VMware View POC with XtremIO

    [...] High IO performance and low latency also allow the amount of memory allocated to each virtual desktop to be reduced, thus  increasing Windows pagefile IO operations on the SAN. This reduces the cost of each physical host, or allows for higher consolidation ratios. I discussed this very same concept in my article VDI Architectures using Storage Class Memory. [...]

  2. myvirtualcloud.net » Why use <Insert Vendor Here> storage for VDI? Part 2

    [...] I also commented about other storage technologies that may change the way VDI is being deployed today. For more information read VDI Architectures using Storage Class Memory. [...]
