
Sep 19 2011

The Right Hardware for a 10K VDI solution

I have been involved in many VDI designs over the past years, and recently I worked on the design and architecture for a 10,000-user VDI solution. I thought it would be interesting to share here the considerations and decisions that go into hardware definition for such a large project.

So, first let’s review the requirements, constraints and assumptions that must be followed during the design.

 

Requirements:

  • 10,000 powered-on VMs with an average CPU demand of 200MHz
  • 15% CPU overhead for occasional spikes
  • Cisco UCS Blade Servers
  • Windows 7 Enterprise 32bit VMs with 2GB RAM
  • High Availability is a requirement
  • VMware View Linked Clones
  • vSphere 5 must be used

Constraints:

  • Floor space in the datacenter is scarce

Assumptions:

  • Storage array and networking gear are already in place
  • Transparent Page Sharing ratio is 20%
  • 1GB Hypervisor Memory Overhead (ESXi)

Considerations

  • TPS ratio – Because of Windows 7 ASLR (Address Space Layout Randomization), the TPS (Transparent Page Sharing) ratio is reduced to about 10-20% (it was about 30-40% with Windows XP).
  • Eight hosts per vSphere cluster is the maximum supported with View Composer
  • Cisco UCS unified computing is the platform of choice for this design because it provides a simplified converged architecture while allowing for automated and centralized management. However, the methodology demonstrated below is valid for any hardware vendor and model. For more about UCS… refer to UCS and UCSM Basics.

 

Because of the scarce floor space, the solution requires a high consolidation ratio. That means many VMs per core, many cores per socket, and many sockets per server.

The maximum number of VMs/core supported by VMware View is 16. The target will be 12 VMs/core, as it automatically guarantees one spare host per cluster for failover (N+1). Using 12 VMs/core over 8 hosts is the same thing as using 13.72 VMs/core over 7 hosts and leaving 1 spare host for failover.

Example:

  • 8 hosts x 96 VMs/host (12 VMs/core x 8 cores) = 768
  • 7 hosts x 110 VMs/host (13.72 VMs/core x 8 cores) = 768

Therefore, when using a host with 8 cores, the total number of VMs per host under normal operating conditions is 96. In an HA event the number of VMs per host will be 110 (109.76). In this design I will not use 8-core hosts, but the formula is still valid.
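The arithmetic is easy to verify. Here is a minimal sketch in Python (my own illustration of the math above, not anything from the View documentation), using the 8-core host from the example:

```python
import math

CORES_PER_HOST = 8
VMS_PER_CORE = 12        # design target (View supports a maximum of 16)
HOSTS_PER_CLUSTER = 8    # View Composer cluster limit

# Normal operations: all 8 hosts share the load.
vms_per_host = VMS_PER_CORE * CORES_PER_HOST          # 96 VMs/host
cluster_capacity = vms_per_host * HOSTS_PER_CLUSTER   # 768 VMs

# HA event: the same 768 VMs land on the 7 surviving hosts.
vms_per_host_ha = cluster_capacity / (HOSTS_PER_CLUSTER - 1)  # 109.76
vms_per_core_ha = vms_per_host_ha / CORES_PER_HOST            # 13.72

print(math.ceil(vms_per_host_ha), round(vms_per_core_ha, 2))  # 110 13.72
```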

 

Step 1 – Hosted VMs

The Cisco UCS B440 M2 High-Performance blade server seemed like a good choice (see table below), with 4 sockets and Intel Xeon E7-4800 processors with up to 10 cores each (core counts from the Intel website). Two options are available with this blade server – 20 cores with 2 sockets, or 40 cores with 4 sockets.

 

Scenario 1 with 40 cores on Cisco UCS B440 M2 High:

  • Hosts: 21
  • CPU Speed Required: 2.7GHz
  • RAM: 856 GB
  • VM/host Ratio: 480
  • Number of Chassis: 6 (full width blade) *

Scenario 2 with 20 cores on Cisco UCS B440 M2 High:

  • Hosts: 42
  • CPU Speed Required: 2.7GHz
  • RAM: 432 GB
  • VM/host Ratio: 240
  • Number of Chassis: 11 (full width blade) *

*The Cisco UCS 5100 Series chassis supports half-width and full-width blade servers. When populated with half-width blades the chassis fits 8 blades; when populated with full-width blades it fits 4 blades.
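All the scenario bullets in this post fall out of the same simple math. Below is a minimal sketch of that sizing logic in Python – my own approximation, not the actual Online VDI Calculator – assuming the stated 200MHz average per VM, the 15% spike overhead, 12 VMs/core, and ceiling division for hosts and chassis. It reproduces the host, CPU and chassis figures of the two scenarios above:

```python
import math

def size_cluster(total_vms=10_000, vms_per_core=12, cores_per_host=40,
                 mhz_per_vm=200, cpu_headroom=0.15, blades_per_chassis=4):
    """Rough host/CPU/chassis sizing; a sketch, not the real VDI Calculator."""
    vms_per_host = vms_per_core * cores_per_host
    hosts = math.ceil(total_vms / vms_per_host)
    # Per-core clock needed to serve the VMs plus the 15% spike overhead.
    ghz_per_core = vms_per_host * mhz_per_vm * (1 + cpu_headroom) / cores_per_host / 1000
    chassis = math.ceil(hosts / blades_per_chassis)
    return hosts, round(ghz_per_core, 2), vms_per_host, chassis

# Scenario 1: B440 M2, 4 sockets x 10 cores, full-width (4 blades per chassis)
print(size_cluster(cores_per_host=40))   # (21, 2.76, 480, 6)

# Scenario 2: B440 M2 with only 2 sockets populated
print(size_cluster(cores_per_host=20))   # (42, 2.76, 240, 11)
```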

 

Looking at scenario 1, we immediately see that the Cisco UCS B440 M2 High, with a maximum of 512GB (see table below), does not support the amount of RAM required. Scenario 2 has no issues with the amount of RAM; however, two things do not look right to me:

  1. The CPU clock required is very close to the 2.8GHz max. Turbo frequency (refer to the Intel website)
  2. The number of blades and chassis required is too high, requiring 3 racks (max. 4 chassis per rack)

[Image: Cisco UCS blade server comparison table]

Source: Cisco Website as of 9/16/2011

 

The second blade I would like to try is the Cisco UCS B250 M2 Extended Memory blade server, with 2 sockets and up to 384GB RAM in a half-width form factor. The Intel CPU is the 5600 Series, which supports up to 6 cores per socket. This blade server with 2 sockets and 6 cores per socket would allow 144 VMs per host (I'm still trying 12 VMs/core). The Intel 5600 Series does not seem to be an issue, with its max. frequency at 3.6GHz.

 

Scenario with 12 cores on Cisco UCS B250 M2 Extended Memory:

  • Hosts: 70
  • CPU Speed Required: 2.7GHz
  • RAM: 256 GB
  • VM/host Ratio: 144
  • Number of Chassis: 9 (Half width blade)

The number of VMs per host allows for a very good consolidation ratio; however, two things do not look right to me:

  1. The number of blade servers is too high, requiring additional power and cooling.
  2. The number of chassis required is too high, requiring 3 racks (max. 4 chassis per rack)

 

Let’s look at yet another option. The Cisco UCS B230 M2 has only two sockets but is half-width, requiring fewer Cisco UCS chassis. The maximum number of cores supported by the Intel E7-2800 is 10; therefore the blade has 20 cores available.

 

Scenario with 20 cores on Cisco UCS B230 M2:

  • Hosts: 42
  • CPU Speed Required: 2.7GHz
  • RAM: 432 GB
  • VM/host Ratio: 240
  • Number of Chassis: 6 (Half width blade)

Looking at the scenario above, we immediately see that the Cisco UCS B230 M2 will not have problems supporting the amount of RAM required. Also, because it's a half-width blade it's possible to fit 8 blades per chassis, reducing the number of chassis to 6. However, one thing does not look right to me:

  1. The E7-2800 CPU reaches its max. frequency of 2.9GHz only with Turbo frequency enabled (refer to the Intel website). It's not recommended to run at Turbo frequencies constantly during production time. Turbo also requires additional power and cooling.

An option to reduce CPU MHz consumption is to lower the number of VMs/core. In doing that we also lower the consolidation ratio, therefore requiring more blade servers and chassis. In this specific scenario, when I lower the VMs/core ratio from 12 to 10, the number of hosts increases from 42 to 50, requiring 7 chassis instead of 6.
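To see the tradeoff concretely, here is a quick sweep of the VMs/core ratio on the 20-core, half-width B230 M2, using the same back-of-the-envelope math as before (a sketch, not the real calculator):

```python
import math

for vms_per_core in (12, 11, 10):
    vms_per_host = vms_per_core * 20              # 20-core B230 M2
    hosts = math.ceil(10_000 / vms_per_host)
    chassis = math.ceil(hosts / 8)                # half-width: 8 blades/chassis
    ghz = vms_per_host * 200 * 1.15 / 20 / 1000   # 200MHz/VM + 15% overhead
    print(f"{vms_per_core} VMs/core: {hosts} hosts, {chassis} chassis, {ghz:.2f}GHz/core")

# 12 VMs/core: 42 hosts, 6 chassis, 2.76GHz/core
# 11 VMs/core: 46 hosts, 6 chassis, 2.53GHz/core
# 10 VMs/core: 50 hosts, 7 chassis, 2.30GHz/core
```

The 11 VMs/core row is the sweet spot I will explore next: CPU demand drops to about 2.5GHz while the design still fits in 6 chassis.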

 

I’ll try the same Cisco UCS B230 M2 Extended Memory once again, this time reducing the number of VMs/core to 11. Reducing VMs/core reduces the CPU clock and RAM required per host, at the cost of a few more blades than at 12 VMs/core.

 

Scenario with 20 cores on Cisco UCS B230 M2 Extended Memory:

  • Hosts: 46
  • CPU Speed Required: 2.5GHz
  • RAM: 392GB
  • VM/host Ratio: 220
  • Number of Chassis: 6 (Half width blade)

Looking at the scenario above we can see that the numbers are looking better.

  1. The number of hosts is not too high and the consolidation ratio is at 220 VMs/host. The benefits are a reduced number of chassis, less power, and less cooling.
  2. The Intel E7-2800's normal max. frequency is 2.4GHz, a little bit under what is required. However, it will only max out its Turbo frequency at 2.8GHz. In this scenario the solution will not be running with Turbo constantly, but it will be above the regular max. frequency. Remember that we have added 15% overhead as part of the requirements. (refer to the Intel website)
  3. The maximum RAM supported is 384GB, which is a little bit under the 392GB required. If a host runs out of memory it may have to swap to disk. However, before swapping to disk vSphere will try to rebalance VMs using DRS and will also use memory compression.

The Cisco UCS B230 M2 Extended Memory would be my pick in this specific scenario with 10,000 VMs. My selection does not mean this is the right solution for your organization, given that hardware definition is entirely based on requirements, constraints and assumptions.

 

[Image: final sizing summary for the chosen Cisco UCS B230 M2 configuration]

 

Step 2 – Infrastructure

Along with all 10,000 virtual desktops, there is a need to size hardware for the software infrastructure supporting the solution. For VMware View those would be Connection Servers, Security Servers, Transfer Servers, vCenter Servers and any additional 3rd-party software. For Citrix XenDesktop those may be Web Interfaces, Desktop Delivery Controllers, PVS Servers and XenApp Servers.

The infrastructure could also include Domain Controllers, License Servers, and Data Collectors.

For this design we will use VMware View 5.0:

  • 7 vCenter Servers with 8GB RAM
  • 6 Connection Servers with 10GB RAM (N+1)
  • No Security Servers
  • No Transfer Servers

For reliability reasons, it is not good practice to host management servers on the same infrastructure and clusters that support the virtual desktop workload. You could opt for hosting those infrastructure VMs on a completely different set of servers; perhaps your server virtualization infrastructure.

I have chosen to use two blades of the same model and configuration (Cisco UCS B230 M2 Extended Memory) so they can also serve as replacements for the virtual desktop workload if I need them to.
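As a quick sanity check (using only the RAM figures listed above), two such blades are more than enough for the management VMs, even if one blade is lost:

```python
# Management VM RAM: 7 vCenter Servers @ 8GB + 6 Connection Servers @ 10GB
mgmt_ram_gb = 7 * 8 + 6 * 10       # 116GB total
blade_ram_gb = 384                 # B230 M2 Extended Memory maximum
# Even a single surviving blade holds the full management footprint.
print(mgmt_ram_gb, mgmt_ram_gb <= blade_ram_gb)   # 116 True
```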

Here is how the solution would look in my datacenter (nice, tidy and in two racks only): [Image: two-rack datacenter layout]

 

There are several factors that create a wide range of software and hardware combinations. As an example, if VM memory is reduced to 1.5GB, the amount of host RAM required drops to 304 GB. Changing the amount of memory will not change the number of hosts unless you assign more VMs per core. The consequence of assigning more VMs per core is that you will need more GHz per CPU, and your server may not support it.
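The host RAM numbers follow the same pattern: VM memory minus the 20% TPS savings, plus per-VM and hypervisor overheads. Here is a hedged sketch of that calculation – the ~177MB per-VM overhead is my own assumption, chosen to approximate the calculator's output, not a published constant:

```python
def host_ram_gb(vms_per_host=220, vm_ram_gb=2.0, tps_ratio=0.20,
                per_vm_overhead_gb=0.177, hypervisor_gb=1.0):
    """Approximate host RAM: VM memory less TPS savings, plus overheads.

    per_vm_overhead_gb is a guessed value that roughly reproduces the
    article's figures; the real calculator may use a different table.
    """
    vm_memory = vms_per_host * vm_ram_gb * (1 - tps_ratio)
    return vm_memory + vms_per_host * per_vm_overhead_gb + hypervisor_gb

print(round(host_ram_gb(vm_ram_gb=2.0)))   # ~392GB, the chosen B230 M2 scenario
print(round(host_ram_gb(vm_ram_gb=1.5)))   # ~304GB, the 1.5GB example above
```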

For this blog post I have used my Online VDI Calculator to determine the number of hosts, host CPU frequencies and host memory. The calculator is fully optimized to support the latest releases of vSphere and VMware View.

16 comments

1 ping


  1. Gabrie van Zanten

    Nice post!!! I'd love to see how things work out when users are really using this environment. Tell us about any miscalculations or surprises that came up. Did you need more or less hardware than expected?

  2. Paul

    Cool article and great read! I agree, would love to see a part two with results.

  3. AFidel

    Can View not redistribute failed VMs across multiple clusters? Because if it is capable, then doing 14 VMs per core on the B250 Extended Memory would seem to be a more cost-effective solution while still fitting in a two-rack footprint.

  4. Paul

    Thank you for showing us your thought process around VDI design. It was very enlightening. I have noticed that a lot of people are starting to implement UCS systems instead of the traditional hardware vendors; IBM, HP etc. Is this because of price or is it just about the quality of the product?

  5. Paul B

    Hi, good read. We have over 1k Users deployed on VDI using Cisco UCS – B250M2s with View 4. Am curious to know what storage you are using and if you plan to use non-persistent clones / Persona Management? How do you intend to implement DR? Would love to have a chat with you about your experiences to date.

  6. Sham Sao

    The best way to reduce the amount of hardware required is to run the virtual machines on distributed end-point devices (laptops and desktops), while centralizing the management functions such as provisioning, patching, and shared image management. Using this approach, you can centrally manage 1000s of PCs as easily as one (reaping all of the benefits of desktop virtualization) while cutting the infrastructure requirements by over 90%. VirtualComputer.com presented this approach at the VMworld 2011 show, and won an award in the “Desktop Virtualization” category for it. They also offer a client-hypervisor that’s totally free to use, as well as a centralized management system that’s also free for anyone who has 5 PCs or fewer. For companies with more than 5 PCs, there’s a cost for the management system, but the client-hypervisor is still free.

  7. Andre Leibovici

    @AFidel
    The Cisco UCS B250 with Extended Memory has the Intel 5600 Series CPU, which supports up to 6 cores. Even if the GHz required were available, with 14 VMs/core we would need 120 servers.

  8. Andre Leibovici

    @Paul
    Price is competitive with other major brands; however, you see adoption because of the architecture. The Cisco UCS architecture was designed and optimized for various layers of virtualization (server virtualization, network virtualization etc.) to provide an environment where applications run on one or more uniform pools of server resources.

  9. Andre Leibovici

    @Paul B
    The storage is two VNX7500 arrays that will handle 5K desktops each. There will be a combination of Linked Clones and Full Clones, Persistent and Floating. DR is not part of this design; however, I have written about DR in the past. Please refer to “VMware View Disaster Recovery Scenarios & Options” at http://myvirtualcloud.net/?p=1203

  10. Andre Leibovici

    @Sham Sao
    Although I think client-hosted VDI works great for some use cases, I have to disagree that it will reduce the amount of hardware (you are implicitly saying it will cost less).

    The performance you get from VMs running in a server environment is only comparable to the latest laptops/desktops available on the market, those that include SSDs. The price of those laptops is higher than the whole infrastructure to support all 10K desktops.

    Client-hosted VDI is great for specific use cases.

  11. Brandon Potter

    I know it's an old post, but the Cisco UCS B230 M2 Extended Memory is a full-width blade, not 1/2; therefore you will need 11.5 chassis.

    I'd be interested in your thoughts around the new 16-core AMD CPU, and using it to give VDIs 2 vCPUs instead of the traditional 1 vCPU. RAM prices are beginning to move down, and it's almost affordable to cram 512GB into a BL685 (16GB x 32 DIMM slots).

  12. Brandon Potter

    Ignore my post, I see you used the UCS B230 M2 Extended Memory, not the B250 M2.

  13. Toby Armfield

    Andre,

    Regarding the blades: in the section on the B250 M2, these are full-width blades, unless you meant the B230 M2 in that section as well.

    I would have also used the 6140 fabric interconnects with that many chassis. You have 2 x 6120, which will give you 40 ports for the 6 chassis you are proposing, and as each chassis can support 8 x 10GB connections you will have to limit the number of connections to each chassis and restrict the bandwidth accordingly. There is mention of the VNX storage, so assuming these are using the FC modules for the FEs, you would still need to retain some ports for northbound networking, probably restricting each chassis to 4 connections and 40GB of bandwidth rather than the full 80GB that would be available with the 6140.

  14. Andre Leibovici

    @Toby Armfield

    As of today I actually would probably choose the 6142 fabric interconnects to solve the FE issues. According to the Cisco table, the B250 M2 with Extended Memory is full-width (look at the table above).

    Thanks for complementing the article…

    Andre Leibovici

  15. CR112

    What was the application workload (VSI medium, heavy, light?) that you used to determine your configuration and validate that your vRAM per guest and vCPU per core numbers are accurate?

  16. Andre Leibovici

    @CR112
    There was no workload simulation here. In this scenario the specification used was: 10,000 VMs powered-on with an average CPU of 200MHz. If this were a real deployment you would have to find out what the CPU, RAM and Storage requirements are for your solution.

    Andre

  1. myvirtualcloud.net » A Guide of How to Buy the Wrong Hardware for your VDI

    […] No matter what they do their CPU power will be underutilized due to the lack of memory resources or the limits imposed by the hypervisor. I have dug into this kind of problem before in my article The Right Hardware for a 10K VDI solution. […]
