In September 2010, during the IDF conference in San Francisco, Intel unveiled its next-generation server Xeon CPU, dubbed Westmere-EX. The numbers are impressive: a 10-core/20-thread processor with the ability to address 2TB of memory, twice the addressable memory of the previous generation.
In May, Intel announced a new 32-core server chip based on a high-performance computing architecture that mixes general-purpose x86 cores with specialised cores for faster processing of highly parallel scientific and commercial applications. The chip will be available in the second half of 2010.
“Knights Ferry includes 32 main Xeon chip cores in the server CPU socket, with corresponding 512-bit vector processing units in the PCI-Express slot. The chip runs four threads per core and includes 8MB of shared cache, and up to 2GB of fast GDDR5 memory.”
These are fantastic pieces of technology for a large number of workloads, especially databases, which will be able to run entirely from memory rather than from a non-volatile storage device. For VDI solutions, however, they are a waste of money and they degrade high availability.
My customers often ask for my recommendations, and at the same time tell me that they have been thinking about buying Nehalem processors with four, six, or even eight cores. My answer is invariably a rhetorical question: why?
The truth is that whilst many cores may benefit several types of workloads, for VDI they will only reduce high availability and make the solution more expensive. To understand my reasons, let’s review some limitations around vSphere 4.1, vCenter 4.1, VMware View 4.5 and View Composer 2.5. These are the latest generation of components that make up a View solution:
vSphere 4.1 and View 4.5:
- Validated architecture supports 2,000 virtual desktops
- Maximum of 320 virtual machines per host
- Validated architecture supports 16 virtual desktops per core on Nehalem systems

View Composer 2.5:
- Maximum of 8 hosts per cluster
- Maximum of 128 linked-clone virtual desktops per datastore
In a new deployment, Windows 7 would likely be the guest OS, and the minimum amount of RAM recommended for this OS is 2GB. For the sake of this article, let’s also assume that there are no CPU-intensive applications and a single virtual CPU (vCPU) will suffice for most users. Note, however, that because of Windows 7’s ASLR (address space layout randomization), the TPS (Transparent Page Sharing) ratio drops to about 10% (it was about 40% with Windows XP).
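As a rough illustration of what that TPS ratio means for sizing, here is a minimal sketch. Keep in mind that the per-host figures later in this article use the raw 2GB-per-VM number without TPS savings, and that the 10% and 40% ratios are the rough figures quoted above, not guaranteed savings:

```python
# Back-of-the-envelope memory sizing under Transparent Page Sharing (TPS).
VM_RAM_GB = 2  # recommended minimum for a Windows 7 desktop

def host_ram_gb(num_vms, tps_ratio=0.10):
    """Approximate physical RAM for num_vms desktops, assuming TPS
    reclaims tps_ratio of guest memory."""
    return num_vms * VM_RAM_GB * (1 - tps_ratio)

print(host_ram_gb(192))                  # ~345.6GB with Windows 7 (~10% TPS)
print(host_ram_gb(192, tps_ratio=0.40))  # ~230.4GB with Windows XP (~40% TPS)
```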
Now that we have the assumptions and limits defined, let’s run a couple of simulations. The first simulation, just out of curiosity, uses the forthcoming 32-core processor.
64 cores (2 sockets x 32 cores) x 16 VMs per core = 1,024 VMs per host (HT disabled)
This scenario requires approximately 2,048GB RAM per host.
128 logical cores (2 sockets x 32 cores, 2 threads per core) x 16 VMs per core = 2,048 VMs per host (HT enabled)
This scenario requires approximately 4,096GB RAM per host.
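Both results fall straight out of the arithmetic; a minimal sketch, using the 2GB-per-VM Windows 7 assumption from above:

```python
# Reproduces the two 32-core simulations above.
VM_RAM_GB = 2

def vms_per_host(sockets, cores_per_socket, vms_per_core=16, ht=False):
    cores = sockets * cores_per_socket * (2 if ht else 1)
    return cores * vms_per_core

for ht in (False, True):
    vms = vms_per_host(2, 32, ht=ht)
    print(f"HT {'enabled' if ht else 'disabled'}: "
          f"{vms} VMs, ~{vms * VM_RAM_GB}GB RAM")
# HT disabled: 1024 VMs, ~2048GB RAM
# HT enabled: 2048 VMs, ~4096GB RAM
```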
Not to extend the discussion too much: vSphere 4.1 supports a maximum of 320 virtual machines per host. In the first example, the number of VMs per core would have to drop to 5 to keep the total at or below 320 (64 x 5 = 320). Under these conditions the system and CPUs would be largely underutilised.
The second scenario, with 128 HT cores, requires a total of approximately 4,096GB of RAM. The architecture supports a maximum of 2TB, so to keep the memory within that boundary the maximum number of VMs per core is ~8 (2,048GB / 2GB per VM = 1,024 VMs across 128 cores). That looks a bit better than running without HT, but it still leaves expensive machinery largely underutilised.
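Putting both caps side by side makes the underutilisation obvious. A small sketch (the text applies the vSphere cap to the first scenario and the memory cap to the second):

```python
# Per-core density implied by each of the two caps discussed above.
MAX_VMS_PER_HOST = 320   # vSphere 4.1 limit
MAX_RAM_GB = 2 * 1024    # 2TB addressable memory
VM_RAM_GB = 2

def density(cores):
    print(f"{cores} cores: {MAX_VMS_PER_HOST / cores:.1f} VMs/core "
          f"by the vSphere cap, {MAX_RAM_GB / VM_RAM_GB / cores:.1f} "
          f"VMs/core by the memory cap")

density(64)    # 5.0 VMs/core (vSphere) vs 16.0 (memory)
density(128)   # 2.5 VMs/core (vSphere) vs  8.0 (memory)
```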
Those scenarios were just to play with the numbers a little. Let’s see what happens in a real-world deployment with 2 sockets of 6 cores each. I am discounting HT and Turbo Boost in this scenario, as they do not effectively double the number of cores.
12 cores (2 sockets x 6 cores) x 16 VMs per core = 192 VMs per host
This scenario requires approximately 384GB RAM per host (192 x 2GB).
The 12-core scenario works well for the number of virtual machines per host; the problem is the approximately 384GB of RAM required. As of today, most systems out there do not support this amount of RAM, with few exceptions such as Cisco UCS’s ASIC-based extended memory architecture.
If you do have a system that supports this amount of memory, then you need to start thinking about how long your organisation or customer is willing to wait for virtual desktops to be powered on on another host if an HA event is invoked, or how long it will take to vMotion all VMs off a host for maintenance.
If you have around 550 VMs in total, losing a host would represent an outage of roughly a third of your desktops. On storage requirements alone, each host with 192 VMs requires approximately 6,528 back-end IOPS under a normal workload of 10 IOPS per user with a read/write ratio of 20/80 (assuming a RAID write penalty of 4 on the writes). And don’t forget to account for network connectivity and throughput for all those virtual machines.
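For the curious, here is how that figure decomposes. The RAID-5-style write penalty of 4 is my assumption, since it is what reproduces the number; the article does not state the RAID level:

```python
# Back-end IOPS estimate for a host full of desktops.
def backend_iops(vms, iops_per_vm=10, read_pct=0.20, write_penalty=4):
    frontend = vms * iops_per_vm           # 1,920 front-end IOPS for 192 VMs
    reads = frontend * read_pct            # 384 reads
    writes = frontend * (1 - read_pct)     # 1,536 writes
    return reads + writes * write_penalty  # 384 + 6,144 = 6,528

print(backend_iops(192))  # 6528.0
```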
For VDI solutions, most of my clients are currently adopting Nehalem systems with 2 sockets of 4 cores each, for a total of 8 cores per host. To me, this configuration provides the best balance between cost, performance and availability. Using the validated architecture figure of 16 VMs per core, it is possible to host approximately 128 VMs per host, and each host will require approximately 256GB of RAM.
For some systems, 256GB is still not feasible. In those cases, 12 VMs per core will allow for 96 VMs per host and 192GB of RAM. That sounds like a more achievable number for most systems out there.
It’s important to note that an 8-host cluster fully populated at 16 VMs per core allows for 1,024 VMs but leaves no spare capacity for HA events: if a host is shut down or lost, there is nowhere to power its VMs back on. Running an 8-host cluster at 12 VMs per core allows for 768 VMs, with enough headroom on the remaining hosts to absorb a host failure. VMware’s reference architecture, aka the Block, recommends 1,024 VMs per cluster; however, it does not take HA events into consideration.
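A quick N+1 check captures the difference; the 128 VMs-per-host ceiling is the 8-core, 16-VMs-per-core figure from above:

```python
# Can the surviving hosts absorb one failed host's desktops?
def survives_host_failure(hosts, vms_per_host, max_vms_per_host=128):
    total = hosts * vms_per_host
    return total <= (hosts - 1) * max_vms_per_host

print(survives_host_failure(8, 128))  # False: 1024 VMs > 7 x 128 = 896
print(survives_host_failure(8, 96))   # True:   768 VMs <= 896
```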
My personal recommendation is to use a maximum of 12 VMs per core in a fully populated 8-host cluster. If you have a different opinion or approach, I am happy to discuss.
As a takeaway: if you are planning to design a View solution, keep these numbers in mind, as they affect the end result of the project and, at the bottom line, the $$$ (costs).
Cores and more cores… we don’t need them… at least not for VDI.
** All numbers presented here assume a standard distribution of virtual machines with similar, regular workloads. The numbers may vary according to the specifics of each environment.