
Jan 12 2011

Cores and more Cores… We don’t need them!

In September 2010, during the IDF conference in San Francisco, Intel unveiled its next-generation server Xeon CPU, dubbed Westmere-EX. The numbers are impressive: a 10-core / 20-thread processor with the ability to address 2TB of triple-channel memory, twice the addressable memory of the previous generation.

In May 2010 Intel announced a new 32-core server chip based on a new high-performance computing server architecture that mixes general-purpose x86 cores with specialized cores for faster processing of highly parallel scientific and commercial applications. The chip was slated for availability in the second half of 2010.

“Knights Ferry includes 32 main Xeon chip cores in the server CPU socket, with corresponding 512-bit vector processing units in the PCI-Express slot. The chip runs four threads per core and includes 8MB of shared cache, and up to 2GB of fast GDDR5 memory.”

Fantastic pieces of technology for a large number of different workloads, especially databases, which will be able to run entirely from memory rather than from a non-volatile storage device. For VDI solutions, however, they are a waste of money and actually reduce high availability.

My customers often ask me for my recommendations, and at the same time tell me that they have been thinking about buying Nehalem processors with four, six, or even eight cores. My answer is invariably a rhetorical question: why?

The truth is that whilst multi-core processors may benefit several types of workloads, for VDI they will only reduce high availability and make the solution more expensive. To understand my reasoning, let's recall some limitations around vSphere 4.1, vCenter 4.1, VMware View 4.5 and View Composer 2.5, the latest generation of components that make up a View solution.

 

vCenter 4.1
Validated architecture supports 2,000 virtual desktops

vSphere 4.1
Maximum of 320 virtual machines per host
Validated architecture supports 16 virtual desktops per core in Nehalem systems

View Composer 2.5
Maximum of 8 hosts per cluster
Maximum of 128 linked clone virtual desktops per datastore

 

In a new deployment Windows 7 would likely be the guest OS used, and the minimum amount of RAM recommended for this OS is 2GB. For the sake of this article let's also assume that there are no CPU-intensive applications and that a single virtual CPU (vCPU) will suffice for most users. Yet, because of Windows 7 ASLR (Address Space Layout Randomization), the TPS (Transparent Page Sharing) ratio is reduced to about 10% (it was about 40% with Windows XP).
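
To put those sharing ratios in perspective, here is a minimal Python sketch of their effect on host memory. The 40% and 10% figures are the rough estimates quoted above rather than measured values, and note that the scenarios that follow size with the raw 2GB per VM, without subtracting TPS savings.

    # Rough effect of Transparent Page Sharing (TPS) on host RAM, using the
    # approximate sharing ratios quoted above (not measured values).
    def host_ram_gb(vms, ram_per_vm_gb=2.0, tps_ratio=0.0):
        """Estimate host RAM needed for a pool of desktops after TPS savings."""
        return vms * ram_per_vm_gb * (1.0 - tps_ratio)

    print(host_ram_gb(128, tps_ratio=0.40))  # Windows XP era sharing: ~153.6 GB
    print(host_ram_gb(128, tps_ratio=0.10))  # Windows 7 with ASLR:    ~230.4 GB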

Now that we have the assumptions and limits defined, let's run a couple of simulations. The first, just out of curiosity, uses the forthcoming 32-core processor.

 

64 cores (2 sockets x 32 cores) x 16 VMs per core = 1024 VMs per host (HT disabled)
This scenario requires approximately 2048GB RAM per host.

~128 logical cores (2 sockets x 32 cores with HT) x 16 VMs per core = 2048 VMs per host (HT enabled)
This scenario requires approximately 4096GB RAM per host.
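
A minimal Python sketch of the arithmetic behind these two scenarios, using the assumptions stated earlier (16 VMs per core, 2GB per VM):

    # Sizing arithmetic for the hypothetical 32-core scenarios above.
    VMS_PER_CORE = 16      # validated figure for Nehalem systems
    RAM_PER_VM_GB = 2      # Windows 7 minimum assumed earlier

    def vdi_host_capacity(sockets, cores_per_socket, ht=False):
        """Return (VMs per host, RAM in GB) for a given CPU configuration."""
        logical_cores = sockets * cores_per_socket * (2 if ht else 1)
        vms = logical_cores * VMS_PER_CORE
        return vms, vms * RAM_PER_VM_GB

    print(vdi_host_capacity(2, 32))           # (1024, 2048) - HT disabled
    print(vdi_host_capacity(2, 32, ht=True))  # (2048, 4096) - HT enabled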

 

Not to extend the discussion too much: vSphere 4.1 supports a maximum of 320 virtual machines per host. For the first example, the number of VMs per core would have to drop to 5 to keep the total within that limit (64 x 5 = 320). Under these conditions the system and its CPUs would be largely underutilised.

The second scenario, with 128 logical cores, requires a total of 4096GB RAM. The architecture supports a maximum of 2TB, so to keep memory within that boundary the maximum number of VMs per core is ~8. That looks a bit better than running without HT, but it still leaves expensive machinery largely underutilised.
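
Working backwards from the vSphere 4.1 limits, a quick sketch of the density ceilings just described:

    # Density ceilings implied by the vSphere 4.1 limits quoted earlier.
    MAX_VMS_PER_HOST = 320
    MAX_HOST_RAM_GB = 2048   # 2TB of addressable memory
    RAM_PER_VM_GB = 2

    # First scenario (64 cores): the 320-VM-per-host limit is the constraint.
    print(MAX_VMS_PER_HOST // 64)                    # 5 VMs per core

    # Second scenario (128 logical cores): the 2TB ceiling is the constraint
    # considered above (the 320-VM host limit would cap density even lower).
    print(MAX_HOST_RAM_GB // RAM_PER_VM_GB // 128)   # 8 VMs per core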

Those scenarios were just to play with the numbers a little. Let's see what happens in a real-world deployment with 2 sockets of 6 cores each. I am discounting HT and Turbo Boost in this scenario, as they do not equate to double the number of cores.

 

12 cores (2 sockets x 6 cores) x 16 VMs per core = 192 VMs per host
This scenario requires approximately 384GB RAM per host.

 

The 12-core scenario works well for the number of virtual machines per host; however, it would require approximately 384GB RAM per host. As of today most systems out there do not support this amount of RAM, with a few exceptions such as Cisco's ASIC-based memory extension architecture.

If you have a system that supports this amount of memory, then you need to start thinking about how long your organisation or customer is willing to wait for virtual desktops to be powered on on another host if an HA event is invoked, or how long it will take to vMotion all VMs off a host for maintenance.

With around 550 VMs in total, losing one such host would represent an outage of close to 33% of your desktops. Just on storage requirements, each host with 192 VMs requires approximately 6,528 IOPS under a normal workload of 10 IOPS per user with a read/write ratio of 20/80. And don't forget to account for network connectivity and throughput for all virtual machines.
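
For reference, the 6,528 figure can be reproduced by applying a write penalty of 4 to the back-end I/O, which is typical of RAID 5; the RAID level is my assumption, not something stated above:

    # Back-end IOPS estimate for 192 desktops at 10 IOPS each, 20/80 read/write.
    # The write penalty of 4 (typical of RAID 5) is an assumption that happens
    # to reproduce the 6,528 IOPS figure quoted above.
    vms, iops_per_vm = 192, 10
    read_ratio, write_ratio = 0.2, 0.8
    write_penalty = 4

    frontend = vms * iops_per_vm                                      # 1920 IOPS
    backend = frontend * (read_ratio + write_ratio * write_penalty)   # 6528 IOPS
    print(frontend, backend)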

 

For VDI solutions, most of my clients are currently adopting Nehalem systems with 2 sockets of 4 cores each, a total of 8 cores per host. To me this configuration provides the best balance between cost, performance and availability. Using the validated architecture figure of 16 VMs per core, it is possible to host approximately 128 VMs per host, and each host will require approximately 256GB of RAM.

For some systems 256GB is still not feasible. In those cases, 12 VMs per core allows for 96 VMs per host and 192GB of RAM. That sounds like a more achievable number for most systems out there.

It's important to note that an 8-host cluster fully populated at 16 VMs per core allows for 1024 VMs with no spare capacity for an HA event; if a host is shut down or lost, there is nowhere left to power on its VMs. Running an 8-host cluster at 12 VMs per core allows for 768 VMs, with enough spare capacity to absorb a host failure. VMware's reference architecture, aka the Block, recommends 1024 VMs per cluster, but it does not take HA events into consideration.
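
The cluster-level figures above can be reproduced with the same back-of-the-envelope arithmetic; a minimal sketch, assuming the 2-socket, 4-core hosts recommended here:

    # 8-host cluster capacity at different consolidation ratios, plus a simple
    # N-1 check: can the surviving hosts absorb a failed host's desktops?
    HOSTS = 8
    CORES_PER_HOST = 8        # 2 sockets x 4 cores
    CEILING_PER_CORE = 16     # validated maximum density per core

    def cluster_plan(vms_per_core):
        total = HOSTS * CORES_PER_HOST * vms_per_core
        survivor_capacity = (HOSTS - 1) * CORES_PER_HOST * CEILING_PER_CORE
        return total, survivor_capacity >= total

    print(cluster_plan(16))   # (1024, False) - no headroom for an HA event
    print(cluster_plan(12))   # (768, True)   - a host failure can be absorbed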

My personal recommendation is to use a maximum of 12 VMs per core in a fully populated 8-host cluster. If you have a different opinion or approach, I am happy to discuss it.

 

As a takeaway: if you are planning to design a View solution, keep these numbers in mind, as they affect the end result of the project and, at the bottom line, the costs ($$$).

Cores and more Cores… We don’t need them!…… at least for VDI.

 

** All numbers presented here assume a standard distribution of virtual machines with similar and regular workloads. The numbers may vary according to the specifics of each environment.


Permanent link to this article: http://myvirtualcloud.net/?p=1519

18 comments


  1. Andre Leibovici

    There is some discussion going on about my comment: “because of Windows7 ASLR (Address space layout randomization) the TPS ratio (Transparent Page Sharing) is reduced to about 10% only (it was about 40% with Windows XP).”

    My understanding and real-world experience is that ASLR memory randomization makes TPS more difficult, as there are fewer common pages between systems.

    I will take that as homework and validate this number against a real-world VDI production environment as soon as I can. In the meantime I was not able to find any documentation supporting the idea that TPS is affected, but I also didn't find anything saying that it is not; only some non-VMware blog posts.

  2. Mihai

    Your calculations assume 16 VMs per core; isn't that a little too much for real VMs used by normal people (not helpdesk drones)? What about sudden spikes? What about Outlook crashing and using 100% CPU? What about boot/login storms?

    What's wrong with having some headroom and planning for a 50% load on the server? It's not like those 10-core CPUs are going to be significantly more expensive than existing 6-core ones.

    Yes, it would be stupid to have a mammoth server with 1TB RAM and 500VMs but the additional cores will offer a better user experience for those VMs.

  3. Matt Liebowitz

    When talking about TPS, are you assuming that you have overcommitted memory in the host or disabled the use of large pages? If you’re still using large pages then the amount of memory shared will be very low, with the exception of zeroed out pages (mostly from bootup).

    I’ve done testing with Windows 2008 and found a nearly identical amount of memory sharing with TPS with or without ASLR enabled. And in practice I don’t see much of a difference in the amount of memory shared between older VMs that don’t use ASLR (Windows XP/2003) and those that do (anything after 2008/Vista). It may have an impact but in my experience it appears to be marginal.

  4. James H.

    I think your conclusion is correct, but that your supporting facts are a bit overstated, for now. For example, I don't think that 2GB is typical, and VMware actually recommends 2 vCPUs for W7.

    But, ultimately, you are pointing at a problem about to happen. I.e. very soon, we will have more cores than we can effectively use. (This is already true in the virtual server world, where my customers are typically comfortable with about 40 or 50 virtual servers being affected by a single hardware outage.)

    In the same way, customers will eventually stop asking “How many VDI sessions CAN I fit onto a host?” and start asking “How many VDI sessions SHOULD I fit onto a host?”.

  5. Andre Leibovici

    @Mihai
    16 VMs is the validated maximum number of VMs per core. I normally recommend using 12 or fewer VMs per core to allow for an HA event in a cluster. There is nothing wrong with planning for spare CPU and memory capacity if you can afford it. VMware's recommendation is not to exceed 80% CPU or memory resource utilization.

    @Matt Liebowitz
    In hardware-assisted memory virtualization systems, ESX will preferentially back guest physical pages with large host physical pages for better performance. If there is not a sufficient 2MB contiguous memory region in the host, ESX will still back guest memory using small pages (4KB). ESX still generates hashes for the 4KB pages within each large page during page scanning. See VMware KB 1021095.

    I have found some View documentation supporting 30% TPS with Win7 but I will run my tests as well and publish them here.

    @James H.
    VMware's recommendation is 1 vCPU per Windows 7 guest. In my tests a profiled Windows 7 will consume as much as 1GB just for the OS, without any additional applications running.
    You are absolutely right about customers asking “How many VDI sessions SHOULD I fit onto a host?”; however, you also have the option to buy your hosts based on your design specifications.

  6. level380

    With core ratios that you list here, I don’t want to be on your VDI farm!!!

    Cores are ‘cheap’ when looking at the total cost of a server. Why not get 6- or 8-core CPUs compared to 4-core CPUs? It's a 50-100% increase in cores, which allows more cores for scheduling, and lower ready times will be seen on the guests, resulting in snappy VDI.

    No one wants to be on an overloaded VDI farm with high ready time waiting for the mouse to move!

    1. Andre Leibovici

      Nothing wrong with your line of thought, however real-world deployments are a different scenario. Organisations are moving to VDI aiming for CapEx and OpEx savings. Leaving resources under-utilised will jeopardise the whole purpose of the move. Of course there are other benefits, but ask the CIO and CFO what they are expecting.

      12 VMs per CPU/core is a very doable ratio, and the end-user experience will be very snappy if the environment is backed by the right storage and network. Even 16 VMs per CPU/core is doable for most VDI deployments, and those are numbers from validated architectures.

  7. level380

    Each to their own, but it seems short-sighted to assume that the loads won't change. Look at the large increase in hardware requirements between WinXP and Win7. What happens when Win8 comes out and requires dual vCPUs as a minimum? Then your entire case above is thrown out and your VDI numbers are halved!

    Now if you bought those 6-8 core CPUs instead of saving a few k on the server cost by getting the 4-core ones, you wouldn't have any issues.

    So, I still disagree. When spending 30-50k on a server with enough RAM etc. for the VDI host, why not spend an extra 1-2k (a very small increase) and get the 6-8 core CPUs, providing ample CPU overhead now?

    You could say whats the point in getting dual 4 core, why not get a single 8 core CPU?

  8. Michael

    One area where I could see all this extra processing power being used is in providing fault tolerance. Once VMware releases FT-SMP, and it scales to the maximum number of VMs per host, and the networks etc. are fast enough to keep up, I could see scenarios where you have mirrored hosts / FT VMs in large numbers (or why not just mirror entire hosts rather than individual VMs?). This would aid in reaching higher consolidation ratios per host. I think this is quite a few years in the future though, especially since I haven't heard of anyone talking about building FT into VDI at all. But I could definitely see its usefulness with important persistent/dedicated desktops (and of course servers), though it would not be needed for floating/non-persistent desktops.

    There are plenty of places getting to 70, 80 or 90 VMs per host, but to reach the really high numbers per host you need to ensure availability if there is a host failure, and this will require either application-level fault tolerance and load balancing, so that the loss of a single host isn't that catastrophic, or something else.

  9. Andre Leibovici

    @level380
    Just out of curiosity I went to the Dell online store and customised a dual-socket PowerEdge R710 with 192GB RAM. I then simulated changing the CPUs from the Intel Xeon X5677 (3.46GHz, 4 cores) to the X5680 (3.33GHz, 6 cores). There was a marginal price difference between the two; actually, the 6-core was about $300 cheaper.

    If we are talking about providing more CPU cores without increasing VM density, I agree with you that 6-8 cores should be used. What I also noticed on Dell servers is that moving to 8 cores would require a different range of servers.

    Thanks for your comments!

  10. @langonej

    This is great conversation from Andre’s original post. While every environment is certainly different, heavy USB usage, bi-directional audio and rendering aside, ~12 VMs per core is a number I’ve certainly used before with success. In my opinion, the bottleneck is rarely CPU and more likely storage or network. As several people have mentioned, CPU is a fairly cheap insurance policy when going from 4 to 6-cores. I am not recommending more than 6-core processors for the projects I’m involved with as it doesn’t make much sense from a design perspective.

    If there’s a place to spend your money within a VDI project it’s likely on services and storage. A proper load balancing solution, non-persistent desktops and disperse storage can drastically reduce the impact a host failure may have within an organization.

    Also, I personally am not seeing FT as part of VMware View solutions except for the most rare of circumstances. Integrating View into SRM is of far greater importance (again, you can design around this to some degree).

  11. Loren

    Why are we comparing future chipsets against current software limitations? It is highly likely that VMware will continue increasing their supported maximums to match what is possible with the hardware available at the time the software is released.

  12. Andre Leibovici

    @Loren
    First of all, thanks for your comment. I agree with you; however, as of today these are the limitations we have. It's not all about software upgrades or support from VMware, but also about how many virtual desktops you would be willing to host on a single server without major impact to users should you lose a host.

  13. James H.

    @Andre
    Well, I suppose VMware is a big company and has different recommendations depending on who you find to answer the question. But I do have a ppt from VMware that states to use 2 vCPUs for Win7. Previously I always used 1 vCPU, but I've switched and I think the user experience is better for it. As you state, CPU contention is rarely the issue nowadays, so it seems like a solid direction, and it's only going to become less of an issue.

  14. tessil

    Hi,
    I had a question: are you implying that processing power is not really the bottleneck here, and that only memory capacity is needed to support more VMs per core?
    What about memory bandwidth?

  15. Andre Leibovici

    @tessil
    Most VDI deployments are memory bound, not CPU bound. Of course, if you try to host 100 desktops on a server with enough RAM but only a single dual-core socket, it won't fly. VMware has been recommending 12 to 16 VMs per CPU core, but it really depends on the workload type you have.

    Nowadays you will buy at minimum dual quad-core Nehalem processors and you will have 2 populated sockets, a total of 8 cores. That said, 6-core processors may be even cheaper than 4-core processors, allowing for 12 cores.

    Memory bandwidth is the data transfer rate for your RAM; the faster the better it is.

  16. Srayala

    What are the current changes in vSphere 5 with respect to TPS? I have come to know from some blogs about ASLR for Windows 7 saying that it is efficient when you enable TPS.
    Are there any additional major changes introduced in vSphere 5?
    I recently browsed the Intel website for my new server build, say with the following configuration:
    ·    2 x Intel Xeon 2.66 GHz six-core processors
    ·    96 GB memory, 1066MHz, quad-ranked RDIMMs for 4
    ·    3 x 146 GB 15k RPM SAS hot-plug HDDs
    ·    RAID 5 on PERC H700 controller (non-mixed drives)
    ·    2 x PERC H800 RAID adapters for external JBOD, 512MB
    ·    2 x QLogic 8 Gbps dual-channel Fibre Channel HBAs
    ·    2 x Gigabit dual Ethernet NICs

    With the above configuration, what is the maximum number of VMs I can run efficiently, assuming all users are medium workers with, say, 1.5 GB memory per VM?

    Any suggestions are valuable, as I am planning this model of server for my new VDI project.

    thanks
    Srayala    

  17. Andre Leibovici

    @Srayala
    The TPS ratio with Windows 7 is empirical due to ASLR.
    I recommend testing your own desktop images.

    In regards to sizing, use my online VDI calculator to work out the hardware required: http://myvirtualcloud.net/?page_id=1076

