
VMware View Disaster Recovery Scenarios & Options

Disaster recovery for VMware View has always been one of those hot topics. As of today there is no official support from VMware for SRM integration with VMware View, nor are VMware View Pods able to replicate across datacentres; although I know that interesting things are cooking as I write this article.

So, assuming that your organisation does not have any of those fancy new technologies such as EMC VPLEX, Cisco OTV or NetApp MetroCluster, and assuming you want to implement a supported solution, what are the options for providing DR for your organisation’s virtual desktop infrastructure?

I am presenting three different scenarios that could also be combined, creating a fourth, mixed scenario.

The Partial Active-Active

This scenario describes a Partial Active-Active setup where each datacentre is responsible for providing virtual desktops to a given set of users, based on the connection broker address the user provides when launching the VMware View Client or a thin client. In this scenario users must know their connection broker address (which can be tricky in some organisations).

To support the full workload when DR mode is activated, the administrator pre-provisions the desktop pools required to support the workload of the other datacentre. These desktop pools may remain enabled or be disabled. In VMware View 4.5 desktop pool enablement can easily be automated through the use of PowerShell cmdlets.
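As a rough illustration, a pre-provisioned DR pool could be toggled with something like the snippet below. This is a minimal sketch assuming the View PowerCLI snap-in on a View 4.5 Connection Server; the pool id "DR-Pool-DCB" is a hypothetical name, and the -disabled parameter behaviour should be verified against your View PowerCLI documentation.

```powershell
# Minimal sketch - run on a View Connection Server with the View PowerCLI snap-in.
# "DR-Pool-DCB" is a hypothetical pool id for a pre-provisioned DR linked-clone pool.
Add-PSSnapin VMware.View.Broker -ErrorAction SilentlyContinue

# Inspect the pre-provisioned DR pool
Get-Pool -pool_id "DR-Pool-DCB"

# Enable the pool when a DR event is declared (set $true to disable it again)
Update-AutomaticLinkedClonePool -pool_id "DR-Pool-DCB" -disabled $false
```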

The virtual desktops maintained by these pools may remain powered on or be powered off. Powering off desktops when not in use saves CPU cycles, storage IO and electricity. The downside is that when a DR event is triggered all virtual desktops must be powered on at once. This process consumes resources across the entire stack, potentially creating a boot storm and affecting the performance of the existing virtual desktops in the datacentre.
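Where desktops are kept powered off, the power-on can be throttled to soften the boot storm. The sketch below uses standard vSphere PowerCLI cmdlets (Connect-VIServer, Get-VM, Start-VM); the vCenter name, the "DR-VDI-*" naming convention and the batch sizes are assumptions for illustration only.

```powershell
# Sketch: power on DR desktops in small batches to limit the boot storm.
Connect-VIServer -Server "vcenter-dcb.example.local"   # placeholder vCenter name

$batchSize    = 20    # desktops powered on per batch
$pauseSeconds = 120   # wait between batches to spread the storage IO load

# Hypothetical naming convention for the pre-provisioned DR desktops
$desktops = @(Get-VM -Name "DR-VDI-*" | Where-Object { $_.PowerState -eq "PoweredOff" })

for ($i = 0; $i -lt $desktops.Count; $i += $batchSize) {
    $end = [Math]::Min($i + $batchSize, $desktops.Count) - 1
    $desktops[$i..$end] | Start-VM -Confirm:$false | Out-Null
    Start-Sleep -Seconds $pauseSeconds
}
```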

Load balancing virtual desktop pools across clusters is a recommended practice for this type of DR environment. When creating desktop pools, make sure they are interleaved with the pools dedicated to DR. As an example, a cluster sized for 1024 virtual desktops would host one desktop pool for production and one disabled DR desktop pool. This setup allows optimal resource performance for the virtual desktops in production.

[Figure: production and DR desktop pools interleaved across clusters]

Virtual desktop portability is not provided in this scenario. Wherever the user connects from, he or she will always be redirected to the parent datacentre and, ultimately, to the user’s own virtual desktop.

The figure below demonstrates the user connecting from a different site and having the connection diverted to the parent datacentre. This scenario allows the use of dedicated desktops and persistent disks. Floating pools can be used in conjunction with application virtualisation (ThinApp), roaming profiles and/or folder redirection.

[Figure: user connecting from a different site is diverted to the parent datacentre]

The bi-directional array replication in the figure above is there to demonstrate how roaming profiles and folder redirection could be made available in both datacentres. For that to happen it is important to make sure that name resolution for the profiles share is active on both sites and resolves to the appropriate local IP address.

In case of a DR event there are two options: (a) users are told of a new address to reach the connection brokers in the secondary datacentre, or (b) load balancers are smart enough to divert the connections to the correct pool of connection brokers in the secondary datacentre. Some load balancers are able to trigger action-scripts that could automatically enable the disabled DR pools in the secondary datacentre.

[Figure: connections diverted to the secondary datacentre during a DR event]

The key point in this scenario is to understand the steps that should be taken when a DR event takes place. These steps can be manual or automated. Some organisations prefer to manually change DNS resolution when required, allowing users to connect to the secondary datacentre using a seamless connection alias.
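For organisations that go down the manual DNS route, the failover can be as simple as repointing the connection alias at the secondary datacentre’s brokers. The commands below are a sketch using dnscmd; the DNS server, zone, alias and broker names are hypothetical.

```powershell
# Sketch: repoint the seamless connection alias (view.example.local) from the
# primary to the secondary datacentre's load-balanced connection brokers.
# Server, zone and record names are placeholders.
dnscmd dns01.example.local /RecordDelete example.local view CNAME /f
dnscmd dns01.example.local /RecordAdd    example.local view CNAME brokers-dcb.example.local
```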

The Active-Passive

This is probably the simplest implementation because it requires a single active site. All users connect to the same datacentre regardless of their location or region. The upside of this implementation is that all your data is centralised and, in some cases, array replication is not required. The downside is mostly related to WAN links and network bandwidth usage. If the environment has regional branches, normally coupled with high latency and low bandwidth, the operational expenditure needed to support decent connectivity may become unaffordable.

[Figure: Active-Passive – all users connect to the primary datacentre]

In this scenario all desktop pools in datacentre B are disabled and all virtual desktops are powered off, only being enabled during a DR event. It is possible to leave pools enabled and virtual desktops powered on; however, the electricity savings of keeping them off tend to outweigh the operational benefits of keeping them running.

The administrator will pre-provision the desktop pools required to support the workload from the other datacentre. The drawback here is that the master templates, replicas and linked clones will have to be manually updated from time to time whenever there has been a change. The number of changes and the use of application virtualisation will directly affect the number of updates to be done in the DR environment. With a bit of luck, when the DR event happens you will have the latest master image and replicas in the secondary datacentre, and hopefully it will all be re-composed and ready to be powered on.

As I mentioned, this is the simplest implementation, although it requires some manual intervention. That said, there is a lot that could be automated and orchestrated through the use of PowerShell and PowerCLI.
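As a sketch of what that orchestration could look like, the script below strings together the two earlier snippets: it enables every pre-provisioned DR pool on the secondary brokers and then powers on the replicated desktops in batches. The "DR-" pool naming convention, server names and thresholds are all assumptions for illustration.

```powershell
# Sketch of a DR activation script for the Active-Passive scenario.
# Run on a Connection Server in datacentre B with both the View PowerCLI
# snap-in and vSphere PowerCLI available. All names are placeholders.
Add-PSSnapin VMware.View.Broker -ErrorAction SilentlyContinue
Connect-VIServer -Server "vcenter-dcb.example.local"

# 1. Enable every pre-provisioned DR pool (hypothetical "DR-" prefix)
Get-Pool | Where-Object { $_.pool_id -like "DR-*" } | ForEach-Object {
    Update-AutomaticLinkedClonePool -pool_id $_.pool_id -disabled $false
}

# 2. Power on the replicated desktops in batches to limit the boot storm
$desktops = @(Get-VM -Name "DR-VDI-*" | Where-Object { $_.PowerState -eq "PoweredOff" })
for ($i = 0; $i -lt $desktops.Count; $i += 20) {
    $end = [Math]::Min($i + 20, $desktops.Count) - 1
    $desktops[$i..$end] | Start-VM -Confirm:$false | Out-Null
    Start-Sleep -Seconds 120
}
```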

[Figure: connections diverted to the secondary datacentre during a DR event]

The Full Active-Active

This is the true Active-Active DR implementation, where there are no desktop owners and all desktops are exactly the same, optionally being refreshed after user logoff. Applications are user-based and delivered via application virtualisation.

Some pre-requisites are essential to fully utilise this scenario.

User Profiles – All user profiles must be available in both datacentres, using folder redirection, roaming profiles or a third-party persona management solution. The replication can be achieved in a few different ways, the most common being array-based replication and Windows DFS (see the DFS Replication sketch after this list).

Floating Pools – All desktop pools should be set to the floating type. This will allow a consistent desktop experience for all users, independent of the desktop being used.

Application Virtualisation – This is a critical component for selecting the applications each user should have access to. Application layering is the trend, and the system images should be kept as light as possible; perhaps with only the antivirus and a few required patches installed. All other applications should be delivered through virtualisation. This will also reduce operational maintenance and the number of re-compositions.

Smart Load Balancers – Load balancers must be smart enough to understand where the user is connecting from (home, Site A, Site B) and divert the connection to the closest datacentre.
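Purely as an illustration of the Windows DFS option for the profile share mentioned above, the sketch below uses the DFS Replication PowerShell cmdlets that ship with more recent Windows Server releases (they were not available when this article was first written); the server names, share paths and group names are hypothetical.

```powershell
# Sketch: two-way DFS Replication of the redirected-folders/profile share
# between file servers in datacentre A and datacentre B. All names are placeholders.
New-DfsReplicationGroup -GroupName "VDI-Profiles"
New-DfsReplicatedFolder -GroupName "VDI-Profiles" -FolderName "Profiles"
Add-DfsrMember          -GroupName "VDI-Profiles" -ComputerName "FS-DCA","FS-DCB"

# Replication connection between the two members (both directions by default)
Add-DfsrConnection -GroupName "VDI-Profiles" -SourceComputerName "FS-DCA" -DestinationComputerName "FS-DCB"

# Point each member at its local copy of the share; FS-DCA holds the authoritative copy
Set-DfsrMembership -GroupName "VDI-Profiles" -FolderName "Profiles" -ComputerName "FS-DCA" -ContentPath "D:\Profiles" -PrimaryMember $true
Set-DfsrMembership -GroupName "VDI-Profiles" -FolderName "Profiles" -ComputerName "FS-DCB" -ContentPath "D:\Profiles"
```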

Assuming array replication is in place and virtual applications are available across both datacentres, the users will always get the exact same working environment every time they connect to any of the desktops on any of the datacentres.

In some circumstances you may keep some desktop pools enabled and others disabled, based on the number of travelling users in your organisation. However, in a DR situation all desktop pools would have to be enabled. This can be done manually or through smart load balancers with the ability to trigger action-scripts that automatically enable the disabled DR pools in the secondary datacentre.

[Figure: Full Active-Active architecture across both datacentres]

Another important point in this scenario is that no DNS changes are required should a DR event occur, nor do users need to be informed of any actions or changes to how they connect to their desktops. This is true VMware View seamless DR.

[Figure: seamless user connections during a DR event]

All my DR scenarios assume that users do not have any data that needs to be available in the secondary datacentre other than their profiles and redirected folders. However, in some circumstances users will require dedicated desktops instead of floating ones. I recommend treating these cases as exceptions and perhaps assigning those users full clones instead of linked clones. My article DR for critical desktops in VMware View describes how full-clone virtual desktops can be replicated and reinstated in a secondary datacentre.

2 comments

2 pings

  1. TheOX

    Hi, congratulations for your very interesting and useful blog!

    Although this post is somewhat dated, I noticed that vSphere 5.0 / View 5.0 didn’t bring anything new about the (lack of) SRM / View interaction. So the problem of designing and implementing a DR solution for View is still there, and the scenarios represented here are still valid, if I’m correct.

    I’d like to ask you a couple of questions about the Active-Passive solution you described in this post. I have a View deployment with 90% dedicated linked clones (with persistent disks) and 10% floating desktops (again linked clones, with profile/folder redirection). Single site, small deployment (70 seats).

    The aim would be a DR solution working even in the case the primary data center is unavailable. All network redundancy is already present so that is not a problem.

    Let’s assume I already have array replication in place, so I have the VMFS datastore with all VMs/replicas/linked clones replicated to the DR site. After this point, a flurry of questions arise in my mind:

    – I should need a second vCenter instance for DR ESXi nodes, right?
    – I should need a second View Connection Manager “talking” to the second vCenter, right?
    – Do I just need to “import” to the inventory of this second vCenter all the VMs and stuff from the replicated LUN using vSphere Client? All replicas/clones relations and bounds are nicely preserved and I can expect everything to work fine?
    – If so, how can I configure a pool in View Manager to use existing virtual machines with View Composer / Linked Clones features enabled?
    – The final situation would be having the exact same desktop pools in both sites, only the DR pools would be disabled?

    I’m quite confused… from my understanding, View is a complex architecture not easily “movable” between sites. But perhaps (hopefully) I could be wrong.

    Or maybe you meant to create new, different virtual desktops in the DR site and only map the users’ persistent disks to them? But in that case what I would need the array replication for, apart from the persistent disks?

    Thanks in advance, regards and keep up the good work!

    OX

  2. TheOX

    Never mind my comment… just read another article of yours that confirmed my fears – headaches incoming…

    http://myvirtualcloud.net/?p=1716

    A complete, separate View infrastructure is needed, plus *some* manual work to assign users and persistent disks to desktops.

    Thanks anyway…

  1. How to Recover Linked Clone Desktops in a DR Site » myvirtualcloud.net

    […] protection and replication of user data and profiles. Back in 2010 I wrote an article entitled VMware View Disaster Recovery Scenarios & Options and the options available at the time are still pretty much […]

  2. New ‘Acropolis File Services’ with Native Support for VMware Horizon UEM and Citrix Profile Manager » myvirtualcloud.net

    […] really! recommend you to re-read my article VMware View Disaster Recovery Scenarios & Options  (I will soon update to include AFS). Times have changed and now AFS can provide all the necessary […]
