Disaster recovery for VMware View has always been one of those hot topics. As of today there is no official support from VMware for SRM integration with VMware View, nor are VMware View Pods able to replicate across datacentres. I know that interesting things are cooking as I write this article.
So, assuming that your organisation does not have any of those fancy new technologies such as EMC VPLEX, Cisco OTV or NetApp MetroCluster, and assuming you want to implement a supported solution, what are the options to provide DR for your organisation’s virtual desktop infrastructure?
I am providing three different scenarios that could also be implemented as a mix, creating a fourth scenario.
The Partial Active – Active
This scenario describes a Partial Active-Active setup where each datacentre is responsible for providing virtual desktops to a defined set of users, based on the connection broker address the user provides when launching the VMware View client or a thin client. In this scenario users must know their connection broker address (which can be tricky in some organisations).
To support the full workload when DR mode is activated, the administrator will pre-provision the desktop pools required to support the workload of the other datacentre. These desktop pools may remain enabled or be disabled. In VMware View 4.5, desktop pool enablement can easily be automated through the use of PowerShell cmdlets.
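To make the enablement decision concrete, here is a minimal sketch in Python. It is only an illustration of the logic and not View PowerCLI: the `Pool` record, its fields and the `pools_to_enable` function are hypothetical names of my own, and in a real environment the resulting list would be handed to whatever script actually toggles the pools.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    # All fields are hypothetical; they are not View attributes.
    pool_id: str      # identifier of the desktop pool
    hosted_in: str    # datacentre where the pool's desktops run
    serves: str       # datacentre whose users the pool is meant for
    enabled: bool     # current enablement state


def pools_to_enable(pools, failed_dc):
    """Return IDs of disabled DR pools that cover users of the failed datacentre."""
    return [
        p.pool_id
        for p in pools
        if p.serves == failed_dc and p.hosted_in != failed_dc and not p.enabled
    ]


# Example inventory: each datacentre hosts its own production pool
# plus a pre-provisioned, disabled DR pool for the other site.
inventory = [
    Pool("A-Prod", hosted_in="A", serves="A", enabled=True),
    Pool("A-DR-for-B", hosted_in="A", serves="B", enabled=False),
    Pool("B-Prod", hosted_in="B", serves="B", enabled=True),
    Pool("B-DR-for-A", hosted_in="B", serves="A", enabled=False),
]

print(pools_to_enable(inventory, failed_dc="A"))  # ['B-DR-for-A']
```

If the DR pools are kept enabled all the time, the function simply returns an empty list and nothing needs to be toggled.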
The virtual desktops maintained by the desktop pools may remain On or be Powered Off. Powering Off desktops when not in use saves CPU cycles, storage IO and electricity. The downside is that when a DR event is triggered all virtual desktops will be powered on at once. This process may consume resources across the whole stack, potentially creating a boot storm and affecting the performance of existing virtual desktops in the datacentre.
Load balancing virtual desktop pools across clusters is a recommended practice for this type of DR environment. When creating desktop pools, make sure the production pools are interleaved with the pools dedicated to DR. As an example, a cluster with 1024 virtual desktops would have one desktop pool for production and one disabled DR desktop pool. This setup will allow optimal resource performance for virtual desktops in production.
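The interleaving can be sketched as a simple layout calculation. This is a rough illustration only: the function name, the cluster names and the 50/50 split between production and DR desktops are my assumptions, and the right ratio depends on how much of the other datacentre's workload each cluster must absorb.

```python
def interleave_pools(clusters, cluster_capacity=1024, dr_share=0.5):
    """Give every cluster one enabled production pool and one disabled DR pool.

    dr_share is the (assumed) fraction of each cluster reserved for DR desktops.
    """
    layout = []
    for name in clusters:
        dr_size = int(cluster_capacity * dr_share)
        prod_size = cluster_capacity - dr_size
        layout.append(
            {"cluster": name, "pool": f"{name}-prod", "size": prod_size, "enabled": True}
        )
        layout.append(
            {"cluster": name, "pool": f"{name}-dr", "size": dr_size, "enabled": False}
        )
    return layout


for entry in interleave_pools(["cluster-01", "cluster-02"]):
    print(entry)
```

Because the DR pool in each cluster stays disabled during normal operation, production desktops are spread over all clusters instead of being packed into a few of them.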
Virtual desktop portability is not a feature provided in this scenario. Wherever the user connects from, he/she will always be redirected to the parent datacentre, and ultimately to the user’s own virtual desktop.
The figure below demonstrates the user connecting from a different site and having the connection diverted to the parent datacentre. This scenario allows use of dedicated desktops and persistent disks. Floating pools can be used in conjunction with application virtualisation (ThinApp), Roaming Profiles and/or Folder Redirection.
The bi-directional array replication in the figure above demonstrates how roaming profiles and folder redirection could be made available in both datacentres. For that to work, it is important to make sure that name resolution for the profiles folder is active on both sites and resolves to the appropriate local IP address.
In case of a DR event there are two options: A) users are told of a new address to reach the connection brokers in the secondary datacentre; or B) load balancers are smart enough to divert the connections to the correct pool of connection brokers in the secondary datacentre. Some load balancers have the ability to trigger action-scripts that could automatically enable the disabled DR pools in the secondary datacentre.
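Option B can be sketched as follows. This is a hedged illustration of the decision a load balancer would make, not the configuration of any particular product: `choose_brokers`, the health map and the `on_failover` hook are hypothetical names, with the hook standing in for an action-script that enables the DR pools.

```python
def choose_brokers(home_dc, healthy, secondary_of, on_failover=None):
    """Return the datacentre whose connection brokers should take the session.

    healthy      -- maps datacentre name to True/False (assumed health-check result)
    secondary_of -- maps each datacentre to its DR partner
    on_failover  -- optional callback, e.g. a script that enables the DR pools
    """
    if healthy.get(home_dc, False):
        return home_dc
    target = secondary_of[home_dc]
    if not healthy.get(target, False):
        raise RuntimeError("no healthy connection brokers available")
    if on_failover is not None:
        on_failover(target)
    return target


partners = {"A": "B", "B": "A"}
triggered = []  # records which datacentre had its DR pools enabled

print(choose_brokers("A", {"A": True, "B": True}, partners, triggered.append))
print(choose_brokers("A", {"A": False, "B": True}, partners, triggered.append))
print(triggered)
```

In normal operation the user stays on the parent datacentre and the hook is never called; only when the parent's brokers are unhealthy is the connection diverted and the hook fired.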
The key point in this scenario is to understand the steps that should be taken once a DR event has taken place. These steps can be manual or automated. Some organisations prefer to manually change DNS resolution when required, allowing users to connect to the secondary datacentre seamlessly through a connection alias.
The Active – Passive
This is probably the simplest implementation because it requires a single active site. All users connect to the same datacentre regardless of the location or region of their site. The upside of this implementation is that all your data is centralised and in some cases array replication is not required. The downside is mostly related to links and network bandwidth usage. If the environment has regional branches, normally coupled with high latency and low bandwidth, the operational expenditure to support decent connectivity may become unaffordable.
In this scenario all desktop pools in datacentre B are disabled and all virtual desktops are Powered Off, only being enabled during a DR event. It is possible to leave pools and virtual desktops On; however, the savings on electricity outweigh the operational benefits.
The administrator will pre-provision the desktop pools required to support the workload from the other datacentre. The drawback here is that the master templates, replicas and linked clones will have to be manually updated from time to time whenever there has been a change. The number of changes and the use of application virtualisation will directly impact the number of updates to be done on the DR environment. With a bit of luck, when the DR event happens you will have the latest master image and replicas in the secondary datacentre, and hopefully everything will be recomposed and ready to be powered on.
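Rather than relying on luck, the drift between the two sites can be checked regularly. The sketch below assumes you keep your own record of master image versions per datacentre; the dictionaries, image names and version numbers are hypothetical and are not read from View or vCenter.

```python
def stale_dr_images(primary, secondary):
    """Return names of master images whose DR copy is missing or older.

    Both arguments map an image name to an integer version that you
    maintain yourself (an assumption of this sketch).
    """
    stale = []
    for name, version in primary.items():
        dr_version = secondary.get(name)
        if dr_version is None or dr_version < version:
            stale.append(name)
    return sorted(stale)


primary_images = {"win7-base": 7, "winxp-base": 12, "win7-kiosk": 3}
secondary_images = {"win7-base": 7, "winxp-base": 10}

print(stale_dr_images(primary_images, secondary_images))
# ['win7-kiosk', 'winxp-base']
```

Any image reported here needs to be copied across and the corresponding DR pools recomposed before the secondary datacentre can be considered ready.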
As I mentioned, this is the simplest implementation, but it requires some manual intervention. Having said that, there is a lot that could be automated and orchestrated through the use of PowerShell and PowerCLI.
The Full Active-Active
This is the true Active-Active DR implementation, where there are no desktop owners and all desktops are exactly the same, optionally being refreshed after user logoff. The applications are user-based and delivered via application virtualisation.
Some pre-requisites are essential to fully utilise this scenario.
User Profiles – All user profiles must be available in both datacentres using folder redirection, roaming profiles or a 3rd party solution for persona management. The replication can be achieved in a few different ways, the most common being array-based replication and Windows DFS.
Floating Pools – All desktop pools should be set to the floating type. This will allow a consistent desktop experience for all users regardless of the desktop being used.
Application Virtualisation – This is a critical component for selecting the applications that each user should have access to. Application layering is a trend, and the system images should be left as light as possible, perhaps with only the antivirus and a few required patches installed. All other applications should be delivered through virtualisation. This will also reduce operational maintenance and the number of recompositions.
Smart Load Balancers – Load balancers should be smart enough to understand where the user is connecting from (home, Site A, Site B) and divert the connection to the closest datacentre.
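The load balancer behaviour in the last prerequisite can be sketched as below. I am using measured latency as a stand-in for "closest", which is an assumption of mine, and the function name, latency figures and health map are all made up for the example.

```python
def closest_datacentre(location, latency_ms, healthy):
    """Pick the healthy datacentre with the lowest latency from the user's location."""
    candidates = {
        dc: ms for dc, ms in latency_ms[location].items() if healthy.get(dc, False)
    }
    if not candidates:
        raise RuntimeError("no healthy datacentre available")
    return min(candidates, key=candidates.get)


# Hypothetical latency table (milliseconds) per user location.
latency_ms = {
    "home": {"A": 35, "B": 60},
    "Site A": {"A": 2, "B": 40},
    "Site B": {"A": 40, "B": 3},
}

print(closest_datacentre("Site B", latency_ms, {"A": True, "B": True}))  # B
print(closest_datacentre("Site B", latency_ms, {"A": True, "B": False}))  # A
```

The second call shows why this scenario needs no user action during a DR event: when the nearest datacentre is down, the same rule simply lands the user in the other one.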
Assuming array replication is in place and virtual applications are available across both datacentres, the users will always get the exact same working environment every time they connect to any of the desktops on any of the datacentres.
In some circumstances you may allow some desktop pools to be enabled and some disabled. This would be based on the number of travelling users in your organisation. However, in a DR situation all desktop pools would have to be enabled. This can be done manually or through smart load balancers with the ability to trigger action-scripts that could automatically enable the disabled DR pools in the secondary datacentre.
Another important point in this scenario is that there are no DNS changes required should a DR event occur. Nor do users need to be informed of any actions or changes to how they connect to their desktops. This is the true VMware View seamless DR.
All my DR scenarios assume that users do not have any information that needs to be available in the secondary datacentre other than their profiles and redirected folders. However, in some circumstances users will need dedicated desktops instead of floating ones. I recommend treating these cases as exceptions and perhaps assigning those users to full clones instead of linked clones. My article DR for critical desktops in VMware View describes how full-clone virtual desktops can be replicated and reinstated in a secondary datacentre.