Why Can’t I Have A Stretched VMware Horizon View Cluster?

For those of you who are involved in the architecture of Horizon View environments, I'm sure this is a question you've asked at some point, but you may only ever have received the response "VMware doesn't support that design". Why doesn't VMware support this design? It works... most of the time?

During my time at VMware I've heard many different reasons why VMware does not support this design. I finally have an accurate answer and wanted to share it with you.

What is a stretched VMware Horizon View cluster?

For those of you who are asking yourselves, "What is a stretched VMware Horizon View cluster?", I'll set the scene a little. A stretched Horizon View cluster is a single View Pod (a cluster of View Connection Servers) that spans more than one physical location, where the locations are connected by a WAN / MAN / MLAN and NOT by a LAN.

An example of a stretched Horizon View Cluster:

[Figure: Stretched Horizon View Cluster]

After an internal discussion around this subject, a colleague and good friend of mine, Mike Barnett, who used to be an Escalation Engineer in VMware's GSS (Global Support Services), set the record straight for us.

So I've decided to adapt his explanation and share it with you.

Why can't I have a stretched VMware Horizon View Cluster? 

The main reason we (VMware) can't support View across physical locations is the Java Message Service (JMS) component.

Within View we use AD LDS (Active Directory Lightweight Directory Services) alongside JMS. AD LDS is based on the Active Directory framework, which has a robust and resilient site-based architecture that allows it to support distributed environments. View uses AD LDS to store information such as entitlements and View desktop states. This information is distributed to all other Connection Servers in the View Pod (cluster of View Connection Servers) using the built-in AD LDS replication. AD LDS is designed to be a store of information that the Connection Server pulls from when starting up.

JMS is a separate system within View that manages the running state of the servers, including task scheduling, current desktop states, etc. It runs in memory and loads much of its initial state information from the AD LDS database on startup.

JMS is designed to be a very fast messaging system, which is why we use it within View. The View Connection Servers need to communicate any changes that occur across the cluster as quickly as possible (sub-millisecond speeds). These operations include changes in the state of VMs and VM allocation events.

For example, User A logs in to Connection Server 1 and is allocated Desktop1 in Floating Pool 1. At the same moment, User B logs in to Connection Server 2 and selects the same pool, Floating Pool 1. If the notification of User A's allocation does not reach Connection Server 2 almost instantaneously, User B could also be allocated Desktop1, and when the logins clash one of the users receives an error message. Latency, jitter and other WAN-induced conditions can prevent this data from reaching the other Connection Servers in a timely manner, resulting in many different issues.
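To make the race concrete, here is a toy Python simulation of the scenario above. It is a hypothetical model, not View's actual allocation logic: each Connection Server keeps its own list of desktops it believes are free, allocates the first one, and only learns about a peer's allocation when that peer's notification arrives after `latency_ms`. The `ConnectionServer` class and `simulate` function are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ConnectionServer:
    """Hypothetical model of one Connection Server's in-memory view of a floating pool."""
    name: str
    free: list[str]  # desktops this server believes are unassigned
    inbox: list[tuple[int, str]] = field(default_factory=list)  # (deliver_at_ms, desktop)

    def apply_messages(self, now_ms: int) -> None:
        # Apply every peer notification whose delivery time has arrived.
        ready = [desktop for at, desktop in self.inbox if at <= now_ms]
        self.inbox = [(at, desktop) for at, desktop in self.inbox if at > now_ms]
        for desktop in ready:
            if desktop in self.free:
                self.free.remove(desktop)

    def allocate(self, now_ms: int) -> str:
        # Catch up on notifications, then hand out the first desktop believed free.
        self.apply_messages(now_ms)
        return self.free.pop(0)


def simulate(latency_ms: int, gap_ms: int) -> tuple[str, str]:
    """User A logs in to CS1 at t=0 and user B to CS2 at t=gap_ms.
    CS1's notification about user A's desktop reaches CS2 at t=latency_ms."""
    pool = ["Desktop1", "Desktop2", "Desktop3"]
    cs1 = ConnectionServer("CS1", list(pool))
    cs2 = ConnectionServer("CS2", list(pool))
    desktop_a = cs1.allocate(now_ms=0)
    cs2.inbox.append((latency_ms, desktop_a))  # notification in flight to CS2
    desktop_b = cs2.allocate(now_ms=gap_ms)
    return desktop_a, desktop_b


print(simulate(latency_ms=0, gap_ms=1))   # notification arrives before user B logs in
print(simulate(latency_ms=30, gap_ms=1))  # notification still in flight
```

With near-zero latency the notification lands before User B's login and the two users get different desktops; once the notification takes longer than the gap between the two logins, both users are handed Desktop1.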

When a VM's state changes, the other Connection Servers within the View Pod are sent those changes via JMS. These state changes are then committed to the AD LDS instance, both for consumption by the Admin Web UI and for use if a Connection Server is restarted. The other Connection Servers use this information when deciding which desktop to allocate to a given user, and it also allows any given Connection Server to act as the administration access point.
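The division of labour between JMS and AD LDS described above can be sketched as a write-through pattern. This is a simplified, hypothetical model (the `AdLds` and `Pod` classes are invented for illustration and make no network calls): a state change is pushed to every Connection Server's in-memory state first, then committed to the persistent store, and a restarted server rebuilds its in-memory state from that store.

```python
class AdLds:
    """Hypothetical stand-in for the replicated AD LDS store (persistent state)."""

    def __init__(self) -> None:
        self.desktop_state: dict[str, str] = {}


class Pod:
    """Hypothetical pod: each Connection Server holds its own in-memory state."""

    def __init__(self, server_names: list[str], lds: AdLds) -> None:
        self.lds = lds
        # On startup every server loads its initial state from AD LDS.
        self.memory: dict[str, dict[str, str]] = {
            name: dict(lds.desktop_state) for name in server_names
        }

    def state_change(self, vm: str, state: str) -> None:
        # 1. Push the change to every server's in-memory state (the JMS role).
        for server_state in self.memory.values():
            server_state[vm] = state
        # 2. Commit it to AD LDS for the Admin Web UI and for restarts.
        self.lds.desktop_state[vm] = state

    def restart(self, name: str) -> None:
        # A restarted server discards its in-memory state and reloads it from AD LDS.
        self.memory[name] = dict(self.lds.desktop_state)


pod = Pod(["CS1", "CS2"], AdLds())
pod.state_change("Desktop1", "connected")
pod.restart("CS1")
```

In a stretched pod, the step-1 broadcast is the part that would have to cross the WAN on every state change.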

There are additional issues, though. When a desktop is enabled for View, whether it is provisioned in an automated pool or added to a manual pool, the Connection Server populates the VMX file with a 'machine.id' string. (You can see this in the VMX of any View VM.) The View Agent reads this machine.id out of the VMX to get various settings it needs to operate the desktop. One of these attributes (vdi.broker.brokers) contains the hostname of every Connection Server in the cluster, so if you have 4 Connection Servers you will have 4 hostnames. When the View Agent starts, it reads this machine.id value and randomly selects a Connection Server to connect to so it can report its status.
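The Agent-side behaviour can be sketched as follows. The exact encoding of the machine.id string is not described here, so this sketch ASSUMES that attributes are `key=value` pairs separated by semicolons and that vdi.broker.brokers holds whitespace-separated hostnames. The hostnames and the helper functions `parse_machine_id` and `pick_broker` are hypothetical.

```python
import random


def parse_machine_id(machine_id: str) -> dict[str, str]:
    """Split a machine.id-style string into attributes.

    ASSUMPTION: attributes are 'key=value' pairs separated by ';'.
    The real machine.id encoding may differ.
    """
    attrs: dict[str, str] = {}
    for pair in machine_id.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def pick_broker(machine_id: str, rng: random.Random) -> str:
    """Pick one Connection Server at random, as the Agent does on startup."""
    # ASSUMPTION: vdi.broker.brokers lists hostnames separated by whitespace.
    brokers = parse_machine_id(machine_id)["vdi.broker.brokers"].split()
    return rng.choice(brokers)


# Hypothetical value for a pod with 4 Connection Servers.
example_id = (
    "vdi.broker.brokers=cs1.example.com cs2.example.com "
    "cs3.example.com cs4.example.com;other.setting=1"
)
chosen = pick_broker(example_id, random.Random())
print(chosen)
```

Note that nothing in the selection considers where a Connection Server is located; every hostname in the list is an equally likely choice.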

In a stretched cluster, the Connection Servers are physically separated, yet any View Agent can talk to any Connection Server. This means that an Agent in Site A could be reporting to a Connection Server in Site B, which causes problems similar to the one described above. The Agent uses JMS to connect to the Connection Server to send its status, and when a user logs on it's possible for a conflict to occur, even with very low latency.
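Because the Agent picks a Connection Server at random from the full list, a stretched pod sends a predictable share of Agents across the WAN. Assuming a uniform random choice, the probability that an Agent in one site reports to a Connection Server in another site is the number of remote Connection Servers divided by the total. The site names and counts below are illustrative, and `cross_site_fraction` is a hypothetical helper.

```python
from fractions import Fraction


def cross_site_fraction(brokers_per_site: dict[str, int], agent_site: str) -> Fraction:
    """Probability that an Agent in agent_site picks a Connection Server in another
    site, assuming it chooses uniformly at random from every broker in the pod."""
    total = sum(brokers_per_site.values())
    remote = total - brokers_per_site[agent_site]
    return Fraction(remote, total)


# Illustrative stretched pod: 4 Connection Servers split evenly across two sites.
print(cross_site_fraction({"SiteA": 2, "SiteB": 2}, "SiteA"))
```

With two Connection Servers in each of two sites, roughly half of the Agents in either site would report to the other site.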

The above reasons address the latency issues, but there are other problems. Because our implementation of JMS is not designed to operate in a multi-site architecture, we haven't developed any specific handlers for Connection Servers becoming unavailable. Even with a very fast, low-latency, very resilient network connection, the uptime is still not as high as that of a LAN connection. It just isn't, regardless of the term used to refer to it (WAN / MAN / MLAN etc.).
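To put the uptime point into numbers: a link's availability translates directly into expected downtime per year. The availability figures below are illustrative assumptions, not measurements of any particular WAN or LAN, and `annual_downtime_hours` is a hypothetical helper.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def annual_downtime_hours(availability: float) -> float:
    """Expected hours per year that a link with the given availability is down."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * HOURS_PER_YEAR


# Illustrative availability figures only.
for label, availability in [("99.9%", 0.999), ("99.99%", 0.9999)]:
    print(f"{label}: {annual_downtime_hours(availability):.2f} hours/year")
```

Each outage of an inter-site link is a window in which Connection Servers on either side of a stretched pod cannot exchange JMS messages, and, as noted above, View has no specific handlers for that case.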

How Should I Architect Horizon View Across Multiple Sites?

VMware recommends having two View Pods, or one View Pod per site if you have more than two sites. This avoids all of the issues discussed above and is a fully supported architecture, similar to the following image.

[Figure: Multi-Site Horizon View Architecture]
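When reviewing a proposed design, the one-pod-per-site rule can be checked mechanically. This is a minimal sketch under an assumed, hypothetical data layout: the topology maps each pod name to its Connection Servers and the site each one lives in. The pod names, hostnames and `stretched_pods` helper are invented for illustration.

```python
def stretched_pods(topology: dict[str, dict[str, str]]) -> list[str]:
    """Return the pods whose Connection Servers span more than one site.

    topology maps pod name -> {Connection Server hostname: site name}.
    """
    return sorted(
        pod for pod, servers in topology.items()
        if len(set(servers.values())) > 1
    )


# One pod per site (recommended) versus one pod stretched across two sites.
per_site = {
    "PodA": {"cs1.site-a": "SiteA", "cs2.site-a": "SiteA"},
    "PodB": {"cs1.site-b": "SiteB", "cs2.site-b": "SiteB"},
}
stretched = {
    "Pod1": {"cs1": "SiteA", "cs2": "SiteA", "cs3": "SiteB", "cs4": "SiteB"},
}
print(stretched_pods(per_site), stretched_pods(stretched))
```

An empty list means every pod keeps its Connection Servers within a single site.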

If you want to find out more about large-scale, multi-site Horizon View environments, I'd recommend you read this VMware blog post: Demystifying VMware View Large Scale Designs

I know it was a lot to read. I hope this has cleared up any questions you had.

Once again, thanks to Mike Barnett for this great explanation. 

  • http://vmstan.com/ Michael Stanclift

    This is always what I *thought* (latency of communication of ready/use state) and what I would usually tell my customers, but it's great to know the actual reasons why. I also didn’t realize that the VMX file contains information about all the brokers; good to know.

  • Forbes Guthrie

    So how about splitting Security Servers and Connection Servers across a WAN? I kinda know the official answer, but I’m interested in what you think. I’ve actually tested this, and continue to run it this way in lab environment. It’s potentially very useful as remote sites are aggregated through a datacenter for external connections (internet).

  • Mike Barnett

    Hi Forbes,

    Having Security Servers across a WAN from the Connection Servers they are connected to is also technically unsupported.

    I will say, however, that the limitations that drastically affect the Connection Servers when they are WAN-separated are not quite as bad for a Security Server. The JMS channel between the SS and CS (port 4001) is not as heavily used, as it carries desktop connection information, so there is less chance of information being missed.

    The main concern in this case is just that the code wrapping either end of the SS/CS connection isn’t designed to have high latency so there could be breakdowns in tolerance.

    Bottom line, you shouldn’t run into any major issues. Once you approach 10k desktops with >5 logins/second across the cluster you could hit something. I haven’t tested any of this live though so you may be fine. :)

    -Mike

  • forbsy

    For the branch office use case: it now looks like View Agent Direct-Connection (VADC) is supported. That would mean that virtual desktops should exist at the branch office while the Connection Servers exist at the primary datacentre (because View is not supported in a stretched pod).

    Does this mean that the branch ESXi servers (which physically exist in the branch office) need to be part of the core datacentre vCenter (that is, added to the core datacentre vCenter Server), so that View Administrator can see the branch office ESXi servers when provisioning desktop pools to the branch office?

    As an alternative, could we create the branch desktop ESXi servers under a branch vCenter Server and use Linked Mode? The View Admin guide states that you can add vCenter Server instances in a Linked Mode group to View Administrator. This alternative seems like the better architecture, as the Connection/Security Servers exist in the core vCenter Server datacentre, while the branch office ESXi servers are managed by a local vCenter Server.

  • Michael Frank

    I know this post is a couple of years old, but has this changed? With VMware’s announcement of Horizon 6.2 and support for VSAN stretched clusters, it seems that a stretched View deployment is now possible. I have been looking for documentation on that but can’t find any. Can you assist? Thanks.

  • Ray Heffer

    Hi Michael, no, this hasn’t changed. We do support stretched clusters with Horizon 6.2 now, but Simon's blog still holds true. The supported configuration is to create affinity rules that keep all of the Connection Servers at one site (LAN). The alternative, if you don’t have a stretched cluster, is Cloud Pod Architecture, which provides global entitlements across two sites (pods).