This is the first part in a series about building a PSC architecture. The rest of the articles are here:
- Part 1: Design for Platform Services Controller
- Part 2: Installing Platform Services Controller
- Part 3: Setting up replication on the Platform Services Controller
The Platform Services Controller (PSC), introduced in vSphere 6.0, has been a challenge for a lot of people upgrading to it. I have struggled to identify the best architecture to follow. This article assumes that you want a multi-vCenter single sign-on domain with external PSC’s. There are a few key items to consider in architecting PSC’s:
Recovery
- If you lose all PSC’s, you cannot connect a vCenter to a new PSC; you must re-install the vCenter, losing all data
- To recover all failed PSC’s, restore a single PSC from backup (image level backup is supported), then redeploy new PSC’s for the rest. Restoring multiple PSC’s may introduce inconsistencies, depending on when each backup was taken.
- In 6.5 vCenter cannot be repointed to a PSC in a different site on the same domain (6.0 can)
- No 6.x version of vCenter supports repointing to a PSC in a different domain
- If you lose all PSC’s at a site, you can install new PSC’s at that site, as long as at least one PSC at another site survived, and then repoint the vCenter to the new PSC
Replication
- All PSC replication is bi-directional but not automatically in a ring (big one)
- By default each PSC is replicating with only a single other PSC (the one you select when installing the additional PSC)
- Site names do not have anything to do with replication today; they are a logical construct for load balancers and future usage
- Changes are not unique to a site but to a domain – in other words all changes at all sites are replicated to all other PSC’s assuming they are part of the domain
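To make the "not automatically in a ring" point concrete, here is a minimal sketch that models replication agreements as bi-directional links. It assumes each new PSC is joined to the one installed just before it; the names psc-1 through psc-4 and both helper functions are hypothetical and not part of any VMware tooling.

```python
from collections import defaultdict

def partners_from_installs(installs):
    """Build a partner map from (new_psc, joined_to) pairs.

    Every agreement is bi-directional, so both sides list each other.
    """
    partners = defaultdict(set)
    for new_psc, joined_to in installs:
        partners[new_psc].add(joined_to)
        partners[joined_to].add(new_psc)
    return dict(partners)

def ring_closing_agreement(partners):
    """Return the two chain ends that a manual agreement should connect.

    Rough check only: it assumes the PSCs form one connected chain and
    returns None if the layout is not a simple chain.
    """
    ends = sorted(p for p, peers in partners.items() if len(peers) == 1)
    at_most_two = all(len(peers) <= 2 for peers in partners.values())
    if len(ends) == 2 and at_most_two and len(partners) > 2:
        return tuple(ends)
    return None

# Hypothetical install order: each PSC joins the previous one.
installs = [("psc-2", "psc-1"), ("psc-3", "psc-2"), ("psc-4", "psc-3")]
partners = partners_from_installs(installs)
closing = ring_closing_agreement(partners)

for psc in sorted(partners):
    print(psc, "replicates with", sorted(partners[psc]))
print("manual agreement needed to close the ring:", closing)
```

In this model the default installs leave psc-1 and psc-4 with a single partner each, so the ring only exists once someone creates the agreement between them by hand.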
Availability
- vCenter points to a single PSC, never more than one at a time
- PSC’s behind a load balancer (up to 4 supported) are active/passive via load balancer configuration
- If you use a load balancer configuration for PSC and the active PSC fails, the load balancer repoints to another PSC and no reconfiguration is required
- Site name is important with load balancers: you should place all PSC’s behind a load balancer in their own site, and non-load-balanced PSC’s at the same physical location should have a different site name
Features
- PSC’s have to be part of the same domain together to use enhanced linked mode
Performance
- PSC can replicate to one or many other PSC’s (with an impact with many). You want to minimize the number of replication partners because of performance impact.
Topology
- Ring is the supported topology best practice today
- PSC’s know each other by IP address or domain name (ensure domain is correct including PTR) – using IP is discouraged because it can never be changed; use of FQDN allows for IP mobility.
- PSC’s are authentication sources, so NTP is critical, and all PSC’s should use the same NTP source. (If you join one PSC to AD, all need to be joined to the same AD – best not to mix appliance and Windows PSC’s)
- The only reason to have external PSC’s is to use enhanced linked mode – if you don’t need ELM use an embedded PSC with vCenter and back vCenter up at the same time – see http://vmware.com/go/psctree
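Since the PSC’s find each other by name, it is worth checking forward and reverse DNS before deploying. Below is a minimal sketch using Python's standard socket module. The function name, the example hostnames, and the stand-in resolvers are all assumptions made for illustration; swap in the real resolvers (the defaults) to test an actual environment.

```python
import socket

def forward_and_reverse_match(fqdn,
                              resolve=socket.gethostbyname,
                              reverse=socket.gethostbyaddr):
    """Return True if fqdn resolves to an IP whose PTR record names fqdn."""
    try:
        ip = resolve(fqdn)
        ptr_name = reverse(ip)[0]  # gethostbyaddr returns (name, aliases, ips)
    except OSError:  # socket.gaierror and socket.herror are OSError subclasses
        return False
    return ptr_name.rstrip(".").lower() == fqdn.rstrip(".").lower()

# Stand-in resolvers so the sketch runs without touching real DNS.
FAKE_A = {"psc01.example.local": "10.0.0.11",
          "psc02.example.local": "10.0.0.12"}
FAKE_PTR = {"10.0.0.11": "psc01.example.local"}  # psc02 has no PTR record

def fake_resolve(name):
    if name not in FAKE_A:
        raise socket.gaierror(f"no A record for {name}")
    return FAKE_A[name]

def fake_reverse(ip):
    if ip not in FAKE_PTR:
        raise socket.herror(f"no PTR record for {ip}")
    return (FAKE_PTR[ip], [], [ip])

for host in FAKE_A:
    ok = forward_and_reverse_match(host, fake_resolve, fake_reverse)
    print(host, "OK" if ok else "forward/reverse mismatch")
```

Calling forward_and_reverse_match("your-psc-fqdn") with no extra arguments uses the system resolver instead of the fake tables.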
Scalability
- Current limits are 8 PSC’s in a domain in 6.0 and 10 in a domain in 6.5
With all of these items in hand, here are some design tips:
- Always have n+1 PSC’s; in other words, never have a single PSC in a domain when using ELM
- Have a solid method for restoring your PSC’s – Image level or 6.5 restore feature
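The limits and the n+1 tip can be captured in a small planning check. This is only a sketch: the limits are the ones quoted in this article, and the function and variable names are made up for illustration.

```python
# Per-domain PSC limits as quoted in this article.
MAX_PSCS_PER_DOMAIN = {"6.0": 8, "6.5": 10}

def check_domain_size(version, psc_count, uses_elm=True):
    """Return a list of problems with a planned PSC count (empty if none)."""
    problems = []
    limit = MAX_PSCS_PER_DOMAIN.get(version)
    if limit is None:
        problems.append(f"no limit recorded for version {version}")
    elif psc_count > limit:
        problems.append(f"{psc_count} PSCs exceeds the limit of {limit} for {version}")
    if uses_elm and psc_count < 2:
        problems.append("single PSC in an ELM domain; plan for n+1")
    return problems

print(check_domain_size("6.0", 9))
print(check_domain_size("6.5", 1))
print(check_domain_size("6.5", 4))
```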
So what is the correct topology for PSC’s?
This is a challenging question. Let’s identify some design elements to consider:
- Failure of a single component should not create replication partitions
- Complexity of setup should be minimized
- Number of replication agreements should be minimized for performance reasons
- Scaling out additional PSC’s should be as simple as possible
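The first and third design elements can be tested on paper before anything is deployed. The sketch below treats PSC’s as nodes and replication agreements as links, then removes each PSC in turn to see whether the survivors can still reach one another. The PSC names and function names are hypothetical, and the model ignores everything except which agreements exist.

```python
def is_connected(nodes, agreements):
    """True if every node can reach every other node over the agreements."""
    nodes = set(nodes)
    if not nodes:
        return True
    adjacency = {n: set() for n in nodes}
    for a, b in agreements:
        if a in nodes and b in nodes:  # ignore links to failed PSCs
            adjacency[a].add(b)
            adjacency[b].add(a)
    start = next(iter(nodes))
    seen, stack = {start}, [start]
    while stack:
        current = stack.pop()
        for peer in adjacency[current] - seen:
            seen.add(peer)
            stack.append(peer)
    return seen == nodes

def single_failure_partitions(nodes, agreements):
    """Return the PSCs whose loss would split the surviving PSCs."""
    return sorted(n for n in nodes
                  if not is_connected(set(nodes) - {n}, agreements))

pscs = ["psc-1", "psc-2", "psc-3", "psc-4"]
chain = [("psc-1", "psc-2"), ("psc-2", "psc-3"), ("psc-3", "psc-4")]
ring = chain + [("psc-4", "psc-1")]

print("chain:", len(chain), "agreements, risky PSCs:",
      single_failure_partitions(pscs, chain))
print("ring: ", len(ring), "agreements, risky PSCs:",
      single_failure_partitions(pscs, ring))
```

In this model the chain partitions if either middle PSC fails, while one extra agreement turns it into a ring with no single point of partition.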
Ring
I spent some time in the ISP world and learned to love rings. They create two paths to every destination and are easy to set up and maintain. They do have issues when two points fail at the same time, which can partition routing until one of the two is restored. VMware recommends a ring topology for PSC’s at the time of this article, as shown below:
Let’s review this topology against the design elements:
- Failure of a single component should not create replication partitions
  - True: due to the ring, there are two ways for everything to replicate
- Complexity of setup should be minimized
  - The setup ensures redundancy without lots of manually created, performance-impacting replication agreements (one manual agreement)
- Number of replication agreements should be minimized for performance reasons
  - True
- Scaling out additional PSC’s should be as simple as possible
  - Adding a new PSC means the following:
    - Add new PSC joined to LAX-2
    - Add new agreement between new PSC and SFO-1
    - Remove agreement between LAX-2 and SFO-1
This looks mostly simple, but you do need to track which agreement is providing your ring’s backup loop, which is a manual documentation process today.
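The three scale-out steps above can be sketched as a small bookkeeping function, which is also one way to keep the manual documentation honest. Only LAX-2 and SFO-1 come from the steps above; the other PSC names (SFO-2, LAX-1, NEW-1) and the function names are assumptions for illustration, and nothing here talks to a real PSC.

```python
from collections import Counter

def link(a, b):
    """Store an agreement in a fixed order, since it is bi-directional."""
    return tuple(sorted((a, b)))

def add_psc_to_ring(agreements, new_psc, join_to, closing_peer):
    """Insert new_psc into the ring between join_to and closing_peer."""
    updated = {link(*pair) for pair in agreements}
    old_link = link(join_to, closing_peer)
    if old_link not in updated:
        raise ValueError(f"no agreement between {join_to} and {closing_peer}")
    updated.add(link(new_psc, join_to))       # step 1: created when joining at install
    updated.add(link(new_psc, closing_peer))  # step 2: new manual agreement
    updated.remove(old_link)                  # step 3: retire the old closing link
    return updated

# Assumed starting ring; LAX-2 to SFO-1 is the manual closing agreement.
ring = [("SFO-1", "SFO-2"), ("SFO-2", "LAX-1"),
        ("LAX-1", "LAX-2"), ("LAX-2", "SFO-1")]
updated = add_psc_to_ring(ring, "NEW-1", join_to="LAX-2", closing_peer="SFO-1")

# In a ring every PSC has exactly two partners.
degree = Counter(node for pair in updated for node in pair)
for pair in sorted(updated):
    print(pair)
print("still a ring:", all(count == 2 for count in degree.values()))
```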
Ring with additional redundancy
The VMware validated design states that for a two site enhanced linked mode topology you should build the following:
A few items to illustrate (in case you have not read the VVD)
- Four vCenters
- Four PSC’s (in blue)
- Each PSC replicates with its same-site peer and one remote-site peer, making sure its changes are stored at two sites, in two copies that are then replicated locally and remotely (all four get it)
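As a rough illustration of the "same-site peer plus one remote-site peer" rule, the sketch below lists each PSC's local and remote partners for a two-site layout. The site and PSC names, and the exact agreements, are assumptions based on the description above rather than taken from the VVD itself.

```python
# Hypothetical two-site layout: two PSCs per site.
SITE_OF = {"psc-a1": "site-a", "psc-a2": "site-a",
           "psc-b1": "site-b", "psc-b2": "site-b"}

# Assumed agreements: one within each site, plus one cross-site link per PSC.
AGREEMENTS = [("psc-a1", "psc-a2"), ("psc-b1", "psc-b2"),
              ("psc-a1", "psc-b1"), ("psc-a2", "psc-b2")]

def partner_summary(site_of, agreements):
    """Group each PSC's replication partners into local and remote."""
    summary = {psc: {"local": [], "remote": []} for psc in site_of}
    for a, b in agreements:
        kind = "local" if site_of[a] == site_of[b] else "remote"
        summary[a][kind].append(b)  # agreements are bi-directional,
        summary[b][kind].append(a)  # so record the link on both sides
    return summary

summary = partner_summary(SITE_OF, AGREEMENTS)
for psc, peers in sorted(summary.items()):
    print(psc, "local:", peers["local"], "remote:", peers["remote"])
```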
Let’s evaluate against the design elements:
- Failure of a single component should not create replication partitions
  - True: due to the ring, there are four ways for everything to replicate
- Complexity of setup should be minimized
  - The setup requires forethought and at least one manual replication agreement
- Number of replication agreements should be minimized for performance reasons
  - It has more replication agreements
- Scaling out additional PSC’s should be as simple as possible
  - Adding a new PSC means potentially more replication agreements or more design
Update: The VVD team reached out and wanted to be clear that adding additional sites is pretty easy. I believe the challenge comes when you try to identify disaster zones. Because PSC’s replicate all changes everywhere, it does not matter if all replication agreements fail; you can still regenerate a site.
Which option should I use?
That is really up to you. I personally love the simplicity of a ring. Neither of these options increases availability of the PSC layer; they are about data consistency and integrity. Use a load balancer if your management plane SLA does not support downtime.