joseph

February 28, 2017

Repoint 6.x vCenter to a new PSC

Since a vCenter has a connection to a single PSC it’s important to understand how to move between PSC’s and deploy new ones when old ones have failed. This article details this mobility and process.

Once installed check for working vCenter

Then login via ssh and check which PSC is being used

Let’s repoint it to psc2.griffiths.local

cmsso-util repoint --repoint-psc psc2.griffiths.local

Now we are pointing to psc2 at site1. In 6.0 you were able to repoint a vCenter to a PSC at a different site; this is no longer available in 6.5. (Yes, no longer possible. Remember this: trying to repoint across sites can cause some really bad stuff in 6.5.)

As you can see we have repointed the psc from 1 to 2 at the same site:

So what do you do when all your PSC’s at a site have failed? (Don’t have a single PSC at a site first off..) Or this:

Install a new PSC pointing to a PSC at a remaining site. We will use psc3 at site2 to create a new psc5 at site1. In order to test this I shut down psc1 and psc2 to simulate failures.

So we are creating:

After the PSC is installed it will replicate with psc3.griffiths.local only. We can then repoint vc1 to psc5 and rebuild the missing PSC’s at site1. First we have to make sure psc5 was deployed correctly by visiting its webpage:

Now we can repoint the vc to psc5 at site1.

And it’s working

William Lam posted a script to automatically change vCenter to new PSC when a failure is detected and it’s here:

http://www.virtuallyghetto.com/2015/12/how-to-automatically-repoint-failover-vcsa-to-another-replicated-platform-services-controller-psc.html

For those who don’t want to read the script, it’s very simple: it runs on the vCenter appliance and checks the PSC web page for a return code of 200. If the check fails 3 times it switches to another PSC. It runs as an automated task every x minutes.
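The check-and-count logic described above can be sketched in a few lines of Python. This is not William Lam’s script, just a minimal illustration of the idea. The function names, the URL you would poll and the choice to skip certificate verification (lab appliances usually have self-signed certificates) are all my assumptions.

```python
import ssl
import urllib.error
import urllib.request

FAILURE_THRESHOLD = 3  # consecutive failed checks before switching PSC


def psc_is_healthy(url, timeout=5):
    """Return True if the PSC web page answers with HTTP 200."""
    # Assumption: self-signed certificate, so verification is turned off.
    ctx = ssl.create_default_context()
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, HTTP error codes, DNS failure, etc.
        return False


def next_failure_count(current_count, healthy):
    """Reset the counter on a good check, otherwise increment it."""
    return 0 if healthy else current_count + 1


def should_repoint(failure_count, threshold=FAILURE_THRESHOLD):
    """True once the PSC has failed enough checks in a row."""
    return failure_count >= threshold
```

A scheduled task would call psc_is_healthy on each run, keep the counter between runs (a small file is enough), and only when should_repoint returns True run the cmsso-util repoint --repoint-psc command shown earlier against the standby PSC.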

Remove an old PSC

cmsso-util unregister --node-pnid OLD_PSC_Name --username administrator@sso_domainname

So to remove psc1.griffiths.local I would type:

cmsso-util unregister --node-pnid psc1.griffiths.local --username administrator@vsphere.local

February 28, 2017

Setting up replication on the platform services controller

In the previous set of articles I discussed the following:

In this article I will discuss how to create the reference design from the following:

The reference replication design should be:

So we need to add two new replication agreements.

PSC1 <-> PSC3

First login to psc1.griffiths.local and enable shell. Then change the directory into /usr/lib/vmware-vmdir/bin

Look at the current agreements

Add an agreement with psc3.griffiths.local using the following command:

./vdcrepadmin -f createagreement -2 -h psc3.griffiths.local -u Administrator -H psc1.griffiths.local

You can see that we now have two agreements. Rinse and repeat on PSC2 <=> PSC4 and we have the VVD topology. Notice in this example that I did everything from PSC1 while referencing different PSC’s:

Now you have your replication agreements. Looking at the vdcrepadmin command you have the following options

vdcrepadmin -f showpartners

vdcrepadmin -f showpartnerstatus

vdcrepadmin -f showservers

vdcrepadmin -f createagreement [-2]

vdcrepadmin -f removeagreement [-2]

vdcrepadmin -f isfirstcycledone

This essentially allows you to:

Show replication partners – showpartners
Show replication status (including latest replication number) – showpartnerstatus
Show all PSC’s in domain – showservers
Create replication agreement – createagreement
Remove replication agreement (don’t remove all) – removeagreement
Check that the PSC has done initial replication – isfirstcycledone

So there you have it: replication health is checked via showpartnerstatus.
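Working out which agreements are still missing is easy to do by hand with four PSC’s, but here is a small Python sketch of the same bookkeeping. It compares the agreements you already have with the ones you want and prints createagreement commands in the same flag layout used earlier in this article. The function names and the example lists are my own illustration; nothing here talks to a PSC.

```python
def normalize(a, b):
    """Agreements are bi-directional, so store each pair in sorted order."""
    return tuple(sorted((a, b)))


def missing_agreements(existing, desired):
    """Return the desired agreements that do not exist yet."""
    have = {normalize(a, b) for a, b in existing}
    want = {normalize(a, b) for a, b in desired}
    return sorted(want - have)


def createagreement_command(local, partner, user="Administrator"):
    # Mirrors the flag layout of the command shown earlier in this article.
    return (f"./vdcrepadmin -f createagreement -2 "
            f"-h {partner} -u {user} -H {local}")


if __name__ == "__main__":
    # Agreements created by the installer in this lab.
    existing = [
        ("psc1.griffiths.local", "psc2.griffiths.local"),
        ("psc2.griffiths.local", "psc3.griffiths.local"),
        ("psc3.griffiths.local", "psc4.griffiths.local"),
    ]
    # Installer agreements plus the two manual ones from this article.
    desired = existing + [
        ("psc1.griffiths.local", "psc3.griffiths.local"),
        ("psc2.griffiths.local", "psc4.griffiths.local"),
    ]
    for local, partner in missing_agreements(existing, desired):
        print(createagreement_command(local, partner))
```

Treat the printed commands as a checklist to review and run by hand, then confirm the result with showpartners.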

February 28, 2017

Installing Platform services controller

In the previous post I discussed the reference architecture and design tips for the PSC. Here are all the posts in the series:

Part 1 Design for Platform services controller
Part 2 Installing Platform services controller
Part 3 Setting up replication on the platform services controller

In this article I will set up four platform services controllers at two sites. In my home lab I didn’t want to create complexity by using two different routed networks, so I left them on the same subnet but created two sites inside the domain. By the end of this article I will have created:

Four PSC’s at two sites. So let’s get into the install:

I will use the 6.0 appliance for this article. Before starting I have set up DNS for all the names, both forward and reverse.

Connect to the ESXi host:

Name here is the name that will show up in vCenter

Choose a PSC

Creating a new SSO domain and calling it vsphere.local with the site name of site1

I changed the IP after this screenshot so it should read 192.168.10.160, and because it’s a demo I am syncing with my ESXi host. In a real situation you want to sync with NTP.

After installer is complete try to visit the web page for the newly installed PSC. If you don’t get this page remove the PSC and try again. Error handling is not the best.

I enabled SSH (if you forgot, log in to the console and enable it there), so I ssh’ed into the new PSC to perform some checks. Remember to use shell.set --enabled True to enable the bash shell, then type shell to enter it.

Change the directory to /usr/lib/vmware-vmdir/bin/ and execute the command vdcrepadmin as shown to identify all PSC’s in the domain

The command showpartners shows the replication partners (which don’t exist yet)

Psc2.griffiths.local

We are going to join the current domain. I will not repeat screenshots that are redundant. The key thing to notice here is that I am creating a replication agreement between psc1 and psc2 that is bi-directional. (Does the PSC support uni-directional replication? No. Can you do it? Yes.)

Here we are joining the PSC domain and entering the name of its initial replication partner:

Notice how it pulls the site names out of psc1

Save and go to the web site to check after deployment

Login via SSH. Notice here that I have used -h (host) to point to psc1.griffiths.local; any PSC can be queried from any other. Showservers now shows both PSC’s

Showpartners or replication agreements shows that psc1.griffiths.local is in a replication agreement with psc2. If you ran the command on host psc2 it would show psc1.griffiths.local as the replication member.

Now we want to add a second site and another PSC and replicate it with psc2

Key item: I have entered psc2 as the joining PSC, not psc1. If I had entered psc1 here it would be replicating with psc1 and ignore psc2.

Once deployed go to web page to confirm working deployment

SSH in to test connections. Notice the replication agreement for psc2 now shows two partners. The showservers command also shows the site name for each PSC.

Here you can see the full replication agreements

PSC4

Skipping known steps and joining to psc3 site2

Once deployed check for working PSC via web portal

Replication with only psc3.griffiths.local

We now have:

As expected. This is not quite the ring but closer.

February 17, 2017

Design for Platform services controller (PSC)

This is the first part in a series about building PSC architecture. The rest of the articles are here:

The platform services controller that was introduced in vSphere 6.0 has been a source of challenge for a lot of people who are upgrading into it. I have struggled to identify the best architecture to follow. This article assumes that you want to have a multi-vCenter single sign on domain with external PSC’s. There are a few key items to consider in architecting PSC’s:

Recovery

If you lose all PSC’s you cannot connect a vCenter to a new PSC; you must re-install the vCenter, losing all data
To recover all failed PSC’s restore a single PSC from backup (Image level backup is supported) then redeploy new PSC’s for the rest. Restoring multiple PSC’s may introduce some inconsistencies depending on time of backup.
In 6.5 vCenter cannot be repointed to a PSC in a different site on the same domain (6.0 can)
All 6.x versions of vCenter do not support repointing to a PSC in a different domain
If you lose all PSC’s at a site you can install new PSC’s at the site as long as at least one PSC at another site survived then repoint the vCenter to the new PSC

Replication

All PSC replication is bi-directional but not automatically in a ring (big one)
By default each PSC is replicating with only a single other PSC (the one you select when installing the additional PSC)
Site names do not have anything to do with replication today; they are a logical construct for load balancers and future usage
Changes are not unique to a site but to a domain – in other words all changes at all sites are replicated to all other PSC’s assuming they are part of the domain

Availability

vCenter points to a single PSC never more than one at a time
PSC’s behind a load balancer (up to 4 supported) are active/passive via load balancer configuration
If you use a load balancer configuration for PSC and have a failure of the active PSC the load balancer repoints to another PSC and no reconfiguration is required
Site name is important with load balancers: you should place all PSC’s behind a load balancer in their own site – non-load balanced PSC’s at the same site should have a different site name

Features

PSC’s have to be part of the same domain together to use enhanced linked mode

Performance

PSC can replicate to one or many other PSC’s (with an impact with many). You want to minimize the number of replication partners because of performance impact.

Topology

Ring is the supported topology best practice today
PSC’s know each other by IP address or domain name (ensure domain is correct including PTR) – using IP is discouraged because it can never be changed; use of FQDN allows for IP mobility.
PSC’s are authentication sources so NTP is critical and the same NTP across all PSC’s is critical. (If you join one PSC to AD all need to be joined to same AD – best not to mix appliance and windows PSC’s)
The only reason to have external PSC’s is to use enhanced linked mode – if you don’t need ELM use an embedded PSC with vCenter and back vCenter up at the same time – see http://vmware.com/go/psctree

Scalability

Current limits are 8 PSC’s in a domain in 6.0 and 10 in a domain in 6.5

With all of these items in hand here are some design tips:

Always have n+1 PSC’s; in other words, never have a single PSC in a domain when using ELM
Have a solid method for restoring your PSC’s – Image level or 6.5 restore feature

So what is the correct topology for PSC’s?

This is a challenging question. Let’s identify some design elements to consider

Failure of a single component should not create replication partitions
Complexity of setup should be minimized
Number of replication agreements should be minimized for performance reasons
Scaling out additional PSC’s should be as simple as possible

Ring

I spent some time in the ISP world and learned to love rings. They create two paths to every destination and are easy to set up and maintain. They do have issues when two points fail at the same time and potentially create partitions of routing until one of the two is restored. VMware recommends a ring topology for PSC’s at the time of this article as shown below:

Let’s review this topology against the design elements:

Failure of a single component should not create replication partitions
- True due to ring there are two ways for everything to replicate
Complexity of setup should be minimized
- The setup ensures redundancy without lots of manually created performance impacting replication agreements (one manual agreement)
Number of replication agreements should be minimized for performance reasons
- True
Scaling out additional PSC’s should be as simple as possible
- Adding a new PSC means the following:
  - Add new PSC joined to LAX-2
  - Add new agreement between new PSC and SFO-1
  - Remove agreement between LAX-2 and SFO-1

Looks mostly simple, but you do need to track who is providing your ring backup loop, which is a manual documentation process today.
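To make the three scale-out steps concrete, here is a small Python sketch that models replication agreements as a set of links, inserts a new PSC into the ring, and checks the first design element (no single failure partitions replication). Only LAX-2 and SFO-1 come from the steps above; the other node names, the new PSC name and the helper functions are hypothetical, and the four-node ring is my assumption about the diagram.

```python
def edge(a, b):
    """A bi-directional replication agreement between two PSC's."""
    return frozenset((a, b))


def expand_ring(agreements, new_psc, join_to, ring_closer):
    """Insert new_psc into the ring between join_to and ring_closer."""
    updated = set(agreements)
    updated.add(edge(new_psc, join_to))          # created by the installer
    updated.add(edge(new_psc, ring_closer))      # manual createagreement
    updated.discard(edge(join_to, ring_closer))  # manual removeagreement
    return updated


def survives_single_failure(agreements):
    """True if losing any one PSC still leaves the others connected."""
    nodes = set().union(*agreements)
    for failed in nodes:
        remaining = nodes - {failed}
        if not remaining:
            continue
        links = [e for e in agreements if failed not in e]
        start = next(iter(remaining))
        seen, stack = {start}, [start]
        while stack:  # simple depth-first walk over surviving links
            current = stack.pop()
            for link in links:
                if current in link:
                    (other,) = link - {current}
                    if other not in seen:
                        seen.add(other)
                        stack.append(other)
        if seen != remaining:
            return False
    return True


ring = {
    edge("SFO-1", "SFO-2"),
    edge("SFO-2", "LAX-1"),
    edge("LAX-1", "LAX-2"),
    edge("LAX-2", "SFO-1"),  # the link that closes the ring
}
bigger_ring = expand_ring(ring, "NEW-1", join_to="LAX-2", ring_closer="SFO-1")
```

The same check shows why the ring-closing agreement is worth documenting: drop it and the topology becomes a chain, where losing a middle PSC splits replication in two.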

Ring with additional redundancy

The VMware validated design states that for a two site enhanced linked mode topology you should build the following:

A few items to illustrate (in case you have not read the VVD)

Four vCenters
Four PSC’s (in blue)
Each PSC replicates with its same site peer and one remote site peer, thus making sure its changes are stored at two sites and with two copies that are then replicated locally and remotely (all four get it)

Let’s evaluate against the design elements:

Failure of a single component should not create replication partitions
- True due to ring there are four ways for everything to replicate
Complexity of setup should be minimized
- The setup requires forethought and at least one manual replication agreement
Number of replication agreements should be minimized for performance reasons
- It has more replication agreements
Scaling out additional PSC’s should be as simple as possible
- Adding a new PSC means potentially more replication agreements or more design

Update: The VVD reached out and wanted to be clear that adding additional sites is pretty easy. I believe the challenge comes when you try to identify disaster zones. Because PSC’s are replicating all changes everywhere it does not matter if all replication agreements fail you can still regenerate a site.

Which option should I use?

That is really up to you. I personally love the simplicity of a ring. Neither of these options increases availability of the PSC layer; they are about data consistency and integrity. Use a load balancer if your management plane SLA does not support downtime.

January 21, 2017

NSX Manager still running but disconnected from vCenter

A quick note in case you run into this issue. I was running into problems where my NSX manager was running and everything seemed fine (NSX manager login / Console) but I could not manage NSX elements from inside vCenter. No NSX manager was showing up. Reconnecting to vCenter or rebooting would resolve this issue but then I had the problem again the next day. I could not figure out the issue… then it dawned on me what happens every day…. BACKUP! Somehow my NSX manager was added to the nightly backup and it would lose connection during this time. Here is the only approved method for backing up a NSX manager:

Use the configuration backup in the NSX manager administration console to make normal and regular backups

To recover a NSX manager do the following:

Deploy a new NSX manager using OVF (same version of NSX as backup) with same IP as original manager
Restore the configuration from the backup
Reboot the NSX manager to ensure clean configuration
Ensure it shows up in the GUI

Image level backups are not supported or a good idea 🙂

December 2, 2016

VMkernel types updated with design guidance for multi-site

Holy crap what do all these VMware VMkernel types mean? I started this article and realized I had already written one here. Sad when Google leads you to something you wrote… looks like I don’t remember too well… Perhaps I should just go yell for the kids to get off my lawn now. I wanted to take a minute to revise my post with some new things I have learned and some guidance.

From my previous post:

vMotion traffic – Required for vMotion – Moves the state of virtual machines (active datadisk svMotion, active memory and execution state) during a vMotion
Provisioning traffic – Not required will use management network if not setup – cold migration, cloning and snapshot creation (powered off virtual machines = cold)
Fault tolerance traffic (FT) – Required for FT – Enables fault tolerance traffic on the host – only a single adapter may be used for FT per host
Management traffic – Required – Management of host and vCenter server
vSphere replication traffic – Only needed if using vSphere replication– outgoing replication data from ESXi host to vSphere replication server
vSphere replication NFC traffic – Only needed if using vSphere replication – handles incoming replication data on the target replication site
Virtual SAN – Required for VSAN – virtual san traffic on the host
VXLAN – used for NSX not controlled from the add vmkernel interface.

I wanted to provide a little better explanation around design elements with some interfaces. Specifically I want to focus on vMotion and Provisioning traffic. Let’s create a few scenarios and see what interface is used, assuming I have all the VMkernel interfaces listed above:

VM1 is running and we want to migrate from host1 to host2 at datacenter1 – vMotion
VM1 is running with a snapshot and we want to migrate from host1 to host2 at datacenter1 – Provisioning traffic (if it does not exist management network is used)
VM1 is running with a snapshot and we want to storage migrate from host1 DC1 to host4 DC3 – storage vMotion – Provisioning traffic (if it does not exist management network is used)
VM1 is not running and we want to migrate from host1 to host2 at datacenter1 – Provisioning traffic (very low bandwidth used)
VM1 is not running has a snapshot and we want to migrate from host1 to host2 at datacenter1 – Provisioning traffic (very low bandwidth used)
VM2 is being created at datacenter1 – Provisioning traffic
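The scenarios above boil down to a small decision rule, sketched here in Python. This only encodes the list as written in this post; the function and parameter names are hypothetical, and it is not a statement of how vSphere picks interfaces internally.

```python
def migration_traffic(powered_on, has_snapshot, provisioning_vmk_exists=True):
    """Return the VMkernel traffic type for a migration, per the list above."""
    if powered_on and not has_snapshot:
        # Running VM without snapshots: a plain vMotion.
        return "vMotion"
    # Snapshots and powered-off (cold) VMs use Provisioning traffic,
    # falling back to the management network when no such vmk exists.
    return "Provisioning" if provisioning_vmk_exists else "Management"


if __name__ == "__main__":
    print(migration_traffic(powered_on=True, has_snapshot=False))
    print(migration_traffic(powered_on=True, has_snapshot=True))
    print(migration_traffic(powered_on=False, has_snapshot=False,
                            provisioning_vmk_exists=False))
```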

So, design guidance: in a multi-site implementation you should have the following interfaces if you wish to separate the TCP/IP stack or use Network I/O Control to avoid bad neighbor situations. (Or you could just assign it all to the management vmk and go nuts on that interface, which is a bad idea.)

Management
vMotion
Provisioning

Use of other vmkernel interfaces depends on if you are using replication, vSAN or NSX.

Should you have multi-nic vMotion?

Multi-nic vMotion enables faster vMotion of multiple VMs off a host (as long as they don’t have snapshots). It is still a good idea if you have large VMs or lots of VMs on a host.

Should you have multi-nic Provisioning?

No idea if it’s even supported or a good idea. Provisioning network is used for long distance vMotion so the idea might be good… I would not use it today.

November 17, 2016

Should IT build a castle or a mobile home?

So I have many hobbies to keep my mind busy during idle times… like when driving a car. One of my favorite hobbies is to identify the best candidate locations to live in if the Zombie apocalypse was to happen. As I drive in my car between locations I see many different buildings and I attempt to rate large buildings by their Zombie proof nature. There are many things to consider in the perfect Zombie defense location for example:

Avoiding buildings with large amounts of windows or first floor windows
Building made of materials that cannot be bludgeoned open for example stone
More than one exit but not too many exits
A location that can be defended on all sides and allows visible approach

There are many other considerations like proximity to water and food etc.. but basically I am looking for the modern equivalent of a castle:

OK what does this have to do with IT

Traditional infrastructure is architected like a castle: its primary goal is to secure at the perimeter and be very imposing to keep people out. During a zombie attack this model is great until they get in, then it becomes a graveyard. IT architects, myself included, spend a lot of time considering all the factors that are required to build the perfect castle. There are considerations like:

Availability
Recoverability
Manageability
Performance
Security

These all have to be considered, and as you add another wing to your castle every one of these elements of design must be considered for the whole castle. We cannot add a new wing that bridges the moat without extending the moat, etc. Our design to build the perfect castle has created a monolithic drag. While development teams move from annual releases to quarters or weeks or days, we continue to attempt to control the world from a perimeter design perspective. If we could identify all possible additions to the castle at the beginning we could potentially account for them. This was true in the castle days: there were only so many ways to get into the castle and so many methods to break in. Even worse, the castle provides lots of nooks and locations for zombies to hide and attack me when I am not expecting it. This is the challenge with the Zombie attack: they don’t follow the rules. They just might create a ladder out of zombie bodies and get into your castle (World War Z style). If we compare zombies to the challenges being thrown at IT today the story becomes valid. How do we deal with constant change and the unknown? How do we become agile to change? Is it from building a better castle?

Introducing the mobile home

Today I realized that the perfect solution to my Zombie question was the mobile home. We can all assume that I need a place to sleep, something that I can secure with reasonable assurance. I can reinforce the walls and windows on a mobile home and I gain something I don’t have with a castle: mobility. I can move my secured location and goods to new locations. My mobile home is large enough to provide for my needs without providing too many places for zombies to hide. IT needs this type of mobility. Cloud has provided faster time to market for many enterprises, but in reality you are only renting space in someone else’s castle. There are all types of methods to secure your valuables from mine, but in reality we are at the mercy of the castle owner. What if my service could become a secured mobile home… that would provide the agility I need in the long run. The roach motel is very alive and well in cloud providers today. Many providers have no cross provider capabilities while others provide tools to transform the data between formats. My mobile home needs to be secure and not reconfigured each time I move between locations while looking for resources or avoiding attack. We need to reconsider IT as a secured mobile home and start to build this model. Some functions to consider in my mobile home:

Small enough to provide the required functions (bathroom, kitchen and sleeping space or in IT terms business value) and not an inch larger than required
Self-contained security that encircles the service
Mobility without interruption of services

Thanks for reading my rant. Please feel free to provide your favorite zombie hiding location or your thoughts on the future of IT.

November 14, 2016

Breaking out a SSO/PSC to enable enhanced linked mode

The single sign on used to be a fairly painless portion of vCenter (once we got to 5.5, in 5.0 it was a major pain). It was essentially a lightweight directory (vsphere.local) and gateway to active directory. The platform services controller (PSC) of vCenter 6 is a completely different animal. It performs a lot of new functions that are not easy to transfer between instances. For example the PSC does the following:

Handles and stores SSL certificates
Handles and stores license keys
Handles and stores permissions via global permissions layer
Handles and stores replication of Tags and Categories
Built-in automatic replication between different sites

Why does it do all these and why do I care?

Well, VMware has come to understand that virtual machines cannot be bound to a specific location. More and more customers want hybrid and multi-site capabilities while keeping the same management. A lot of the management functions are based around tags and permissions, so having an overarching layer to provide that functionality is huge. I assume that we are going to see more features passed up to the PSC layer in order to make cross-site and cross-vCenter features available.

Architectural change

In 6.0 VMware changed the architecture to have external PSC’s as a preferred mode of operation. In fact they support up to 8 replicated PSC’s and they have two constructs that matter:

Domain (traditionally this has been vsphere.local)
Sites (Physical locations)

Site designation changes how the PSC’s and their multi-masters replicate (choosing to replicate to a single instance at each site then have that instance replicate to local nodes)

The change to external PSC’s is a challenge for many users. First let me be clear about one challenge: you can only have one domain, and merging domains is not supported. Once you get to 6 you cannot leave a domain and join a different domain; I have not seen instructions to do it and it does not seem to be supported. In 5 you can leave an SSO domain and join a different domain, so if you are still on 5 and wish to join multiple machines to the same domain, do it while on 5 using SSO. If you wish to move from an embedded PSC to an external PSC the process is pretty simple:

Install a new PSC (can be windows or Linux) joined to the embedded PSC
Repoint the vCenter to the new PSC (instructions here)
Remove the old PSC

The key takeaway for all of you who might have slotted off during this article is this: Make any topology changes to vCenter domains before upgrading to 6.

November 14, 2016

Long Distance Cross vCenter vMotion requirements

The ability to move virtual machines long distances between two datacenters while running seems like the key example of the power of abstraction. VMware has enabled this feature but it has a number of requirements that make the cost of ownership a little high. All of these requirements are listed in VMware KB articles but you have to mine them for the details to ensure you are compatible. Having recently been stung by these requirements I thought I would collect them into a single location.

Assumptions:

The following assumptions are made:

You are running two vCenters one at each site
You are running virtual distributed switches at each site

KB Articles mined for the data

Requirements

The source and destination vCenter server instances and ESXi hosts must be running version 6.0 or later.
Requires Enterprise Plus licensing
When initiating the moves in the web client both source and destination vCenter instances must be in Enhanced Linked mode and in the same vCenter Single Sign-On domain (When using API this is not a requirement)
Both vCenter Servers must be time synced for SSO to work
For migration of compute resources only, both ESXi hosts must be connected to the shared virtual machine storage.
When using the vSphere APIs/SDK, both vCenter Server instances may exist in separate vSphere Single Sign-On domains. Additional parameters are required when performing a non-federated cross vCenter Server vMotion.
MAC addresses must not conflict (different vCenter ID’s will ensure this)
vMotion cannot take place from distributed switch to standard switch
vMotion cannot take place between distributed switches of different versions (source and destination vDS must be the same version)
RTT (round-trip time) latency of 150 milliseconds or less, between hosts
You must create a routable network for cold migration traffic (the Provisioning network from VMkernel types)
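A few of the requirements above are simple enough to check before you even open the migration wizard. The Python sketch below covers only the version, switch and latency rules from the list; the Site structure, field names and messages are hypothetical, and you would have to collect the values yourself (for example from PowerCLI or the API).

```python
from dataclasses import dataclass

MAX_RTT_MS = 150  # round-trip time limit between hosts, from the list above


@dataclass
class Site:
    vcenter_version: tuple   # e.g. (6, 0)
    esxi_version: tuple      # e.g. (6, 0)
    switch_type: str         # "distributed" or "standard"
    vds_version: str = ""    # only meaningful for distributed switches


def preflight(source, dest, rtt_ms):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    for label, site in (("source", source), ("destination", dest)):
        if site.vcenter_version < (6, 0) or site.esxi_version < (6, 0):
            problems.append(f"{label}: vCenter and ESXi must be 6.0 or later")
    if rtt_ms > MAX_RTT_MS:
        problems.append(f"RTT {rtt_ms} ms exceeds {MAX_RTT_MS} ms")
    if source.switch_type == "distributed" and dest.switch_type == "standard":
        problems.append("cannot vMotion from distributed to standard switch")
    if (source.switch_type == dest.switch_type == "distributed"
            and source.vds_version != dest.vds_version):
        problems.append("source and destination vDS versions differ")
    return problems


if __name__ == "__main__":
    site_a = Site((6, 0), (6, 0), "distributed", "6.0.0")
    site_b = Site((6, 5), (6, 5), "distributed", "6.5.0")
    for problem in preflight(site_a, site_b, rtt_ms=80):
        print(problem)
```

Licensing, SSO domain membership, time sync and MAC conflicts are not covered here and still need to be verified separately.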

These requirements can really bite you if you are not careful. Notice there are no constraints on vMotioning from a standard switch to a distributed switch, which helps you get around version differences. The truth is that vMotion is a miracle of engineering, and cross vCenter vMotion is an even better miracle, but it comes at a cost. Essentially, in the best case scenario you have to have two vCenters in enhanced linked mode on the same version of ESXi, with the same hardware type or in EVC, with the same version of distributed switches. It’s a lot of asks to enable the feature and something to consider if you’re planning on using long distance cross vCenter vMotion.

November 11, 2016

Configuring a NSX load balancer from API

A customer asked me this week if there were any examples of customers configuring the NSX load balancer via vRealize Automation. I was surprised when Google didn’t turn up any examples. The NSX API guide (which is one of the best guides around) provides the details for how to call each element. You can download it here. Once you have the PDF you can navigate to page 200, which is the start of the load balancer section.

Too many Edge devices

NSX load balancers are Edge service gateways. A normal NSX environment may have a few while others may have hundreds but not all are load balancers. A quick API lookup of all Edges provides this information: (my NSX manager is 192.168.10.28 hence the usage in all examples)

https://192.168.10.28/api/4.0/edges

        <edgeSummary>
            <objectId>edge-57</objectId>
            <objectTypeName>Edge</objectTypeName>
            <vsmUuid>420CD713-469F-7053-8281-A7BD66A1CD46</vsmUuid>
            <nodeId>92484cee-ab3c-4ed2-955e-e5bd135f5be5</nodeId>
            <revision>2</revision>
            <type>
                <typeName>Edge</typeName>
            </type>
            <name>LB-1</name>
            <clientHandle></clientHandle>
            <extendedAttributes/>
            <isUniversal>false</isUniversal>
            <universalRevision>0</universalRevision>
            <id>edge-57</id>
            <state>deployed</state>
            <edgeType>gatewayServices</edgeType>
            <datacenterMoid>datacenter-21</datacenterMoid>
            <datacenterName>Home</datacenterName>
            <tenantId>default</tenantId>
            <apiVersion>4.0</apiVersion>
            <recentJobInfo>
                <jobId>jobdata-34935</jobId>
                <status>SUCCESS</status>
            </recentJobInfo>
            <edgeStatus>GREEN</edgeStatus>
            <numberOfConnectedVnics>1</numberOfConnectedVnics>
            <appliancesSummary>
                <vmVersion>6.2.0</vmVersion>
                <vmBuildInfo>6.2.0-2982179</vmBuildInfo>
                <applianceSize>compact</applianceSize>
                <fqdn>NSX-edge-57</fqdn>
                <numberOfDeployedVms>1</numberOfDeployedVms>
                <activeVseHaIndex>0</activeVseHaIndex>
                <vmMoidOfActiveVse>vm-283</vmMoidOfActiveVse>
                <vmNameOfActiveVse>LB-1-0</vmNameOfActiveVse>
                <hostMoidOfActiveVse>host-29</hostMoidOfActiveVse>
                <hostNameOfActiveVse>vmh1.griffiths.local</hostNameOfActiveVse>
                <resourcePoolMoidOfActiveVse>resgroup-27</resourcePoolMoidOfActiveVse>
                <resourcePoolNameOfActiveVse>Resources</resourcePoolNameOfActiveVse>
                <dataStoreMoidOfActiveVse>datastore-31</dataStoreMoidOfActiveVse>
                <dataStoreNameOfActiveVse>SYN8-NFS-GEN-VOL1</dataStoreNameOfActiveVse>
                <statusFromVseUpdatedOn>1478911807005</statusFromVseUpdatedOn>
                <communicationChannel>msgbus</communicationChannel>
            </appliancesSummary>
            <hypervisorAssist>false</hypervisorAssist>
            <allowedActions>
                <string>Change Log Level</string>
                <string>Add Edge Appliance</string>
                <string>Delete Edge Appliance</string>
                <string>Edit Edge Appliance</string>
                <string>Edit CLI Credentials</string>
                <string>Change edge appliance size</string>
                <string>Force Sync</string>
                <string>Redeploy Edge</string>
                <string>Change Edge Appliance Core Dump Configuration</string>
                <string>Enable hypervisorAssist</string>
                <string>Edit Highavailability</string>
                <string>Edit Dns</string>
                <string>Edit Syslog</string>
                <string>Edit Automatic Rule Generation Settings</string>
                <string>Disable SSH</string>
                <string>Download Edge TechSupport Logs</string>
            </allowedActions>
        </edgeSummary>

This is for a single Edge gateway. In my case I have deployed 57 Edges over the life of my NSX environment, with 15 active right now, but only edge-57 is a load balancer. This report does not provide anything that distinguishes an Edge acting as a load balancer from an Edge acting as a firewall. To identify whether it’s a load balancer I have to query its load balancer configuration using:

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config

Notice the addition of the edge-57 id to the URL. It returns:

<loadBalancer>
    <version>2</version>
    <enabled>true</enabled>
    <enableServiceInsertion>false</enableServiceInsertion>
    <accelerationEnabled>false</accelerationEnabled>
    <monitor>
        <monitorId>monitor-1</monitorId>
        <type>tcp</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <name>default_tcp_monitor</name>
    </monitor>
    <monitor>
        <monitorId>monitor-2</monitorId>
        <type>http</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <method>GET</method>
        <url>/</url>
        <name>default_http_monitor</name>
    </monitor>
    <monitor>
        <monitorId>monitor-3</monitorId>
        <type>https</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <method>GET</method>
        <url>/</url>
        <name>default_https_monitor</name>
    </monitor>
    <logging>
        <enable>false</enable>
        <logLevel>info</logLevel>
    </logging>
</loadBalancer>

Notice that this Edge has the load balancer enabled (enabled is true) along with some default monitors. For comparison, here is an Edge without the feature enabled:

https://192.168.10.28/api/4.0/edges/edge-56/loadbalancer/config

<loadBalancer>
    <version>1</version>
    <enabled>false</enabled>
    <enableServiceInsertion>false</enableServiceInsertion>
    <accelerationEnabled>false</accelerationEnabled>
    <monitor>
        <monitorId>monitor-1</monitorId>
        <type>tcp</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <name>default_tcp_monitor</name>
    </monitor>
    <monitor>
        <monitorId>monitor-2</monitorId>
        <type>http</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <method>GET</method>
        <url>/</url>
        <name>default_http_monitor</name>
    </monitor>
    <monitor>
        <monitorId>monitor-3</monitorId>
        <type>https</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <method>GET</method>
        <url>/</url>
        <name>default_https_monitor</name>
    </monitor>
    <logging>
        <enable>false</enable>
        <logLevel>info</logLevel>
    </logging>
</loadBalancer>

Enabled is false with the same default monitors. So now we know how to identify which edges are load balancers:

Get list of all Edges via API and pull out id element
Query each id element for load balancer config and match on true
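
The matching step can be sketched in Python. This is a minimal example that only covers parsing the responses, not fetching them; the function names are made up for illustration, and the sample bodies are trimmed copies of the two responses above.

```python
import xml.etree.ElementTree as ET


def is_load_balancer(config_xml: str) -> bool:
    """Return True when a loadbalancer/config body has <enabled>true</enabled>."""
    root = ET.fromstring(config_xml)
    # Only the direct <enabled> child of <loadBalancer> matters here;
    # <logging><enable> is a separate setting and is ignored.
    enabled = root.findtext("enabled", default="false")
    return enabled.strip().lower() == "true"


def load_balancer_edges(configs: dict) -> list:
    """configs maps an edge id to its loadbalancer/config XML body (hypothetical shape)."""
    return sorted(edge_id for edge_id, body in configs.items() if is_load_balancer(body))


# Trimmed copies of the edge-57 and edge-56 responses shown above.
SAMPLE_CONFIGS = {
    "edge-57": "<loadBalancer><version>2</version><enabled>true</enabled></loadBalancer>",
    "edge-56": "<loadBalancer><version>1</version><enabled>false</enabled></loadBalancer>",
}

print(load_balancer_edges(SAMPLE_CONFIGS))
```

In a real run the dictionary would be filled by calling the loadbalancer/config URL for every Edge id returned by the Edge list query.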

Adding virtual servers

Assuming the application profile and pool are already in place, you can add a virtual server with a POST and an XML body like the one below. The virtual server IP must already be assigned to an interface on the Edge.

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/virtualservers

<virtualServer>
    <name>http_vip_2</name>
    <description>http virtualServer 2</description>
    <enabled>true</enabled>
    <ipAddress>192.168.10.18</ipAddress>
    <protocol>http</protocol>
    <port>443,6000-7000</port>
    <connectionLimit>123</connectionLimit>
    <connectionRateLimit>123</connectionRateLimit>
    <applicationProfileId>applicationProfile-1</applicationProfileId>
    <defaultPoolId>pool-1</defaultPoolId>
    <enableServiceInsertion>false</enableServiceInsertion>
    <accelerationEnabled>true</accelerationEnabled>
</virtualServer>
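
If you are generating these bodies from user input, building the XML in code avoids hand-editing mistakes. The sketch below is one way to do that in Python; the helper name and its parameters are my own invention, and the element names and order are simply copied from the example body above. Sending the POST (with credentials) is left out.

```python
import xml.etree.ElementTree as ET


def build_virtual_server(name, description, ip_address, protocol, port,
                         application_profile_id, default_pool_id,
                         connection_limit, connection_rate_limit,
                         enabled=True, enable_service_insertion=False,
                         acceleration_enabled=False):
    """Return a <virtualServer> XML string mirroring the example body above."""

    def flag(value):
        # The API examples use lowercase true/false.
        return "true" if value else "false"

    fields = [
        ("name", name),
        ("description", description),
        ("enabled", flag(enabled)),
        ("ipAddress", ip_address),
        ("protocol", protocol),
        ("port", str(port)),
        ("connectionLimit", str(connection_limit)),
        ("connectionRateLimit", str(connection_rate_limit)),
        ("applicationProfileId", application_profile_id),
        ("defaultPoolId", default_pool_id),
        ("enableServiceInsertion", flag(enable_service_insertion)),
        ("accelerationEnabled", flag(acceleration_enabled)),
    ]
    root = ET.Element("virtualServer")
    for tag, value in fields:
        ET.SubElement(root, tag).text = value
    return ET.tostring(root, encoding="unicode")


# Reproduce the example body from this post.
body = build_virtual_server(
    "http_vip_2", "http virtualServer 2", "192.168.10.18", "http",
    "443,6000-7000", "applicationProfile-1", "pool-1", 123, 123,
    acceleration_enabled=True,
)
print(body)
```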

You can see it’s been created. A quick GET against the same URL confirms it:

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/virtualservers

<loadBalancer>
    <virtualServer>
        <virtualServerId>virtualServer-5</virtualServerId>
        <name>http_vip_2</name>
        <description>http virtualServer 2</description>
        <enabled>true</enabled>
        <ipAddress>192.168.10.18</ipAddress>
        <protocol>http</protocol>
        <port>443,6000-7000</port>
        <connectionLimit>123</connectionLimit>
        <connectionRateLimit>123</connectionRateLimit>
        <defaultPoolId>pool-1</defaultPoolId>
        <applicationProfileId>applicationProfile-1</applicationProfileId>
        <enableServiceInsertion>false</enableServiceInsertion>
        <accelerationEnabled>true</accelerationEnabled>
    </virtualServer>
</loadBalancer>

The new virtual server was assigned the id virtualServer-5. To delete it, send a DELETE to the same URL with the virtualServerId appended:

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/virtualservers/virtualserverID
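
Since the id is assigned by NSX, a script usually has to look it up by name before it can delete anything. Here is a rough Python sketch of that lookup; both helper names are hypothetical, the sample response is a trimmed copy of the query result above, and the actual DELETE call is not shown.

```python
import xml.etree.ElementTree as ET


def virtual_server_id(list_xml, name):
    """Return the virtualServerId of the first virtual server with this name, or None."""
    root = ET.fromstring(list_xml)
    for vs in root.iter("virtualServer"):
        if vs.findtext("name") == name:
            return vs.findtext("virtualServerId")
    return None


def delete_url(base_url, edge_id, vs_id):
    """Build the URL to send the DELETE to, following the pattern shown above."""
    return (f"{base_url}/api/4.0/edges/{edge_id}"
            f"/loadbalancer/config/virtualservers/{vs_id}")


# Trimmed copy of the virtualservers query response shown above.
SAMPLE_LIST = """
<loadBalancer>
    <virtualServer>
        <virtualServerId>virtualServer-5</virtualServerId>
        <name>http_vip_2</name>
        <ipAddress>192.168.10.18</ipAddress>
    </virtualServer>
</loadBalancer>
"""

vs_id = virtual_server_id(SAMPLE_LIST, "http_vip_2")
url = delete_url("https://192.168.10.28", "edge-57", vs_id)
print(url)
```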

Pool Members

For pools you have to send the full pool definition to add a backend member, or for that matter to remove one. So you first query the existing pools:

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/pools

<?xml version="1.0" encoding="UTF-8"?>
<loadBalancer>
    <pool>
        <poolId>pool-1</poolId>
        <name>pool-1</name>
        <algorithm>round-robin</algorithm>
        <transparent>false</transparent>
    </pool>
</loadBalancer>

Then you form your PUT with the data elements you need (this example is taken from the API guide):

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/pools/pool-1

<pool>
    <name>pool-1</name>
    <description>pool-tcp-snat</description>
    <transparent>false</transparent>
    <algorithm>round-robin</algorithm>
    <monitorId>monitor-3</monitorId>
    <member>
        <ipAddress>192.168.10.14</ipAddress>
        <weight>1</weight>
        <port>80</port>
        <minConn>10</minConn>
        <maxConn>100</maxConn>
        <name>m5</name>
        <monitorPort>80</monitorPort>
    </member>
</pool>
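
Because the PUT replaces the whole pool, adding or removing a member is a read, modify, write cycle. The Python sketch below shows only the modify step on a single pool element; the helper names are hypothetical, and I have not verified that the API accepts every element echoed back by the GET (such as poolId) in the PUT body.

```python
import xml.etree.ElementTree as ET


def add_pool_member(pool_xml, member):
    """Return the pool XML with one extra <member> built from a dict of fields."""
    pool = ET.fromstring(pool_xml)
    member_el = ET.SubElement(pool, "member")
    for tag, value in member.items():
        ET.SubElement(member_el, tag).text = str(value)
    return ET.tostring(pool, encoding="unicode")


def remove_pool_member(pool_xml, member_name):
    """Return the pool XML without any <member> whose <name> matches."""
    pool = ET.fromstring(pool_xml)
    for member_el in pool.findall("member"):
        if member_el.findtext("name") == member_name:
            pool.remove(member_el)
    return ET.tostring(pool, encoding="unicode")


# The single pool element from the query above.
CURRENT_POOL = (
    "<pool><poolId>pool-1</poolId><name>pool-1</name>"
    "<algorithm>round-robin</algorithm><transparent>false</transparent></pool>"
)

# Member fields copied from the example PUT body above.
NEW_MEMBER = {
    "ipAddress": "192.168.10.14",
    "weight": 1,
    "port": 80,
    "minConn": 10,
    "maxConn": 100,
    "name": "m5",
    "monitorPort": 80,
}

updated = add_pool_member(CURRENT_POOL, NEW_MEMBER)
reverted = remove_pool_member(updated, "m5")
print(updated)
```

The string in updated is what would go into the PUT body; reverted shows the same pool after the member is taken out again.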

In the client we see a member added:

Tie it all together

Each of these objects has update, delete and query operations available. The real challenge is taking the API inputs and presenting them as user-friendly inputs in vRealize. NSX continues to amaze me as a great product with a very powerful and well-documented API. I have run into very few issues trying to figure out how to do anything in NSX with the API. In a future post I may provide some vRealize Orchestrator actions to speed up configuration of load balancers.