VRO to delete lots of virtual switches in NSX

I created a workflow to create 5000 virtual switches in NSX, which was cool. But it did create 5000 virtual switches, which now needed to be deleted, so I created a new workflow to delete them. In my case all 5000 switches were created on the same distributed logical router (DLR), and I had to delete the DLR before starting this workflow. I used two REST API calls to complete this work, as shown below:


 

Notice the URL templates on both; they are critical. Otherwise it's just a REST API call to NSX Manager.
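A vRO REST operation fills in URL template parameters written as {name}. The sketch below mimics that substitution in plain JavaScript so you can see what each template resolves to. The two paths are my assumption of the usual NSX-v virtual wire endpoints (the original screenshots are not available), and expandUrlTemplate is a hypothetical helper, not a vRO API.

```javascript
// Hypothetical helper: replaces {name} placeholders the way a vRO REST
// operation URL template does, URL-encoding each substituted value.
function expandUrlTemplate(template, params) {
    return template.replace(/\{([^}]+)\}/g, function (match, name) {
        if (!Object.prototype.hasOwnProperty.call(params, name)) {
            throw new Error("Missing value for URL template parameter: " + name);
        }
        return encodeURIComponent(String(params[name]));
    });
}

// Assumed NSX-v endpoints; verify them against the API guide for your version.
var LIST_WIRES_TEMPLATE = "/api/2.0/vdn/virtualwires";
var DELETE_WIRE_TEMPLATE = "/api/2.0/vdn/virtualwires/{virtualWireID}";

console.log(expandUrlTemplate(LIST_WIRES_TEMPLATE, {}));
console.log(expandUrlTemplate(DELETE_WIRE_TEMPLATE, { virtualWireID: "virtualwire-10" }));
```

The list template has no parameters, while the delete template needs exactly one value per call, which is why the scriptable task passes a one-element array for each wire.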

These two REST operations, along with the REST host, are passed into a workflow as shown:

 

Beyond that it's just a scriptable task. At first it ran too fast for NSX Manager, so I added a one-second wait between each call. Here is the code:

// Scriptable task bindings (configured on the workflow):
//   restOperation       - REST operation that GETs the list of virtual wires
//   restOperationDelete - REST operation that DELETEs one wire; its URL template takes the wire ID
// Outputs: statusCode, statusCodeAttribute, contentLength, headers, contentAsString

// Delete every virtual wire whose vdnId (VNI) is above this value
var minVdnId = 5010;

// Set up the GET request for wires (no URL template parameters)
var inParamtersValues = [];
var request = restOperation.createRequest(inParamtersValues, null);
request.contentType = "";
var response = request.execute();

// Prepare output parameters
statusCode = response.statusCode;
statusCodeAttribute = statusCode;
contentLength = response.contentLength;
headers = response.getAllHeaders();
contentAsString = response.contentAsString;
System.log("Status code: " + statusCode);

// Parse the response with E4X and collect the virtualWire elements
var xmlObj = new XML(contentAsString);
var wires = xmlObj.dataPage.virtualWire;
System.log("Count : " + wires.length());

var deleted = 0;
for (var i = 0; i < wires.length(); i++) {
    // vdnId comes back as XML text, so convert it to a number before comparing
    var vdnId = parseInt(wires[i].vdnId.toString(), 10);
    if (vdnId > minVdnId) {
        var virtualWireID = wires[i].objectId.toString();
        System.log("Scope: " + virtualWireID + " vdnId: " + vdnId);
        var deleteRequest = restOperationDelete.createRequest([virtualWireID], null);
        deleteRequest.contentType = "";
        var deleteResponse = deleteRequest.execute();
        statusCode = deleteResponse.statusCode;
        System.log("Response : " + statusCode + " Rest request : " + virtualWireID);
        deleted++;
        // Wait one second between deletes so NSX Manager is not overwhelmed.
        // System.sleep blocks for the given number of milliseconds, unlike a
        // busy-wait loop that can exit early and burns CPU.
        System.sleep(1000);
    }
}
System.log("Deleted : " + deleted + " Total : " + wires.length());

It should work well. It is currently set to delete every virtual wire with a VNI (vdnId) above 5010 (mine start at 5000); adjust this threshold to whatever you want.
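The selection rule can be pulled out and checked on its own before pointing it at a live NSX Manager. This plain-JavaScript sketch assumes each wire has already been reduced to an object with objectId and vdnId (the sample IDs are made up); selectWiresToDelete is a hypothetical helper.

```javascript
// Returns the objectIds of virtual wires whose vdnId (VNI) is strictly above minVdnId.
// vdnId may arrive as text, so it is parsed as a base-10 integer first.
function selectWiresToDelete(wires, minVdnId) {
    var ids = [];
    for (var i = 0; i < wires.length; i++) {
        var vdnId = parseInt(wires[i].vdnId, 10);
        if (!isNaN(vdnId) && vdnId > minVdnId) {
            ids.push(wires[i].objectId);
        }
    }
    return ids;
}

// Example: with a threshold of 5010 only the wire with VNI 5011 is selected.
var sample = [
    { objectId: "virtualwire-1", vdnId: "5000" },
    { objectId: "virtualwire-11", vdnId: "5010" },
    { objectId: "virtualwire-12", vdnId: "5011" }
];
console.log(selectWiresToDelete(sample, 5010)); // [ 'virtualwire-12' ]
```

Running this against the real list first (and logging instead of deleting) is a cheap way to confirm the threshold before anything is removed.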

 

 

 

 

Basic NSX Network virtualization setup

This post will go over the basic setup for network virtualization in NSX. This is nothing new or exciting, but I figured I would share it as more users are deploying NSX in their home labs these days. I will assume that you have already prepared the environment by deploying NSX Manager and the controllers, and that all your ESXi hosts are prepared.

We are going to set up the subnet of 10.0.0.0/17 to be virtually routed as shown below:

This requires the following:

  • Static route on Linksys EA6200 router to point 10.0.0.0/17 to 192.168.10.223 (because my Linksys does not support any dynamic routing protocols)
  • A logical switch called Transport-10.0.0.0 between the border ESG and the logical distributed router (LDR)
  • OSPF configured between ESG-3 and LDR-3
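A single static route is enough because every LDR subnet in this design (10.0.0.0/24, 10.0.1.0/24, 10.0.2.0/24) falls inside 10.0.0.0/17, which spans 10.0.0.0 through 10.0.127.255. Here is a quick plain-JavaScript check of that arithmetic; ipv4ToInt and inCidr are illustrative helpers, not part of NSX.

```javascript
// Converts dotted-quad IPv4 text to an unsigned 32-bit integer.
function ipv4ToInt(ip) {
    var parts = ip.split(".");
    if (parts.length !== 4) throw new Error("Not an IPv4 address: " + ip);
    var value = 0;
    for (var i = 0; i < 4; i++) {
        if (!/^\d{1,3}$/.test(parts[i]) || Number(parts[i]) > 255) {
            throw new Error("Bad octet in " + ip);
        }
        value = value * 256 + Number(parts[i]);
    }
    return value;
}

// True if ip falls inside cidr (for example "10.0.0.0/17").
function inCidr(ip, cidr) {
    var pieces = cidr.split("/");
    var prefix = Number(pieces[1]);
    if (!Number.isInteger(prefix) || prefix < 0 || prefix > 32) {
        throw new Error("Bad prefix length in " + cidr);
    }
    // Integer-dividing both addresses by the block size compares their network
    // parts without JavaScript's signed 32-bit bitwise operators.
    var blockSize = Math.pow(2, 32 - prefix);
    return Math.floor(ipv4ToInt(ip) / blockSize) ===
        Math.floor(ipv4ToInt(pieces[0]) / blockSize);
}

console.log(inCidr("10.0.2.1", "10.0.0.0/17"));     // true
console.log(inCidr("10.0.127.254", "10.0.0.0/17")); // true
console.log(inCidr("10.0.128.1", "10.0.0.0/17"));   // false
```

If you later add an LDR subnet at 10.0.128.0 or above, the static route on the physical router would no longer cover it.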

 

Creation of the LDR-3 (pictures to follow steps)

  1. First we need to create a logical switch: choose Logical Switches, select the green + button, input the name (Transport-10.0.0.0) and a description, and click OK
  2. Select NSX Edges in the Navigator pane, select the green + button
  3. In Name and description: Install Type: Logical (Distributed) Router, Name: LDR-3, Hostname: ldr3, leave Deploy NSX Edge selected, Next
  4. In Settings, type your password (I like to enable SSH), click Next
  5. In Configure deployment: press the green + to deploy an NSX Edge appliance, select the correct resource pool, datastore, host, and folder, click OK
  6. Click Next
  7. In Configure interfaces:
  8. For the HA interface, select Connected To: port group DV-VM, press + below HA and add 192.168.10.224
  9. Press the green + button under Interfaces
  10. In Add NSX Edge Interface: Name: Uplink, Connected To: Transport-10.0.0.0, press the green + to add IP 10.0.0.2 subnet 24, click OK
  11. Click Next
  12. In Default gateway settings: set the gateway IP to 10.0.0.1 and click Next
  13. Ignore the firewall and HA settings, click Next
  14. Click Finish to deploy the LDR

 

 

Creation of the ESG-3 (pictures to follow steps)

  1. Back at the NSX Edge section in Navigator
  2. Press the Green + sign
  3. In Name and description: choose Edge Services Gateway, Name: ESG-3, Hostname: esg3, and select Next (in production you might want high availability or ECMP)
  4. In Settings: type the admin password and enable SSH, Next
  5. In Configure deployment: press the green + sign, select the resource pool, datastore and host, then OK and Next
  6. In Configure Interfaces: press the green + sign
  7. Name: Uplink, Connected To: DV-VM, press the green + to add IP 192.168.10.223 subnet 24, click OK
  8. Click Next
  9. In Default gateway settings, enter a default gateway of 192.168.10.1, then Next
  10. Ignore the firewall and HA settings, Next
  11. Click Finish to deploy appliance

Configure Physical router

This is unique to each router. On mine I added a static route for the subnet:

 

Configure LDR

We need to add at least one inside network and configure OSPF.

  1. In the Logical Switches section, we are going to add a switch for 10.0.1.0/24 called LS-10.0.1
  2. In the Logical Switches section: green + button, Name: LS-10.0.1, then OK
  3. Go to NSX Edges in the Navigator
  4. Double-click on LDR-3
  5. We need to add an interface for the new network: select Manage, Settings, Interfaces
  6. Select the green +
  7. Name: GW-10.0.1, Connected To: LS-10.0.1, green + button to add IP 10.0.1.1 subnet 24
  8. Select the Routing tab, Global Configuration
  9. Go to Dynamic Routing Configuration and click Edit
  10. Make sure the Uplink interface is chosen, then click OK
  11. Press the Publish Changes button
  12. Click on the OSPF button
  13. Remove all current area definitions (51) with the red X, then publish changes
  14. Click the green + on Area Definitions and add area 2 (just type 2 in the Area field and leave the rest default)
  15. Press the green + on Area to Interface Mapping
  16. Make sure Uplink and area 2 are selected and press OK
  17. Press the Edit button next to OSPF Configuration and enable OSPF. For the protocol address choose a free IP (10.0.0.3); the forwarding address is 10.0.0.2
  18. Publish Changes
  19. Go to firewall section
  20. Disable firewall
  21. Publish changes

 

Configure ESG-3

  1. Return to Networking & Security main section
  2. Select NSX Edges and double click on ESG-3
  3. Select Manage, Settings, Interfaces
  4. We need to add an interface for the transport between the LDR and ESG
  5. Select vnic1 and press the Edit button
  6. Connected To: Transport-10.0.0.0, IP: 10.0.0.1 subnet 24
  7. Select Routing
  8. In Global Configuration: select Edit next to Dynamic Routing Configuration, ensure Uplink is selected and press OK
  9. Publish changes
  10. Click on OSPF
  11. Remove the current area definitions with the red X and publish changes
  12. Add a new area for area 2, leaving everything else default
  13. In Area to Interface Mapping, make sure you choose vnic1 (the internal link) and area 2
  14. Select OSPF Configuration and Enable OSPF
  15. Publish Changes
  16. Select Firewall section and disable firewall and publish changes

 

Validate Configuration

Let's validate the configuration three ways: confirming the OSPF settings on ESG-3, adding a new subnet, and a ping test.

Confirming on ESG-3

  1. Log in to ESG-3 via SSH (username admin, password set during deployment)
  2. Run show ip route to see the current routes and ensure that the OSPF E2 (external type 2) learned route is showing:

Adding a new subnet

  1. Stay logged into the ESG-3
  2. Switch to the Networking and security console, navigate to Logical switches
  3. Press the green + to add a switch called LS-10.0.2
  4. Select NSX Edges, double-click on LDR-3
  5. Go to Manage, Settings
  6. Select Interfaces and press the green +
  7. Name: GW-10.0.2, Internal, Connected To: LS-10.0.2, IP 10.0.2.1 subnet 24
  8. Return to the ESG-3 ssh session and run the command show ip route to see 10.0.2.0/24

Test Via ping

  1. Attempt to ping either gateway on the LDR (10.0.1.1 or 10.0.2.1)

 

Additional commands on ESG-3

Here are some commands that will help you troubleshoot OSPF:

show ip ospf neighbors – show other members of the areas

show ip ospf database – understand current ospf database

 

Redeploy NSX Edges to a different cluster / datacenter

First Issue my bad

I ran into an interesting issue in my home lab. I recently replaced all my older HP servers with Intel NUCs, and I could not be happier with the results. Once I had replaced all the ESXi hosts, I mounted the storage and started up my virtual machines, including vCenter. Once vCenter and NSX Manager were available, I moved all the ESXi hosts to the distributed switch. NSX complicated this normal process: I should have added the ESXi hosts to the transport zone and let NSX join them to the distributed switch. Failing to do this made the NSX VXLAN preparation fail, and I could not prepare the hosts. Ultimately I removed the VXLAN entries from the distributed switch and re-prepared the hosts, which re-created the VXLAN entries on the switch. (This is not a good idea in production, so follow the correct path.)

Second Issue nice to know

This process created a second issue: the original cluster and datacenter where my NSX Edges used to live were gone. I assumed that I could just redeploy the NSX Edges from the manager. While this is true, the configuration assumes that it will deploy the Edges to the same datacenter, resource pool and potentially the same host as when they were created. So if I have a failure and expect to just bring up NSX Manager and redeploy to a new cluster, it will not work. You have to adjust the placement parameters for the Edges, which you can do via the API or the GUI. I wanted to demonstrate the API method:

I needed to change the resource pool, datastore, and host for my Edge. I identified my Edge via its identifier in the GUI (edge-8 for me), grabbed my favorite REST tool (Postman) and formed a query on the current state:

GET https://{nsx-manager-ip}/api/4.0/edges/edge-8/appliances

This returned the configuration for this Edge. If you need to identify all Edges, just do:

GET https://{nsx-manager-ip}/api/4.0/edges

Then I needed the VMware identifiers (managed object IDs) for the resource pool, datastore and host. These can all be gathered via the REST API, but I went with PowerShell because it was faster for me. I used the following commands in PowerCLI:

 

get-vmhost | fl (returned host-881)

get-resourcepool | fl (returned domain-c861)

get-datastore | fl (returned datastore-865)
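PowerCLI commonly reports the Id property with the vSphere type name in front (for example HostSystem-host-881), while the NSX XML wants only the managed object reference (host-881). This sketch assumes that "TypeName-moref" format; moRefFromPowerCliId is a hypothetical helper, and an ID that already starts with a lowercase MoRef is returned unchanged.

```javascript
// Strips a leading vSphere type name (which starts with an uppercase letter)
// from a PowerCLI Id. MoRefs such as host-881 or domain-c861 start lowercase,
// so they pass through untouched.
function moRefFromPowerCliId(id) {
    var match = /^[A-Z][A-Za-z]*-(.+)$/.exec(id);
    return match ? match[1] : id;
}

console.log(moRefFromPowerCliId("HostSystem-host-881"));                // host-881
console.log(moRefFromPowerCliId("ClusterComputeResource-domain-c861")); // domain-c861
console.log(moRefFromPowerCliId("datastore-865"));                      // datastore-865
```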

 

Once I had these identifiers, I was ready to form my adjusted request body:

 

<appliances>
<applianceSize>compact</applianceSize>
<appliance>
<highAvailabilityIndex>0</highAvailabilityIndex>
<vcUuid>500cfc30-5b2a-6bae-32a3-360e0315ccd3</vcUuid>
<vmId>vm-924</vmId>
<resourcePoolId>domain-c861</resourcePoolId>
<resourcePoolName>domain-c861</resourcePoolName>
<datastoreId>datastore-865</datastoreId>
<datastoreName>datastore-865</datastoreName>
<hostId>host-881</hostId>
<vmFolderId>group-v122</vmFolderId>
<vmFolderName>NSX</vmFolderName>
<vmHostname>esg1-0</vmHostname>
<vmName>ESG-1-0</vmName>
<deployed>true</deployed>
<cpuReservation>
<limit>-1</limit>
<reservation>1000</reservation>
</cpuReservation>
<memoryReservation>
<limit>-1</limit>
<reservation>512</reservation>
</memoryReservation>
<edgeId>edge-9</edgeId>
<configuredResourcePool>
<id>domain-c26</id>
<name>domain-c26</name>
<isValid>false</isValid>
</configuredResourcePool>
<configuredDataStore>
<id>datastore-31</id>
<isValid>false</isValid>
</configuredDataStore>
<configuredHost>
<id>host-29</id>
<isValid>false</isValid>
</configuredHost>
<configuredVmFolder>
<id>group-v122</id>
<name>NSX</name>
<isValid>true</isValid>
</configuredVmFolder>
</appliance>
<deployAppliances>true</deployAppliances>
</appliances>

I sent a PUT to https://{nsx-manager-ip}/api/4.0/edges/{edgeId}/appliances with the above body and a Content-Type of application/xml. Then I was able to redeploy my Edge devices without any challenge.
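If you would rather script this than use Postman, the request needs only the method, URL, HTTP Basic credentials and the XML content type. This is a minimal Node.js sketch: buildAppliancesPut is a hypothetical helper, the host name and credentials are placeholders, and it only assembles the request (you could hand the result to fetch, curl or Postman) rather than sending it.

```javascript
// Hypothetical helper: assembles the PUT that updates an Edge's appliance placement.
// Buffer is Node.js-specific and is used here only for Base64 encoding.
function buildAppliancesPut(managerHost, edgeId, username, password, xmlBody) {
    var token = Buffer.from(username + ":" + password, "utf8").toString("base64");
    return {
        method: "PUT",
        url: "https://" + managerHost + "/api/4.0/edges/" +
            encodeURIComponent(edgeId) + "/appliances",
        headers: {
            "Authorization": "Basic " + token,   // NSX Manager API uses HTTP Basic auth
            "Content-Type": "application/xml"
        },
        body: xmlBody
    };
}

var req = buildAppliancesPut("nsx-manager.lab.local", "edge-8", "admin", "secret",
    "<appliances>...</appliances>");
console.log(req.method + " " + req.url);
```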

NSX Manager still running but disconnected from vCenter

A quick note in case you run into this issue. My NSX Manager was running and everything seemed fine (NSX Manager login and console), but I could not manage NSX elements from inside vCenter: no NSX Manager was showing up. Reconnecting to vCenter or rebooting would resolve the issue, but then the problem came back the next day. I could not figure it out until it dawned on me what happens every day... BACKUP! Somehow my NSX Manager had been added to the nightly backup, and it would lose its connection during the backup window. Here is the only approved method for backing up an NSX Manager:

  1. Use the configuration backup in the NSX manager administration console to make normal and regular backups

 

To recover an NSX Manager, do the following:

  1. Deploy a new NSX Manager from the OVF (the same NSX version as the backup) with the same IP as the original manager
  2. Restore the configuration from the backup
  3. Reboot the NSX manager to ensure clean configuration
  4. Ensure it shows up in the GUI

 

Image-level backups are neither supported nor a good idea 🙂

Configuring a NSX load balancer from API

A customer asked me this week if there were any examples of customers configuring the NSX load balancer via vRealize Automation. I was surprised when Google didn't turn up any examples. The NSX API guide (one of the best guides around) provides the details for how to call each element. You can download it here. Once you have the PDF, navigate to page 200, which is the start of the load balancer section.

Too many Edge devices

NSX load balancers are Edge Services Gateways. A normal NSX environment may have a few Edges while others have hundreds, but not all of them are load balancers. A quick API lookup of all Edges provides this information (my NSX Manager is 192.168.10.28, hence its use in all examples):

https://192.168.10.28/api/4.0/edges
        <edgeSummary>
            <objectId>edge-57</objectId>
            <objectTypeName>Edge</objectTypeName>
            <vsmUuid>420CD713-469F-7053-8281-A7BD66A1CD46</vsmUuid>
            <nodeId>92484cee-ab3c-4ed2-955e-e5bd135f5be5</nodeId>
            <revision>2</revision>
            <type>
                <typeName>Edge</typeName>
            </type>
            <name>LB-1</name>
            <clientHandle></clientHandle>
            <extendedAttributes/>
            <isUniversal>false</isUniversal>
            <universalRevision>0</universalRevision>
            <id>edge-57</id>
            <state>deployed</state>
            <edgeType>gatewayServices</edgeType>
            <datacenterMoid>datacenter-21</datacenterMoid>
            <datacenterName>Home</datacenterName>
            <tenantId>default</tenantId>
            <apiVersion>4.0</apiVersion>
            <recentJobInfo>
                <jobId>jobdata-34935</jobId>
                <status>SUCCESS</status>
            </recentJobInfo>
            <edgeStatus>GREEN</edgeStatus>
            <numberOfConnectedVnics>1</numberOfConnectedVnics>
            <appliancesSummary>
                <vmVersion>6.2.0</vmVersion>
                <vmBuildInfo>6.2.0-2982179</vmBuildInfo>
                <applianceSize>compact</applianceSize>
                <fqdn>NSX-edge-57</fqdn>
                <numberOfDeployedVms>1</numberOfDeployedVms>
                <activeVseHaIndex>0</activeVseHaIndex>
                <vmMoidOfActiveVse>vm-283</vmMoidOfActiveVse>
                <vmNameOfActiveVse>LB-1-0</vmNameOfActiveVse>
                <hostMoidOfActiveVse>host-29</hostMoidOfActiveVse>
                <hostNameOfActiveVse>vmh1.griffiths.local</hostNameOfActiveVse>
                <resourcePoolMoidOfActiveVse>resgroup-27</resourcePoolMoidOfActiveVse>
                <resourcePoolNameOfActiveVse>Resources</resourcePoolNameOfActiveVse>
                <dataStoreMoidOfActiveVse>datastore-31</dataStoreMoidOfActiveVse>
                <dataStoreNameOfActiveVse>SYN8-NFS-GEN-VOL1</dataStoreNameOfActiveVse>
                <statusFromVseUpdatedOn>1478911807005</statusFromVseUpdatedOn>
                <communicationChannel>msgbus</communicationChannel>
            </appliancesSummary>
            <hypervisorAssist>false</hypervisorAssist>
            <allowedActions>
                <string>Change Log Level</string>
                <string>Add Edge Appliance</string>
                <string>Delete Edge Appliance</string>
                <string>Edit Edge Appliance</string>
                <string>Edit CLI Credentials</string>
                <string>Change edge appliance size</string>
                <string>Force Sync</string>
                <string>Redeploy Edge</string>
                <string>Change Edge Appliance Core Dump Configuration</string>
                <string>Enable hypervisorAssist</string>
                <string>Edit Highavailability</string>
                <string>Edit Dns</string>
                <string>Edit Syslog</string>
                <string>Edit Automatic Rule Generation Settings</string>
                <string>Disable SSH</string>
                <string>Download Edge TechSupport Logs</string>
            </allowedActions>
        </edgeSummary>

 

This is for a single Edge gateway. I have deployed 57 Edges over the life of my NSX environment and 15 are active right now, but only edge-57 is a load balancer. This report provides nothing that distinguishes a load balancer Edge from, say, a firewall Edge. To identify whether an Edge is a load balancer, I have to query its load balancer configuration using:

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config

Notice the addition of the edge-57 name to the query.   It returns:

<loadBalancer>
    <version>2</version>
    <enabled>true</enabled>
    <enableServiceInsertion>false</enableServiceInsertion>
    <accelerationEnabled>false</accelerationEnabled>
    <monitor>
        <monitorId>monitor-1</monitorId>
        <type>tcp</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <name>default_tcp_monitor</name>
    </monitor>
    <monitor>
        <monitorId>monitor-2</monitorId>
        <type>http</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <method>GET</method>
        <url>/</url>
        <name>default_http_monitor</name>
    </monitor>
    <monitor>
        <monitorId>monitor-3</monitorId>
        <type>https</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <method>GET</method>
        <url>/</url>
        <name>default_https_monitor</name>
    </monitor>
    <logging>
        <enable>false</enable>
        <logLevel>info</logLevel>
    </logging>
</loadBalancer>

Notice that this Edge has the load balancer enabled (true) with some default monitors. To compare, here is an Edge without the feature enabled:

https://192.168.10.28/api/4.0/edges/edge-56/loadbalancer/config
<loadBalancer>
    <version>1</version>
    <enabled>false</enabled>
    <enableServiceInsertion>false</enableServiceInsertion>
    <accelerationEnabled>false</accelerationEnabled>
    <monitor>
        <monitorId>monitor-1</monitorId>
        <type>tcp</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <name>default_tcp_monitor</name>
    </monitor>
    <monitor>
        <monitorId>monitor-2</monitorId>
        <type>http</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <method>GET</method>
        <url>/</url>
        <name>default_http_monitor</name>
    </monitor>
    <monitor>
        <monitorId>monitor-3</monitorId>
        <type>https</type>
        <interval>5</interval>
        <timeout>15</timeout>
        <maxRetries>3</maxRetries>
        <method>GET</method>
        <url>/</url>
        <name>default_https_monitor</name>
    </monitor>
    <logging>
        <enable>false</enable>
        <logLevel>info</logLevel>
    </logging>
</loadBalancer>

Enabled is false with the same default monitors.   So now we know how to identify which edges are load balancers:

  • Get list of all Edges via API and pull out id element
  • Query each id element for load balancer config and match on true
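The two steps above can be sketched in plain JavaScript. To stay dependency-free, the sketch pulls values out with regular expressions rather than a real XML parser, and it assumes the top-level enabled flag is the first enabled element in the config response (as in the examples above). The helper names are hypothetical, and fetchConfig stands in for whatever makes the GET call.

```javascript
// Collects every <id>...</id> value from a GET /api/4.0/edges response.
// <objectId>, <jobId> and similar tags do not match this exact pattern.
function edgeIdsFromList(edgesXml) {
    var ids = [];
    var re = /<id>([^<]+)<\/id>/g;
    var match;
    while ((match = re.exec(edgesXml)) !== null) {
        ids.push(match[1].trim());
    }
    return ids;
}

// Reads the first <enabled> element, which in the loadBalancer/config
// responses above is the top-level load balancer flag.
function loadBalancerEnabled(configXml) {
    var match = /<enabled>\s*(true|false)\s*<\/enabled>/.exec(configXml);
    return match !== null && match[1] === "true";
}

// fetchConfig is supplied by the caller: edgeId -> loadBalancer/config XML text.
function findLoadBalancers(edgesXml, fetchConfig) {
    return edgeIdsFromList(edgesXml).filter(function (edgeId) {
        return loadBalancerEnabled(fetchConfig(edgeId));
    });
}

// Trimmed sample data modeled on the responses above.
var edgesXml =
    "<edgeSummary><objectId>edge-56</objectId><id>edge-56</id></edgeSummary>" +
    "<edgeSummary><objectId>edge-57</objectId><id>edge-57</id></edgeSummary>";
var configs = {
    "edge-56": "<loadBalancer><version>1</version><enabled>false</enabled></loadBalancer>",
    "edge-57": "<loadBalancer><version>2</version><enabled>true</enabled></loadBalancer>"
};
console.log(findLoadBalancers(edgesXml, function (id) { return configs[id]; })); // [ 'edge-57' ]
```

With many Edges this is one extra GET per Edge, so it is worth caching the result rather than running it on every request.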

 

 

Adding virtual servers

You can add virtual servers, assuming the application profile and pools are already in place, with a POST and an XML body payload like this (the virtual server IP must already be assigned to the Edge as an interface):

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/virtualservers
<virtualServer>
<name>http_vip_2</name>
<description>http virtualServer 2</description>
<enabled>true</enabled>
<ipAddress>192.168.10.18</ipAddress>
<protocol>http</protocol>
<port>443,6000-7000</port> 
<connectionLimit>123</connectionLimit>
<connectionRateLimit>123</connectionRateLimit>
<applicationProfileId>applicationProfile-1</applicationProfileId>
<defaultPoolId>pool-1</defaultPoolId>
<enableServiceInsertion>false</enableServiceInsertion>
<accelerationEnabled>true</accelerationEnabled>
</virtualServer>
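When the values come from a vRealize Automation request form, it helps to build this body from a plain object and escape user input. This is a sketch only: buildVirtualServerXml and xmlEscape are hypothetical helpers, the element names and order are copied from the example above, and which elements NSX actually requires is defined by the API guide, not by this code (it simply omits anything left undefined).

```javascript
// Escapes the five XML special characters so values are safe as element text.
function xmlEscape(value) {
    return String(value)
        .replace(/&/g, "&amp;")
        .replace(/</g, "&lt;")
        .replace(/>/g, "&gt;")
        .replace(/"/g, "&quot;")
        .replace(/'/g, "&apos;");
}

// Builds a <virtualServer> body using the element names from the example above.
function buildVirtualServerXml(vs) {
    var fields = ["name", "description", "enabled", "ipAddress", "protocol", "port",
        "connectionLimit", "connectionRateLimit", "applicationProfileId", "defaultPoolId",
        "enableServiceInsertion", "accelerationEnabled"];
    var lines = ["<virtualServer>"];
    fields.forEach(function (field) {
        if (vs[field] !== undefined) {
            lines.push("<" + field + ">" + xmlEscape(vs[field]) + "</" + field + ">");
        }
    });
    lines.push("</virtualServer>");
    return lines.join("\n");
}

var body = buildVirtualServerXml({
    name: "http_vip_2", enabled: true, ipAddress: "192.168.10.18", protocol: "http",
    port: "443,6000-7000", applicationProfileId: "applicationProfile-1", defaultPoolId: "pool-1"
});
console.log(body);
```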


To confirm it was created, run a quick query:

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/virtualservers
<loadBalancer>
    <virtualServer>
        <virtualServerId>virtualServer-5</virtualServerId>
        <name>http_vip_2</name>
        <description>http virtualServer 2</description>
        <enabled>true</enabled>
        <ipAddress>192.168.10.18</ipAddress>
        <protocol>http</protocol>
        <port>443,6000-7000</port>
        <connectionLimit>123</connectionLimit>
        <connectionRateLimit>123</connectionRateLimit>
        <defaultPoolId>pool-1</defaultPoolId>
        <applicationProfileId>applicationProfile-1</applicationProfileId>
        <enableServiceInsertion>false</enableServiceInsertion>
        <accelerationEnabled>true</accelerationEnabled>
    </virtualServer>
</loadBalancer>

 

The response shows it has been created. To delete it, pass its virtualServerId to a DELETE:

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/virtualservers/{virtualServerId}
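Users will usually know a virtual server by name rather than by its virtualServerId, so a delete action first has to look the ID up in the query response. This plain-JavaScript sketch does that with regular expressions (a simplification; a real XML parser is safer), and both helper names are hypothetical.

```javascript
// Returns the virtualServerId of each <virtualServer> whose <name> matches.
function virtualServerIdsByName(responseXml, name) {
    var ids = [];
    var blockRe = /<virtualServer>([\s\S]*?)<\/virtualServer>/g;
    var block;
    while ((block = blockRe.exec(responseXml)) !== null) {
        var idMatch = /<virtualServerId>([^<]+)<\/virtualServerId>/.exec(block[1]);
        var nameMatch = /<name>([^<]+)<\/name>/.exec(block[1]);
        if (idMatch && nameMatch && nameMatch[1] === name) {
            ids.push(idMatch[1]);
        }
    }
    return ids;
}

// Builds the DELETE URL for one virtual server on one Edge.
function virtualServerDeleteUrl(managerIp, edgeId, virtualServerId) {
    return "https://" + managerIp + "/api/4.0/edges/" + edgeId +
        "/loadbalancer/config/virtualservers/" + virtualServerId;
}

var listing = "<loadBalancer><virtualServer><virtualServerId>virtualServer-5</virtualServerId>" +
    "<name>http_vip_2</name></virtualServer></loadBalancer>";
virtualServerIdsByName(listing, "http_vip_2").forEach(function (id) {
    console.log("DELETE " + virtualServerDeleteUrl("192.168.10.28", "edge-57", id));
});
```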

 

Pool Members

For pools, you have to send the full pool definition to add a backend member, or for that matter to remove one. So first query it:

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/pools
<?xml version="1.0" encoding="UTF-8"?>
<loadBalancer>
    <pool>
        <poolId>pool-1</poolId>
        <name>pool-1</name>
        <algorithm>round-robin</algorithm>
        <transparent>false</transparent>
    </pool>
</loadBalancer>

Then form your PUT with the data elements you need (taken from the API guide):

https://192.168.10.28/api/4.0/edges/edge-57/loadbalancer/config/pools/pool-1
<pool>
<name>pool-1</name>
<description>pool-tcp-snat</description>
<transparent>false</transparent>
<algorithm>round-robin</algorithm>
<monitorId>monitor-3</monitorId>
<member>
<ipAddress>192.168.10.14</ipAddress>
<weight>1</weight>
<port>80</port>
<minConn>10</minConn>
<maxConn>100</maxConn>
<name>m5</name>
<monitorPort>80</monitorPort>
</member>
</pool>
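Because the PUT carries the whole pool, the safe pattern is to rebuild the body from the current pool plus (or minus) the member you are changing. This plain-JavaScript sketch builds that body from an object; buildPoolXml is a hypothetical helper, the element names come from the example above, and the second member (m6 at 192.168.10.15) is invented for illustration.

```javascript
// Builds the full <pool> body for a PUT. Pass every member you want to keep:
// a member missing from this list is a member you are removing.
function buildPoolXml(pool) {
    var esc = function (v) {
        return String(v).replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
    };
    // Emits one element, or nothing when the value is undefined.
    var el = function (tag, value) {
        return value === undefined ? "" : "<" + tag + ">" + esc(value) + "</" + tag + ">\n";
    };
    var xml = "<pool>\n" + el("name", pool.name) + el("description", pool.description) +
        el("transparent", pool.transparent) + el("algorithm", pool.algorithm) +
        el("monitorId", pool.monitorId);
    (pool.members || []).forEach(function (m) {
        xml += "<member>\n" + el("ipAddress", m.ipAddress) + el("weight", m.weight) +
            el("port", m.port) + el("minConn", m.minConn) + el("maxConn", m.maxConn) +
            el("name", m.name) + el("monitorPort", m.monitorPort) + "</member>\n";
    });
    return xml + "</pool>";
}

var poolXml = buildPoolXml({
    name: "pool-1", description: "pool-tcp-snat", transparent: false,
    algorithm: "round-robin", monitorId: "monitor-3",
    members: [
        { ipAddress: "192.168.10.14", weight: 1, port: 80, minConn: 10, maxConn: 100, name: "m5", monitorPort: 80 },
        // Hypothetical second member added alongside the existing one
        { ipAddress: "192.168.10.15", weight: 1, port: 80, name: "m6", monitorPort: 80 }
    ]
});
console.log(poolXml);
```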

In the client, we can see the member was added.


Tie it all together

Each of these objects supports query, update and delete operations. The real challenge is taking the API inputs and turning them into user-friendly vRealize Automation request forms. NSX continues to amaze me as a great product with a very powerful and well-documented API, and I have run into very few issues figuring out how to do anything in NSX with the API. In a future post I may provide some vRealize Orchestrator actions to speed up configuration of load balancers.

 

 

 

 

 

 

 

 

 

vRO add all virtual machines to NSX exception list

Almost everyone is implementing NSX in a brownfield environment. Switching the DFW to deny all is the safest bet, but it is hard to do in a brownfield environment: simply denying all traffic on day one is a bad idea, and converting every application into DFW rules at once is not practical. One way to solve this is to add all virtual machines to the exception list, then move each one out once you have created the correct allow rules for it. I didn't want to add all my machines manually via the GUI, so I explored the API.

How to explore the API for NSX

VMware's beta developer center provides the easiest way to explore the NSX API. You can find the NSX section here. Searching the API for "excep" quickly turned up the following answer:


As you can see, there are three methods (GET, PUT, DELETE). It's always safe to start with a GET because it does not produce changes. Using Postman for Chrome, I was quickly connected to NSX; see my settings below:


The GET returned many lines for the machines I had manually added to the exception list, for example:


Looking at this virtual machine, you can see it is identified by its objectID, which aligns with the PUT and DELETE methods. The following worked perfectly:

Delete
https://192.168.10.28/api/2.1/app/excludelist/vm-47

Put
https://192.168.10.28/api/2.1/app/excludelist/vm-47

A quick GET showed that vm-47 was back on the list. Now we had one issue: the designation and inventory of objectIDs is a construct of vCenter, not NSX.

The Plan

In order to be successful in my plans, I needed to do the following:

  • Gather a list of all objectIDs from vCenter
  • Put the list one at a time into NSX’s exclude list
  • Have some way to orchestrate it all together

No surprise, I turned to vRealize Orchestrator. I wanted to keep it to generic REST connections and not use the NSX plug-in. So my journey began.

Orchestrator REST for NSX

  • Login to orchestrator
  • Switch to workflow view
  • Expand the library and locate the Add a REST host workflow
  • Run the workflow


  • Hit submit and wait for it to complete
  • You can verify the connection by visiting the administration section and expanding REST connections

Now we need to add a REST operation for adding VMs to the exception list.

  • Locate the Add REST operation workflow
  • Run it
  • Fill out as shown


You now have a put method that takes input of {vm-id} before it can run.  In order to test we go back to our Postman and delete vm-47 and do a get to verify it’s gone:

Delete:

https://192.168.10.28/api/2.1/app/excludelist/vm-47

Get:

https://192.168.10.28/api/2.1/app/excludelist

 

vm-47 is missing from the GET results. Now we need to run our REST operation:

  • Locate the workflow called: Invoke a REST operation
  • Run it as shown below


Once completed a quick postman get showed me vm-47 is back on the exclude list.   Now I am ready for prime time.

Creation of an Add to Exclude List workflow

I need to create a workflow that just runs the rest operation to add to exclude list.

  • Copy the Invoke a REST operation
  • New workflow should be called AddNSXExclude
  • Edit new workflow
  • Go to inputs and remove all param_xxx except param_0
  • Move everything else but param_0 to Attributes


  • Let’s edit the attributes next
  • Click on the value for the restOperation attribute and set it to the “Put on Exclude List ..” operation you created earlier


  • Go to the Schema and edit the REST call scriptable task
  • Remove all param_xxx except param_0 from the IN on the scriptable task


  • Edit the top line of the scripting to read like this:

var inParamtersValues = [param_0];

  • Close the scriptable task
  • Click on presentation and remove everything but the content question


Now we have a new issue. We need the workflow not to fail when the return code is not 200, for example when the object is already on the exception list, because we just want everything on the list right away. So edit your schema to remove everything but the REST call:


 

Put it all together with a list of virtual machines

Time for a new workflow with a scriptable task.

  • In the General tab, add a single attribute (vmid) that is an array of strings


  • Add a scriptable task to the schema
  • Add a foreach element to the schema after the scriptable task
    • Link the foreach loop to the AddNSXExclude workflow you made in the last step
    • Link vmid to param_0


  • Edit the scriptable task and add the following code:

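// Get a list of all VMs (straight quotes are required; curly quotes break the script)
var vms = System.getModule("com.vmware.library.vc.vm").getAllVMs();

// Collect each VM's managed object ID (for example vm-47) into the vmid array
var vmid = new Array();

for each (var vm in vms)
{
    vmid.push(vm.id);
}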

 

  • Add an IN for vmid and an OUT for vmid
  • Run it; when it completes, you can see the response headers in the Logs section
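The foreach step, together with the decision to ignore non-200 responses, can be modeled in plain JavaScript to make the behavior explicit. addAllToExcludeList and its putFn callback are hypothetical: putFn stands in for the vRO REST operation and returns an HTTP status code. The URL pattern is the one used with Postman above.

```javascript
// PUT each VM MoRef onto the exclude list and keep going when a single call
// fails (for example a VM that is already on the list), collecting a summary.
function addAllToExcludeList(managerIp, vmIds, putFn) {
    var summary = { succeeded: [], failed: [] };
    vmIds.forEach(function (vmId) {
        var url = "https://" + managerIp + "/api/2.1/app/excludelist/" + vmId;
        var status;
        try {
            status = putFn(url);
        } catch (e) {
            status = -1; // network or client error: record it and move on
        }
        if (status >= 200 && status < 300) {
            summary.succeeded.push(vmId);
        } else {
            summary.failed.push({ vmId: vmId, status: status });
        }
    });
    return summary;
}

var result = addAllToExcludeList("192.168.10.28", ["vm-47", "vm-48"], function (url) {
    // Fake PUT for illustration: pretend vm-47 is rejected and everything else succeeds
    return /\/vm-47$/.test(url) ? 400 : 200;
});
console.log(result.succeeded, result.failed);
```

Collecting failures instead of stopping mirrors the schema change above, while still leaving a list you can review afterwards.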

Hope it helps you automate some NSX.

What does apply to mean in NSX Firewall?

When I first started using NSX I ran into this little problem.   What does apply to mean and how should I use it?

Background

I believe the background for the apply to is from physical firewalls.   They allowed you to apply rules to a specific interface.   Applying to an interface had the following effects:

  • Limit the number of rules that have to be processed
  • Allow specific fine-grained controls

Applying rules to specific interfaces had a few issues:

  • You had to have a good understanding of the network topology in order to apply rules correctly
  • New interfaces may be missed by rules

You also had the ability to apply a rule to all existing interfaces. On the surface, if you had enough hardware to apply the rules everywhere, this worked great, but tons of interfaces that didn't need the rules now had them. There are a few problems:

  • New interfaces would have no rules, and all rules would have to be applied to them
  • These rules exist only on a single firewall; rule creation is specific to that firewall

NSX Firewall

The NSX firewall takes a similar approach to applying rules. All firewall rules are created in NSX Manager and stored in the NSX Manager database. By default, rules are applied to the “distributed firewall”. This applies the rules to every virtual machine vNIC, regardless of the virtual machine's location. It creates the same problem as applying rules on every interface: each vNIC has a long list of rules to attempt to match.

This is where the apply to tag becomes interesting.   In order to explain I’ll use a simple example:

Two virtual machines: 172.16.0.2 on VNI 5000 and 172.16.20.2 on VNI 5002.

My default firewall rule set allows them to communicate without any issues. Let's assume I want to block all traffic between these machines, so I create the following rule:


Source:  172.16.0.2 virtual machine

Destination: 172.16.20.2 virtual machine

Service: Any

Action: Block

Apply to: Distributed firewall (default)

 

Using Traceflow we can identify where it was blocked:


You can clearly see that the default Apply To of Distributed Firewall applied the drop action at the source. This is really great because it limits the traffic on the physical wire. Since the object is known as a managed object in NSX, the rule is enforced as early as possible. If you have a physical entity that is not managed by NSX, the rule will be applied at the destination. This is hard to prove because Traceflow cannot provide visibility into physical entities.

What does apply to do?

Simply put, it tells NSX where to apply the firewall rule. Let's examine some of the options for my rule above:

  • Host
  • Cluster
  • Virtual machine
  • IP or Mac set
  • etc.

It provides the full list of objects that DFW rules can be built with, including dynamic sets and tags. This is really powerful. For the sake of this example, let’s apply the rule to the destination virtual machine instead of the DFW.

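[Screenshot: the rule with Apply To set to the destination virtual machine]

Using Traceflow, we can see the results:

[Screenshot: Traceflow output with the rule applied to the destination]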

My attempted connection was dropped at the destination, where I applied the firewall rule. You can also see that between steps 7 and 8 the packet left host 3 and went across my physical network to host 1 (the black hole of visibility).
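The two Traceflow results can be modeled with a small sketch. It assumes a simplified pipeline: a packet first meets the source vNIC firewall (egress), then the destination vNIC firewall (ingress). Each vNIC evaluates only the rules whose Apply To covers it. The `appliedTo` field, the VM names, and rule ID 1001 are hypothetical. Real DFW rules also carry services, direction, and connection state.

```javascript
// Hypothetical model: appliedTo is either "DFW" (every vNIC) or an array of VM names.
function ruleAppliesTo(rule, vm) {
  return rule.appliedTo === "DFW" || rule.appliedTo.includes(vm);
}

// The packet meets the source vNIC firewall first (egress), then the
// destination vNIC firewall (ingress). Each vNIC only evaluates rules
// scoped to it, top-down, and the first match wins.
function trace(rules, packet, srcVm, dstVm) {
  const enforcementPoints = [
    { name: `${srcVm} vNIC (egress)`, vm: srcVm },
    { name: `${dstVm} vNIC (ingress)`, vm: dstVm },
  ];
  for (const point of enforcementPoints) {
    const rule = rules.find(
      r => ruleAppliesTo(r, point.vm) && r.src === packet.src && r.dst === packet.dst
    );
    if (rule && rule.action === "block") {
      return `dropped at ${point.name} by rule ${rule.id}`;
    }
  }
  return "delivered";
}

const packet = { src: "172.16.0.2", dst: "172.16.20.2" };
const blockRule = { id: 1001, src: packet.src, dst: packet.dst, action: "block" };

// Apply To left at the default (Distributed Firewall).
console.log(trace([{ ...blockRule, appliedTo: "DFW" }], packet, "vm-a", "vm-b"));
// dropped at vm-a vNIC (egress) by rule 1001

// Apply To set to the destination VM only.
console.log(trace([{ ...blockRule, appliedTo: ["vm-b"] }], packet, "vm-a", "vm-b"));
// dropped at vm-b vNIC (ingress) by rule 1001
```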

Why use the apply to feature?

  • Reduce the amount of rules applied to each vNIC
  • Enforce the rule at a specific location (think of situations with VM overlap or rule overlap)

Apply To does add to the complexity of the environment and of troubleshooting, but it can limit scope. This is where careful planning and understanding of the environment can really help. Arkin can help as well, but that’s a post for another day.
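The first bullet can be checked with a back-of-the-envelope sketch. It counts how many rule instances end up on vNICs when every rule is applied to the DFW, versus when each rule is scoped to a single VM. The layout of 100 VMs with one rule per VM is a hypothetical example, not a measurement from my lab.

```javascript
// Hypothetical model: appliedTo is "DFW" (every vNIC) or an array of VM names.
function rulesPerVnic(rules, vms) {
  const counts = {};
  for (const vm of vms) {
    counts[vm] = rules.filter(r => r.appliedTo === "DFW" || r.appliedTo.includes(vm)).length;
  }
  return counts;
}

// Sum the per-vNIC counts into a total number of programmed rule instances.
function totalRuleInstances(counts) {
  return Object.values(counts).reduce((sum, n) => sum + n, 0);
}

// 100 VMs, each protected by one rule of its own.
const vms = Array.from({ length: 100 }, (_, i) => `vm-${i}`);
const dfwScoped = vms.map((_, i) => ({ id: i + 1, appliedTo: "DFW" }));
const vmScoped = vms.map((vm, i) => ({ id: i + 1, appliedTo: [vm] }));

console.log(totalRuleInstances(rulesPerVnic(dfwScoped, vms))); // 10000
console.log(totalRuleInstances(rulesPerVnic(vmScoped, vms)));  // 100
```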

Greatest tool for NSX!

I want to let you in on a little secret of NSX called Traceflow. It was made available in the 6.2 release, and I am in love with it. To explain my love, let’s start with a history lesson:

History Lesson (Get off my lawn kids time)

Back in the old days (pretty much right now in every enterprise) you had a bunch of switches, routers and firewalls. When a server was having a problem communicating with another server, you had to trace its MAC address through every hop manually. If you were lucky, you could use a SIEM to identify whether a firewall was dropping the traffic. Understanding each hop of the traffic is a pain: it takes time and can be very complex in enterprise implementations.

Enter NSX

NSX does some complex routing, switching and firewalling. In the past, your visibility into that process came from articles like mine. With Traceflow you can prove your theory and identify data paths. It still has no visibility beyond the NSX world into the physical network; hopefully some day we will have that too. Even so, Traceflow can get you pretty close.

Where is this traceflow of which you speak?

Log in to vCenter, select Networking & Security, and it’s on the right side, most of the way down. It lets you select a source and a destination, then inject packets. The NSX components report back as the injected packet passes by, allowing you to trace the flow of communication.

Show me some meat

Sounds good. Let’s assume we have two virtual machines, 172.16.0.2 and 172.16.0.3, both on VNI (think VLAN) 5000. They are on the same ESXi host, and there are no firewall rules blocking traffic. Here is the output from Traceflow:

[Screenshot: Traceflow between two VMs on the same host]

Look at that. The injected packet came from 172.16.0.2, hit the vNIC firewall, and was then forwarded directly to 172.16.0.3’s vNIC firewall and into the machine. This is simple and exactly what we expect. Let’s do the exact same thing, except move the second machine to another ESXi host:

[Screenshot: Traceflow between two VMs on different hosts]

Now we have added the VTEP (VXLAN tunnel end point) connection between ESXi hosts. VTEP communication between ESXi hosts is layer 3, which stretches VNI 5000 across the hosts whether they are far apart or right next to each other.
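VTEPs carry VNI 5000 across the layer 3 network by wrapping each frame in VXLAN, as specified in RFC 7348. The wrapped frame has outer Ethernet, IP, and UDP headers, followed by an 8-byte VXLAN header that holds the 24-bit VNI. This sketch builds only that 8-byte header, to show where “5000” actually lives on the wire. It illustrates the RFC format and is not NSX code. It leaves out the outer headers and the UDP port choice.

```javascript
// Builds the 8-byte VXLAN header from RFC 7348 (outer Ethernet/IP/UDP headers omitted).
function vxlanHeader(vni) {
  if (!Number.isInteger(vni) || vni < 0 || vni > 0xffffff) {
    throw new RangeError("VNI must be a 24-bit unsigned integer");
  }
  const header = new Uint8Array(8); // all bytes start at zero (reserved fields)
  header[0] = 0x08;                 // flags: I bit set, meaning the VNI is valid
  header[4] = (vni >> 16) & 0xff;   // VNI, most significant byte
  header[5] = (vni >> 8) & 0xff;
  header[6] = vni & 0xff;           // VNI, least significant byte
  return header;                    // bytes 1-3 and 7 stay reserved (zero)
}

// Reads the 24-bit VNI back out of a header.
function vniFromHeader(header) {
  return (header[4] << 16) | (header[5] << 8) | header[6];
}

const header = vxlanHeader(5000);
console.log(Array.from(header, b => b.toString(16).padStart(2, "0")).join(" "));
// 08 00 00 00 00 13 88 00
console.log(vniFromHeader(header)); // 5000
```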

Neat, but that only shows layer 2 communication, which is easy

How about some routing, then? Two virtual machines: 172.16.0.2 on VNI 5000 and 172.16.10.2 on VNI 5001, both on the same ESXi host:

[Screenshot: Traceflow between VMs on two logical switches]

Look at that: now we see the logical router in the mix, taking the traffic from logical switch LS-172.16.0 and routing it to logical switch LS-172.16.10. Suddenly the flow of traffic is not a mystery.
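The three Traceflow runs so far follow a pattern that can be captured in a hypothetical helper. It predicts which components should show up for a given source and destination. The observation labels and host names are my own shorthand, not Traceflow’s actual output fields.

```javascript
// Hypothetical helper: the labels are shorthand, not real Traceflow fields.
function expectedPath(src, dst) {
  const path = [`Injected at ${src.ip} vNIC`, `DFW on ${src.ip} vNIC`];
  if (src.vni !== dst.vni) {
    // Different logical switches: the logical router has to route the packet.
    path.push(`Logical router: VNI ${src.vni} -> VNI ${dst.vni}`);
  }
  if (src.host !== dst.host) {
    // Different ESXi hosts: VTEP encapsulation on one side, decapsulation on the other.
    path.push(`VTEP on ${src.host} encapsulates into VNI ${dst.vni}`);
    path.push(`VTEP on ${dst.host} decapsulates`);
  }
  path.push(`DFW on ${dst.ip} vNIC`, `Delivered to ${dst.ip}`);
  return path;
}

const vmA = { ip: "172.16.0.2", vni: 5000, host: "esxi-01" };
const sameHost = { ip: "172.16.0.3", vni: 5000, host: "esxi-01" };
const otherHost = { ip: "172.16.0.3", vni: 5000, host: "esxi-02" };
const otherVni = { ip: "172.16.10.2", vni: 5001, host: "esxi-01" };

for (const dst of [sameHost, otherHost, otherVni]) {
  console.log(expectedPath(vmA, dst).join(" | "));
}
```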

What if the firewall is blocking the traffic?

I assumed you would ask so here is a new firewall rule I added:

[Screenshot: the new firewall rule]

And the traceflow:

[Screenshot: Traceflow after the firewall rule was added]

Yep, my packet was dropped, and Traceflow tells me where and which rule number blocked it.

What is the only problem with traceflow?

It does not show the traffic flow on my physical network. This should be very simple, given that all my NSX traffic is routed: we should not have complex layer 2 stretches or lots of VLANs to make sure are in place. It’s just routed communication that can start at the top of rack with the correct design.

NSX Controller forever deploying never working

I ran into an issue with NSX in my home lab: a new NSX controller was deploying when a power outage interrupted the deployment. This left the controller stuck in the deploying state. After waiting a day with no change, I removed the inoperable virtual machine from vCenter, but the issue persisted in NSX.

[Screenshot: controller-18 stuck in deploying]

As you can see, controller-18 is forever deploying. The NSX Manager command line also showed it as deploying, so it’s a database issue somewhere. Because the deployment action was still in place, it was impossible to remove the controller, and cluster health was bad with only two controllers. I don’t have any magic method for working with the NSX Manager database (I assume it’s Postgres, but I really don’t know) other than the API. So off to the API I went. First I wanted to query all controllers to make sure I had the correct name (the ID field in the picture above). So I set up my REST connection with a GET request to:

https://IP_of_Manager/api/2.0/vdn/controller

[Screenshot: response from the controller query]

The query confirmed that the ID is indeed controller-18. Once I knew the controller ID, it was a simple DELETE request with the right URL:

[Screenshot: the DELETE request in the REST client]

Removal via

https://nsx_manager_ip/api/2.0/vdn/controller/controller-18?forceRemoval=True
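If you would rather script the removal than use a REST client, here is a minimal Node.js 18+ sketch of the same DELETE call. The manager host name and credentials are placeholders. The sketch assumes HTTP basic authentication against NSX Manager, and it keeps `forceRemoval=True` exactly as in the URL above. It is not a vRO scriptable task: `Buffer` and the global `fetch` are Node.js features.

```javascript
// Builds the force-removal call. Host name and credentials are placeholders.
function buildControllerDelete(manager, controllerId, user, password) {
  const url = `https://${manager}/api/2.0/vdn/controller/` +
    `${encodeURIComponent(controllerId)}?forceRemoval=True`;
  const auth = Buffer.from(`${user}:${password}`).toString("base64");
  return {
    url,
    options: { method: "DELETE", headers: { Authorization: `Basic ${auth}` } },
  };
}

// Sends the request with the global fetch (Node.js 18+). A self-signed NSX Manager
// certificate must be trusted by Node first (for example via NODE_EXTRA_CA_CERTS),
// or the TLS handshake will fail.
async function deleteController(manager, controllerId, user, password) {
  const { url, options } = buildControllerDelete(manager, controllerId, user, password);
  const response = await fetch(url, options);
  if (!response.ok) {
    throw new Error(`DELETE ${url} failed: HTTP ${response.status} ${await response.text()}`);
  }
  return response.status;
}
```

Usage would look like `await deleteController("nsx.example.local", "controller-18", "admin", "changeme")`, with your own manager address and credentials.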

 

After this command I queried again to confirm it was gone:

[Screenshot: the controller query after removal]

Since controller-18 was my last controller, the query returned nothing. Hopefully, if you have an NSX controller stuck in deploying, this article will help you remove it.
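The first lookup and the final check are the same GET against /api/2.0/vdn/controller, so the check is easy to script. This sketch assumes the response lists each controller as a `<controller>` element with `<id>` and `<status>` children; verify that against your own output. It uses a simple regex instead of a full XML parser. The sample XML is hypothetical, not captured from my lab.

```javascript
// Assumed response shape (check it against your NSX version):
// <controllers><controller><id>controller-1</id><status>RUNNING</status>...</controller>...</controllers>
function parseControllers(xml) {
  const controllers = [];
  const blockRe = /<controller>([\s\S]*?)<\/controller>/g;
  let match;
  while ((match = blockRe.exec(xml)) !== null) {
    const body = match[1];
    // Pull the text content of a simple child element, or null if absent.
    const field = name => {
      const m = new RegExp(`<${name}>([^<]*)</${name}>`).exec(body);
      return m ? m[1] : null;
    };
    controllers.push({ id: field("id"), status: field("status") });
  }
  return controllers;
}

// Hypothetical samples: before and after the forced removal.
const before = "<controllers>" +
  "<controller><id>controller-16</id><status>RUNNING</status></controller>" +
  "<controller><id>controller-18</id><status>DEPLOYING</status></controller>" +
  "</controllers>";
const after = "<controllers></controllers>";

const stuck = parseControllers(before).filter(c => c.status === "DEPLOYING").map(c => c.id);
console.log(stuck.join(", "));               // controller-18
console.log(parseControllers(after).length); // 0
```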

VCIX-NV VMware Certified Implementation Expert Network Virtualization Exam Experience

While attending VMworld, I have made a habit of taking advantage of the discounted certifications. Each year I have pushed into a new certification and given it a try. This year I set my sights on VCIX-NV. I normally like to schedule these tests during the general session to avoid losing VMworld sessions to a 3-hour test. (I can always watch the general sessions on YouTube later.) This year they closed my secret loophole, only allowing me to take it during Thursday’s general session. This had a profound impact on my plans for Wednesday night, because Wednesday night is the VMworld party, which is always really awesome. I am sad to say I skipped it 100% this year and spent the time studying and going to bed early. I am happy to announce I passed the exam and earned a VCP-NV and VCIX-NV on the same day. Here are some details on what I studied:

  • The VMware Hands-on Labs (HOL)
  • The blueprint documents (yes, I read all the documentation provided by VMware except the CLI guide, which is really more of a reference guide)
  • I was lucky to spend some time last year with Elver Sosa and Ron Flax (both VCDX-NVs), who helped me understand the technology
  • The Pluralsight course on NSX by Jason Nash (these are really good)
  • Time in the HOL doing things not on the list (like more firewall rules)

 

This test requires that you do a series of live exercises that can build upon each other.  Some time management tips are:

  • Skip ahead if allowed and see how the tasks fit together
  • Read carefully what is expected; there are a lot of tips
  • Do what you can; partial credit is still points (at least I think it is)
  • Spend time beforehand understanding how to troubleshoot NSX and verify your settings
  • Don’t be afraid to skip a question if you really don’t know the answer; time is not your friend

 

Like the VCAP-DCA test, it was something I really enjoyed doing. I really wish it didn’t have the time crunch, but it was a fun exam. The best advice I can give you is to read the blueprint and documents and use the VMware HOL to gain experience.