First Issue: my bad
I ran into an interesting issue in my home lab. I recently replaced all my older HP servers with Intel NUCs, and I could not be happier with the results. Once I had replaced all the ESXi hosts, I mounted the storage and started up my virtual machines, including vCenter. Once vCenter and NSX Manager were available, I moved all the ESXi hosts to the distributed switch. This normally routine process was complicated by NSX. I should have added the ESXi hosts to the transport zone, allowing NSX to join the distributed switch. Because I did not, the NSX VXLAN process failed and I could not prepare the hosts. Ultimately I removed the VXLAN entries from the distributed switch and then re-prepared the hosts, which re-created the VXLAN entries on the switch. (This is not a good idea in production, so follow the correct path there.)
Second Issue: nice to know
This process generated a second issue: the original cluster and datacenter on which my NSX Edges used to live were gone. I assumed that I could just redeploy the NSX Edges from the Manager. While this is true, the configuration assumes that it will be deploying the Edges to the same datacenter, resource pool, and potentially the same host as when they were created. So if I have a failure and expect to just bring up NSX Manager and redeploy to a new cluster, it will not work. You have to adjust the parameters for the Edges first, which you can do via the API or the GUI. I wanted to demonstrate the API method:
I needed to change the resource pool, datastore, and host for my Edge. I identified my Edge via its identifier in the GUI (edge-8 for me), grabbed my favorite REST tool (Postman), and formed a query for the current state:
GET https://{nsx-manager-ip}/api/4.0/edges/edge-8/appliances
This returned the configuration for this Edge device. If you need to identify all Edges, use:
GET https://{nsx-manager-ip}/api/4.0/edges
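If you would rather script the two queries above than click through a REST tool, here is a minimal Python sketch using only the standard library. The manager address (nsx.lab.local), the credentials, and the helper name build_get are made up for illustration, and I am assuming HTTP basic authentication. The request is only built here, not sent.

```python
import base64
import urllib.request


def build_get(manager, path, user, password):
    """Build (but do not send) a GET request for an NSX Manager API path."""
    # Assumption: the manager accepts HTTP basic authentication.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return urllib.request.Request(
        f"https://{manager}{path}",
        method="GET",
        headers={"Authorization": f"Basic {token}", "Accept": "application/xml"},
    )


# Hypothetical manager address and credentials, for illustration only.
one_edge = build_get("nsx.lab.local", "/api/4.0/edges/edge-8/appliances", "admin", "secret")
all_edges = build_get("nsx.lab.local", "/api/4.0/edges", "admin", "secret")

# Sending it needs a reachable NSX Manager; a lab with a self-signed
# certificate would also need a suitable ssl context passed to urlopen:
# with urllib.request.urlopen(one_edge) as resp:
#     print(resp.read().decode())
```

Building the request separately from sending it makes it easy to check the URL and headers before pointing the script at a real manager.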
Then I needed the VMware identifiers for the resource pool, datastore, and host. These can all be gathered via the REST API, but I went with PowerShell because it was faster for me. I used the following commands in PowerCLI:
get-vmhost | fl - returned host-881
get-resourcepool | fl - returned domain-c861
get-datastore | fl - returned datastore-865
Once I had these identifiers, I was ready to form my adjusted request body:
<appliances>
  <applianceSize>compact</applianceSize>
  <appliance>
    <highAvailabilityIndex>0</highAvailabilityIndex>
    <vcUuid>500cfc30-5b2a-6bae-32a3-360e0315ccd3</vcUuid>
    <vmId>vm-924</vmId>
    <resourcePoolId>domain-c861</resourcePoolId>
    <resourcePoolName>domain-c861</resourcePoolName>
    <datastoreId>datastore-865</datastoreId>
    <datastoreName>datastore-865</datastoreName>
    <hostId>host-881</hostId>
    <vmFolderId>group-v122</vmFolderId>
    <vmFolderName>NSX</vmFolderName>
    <vmHostname>esg1-0</vmHostname>
    <vmName>ESG-1-0</vmName>
    <deployed>true</deployed>
    <cpuReservation>
      <limit>-1</limit>
      <reservation>1000</reservation>
    </cpuReservation>
    <memoryReservation>
      <limit>-1</limit>
      <reservation>512</reservation>
    </memoryReservation>
    <edgeId>edge-9</edgeId>
    <configuredResourcePool>
      <id>domain-c26</id>
      <name>domain-c26</name>
      <isValid>false</isValid>
    </configuredResourcePool>
    <configuredDataStore>
      <id>datastore-31</id>
      <isValid>false</isValid>
    </configuredDataStore>
    <configuredHost>
      <id>host-29</id>
      <isValid>false</isValid>
    </configuredHost>
    <configuredVmFolder>
      <id>group-v122</id>
      <name>NSX</name>
      <isValid>true</isValid>
    </configuredVmFolder>
  </appliance>
  <deployAppliances>true</deployAppliances>
</appliances>
I used a PUT against https://{nsx-manager-ip}/api/4.0/edges/{edgeId}/appliances with the above body and a Content-Type of application/xml. Then I was able to redeploy my Edge devices without any trouble.
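For anyone with more than one Edge to fix, editing the XML by hand gets old quickly. Below is a rough Python sketch of swapping the three placement IDs in an appliances document. The function name retarget_appliances and the trimmed sample XML are mine, not something NSX provides, and the sketch only touches resourcePoolId, datastoreId, and hostId; I have not checked whether NSX requires the name fields to be updated as well.

```python
import xml.etree.ElementTree as ET


def retarget_appliances(xml_text, resource_pool_id, datastore_id, host_id):
    """Return the appliances XML with new placement IDs on every appliance."""
    root = ET.fromstring(xml_text)
    new_values = {
        "resourcePoolId": resource_pool_id,
        "datastoreId": datastore_id,
        "hostId": host_id,
    }
    for appliance in root.findall("appliance"):
        for tag, value in new_values.items():
            node = appliance.find(tag)
            if node is None:
                # Refuse to guess where a missing element should go.
                raise ValueError(f"appliance has no <{tag}> element")
            node.text = value
    return ET.tostring(root, encoding="unicode")


# Trimmed stand-in for a GET response, using the stale IDs from my lab.
current = """<appliances>
  <applianceSize>compact</applianceSize>
  <appliance>
    <highAvailabilityIndex>0</highAvailabilityIndex>
    <resourcePoolId>domain-c26</resourcePoolId>
    <datastoreId>datastore-31</datastoreId>
    <hostId>host-29</hostId>
  </appliance>
</appliances>"""

body = retarget_appliances(current, "domain-c861", "datastore-865", "host-881")
print(body)
```

The resulting string is what would go in the PUT body, sent with a Content-Type of application/xml. In practice you would feed the function the full GET response rather than the trimmed sample.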