Cross-site vMotion requires VMware virtual switch technology

Cross-site vMotion is a feature that really shows the power of the VMware platform. When combined with NSX you can move live, running virtual machines across long distances. It's a huge advantage for customers looking to balance workloads or avoid potential disasters. I learned today that this feature requires VMware's virtual standard switch or distributed switch; it will not work on any third-party switches today.

VSS = Virtual Standard Switch

VDS = Virtual Distributed Switch

There are only certain supported migration paths:

VSS -> VSS

VSS -> VDS

VDS -> VDS

Notice that VDS -> VSS is not supported.
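
As an illustration of a supported path, here is a minimal cross-vCenter Move-VM sketch in PowerCLI (6.0 R2 or later). It lands the VM on a distributed port group at the destination (a VSS -> VDS or VDS -> VDS move); all of the names below are placeholders for your environment, and you must be connected to both vCenters.

#Connect to both vCenters (placeholder addresses)
$srcVC = Connect-VIServer -Server "vcenter-a.lab.local"
$dstVC = Connect-VIServer -Server "vcenter-b.lab.local"

#Source VM and destination host/datastore (placeholder names)
$vm       = Get-VM -Name "app01" -Server $srcVC
$destHost = Get-VMHost -Name "esx10.lab.local" -Server $dstVC
$destDS   = Get-Datastore -Name "ds01" -Server $dstVC

#Map every adapter on the VM to a destination VDS port group
$adapters = Get-NetworkAdapter -VM $vm
$destPG   = Get-VDPortgroup -Name "PG-App" -Server $dstVC

#Perform the cross-vCenter vMotion
Move-VM -VM $vm -Destination $destHost -Datastore $destDS -NetworkAdapter $adapters -PortGroup $destPG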

Double your storage capacity without buying a new storage shelf

I spent a good portion of my career moving storage from one array to another.   The driver is normally something like this:

  • Cost of the older array (lifecycle timing)
  • New capacity, speed, or features

So off we went on another disruptive migration of LUNs and data. At one point I was sold on physical storage virtualization appliances. They stood in front of the array and allowed me to move data between arrays without interruption to the WWID or application. I loved them; what a great solution. Then Storage vMotion became available and 95% of the workloads were running in VMware. I no longer needed the storage virtualization appliance and my life became very VMware focused.

 

New storage paradigm

With the advent of all-flash arrays and HCI (all-flash or mixed), performance (speed) has almost gone away as a reason for moving data off arrays. Most arrays offer the same features, replication capability aside. So now we migrate to new arrays or storage shelves because of capacity or lifecycle issues. Storage arrays and their storage shelves have a real challenge with linear growth: they expect you to make a bet on the next three years of capacity. HCI allows a much better linear growth model for storage.

My HCI Gripe

My greatest gripe with HCI solutions is that everyone needs more storage, but that does not always mean you need more compute. Vendors that provide hardware-locked (engineered) platforms suffer from this challenge. The small box provides 10TB, the medium 20TB, and the large 40TB. Which do I buy if I need 30TB? I am once again stuck with the make-a-bet problem from arrays (at least it's a smaller bet). Software-based platforms, including VSAN (full disclosure: at the time of writing I work for VMware and have run VSAN in my home lab for three years), have the advantage of offering better mixed sizing and linear growth.

What about massive growth?

What happens when you need to double your storage with HCI and you don't have spare drive bays available? Do you buy a new set of compute and migrate to it? That's just a replay of the storage array replacement model… Recently at some meetings a friend from the Storage and Availability group let me know the VSAN solution to this problem. Quite simply, replace the drives in your compute with larger drives in a rolling fashion. You should keep clusters uniform, but it's totally possible to replace all current drives with new double-capacity drives. Double the size of your storage for only the cost of the drives. (Doubling the size of the cache tier is a more complex operation.) Once the new capacity is available and the host is out of maintenance mode, VSAN migrates data onto the new disks.

What is the process?

It's documented in chapter 11 of the VSAN administration guide: https://pubs.vmware.com/vsphere-60/topic/com.vmware.ICbase/PDF/virtual-san-600-administration-guide.pdf

A high-level overview of the steps (please use the official documentation; a rough PowerCLI sketch follows the list):

  1. Place the host in maintenance mode
  2. Remove the disk from the disk group
  3. Replace the disk you removed with the new, higher-capacity drive
  4. Rescan for drives
  5. Add the new disk back into the disk group
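
For those who like to script it, here is a rough PowerCLI sketch of a single host's pass. It assumes the vSAN cmdlets in the VMware.VimAutomation.Storage module (Get-VsanDiskGroup, Get-VsanDisk, Remove-VsanDisk, New-VsanDisk) and a single disk group per host; the host name and the naa.* canonical names are placeholders, so verify each cmdlet against your PowerCLI version and treat the official guide as the source of truth.

#Placeholder host name
$vmhost = Get-VMHost "esx01.lab.local"

#1. Place the host in maintenance mode (EnsureAccessibility keeps VM data reachable)
Set-VMHost -VMHost $vmhost -State Maintenance -VsanDataMigrationMode EnsureAccessibility

#2. Remove the old capacity disk from its disk group (assumes one disk group)
$dg      = Get-VsanDiskGroup -VMHost $vmhost
$oldDisk = Get-VsanDisk -VsanDiskGroup $dg | Where-Object { $_.CanonicalName -eq "naa.OLD_DISK_ID" }
Remove-VsanDisk -VsanDisk $oldDisk -Confirm:$false

#3. Physically swap the drive, then rescan (step 4)
Get-VMHostStorage -VMHost $vmhost -RescanAllHba | Out-Null

#5. Add the new, larger disk back into the disk group
New-VsanDisk -CanonicalName "naa.NEW_DISK_ID" -VsanDiskGroup $dg

#Exit maintenance mode; vSAN will start using the new capacity
Set-VMHost -VMHost $vmhost -State Connected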

 

Migrating off a distributed virtual switch to a standard switch: Article 2

Normally people want to migrate from virtual standard switches to distributed switches. I am a huge fan of the distributed switch and feel it should be used everywhere. The distributed switch becomes a challenge when you want to migrate hosts to a new vCenter. I have seen a lot of migrations to new vCenters done by detaching the ESXi hosts and connecting them to the new vCenter. This process works great, assuming you are not using the distributed switch. Removing or working with VMs on a ghosted VDS is a real challenge, so remove it before you migrate to a new vCenter.

In this multi-article solution I’ll provide some steps to migrate off a VDS to a VSS.

Article 2: Migrating the host off the VDS. In the last article we moved all the virtual machines off the VDS to a VSS. We now need to migrate the vMotion and management interfaces off the VDS to a VSS. This step will interrupt management of the ESXi host. Virtual machines will not be interrupted, but the management and vMotion interfaces will be. You must have console access to the ESXi host for this to work. Steps at a glance:

  1. Confirm that a VSS port group exists for management and vMotion
  2. Remove vMotion, etc. from the VDS and add to the VSS
  3. Remove management from the VDS and add to the VSS
  4. Confirm settings

Confirm that a VSS port group exists for management and vMotion

Before you begin, examine the VSS to confirm that the management and vMotion port groups were created correctly by Article 1's script. Once you're sure the VLAN settings for the port groups are correct, you can move to the next step. You may also want to confirm your host isolation settings: it's possible these steps will trigger an HA isolation response if you take too long to switch over and don't have independent datastore networking. Best practice would be to disable HA or switch the isolation response to leave powered on.
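
If you want a quick scripted spot-check, this hedged PowerCLI snippet lists the two port groups with their VLAN IDs, reusing $standardSwitchName from the Article 1 script (the host name is a placeholder; PG-Mgmt and PG-vMotion are the port group names used in the commands below):

$vmhost = Get-VMHost "esx01.lab.local"

Get-VirtualSwitch -VMHost $vmhost -Name $standardSwitchName |
    Get-VirtualPortGroup |
    Where-Object { "PG-Mgmt","PG-vMotion" -contains $_.Name } |
    Select-Object Name, VLanId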

Remove vMotion, etc. from the VDS and add to the VSS

Log in to the ESXi host via the console or SSH. (Comments are preceded with #.)

# Use the following command to identify the virtual adapters on your DVS

esxcfg-vswitch -l

# sample output from my home lab

DVS Name         Num Ports   Used Ports  Configured Ports  MTU     Uplinks

dvSwitch         1792        7           512               1600    vmnic1

 

  DVPort ID           In Use      Client

  675                 0

  676                 1           vmnic1

  677                 0

  678                 0

  679                 1           vmk0

  268                 1           vmk1

  139                 1           vmk2

 

# We can see we have three virtual adapters on our host. Use the following command to identify their use and IP addresses.

esxcfg-vmknic -l

# Sample output from my home lab (some details cut for readability)

Interface  Port Group/DVPort   IP Family IP Address     

vmk0       679                 IPv4      192.168.10.16                

vmk1       268                 IPv4      192.168.10.26                   

vmk2       139                 IPv4      192.168.10.22     

 

Align your vmk numbers with vCenter to identify which adapter provides which function (vmk0 management, vmk1 vMotion, vmk2 FT).
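
If you prefer PowerCLI to clicking through vCenter, this sketch shows each vmkernel adapter with its service flags (the host name is a placeholder):

Get-VMHost "esx01.lab.local" |
    Get-VMHostNetworkAdapter -VMKernel |
    Select-Object Name, IP, ManagementTrafficEnabled, VMotionEnabled, FaultToleranceLoggingEnabled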

 

# We can now move every adapter other than management, which in my case is vmk0
# We will start with vmk1, which is on dvSwitch port 268

esxcfg-vmknic -d -v 268 -s "dvSwitch"

 

# Then add vmk1 to vSwitch0 on port group PG-vMotion

esxcfg-vmknic -a -i 192.168.10.26 -n 255.255.255.0 -p PG-vMotion

 

Remove FT (vmk2) from the VDS and re-add it to the VSS:

esxcfg-vmknic -d -v 139 -s "dvSwitch"

 

esxcfg-vmknic -a -i 192.168.10.22 -n 255.255.255.0 -p PG-FT

 

Remove management from VDS and add to VSS

Remove management (this stage will interrupt management access to the ESXi host, so make sure you have console access). You might want to pre-type the add command in the console before you execute the remove. If you are having trouble getting a shell on an ESXi host, do the following:

  • Log in to the console and go to Troubleshooting Options -> Enable ESXi Shell

  • Press Alt-F1 to enter the shell and log in

 

Remove management:

esxcfg-vmknic -d -v 679 -s "dvSwitch"

 

Add management to VSS:

esxcfg-vmknic -a -i 192.168.10.16 -n 255.255.255.0 -p PG-Mgmt

 

Confirm settings

Ping the host to ensure management networking has returned. Give it a couple of minutes and ensure the host reconnects to vCenter. After you move the host to a new vCenter you can remove the ghosted VDS via:

  • Go to the host in vCenter and select the DVS; it should provide a remove option.
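
If you would rather script that cleanup, here is a hedged PowerCLI equivalent; run it only after every vmkernel adapter and virtual machine is off the VDS (the host name is a placeholder, the switch name matches the examples above):

$vmhost = Get-VMHost "esx01.lab.local"

Get-VDSwitch -Name "dvSwitch" | Remove-VDSwitchVMHost -VMHost $vmhost -Confirm:$false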

 

 

 

Migrating off a distributed virtual switch to a standard switch: Article 1

Normally people want to migrate from virtual standard switches to distributed switches. I am a huge fan of the distributed switch and feel it should be used everywhere. The distributed switch becomes a challenge when you want to migrate hosts to a new vCenter. I have seen a lot of migrations to new vCenters done by detaching the ESXi hosts and connecting them to the new vCenter. This process works great, assuming you are not using the distributed switch. Removing or working with VMs on a ghosted VDS is a real challenge, so remove it before you migrate to a new vCenter.

In this multi-article solution I’ll provide some steps to migrate off a VDS to a VSS.

It's important to understand that, assuming the networking is correct, this process should not interrupt customer virtual machines. The move from a distributed switch to a standard switch will at most lose a single ping: when you assign a new network adapter, a gratuitous ARP is sent out of the new adapter. If you only have two network adapters, this process does remove network adapter redundancy during the move.

Step 1: Create a VSS with the same port groups

You need to create a standard switch with port groups on the correct VLAN IDs. You can do this manually, but one of the challenges of the standard switch is that the port group names must match exactly, including case, to avoid vMotion errors. (One great reason for the VDS.) So we need a script to create the standard switch and port groups. We'll use PowerCLI (sorry, Orchestrator friends, I didn't do it in Orchestrator this time).

Code:

#Import modules for PowerCLI
    Import-Module -Name VMware.VimAutomation.Core
    Import-Module -Name VMware.VimAutomation.Vds

    #Variables to change
    $standardSwitchName = "StandardSwitch"
    $dvSwitchName = "dvSwitch"
    $cluster = "Basement"
    $vCenter = "192.168.10.14"

    #Connect to vCenter
    Connect-VIServer -Server $vCenter

    #Get all DVS port groups with their names and VLAN IDs
    $dvsPGs = Get-VirtualSwitch -Name $dvSwitchName | Get-VirtualPortGroup | Select Name, @{N="VLANId";E={$_.Extensiondata.Config.DefaultPortConfig.Vlan.VlanId}}, NumPorts

    #Get all ESXi hosts in a cluster
    $vmhosts = Get-Cluster -Name $cluster | Get-VMHost

    #Loop ESXi hosts
    foreach ($vmhost in $vmhosts)
    {
        #Create the new VSS
        $vswitch = New-VirtualSwitch -VMHost $vmhost -Name $standardSwitchName -Confirm:$false

        #Loop through the DVS port groups and create each one on the VSS
        foreach ($dvsPG in $dvsPGs)
        {
            #Only create port groups whose VLAN ID is a number (the DVUplink port group returns an array)
            if ($dvsPG.VLANId -is [int])
            {
                New-VirtualPortGroup -Name $dvsPG.Name -VirtualSwitch $vswitch -VlanId $dvsPG.VLANId -Confirm:$false
            }
        }
    }

 

Explained:  

  • Provide variables

  • Connect to vCenter

  • Get all port groups into $dvsPGs

  • Get all ESXi hosts

  • Loop through ESXi hosts one at a time

  • Create the new standard switch

  • Loop through the port groups and create them with the same name and VLAN ID as the DVS

 

This will create a virtual standard switch with the same VLAN and port group configuration as your DVS.    

 

I like to be able to validate that the source and destination are configured the same, so this PowerCLI script does the checking:

Code:

#Validation check: compare DVS port groups to the VSS on each host

    #Get all DVS port groups with their names and VLAN IDs
    $dvsPGs = Get-VirtualSwitch -Name $dvSwitchName | Get-VirtualPortGroup | Select Name, @{N="VLANId";E={$_.Extensiondata.Config.DefaultPortConfig.Vlan.VlanId}}, NumPorts

    #Get all ESXi hosts in a cluster
    $vmhosts = Get-Cluster -Name $cluster | Get-VMHost

    #Loop ESXi hosts
    foreach ($vmhost in $vmhosts)
    {
        #Get the VSS port groups for this host
        $VSSPortGroups = $vmhost | Get-VirtualSwitch -Name $standardSwitchName | Get-VirtualPortGroup

        #Loop through the DVS port groups and look for a match on the VSS
        foreach ($dvsPG in $dvsPGs)
        {
            #Skip the DVUplink port group (its VLAN ID is an array, not an int)
            if ($dvsPG.VLANId -is [int])
            {
                $match = $FALSE
                foreach ($VSSPortGroup in $VSSPortGroups)
                {
                    if ($dvsPG.Name -eq $VSSPortGroup.Name)
                    {
                        $match = $TRUE
                    }
                }
                if ($match -eq $FALSE)
                {
                    Write-Host "Did not find a match for DVS: "$dvsPG.Name" on "$vmhost.Name
                }
            }
        }
    }

 

Explained:

  • Get the DVS port groups

  • Get all ESXi hosts

  • Loop through VM hosts

  • Get port groups on standard switch

  • Loop through the DVS port groups and look for matches on the standard switch

  • If a port group is missing, output the missing element

 

 

Now we need to give the standard switch an uplink; this is critical, otherwise VMs will fail when moved.
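
Here is a sketch of handing one spare physical NIC per host to the new VSS, reusing $vmhosts and $standardSwitchName from the script above; vmnic2 is a placeholder, so pick a NIC that is actually unused on your hosts:

foreach ($vmhost in $vmhosts)
{
    #Grab this host's new standard switch and a free physical NIC (placeholder vmnic2)
    $vss   = Get-VirtualSwitch -VMHost $vmhost -Name $standardSwitchName
    $vmnic = Get-VMHostNetworkAdapter -VMHost $vmhost -Physical -Name "vmnic2"

    #Attach the physical NIC as an uplink for the standard switch
    Add-VirtualSwitchPhysicalNetworkAdapter -VirtualSwitch $vss -VMHostPhysicalNic $vmnic -Confirm:$false
}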

 

Once it has an uplink you can use the following script to move all virtual machines:

 

Code:

#Move virtual machines to the new standard switch

    #Get all virtual machines (use Get-VM "name-of-vm" to test a single VM first)
    $vms = Get-VM

    foreach ($vm in $vms)
    {
        #Grab the standard switch on this VM's host
        $vss = Get-VirtualSwitch -Name $standardSwitchName -VMHost $vm.VMHost

        #Check that the virtual switch has at least one physical adapter
        if ($vss.ExtensionData.Pnic.Count -gt 0)
        {
            #Get the VM's network adapters
            $adapters = $vm | Get-NetworkAdapter

            #Loop through adapters
            foreach ($adapter in $adapters)
            {
                #Get the VSS port groups with the same name (returns the port group from every host)
                $VSSPortGroups = Get-VirtualPortGroup -Name $adapter.NetworkName -VirtualSwitch $standardSwitchName

                #Find the port group that lives on this VM's host
                foreach ($VSSPortGroup in $VSSPortGroups)
                {
                    if ([string]$VSSPortGroup.VMHostId -eq [string]$vm.VMHost.Id)
                    {
                        #Change the network adapter to the standard switch port group
                        Set-NetworkAdapter -NetworkAdapter $adapter -Portgroup $VSSPortGroup -Confirm:$false
                    }
                }
            }
        }
    }

 

Explained:  

  • Uses the same variables from the previous script

  • Get all virtual machines (you could use get-vm "name-of-vm" to test a single VM)

  • Loop through all virtual machines one at a time

  • Get the VSS for the VM (host specific)

  • Check for at least one physical uplink on the switch (gut / sanity check)

  • Loop through the adapters on a virtual machine

  • For each adapter, get the matching VSS port group by name and switch the adapter to it

 

 

 

 

 

Repoint 6.x vCenter to a new PSC

Since a vCenter has a connection to a single PSC, it's important to understand how to move between PSCs and how to deploy new ones when old ones have failed. This article details that mobility and process.

 

Once installed, check that you have a working vCenter.

Then log in via SSH and check which PSC is being used.
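
On a 6.x vCenter Server Appliance, one way to check which PSC the vCenter points to is the vmafd-cli utility; the path below is my assumption for a 6.x appliance, so verify it on your build:

/usr/lib/vmware-vmafd/bin/vmafd-cli get-dc-name --server-name localhost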

 

Let’s repoint it to psc2.griffiths.local

cmsso-util repoint --repoint-psc psc2.griffiths.local

 

Now we are pointing to psc2 at site1. In 6.0 you were able to repoint a vCenter to a PSC at a different site; this is no longer available in 6.5. (Yep, no longer possible. Remember this: trying a cross-site repoint in 6.5 can cause some really bad stuff.)

As you can see, we have repointed the vCenter from psc1 to psc2 at the same site.

So what do you do when all your PSCs at a site have failed? (First off: don't run just a single PSC at a site.)

Install a new PSC pointing to a PSC at a remaining site. We will use psc3 at site2 to create a new psc5 at site1. In order to test this I shut down psc1 and psc2 to simulate failures.


So we are creating psc5 at site1, replicating from psc3 at site2.

 

After the PSC is installed it will replicate with psc3.griffiths.local only. We can then repoint vc1 to psc5 and rebuild the missing PSCs at site1. We have to make sure psc5 was deployed correctly first by visiting its web page.

Now we can repoint the vCenter to psc5 at site1, using the same repoint command as before:
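
cmsso-util repoint --repoint-psc psc5.griffiths.local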