Deep Dive Multi-nic vMotion

Update: See the comments below; this article is not in alignment with VMware support. At this time VMware support recommends using active/standby failover for multi-NIC vMotion. Thanks to David for bringing that to my attention.

What is vMotion?

Most of the IT world has heard of vMotion. If it's new to you, it's the feature that really put VMware on the map: the ability to move a virtual machine workload between compute nodes without interrupting the guest operating system. It allows you to migrate off failing hardware, or update hardware, without interruption to the customer's machine. Quite simply, it is awesome.

How does vMotion work?

I can only provide a basic overview; a lot of the details are intellectual property controlled by VMware. Normal vMotion takes advantage of the following:

  • Shared storage between cluster members (same LUNs or volumes)
  • Shared networking between cluster members (same VLANs)

The major portion of any virtual machine is data at rest on the shared storage. The only portions of a virtual machine not on storage are the execution state and active memory. vMotion creates a copy of these states (active memory and execution state, called the shadow VM) and transfers the copy over the network. When both copies are almost in sync, VMware stuns the operating system for microseconds to transfer the workload to the other compute node. Once transferred to the new compute node, the virtual machine sends out a gratuitous ARP to update the physical switches with the virtual machine's new location. Memory and execution state are transferred over the vMotion interface.
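
To make the pre-copy idea concrete, here is a deliberately simplified Python model of the "copy until almost in sync, then stun" loop. This is not VMware's actual algorithm (those details are VMware IP); the memory size, dirty rate, and stun threshold are made-up numbers for illustration.

  # Toy model of iterative memory pre-copy: ship memory in passes while the
  # guest keeps dirtying pages, and only stun once the remaining delta is tiny.
  def precopy_and_switchover(memory_gb, dirty_rate_gbps, link_gbps, stun_threshold_gb=0.1):
      remaining = memory_gb
      passes = 0
      while remaining > stun_threshold_gb and passes < 20:
          copy_seconds = remaining * 8 / link_gbps        # time to ship this pass
          remaining = dirty_rate_gbps / 8 * copy_seconds  # memory dirtied meanwhile
          passes += 1
      if remaining <= stun_threshold_gb:
          print(f"Converged after {passes} passes; stun, copy {remaining:.3f} GB, switch over.")
      else:
          print("Dirty rate too high for this link; the vMotion cannot converge.")

  precopy_and_switchover(memory_gb=64, dirty_rate_gbps=2, link_gbps=10)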

What is storage vMotion?

Locking into shared storage became a problem for a lot of larger customers. VMware addressed this issue by providing storage vMotion. Storage vMotion allows a guest operating system to move between similar compute nodes without shared storage between them, or between different storage on the same compute node. The only common requirements are networking and a similar execution environment (CPU instructions). The process is similar, except during the final stun both the active state and the final disk changes are moved. Storage vMotion works with two types of data: active and cold. Active data is any file that is read and writable. Cold data is any data that is not currently writable. Some examples of cold data are powered-off virtual machines or parent files of the currently active snapshot (only the active snapshot is considered active).

In 5.5, cold storage vMotion data is moved across the management network while active data uses the vMotion network. (If your management network and vMotion network share the same subnet, the lowest-numbered vmkernel NIC will be used for all vMotions, which is always management. Always separate your vMotion and management traffic with VLANs.)

In 6, the cold migration data is moved across the NFC protocol link (designated as Provisioning traffic in vSphere 6). NFC uses the management network unless you have a designated NFC link (it can be the vMotion interface). So as a design consideration, create an NFC-designated vmkernel NIC to avoid having management used. vMotion is used for all hot data.
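
As a rough illustration of that design consideration, the pyVmomi sketch below tags an existing vmkernel interface for Provisioning (NFC) traffic so cold data stays off the management network. The vCenter name, credentials, host, and vmk device are placeholder assumptions, and the 'vSphereProvisioning' nic type assumes vSphere 6.0 or later.

  # Hedged pyVmomi sketch: tag an existing vmkernel NIC for Provisioning (NFC)
  # traffic. Connection details and the vmk device are placeholder assumptions.
  import ssl
  from pyVim.connect import SmartConnect, Disconnect

  ctx = ssl._create_unverified_context()   # lab only; verify certificates in production
  si = SmartConnect(host='vcenter.lab.local', user='administrator@vsphere.local',
                    pwd='VMware1!', sslContext=ctx)
  host = si.content.searchIndex.FindByDnsName(dnsName='esxi01.lab.local', vmSearch=False)

  # 'vSphereProvisioning' is the nic type used for Provisioning/NFC traffic in vSphere 6.
  host.configManager.virtualNicManager.SelectVnicForNicType('vSphereProvisioning', 'vmk3')

  Disconnect(si)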

Storage vMotion can be offloaded to the array when the array supports VAAI and the movement is on the same array.

What is multi-nic vMotion?

Multi-NIC vMotion is the practice of using multiple NICs to transfer vMotion data. Why would you need more NICs?

  • Really large memory VMs (memory and execution state have to cross the wire)
  • Large storage vMotion jobs without shared storage (that data will cross the network)
  • Long-distance vMotion (vMotion across a larger distance than a traditional datacenter)

If multi-NIC vMotion is configured correctly, any vMotion job will be load balanced between all available links, increasing the bandwidth available to transfer data. Even a single-machine vMotion can take advantage of the multiple links, which can really help the speed of vMotions. Multi-NIC vMotion does have a cost: if you are moving a really large virtual machine, you could saturate your links. The easiest way to understand this is with an overly simple graphic.

[Diagram: two ESXi hosts, each with two 10Gbps uplinks to the same switch, load balancing one vMotion across both links]

For the sake of this explanation, we have two ESXi hosts, each connected via two 10Gbps links to the same network switch. Both are configured to use both links for vMotion. We initiate a vMotion between the source and destination. The virtual machine is very large, so the movement requires both links and the traffic is load balanced across both. Let's follow the movement:

  • vMotion is negotiated between both sides
  • Let's assume that my vMotion requires 7Gbps of traffic on each link, for a total of 14Gbps (not really possible, but the numbers are used for example)
  • The source starts using both links and throws 14Gbps at the destination
  • The destination has multi-NIC vMotion, so it can receive 14Gbps without any major issues

This plan makes a pretty big assumption: that the source and destination are both able to allocate 14Gbps for the vMotion without affecting current workloads. This is a really bad assumption, and it is why VMware introduced Network I/O Control (NIOC). NIOC provides a method for controlling outbound traffic across links when under contention. Essentially, you give each traffic type a share value (0-100), and during contention each active traffic type gets its calculated share. For example, if I allocated the following:

  • Management 20
  • vMotion 20
  • VM 60

Assume that my ESXi host is only using management and VM traffic during a time of contention on a single 10Gbps link. I would get the following (the same math is worked in code after the list):

  • Total shares: Management 20 + VM 60 = 80
  • Allocated bandwidth per share: 10Gbps / 80 = 0.125Gbps
    • Management allocated 2.5Gbps (0.125 * 20)
    • VM allocated 7.5Gbps (0.125 * 60)
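
Here is the same share math as a few lines of Python, so you can plug in your own values; the link speed and shares are just the example numbers above.

  # NIOC share math from the example above: only active traffic types count,
  # and one link's bandwidth is divided per share value during contention.
  def nioc_allocation(link_gbps, shares, active):
      total_shares = sum(shares[t] for t in active)
      per_share = link_gbps / total_shares
      return {t: round(per_share * shares[t], 3) for t in active}

  shares = {'Management': 20, 'vMotion': 20, 'VM': 60}
  print(nioc_allocation(10, shares, active=['Management', 'VM']))
  # {'Management': 2.5, 'VM': 7.5}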

This is calculated per link, not system-wide. This works really well to control traffic at the source but fails to protect the destination. NIOC has no way to control incoming traffic. For example:

[Diagram: a busy destination host with only 1Gbps free per link being flooded by a 14Gbps incoming vMotion]

Let's assume that the destination host is very busy and only has 1Gbps per link not in use, while the source has 10Gbps available per link for the vMotion. The source initiates the vMotion and floods the destination with 14Gbps of traffic. Now packets are getting dropped for every type of traffic on the destination ESXi host. This creates a critical problem: you cannot control all sources of network traffic into your host. To combat this issue, VMware provided limits on network traffic. Limits allow you to identify types of traffic and have ESXi throttle that traffic when it becomes too much. This overloading is not unique to multi-NIC vMotion, but it can be compounded quickly by the load multiple NICs can provide.
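
To put rough numbers on why a limit is the only lever that protects the receive side, here is a back-of-the-envelope sketch using the figures above; the 2.5Gbps per-link limit is just an assumed value for illustration.

  # Back-of-the-envelope model of the destination-flooding problem above.
  # NIOC shares only shape what a host sends, so the receiver is protected only
  # by a limit that the sending hosts apply to their own vMotion traffic.
  def destination_overload(free_gbps_per_link, links, incoming_vmotion_gbps):
      spare = free_gbps_per_link * links
      return max(0.0, incoming_vmotion_gbps - spare)

  # Unthrottled source: 14Gbps of vMotion lands on a host with 1Gbps free per link.
  print(destination_overload(1, 2, 14))        # 12.0 Gbps of excess -> drops for all traffic

  # With an assumed 2.5Gbps vMotion limit per link on the sending host,
  # the worst case arriving here is 2 links * 2.5Gbps = 5Gbps.
  print(destination_overload(1, 2, 2.5 * 2))   # 3.0 Gbps of excess -- smaller blast radius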

 

How do I setup Multi-nic vMotion?

It is very much like iSCSI port binding: you set up each vmkernel interface with its own IP address and bind it to a single uplink. So if you have two uplinks, you need two IP addresses and two vmkernel interfaces for vMotion, each bound to a single uplink with no failover. If an uplink is removed, that vmkernel interface will not be used for vMotion.
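
If you would rather script it than click through the UI, the pyVmomi sketch below shows that binding on a standard vSwitch. The vSwitch name, uplinks, VLAN, IP addresses, and credentials are all placeholder assumptions; the important part is that each port group has exactly one active uplink and an empty standby list.

  # Hedged pyVmomi sketch: two vMotion vmkernel interfaces, each pinned to a
  # single uplink with no standby, mirroring the iSCSI-style binding described
  # above. Switch name, uplinks, VLAN, and addressing are placeholder values.
  import ssl
  from pyVim.connect import SmartConnect, Disconnect
  from pyVmomi import vim

  ctx = ssl._create_unverified_context()   # lab only; verify certificates in production
  si = SmartConnect(host='vcenter.lab.local', user='administrator@vsphere.local',
                    pwd='VMware1!', sslContext=ctx)
  host = si.content.searchIndex.FindByDnsName(dnsName='esxi01.lab.local', vmSearch=False)
  net_sys = host.configManager.networkSystem
  vnic_mgr = host.configManager.virtualNicManager

  bindings = [('vMotion-1', 'vmnic0', '192.168.50.11'),
              ('vMotion-2', 'vmnic1', '192.168.50.12')]

  for pg_name, uplink, ip in bindings:
      # Port group with exactly one active uplink and no standby uplinks.
      teaming = vim.host.NetworkPolicy.NicTeamingPolicy(
          policy='failover_explicit',
          nicOrder=vim.host.NetworkPolicy.NicOrderPolicy(activeNic=[uplink], standbyNic=[]))
      pg_spec = vim.host.PortGroup.Specification(
          name=pg_name, vlanId=50, vswitchName='vSwitch0',
          policy=vim.host.NetworkPolicy(nicTeaming=teaming))
      net_sys.AddPortGroup(portgrp=pg_spec)

      # vmkernel interface on that port group, then tag it for vMotion traffic.
      vnic_spec = vim.host.VirtualNic.Specification(
          ip=vim.host.IpConfig(dhcp=False, ipAddress=ip, subnetMask='255.255.255.0'))
      vmk = net_sys.AddVirtualNic(portgroup=pg_name, nic=vnic_spec)
      vnic_mgr.SelectVnicForNicType('vmotion', vmk)

  Disconnect(si)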

Should I use Multi-nic vMotion?

This is a great question, and the answer is: it depends. I personally think the configuration effort to implement multi-NIC vMotion is minimal, but it could be a major problem in larger shops. All of the problems with multi-NIC vMotion are present with standard vMotion. You really should consider NIOC, and potentially limits, for any design that is not grossly oversized. If you plan on using multi-NIC vMotion, I think you need NIOC and limits, at least a limit on vMotion traffic. Let me know what you think and your experience with this feature.

 

6 Replies to “Deep Dive Multi-nic vMotion”

    1. Thanks for reading. I agree with your statement that my recommendation is not the same as the KB. My concern is the failure scenario. If you use failover as per the KB, then you have to ensure that limits are absolutely set on the vMotion links, or you may find that your vMotion interfaces are using way too much bandwidth from an incoming vMotion. When a failure occurs and a single vMotion link is left, the traffic limit can still be set sensibly and predictability achieved. For example:

      Assume I have two links, both 10Gbps, with two vMotion interfaces. vMotion1 is tied to uplink1 with uplink2 as standby. vMotion2 is tied to uplink2 with uplink1 as standby. I have a limit of 5Gbps on each vMotion vmk. Remember that NIOC only works for outgoing traffic; the only way to control incoming traffic is via limits. So assume that I am vMotioning a large VM from another host, and my destination host has an uplink1 failure. Both vMotion interfaces now fail over to uplink2, so it's possible for the incoming vMotion to eat up all of that link's bandwidth (10Gbps) and cause VMs and management to be unreachable. You could control this by putting a 2.5Gbps limit on each vmk, but then when both links are up you are limited to 5Gbps total vMotion instead of 10Gbps (with the 5Gbps limit). Adding more links does not increase the number of vMotions; it should increase the total bandwidth available. So if you use a 5Gbps limit with vMotion1 tied to uplink1 only and vMotion2 tied to uplink2 only, you get the same healthy-state bandwidth with more predictability and control.
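
      A rough sketch of those numbers (all values in Gbps, using the example limits above):

        # Failover tradeoff described above, all values in Gbps.
        link = 10
        vmk_limit = 5                      # per-vmk vMotion limit

        # Healthy with active/standby: one limited vmk per uplink.
        print('healthy vMotion per uplink:', vmk_limit, 'of', link)

        # uplink1 fails with active/standby: both vmks land on uplink2.
        print('after failover, uplink2 vMotion:', 2 * vmk_limit, 'of', link)   # saturated

        # Dropping the limit to 2.5 protects the failover case but halves healthy bandwidth.
        print('2.5 limit, healthy total vMotion:', 2 * 2.5)

        # Explicit single-uplink binding: a failed uplink just removes one vmk,
        # so the survivor still tops out at its 5Gbps limit.
        print('single-uplink binding after failure:', vmk_limit, 'of', link)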

      With all that being said, there is a reason why VMware support has documented this setting, and I would follow it for support reasons if nothing else. You cannot point to my article as a source of supported truth when it is in opposition to VMware support KBs.

  1. Do you recommend that we use separate vSwitches for each port group/uplink, or put both port groups/uplinks in the same vSwitch? For my multi-NIC iSCSI setup, I have a separate vSwitch for each port group/uplink, which makes the configuration more straightforward. Can the same approach be used for multi-NIC vMotion?

    1. Good question. Technically, if you are sharing the uplinks with other traffic then you should use the same switch; otherwise it's just your choice for manageability.
