Deep Dive: Multi-nic vMotion

Update:  See the comments below; this article is not in alignment with VMware support.  At this time VMware support recommends using active/standby for multi-nic vMotion failover.   – Thanks David for bringing that to my attention.

What is vMotion?

Most of the IT world has heard of vMotion.  If it's new to you, it's the feature that really put VMware on the map: the ability to move a virtual machine workload between different compute nodes without interrupting the guest operating system.  It allows you to migrate off failing hardware, or update hardware, without interruption to the customer's machine.  Quite simply, it is awesome.

How does vMotion work?

I can only provide a basic overview; a lot of the details are intellectual property controlled by VMware. Normal vMotion takes advantage of the following things:

  • Shared storage between cluster members (same LUNs or volumes)
  • Shared networking between cluster members (same VLANs)

The major portion of any virtual machine is data at rest on the shared storage.  The only portion of a virtual machine not on the storage is the execution state and active memory.   vMotion creates a copy of these states (the active memory and execution state, called the shadow VM) and transfers it over the network via the vMotion interface.   When both copies are almost in sync, VMware stuns the operating system for a fraction of a second to transfer the workload to the other compute node.   Once transferred to the new compute node, the virtual machine sends out a gratuitous ARP to update the physical switches with the virtual machine's new location.

What is storage vMotion?

Being locked into shared storage became a problem for a lot of larger customers.   VMware addressed this issue by providing storage vMotion.  Storage vMotion allows a guest operating system to move between similar compute nodes without shared storage between them, or between different storage on the same compute node.   The only common requirements are networking and a similar execution environment (CPU instructions).   The process is similar, except during the final stun both active state and final disk changes are moved.   Storage vMotion has two types of data: active and cold.   Active data is files that are readable and writable.  Cold data is any data that is not currently writable.   Some examples of cold data are powered-off virtual machines or parent files of the currently active snapshot (only the active snapshot is considered active).

In 5.5, storage vMotion data that is cold is moved across the management network while active data uses the vMotion network.  (If your management network and vMotion network share the same subnet then the lowest-numbered vmkernel nic will be used for all vMotions – which is always management – so always separate your vMotion and management traffic with VLANs.)

In 6, the cold migration data is moved across the NFC protocol link (designated as Provisioning traffic in vSphere 6).  NFC uses the management network unless you have a designated NFC link (it can be the vMotion interface).  So as a design consideration, create an NFC-designated vmkernel nic to avoid having the management network used.  vMotion is used for all hot data.

Storage vMotion can be offloaded to the array when the array supports VAAI and the movement is on the same array.

What is multi-nic vMotion?

Multi-nic vMotion is the practice of using multiple nics to transfer vMotion data.  Why would you need more nics?

  • Really large memory VMs (memory and execution state have to cross the wire)
  • Large storage vMotion jobs without shared storage (that data will cross the network)
  • Long distance vMotion (vMotion across larger distances than a traditional datacenter)

If multi-nic vMotion is configured correctly, any vMotion job will be load balanced between all available links, increasing the bandwidth available to transfer data.   Even a single-machine vMotion can take advantage of the multiple links, which can really help the speed of vMotions.  Multi-nic vMotion does have a cost: if you are moving a really large virtual machine you could saturate your links.    The easiest way to understand this is with an overly simple example.


For the sake of this explanation, we have two ESXi hosts, each connected via two 10Gbps links to the same network switch.   Both are configured to use both links for vMotion.  We initiate a vMotion between the source and destination.  The virtual machine is very large, so the movement requires both links and load balances traffic across them.  Let's follow the movement:

  • vMotion is negotiated between both sides
  • Let's assume that my vMotion requires 7Gbps of traffic on each link for a total of 14Gbps (not really realistic, but the numbers make the example easy)
  • Source starts using both links and throws 14Gbps at the destination
  • Destination has multi-nic so it can receive 14Gbps without any major issues

This plan makes a pretty big assumption: that the source and destination are both able to allocate 14Gbps for the vMotion without affecting current workloads.   This is a really bad assumption.   This is why VMware introduced Network I/O Control (NIOC).  NIOC provides a method for controlling outbound traffic across links when under contention.   Essentially you give each traffic type a share value (0-100), and during contention all traffic types that are active get their calculated share.   For example, if I allocated the following:

  • Management 20
  • vMotion 20
  • VM 60

Assume that my ESXi host is only using management and VM traffic during a time of contention; on a single 10Gbps link I would get:

  • Total shares (Management 20 + VM 60 = 80)
  • Allocated bandwidth per share (10Gbps / 80 shares = 0.125Gbps per share)
    • Management allocated 2.5Gbps (0.125 * 20)
    • VM allocated 7.5Gbps (0.125 * 60)
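
A quick way to sanity-check these numbers is to run the share math yourself.  The sketch below is just the per-uplink calculation described above in Python; the traffic types and share values are the example numbers from this section, not anything read from a live host.

```python
# Minimal sketch of the per-uplink NIOC share math described above.
# Shares and link speed are the example values from this section.

def nioc_allocation(link_gbps, active_shares):
    """Divide one uplink's bandwidth among the active traffic types."""
    total_shares = sum(active_shares.values())
    per_share = link_gbps / total_shares            # Gbps granted per share
    return {name: per_share * s for name, s in active_shares.items()}

# Management and VM traffic active on a single 10Gbps uplink
print(nioc_allocation(10, {"Management": 20, "VM": 60}))
# {'Management': 2.5, 'VM': 7.5}
```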

This is calculated per link, not system wide.    This works really well to control traffic on the source but fails to protect the destination.   NIOC has no way to control incoming traffic.  For example:


Let’s assume that the destination host is very busy and only have 1Gbps per link not in use while the source has 10Gbps available per link for the vMotion.   The source initiates the vMotion and floods the destination with 14Gbps of traffic.   Now packets are getting dropped for every time of traffic on the destination ESXi host.  This creates a critical problem.   You cannot control all sources of network traffic into your host.   In order to combat this issue VMware provided limits in network traffic.  This allows you to identify types of traffic and have ESXi throttle that traffic when it becomes too much.   This overloading is not unique to multi-nic vMotion but can be complicated quickly by the load multiple nics can provide.

 

How do I setup Multi-nic vMotion?

It is very much like iSCSI port binding: you set up each vmkernel interface with its own IP address and bind it to a single uplink.  So if you have two uplinks, you need two IP addresses and two vmkernel interfaces for vMotion, each bound to a single uplink with no failover.  If an uplink is removed, that vmkernel interface for vMotion will not be used.
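
If you manage hosts programmatically, the binding step can be scripted.  The pyVmomi sketch below simply tags two existing vmkernel interfaces for vMotion; the vCenter address, credentials and vmk names are placeholders, and it assumes each vmknic already sits on a port group whose teaming policy uses a single uplink.

```python
# Hedged pyVmomi sketch: mark two existing vmkernel interfaces for vMotion.
# Host name, credentials and vmk names are placeholders, not from the article.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
content = si.RetrieveContent()
view = content.viewManager.CreateContainerView(content.rootFolder,
                                               [vim.HostSystem], True)
host = view.view[0]                      # first ESXi host found, for brevity

# Assumes vmk1 and vmk2 already exist, each on a port group bound to one uplink.
for vmk in ("vmk1", "vmk2"):
    host.configManager.virtualNicManager.SelectVnicForNicType("vmotion", vmk)

Disconnect(si)
```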

Should I use Multi-nic vMotion?

This is a great question, and the answer is it depends.   I personally think the configuration effort to implement multi-nic vMotion is minimal, but it could be a major problem in larger shops.   All of the problems with multi-nic vMotion are present with standard vMotion.   You really should consider NIOC and potentially limits for any design that is not grossly oversized.   If you plan on using multi-nic vMotion, I think you need NIOC and limits, or at least a limit on vMotion traffic.    Let me know what you think and your experience with this feature.

 

Memory Management in ESXi 6

A good friend recently reviewed the What's New in vSphere 6 course and had some questions.  That generated a number of really great discussions and this blog article.   At about the same time I had a customer asking why their virtual machine was showing ballooning even though there was no memory pressure.   That generated some research and thought organization into this article.

Memory reclamation techniques

VMware uses a number of memory reclamation techniques when under pressure.   The methods get more expensive in terms of overall cost as the pressure increases.  I don't want to dedicate much time or duplicate others' blog entries.  The following processes are in use:

  • Transparent Page Sharing – This (pre 5.5 U4) used to share common pages between all guests on the same host.  Now pages are shared within each VM only, due to security concerns.  See this article
  • Guest Ballooning – This is an in-guest driver that asks the guest operating system for memory, forcing the guest to swap intelligently
  • Memory Compression – Pages are compressed and stored in a cache set up in main memory (sized at 10% of total memory by default)
  • Hypervisor Swapping – The hypervisor swaps pages to disk… major effect on performance.

minFree: the metric to rule them all

minFree is an internal metric used by ESXi to determine when the reclamation techniques should be used.   Each technique is triggered once a given percentage of minFree is reached.  minFree is based upon the total RAM available on an ESXi host.  You determine minFree using these rules:

  1. First 28GB of physical memory in host = 899MB
  2. Add 1% of remaining physical memory to the 899MB value in step 1

 

For example here are some common minFree numbers:

  • 28GB total RAM = 899MB minFree
  • 48GB total RAM = 1103.80MB minFree
  • 72GB total RAM = 1349.56MB minFree
  • 144GB total RAM = 2086.84MB minFree
  • 256GB total RAM = 3233.72MB minFree
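
The rule is simple enough to turn into a quick calculator.  The sketch below just restates the two steps above in Python (899MB for the first 28GB, plus 1% of the remainder, counting 1GB as 1024MB) and reproduces the values in the list.

```python
# Sketch of the minFree rule above: 899MB for the first 28GB of host RAM,
# plus 1% of whatever physical memory remains (1GB = 1024MB here).

def min_free_mb(total_ram_gb):
    if total_ram_gb <= 28:
        return 899.0
    return 899.0 + (total_ram_gb - 28) * 1024 * 0.01

for ram in (28, 48, 72, 144, 256):
    print(f"{ram}GB host -> minFree {min_free_mb(ram):.2f}MB")
# 28 -> 899.00, 48 -> 1103.80, 72 -> 1349.56, 144 -> 2086.84, 256 -> 3233.72
```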

Memory states in ESXi 5.5

Each version of ESXi has memory states that are tied to which technique gets used.  In 5.5 the states are as follows:

  • High – 100% of minFree – TPS
  • Soft – 64% of minFree – Ballooning
  • Hard – 32% of minFree – Memory compression
  • Low – 16% of minFree – Swapping

So on a 144GB host (minFree = 2086.84MB) it would look like this:

  • Free 2.03GB / Used 141.97GB – High state – TPS enabled
  • Free 1.30GB / Used 142.70GB – Soft state – Ballooning
  • Free 0.65GB / Used 143.35GB – Hard state – Memory compression
  • Free 0.32GB / Used 143.68GB – Low state – Swapping
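
The free and used figures above are just fixed percentages of minFree, so they are easy to reproduce.  Here is a small sketch of that math for the 144GB host; the only inputs are the minFree value and the percentages listed earlier.

```python
# Sketch reproducing the free-memory thresholds above for a 144GB host
# (minFree = 2086.84MB). Each state kicks in at a fixed percentage of
# minFree; the results match the list above within rounding.

MIN_FREE_MB = 2086.84
for state, pct in [("High", 1.00), ("Soft", 0.64), ("Hard", 0.32), ("Low", 0.16)]:
    free_gb = MIN_FREE_MB * pct / 1024
    print(f"{state:4s} state at ~{free_gb:.2f}GB free ({144 - free_gb:.2f}GB used)")
# High at ~2.04GB, Soft at ~1.30GB, Hard at ~0.65GB, Low at ~0.33GB
```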

This model worked well, but as you can see the difference between TPS kicking in and hypervisor swapping is only about 1.7GB.   Two virtual machines could consume this at the same time, making it impossible for TPS to break large pages down and save space.   This was a common problem: your host would go from the high state directly to low and swapping.   Normally when you reach swapping, things go bad for your applications.

 

Changes in ESXi 6

vSphere 6 added an additional state to allow large memory pages to be broken down (from 2MB to 4KB) a lot sooner.  I believe this is due in part to two factors:

  • The change to TPS to no longer share between VMs but only within the same VM – meaning very few large pages will be shared, but breaking them into 4KB pages might still provide some savings
  • The small headroom that ESXi leaves between states – even on large-memory hosts the difference between TPS and hypervisor swapping is about 2GB, or a single virtual machine

 

So the change introduces the clear state to replace the 5.5 high state as detailed below:

  • High – 400% of minFree – Large pages broken down
  • Clear – 100% of minFree – TPS begins
  • Soft – 64% of minFree – Ballooning
  • Hard – 32% of minFree – Memory compression (compressed and swapped out)
  • Low – 16% of minFree – Swapping

This new clear state allows the pages to be broken much sooner, before TPS is enacted.   The odd thing is every ESXi host I have ever seen reports the high state even before free memory drops anywhere near 400% or 100% of minFree.   These are the documented levels, but the state seems to stay at high unless a lower state is actually reached.

 

How to identify which state you are in

Use esxtop on the ESXi host.   Press m for memory; the state is listed at the top right side of the screen.


You can also identify per-virtual-machine ballooning, swapping and compression in the same esxtop memory view.


Deep Dive: Network Health check

vSphere 5.1 introduced one of my favorite new features: network health check.  This feature is designed to identify problems with MTU and VLAN settings.   It is easy enough to set up MTU and VLANs in ESXi, especially with a dVS, but in most environments the vSphere admins don't control the physical switches, making confirmation of the upstream configuration hard.    The health check resolves these issues.  It is only available on dVS switches and only via the web client.  (I know, time to start using that web client... your magical fat client is going away.)  If you have an upstream issue with MTU then you will get an alert in vCenter.   You can find the health check by selecting the dVS and clicking on the Manage tab; in the middle pane you will see Health check, which you can edit and enable.   But you came here because you want to know how it works.

 

MTU

The MTU check is easy.   Each host sends out a ping message to the other nodes.  This ping message has a special header that tells the network not to fragment (split) the packet.   In addition it has a payload (empty data) to pad the ping to the size of the maximum MTU.   If the host gets a return message from the ping, it knows the MTU is correct; if it fails, we know the MTU is bad.   Each node checks its MTU at a regular interval.   You can manually check your MTU with vmkping, but the syntax has changed between 5.0, 5.1 and 5.5 so look up the latest syntax.
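
To see the idea outside of ESXi, here is a small sketch that does the same kind of don't-fragment ping from a Linux box with Python.  The target address and the 9000-byte MTU are assumptions, and vmkping on ESXi uses its own flags; this is just the concept.

```python
# Sketch of the health check idea: a don't-fragment ping padded to the full MTU.
# Linux ping flags shown here; vmkping on ESXi uses different syntax.
import subprocess

TARGET = "10.0.0.12"      # assumption: another vmkernel address on the same VLAN
MTU = 9000                # assumption: jumbo frames end to end
PAYLOAD = MTU - 28        # subtract 20 bytes IP header + 8 bytes ICMP header

result = subprocess.run(
    ["ping", "-M", "do", "-s", str(PAYLOAD), "-c", "1", TARGET],
    capture_output=True, text=True)

print("MTU looks good" if result.returncode == 0
      else "MTU mismatch somewhere upstream")
```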

 

VLAN

Checking the VLANs is a little more complex, since each VLAN has to be checked.   One host on the same vDS (not sure which one, but I am willing to bet it's the master node) sends out a broadcast layer 2 packet on the VLAN, then waits for each node to reply to the broadcast via a unicast layer 2 packet.   You can determine which hosts have VLAN issues based upon who reports back.   I assume a host marked as bad then tries to broadcast itself, as a method to identify failed configuration or network partitions.   This test is repeated on each VLAN at regular intervals, and it only works when at least two peers can connect.
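
The idea is easy to play with yourself.  The scapy sketch below sends a tagged layer 2 broadcast and listens for frames on the same VLAN; the interface name, VLAN ID and payload are made up, and the real health check's frame format is not public, so treat this purely as an illustration of the concept.

```python
# Hedged scapy sketch of the VLAN-check idea: send a tagged layer 2 broadcast
# and see whether anything on that VLAN answers. Interface, VLAN id and payload
# are placeholders; this is not VMware's actual health check implementation.
from scapy.all import Ether, Dot1Q, Raw, sendp, sniff

IFACE = "eth0"            # assumption: trunk-facing interface
VLAN_ID = 100             # assumption: VLAN under test

probe = Ether(dst="ff:ff:ff:ff:ff:ff") / Dot1Q(vlan=VLAN_ID) / Raw(b"vlan-check")
sendp(probe, iface=IFACE, verbose=False)

# Listen briefly for tagged frames; a VLAN that stays silent on every peer
# is the one with an upstream trunking problem.
replies = sniff(iface=IFACE, timeout=2,
                lfilter=lambda p: p.haslayer(Dot1Q) and p[Dot1Q].vlan == VLAN_ID)
print(f"{len(replies)} frames seen on VLAN {VLAN_ID}")
```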

Teaming policy

In ESXi 5.5 they added a check of the teaming policy against the physical switch.  This check identifies mismatches between IP hash teaming and switch ports that are not configured for etherchannel/LACP.

 

Negative Effect of Health check

So why would you not use health check?  Well, it does produce some traffic, and it does require you to use the web client to enable it and determine which VLANs are bad…  otherwise I cannot figure out a reason not to use it.   It is a simple and easy way to find issues.

Design Advice on health check

Health check is a proactive way to find upstream VLAN or MTU issues before you deploy production workloads to that VLAN.  It saves a ton of time when troubleshooting, and cuts down on the fighting between networking and server teams.  I really cannot see a reason not to use it.    I have not measured the required bandwidth, but it cannot be huge.   My two cents: turn it on if you have a vDS… if you don't have a vDS I hope you only have ten or fewer VLANs.

Deep Dive: vSphere Traffic Shaping

Traffic shaping is all about the bad actor scenario.  We have hundreds of virtual machines that all get along with each other.  The application team deploys an appliance that goes nuts and starts using its link at 100%.  Suddenly you get a call about database and website outages.  How do you deal with the application team's bad actor?  It's the same reason every apartment has its own water heater: my wife would be very unhappy if she could not take her hot shower in the morning because Bob upstairs took an extra long shower an hour ago.  Sharing resources is great as long as resources are unlimited, not over provisioned, and usage patterns stay static.  In the real world none of those things are true.  You are likely limited on resources, over provisioned, and your traffic patterns change every single day.   Limits allow us to create constraints upon portions of resources in order to control bad actors.

Limits (available on any type of switch)

Limits are, as expected, caps that a machine cannot cross.  This allows a machine to see a 10Gbps uplink but only use 1Gbps at most.  The slowdown is injected into the communication stream via normal protocol methods.   The limit settings in VMware can be applied on a standard port group or on a dvPort or dvPort group.  Notice the difference: on dVS switches we can apply limits on individual ports as well as port groups.  Limits on standard switches apply to outbound traffic only, while on a dVS they can be inbound and outbound.  There are three settings on limits:

  • Average bandwidth – Average number of bits per second to allow across the port
  • Peak bandwidth – Maximum bits per second to allow across a port while it is using its burst; this caps the bandwidth used during a burst.
  • Burst size – Maximum size of a burst, in bytes.  This is the amount of data that can be sent above the average rate when required.  It can be viewed as a bank: when you don't use all of your average bandwidth, credit is stored up (to the burst size) to be used when needed.
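
The average/burst relationship behaves like a token bucket.  Below is a minimal, self-contained sketch of that "bank" idea with made-up numbers; it illustrates the concept only and is not VMware's shaper implementation (peak bandwidth is left out for simplicity).

```python
# Minimal token-bucket sketch of average bandwidth + burst size.
# Unused average-rate credit accumulates (up to the burst size) and can be
# spent later; this is the "bank" described above. Numbers are made up.

class Shaper:
    def __init__(self, avg_bps, burst_bytes):
        self.avg_bps = avg_bps
        self.burst_bytes = burst_bytes
        self.credit = burst_bytes          # start with a full bank

    def tick(self, seconds):
        """Accrue credit for elapsed time, capped at the burst size."""
        self.credit = min(self.burst_bytes,
                          self.credit + self.avg_bps / 8 * seconds)

    def try_send(self, size_bytes):
        """Send only if enough credit is banked."""
        if size_bytes <= self.credit:
            self.credit -= size_bytes
            return True
        return False

shaper = Shaper(avg_bps=1_000_000_000, burst_bytes=50_000_000)  # 1Gbps avg, 50MB burst
print(shaper.try_send(40_000_000))   # True  - spends banked credit
print(shaper.try_send(40_000_000))   # False - bank exhausted, must wait
shaper.tick(0.5)                     # half a second accrues credit, capped at 50MB
print(shaper.try_send(40_000_000))   # True
```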

 

Limits of the Limits

Limits produce some, well… limits.   Limits are always enforced, meaning even if bandwidth is available it will not be allocated to the limited port group or port.  Limits on a VSS are outbound only, meaning you can still be flooded by inbound traffic.  Limits are not reservations: machines without limits can consume all available resources on a system.  So effectively limits are only useful to isolate a bad actor from everyone else; they are not a sharing method.  Network limits do have their place, but I would avoid general use if possible.

 

Network IO Control a better choice

Network I/O Control (NIOC) is available only on the vDS.  It provides a solution to the bad actor problem while keeping flexibility.  NIOC is applied to outbound traffic and works very much like resource pools for compute and memory.  You set up a NIOC share (resource pool) with a number between 1 and 100.   vSphere comes with some system-defined NIOC shares like vMotion and management, and you can also define new resource pools and assign them to port groups.  NIOC only comes into play during times of contention on an uplink.  All NIOC shares are calculated on an uplink-by-uplink basis: the shares of all the active traffic types on the uplink are added together.  For example, assume my uplink has the following shares:

  • Management 10
  • vMotion 20
  • iSCSI 40
  • Virtual machines 50

If contention arises and only management, iSCSI and virtual machines are active, we would have 100 total shares.  This number is then used to divide the total available bandwidth on that uplink.  Let's assume we have a 10Gbps uplink.  Each active traffic type would then get, based on shares:

  • Management 1Gbps
  • iSCSI 4Gbps
  • Virtual machines 5Gbps

This example also assumes every traffic type is using 100% of its allocation.  If management is only using 100Mbps, the others get its leftover 900Mbps divided by their share amounts (in this case 900Mbps / 90 shares = 10Mbps per share, with 40 shares going to iSCSI and 50 to virtual machines).   If a new traffic type comes into play then the shares are recalculated to meet the demand.   This allows you to work out worst-case guarantees for each traffic type, for example (a code sketch of this math follows the list):

  • Management will get at least 1Gbps
  • vMotion will get at least 2Gbps
  • iSCSI will get at least 4Gbps
  • Virtual machines will get at least 5Gbps
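
Here is a small sketch of that share math, including handing unused allocation back to the traffic types that still want it.  The shares, link speed and demands are the example values from this section, and the single-pass redistribution is a simplification of what NIOC actually does moment to moment.

```python
# Sketch of the per-uplink NIOC share math above, including redistribution
# of bandwidth a traffic type is not actually using. Example values only.

def nioc_allocate(link_gbps, shares, demand_gbps):
    total = sum(shares.values())
    alloc = {t: link_gbps * shares[t] / total for t in shares}

    # Traffic types using less than their allocation keep only what they use;
    # the leftover goes into a spare pool.
    hungry = [t for t in shares if demand_gbps[t] >= alloc[t]]
    spare = sum(alloc[t] - demand_gbps[t] for t in shares if t not in hungry)
    for t in shares:
        if t not in hungry:
            alloc[t] = demand_gbps[t]

    # The spare pool is split among the remaining types by their share values.
    hungry_shares = sum(shares[t] for t in hungry)
    for t in hungry:
        alloc[t] += spare * shares[t] / hungry_shares
    return alloc

shares = {"Management": 10, "iSCSI": 40, "VM": 50}      # vMotion idle, not counted
demand = {"Management": 0.1, "iSCSI": 10, "VM": 10}     # Gbps each type wants
print(nioc_allocate(10, shares, demand))
# approximately {'Management': 0.1, 'iSCSI': 4.4, 'VM': 5.5}
```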

There is one wrinkle to this plan with multi-nic vMotion but I will address that in another post.

 

Design Choices

Limits have their uses, but they are hard to manage and really hard to diagnose… Imagine coming into a vSphere environment where limits are in place but you did not know; it could take a week to figure out that they were causing the issues.   My vote: use them sparingly.   NIOC, on the other hand, should be used in almost every environment with Enterprise Plus licensing.   It really has no drawback and provides control over traffic.

Deep Dive: vSphere Network Load Balancing

In vSphere, load balancing is a hot topic.   As the load per physical host increases, so does the need for more bandwidth.  Traditionally this was handled with etherchannel or LACP, which bond multiple links together so they look and act like a single link.   This also helps avoid loops.

What the heck is a loop?

A loop is created any time two layer 2 (Ethernet) endpoints have multiple connections to each other.

 

It is possible to create a bridged loop with two virtual switches if care is not taken, but virtual switches by default will not create loops.   On the physical switch side, protocols like spanning tree (STP) were created to solve this issue: STP disables a link if a loop is detected, and if the enabled link goes down STP turns the disabled link back on.   This works for redundancy but does not help if the single active link is not a big enough pipe to handle the full load.    VMware has provided a number of load balancing algorithms to provide more bandwidth.

Options

  • Route Based on Originating virtual port (Default)
  • Route Based on IP Hash
  • Route Based on Source MAC Hash
  • Route Based on Physical NIC Load (LBT)
  • Use Explicit Failover Order

 

In order to explain each of these options, assume we have an ESXi host with two physical network cards called nic1 and nic2.   It's important to understand that the load balancing options can be configured at the virtual switch or port group level, allowing for lots of different load balancing policies on the same server.

Route Based on Originating virtual port (Default)

The physical nic to be used is determined by the ID of the virtual port to which the VM is connected.  Each virtual machine is connected to a virtual switch, which has a number of virtual ports, and each port has a number.   Once assigned, the port does not change unless the virtual machine moves to another ESXi host.  This number is the virtual port ID.   I don't know the exact method used, but I assume it's something as simple as odds and evens for two nics: everything odd goes to one uplink while everything even goes to the other.  This method has the lowest overhead in terms of virtual switch processing and works with any network configuration; it does not require any special physical switch configuration.  You can see, though, that it does not really load balance.  Let's assume you have a lot of port groups, each with a single virtual machine on port 0.  In this case all those virtual machines would use the same uplink, leaving the other unused.

Route Based on IP Hash

The physical nic to be used is determined by a hash of the source and destination IP addresses.   This method provides load balancing across multiple physical network cards from a single virtual machine; it's the only method that allows a single virtual machine to use the bandwidth of multiple physical nics.  It has one major drawback: the physical switches must be configured to use etherchannel (802.3ad link aggregation) so they present both network links as a single link, to avoid problems.   This is a major design choice.  It also does not provide perfect load balancing.  Let's assume that you have an application server that does 80% of its traffic with a database server.  That communication will always happen across the same link, because the hash of that IP pair will always pick the same uplink, so the pair will never use the bandwidth of two links. In addition this method uses a lot of CPU.  A few more caveats (with a toy sketch of the hash selection after the list):

  • When using etherchannel only a single physical switch may be used
  • Beacon probing is not supported with IP hash
  • A vDS is required for LACP
  • Troubleshooting is difficult because each source/destination combination may take a different path (some virtual machine paths may work while others do not, in an inconsistent pattern)
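
The toy sketch below just illustrates why a given source/destination pair always lands on the same uplink, which is why the app-to-database flow above never spreads across links.  It is not VMware's exact hash, and the addresses are made up.

```python
# Toy illustration of IP-hash uplink selection: the same source/destination
# pair always maps to the same uplink. Not VMware's exact hash; addresses
# are made up for the example.
import ipaddress

UPLINKS = ["vmnic0", "vmnic1"]

def pick_uplink(src, dst):
    h = int(ipaddress.ip_address(src)) ^ int(ipaddress.ip_address(dst))
    return UPLINKS[h % len(UPLINKS)]

print(pick_uplink("10.0.0.5", "10.0.0.20"))   # app -> database
print(pick_uplink("10.0.0.5", "10.0.0.20"))   # same pair, same uplink every time
print(pick_uplink("10.0.0.5", "10.0.0.21"))   # a different destination can land on the other link
```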

Route Based on Source Mac Hash

The physical nic to be used is determined by a hash created from the virtual machine's source MAC address.  This method provides a more balanced approach than originating virtual port.  Each virtual machine will still always use only a single link, but the load will be distributed.  This method has a low CPU overhead and does not require any physical switch configuration.

Route Based on Physical NIC Load (Distributed Virtual Switch Required also called LBT)

The physical nic to be used is determined by load.  The nics are used in order (nic1 then nic2): no traffic will be moved to nic2 until nic1 is utilized above 75% capacity for 30 seconds.  Once that threshold is hit, traffic flows are moved to the next available nic, and they stay on that nic until another LBT event moves traffic again.   LBT does require the dVS and adds some CPU overhead.  It does not allow a single virtual machine to get more than the speed of a single link, but it does balance traffic among all links during times of contention.
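
Here is a tiny sketch of just the trigger condition described above (75% utilization sustained for 30 seconds).  The sampling interval and the decision logic are simplifications for illustration, not VMware's implementation.

```python
# Toy sketch of the LBT trigger described above: an uplink has to stay above
# 75% utilization for 30 seconds before any flows are moved off of it.

THRESHOLD = 0.75
HOLD_SECONDS = 30

def should_rebalance(samples, interval_seconds=5):
    """samples: most recent utilization readings (0.0-1.0), newest last."""
    needed = HOLD_SECONDS // interval_seconds
    recent = samples[-needed:]
    return len(recent) == needed and all(u > THRESHOLD for u in recent)

print(should_rebalance([0.5, 0.8, 0.9, 0.85, 0.8, 0.78, 0.81]))  # True: 30s above 75%
print(should_rebalance([0.9, 0.9, 0.6, 0.9, 0.9, 0.9, 0.9]))     # False: the dip breaks the streak
```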

Use Explicit Failover Order

The physical nic to be used is the highest nic on the list of available nics.  The others will not be used unless the first nic is unavailable.  This method does no load balancing and should only be used in very special cases (like multi-nic vMotion).

 

Design Advice

Which one should you use?  It depends on your needs.  Recently a friend told me they never change the default because they never get close to saturating a single link.   While this approach has merit, and I wish more people understood their network metrics, you may need to plan for the future.  There are two questions I use to determine which to use:

  • Do you have any virtual machines that alone require more than a single link's bandwidth? (If yes, then the only option is IP hash with LACP or etherchannel.)
  • Do you have a vDS? (If yes, use route based on physical nic load; if no, use the default or source MAC hash.)

Simply put, LBT is a lot more manageable and easier to configure.

Deep Dive: Virtual Switch Security settings and Port Binding

Security Settings:

Three security options are available on a virtual switch.  These settings can be set at the switch level and then overridden on individual port groups.

  • Promiscuous Mode – When set to accept, this allows the guest adapter to see all frames passed on the vSwitch that are in the same VLAN as the guest, which allows for packet sniffing (it is not port mirroring).  When set to reject, a guest only sees its own traffic and broadcast traffic.
  • MAC Address Changes – Allows the guest to change its MAC address.  If set to reject, all frames destined for a MAC address not in the .vmx file are dropped at the switch.
  • Forged Transmits – If set to reject, all frames sent from the guest with a source MAC address that does not match the .vmx file are dropped.

Security settings advice:

Set all three to reject on the switch, keeping your operating system admins in a box while protecting shared resources.   Then add individual policies to each port group as needed.   If you are wondering where accepting is needed, one of the use cases is nested virtualization, which requires all three to be set to accept.

Port Binding:

Port binding is a setting that allows you to determine how and when the ports on a virtual switch are allocated.  Currently there are three port binding options:

  • Static binding (default)
  • Dynamic binding
  • Ephemeral binding

Static binding – A port is allocated to a virtual machine when it is added to the port group.  Once allocated, the virtual machine continues to use that port until it is removed from the port group (via deletion or a move to another port group).  Network stats with static binding are kept through power off and vMotion.

Dynamic binding – Deprecated and will be removed in the near future.  Ports are allocated only when a virtual machine is powered on and its virtual network card is connected; they are dynamically allocated when needed.  Network stats are kept through vMotion but not power off.

Ephemeral binding – A lot like a standard vSwitch; it can be managed from vCenter or the ESXi host directly.  Ports are allocated when the virtual machine is powered on and its nic is connected.  One major difference is that dvPorts are created on demand, while the other binding types create them when the port group is created.  This process takes more RAM and processor power, so there are limits on the number of ephemeral ports available.  Ephemeral port groups are useful for recovery when vCenter is down and may help with vCenter availability.  All stats are lost when you vMotion or power off the virtual machine.

Port Group Type advice:

I would use static binding for almost everything.  Ephemeral has a higher cost and does not scale.  I personally use an ephemeral port group for vCenter because I use 100% dVS switches.  If your management still lives on standard switches, just use static binding across the board.

 

Deep Dive: Standard Virtual Switch vs Distributed Virtual Switch

Let the wars begin.  This article will discuss the current state of affairs between virtual switches in ESXi.   I have not included any third-party switches because I believe they are quickly falling out of the picture with NSX.

 

What's the deal with these virtual switches?

Virtual switches are a lot like physical layer 2 Ethernet switches and share many of the same features.  Both virtual switch types feature the following configurable items:

  • Uplinks – connections from the virtual switch to the outside world – physical network cards
  • Port Groups – groups of virtual ports with similar configuration

In addition both switch types support:

  • Layer 2 traffic handling
  • VLAN segmentation
  • 802.1Q tagging
  • nic teaming
  • Outbound traffic shaping

So the first question everyone asks: if two virtual machines are in the same VLAN and on the same server, does their communication leave the server?

No… two VMs on the same ESXi host can communicate without the traffic ever leaving the virtual switch.

 

Port Groups what are they?

Much like the name suggests, port groups are groups of ports.  They can best be described as a number of virtual ports (think physical ports 1-10) that are configured the same.  Port groups have a defined number of ports and can be expanded at will (like going from a 24-port switch to a 48-port switch).  There are two generic types of port groups:

  • Virtual machine
  • VMkernel

Virtual machine port groups are for guest virtual machines.  VMkernel port groups are for ESXi management functions and storage.  The following are valid uses for VMkernel ports:

  • Management Traffic
  • Fault Tolerance Traffic
  • IP based storage
  •  vMotion traffic

You can have one or many port groups for VMkernel but each requires a valid IP address that can reach other VMkernel ports in the cluster.

At the time of writing (5.5) the following maximums apply:

  • Total switch ports per host: 4096
  • Maximum active ports: 1016
  • Port groups per standard switch: 512
  • Port groups per distributed switch: 6500
  • VSS port groups per host: 1000

So as you can see vDS scales a lot higher.

Standard Virtual Switch

The standard switch has one real advantage: it does not require Enterprise Plus licensing.  It has a lot fewer features and some drawbacks, including:

  • No configuration sync – you have to create all port groups exactly the same on each host or lots of things will fail (even an upper case vs lower case difference will cause a failure)

Where do standard switches make sense?  In small shops with a single port group they make a lot of sense.  If you need to host 10 virtual machines on the same subnet, then standard switches will work fine.

Advice

  • Use scripts to deploy switches and keep them in sync to avoid manual errors
  • Always try vMotions between all hosts before and after each change to ensure nothing is broken
  • Don’t go complex on your networking design – it will not pay off

Distributed Virtual Switch

The distributed virtual switch is a different animal.  It is configured in vCenter and deployed to each ESXi host, so the configuration stays in sync.  It has the following additional features:

  • Inbound Traffic Shaping – Throttle incoming traffic to the switch – useful to slow down traffic to a bad neighbor
  • VM network port block – Block the port
  • Private VLAN’s – This feature requires switches that support PVLAN so you can create VLAN’s inbetween vlans
  • Load – Based teaming – Best possible load balancing (another article on this topic later)
  • Network vMotion – Because the dVS is owned by vCenter traffic stats and information can move between hosts when a virtual machine moves… on a standard switch that information is lost with a vMotion
  • Per port policy – dVS allows you to define policy at the port level instead of port group level
  • Link Layer Discovery Protocol – LLDP enables virtual-to-physical port discovery (your network admins can see info on your virtual switches and you can see physical port info – great for troubleshooting and documentation)
  • User defined network i/o control – you can shape outgoing traffic to help avoid starvation
  • NetFlow – the dVS can export NetFlow data
  • Port Mirroring – ports can be configured to mirror for diagnostic and security purposes

As you can see there are a lot of features on the vDS, with two drawbacks:

  • Requires enterprise plus licensing
  • Requires vCenter to make any changes

The last drawback has driven a number of hybrid solutions over the years.  At this point VMware has created a workaround with the ephemeral port group type and the network recovery features of the console.

Advice in using:

  • Back up your switch configuration with PowerCLI (there are a number of good scripts out there)
  • Don't go crazy just because you can; if you don't need a feature, don't use it
  • Test your vCenter to confirm you can recover from a failure

So, to get to the point: which one should I use?

Well, to take the VCDX model, here are the elements of design:

Availability

  • VSS – deployed and defined on each ESXi host with no external requirements: + for availability
  • dVS – deployed and defined by vCenter and requires it to provision new ports/port groups: – for availability

Manageability

  • VSS – a pain to manage in most environments and does not scale with lots of port groups or complex solutions: – for manageability
  • dVS – central management; changes can be deployed to multiple hosts or clusters in the same datacenter: + for manageability

Performance

  • VSS – performance is fine, no effect on quality
  • dVS – performance is fine, no effect on quality, other than it can scale up a lot larger

Recoverability

  • VSS – is deployed to and stored on each host… if you lose it you have to rebuild from scratch and manually add VMs to the new switch: – for recoverability
  • dVS – is deployed from vCenter and you always have it as long as you have vCenter.  If you lose vCenter you have to start from scratch and cannot add new hosts (don't remove your vCenter, it's a very bad idea): + as long as you have a way to never lose your vCenter (which does not exist yet)

Security

  • VSS – offers the basic security features, not much more
  • dVS – wider range of security features: + for security

 

End Result:

The dVS is better in most ways but costs more money.   If you want to use the dVS, it might be best to host vCenter on another cluster or otherwise ensure its availability.