Deep Dive: vSphere Traffic Shaping

Traffic shaping is all about the bad actor scenario. We have hundreds of virtual machines that all get along with each other. Then the application team deploys an appliance that goes nuts and starts using 100% of its link. Suddenly you get a call about database and website outages. How do you deal with the application team’s bad actor? It is the same reason every apartment has its own water heater. My wife would be very unhappy if she could not take her hot shower in the morning because Bob upstairs took an extra-long shower an hour ago. Sharing resources is great as long as resources are unlimited, not overprovisioned, and usage patterns stay static. In the real world none of those things are true. You are likely limited on resources, overprovisioned, and your traffic patterns change every single day. Limits allow us to place constraints on portions of a resource in order to control bad actors.

Limits (available on any type of switch)

Limits are, as expected, ceilings that a machine cannot cross. They allow a machine to see a 10 Gbps uplink but use at most 1 Gbps. The slowdown is injected into the communication stream via normal protocol methods. In VMware, limits can be applied to a port group on a standard switch, or to a dvPort or dvPort group on a distributed switch. Notice the difference: on a dVS we can apply limits to individual ports as well as port groups. On standard switches limits apply to outbound traffic only, while on a dVS they can apply to both inbound and outbound traffic. There are three settings for a limit:

  • Average bandwidth: Average number of bits per second to allow across the port.
  • Peak bandwidth: Maximum bits per second to allow across the port while it is using its burst allowance. This caps the bandwidth the port can use during a burst.
  • Burst size: Maximum number of bytes to allow in a burst. This is the number of bytes available when the port needs more than its average. Think of it as a bank: when you don’t use all of your average bandwidth, the unused portion is saved, up to the burst size, to be spent when needed.
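The "bank" idea behind these three settings can be sketched in a few lines of Python. This is only a simplified model in one-second ticks, not how the ESXi shaper is actually implemented, and the function name `simulate_shaper` and its parameters are made up for illustration.

```python
def simulate_shaper(demand_bits, average_bps, peak_bps, burst_bytes):
    """Simplified average/peak/burst model using one-second ticks.

    demand_bits: bits the VM wants to send in each one-second tick.
    Returns the bits actually allowed in each tick.
    (Hypothetical helper, not a VMware API.)
    """
    burst_capacity_bits = burst_bytes * 8  # burst size is configured in bytes
    credit_bits = 0                        # unused average bandwidth banked so far
    allowed = []
    for want in demand_bits:
        # Budget for this tick: the average plus banked credit, capped by the peak.
        budget = min(average_bps + credit_bits, peak_bps)
        sent = min(want, budget)
        allowed.append(sent)
        # Bank whatever part of the average went unused, or spend credit
        # when the port sent more than its average.
        credit_bits = credit_bits + average_bps - sent
        credit_bits = max(0, min(credit_bits, burst_capacity_bits))
    return allowed


# Two idle seconds fill the bank, then the VM tries to send 300 bits per second.
result = simulate_shaper([0, 0, 300, 300, 300],
                         average_bps=100, peak_bps=200, burst_bytes=25)
print(result)
```

In this toy run the port bursts at the peak rate until the banked credit is gone, then falls back to the average.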

 

Limits of the Limits

Limits come with some, well… limits. Limits are always enforced, meaning that even if bandwidth is available it will not be allocated to the port group or port. Limits on a VSS are outbound only, meaning you can still flood a switch with inbound traffic. Limits are not reservations: machines without limits can still consume all available resources on a system. So effectively limits are only useful for keeping a bad actor away from everyone else. They are not a sharing method. Limits on the network do have their place, but I would avoid general use if possible.

 

Network I/O Control: a better choice

Network I/O Control (NIOC) is available only on the vDS. It solves the bad actor problem while providing flexibility. NIOC is applied to outbound traffic. NIOC works very much like resource pools for compute and memory. You set up a NIOC share (network resource pool) with a value between 1 and 100. vSphere comes with some system-defined NIOC shares, such as vMotion and management. You can also define new resource pools and assign them to port groups. NIOC only comes into play during times of contention on the uplink. All NIOC shares are calculated on an uplink-by-uplink basis: the shares of all the traffic types active on the uplink are added together. For example, assume my uplink has the following shares:

  • Management 10
  • vMotion 20
  • iSCSI 40
  • Virtual machines 50

If contention arises and only management, iSCSI, and virtual machines are active, we would have 100 total shares. This number is then used to divide the total available bandwidth on that uplink. Let’s assume we have a 10 Gbps uplink. Based on shares, each active traffic type would get:

  • Management 1 Gbps
  • iSCSI 4 Gbps
  • Virtual machines 5 Gbps
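The arithmetic above is just a proportional split over the active traffic types. Here is a minimal Python sketch of it; the function name `split_by_shares` is made up for illustration and only models the math, not anything NIOC exposes.

```python
def split_by_shares(shares, active, uplink_mbps):
    """Divide an uplink among the active traffic types in proportion to shares.

    (Hypothetical helper that models the arithmetic only.)
    """
    total = sum(shares[name] for name in active)
    return {name: uplink_mbps * shares[name] / total for name in active}


shares = {"Management": 10, "vMotion": 20, "iSCSI": 40, "Virtual machines": 50}

# vMotion is idle, so only 100 of the 120 configured shares count.
active = ["Management", "iSCSI", "Virtual machines"]
split = split_by_shares(shares, active, uplink_mbps=10_000)
print(split)
```

Idle traffic types are left out of the total, which is why the same share values produce different bandwidth depending on what is active.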

This example also assumes each traffic type is using 100% of its allocation. If management is only using 100 Mbps, the others get its leftover 900 Mbps divided by their shares (in this case 900 Mbps / 90 shares = 10 Mbps per share, so 400 Mbps more goes to iSCSI and 500 Mbps more to virtual machines). If a new traffic type comes into play, the shares are recalculated to meet the demand. This allows you to plan for the worst case, where all four traffic types are active (120 total shares), and guarantee each traffic type a minimum. For example:

  • Management will get at least about 0.83 Gbps (10/120)
  • vMotion will get at least about 1.67 Gbps (20/120)
  • iSCSI will get at least about 3.33 Gbps (40/120)
  • Virtual machines will get at least about 4.17 Gbps (50/120)
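The redistribution of unused bandwidth can be modeled as a share-weighted split that is repeated whenever a traffic type needs less than it is offered. The Python below is a simplified sketch of that idea under my own assumptions (steady demand, a single uplink); the function name `allocate` is hypothetical and this is not the actual NIOC scheduler.

```python
def allocate(shares, demand_mbps, uplink_mbps):
    """Share-weighted split of an uplink, handing unused bandwidth to the rest.

    (Hypothetical helper; a simplified model, not the NIOC scheduler.)
    """
    alloc = {name: 0.0 for name in demand_mbps}
    unsatisfied = {name for name, want in demand_mbps.items() if want > 0}
    remaining = float(uplink_mbps)
    while unsatisfied and remaining > 1e-9:
        per_share = remaining / sum(shares[name] for name in unsatisfied)
        # Traffic types whose remaining demand fits inside their offer are done.
        done = {name for name in unsatisfied
                if demand_mbps[name] - alloc[name] <= per_share * shares[name]}
        if not done:
            # Everyone wants more than offered: split what is left by shares.
            for name in unsatisfied:
                alloc[name] += per_share * shares[name]
            break
        for name in done:
            remaining -= demand_mbps[name] - alloc[name]
            alloc[name] = float(demand_mbps[name])
        unsatisfied -= done
    return alloc


shares = {"Management": 10, "vMotion": 20, "iSCSI": 40, "Virtual machines": 50}

# Management only needs 100 Mbps; iSCSI and the VMs want the whole link.
light = allocate(shares,
                 {"Management": 100, "iSCSI": 10_000, "Virtual machines": 10_000},
                 uplink_mbps=10_000)

# Worst case: all four traffic types try to saturate the link.
worst = allocate(shares, {name: 10_000 for name in shares}, uplink_mbps=10_000)
print(light)
print(worst)
```

The first call reproduces the leftover example (iSCSI and the virtual machines split the unused 900 Mbps 40:50), and the second shows the floor each traffic type can count on when everything is busy at once.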

There is one wrinkle to this plan with multi-NIC vMotion, but I will address that in another post.

 

Design Choices

Limits have their uses, but they are hard to manage and really hard to diagnose. Imagine coming into a vSphere environment where limits are in place but you did not know about them. It could take a week to figure out that limits were causing the issues. My vote: use them sparingly. NIOC, on the other hand, should be used in almost every environment with Enterprise Plus licenses. It really has no drawback and provides control over traffic.

