How to operationalize NSX

I was recently invited to join Rene Van Den Bedem to discuss how to operationalize NSX and best practices. It was a good webcast; you can view it here:

http://bit.ly/2aPbwaZ

 

Enjoy

vRO Scriptable task to kill idle vCenter sessions

Update: A new friend provided an update to the script that validates that terminate_keys is populated before terminating sessions. I added it to the script below.

This is a recommended best practice: kill vCenter sessions that have been idle for more than 24 hours. This scriptable task will kill all sessions that have been idle longer than the variable maxidletime (in minutes). Cut and paste and go.

//get connection to gather sessions
var vcenters=VcPlugin.allSdkConnections;

//object to hold sessions we force close
var sessions_closed = [];

for each (vcenter in vcenters)
{
	// array for keys of sessions to terminate per vCenter
	var terminate_keys = [];
	//get session list
	var sessions = vcenter.sessionManager.sessionList;
	//go through each session
	for each (session in sessions)
	{
		//Get current time since epoch in seconds
		var currenttime = new Date().getTime() / 1000;
		//Get lastActiveTime since epoch on this session
		var lastActive = session.lastActiveTime.getTime() / 1000;
		//Figure out the amount of idletime in minutes
		var idletime = ((currenttime - lastActive)/60);
		// if idletime is longer than maxidletime
		if ( idletime > maxidletime)
		{
			//create a string of userName for compare to remove elements that should not be killed
			var username = session.userName;

			//if a session is going to be killed push into an array for alerting
			sessions_closed.push({user: session.userName, loginTime: session.loginTime, idleTime: idletime, agent: session.userAgent, extension: session.extensionSession, callCount: session.callCount, vcenter: vcenter.id, session: session.key});
			//Load sessions id into kill array
			terminate_keys.push(session.key);
		} // end of sessions that match idletime
	} // end of each session
	// kill all sessions in terminate_keys
	if (terminate_keys.length > 0)
	{
		System.log("Killing sessions for " + vcenter.name);
		vcenter.sessionManager.terminateSession(terminate_keys);
	}
	else
	{
		System.log("No sessions to kill for " + vcenter.name);
	}

} // end of for each vCenter


if (testing)
{
	//Logging to debug
	for each (s in sessions_closed)
	{
		System.log("Name: " + s.user + " agent " + s.agent + " CallCount " + s.callCount + " vcenter " + s.vcenter);

	}

}
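
If you want to sanity check the idle-time comparison outside of vRO, here is a minimal sketch in plain JavaScript. The helper names idleMinutes and isIdle and the sample dates are my own for illustration and are not part of the vCenter plug-in. It assumes maxidletime is expressed in minutes, which is how the script above compares it.

```javascript
// Hypothetical helper: minutes elapsed between a session's last activity and "now".
function idleMinutes(lastActive, now) {
    return (now.getTime() - lastActive.getTime()) / 1000 / 60;
}

// Hypothetical helper: true when the session has been idle longer than the limit.
function isIdle(lastActive, now, maxIdleMinutes) {
    return idleMinutes(lastActive, now) > maxIdleMinutes;
}

var maxidletime = 24 * 60; // 24 hours expressed in minutes
var now = new Date(Date.UTC(2016, 0, 2, 12, 0, 0));
var staleSession = new Date(Date.UTC(2016, 0, 1, 11, 0, 0));  // last active 25 hours earlier
var freshSession = new Date(Date.UTC(2016, 0, 2, 11, 30, 0)); // last active 30 minutes earlier

console.log(isIdle(staleSession, now, maxidletime)); // true
console.log(isIdle(freshSession, now, maxidletime)); // false
```

Passing "now" in as a parameter makes the check repeatable, which is handy when you are testing with the testing flag turned on.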


Deep Dive Multi-nic vMotion

Update: See the comments below. This article is not in alignment with VMware support. At this time VMware support recommends using active/standby for multi-nic vMotion failover. Thanks, David, for bringing that to my attention.

What is vMotion?

Most of the IT world has heard of vMotion. If this is new to you, it’s the feature that really put VMware on the map. It’s the ability to move a virtual machine workload between different compute nodes without interrupting the guest operating system, allowing you to migrate off failing hardware or update hardware without interruption to the customer’s machine. Quite simply, it is awesome.

How does vMotion work?

I can provide a basic overview; a lot of the details are intellectual property controlled by VMware. Normal vMotion takes advantage of the following things:

  • Shared storage between cluster members (same LUNs or volumes)
  • Shared networking between cluster members (same VLANs)

The major portion of any virtual machine is data at rest on the shared storage. The only portions of a virtual machine not on the storage are the execution state and active memory. vMotion creates a copy of these states (active memory and execution state, called the shadow VM) and transfers it over the network. When both copies are almost in sync, VMware stuns the operating system for microseconds to transfer the workload to the other compute node. Once transferred to the new compute node, the virtual machine sends out a gratuitous ARP to update the physical switches with the virtual machine’s new location. Memory and execution state are transferred over the vMotion interface.

What is storage vMotion?

Being locked into shared storage became a problem for a lot of larger customers. VMware addressed this issue by providing storage vMotion. Storage vMotion allows a guest operating system to move between similar compute nodes without shared storage between them, or between different storage on the same compute node. The only common requirements are networking and a similar execution environment (CPU instructions). The process is similar, except that during the final stun both the active state and the final disk changes are moved. Storage vMotion (svMotion) has two types of data: active and cold. Active data are files that are readable and writable. Cold data is any data that is not currently writable. Some examples of cold data are powered-off virtual machines or the parent files of the currently active snapshot (only the active snapshot is considered active).

In 5.5, storage vMotion data that is cold is moved across the management network, while active data uses the vMotion network. (If your management network and vMotion network share the same subnet, the lowest-numbered vmk nic will be used for all vMotions, which is always management. Always separate your vMotion and management traffic with VLANs.)

In 6, the cold migration data is moved using the NFC protocol (designated as Provisioning traffic in vSphere 6). NFC uses the management network unless you have a designated NFC link (it can be the vMotion interface). So as a design consideration, create a designated NFC vmkernel nic to avoid having management used. vMotion is used for all hot data.

Storage vMotion can be offloaded to the array when the array supports VAAI and the movement is on the same array.

What is multi-nic vMotion?

Multi-nic vMotion is the practice of using multiple nics to transfer vMotion data.  Why would you need more nics?

  • Really large memory VMs (memory and execution state have to cross the wire)
  • Large storage vMotion jobs without shared storage (that will be across the network)
  • Long distance vMotion (vMotion across a larger distance than a traditional datacenter)

If multi-nic vMotion is configured correctly, any vMotion job will be load balanced between all available links, thus increasing the bandwidth available to transfer data. Even a single virtual machine vMotion can take advantage of the multiple links. This can really help the speed of vMotions. Multi-nic vMotion does have a cost: if you are moving a really large virtual machine, you could saturate your links. The easiest way to understand this is with an overly simple graphic.

[Figure: two ESXi hosts configured for multi-nic vMotion, connected to the same network switch]

For the sake of this explanation we have two ESXi hosts, each connected via two 10Gbps links to the same network switch. Both are configured to use both links for vMotion. We initiate a vMotion between the source and destination. The virtual machine is very large, so the movement requires both links and load balances traffic on both links. Let’s follow the movement:

  • vMotion is negotiated between both sides
  • Let’s assume that my vMotion requires 7Gbps of traffic on each link for a total of 14Gbps (not really possible, but the numbers are used for the example)
  • Source starts using both links and throws 14Gbps at the destination
  • Destination has multi-nic so it can receive 14Gbps without any major issues

This plan makes a pretty big assumption: that the source and destination are both able to allocate 14Gbps for the vMotion without affecting the current workload. This is a really bad assumption. This is why VMware introduced Network I/O Control (NIOC). NIOC provides a method for controlling outbound traffic across links when under contention. Essentially you give each traffic type a share value (0-100), and during contention all traffic types that are active get their calculated share. For example, if I allocated the following:

  • Management 20
  • vMotion 20
  • VM 60

Assume that my ESXi host is only using management and VM traffic during a time of contention on a single 10Gbps link. I would get:

  • Total shares (Management 20 + VM 60 = 80)
  • Allocated bandwidth per share (10 / 80 = 0.125Gbps)
    • Management allocated 2.5Gbps (0.125*20)
    • VM allocated 7.5Gbps (0.125*60)
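
The share math above can be sketched in a few lines of JavaScript. The function name allocateByShares is my own, and this only models the arithmetic described here (only traffic types that are active during contention are passed in); it is not how ESXi implements NIOC internally.

```javascript
// Hypothetical helper: split a link's bandwidth between the ACTIVE traffic
// types in proportion to their share values.
function allocateByShares(linkGbps, activeShares) {
    var total = 0;
    var name;
    for (name in activeShares) {
        if (activeShares.hasOwnProperty(name)) {
            total += activeShares[name];
        }
    }
    var allocation = {};
    for (name in activeShares) {
        if (activeShares.hasOwnProperty(name)) {
            allocation[name] = linkGbps * activeShares[name] / total;
        }
    }
    return allocation;
}

// vMotion is idle in this example, so its 20 shares are left out.
var allocation = allocateByShares(10, { management: 20, vm: 60 });
console.log(allocation.management); // 2.5
console.log(allocation.vm);         // 7.5
```

If vMotion becomes active during contention, pass its 20 shares in as well and the total becomes 100, which drops VM traffic to 6Gbps on that link.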

This is calculated per link, not system wide. This works really well to control traffic on the source but fails to protect the destination. NIOC has no way to control incoming traffic. For example:

[Figure: a source host sending vMotion traffic to a busy destination host]

Let’s assume that the destination host is very busy and only has 1Gbps per link not in use, while the source has 10Gbps available per link for the vMotion. The source initiates the vMotion and floods the destination with 14Gbps of traffic. Now packets are getting dropped for every type of traffic on the destination ESXi host. This creates a critical problem: you cannot control all sources of network traffic into your host. In order to combat this issue VMware provided limits on network traffic. This allows you to identify types of traffic and have ESXi throttle that traffic when it becomes too much. This overloading is not unique to multi-nic vMotion, but it can be complicated quickly by the load multiple nics can provide.

 

How do I set up Multi-nic vMotion?

It is very much like iSCSI connections: you set up each vmkernel interface with its own IP address and bind it to a single uplink. So if you have two uplinks you need two IP addresses and two vmkernel interfaces for vMotion, each bound to a single uplink with no failover. If an uplink is removed, that vmkernel interface for vMotion will not be used.
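
To make the one-vmkernel-per-uplink idea concrete, here is a small planning sketch in JavaScript. Everything in it is hypothetical: the function name, the vmk and port group names, and the IP addresses are made up for illustration, and it only builds a plan object rather than calling any vSphere API. It models the binding exactly as described in this section (no failover); per the update at the top of this article, VMware support recommends active/standby instead.

```javascript
// Hypothetical planner: one vMotion vmkernel interface per uplink,
// each bound to exactly one active uplink with no standby.
function planMultiNicVmotion(uplinks, ipAddresses) {
    if (uplinks.length !== ipAddresses.length) {
        throw new Error("Need exactly one IP address per uplink");
    }
    var plan = [];
    for (var i = 0; i < uplinks.length; i++) {
        plan.push({
            vmkernel: "vmk" + (i + 1),           // made-up interface name
            portGroup: "PG-vMotion-" + (i + 1),  // made-up port group name
            ip: ipAddresses[i],
            activeUplink: uplinks[i],
            standbyUplinks: []                   // no failover in this layout
        });
    }
    return plan;
}

var plan = planMultiNicVmotion(["nic0", "nic1"], ["192.168.50.11", "192.168.50.12"]);
for (var i = 0; i < plan.length; i++) {
    console.log(plan[i].vmkernel + " " + plan[i].ip + " -> " + plan[i].activeUplink);
}
```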

Should I use Multi-nic vMotion?

This is a great question, and the answer is: it depends. I personally think the configuration needed to implement multi-nic vMotion is minimal, but it could be a major problem in larger shops. All of the problems with multi-nic vMotion are also present with standard vMotion. You really should consider NIOC and potentially limits for any design that is not grossly oversized. If you plan on using multi-nic vMotion, I think you need NIOC and limits, or at least a limit on vMotion traffic. Let me know what you think and your experience with this feature.

 

Who is your hero?

A few weeks ago my family went on vacation to Disney World. It was a fun-filled week for my two girls. They were most excited about meeting their heroes: Disney princesses and characters. The rides were of little interest compared to the opportunity to meet their heroes. We waited for long periods of time to get 20 seconds with these heroes. As I waited for my children to meet their heroes, my mind started to ponder the hero scenario.

Who is your hero?

We all have a lot of heroes. These are people we would really like to meet because they have done something really awesome. Perhaps they are rich, famous, good-looking or just inspiring. As a child my heroes were larger than life… cartoon characters who always did the right thing… Transformers, He-Man, and GI-Joes. They faced evil and with a single mind never wavered from the correct path. As I got older I found that media was pushing more conflicted heroes… telling a story that we cannot be good all the time. This appealed to a teenager who felt the world was a little too perfect. As an adult I found my heroes became people in my profession. I placed these people on a pedestal; I wanted to be like them. I listened to their every word and studied it. I could name some names but it would not help my discussion. The truth is that when I started to meet some of these heroes I found I really didn’t have anything to say to them. I ran into this issue during VMworld this year. I was invited to join a VMware Certified Design Expert only meeting before the conference. All the VMware brass and other VCDXs were present. It was a great day that I enjoyed… but I didn’t have some great question to ask or experience to share. One thing impressed me about all these people… they have invested time and effort into being where they are in their career. They have sacrificed and are awesome people, but not because of certification or technical knowledge. They were awesome because they did all those things with balance. They have lives outside technology.

Who should be your hero?

I believe the value of a person is their potential to become like god. I have had the opportunity to meet people who do the same thing every day for very little money, but love their children and raise great families. These people should be our heroes. My hero is the person who invests over a very long period in his family, friends and loved ones, not one who sacrifices everything to win. I am convinced that hard things are hard because they are of value… and nothing of value comes easy. We need to stop idealizing the rock star and start to value the person who makes sure we have clean backups every day. Every person on this earth has the same potential and value, and what job you have will not change that potential or value. What we become is far more important than anything. As this Christmas season is upon us, perhaps we can refocus our sights off heroes that are flashy and big and instead find the real heroes in our lives.

Good, better, best

Our world is constantly screaming at us for our attention: social media, commercials, notifications, phones, etc. When I was a missionary for two years I lived without TV, computers, cell phones, etc. I was concerned that these two years would make me obsolete in the computer world, losing all my knowledge. This was not true: my efforts to learn after this experience were better, and my career has benefited from this sacrifice. More time does not mean more productive time. We need to remove all bad things and instead focus on the good, better and best. We then need to remove everything but the best. For me this is work to support the family (the only reason to work), family events and time, and activities that make me a better person, like service to others.

NSX Controllers all show as disconnected

I was recently upgrading my home lab to the newest version of NSX. Since it’s my home lab I didn’t back up or snapshot before I did the upgrade. Don’t try this at work. The upgrade of the NSX manager went fine but the controllers were all disconnected. I logged into all three of the NSX controllers (running 6.0) and found them all to be in this state:

[Screenshot: control cluster status on an NSX controller]

As you can see, they are all showing waiting to join majority with no cluster id. I attempted to force the first machine to join itself using:

join control-cluster 192.168.10.29 force

 

This command rips out the previous cluster configuration and reconfigures it. That node came back as normal and became the master. I then tried to force the other nodes. Once they finished, everyone was disconnected again. I then removed two controllers and tried to force the single remaining one into being the master. This seemed to work, but when I tried to add a controller it failed again. This left me with a few choices:

  • Wipe out NSX and start from scratch
  • Try something else

 

I went for something else, with a wipe-out as the fallback. I figured since the logical switches know their own config without the controllers, they would be OK as long as nothing changed. They were set to communicate updates via unicast mode. I switched them to multicast (yes, it works in my environment; you can switch it on the transport zone instead of on each switch) and then ripped out my last controller. I then deployed a new set of controllers one at a time. I configured the transport zone back to unicast and everything seemed OK. I also redeployed the edge gateways to complete the upgrade (I don’t think this was essential to the process). I hope it helps you if you failed to back up before an upgrade gone bad.

Can we judge a company by the quality of their documentation?

 

 

I have been thinking about this for a while. Before I look into joining a software company I ask to see the documentation for their products. I have learned a lot about the future of the company, and my interest in it, from the documentation alone. Here are some thoughts:

 But my product is so simple it does not require documentation

Yep I have heard that one before… allow me to translate that into my language… my product is so simple that it really should not be something you buy… nothing in IT is simple… you can write software to automate it and make it appear simple but it’s not.  Write the documentation… explain your technology. Open your doors so we can geek out with you.

We try to reduce the nerd knobs to keep it simple

Translation = we don’t want you messing with our product because it will break and we will not support it. Fair point… if you know it breaks your product, tell customers not to use it that way and why… Does it expose a weakness? Yes it does… is that a problem? Depends on the weakness. Do you have something to hide? Sounds like it. Do we want simple IT… sure, but everything is integrated. I am a spider making a web of connected products. No one buys your off-the-shelf total solution.

We cannot allow our competitors to steal our IP so we cannot explain our tech

Get a lawyer, everyone else has one… Don’t you know they can reverse engineer your secret sauce in 10 minutes with the right people? Heck, skip the reverse engineering and just buy your lead developer. Get a lawyer and protect your tech like everyone else.

We need you to sign an NDA to see our documentation

Yes that happens when the company lawyers up… see previous post and sign the document.

We don’t want our customers discussing this technology

Educate me and then don’t allow me to become your advocate… not smart. I am getting sick of tech gag orders. They don’t help anyone… you turn your potential supporter into an enemy for life… neat idea.

Can we judge a company by the quality of their documentation?

Yes you can. Too many companies take the view that you don’t need to know the secret sauce that makes their xxxx work. I hope this model comes from an attempt to protect intellectual property. The reality is it makes me mistrust your product. Every product has limits, so why try to hide them? Publish your product’s limits and strengths, and explain your technology differences to help people make choices based upon what fits their needs. Stop hiding how it works. Stop making me go through a pay wall to get anything but white papers. If your product is great you have nothing to hide.

Advice to Companies

Stop giving me feature webinars and start educating me on your tech. Produce solid, up-to-date technical documentation on your products. Create a living documentation source like a knowledge bank. Don’t put it all behind a customer-only pay wall. If your support organization cannot provide a customer a solution by pointing to your documentation or knowledge bank, add it to the KB. Create a community of customers via forums and social media and support them with rewards and assistance. I have really been impressed over the last two years with two companies’ documentation, even though they are at odds with each other at times:

  • Nutanix – insane level of documentation on everything and awesome training program
  • VMware – Huge amount of products, very well-developed community, lots of documentation on everything including teaching how to troubleshoot, great community forums

In both cases marketing runs most of the webinars and presentations I see. They are all focused on the value proposition instead of the awesome tech. It’s possible as I rant that I am the only person who really wants to understand the tech. Let me know if you agree or disagree.

 

Updating vCenter to 5.5 U2 or later reduces the java heap size

In the past VMware has had large customers adjust the Java heap size for Inventory Service, Profile-Driven Storage, and web management. Most commonly it’s the Inventory Service adjustment, in an attempt to improve speed. The patch for U2 and above replaces this setting with the default. So you will have to manually change this setting after each patch until you are on vSphere 6, which dynamically adjusts this setting. You can read the VMware article here:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2114669

Free Central Ohio Lunch and Learn session on HA and DRS April 1st

I have started up the Lunch and Learn sessions again. We have sign-ups and a schedule. I ask that you sign up so I can notify you if a session has been canceled. Your information will not be shared with anyone at all. You can sign up here:

goo.gl/JC6HlK

I will be conducting the discussion around HA and DRS on April 1st.

All sessions will be at:

Columbus Public Library Driving Park location

Small Meeting Room 2

1422 E Livingston Ave
Columbus, OH 43205
I am still looking for people willing to present on the other topics so please contact me if interested.

Design Scenario: Gigabit networking with 10GB for storage SMB setup

Yesterday I got a comment on an older blog article asking for some help.

Caution

While it would be a bad idea personally and professionally for me to give specific advice without a design engagement, I thought I might provide some thoughts about the scenario here. This will allow me to justify some design choices I might make in the situation. In no way should this be taken as law. In reality everyone’s situation is different and small requirements can really change the design. Please do not blindly create this infrastructure; these are only guidelines. It does not take into account specific vendor best practices (because I am too lazy to look them up).

 

Information provided:

We are a SMB that’s starting to cross over to the world of virtualization. I could really use your help on our network design. This is the current equipment we have:

 

2 (ESXi Hosts) Dell R630 with 512GB Ram, 2×4 1GB port NICS each (8 Total each host) and 2 x dual port 10GB NIC(4 Total) on each host

 

Equal Logic PS6210XS SAN with Dual 10GB Controllers

 

2 Dell N4032F 10GbE switch

 

We are planning to use the 10GbE for the SAN(isolated) and use the remaining 8 x 1GB port for Management/vMotion and our Server Network.

 

How would you go about designing the network for our environment?

 

Requirements

  • Must use the current hardware

 

Constraints

  • The 10GB network adapters are for isolated SAN only

 

Assumptions

  • Since this customer is an SMB, I doubt they will buy Enterprise Plus licenses, so we will design around standard switches
  • The virtual machine / management network ports are distributed across two different upstream switches
  • Your storage solution supports some type of multipathing with two switches

 

The question was related to networking so here we go:

Virtual machine and vSphere networking

It’s hard to make a determination here without understanding the number of virtual machines and their network bandwidth needs. It is really tempting to use two of the 10Gb nics (out of the total of 4) for the vSphere and virtual machine networking. Due to the constraints we will avoid that temptation.

Management Network

Management is easy: vCenter and console access, I assume. If this is true I would assign two network adapters to management, one active and the other standby. You really want two in order to ensure it’s up and for host isolation.

vMotion network

Our hosts are large (512GB of RAM), which leads me to believe we are going to have a lot of virtual machines on each host. With only two hosts I am very concerned about taking down one host to patch and how long it will take to move virtual machines between hosts with a single 1Gb network adapter. You might want to consider multi-nic vMotion, which introduces complexity in the vSphere design and manageability. You should weigh how often you are going to schedule downtime on a host against the complexity. My guess is that you will not patch all that often at an SMB. So I would assign two network adapters to vMotion. One should be active, the other standby. You can use the same network adapters as management, just in the opposite order (nic0 active and nic1 standby for management; nic1 active and nic0 standby for vMotion).
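
To put a rough number on that concern, here is a back-of-the-envelope sketch in JavaScript. The function name is my own, and it makes some loud assumptions: the host’s full 512GB is in use, the link runs at full line rate, and it ignores protocol overhead and the re-copying of memory pages that change during the move. Treat the result as a best-case lower bound, not a prediction.

```javascript
// Hypothetical helper: best-case minutes to push a host's memory across the
// vMotion network. Assumes full line rate and no re-copied (dirty) pages.
function evacuationMinutes(memoryGB, linkGbps) {
    var gigabits = memoryGB * 8;       // gigabytes to gigabits
    var seconds = gigabits / linkGbps; // time at line rate
    return seconds / 60;
}

console.log(evacuationMinutes(512, 1).toFixed(1)); // "68.3" with a single 1Gb adapter
console.log(evacuationMinutes(512, 2).toFixed(1)); // "34.1" with two 1Gb adapters (multi-nic)
```

So even in the best case you are looking at over an hour to empty a full host across one 1Gb link, which is the number to weigh against the added complexity of multi-nic vMotion.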

Virtual machine networks

At this point we have 6 adapters left for virtual machines. Assign them all to virtual machines. What really matters is the load balancing we use for these adapters. Let’s be clear: you cannot provide more than 1Gbps of total bandwidth to an individual virtual machine with this configuration without using port channel or LACP configurations. I assume you don’t want to mess with port channel or virtual port channel across two switches. So we need to look at the remaining options for balancing and using these nics:

Options (taken from here), with IP hash removed due to lack of port channel and Route based on physical NIC load removed due to lack of Enterprise Plus:

  • Route based on the originating port ID: Choose an uplink based on the virtual port where the traffic entered the virtual switch.
  • Route based on a source MAC hash: Choose an uplink based on a hash of the source Ethernet MAC address.
  • Use explicit failover order: Always use the highest order uplink from the list of Active adapters which passes failover detection criteria.

There is a holy war between factions of VMware on which one to choose.  None will balance traffic perfectly.  Personally I would go with the default load balancing method of Route based on originating port ID.

How many VLANS

If possible, please use a different VLAN for at least each of the following: management, vMotion and virtual machines. Multiple virtual machine VLANs are wonderful. It is critical from a security perspective that the vMotion VLAN not be shared.

How many virtual switches

Now to the question of virtual switches. Remember, no Enterprise Plus, so we are using standard switches. These have to have the same configuration on each host, including case-sensitive names (good thing we only have 2 hosts). You might want to consider configuring them via a script (I have an older blog post on that somewhere). You have two sets of network adapters: vMotion/management and virtual machine. I would connect them all to the same virtual switch just for ease of management. So your setup would look like this, assuming your 1Gb nics come into ESXi as nic0 through nic7:

vSwitch0 (PG = port group)

  • PG-vMotion: active nic1, standby nic0
  • PG-Management: active nic0, standby nic1
  • Port groups for virtual machines (one port group per VLAN): active nic2-nic7

Storage networking

This choice is determined by the vendor best practices. It’s been a while since I worked on EqualLogic, and you should use Dell’s documentation 100% before doing anything. Let me say that again: consult Dell’s documentation before doing this and make sure it aligns. Any EqualLogic master is welcome to add via comments. I assume you will be using software iSCSI to do these connections. You have 4 total 10Gb nics with two switches. I would create another standard virtual switch for these connections (does it have to be another switch? No, but I would for ease of management). So it’s pretty cut and dried: two dual-port nics like this:

Card 1 Port 1  – we will call it nic8

Card 1 Port 2 – we will call it nic9

Card 2 Port 1 – we will call it nic10

Card 2 Port 2 – we will call it nic11

We have the following switches

SwitchA

SwitchB

I would do the following physical connections:

SwitchA: nic8, nic10

SwitchB: nic9, nic11

 

Normally software iSCSI has you set up a port group per uplink, all on the same VLAN (or native if your switches are only doing iSCSI). So I would create the following port groups:

PG-iSCSI-Nic8-SwitchA

PG-iSCSI-Nic9-SwitchB

PG-iSCSI-Nic10-SwitchA

PG-iSCSI-Nic11-SwitchB

 

Assign each nic to be active only on its designated port group (nic8 active on PG-iSCSI-Nic8-SwitchA and unused on all others). Then set up iSCSI storage. Your multipathing on the port groups should be set up as explicit failover.
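
The naming and active/unused pattern above is mechanical enough to generate. Here is a small JavaScript sketch that builds the four port group definitions from the nic-to-switch cabling. The function name and object shape are my own for illustration; it produces a plain data structure you could review or feed into your own scripting, and it does not call any vSphere API.

```javascript
// Physical cabling from this design: which iSCSI nic lands on which switch.
var iscsiNics = [
    { nic: "nic8", switchName: "SwitchA" },
    { nic: "nic9", switchName: "SwitchB" },
    { nic: "nic10", switchName: "SwitchA" },
    { nic: "nic11", switchName: "SwitchB" }
];

// Hypothetical helper: one port group per nic, with that nic active
// and every other iSCSI nic marked unused.
function buildIscsiPortGroups(nics) {
    var groups = [];
    for (var i = 0; i < nics.length; i++) {
        var unused = [];
        for (var j = 0; j < nics.length; j++) {
            if (j !== i) {
                unused.push(nics[j].nic);
            }
        }
        // "nic8" becomes "Nic8" to match the naming used in this post
        var label = nics[i].nic.charAt(0).toUpperCase() + nics[i].nic.slice(1);
        groups.push({
            name: "PG-iSCSI-" + label + "-" + nics[i].switchName,
            active: [nics[i].nic],
            unused: unused
        });
    }
    return groups;
}

var portGroups = buildIscsiPortGroups(iscsiNics);
for (var k = 0; k < portGroups.length; k++) {
    console.log(portGroups[k].name + " active=" + portGroups[k].active.join(",") +
        " unused=" + portGroups[k].unused.join(","));
}
```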

 

Last Thoughts

With limited information it’s hard to comment on additional options. I would carefully consider and implement percentage-based admission control (think 50% or more reserved on each host). If possible, monitor your network bandwidth usage to make sure your virtual machines are getting the required bandwidth. I hope this rant is useful to someone. Leave me your thoughts or questions.
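
For anyone wondering where the 50% comes from, here is the arithmetic as a small JavaScript sketch. The function names are my own, and it assumes equally sized hosts and that you want to survive the loss of one host; it is a planning aid, not a description of how HA calculates anything internally.

```javascript
// Hypothetical helper: percentage of cluster capacity to reserve so the
// surviving hosts can absorb the given number of host failures.
// Assumes all hosts are the same size.
function reservedPercentForHostFailures(hostCount, failuresToTolerate) {
    return (failuresToTolerate / hostCount) * 100;
}

// Hypothetical helper: memory left for workloads after the reservation.
function usableMemoryGB(hostCount, memoryPerHostGB, reservedPercent) {
    return hostCount * memoryPerHostGB * (1 - reservedPercent / 100);
}

var reserved = reservedPercentForHostFailures(2, 1);
console.log(reserved);                         // 50
console.log(usableMemoryGB(2, 512, reserved)); // 512
```

With two 512GB hosts, that means planning for about one host’s worth of virtual machine memory across the whole cluster.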