Many months ago I posted some design tips on the VMware forums (I am Gortee there if you are wondering). Today a user updated the thread with a new scenario looking for some advice. While it would be a bad idea personally and professionally for me to give specific advice without a design engagement, I thought I might provide some thoughts about the scenario here. This will allow me to justify some design choices I might make in the situation. In no way should this be taken as law. In reality everyone’s situation is different, and small requirements can really change the design. The original post is here.
The scenario provided was the following:
3 ESXi hosts (2 x Dell R620, 1 x Dell R720), each with 3 x 4-port NICs (12 ports total) and 64GB RAM. (Wish I would have put more on them ;-))
1 Dell MD3200i iSCSI disk array with 12 x 450GB 15K SAS drives (11 + 1 spare) w/ 2 x 4-port GB Ethernet controllers
2 x Dell 5424 switches dedicated for traffic between the MD3200i and the 3 hosts
Each host is connected to the iSCSI network through 4 dedicated NIC ports across two different cards
Each host has 1 dedicated vMotion NIC port connected to its own VLAN on a stacked Dell N3048 Layer 3 switch
Each host will have 2 dedicated (active/standby) NIC ports (2 different NIC cards) for management
Each host will have a dedicated NIC for backup traffic (has its own dedicated Layer 3 network/switch)
Each host will use the remaining 4 NIC ports (two different NIC cards) for production/VM traffic
Would you be so kind as to give me some recommendations based on our environment?
Requirements
- Support 150 virtual machines
- Do not interrupt systems during the design changes
Constraints
- Cannot buy new hardware
- Not all traffic is vlan segmented
- Lots of 1GB ports per server
Assumptions
- Standard Switches only (Assumed by me)
- Software iSCSI is in use (Assumed again by me)
- Not using Enterprise plus licenses
Storage
Dell MD3200i iSCSI disk array with 12 x 450GB 15K SAS drives (11 + 1 spare) w/ 2 x 4-port GB Ethernet controllers
2 x Dell 5424 switches dedicated for traffic between the MD3200i and the 3 hosts
Each host is connected to the iSCSI network through 4 dedicated NIC ports across two different cards
I personally have never used this array model, so the vendor should be included in the design to make sure my suggestions here are valid for this storage system. Looking at the VMware HCL we learn the following:
- Only supported on ESXi 4.1 U1 through 5.5 (no 5.5 U1 yet, so don’t update)
- You should be using VMW_PSP_RR (Round Robin) for path failover (a quick esxcli check is sketched after this list)
- The array supports the following VAAI natives: Block Zero, Full Copy, HW Assisted Locking
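If you want to confirm or set Round Robin from the ESXi shell, here is a minimal sketch using esxcli. The naa ID below is a placeholder, and the SATP name is an assumption on my part (Dell MD-series arrays typically claim VMW_SATP_LSI), so verify both against the device list and Dell’s MD3200i documentation before changing anything:
# List devices and the Path Selection Policy currently in use for each LUN
esxcli storage nmp device list
# Set Round Robin on a single LUN (replace the naa ID with one from the list above)
esxcli storage nmp device set --device naa.xxxxxxxxxxxxxxxx --psp VMW_PSP_RR
# Optionally make Round Robin the default for every LUN claimed by the array's SATP
# (confirm the SATP name in the device list output first)
esxcli storage nmp satp set --satp VMW_SATP_LSI --default-psp VMW_PSP_RR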
The following suggestions apply to the physical cabling. Looking at the diagram, I made the following design choices:
- From my limited understanding of the array, the cabling follows the best practice guide I could find.
- Connections from the ESXi hosts to the switches are made to create as much redundancy as possible, using all available cards. It is critical that the storage be as redundant as possible.
- Each uplink (physical NIC) should be connected to an individual VMkernel port group, and each port group should be configured with only one active uplink.
- Physical switches and port groups should be left on the native VLAN (untagged), assuming these switches don’t do anything other than carry storage traffic between these four devices (three ESXi hosts and one array). If the array and switches provide storage to other systems, follow your vendor’s best practices for segmenting traffic.
- Port binding for iSCSI should be configured per the VMware and vendor documentation (a sketch of the esxcli steps follows this list).
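As a rough sketch of those port binding steps from the ESXi shell, this is the pattern for one of the four storage uplinks; the vSwitch name, vmnic/vmk numbers, IP addressing, and the vmhba number of the software iSCSI adapter are all placeholders I made up for illustration, so substitute your own and repeat for each uplink:
# Create a port group for one iSCSI VMkernel port on the storage vSwitch
esxcli network vswitch standard portgroup add --portgroup-name=iSCSI-1 --vswitch-name=vSwitch1
# Pin that port group to a single active uplink with no standby uplinks
esxcli network vswitch standard portgroup policy failover set --portgroup-name=iSCSI-1 --active-uplinks=vmnic2
# Create the VMkernel interface and give it an address on the storage network
esxcli network ip interface add --interface-name=vmk1 --portgroup-name=iSCSI-1
esxcli network ip interface ipv4 set --interface-name=vmk1 --ipv4=192.168.130.11 --netmask=255.255.255.0 --type=static
# Bind the VMkernel port to the software iSCSI adapter, then verify the binding
esxcli iscsi networkportal add --adapter=vmhba33 --nic=vmk1
esxcli iscsi networkportal list --adapter=vmhba33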
New design considerations from storage:
- 4 x 1Gb links will be used, representing the maximum traffic the array will provide
- The array does not support 5.5 U1 yet so don’t upgrade
- We have some VAAI natives to help speed up processes and avoid SCSI locks
- Software iSCSI requires that forged transmits be allowed on the virtual switch (a quick check is sketched below)
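On that last item, the forged transmits setting lives on the virtual switch (or port group) security policy, not on the Dell switches. A quick way to look at it from the ESXi shell, assuming the storage vSwitch is called vSwitch1 (a placeholder name):
# Show the current security policy on the storage vSwitch
esxcli network vswitch standard policy security get --vswitch-name=vSwitch1
# If forged transmits shows as false, allow it
esxcli network vswitch standard policy security set --vswitch-name=vSwitch1 --allow-forged-transmits=true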
Advice to speed up iSCSI storage
- Bind your bottleneck – is it switch speeds, array processors, or ESXi software iSCSI? – and solve it.
- You might want to consider Storage DRS to automatically balance load based on space and I/O metrics (requires an Enterprise Plus license but saves so much time) – also has an impact on CBT backups, making them do a full backup.
- Hardware iSCSI adapters might also be worth the time… though they have little real benefit in the 5.x generation of ESXi
Networking
We will assume that we now have 8 total 1Gb ports available on each host. The current network architecture looks like this (I avoided the question of how many virtual switches):
I may have made mistakes in my reading, but a few items pop out to me:
- vMotion does not have any redundancy, which means if that card fails we will have to power off VMs to move them to another host.
- Backup also does not have redundancy, which is less of an issue than the vMotion network.
- Not all traffic has redundant switches, creating single points of failure.
A few assumptions have to be made:
- No single virtual machine will require more than 1Gb of traffic at any time (otherwise we would have to be looking into LACP or EtherChannel solutions).
- Management traffic, vMotion, and virtual machine traffic can live on the same switches as long as they are segmented with VLANs.
Recommended design:
- Combine the management switch and VM traffic switch into dual-function switches that provide both types of traffic.
- Use VLAN tags to put vMotion and management traffic on the same two uplinks, providing card redundancy (configured active/standby). This could also be configured with multi-NIC vMotion, but I would avoid that due to the complexity around management network starvation in your situation (an esxcli sketch follows this list).
- Backup continues to have its own two adapters to avoid contention.
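Here is a minimal esxcli sketch of what that looks like on a standard vSwitch. The port group names, vmnic numbers, and VLAN IDs are placeholders I picked for illustration; the point is that management and vMotion each prefer a different uplink (on different cards) and fail over to the other:
# Tag the management and vMotion port groups (VLAN IDs are examples only)
esxcli network vswitch standard portgroup set --portgroup-name="Management Network" --vlan-id=10
esxcli network vswitch standard portgroup set --portgroup-name="vMotion" --vlan-id=20
# Management prefers vmnic0 and falls back to vmnic4 (a port on the other card)
esxcli network vswitch standard portgroup policy failover set --portgroup-name="Management Network" --active-uplinks=vmnic0 --standby-uplinks=vmnic4
# vMotion prefers vmnic4 and falls back to vmnic0
esxcli network vswitch standard portgroup policy failover set --portgroup-name="vMotion" --active-uplinks=vmnic4 --standby-uplinks=vmnic0
This keeps vMotion off the management uplink in normal operation while both traffic types survive a single card failure.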
This does require some careful planning and may not be the best possible use of links. I am not sure you need 6 links for your VM traffic but it cannot hurt.
Final Thoughts:
Is any design perfect? Nope, lots of room for error and unknowns. Look at the design and let me know what I missed. Tell me how you would have done it differently… share so we can both learn. Either way, I hope it helps.
Thanks for this!
I have a few questions and comments; I have a similar config 🙂 In my environment I have 4 NIC ports connecting to the ESXi hosts, 2 from NIC 0 and 2 from NIC 1 (ports 1 and 2 from each), but I am not sure how the ports are connected from the MD3200i (active/passive). Second, I am already on ESXi 5.5 U1, but the latency issues I have been having were there long before the upgrade from 5.1.
Under New design considerations from storage:
“We have some VAAI natives to help speed up processes and avoid SCSI locks” Not sure what/how to implement these?
Already running 5.5 U1 now what?
Software iSCSI, can’t seem to find anything about Forged transmissions on the Dell 5424 switches.
Advice to speed up iSCSI storage:
“Bind your bottleneck – is it switch speeds, array processors, or ESXi software iSCSI? – and solve it.” Can you elaborate???
“You might want to consider Storage DRS to automatically balance load based on space and I/O metrics (requires an Enterprise Plus license but saves so much time) – also has an impact on CBT backups, making them do a full backup.” I will consider that, although a call to support stated that DRS probably would not help much. CBT backups? Is that with or without DRS?
These are the points I am most concerned about so far, thanks again for your help and expertise!
Carl,
Thanks for reading.
“We have some VAAI natives to help speed up processes and avoid SCSI locks” Not sure what/how to implement these? – These should be enabled out of the box, no work required by you. (Click on the datastore to check and look for Hardware Acceleration: Supported. A CLI check is sketched below.)
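If you prefer the command line, this is a quick, read-only way to confirm it from the ESXi shell; nothing here changes the configuration:
# Per-device view: ATS, Clone, and Zero should show as supported for the MD3200i LUNs
esxcli storage core device vaai status get
# Host-level switches: an Int Value of 1 means the primitive is enabled
esxcli system settings advanced list --option=/DataMover/HardwareAcceleratedMove
esxcli system settings advanced list --option=/DataMover/HardwareAcceleratedInit
esxcli system settings advanced list --option=/VMFS3/HardwareAcceleratedLocking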
Already running 5.5 U1 now what? – You’re on the latest version; that is good.
Software iSCSI, can’t seem to find anything about Forged transmissions on the Dell 5424 switches. – This is a setting on your virtual switches, not the physical Dell switches. It should be enabled already if you enabled software iSCSI in VMware… just something to keep in mind when doing work on or moving virtual switches.
“Bind your bottleneck – is it switch speeds, array processors, or ESXi software iSCSI? – and solve it.” Can you elaborate???
-> Spelling error: find your bottleneck. Basically, you reported having some storage issues. Look at your storage ports on the switch and see if they are at 1Gb all the time. Look at the storage processors on your array for max CPU. Look at esxtop with the storage views and see where the problem lies. You have to identify where the problem exists before you can resolve it. My guess is it’s the storage processors on your array, but you will not know without reviewing the source.
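For reference, this is roughly how I would poke at it with esxtop from the ESXi shell; the latency guidance below is a common rule of thumb, not a hard limit:
# Start esxtop, then use these keys inside it (they are not shell commands):
esxtop
#   d - disk adapter view (per vmhba)     u - disk device view (per LUN)
#   v - per-VM disk view                  n - network view (watch MbTX/s and MbRX/s on the iSCSI vmnics)
# In the disk views, watch:
#   DAVG/cmd - latency coming back from the array and SAN
#   KAVG/cmd - latency added inside the ESXi kernel (queuing)
#   GAVG/cmd - total latency the guest sees (roughly DAVG + KAVG)
# Sustained DAVG in the tens of milliseconds usually points at the array or its front-end ports.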
I will consider that, although a call to support stated that DRS probably would not help much.
-> Support, I assume, reviewed logs and has more information than me about the situation. Storage DRS will only help if you have multiple LUNs to move between. I really doubt the cost is worth it. The money would be better spent on more storage.
One thing to consider: this is just general advice; always review and consult your vendors. I cannot provide advice for an environment without a chance to review it completely.
Storage DRS probably will not be an option for us, as the licensing upgrade from what we currently have (originally Essentials Plus) with 6 CPUs is like $34K!!!
Agreed it does not make sense. Money would be better spent on more storage.
More comments, sorry. In our case the backup LAN is using one NIC, but it also has to have a management VMkernel port, as required by the backup system; it uses management to create snapshots for backups. Do you see that being an issue?
The only problem would be if your backups use 100% of the traffic on the link, starving management traffic. You cannot avoid this with your backup product, so it’s OK with me.
Also, as a version identification, it looks like we are a couple of versions past U1; we are on ESXi 5.5 Patch 2, 2014-07-01, build 1892794.
Thank you so very much, the information you provided helps us out a lot! It seems that based on the esxtop review we have a few VMs that are causing a lot of writes to the datastores, and the fact that the MD3200i max throughput is 4,000Mb/s is also a problem. Even with 8 x 1Gb connections we seem to bump the top of that 4Gb ceiling, and according to the Dell support agent we spoke to it is physically impossible for it to achieve the full 4Gb, so now I need to review new storage. Again, thank you for your help, this is very useful information!
I am glad you were able to identify the source of the issue… storage will kill an ESXi environment faster than anything else. Remember, you don’t have to throw away your current array; it can still run some machines while another array provides additional performance.
Hi Joseph,
I know this is an old post, but I found your insights into VMware designs very helpful. We are an SMB that’s starting to cross over to the world of virtualization. I could really use your help on our network design. This is the current equipment we have:
2 ESXi hosts: Dell R630 with 512GB RAM, 2 x 4-port 1Gb NICs each (8 ports total per host), and 2 x dual-port 10Gb NICs (4 ports total) on each host
EqualLogic PS6210XS SAN with dual 10Gb controllers
2 x Dell N4032F 10GbE switches
We are planning to use the 10GbE for the SAN (isolated) and use the remaining 8 x 1Gb ports for management/vMotion and our server network.
How would you go about designing the network for our environment?
Any help would be great
Thank You
Andrew Lee
Evening,
Thanks for reading. I took a stab at the design. You can read about it here: http://blog.jgriffiths.org/?p=1019.
Thanks,
Joseph
Hello,
I have a similar scenario.
My question is: Can/Should I do a Stack between both san-5424a and san-5424b?
Is there any benefit on doing it?
Would the system still work normally?
Stacking the switches just adds complexity to the solution; vSphere would be redundant between them anyway. Then again, most customers just stack them anyways 🙂