Double your storage capacity without buying a new storage shelf

I spent a good portion of my career moving storage from one array to another.   The driver is normally something like this:

  • Cost of older array (life cycle time)
  • New capacity, speed or feature

So off we went on another interruption migration of lun’s and data..  At one point I was sold on physical storage virtualization appliances.   They stood in front of the array and allowed me to move data between arrays without interruption to the WWID or application.   I loved them what a great solution.   Then storage vMotion became available and 95% of the workloads were running in VMware.   I no longer needed the storage virtualization appliance and my life became very VMware focused.


New Storage paradigm

With the advent of all flash arrays and HCI (all flash or mixed) performance(speed) has almost gone away as a reason for moving data off arrays.  Most arrays offer the same features; replication capability aside.   So now we are migrating to new arrays / storage shelf’s because of capacity or life cycle issues.   Storage arrays and their storage shelves have a real challenge with linear growth.   They expect you to make a bet on the next three years capacity.   HCI allows a much better linear growth model for storage.

My HCI Gripe

My greatest grip with HCI solutions is that everyone needs more storage that does not always mean you need more compute.   Vendors that provide hardware locked (engineered) platforms suffer from this challenge.   The small box provides 10TB, Medium 20TB and large 40TB.   Which do I buy if I need 30TB?   I am once again stuck in the making a bet problem from arrays (at least it’s a smaller bet).   The software based platforms including VSAN (full disclosure – At time of writing I work for VMware and have run VSAN in my home for three years) have the advantage of offering better mixed sizing and linear growth.

What about massive growth?

What happens when you need to double your storage with HCI and your don’t have spare drive bays available?   Do you buy a new set of compute and migrate to it?  That’s just a replacement of the storage array model…  Recently at some meetings a friend from the Storage and availability group let me know the VSAN solution to this problem.   Quite simply replace the drives in your compute with larger drives in a rolling fashion.   You should create uniform clusters but it’s totally possible to replace all current drives with new double capacity drives.   Double the size of your storage for only the cost of the drives.   (doubling the size of cache is a more complex operation)  Once the new capacity is available and out of maintenance mode data is migrated by VSAN on to the new disks.

What is the process?

It’s documented in chapter 11 of the VSAN administration guide :

A high level overview of the steps (please use the official documentation)

  1. Maintenance mode the host
  2. Remove the disk from the disk group
  3. Replace the disk you removed with the new capacity drive
  4. Rescan for drives
  5. Add disk back into the disk group


vSphere 6.5 features that are exciting to me

Well yesterday VMware announced vSphere 6.5 and VSAN 6.5  both are huge leaps forward in technology.   They address some major challenges my customers face and I wanted to share a few features that I think are awesome:

vSphere 6.5

  • High Availability in vCenter Appliance – if you wanted a reason to switch to the appliance this has to be it… for years I have asked for high availability for vCenter.   Now we have it.   I look forward to testing and blogging about failure scenarios with this new version.  This has to be my #1 ask for the platform for the last three years!  – We are not talking about VMware HA we are talking about active / standby appliances.
  • VM EncryptionNotice this is a feature of vSphere not VSAN – this is huge the hypervisor can encrypt virtual machines at rest and while being vMotioned.   This is a huge enabler for public cloud allowing you to ensure your data is secure with your encryption keys.   This is going to make a lot of compliance folks happen and enable some serious hybrid cloud.
  • Integrated Containers – Docker compatible interface for containers in vSphere allowing you to spawn stateless containers while enforcing security, compliance and monitoring using vSphere tools (NSX etc..) – this allows you to run traditional and next generation applications side by side.

VSAN 6.5

  • iSCSI support – VSAN will be able to be a iSCSI target for physical workloads – a.k.a SQL failover clustering and Oracle RAC.   This is huge VSAN can now be a iSCSI server that has easy policy based management and scaleable performance.

There are a lot more annoucements but these features are just awesome.    You can read more about vSphere 6.5 here and VSAN 6.5 here.

Pernix Data 30 days later

I have been interested in Pernix data since its initial release the idea of using flash to accelerate storage is not new to me.   Anyone who reads my blog based rants have found that I am a huge supporter of larger cache on storage arrays.   I have always found having more cache will make up for any speed issues on drives.   My thoughts on this are simple if 90% of my storage writes and reads come from cache I run at near the speed of the cache.   Spend your money on larger cache instead of faster spinning disks and performance is improved.   Almost every storage array vendor has been using ssd to speed up arrays for the last four years.   They all suffer from the same problem, they treat all I/O’s equal without any knowledge of workload.

Netfliks problem

The best way to explain this problem is using netflicks.   They have implemented a system where you rate a show based upon stars.   Then it compares your ratings against everyone else ratings and locates users with similar ratings to yours.   Once located it uses these user’s recommendations to locate new shows for you.   This is great… assuming you have 100% the same taste in shows as that user.   This has been advanced a lot in the last five years and is much more accurate and complex algorithm.  It’s pretty accurate for me except for one problem… My wife and kids share the same netfliks account and my children love to rate everything.    This produces the world worst set of recommendations… I get little girl TV shows mixed with Downton Abbey and Sci-Fi movies.   It’s a mess… Netflix literally has no idea how to recommend show to me.    This problem exists for storage arrays with cache.   Choosing which data should be cached for reads is hard, because there are lots of different workloads competing for cache.    I don’t want to devalue the algorithms used by storage vendors, they much like Netflix are a work of evolving art.  With everyone being profiled into one mass everyone’s performance suffers.   Netflix understood this problem and created user profiles to solve the problem.  They added simple versions of localized intelligence to the process.   These pockets of intelligent ratings are used to provide recommendations for the local needs.

Pernix is the intelligent user profile

Pernix is just like Netflix user profiles, it’s installed locally on each ESXi server.  It caches for that ESXi host (and replicates writes for others).   It can be configured to cache everything on the host, datastore or virtual machine.   It provides the following features:

  • The only local SSD write cache that I know of outside hyper-converged solutions
  • Local SSD read cache
  • Great management interface
  • Metrics on usage
  • Replication of writes to multiple SSD’s for data protection


Pernix is built for vSphere

Pernix installs as a VIB into the kernel and does not require a reboot.   It has a Web client interface and C# client interface.   It does require a Windows server and SQL server for reporting.   It is quick and easy to install and operate.  The cache can be SSD’s or memory for pure speed.    Pernix works only in vSphere so it’s 100% customized for vSphere.


My local Pernix SE’s were kind enough to provide me a download and license for Pernix Data.   My home lab has been documented on this blog before but the current solution is 3 hp nodes with 32Gb of RAM each as shown below:


I added a 120GB san disk SSD to each node for this test.    My storage ‘array’ is an older Synology nas with two mirrored 2TB 7,200 RPM disks via iSCSI and NFS.  My rough math says I should be getting about 80 IOPS total from this solution which really sucks, oddly it’s always worked for me.  I didn’t have any desire to create artificial workloads for my tests, I just wanted to see how it accelerated my every day workload.   All of these tests were done in vSphere 5.5 U2.

Pernix Look and feel

Pernix provides a simple and powerful user interface.  I really like the experience even in the web client.   They use pictures to quickly show you where problems exist.



As you can see lots of data is presented in a great graphical interface.   They also provide performance charts on every resource using Pernix.  Without reading any manual other than Pernix quick start guide I was able to install their solution in 15 minutes and have it caching my whole environment, it was awesome.

How do we determine storage performance?

This is a constant question, every vendor has a different metric they want to use to determine why their solution is better.   If it’s a fiber channel array they want to talk about latency then IOPS.   If it’s all flash NAS its IOPS then latency.    So we will use these two metrics for the tests:

  • Latency – time it takes to commit a write or get a read
  • IOPS – Input / Outputs per second

I wanted to avoid using Pernix’s awesome graphs for my tests so I chose to use vRealize Operations to provide all recorded metrics.



The VM that gives my environment the biggest storage workout is vRealize log insight.   It has been known to have recorded IOP’s of 300 in the environment.    Generating IOP’s is easy just click around the interface prebuild dashboards with the time slider set for all time.   Read IOP’s fly up like crazy.   So my average information before Pernix is as follows:

  • Max IOPS 350
  • Max Latency: 19 ms
  • Average Latency: 4 ms


Now with Pernix

I setup Pernix to cache all virtual machines in my datacenter.  With pernix I clicked around on multiple days and performed lots of random searches.  I loaded down a SQL server with lots of garbage inserts to create writes.   Nothing perfectly scientific with control groups I just wanted to kick the tires.   After a month with pernix I got the following metrics:

  • Max IOPS: 4,000
  • Max Latency: 14 ms
  • Average Latency: 1.2 ms


So the results clearly denote a massive increase in IOP’s.  Some may say sure you are using SSD’s for the first time, which is true.   The increase is not just SSD’s speed because the latency is greatly improved as well which is representative of the local cache.   Imagine using enterprise worthy SSD’s with much larger capacity.  Simple answer will Pernix improve storage performance… the answer is it depends but there is a very good chance.

Use Cases

With my home lab hat removed I need to talk about some enterprise use cases:

  • Any environment or workload where you need to reduce latency
  • Any environment where workload needs every more IOP’s than most of the solutions

Both of these use cases should be implemented where less latency or IOP’s is a direct cost.   Pernix can be used as a general speed enhancer on some slower environments or to improve legacy arrays.   It does push toward a scale up approach to clustering.   Larger cluster nodes with larger SSD’s will cost less than lots of nodes.  Pernix is licensed per node.   Putting in larger nodes does have a big impact on failure domains that should be taken to account.

My only Gripe

My only gripe with Pernix is the cost.  Compared to large storage array’s it is really cheap.  The problem is budgets… I need more storage performance which means the storage team buys more storage arrays not the compute team.  Getting that budget transferred is hard because storage budgets are thin already.     This will change hyper-converged is becoming very accepted and Pernix will really shine in this world.   Pernix just released the read cache for free making it a very tempting product.   They are a smart company with a great product. They are on the right path bringing storage performance as close to the workload as possible with an added element of intelligence.

Change in VMware 5.5 U2 ATS can cause storage outages!

Update:  VMware has posted the following KB and there is a really good article by Comac Hogan on the matter.   I have also posted a PowerCLI script to resolve the issue.


Yesterday I was alerted to the fact that there was a change in the VMware 5.5 U2 heartbeat method.  In U2 and vSphere 6 it now uses ATS on VAAI enabled arrays to do heartbeats.   Some arrays are experiencing outages due to this change.   It’s not clear to me what array are exactly effected other than IBM has posted an article here.   It seems to cause one of the following symptoms : Host disconnects from vCenter or storage disconnects from host.  As you can see one of these (storage) is a critical problem creating an all paths down situation potentially.

The fix suggested by IBM disabled the ATS lock method and returns it to pre U2 methods.   It’s my understanding that this is an advanced setting that can be applied without a reboot.  I have also been told that if you create this advanced setting it will be applied via host profile or powercli.

It is very early in the process in all accounts you should open a VMware ticket to get their advice on how to deal with this issue.   They are working on the problem and should produce a KB when possible with more information.   I personally would not apply this setting unless you are experiencing the issue as identified by VMware.   I wish I had more information but it has not happened in my environment.


Post comments if you are experiencing this issue with more information.  I will update the article once the KB is posted.

Storage vMotion and Change block Tracking are now friends

A lot of readers may be aware that storage vMotion is a awesome feature that I love.  It had one nasty side effect it reset change block tracking (CBT).  This is a huge problem for any backup product that uses CBT (Veeam, anything else that does an image level backup).   It means that after a storage vMotion you now have to do a full backup.  It’s painful and wasteful.   It means that most enterprise environments refuse to use Automation storage DRS moves (one of my favorite features) because of it’s impact on backups.    Well the pain is now over… and has been for a while I just missed it :).   If you look at the release notes for ESXi 5.5 U2 you will find the following note:


  • Changed Block Tracking is reset by storage vMotion
    Performing storage vMotion operation on vSphere 5.x resets Change Block Tracking(CBT).
    For more information, see KB 2048201

    This issue is resolved in this release.


I wish I told us more about the change or why it is no longer reset but I guess I’ll accept it as is.   The previous work around was don’t use storage vMotion or do a full backup after… which is not a work around but an effect.  Either way enjoy moving the world again.

Enable Stateless Cache on Auto Deploy

Auto Deploy Really?

Yes a big autodeploy post is going to be following up soon.   I can really seen the benefit of auto deploy in larger environments.  I’ll be posting the architectural recommendations and failure scenarios soon.   Today I am posting about stateless cache and USB.

What is stateless cache and why do I care?

Stateless cache allows your auto deployed ESXi host (TFTP image running in memory) to be installed on a local hard drive.  This enables you to boot the last running configuration without the presence of the TFTP server.  It’s a really good protection method.   It is enabled by editing the host profile and in 5.5 it can be enabled using the fat client:

  1. Select the profile and right click on it
  2. Select Edit
  3. Expand System Image Cache Configuration
  4. Click on System Image Cache Profile Settings
  5. Select the drop down and choose the stateless caching mode you want.


This all sounds great but we had a heck of a time trying to get it to stateless cache to SD cards on our UCS gear.   A coworker discovered that SDcards are seen as USB devices.  Once we select “Enable stateless caching to a USB disk on the host”  everything worked.

Design Constraints

Using stateless caching will protect you against a failure of TFTP and even vCenter but DHCP and DNS are both still required for the following reasons:

  • DHCP to get IP address information
  • DNS to get hostname of ESXi host


Stateless does not remove all dependencies but it does allow quick provisioning.

Warning to all readers using Snapshot CBT based backups

Over the last few days I have become aware of a pretty nasty bug with VMware Snapshot API based backups (Any Image based solutions that is not array based and use change block tracking I will not give names).  This bug has been around for a while but has recently been fixed.   The problem happens when you expand a currently presented drive by 128GB’s or larger.   This expansion causes a bug in the CBT that will make all CBT based backups junk.  You will not be able to restore them.   It’s a major pain in the butt.   What is worse you cannot detect this issue until your restore.  So here is how you create the bug:

  • Expand a currently presented drive 128GB’s or more
  • Do a CBT backup
  • Try to restore that backup or any following backup

You can work around this issue with the following process:

  • Expand a currently presented drive 128GB’s or more
  • Disable CBT
  • Re-enable CBT
  • Do a new full backup

This bug has been around since the 4.1 days and I have never run into it.  I believe this is because I have mostly worked in Linux heavy shops.  We always added a new drive and use logical volume management to expand the mount points thus avoiding this issue.

Please give me some good news

Well today I can this problem is fixed in 5.5 U4 so patch away.  It does not fix machines that are incorrectly backing up just avoids future occurrences.  You can read more about it here.