As a VMUG leader and a double VCDX, I have seen one technology trend only increase over the years: the number of storage vendors! Last year at our VMUG UserCon every sponsor looking for a presentation slot was a storage vendor. We had to choose between storage vendors and other storage vendors; I would have killed for another type of vendor. In past years we had presentations from backup vendors, management tools, monitoring tools and IT service companies. Now it’s all storage companies.

As a double VCDX I get contacted by start-up companies looking to sell their products to VMware customers. Some are well-known companies, others are still in stealth, but they all have the same request… how do we get VMware guys to buy our awesome technology? Almost all of these companies are using Super Micro white box solutions with some secret sauce. The sauce is what makes them different: some are web-scale, while others are all flash or boast awesome dedupe ratios. All are attempting to address some segment of storage problems. It really begs the question: is there a storage problem?
What does storage provide?
Storage essentially provides two things that virtualization professionals care about:
- Capacity (Space to store information)
- Performance (divided into IOPS and latency)
  - IOPS – input/output operations per second: the number of commands you can shovel into the system
  - Latency – how long it takes to shovel each I/O end to end
Each vendor provides software features to improve these metrics, for example dedupe for capacity or hot blocking for performance. Essentially, the role of a storage system is to provide these functions.
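IOPS and latency are not independent: for a fixed number of I/Os in flight, one bounds the other. A minimal sketch of that relationship using Little's Law follows; the function names and the sample figures are illustrative assumptions, not measurements from any array.

```python
def max_iops(outstanding_ios: int, latency_ms: float) -> float:
    """Upper bound on IOPS for a given concurrency and per-I/O latency."""
    if latency_ms <= 0:
        raise ValueError("latency_ms must be positive")
    # Little's Law: throughput = concurrency / latency (latency is in milliseconds).
    return outstanding_ios * 1000.0 / latency_ms


def outstanding_ios_needed(target_iops: float, latency_ms: float) -> float:
    """Concurrency required to sustain target_iops at the given latency."""
    return target_iops * latency_ms / 1000.0


if __name__ == "__main__":
    # Hypothetical example: 32 I/Os in flight at different end-to-end latencies.
    for latency_ms in (0.5, 1.0, 5.0, 10.0):
        print(f"32 outstanding I/Os at {latency_ms} ms -> "
              f"{max_iops(32, latency_ms):,.0f} IOPS")
```

Read the other way around, a system can only reach a high IOPS number if the hosts can keep enough I/Os outstanding at the latency the system delivers.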
How has virtualization made it worse?
Virtualization has made management of these metrics a challenge. In traditional storage a single entity controls a LUN or mount. It runs an application that has certain predictable patterns for usage of the LUN. For example, a web server does a lot of reads and a few writes. We can identify and classify this usage pattern and thus “right size” the LUN to meet these needs. This right sizing can take the form of both capacity and performance metrics. Virtualization created a new pattern: lots of guest servers with different applications sharing the same LUN. This makes the usage metrics pretty wild. The storage system has no idea what the virtual machines are doing beyond a bulk understanding of reads and writes. This seems like a problem, but in reality the storage system just sees reads and writes and does not care, unless capacity or performance for that LUN is exhausted. This issue might drive the acquisition of more performance storage in order to meet the needs of our new “super LUNs”, but in most cases it just takes advantage of unused capacity on a storage array.
What does desktop virtualization have to do with storage?
Desktop virtualization taught us a very important lesson about storage. Operating systems are 90% idle except during boot, when lots of reads and some writes put pressure on disk. Desktop virtualization introduced a new pattern of pressure on disk: at eight and nine AM everyone would boot up their virtualized desktop (spawning new desktops and booting the OSes), putting massive pressure on storage. This caused storage systems to fail, and if the storage was shared with traditional server virtualization, everything failed. The traditional storage vendors’ solution to this problem was to buy a bigger array with more cache and capacity. This created stranded capacity and was a huge CapEx expenditure when desktop virtualization was “supposed” to save us money.
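The boot storm is easy to see with back-of-the-envelope arithmetic. The sketch below is only an illustration: the per-desktop IOPS figures (300 while booting, 10 when steady) and the desktop count are assumed values, not measurements.

```python
def aggregate_iops(desktops: int, booting_fraction: float,
                   boot_iops: int, steady_iops: int) -> int:
    """Total IOPS demand when part of the desktop pool is booting at once."""
    if not 0.0 <= booting_fraction <= 1.0:
        raise ValueError("booting_fraction must be between 0 and 1")
    booting = round(desktops * booting_fraction)
    steady = desktops - booting
    return booting * boot_iops + steady * steady_iops


if __name__ == "__main__":
    # Assumed figures for illustration only.
    DESKTOPS, BOOT_IOPS, STEADY_IOPS = 1000, 300, 10
    for fraction in (0.0, 0.1, 0.5, 1.0):
        demand = aggregate_iops(DESKTOPS, fraction, BOOT_IOPS, STEADY_IOPS)
        print(f"{fraction:>4.0%} booting -> {demand:,} IOPS")
```

An array sized for the steady-state number is overwhelmed as soon as a meaningful fraction of desktops boot together, while an array sized for the peak sits mostly idle for the rest of the day.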
Role of Cache
The rise of SSD has provided a dramatic improvement to the size of cache available in arrays. Cache provides ultra-fast disk for initial writes and common reads, thus reducing latency and improving IOPS. I remember the days when 1GB of cache was awesome; these days arrays can have 800GB of cache or more. Cache allows you to buy larger and slower capacity disks while getting better performance to the virtualized application. Cache is a critical component in today’s storage solutions.
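How much cache helps depends on how often a read is served from it. A small sketch of the effective read latency as a weighted average of cache and disk latency is shown here; the 0.1 ms and 8 ms latencies are assumptions chosen for illustration.

```python
def effective_latency_ms(hit_ratio: float, cache_ms: float, disk_ms: float) -> float:
    """Average latency when hit_ratio of the I/Os are served from cache."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return hit_ratio * cache_ms + (1.0 - hit_ratio) * disk_ms


if __name__ == "__main__":
    # Assumed latencies: 0.1 ms for SSD cache, 8 ms for slow capacity disk.
    for hit_ratio in (0.0, 0.5, 0.9, 0.99):
        latency = effective_latency_ms(hit_ratio, cache_ms=0.1, disk_ms=8.0)
        print(f"hit ratio {hit_ratio:.0%}: {latency:.2f} ms average")
```

The average is dominated by the misses, which is why a larger cache (and therefore a higher hit ratio) lets slower capacity disks sit behind it.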
How to solve desktop virtualization
Vendors saw a gap in desktop virtualization that traditional array vendors were not filling. This gap can be defined as:
- The array was not meeting my performance needs without buying more arrays
- I need to separate my IOPS for desktop virtualization away from servers
This gave rise to two solutions:
- Hyper-converged infrastructure
- All Flash arrays
Hyper-converged infrastructure has many different definitions depending on who you ask. For the purpose of this article it’s a combination of x86 hardware with local hard drives. This combination provides the compute and a software-based clustered storage solution for virtualization. The local hard drives on each compute node contribute to the required cluster file system. This model has long been used by large service providers like Google and Amazon. These solutions are normally presented to ESXi over NFS. The market leader at this time is Nutanix, who really cut their teeth solving desktop virtualization problems. They have since moved successfully into traditional server virtualization. Their success has encouraged other vendors to enter the market to compete, including SimpliVity (OmniCube) and VMware (Virtual SAN). Each vendor has some mix of the secret sauce to address a perceived problem. It’s beyond the scope of this article to compare these solutions, but they all take advantage of at least one SSD drive as a per-compute-node cache. This local cache can be very large compared to traditional arrays, with some solutions using 1TB or more of local cache.

Each compute node serves as a storage controller, allowing for a scale-out approach to capacity and performance. Hyper-converged solutions have seen huge growth in the market and do effectively resolve the desktop problem, depending on scale. Hyper-converged solutions do introduce a new problem: balanced scalability. Simply put, I may need additional storage without needing more controllers or compute capacity, but in order to get more storage I have to buy more nodes. This balanced scale issue is addressed by vendors providing different mixes of storage and compute nodes.
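The balanced scalability problem can be sketched as a sizing calculation: the cluster needs enough nodes for whichever resource runs out first, and the other resource ends up stranded. The node specification and requirements below are hypothetical values, not any vendor's configuration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    usable_tb: float  # usable storage contributed by one node (assumed)
    cores: int        # CPU cores contributed by one node (assumed)


def size_cluster(node: NodeSpec, need_tb: float, need_cores: int) -> dict:
    """Nodes required to satisfy both needs, plus what is left stranded."""
    nodes_for_capacity = math.ceil(need_tb / node.usable_tb)
    nodes_for_compute = math.ceil(need_cores / node.cores)
    nodes = max(nodes_for_capacity, nodes_for_compute)
    return {
        "nodes": nodes,
        "stranded_tb": nodes * node.usable_tb - need_tb,
        "stranded_cores": nodes * node.cores - need_cores,
    }


if __name__ == "__main__":
    # A storage-heavy workload on a one-size-fits-all node.
    print(size_cluster(NodeSpec(usable_tb=10.0, cores=32), need_tb=200.0, need_cores=128))
```

In this storage-heavy example four nodes would cover the compute, but twenty are needed for capacity, so most of the purchased cores go unused. Storage-dense or compute-dense node types are the vendors' way of shrinking that gap.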
All Flash Arrays
With the rise of SSD the cost keeps getting lower, so traditional array vendors started producing all flash arrays. Flash provides insane amounts of IOPS per disk, but lower capacity. Each month the capacity increases and the cost of SSD drops, making the all flash array (AFA) a very real, cost-effective solution. Years ago I was asked to demo a newly emerging flash solution called RamSAN. The initial implementation was 150,000 IOPS in a single 2U unit. I was tasked with testing its limits. I wanted to avoid artificial testing, so I threw a lot of VMware database workloads at the array (all test, of course). I quickly found out that the solution may be able to do 150,000 IOPS, but my HBAs (two per host) did not have enough queue depth to drive 150,000 IOPS. All flash arrays introduced some new problems:
- Performance bottleneck moved from the disk to the controller on the array
- Capacity was costly
- New bottlenecks like queue depth could be an issue
I remember buying 40TB of SSD in a more recent array. The SSD drives combined were capable of 300K IOPS, while the controllers could not push more than 120K IOPS; a single controller was able to do 60K IOPS. Quickly the controller became my problem, one that I could not overcome short of buying a new array with additional controllers. Traditional array vendors struggled with this setup, bound by their controller architecture. A number of startup vendors entered the market with scale up controllers. All flash based solutions can potentially solve the desktop problem, but at a steep cost.
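The controller story above is a simple "slowest stage wins" calculation. The sketch below uses the figures from that array (300K IOPS of SSD behind two controllers at 60K IOPS each); the function and stage names are illustrative.

```python
from __future__ import annotations


def deliverable_iops(stage_limits: dict[str, int]) -> tuple[str, int]:
    """Return the bottleneck stage and the IOPS the whole path can deliver."""
    if not stage_limits:
        raise ValueError("at least one stage is required")
    bottleneck = min(stage_limits, key=stage_limits.get)
    return bottleneck, stage_limits[bottleneck]


if __name__ == "__main__":
    array = {
        "ssd_drives": 300_000,       # combined capability of the SSDs
        "controllers": 2 * 60_000,   # two controllers at 60K IOPS each
    }
    stage, iops = deliverable_iops(array)
    stranded = array["ssd_drives"] - iops
    print(f"bottleneck: {stage}, deliverable: {iops:,} IOPS, stranded: {stranded:,} IOPS")
```

Adding more SSD to this array changes nothing; only more or faster controllers raise the deliverable number. Other stages, such as host queue depth, could be added to the dictionary in the same way.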
Problem with both solutions
All solutions suffer from the same problems:
- Stranded capacity in IOPS or storage capacity (more of either than you need)
- Storage controllers cannot meet performance needs
All of these issues happen because of a lack of understanding of the true application metrics. vCenter understands the application metrics, while the array only understands reads and writes at a LUN level. Because the array does not see each virtual machine as an independent element, the administrator cannot increase the priority or preference of individual machines. Hyper-converged solutions have two additional challenges:
- Increased network bandwidth for data replication (compared with Fibre Channel arrays; NAS already has this issue)
- Blades rarely have enough space for multiple hard drives
The value proposition for hyper-converged is that you can replace your costly array with just compute nodes with hard drives. This is a real cost savings, but only if you are due for a technology refresh on both compute and storage, and your budgets are aligned and agreed to spend on hyper-converged. Getting the storage team to give up funds for compute hard drives can be a hard sell.
How to understand the smallest atomic unit
Lots of vendors understand this problem and have different ways of approaching it, including:
- VVols
- Local compute cache
- NFS
Essentially, to understand the smallest unit you have to understand the individual files and how they are connected. The VMFS file system handles all this information, while block-based arrays only understand block-based reads and writes. Individual files are invisible to the block-based array.
VVols
Developed by VMware, VVols provide a translation method for block-based storage systems using protocol endpoints. These protocol endpoints run on the storage controllers or in-line with the controllers to allow the array to understand the file system and individual files. This translation allows the array to act upon a single virtual machine on a LUN instead of acting on the whole LUN. We can apply performance policies, snapshots and all array operations to individual virtual machines. This is a great solution but has two problems:
- The protocol endpoints, much like controllers, have scalability issues if not implemented correctly
- Vendor adoption has been very slow
Local compute cache
This process adds SSD or RAM and creates a cache for virtual machine reads and writes. This cache can be assigned to individual machines or shared across the whole compute node. This method has an understanding of individual machines and accelerates their reads and writes. In order to cache writes it’s critical that the writes be redundant, so normally the writes have to be committed to the cache of at least two different compute nodes before they are acknowledged to the operating system. This ensures that the data is protected during a single compute node failure. The current leader providing read and write cache solutions like this is PernixData. This process ensures local performance enhancement at the lowest atomic level, but it shares some common challenges with hyper-converged, including:
- Every compute node must have local SSD to accelerate solution
- Network bandwidth for replication is used (meaning you need more 10GbE or you have to share it)
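The "commit to two nodes before acknowledging" rule for write caching can be sketched as follows. This is a toy model under stated assumptions, not PernixData's or any vendor's implementation: `CacheNode` and `replicated_write` are hypothetical names, and a real product would also handle cleanup of partial writes and destaging to the backing array.

```python
from __future__ import annotations


class CacheNode:
    """Toy stand-in for the SSD/RAM cache on one compute node."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.alive = True
        self.blocks: dict[tuple[str, int], bytes] = {}

    def store(self, vm: str, block: int, data: bytes) -> bool:
        if not self.alive:
            return False
        self.blocks[(vm, block)] = data
        return True


def replicated_write(nodes: list[CacheNode], vm: str, block: int,
                     data: bytes, copies: int = 2) -> bool:
    """Acknowledge the write only once `copies` node caches hold the block.

    The first node in `nodes` is treated as the local cache, the rest as peers.
    """
    stored = 0
    for node in nodes:
        if stored == copies:
            break
        if node.store(vm, block, data):
            stored += 1
    return stored >= copies


if __name__ == "__main__":
    a, b, c = CacheNode("esx-a"), CacheNode("esx-b"), CacheNode("esx-c")
    print(replicated_write([a, b, c], "vm1", 7, b"payload"))  # two live nodes hold it
    b.alive = False
    c.alive = False
    print(replicated_write([a, b, c], "vm1", 8, b"payload"))  # only one live node
```

The second copy is what consumes network bandwidth: every cached write crosses the wire to a peer before the guest sees an acknowledgement.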
NFS
NFS has been around for years. It’s a method for sharing a file system with Linux and Unix hosts. VMware supports it natively, and it’s the only supported file system (other than VMware Virtual SAN) that is not running VMFS. VMs on NFS are files on the NFS file system. This gives the storage array or server a full understanding of the individual files. This exposure can be a huge advantage when looking at backup products and site-to-site replication. Until NFS version 4.1 support arrived (vSphere 6) there were a number of drawbacks to NFS, including the lack of multipathing. They have been removed, and NFS provides the full object-based storage solution that VVols promise. Scalability can be a problem, with a maximum number of virtual machines and objects on a single mount, or with capacity around controllers. NFS-based solutions are network based and thus create network workload. In addition, NFS does not natively provide any per-file performance enhancement; it just deals with IO in and out. Lots of vendors have implemented solutions to enhance NFS.
What is best and does it solve the issue?
I started this post with the question: is there a problem with storage? Well, lots of vendors seem to think so and want to sell us stuff to solve the issue. From my experience, we have a few issues:
- Backup is a major mess; in vSphere it’s hard to manage and keep working without constant care and feeding
- Storage arrays don’t have any understanding of the lowest atomic unit and thus cannot protect us from bad neighbors on the same LUN; this becomes more of an issue in large hosting environments
- Performance (IOPS) is rarely the issue except in specific use cases or small business thanks to oversized arrays
- Queue Depth is rarely the problem except in specific use cases
- Capacity seems to be the buzz problem and the price per year just keeps getting lower
I believe we need to get to object-based storage so we can solve the backup problem. Doing VDP backups or LUN snapshots does not allow management at the lowest atomic unit. The current model causes crashes and outages and struggles to work well. It’s not a product issue; it’s an implementation and technology issue that needs a dramatic change to resolve.
Local knowledge at the lowest level
The object I manage is a virtual machine. My storage array friend manages a LUN with multiple virtual machines (sometimes hundreds; yes, I am looking at you, NFS). Until we manage at the same atomic level we will have problems aligning policies and performance. I think policy-based enforcement with shares is a great way to go… something like SIOC that is enforced by the array. Hot blocking, all flash, etc. are all fixes that attempt to get around the essential communication issue between the hypervisor and the array.

Future storage cannot be bound by two storage controllers; it needs to scale to meet needs. The hyper-converged folks have a big advantage on this problem. The future of storage is not block, except in mixed enterprise environments (I am looking at you, mainframe). You need to get comfortable with network-based storage and architect for it. Buy switches and interfaces on your compute just for storage traffic; don’t mix it. Architect a super highway to your storage that is separate from your normal network traffic.
If performance is your issue, then solve it locally; don’t buy another array. Local cache will save you a lot. Scale up solutions in arrays or hyper-converged are both options, but local SSD will be a lot cheaper than a rip and replace. It’s also easier on management cost.
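To make "policy-based enforcement with shares" concrete, here is a minimal sketch of how an array could divide its IOPS among individual virtual machines using shares, reservations and limits. It is an illustration of the idea under simplifying assumptions (reservations are granted first, the remainder is split in proportion to shares), not VMware's SIOC algorithm or any array's implementation; `VmPolicy` and `allocate_iops` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class VmPolicy:
    name: str
    shares: int                   # relative weight
    reservation: float = 0.0      # IOPS guaranteed to the VM
    limit: float = float("inf")   # IOPS ceiling for the VM


def allocate_iops(total_iops: float, policies: list[VmPolicy]) -> dict[str, float]:
    """Grant reservations first, then split the remainder by shares, honoring limits."""
    alloc = {p.name: min(p.reservation, p.limit) for p in policies}
    remaining = total_iops - sum(alloc.values())
    if remaining < 0:
        raise ValueError("reservations exceed the IOPS available")

    active = [p for p in policies if p.shares > 0 and alloc[p.name] < p.limit]
    while active and remaining > 1e-9:
        total_shares = sum(p.shares for p in active)
        granted = 0.0
        still_active = []
        for p in active:
            offer = remaining * p.shares / total_shares
            room = p.limit - alloc[p.name]
            grant = min(offer, room)
            alloc[p.name] += grant
            granted += grant
            if grant < room:  # VM has not hit its limit, keep it in the next round
                still_active.append(p)
        remaining -= granted  # IOPS refused by capped VMs are redistributed
        active = still_active
    return alloc


if __name__ == "__main__":
    policies = [
        VmPolicy("db", shares=2000),
        VmPolicy("web", shares=1000),
        VmPolicy("batch", shares=1000, limit=500.0),  # the noisy neighbor, capped
    ]
    for name, iops in allocate_iops(10_000, policies).items():
        print(f"{name}: {iops:,.0f} IOPS")
```

The point is that this calculation only works if the array knows which I/O belongs to which virtual machine, which is exactly the knowledge a LUN-level array lacks.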
What should I choose?
It depends on your needs. If I were presented with a green field that is going to be running all virtualized workloads today, I would seriously consider hyper-converged. Storage arrays are more mature but move a lot slower on updates. I would move toward a more software-defined solution instead of a hardware-defined one. I think that central understanding of the lowest atomic unit is critical going forward. If you have a mixed storage environment or an investment in Fibre Channel, large arrays with cache make sense. If you are looking to solve VDI issues, I would consider hyper-converged or lots of cache. The future is going to hold some interesting times. I need storage to provide the following:
- No controller lock-in; I need it to scale to meet my needs
- It needs to understand each virtual machine’s individual identity
- It should include backup and restore capabilities down to the VM level
- It has to include data-at-rest encryption (yes, I didn’t mention this, but it’s huge)
- Policy based performance (allocate shares, limits and reservations)
- Include methods to move the data between multiple providers (move in and out of cloud)
Does it sound like a unicorn? Yep, it is… Someone go invent it and sell it to me.