Operational aspects of HCI

Hyperconverged infrastructure (HCI) is natively software defined, which shifts operations away from the traditional storage management paradigm. Many of my customers have struggled with this paradigm shift when adopting HCI for storage. HCI has been very successful in addressing specific use cases. Many of these use cases have been successful because they represent workloads that have not traditionally been managed by storage teams, for example VDI. Adoption of HCI beyond these use cases requires large organizations to transform their people and processes. Discussions with customers have shown that fear of the operational transition has held back adoption. The net gain of HCI in the datacenter is a significant reduction in the total cost of ownership for storage.

What is your storage strategy?

When looking at your storage strategy you are likely to see a mixture of solutions to meet your needs. I have found that the following questions help people identify their requirements, which ultimately lead to a strategy:

What storage requirements do your applications have and how are they measured?
How is storage involved in your business continuity, disaster recovery, backup and availability strategy?
What data security requirements does your organization have?
What is your storage strategy for the cloud?

Once you identify your storage requirements, the strategy can be aligned with functional needs. Storage functions that may be important to your organization include:

Capacity
Performance
Redundancy
Data security
Ease of management
Cost
Replication capabilities

Assigning measurements to these functions allows you to identify the correct storage “profiles” to be used in your organization. These profiles can then be aligned with your storage strategy.
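As a rough illustration of assigning measurements to functions, the sketch below models a storage profile as a small Python data structure and picks the cheapest profile that satisfies a set of requirements. The profile names, fields and numbers are hypothetical placeholders, not recommendations.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageProfile:
    """A hypothetical storage profile: each function gets a measurable target."""
    name: str
    guaranteed_iops: int        # performance
    failures_tolerated: int     # redundancy
    encrypted: bool             # data security
    replicated: bool            # replication capabilities
    cost_per_gb_month: float    # cost


# Illustrative values only; real numbers come from your own requirements.
PROFILES = [
    StorageProfile("bronze", 500, 0, False, False, 0.03),
    StorageProfile("silver", 5_000, 1, False, False, 0.08),
    StorageProfile("gold", 20_000, 1, True, True, 0.20),
]


def cheapest_profile(iops: int, failures: int, encrypted: bool,
                     replicated: bool) -> Optional[StorageProfile]:
    """Return the lowest-cost profile that meets every requirement, or None."""
    candidates = [
        p for p in PROFILES
        if p.guaranteed_iops >= iops
        and p.failures_tolerated >= failures
        and (p.encrypted or not encrypted)
        and (p.replicated or not replicated)
    ]
    return min(candidates, key=lambda p: p.cost_per_gb_month, default=None)
```

An application that needs 3,000 IOPS and must survive one failure would land on the "silver" profile in this example, while a requirement no profile can meet returns None and signals a gap in the strategy.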

Differences in HCI

HCI does present some differences from many traditional storage arrays. Four common areas of difference are capacity management, scalability, policy based management and roles.

Capacity Management

Capacity management in most traditional systems requires a measurement based on historical usage metrics. Historical data is taken into account, then a “bet” is made about required capacity for the next X years. The large “bet” on storage array capacity and performance does not allow IT to be agile in response to business changes. Growth beyond the initial implementation is possible by adding storage shelves or buying new arrays. HCI by contrast takes a linear model. You can scale up and out incrementally. You add capacity by adding drives to your current servers, or by adding servers to increase the available controllers and drive bays. I find that customers who adopt HCI are:

Able to procure storage in incremental blocks instead of via large capital expense “bets”
Able to have a predictable outcome on capacity management
Able to adopt new technology faster
Able to utilize storage resources without depreciation “bets”

Once a storage team aligns with HCI based capacity management, it finds that storage capacity growth is no longer a “flak jacket” exercise. The business can accept that a new project requires some incremental increase in cost instead of a large CapEx spend. The integrated nature of HCI means that compute capacity sizing is integrated in part with storage capacity. This simplified capacity management allows the IT budget to stretch farther. Best practices for HCI include:

Design for scale, but build incrementally
Overall capacity management process is the same as for traditional arrays, but lead times are shorter and purchases are potentially more frequent
Choose servers with maximum available drive bays

Traditional storage capacity management requires procurement at roughly 60% usage to allow for growth. In large environments this means that large amounts of capacity will never be used, increasing the total cost per GB of storage actually used. HCI’s lower capacity expansion cost should allow large organizations to utilize 80% or more of capacity before buying expansions.
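The effect of the utilization threshold on cost can be sketched with simple arithmetic. The example below assumes the same purchase price per raw GB in both cases, which is a simplification of my own; the function name and the price of 1.00 are placeholders.

```python
def cost_per_used_gb(cost_per_raw_gb: float, utilization: float) -> float:
    """Effective cost of each GB actually used, given the fraction of
    purchased capacity that is consumed before the next expansion."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in the range (0, 1]")
    return cost_per_raw_gb / utilization


# Hypothetical price of 1.00 per raw GB in both scenarios.
traditional = cost_per_used_gb(1.00, 0.60)  # expand at roughly 60% usage
hci = cost_per_used_gb(1.00, 0.80)          # expand at roughly 80% usage
saving = 1 - hci / traditional              # relative reduction per used GB
```

Under these assumptions the effective cost per used GB is about 1.67 at 60% utilization and 1.25 at 80%, roughly a 25% reduction before any difference in hardware price is considered.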

Some capacity metrics that you should monitor include:

Total available space
Used space
Used capacity breakdown (VMs, swap, snapshots, etc.)
Dedupe and compression savings
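A minimal sketch of how these metrics relate to each other is shown below. The class, its field names and the 80% threshold are assumptions made for illustration; they do not correspond to any particular product's API, and real values would come from your monitoring tooling.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CapacitySnapshot:
    """Hypothetical point-in-time view of cluster capacity, in GB."""
    total_gb: float                       # total available space
    used_by_category_gb: Dict[str, float] # physical usage per category
    logical_gb: float                     # data size before dedupe/compression

    @property
    def used_gb(self) -> float:
        """Used space is the sum of the per-category breakdown."""
        return sum(self.used_by_category_gb.values())

    @property
    def used_fraction(self) -> float:
        return self.used_gb / self.total_gb

    @property
    def reduction_savings_gb(self) -> float:
        """Space saved by dedupe and compression."""
        return self.logical_gb - self.used_gb

    def needs_expansion(self, threshold: float = 0.80) -> bool:
        """True once usage reaches the chosen procurement threshold."""
        return self.used_fraction >= threshold


snapshot = CapacitySnapshot(
    total_gb=10_000,
    used_by_category_gb={"vms": 6_000, "swap": 500, "snapshots": 2_000},
    logical_gb=12_000,
)
```

Tracking the breakdown rather than a single used-space number makes it easier to see whether growth comes from new workloads or from items such as snapshots that can be cleaned up.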

Scalability

A common concern with HCI is scalability. Independent scalability is touted as one of the primary benefits of traditional three tier infrastructure: compute, storage, and networking. When considering the scalability of traditional storage systems, the following are considered:

Capacity in TB
Required IOPS
Throughput of storage systems (link speed)
Throughput of controllers

The adoption of flash drives has changed the scalability pain point: IOPS are no longer a concern for most enterprises. Flash drives have increased the pressure on link speed and controller throughput, forcing architecture changes in traditional arrays. With HCI, controllers and link speed become distributed, which removes both bottlenecks and leaves only capacity to be considered, assuming all-flash storage. HCI addresses capacity scalability in two ways: adding additional drives and increasing the capacity of existing drives. It is considered a best practice when implementing HCI to get servers with as many drive bays as possible. This allows you to increase capacity across the cluster by adding drives. The explosive adoption of HCI and flash has driven manufacturers to provide increasingly larger capacity drives. With VMware VSAN you can replace existing drives with larger drives without interrupting operations. Customers can double storage capacity without adding additional compute nodes. HCI scales in a distributed fashion for linear growth. Some best practices to consider around scalability are:

Consider using traditional servers instead of blades to increase the available drive bays
Consider using all-flash drives to remove all potential performance concerns
Remember that HCI implements a flash cache, which greatly improves performance without requiring all-flash storage
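The scaling options described above can be sketched as simple arithmetic. The function and the cluster sizes below are hypothetical, and the calculation covers raw capacity only: it ignores cache drives, protection overhead and any data reduction.

```python
def raw_capacity_tb(nodes: int, drives_per_node: int, drive_tb: float) -> float:
    """Raw cluster capacity: every node contributes its capacity drives."""
    return nodes * drives_per_node * drive_tb


# Hypothetical starting point: 4 nodes, 6 of 12 bays populated, 2 TB drives.
baseline = raw_capacity_tb(nodes=4, drives_per_node=6, drive_tb=2.0)

# Option 1: fill the empty drive bays in the existing servers.
more_drives = raw_capacity_tb(nodes=4, drives_per_node=12, drive_tb=2.0)

# Option 2: replace the existing drives with larger ones.
larger_drives = raw_capacity_tb(nodes=4, drives_per_node=6, drive_tb=4.0)

# Option 3: scale out by adding a node, which also adds compute.
extra_node = raw_capacity_tb(nodes=5, drives_per_node=6, drive_tb=2.0)
```

In this example the first two options each double raw capacity from 48 TB to 96 TB without adding a compute node, which is why servers with spare drive bays are worth choosing up front.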

Policy Based Management

In many traditional arrays, availability and performance are tied to the logical unit number (LUN). These capabilities are set in stone at the time of creation, and changing them requires moving the data. This type of allocation creates challenges for capacity management and increases the number of day two operations required to meet business needs. HCI takes a policy based approach and removes the constraints of LUNs. HCI provides a single datastore, radically simplifying traditional storage management. Policies define availability and performance requirements, and the HCI system enforces them. To increase the performance of a specific workload, a new policy is defined and assigned to the workload. The HCI system works to ensure policy compliance without interruption to the workload. Policy based management provides large operational efficiencies. An IDC study has shown that HCI can lower the OpEx cost of storage by 50% or more.

In VSAN there are two key elements in a policy: stripe count and failures to tolerate (FTT). Stripe count denotes the number of drives an object is striped across, which improves performance. Each object will have its data spread across X number of disks on the same compute node. Failures to tolerate denotes the number of compute nodes that can fail before data access is affected. An FTT setting of 1 is essentially a mirror: each object must have one duplicate copy on another node. An FTT of 2 provides two duplicate copies, placing the data on three nodes in total. FTT has a direct effect on the amount of storage used in the HCI implementation. Policies should be designed to meet the business needs of the application. A few best practices to consider:

Do not use FTT of 0 unless you truly don’t care about loss of the data (stateless services)
Depending on the type of disks backing the HCI solution additional stripes may not provide performance boosts
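To make the effect of FTT on consumed storage concrete, here is a minimal sketch. It assumes simple mirroring, where an FTT of n keeps n + 1 full copies of the data, and it ignores witness components, erasure coding, data reduction and other overhead. The StoragePolicy class is hypothetical and is not the VSAN API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StoragePolicy:
    """Hypothetical policy with the two elements discussed above."""
    name: str
    failures_to_tolerate: int
    stripe_count: int = 1

    def copies(self) -> int:
        """Assumption: mirroring keeps one extra full copy per tolerated failure."""
        return self.failures_to_tolerate + 1

    def raw_gb_required(self, workload_gb: float) -> float:
        """Raw capacity consumed by a workload under this policy."""
        return workload_gb * self.copies()


stateless = StoragePolicy("stateless", failures_to_tolerate=0)
standard = StoragePolicy("standard", failures_to_tolerate=1)
critical = StoragePolicy("critical", failures_to_tolerate=2, stripe_count=2)

# A 100 GB workload consumes 100, 200 and 300 GB of raw capacity respectively.
usage = [p.raw_gb_required(100) for p in (stateless, standard, critical)]
```

Under this assumption, moving a workload from the "standard" to the "critical" policy raises its raw footprint by half again, so policy choices should feed directly into capacity planning.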

Some general VSAN performance guidance is provided below:

Some general VSAN availability guidance is provided below:

The policies should align with organizational application requirements. Management by policy provides the greatest flexibility and reduces the management cost.

Roles

Many organizations have struggled to adopt HCI because of the change in skills and process required to be successful. In the best case, HCI bridges the worlds of compute, storage, networking and security into a single platform. This single platform provides operational synergy and encourages standards. Organizations that have been successful in adopting HCI have learned that it requires a cross-functional skill set. Siloed teams, which are the current reality in many organizations, struggle to adopt HCI. Creating cross-functional teams with blended skills accelerates adoption of HCI.

Some best practices for successful HCI adoption include:

Cross functional training
Blended teams
Rotating subject matter experts who are expected to own a product but train others
Outcome-oriented teams and compensation instead of activity-oriented

Many of my customers have adopted a plan, build, run methodology. In these organizations I recommend that teams at each tier be blended, and that members of each silo rotate through plan, build and run to better understand each role.

Benefits of HCI

HCI can provide many benefits required by modern datacenters. I have observed that customers who successfully adopt HCI achieve the following outcomes:

Hyper Scalability
Operational agility
Operational efficiency
Simplified operations and support
Improved availability and performance

I truly believe it’s time to adopt HCI in your datacenter and realize the operational and cost benefits.
