Learning vRealize Orchestrator

Recently I presented a hands-on vRealize Orchestrator lab to the Austin VMUG.   It's been too long since I attended a VMUG meeting, thanks to my move to Dallas, and it was great to meet new peers in the Austin area.   The purpose of the lab was to give people hands-on experience with Orchestrator using some real-world, genuinely useful examples, and I was present to answer questions and help the learning process along.

I am working to bring the same lab to Dallas and Houston in the next few months, but I wanted to share the materials here.  Most of the exercises can be done in the HOL-1821-05-CMP hands-on lab environment, which was partially built by my friend Ed Bontempo; you will have to type the code examples by hand since HOL does not support cut and paste.  You can also do all of it in your home lab.  Contact me if you want to know about the next session we are presenting to a live VMUG and would like some instructor help.

Code: code_text

Lab Manual: VMUG_VRO_LAB_Manual

Enjoy!

Should IT build a castle or a mobile home?

So I have many hobbies to keep my mind busy during idle times… like when driving a car.   One of my favorites is identifying the best candidate locations to live in if the zombie apocalypse were to happen.   As I drive between locations I see many different buildings, and I try to rate the large ones by how zombie-proof they are.   There are many things to consider in the perfect zombie-defense location, for example:

  • Avoid buildings with large numbers of windows or first-floor windows
  • Choose buildings made of materials that cannot be bludgeoned open, such as stone
  • More than one exit, but not too many
  • A location that can be defended on all sides and forces a visible approach

There are many other considerations, like proximity to water and food, but basically I am looking for the modern equivalent of a castle.

OK, what does this have to do with IT?

Traditional infrastructure is architected like a castle: its primary goal is to secure the perimeter and be imposing enough to keep people out.   During a zombie attack this model is great until they get in; then it becomes a graveyard.   IT architects, myself included, spend a lot of time considering all the factors required to build the perfect castle.   There are considerations like:

  • Availability
  • Recoverability
  • Manageability
  • Performance
  • Security

All of these have to be considered, and as you add another wing to your castle every one of these design elements must be reconsidered for the whole castle.  We cannot add a new wing that bridges the moat without extending the moat, and so on.   Our drive to build the perfect castle has created a monolithic drag.   While development teams move from annual releases to quarters, weeks, or days, we keep trying to control the world from a perimeter-design perspective.   If we could identify every possible addition to the castle at the beginning, we could potentially account for them.   That was true in the castle days: there were only so many ways into a castle and so many methods of breaking in.   Even worse, the castle provided lots of nooks and corners for zombies to hide in and attack me when I least expected it.   That is the challenge with a zombie attack: zombies don't follow the rules, and they just might build a ladder out of zombie bodies and climb into your castle (World War Z style).   If we compare zombies to the challenges being thrown at IT today, the story holds up.   How do we deal with constant change and the unknown?   How do we become agile in the face of change?   Is it by building a better castle?

Introducing the mobile home


Today I realized that the perfect solution to my zombie question is the mobile home.   We can all agree that I need a place to sleep, something I can secure with reasonable assurance.   I can reinforce the walls and windows of a mobile home, and I gain something I don't have with a castle: mobility.  I can move my secured location and goods to new places.  My mobile home is large enough to provide for my needs without offering too many places for zombies to hide.  IT needs this type of mobility.   Cloud has provided faster time to market for many enterprises, but in reality you are only renting space in someone else's castle.   There are all kinds of methods to secure your valuables from mine, but in the end we are at the mercy of the castle owner.   What if my service could become a secured mobile home?  That would provide the agility I need in the long run.   The roach motel is alive and well in cloud providers today: many providers have no cross-provider capabilities, while others provide tools to transform the data between formats.   My mobile home needs to stay secure and not be reconfigured each time I move between locations while looking for resources or avoiding attack.  We need to reconsider IT as a secured mobile home and start to build this model.   Some functions to consider in my mobile home:

  • Small enough to provide the required functions (bathroom, kitchen, and sleeping space, or in IT terms, business value) and not an inch larger than required
  • Self-contained security that encircles the service
  • Mobility without interruption of services

Thanks for reading my rant.  Please feel free to provide your favorite zombie hiding location or your thoughts on the future of IT.

 

Perfect OS deployments with automation

I have spent the last few years working in enterprise shops and enjoying the challenges they bring.   I find that a number of my peers are hired for a single use case or implementation and then leave.  Staying with an infrastructure past a single implementation lets me enjoy all that brownfield IT has to offer; it's a completely different challenge.   Almost everyone I talk to, and everywhere I work, is trying to solve the same basic problem: do more with less and more automation.  Everyone wants the Amazon easy button without the security or off-premises challenges of AWS.   In order to make it into the cloud they need organizational and operational change.  The first place almost everyone focuses is operating system deployments.   There are a number of models available, and I thought I would share some of my thoughts on them.

Cloning 

This model has been made easy by VMware.  It's a combination of a golden template and some guest customization, it's very easy to manage, and it produces very similar results every time during provisioning.  You either focus the template on core shared elements or create a template for each use.  It does have some challenges (a scripted sketch of the model follows the list below):

  • How much of our software should we load onto it?  Security software, monitoring agents, etc.  How can we identify only the core shared elements?
  • It does not scale to lots of different templates – keeping a template for every application kills you.  Imagine updating 100 templates every month and verifying with the application teams that nothing is broken.
  • It is a virtual-only solution, making physical machine builds a manual or separate process.
  • It's a provisioning-only process; it has no idea of state after the initial implementation.
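To make the cloning model concrete, here is a minimal sketch of template-plus-guest-customization provisioning using pyVmomi, VMware's Python SDK.  The vCenter host, template, cluster, and customization spec names are placeholders, and error handling is omitted; treat it as an illustration of the pattern rather than a production script.

```python
# Minimal cloning sketch with pyVmomi (placeholder names, no error handling).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_by_name(content, vimtype, name):
    """Walk the inventory and return the first managed object with a matching name."""
    view = content.viewManager.CreateContainerView(content.rootFolder, [vimtype], True)
    try:
        return next(obj for obj in view.view if obj.name == name)
    finally:
        view.Destroy()

si = SmartConnect(host="vcenter.example.com", user="automation", pwd="secret",
                  sslContext=ssl._create_unverified_context())
content = si.RetrieveContent()

template = find_by_name(content, vim.VirtualMachine, "win2012-golden")
cluster = find_by_name(content, vim.ClusterComputeResource, "Prod-Cluster")

# The relocate spec says where the clone lands; the clone spec powers it on
# and applies a guest customization spec saved in vCenter ("base-windows" is assumed).
relocate = vim.vm.RelocateSpec(pool=cluster.resourcePool)
clone_spec = vim.vm.CloneSpec(location=relocate, powerOn=True)
clone_spec.customization = content.customizationSpecManager.GetCustomizationSpec(
    name="base-windows").spec

task = template.CloneVM_Task(folder=template.parent, name="app-server-01",
                             spec=clone_spec)
print("Clone task submitted:", task.info.key)
Disconnect(si)
```

This is roughly the pattern any provisioning portal drives under the covers when it clones from a template; the point is that everything after power-on is still up to you.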

It's a provisioning-only process

This is a big problem for me with a lot of provisioning solutions, not just cloning.  They do the initial provisioning but not the steady-state management of the operating system.  This lack of life cycle management does not solve my brownfield issues.  Sure, you have an awesome, initially consistent implementation, but five minutes later you are out of sync with the initial template.   This problem has led me to configuration management in almost every shop I have worked in.   I wish that everywhere I worked were a Netflix, with a redeploy-the-microservice-on-failure model.  The truth is that none of the shops I have worked in have that model.   I have monolithic multi-tier applications that are not going away this year or any time soon.

Do I have a life cycle problem or a provisioning problem?

Yes, both.   I do not believe the days of fire-and-forget operating systems are available to us anymore.   Every server is under a constant state of change, from attackers to patches.  Everything changes, and changes bring outages when assumptions are made about the configuration of servers.  Early in my career I could not count the number of outages caused by incorrect DNS settings or hosts files.   These are simple configuration items that were expected to be correct but were found, after an outage, to have changed.    ITIL would have us believe it's all about change management: we need a CAB and approvals to avoid these issues.   While I am all for documented processes and procedures, I have not found that most hosts file changes go through a CAB; they get changed ad hoc or during an outage.   We have to be able to provision, configure, and ensure the configuration stays put.

Configuration management and provisioning

Take a look at this scenario:

  • The provisioning agent clones, provisions, or otherwise duplicates a base operating system
  • The provisioning agent does the initial configuration of the OS (IP address, sysprep, etc.)
  • Based on the customer's selections, the provisioning agent passes unique information to configuration management that identifies the server's role (this is a SQL Server, this is Apache, etc.)
  • The provisioning agent installs the configuration management agent
  • The configuration management agent checks in with the configuration management system and applies all settings (both base settings and server-role settings)
  • The configuration management agent continues to ensure that the role and base settings are correct for the life of the server
  • Server administrators and application administrators use the configuration management agent to adjust settings

This model provides both initial configuration and consistent life cycle management.  It does mean your configuration management agent does the heavy lifting instead of your provisioning agent; the sketch below shows the hand-off.
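As a rough sketch of that hand-off, the snippet below shows a provisioning step that records the selected server role where the configuration management agent can read it, installs the agent, and triggers the first check-in.  The paths and package name assume a Puppet-style setup (external facts under /etc/puppetlabs/facter/facts.d) purely for illustration; any configuration management tool with node classification would fit the same flow.

```python
# Sketch: hand-off from the provisioning agent to configuration management.
# Paths and package names are illustrative (a Puppet-style agent is assumed).
import json
import subprocess
from pathlib import Path

def handoff_to_config_mgmt(server_role: str, environment: str) -> None:
    # 1. Record the role the customer selected so the CM system can classify
    #    the node (written here as a Facter external fact).
    facts_dir = Path("/etc/puppetlabs/facter/facts.d")
    facts_dir.mkdir(parents=True, exist_ok=True)
    (facts_dir / "provisioning.json").write_text(
        json.dumps({"server_role": server_role, "environment": environment}))

    # 2. Install the configuration management agent.
    subprocess.run(["yum", "-y", "install", "puppet-agent"], check=True)

    # 3. First check-in: the agent pulls the base settings plus the role profile,
    #    then keeps enforcing them on its normal run interval for the life of
    #    the server.
    subprocess.run(["/opt/puppetlabs/bin/puppet", "agent", "--test"], check=False)

if __name__ == "__main__":
    handoff_to_config_mgmt(server_role="apache", environment="production")
```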

What about physical?

The model above also works for physical servers.  You have to move away from cloning and back to provisioning the operating system from a PXE boot, but it works very well.  Now you can provision both physical and virtual machines from the same cloud agent with consistent life cycle management.

What is the challenge?

For me the challenge has been that whenever I discuss configuration management it gets confused with compliance management.   I believe configuration management can and should be used for compliance management, but that is not its primary role.   Compliance is about meeting security standards.  Configuration is about ensuring configuration settings are correct and correcting them when they are not.   I can identify compliance issues and apply the resolution via configuration management, and I can use the configuration management engine to find settings that are out of compliance and change them to bring the server back into compliance.

Nutanix and Acropolis test drive for $1 an hour

Catchy title, eh?  Well, I have been wanting to test drive Nutanix Community Edition for a while now.  It allows you to set up a Nutanix cluster on almost any hardware, and it runs Nutanix's new hypervisor, Acropolis (KVM based).   My desire to set this up has always been limited by my time and the need to clear out some hardware for the test.  Nutanix was also kind enough to provide me with free access to their training portal, so I am able to learn about their products using their interactive learning system.  It is by far one of the most advanced online teaching platforms I have ever experienced, but no teaching tool is the same as playing with the real thing.   We have Nutanix at work, but I was not present for the original setup and don't do much of the day-to-day configuration.   So I wanted a low-cost playground.

Enter Ravello Systems

Ravello made a huge splash this year at VMworld, taking some of the best-of-show awards, and in addition they offered all vExperts 1,000 free hours per month.  Mix that with the 1,000 free hours I get as an RHCE and I have a lot of cloud capacity available.   For those who have not used Ravello, they provide a cloud front end to the public cloud providers Amazon and Google, allowing you to set up nested hypervisors, including Nutanix Acropolis and ESXi.  You do have to bring your own licenses.   The interface is simple and clean but very powerful: a perfect example of what cloud provisioning should be.   In addition, people can share templates via libraries for you to use.

Enter the Nutanix Library

It's available here.  Log in to your Ravello account, then visit that page and click Add to Library.  Now you can deploy a complete Community Edition of Nutanix in the cloud with two clicks.  It's really impressive.   Here are the steps to deploy:

 

Deployment

  • Select Library -> Blueprints
  • Select Nutanix Community Edition
  • Select Create application
  • Name your application
  • Click on the Nutanix CE icon in the center of the screen


  • On the right side, information will be presented about the virtual machine (which will run everything, including the nested virtual machines)
  • I wanted to make mine accessible via the internet (yes, it's very insecure)
  • Click on services
  • Under each service select Advanced and Enable SNAT
  • After they are all enabled, click Save at the bottom
  • Now click the publish button
  • Select optimize for performance
  • You can select to auto power down after xx hours to avoid costs
  • You can see your billing rate per hour for your server (this is per hour the server is powered on, not per hour it is deployed)


So you can see I am able to run this virtualized hypervisor for $1.0131 per hour; at that rate, roughly 20 hours of lab time costs about $20.

  • It will take a few minutes to boot up and generate all required SSL keys for the first time (mine took about 20 minutes before it was totally ready to go)
  • Once it's booted up, select Application and then your application name
  • The Summary tab will show the status and ports
  • Click Open on port 9440 and you should be presented with the Nutanix login


The first-time login is admin:admin, and it will require you to change the password.   If all goes well you are now able to deploy nested virtual machines on your hypervisor.  It does require a free Nutanix account; it even offers to let you sign up on the spot.  For bonus points, deploy two of them and get them to replicate a virtual machine between clusters.   The sky is the limit.  If you prefer to poke at the cluster from a script instead of the web console, a small API check is sketched below.
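As a quick sanity check once the cluster is up, you can hit the Prism REST API on the same port 9440 the console uses.  This is only a sketch: the hostname and password are placeholders, the v2.0 endpoint and field names are my assumption, and certificate verification is disabled because the appliance ships with a self-signed certificate.

```python
# Sketch: query the Nutanix Prism REST API after the CE cluster is up.
# Host, credentials, and the v2.0 endpoint/field names are assumptions.
import requests
from requests.auth import HTTPBasicAuth

PRISM = "https://my-ravello-app.example.com:9440"  # public DNS name from Ravello
AUTH = HTTPBasicAuth("admin", "my-new-password")   # set after the first login

# Self-signed certificate on the CE appliance, so skip verification in the lab.
resp = requests.get(f"{PRISM}/PrismGateway/services/rest/v2.0/cluster",
                    auth=AUTH, verify=False, timeout=30)
resp.raise_for_status()
cluster = resp.json()
print("Cluster name:", cluster.get("name"))
print("AOS version:", cluster.get("version"))
```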


 

Wrap up

I am personally amazed at how easily it all worked.  It turned what would have been literally hours of configuration and fiddling (which would be a great learning experience) into 20 minutes.  Both Nutanix and Ravello should be commended for these awesome services.   I wish other vendors could provide a complex POC in 20 minutes.  I think Ravello has a major future in the market.  Give it a try; it's worth $20 to play with.    Let me know what crazy things you try.   I know that in the coming weeks replication is on my mind.  A quick guide to Nutanix Community Edition can be found here.

Journey to an Automated Cloud Part 1

Are you ready to automate everything?  Does your boss want some of that cloud?   Well, everywhere I turn people want to get into the cloud.  They all want a vendor product to provide the cloud, and every vendor show I go to has hundreds of products to solve that problem.   In my experience it is not a product problem that limits our journey to the cloud.   In this series of articles I will explore some of my thoughts on your journey to the cloud.

 

Part 1 – Where am I and what do I want?

My thoughts for this part are best explained by an exchange from Alice in Wonderland:

 

“Would you tell me, please, which way I ought to go from here?”
“That depends a good deal on where you want to get to,” said the Cat.
“I don’t much care where–” said Alice.
“Then it doesn’t matter which way you go,” said the Cat.
“–so long as I get SOMEWHERE,” Alice added as an explanation.
“Oh, you’re sure to do that,” said the Cat, “if you only walk long enough.”

The Cat completely captures my feelings: if you don't know what you want, it does not matter which way you go.  Most engineers are stuck with a lack of definition.   We want to get into cloud provisioning.  We want 20-minute server deployments.   We need to be more like Amazon.

 

Let's examine these statements a little:

We want to get into cloud provisioning.

  • Does this mean you want to use public cloud for servers?
  • Does this mean you want a web portal for provisioning servers?
  • What does this even mean?  It's like me telling you I want to enforce family values… we all want family values, but every family has a different definition

We want to get to 20 minute deployments of servers.

  • What type of server do you want deployed in 20 minutes?  OS, Application, three-tier?
  • What does deployment mean?  Powered on?  Able to talk on the network?  Internet facing?

We need to be more like Amazon.

  • You want to deploy unsecured operating systems without backup quickly?
  • You want to have pay as you go for our customers?

 

I want to be fair and emphasize that all of these statements are valid but undefined, and every one of these offerings has positive sides as well.  What they lack is business definition.   In almost every cloud situation the business wants IT to be more agile.   They want processes to move more quickly.    Universally, they cannot understand why provisioning a server takes so long and is so complex.  Honestly, neither can I.   I have made a career out of complex servers, and it has to stop.    So before you start down some unknown path with the Cat, ask some critical business questions (no engineer likes them, but you need to ask them).

For example:

  • What pain point are we trying to solve with the cloud?
  • What specific expectations do you have for the cloud?
  • What is the timeline for the cloud?

If product names come up during this conversation, know that it's normal.    Business people explain technology in terms of products (for example, "I want it to be like an iPad with Dropbox").  These statements are not locking you into a product; they are helping you define requirements.    Ask questions about the product to help define requirements.   It is critical that you translate their products into requirements and constraints.    Once you have translated their needs into requirements statements, get them to sign off on them.

 

Where am I?

In almost all cloud deployments it's really about adding automation to every aspect of the service.   This allows you to be more agile in the face of change.   Before you can begin your transformation, you need to define your starting point.

[Image: a manually operated environment, with many hands touching each server]

Is your current environment like the picture above?  Do you have many hands touching the configuration of your servers and applications?   Have you added some basic automation, like server cloning or configuration management?   This approach is common and really an outgrowth of the virtualization era.   Let me give an example of this process:

  • Server request is provided to server team
  • Server team clones an operating system
  • Firewall team configures firewall rules
  • Server team deploys application
  • Developers deploy code
  • Security team reviews server and approves
  • Server team releases to production

 

This process seems simple and should be easy.    This is where the people problems start.   The development team has a project; the server, firewall, and security teams have tasks.   They do their tasks without knowledge of the development team's project, which means the bolts will not be where they are expected and in the end something will require rework.   There is tons of room for human error and mistakes.    Each project built this way will be unique because people are executing the steps.    It gets worse as you scale up: assume the normal firewall engineer is out sick, and now we have a stand-in who cannot do the job as well.   More errors and problems are introduced.   So, to review:

  • Each team treats a project as a task
  • Each team executes the tasks with different priorities causing delays
  • There is lots of room for human error and mistakes hurting the timeline
  • It does not scale; it's mostly human capital

The fun part is this process is pretty good.   At least they have a defined process.

Do you have a process and is it followed?

It's simple: individuals have processes they natively follow.  We naturally assume that other people think and act just like us, so naturally they will follow the same process, right?  Wrong.  Everyone is different and does things a little differently.   Many IT shops have poorly defined processes, and even when processes are defined they are rarely followed.    In order to make it into the cloud you have to define your manual processes.   Get them on paper with the following details (a sketch of one way to capture them follows the list):

  • What information is required to work this process?
  • What information is expected to come back from this process?
  • Who can work this process?
  • What choices need to be made as part of this process?
  • What happens if the process fails in an unexpected way?
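Purely as an illustration (the field names are mine, not any standard), here is one way to capture a process definition in a structured, reviewable form so it can be trained against and eventually automated:

```python
# Sketch: a structured record for a manual process so it can be reviewed,
# trained against, and eventually automated. Field names are illustrative.
from dataclasses import dataclass
from typing import List

@dataclass
class ProcessDefinition:
    name: str
    inputs: List[str]            # information required to work the process
    outputs: List[str]           # information expected back from the process
    allowed_roles: List[str]     # who can work the process
    decision_points: List[str]   # choices made as part of the process
    failure_handling: str        # what happens if it fails in an unexpected way

server_provisioning = ProcessDefinition(
    name="Provision virtual server",
    inputs=["requested OS", "application owner", "network zone", "sizing"],
    outputs=["hostname", "IP address", "handover ticket number"],
    allowed_roles=["server team", "trained stand-in"],
    decision_points=["production vs. test placement", "backup tier"],
    failure_handling="Stop, record the failing step, and escalate to the SME",
)

print(server_provisioning)
```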

 

Does this sound like software development to anyone else?  Well, it is.  Welcome to the rest of your career as a software developer, or what I like to call a process engineer.   Once you have defined the process, management needs to enforce the manual process to find out where it breaks… this is the hard part.  You can write down a process and you can send people to training on the process, but you cannot make them drink.   All manual processes will be slower and worse at first.   Change is hard (that's part two).   You have to practice the process to find the holes.    Here is a logical outline for defining your process:

  • Have a subject matter expert define the process on paper (electronic or otherwise)
  • Have the SME train others on the process
  • Have management encourage others to do the process
  • Have people other than SME do the process and report back problems
  • Improve the process until it works in all situations encountered

 

Does it seem simple?  Yep, it is.  Does it seem like common sense?  Right again.   I should rename everything to something like points or t-shirt sizes so I can sell it, but that's not me.   It is simple to understand and hard to implement.

vSphere 6.0: What excites me

Yesterday VMware announced the general release of vSphere 6.0.   It is not yet available for download or install but it’s exciting.  There are lots of really awesome new features and improvements.  I want to focus on what I consider to be the two most exciting features:

  • vMotion improvements
  • VVols

 

vMotion improvements:

  • Long-distance vMotion – You can now live vMotion across distances of up to 100ms RTT (up from 10ms RTT), removing most distance constraints
  • Cross vSwitch and vCenter vMotion – Move virtual machines live between different virtual switches and different vCenters, removing almost all boundaries on vMotion
  • vMotion of MSCS VMs using pRDMs – I could not find much on this; I really want to know more here
  • vMotion L2 adjacency restrictions removed – vMotion no longer requires a stretched layer 2 network; you can now route vMotion, something I have been trying to get an RPQ for. (When combined with NSX you no longer require any spanning tree protocols; you could route everything)

 

What does all this mean?  Better designs that remove past layer 2 requirements, and the true ability to migrate active workloads into the cloud.  Make no mistake: all these changes are about flexible movement into the cloud.   Combine them with NSX and your workload can move anywhere.   These changes also have a big impact on vSphere metro clusters.  I would love to see HA domains span more than one site using multiple vCenters and site replication for failover of load.  (I expect it to come in the next version; just my personal thoughts.)

 

VVols

This is the holy grail of software-defined storage: giving the storage system awareness of the individual object or virtual machine.  This enables granular, per-VM performance.   It can enable flash clone/copy-style backups, removing the current kludgy process.   I have been super excited about VVols for a while now.  I saw a demo at VMworld 2013 done by HDS that was awesome.  This object-level storage solution enables features like replication, deduplication, compression, storage tiering at the hard drive level, and real performance metrics across the complete storage system and stack.   This is really going to shake up the hyper-converged storage vendors' solutions.   Mark my words: the first vendor to adopt and correctly market VVols will be huge.

 

Here is the problem: storage vendors have to enable and support the features.  Expect some of the startup storage vendors to support it right away, while larger vendors may take a while to get there.

 

There are a ton of other improvements in the new version, including a solution to the lack of HA for vCenter (multi-vCPU FT).

 

Future

It is clear that VMware feels the future is in hybrid cloud and mobility, and I am sure they have lots of smart research to prove it.   Compliance and configuration management continue to be my largest problems with public cloud, and I think solutions like NSX start to resolve them.    I look forward to the future with my friend VMware.  If anyone from VMware ever reads this article, please consider HA across vCenters with replication as an option; it would be just perfect (maybe combined with VMware VSAN to boost sales).

Managing your family cloud in an industrialized world

Managing your family farm in an industrialized world

I have spent a good portion of the last year working with customers to convert their infrastructure from every application being unique into a streamlined, automated powerhouse.  There is a dream in every shop of becoming like Google or Netflix: running an automated, agile, well-monitored environment.   Every company is racing to provide products that facilitate this new cloud-era management.

 

Game Changers

Every so often there is a product that innovates so well that it changes the standards.   When I was in college, being a systems administrator meant racking and stacking servers, large storage arrays, and huge buildings with cooling units.   x86 virtualization changed this game, in large part thanks to VMware's innovation.    The very fabric of server compute architecture changed when the first vMotion was completed.  VMware virtualization was a game changer.   When I look for the game changer that would make all my customers' requirements easy to deploy within five cookie-cutter models, I cannot find it.

 

The American Way

The simple answer is that my customers are Americans: they want it customized every time.   They each control their own budgets without any central control.   Much like the American government, the business units cannot agree on a color, let alone a web development platform.    The technical teams have always been the junior partner in the discussion because they don't control the budget.   There have been many great articles about running IT as a business; I won't go into that side subject because I believe it's only part of the problem.   The simple problem is that even if we controlled the budget, we would still have customers who want it their way.

 

Why can't you be like Amazon or [insert cloud provider name here]?

This is thrown around a lot.  We are compared to Amazon when we fail to have agility.   It's a valid statement if the company is really willing to buy from Amazon.   I argue that if you want IaaS like Amazon provides, most IT shops could provide that fairly easily… but the business unit wants managed IT.   They want that website now and don't care about the management issues, yet when the site fails it's your fault.   When the business buys a site on Amazon and it fails, it's the business's fault.   Your answer to this question really should be that it's a different business model.

 

Enough ranting: what is the solution?

To avoid making this post 100% rant, I would like to share some of the things I have used to manage the family farm.   If you give up the idea that you can have Amazon today (at least until the next technology game changer comes along – Docker or something else like CoreOS), here are my suggestions for managing the farm:

  • Monitoring should be automatic – You need a monitoring solution that includes discovery of new assets and monitoring. It should be able to discover running services and monitor them with minimal customer interaction.   Monitoring should also include historical baselines and individualized thresholds.   If possible, monitoring should automatically open tickets to be worked and resolved. (For VMware virtual machines, vRA and VIN do a decent job of this, with some limits.)
  • Configuration management – This is the key component missing in so many shops. You need to manage your life cycle with configuration management.  You should use configuration management to spawn and configure new servers.  It should ensure these servers stay in compliance and remediate servers that fall out of compliance.  This type of configuration management reduces troubleshooting and lets you manage at least a portion of your infrastructure as a single entity.  (There are lots of products; I have the most experience with Puppet and it works very well.)
  • Central log location – Putting all your logs (system, firewall, network, vCenter, etc.) into a single location allows you to alert across your infrastructure and do discovery in a single pane of glass.
  • Documentation – The whole IT industry is really bad at documentation; it's a major failing.   I cannot count the number of times I have googled an issue I am facing only to find the solution on my own blog or in a forum posting I made.   We all forget. Find a searchable location for documentation and share it with the whole team.   This will really cut down on repeated wasted time and get you into a thoughtful method of practice.  The documentation should include, at the most basic level, what a server / switch / whatever does and common commands or issues.
  • Change management – The dreaded term; IT shops hate it. In the best case, changes should be recorded automatically, but we do need a single location to answer the grand question: what has changed?  Some people use wikis, some use text files, some use configuration management that logs change tickets.  It does not matter which; you need to have some change management.
  • Identity management – Until you have a single meta-directory for all authentication and authorization, any effort to be agile will fail.

 

As you can see, the field is ripe and ready for vendors to harvest with products.  While I am waiting for the next game-changing technology, I am sure I will have more than a few family farms to manage, and these products can help.   If you want to make the journey off the family farm into a mega business, you might want to consider these steps:

 

  • Virtualize everything – removing your dependency on specific hardware or storage vendors
  • Consolidate all your services to a common management platform (monitoring, logging, change, hardware, virtualization, etc..)
  • Consolidate your operating systems to as few as possible
  • Choose a database platform and force all new development into that platform
  • Choose an application development platform and force all new development into that platform
  • Wait five to ten years to force all applications into the common platform and hope that management has the strength of will to make it that long

I hope my crazy notes have helped you on your family farm projects.   Please add additional suggestions.

vCloud Director virtual machine unable to make TCP connections to the internet

Weird one here: I was working inside a vCloud org and had a new virtual machine that was unable to browse the internet.  DNS would work… traceroute would work… at first I figured it was a configuration issue as mentioned in the KB here, but it was not: the issue existed as long as the virtual machine was behind the vShield Edge appliance.  If I put it in front (exposed to the internet), everything was fine.   I ended up calling VMware support and it turned out to be IPv6 on the Windows virtual machine.   IPv6 is not supported on vShield Edge, as documented here.   Disabling IPv6 in Windows is a registry modification and a reboot, so it's a pain… make sure you turn it off or it will be a fun failure.  If you have to do it on more than a couple of machines, the registry change can be scripted as sketched below.
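This is a minimal sketch of that registry change using Python's winreg module; the DisabledComponents value of 0xFF is the commonly documented setting for turning off IPv6 components, but verify it against current Microsoft guidance for your Windows version, and remember the reboot is still required.

```python
# Sketch: disable IPv6 components on a Windows guest via the registry.
# Run as Administrator inside the guest; a reboot is still required afterward.
import winreg

KEY_PATH = r"SYSTEM\CurrentControlSet\Services\Tcpip6\Parameters"

# DisabledComponents = 0xFF turns off IPv6 on all interfaces and tunnels
# (commonly documented value; confirm against current Microsoft guidance).
with winreg.CreateKeyEx(winreg.HKEY_LOCAL_MACHINE, KEY_PATH, 0,
                        winreg.KEY_SET_VALUE) as key:
    winreg.SetValueEx(key, "DisabledComponents", 0, winreg.REG_DWORD, 0xFF)

print("IPv6 components disabled in the registry; reboot to apply.")
```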