I have spent the last few years working in enterprise shops and enjoying the challenges they bring. I find that a number of my peers are hired for a single use case or implementation and then leave. Staying with an infrastructure past a single implementation allows me to enjoy all that brownfield IT has to offer. It’s a completely different challenge. Almost everyone I talk to, and everywhere I work, is trying to solve the same basic problem: do more with less, and with more automation. Everyone wants the Amazon easy button without the security or off-premises challenges of AWS. To make it into the cloud they need both organizational and operational change. The first place almost everyone focuses is operating system deployment. There are a number of models available, and I thought I would share some of my thoughts on them.
Cloning
This model has been made available by VMware. It’s a combination of creating a golden template and some guest customization. It’s very easy to manage and produces very similar results every time during provisioning. You have to focus on core shared elements or create a template for each use. It does have some challenges:
- How much of our software should we load onto it (security software, monitoring agents, etc.)? How can we identify only the core shared elements?
- It does not scale to lots of different templates. Keeping a template for every application kills you. Imagine updating 100 templates every month and confirming with the application teams that none of them are broken
- It is a virtual-only solution, which makes physical machine builds either manual or a separate process
- It’s a provisioning-only process: it has no idea of state after the initial implementation
It’s a provisioning-only process
This is a big problem for me with a lot of provisioning solutions, not just cloning. They handle initial provisioning but not the steady state of the operating system. This lack of life cycle management does not solve my brownfield issues. Sure, you have an awesome, initially consistent implementation, but five minutes later you are out of sync with the initial template. This problem has led me to configuration management in almost every shop I have worked in. I wish that everywhere I worked was a Netflix, with a model of re-deploying the micro-service when it fails. The truth is that none of the shops I have worked in have that model. I have monolithic multi-tier applications that are not going away this year or any time soon.
Do I have a life cycle problem or provisioning problem?
Yes, both. I do not believe that fire-and-forget operating systems are an option for us anymore. Every server is under a constant state of change, from attackers to patches. Everything changes. Changes bring outages when assumptions are made about the configuration of servers. Early in my career, I lost count of the outages that were caused by incorrect DNS settings or hosts files. These are simple configuration items that were expected to be correct but were found, after an outage, to have been changed. ITIL would have us believe it’s all about change management: we need a CAB and approvals to avoid these issues. While I am all about documented processes and procedures, I have not found that most hosts file changes go through a CAB. They get made ad hoc or during an outage. We have to be able to provision, configure, and ensure the configuration stays in place.
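To make the hosts file example concrete, here is a minimal sketch of drift detection: compare the entries a server is supposed to have against what is actually in the file. The hostnames, addresses, and function names are made up for illustration, and a real configuration management tool would do far more than this.

```python
from typing import Dict, List, Tuple


def parse_hosts(text: str) -> Dict[str, str]:
    """Map each hostname found in hosts-file text to its IP address."""
    entries: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        ip, *names = line.split()
        for name in names:
            entries[name] = ip
    return entries


def find_drift(desired: Dict[str, str],
               actual: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """Return (hostname, expected_ip, found_ip) for every entry that drifted."""
    drift = []
    for name, expected in sorted(desired.items()):
        found = actual.get(name, "<missing>")
        if found != expected:
            drift.append((name, expected, found))
    return drift


# Hypothetical desired state for this server.
DESIRED = {
    "db01.example.internal": "10.0.0.5",
    "app01.example.internal": "10.0.0.6",
}

# Hypothetical file contents after someone edited the file during an outage.
CURRENT_HOSTS = """
127.0.0.1 localhost
10.0.0.9 db01.example.internal  # changed by hand
"""

drift = find_drift(DESIRED, parse_hosts(CURRENT_HOSTS))
for name, expected, found in drift:
    print(f"{name}: expected {expected}, found {found}")
```

Run on a schedule, a check like this turns "expected to be correct" into something that is verified before the outage rather than after it.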
Configuration management and provisioning
Take a look at this scenario:
- Provisioning agent clones, provisions, or duplicates a base operating system
- Provisioning agent does initial configuration of the OS (IP address, sysprep, etc.)
- Provisioning agent, based upon customer selection, provides configuration management with the unique information that identifies the server role (this is a SQL server, this is Apache, etc.)
- Provisioning agent installs configuration management agent
- Configuration management agent checks in with configuration management system and applies all settings (both base settings and server role settings)
- Configuration management agent continues to ensure that role and base settings are correct for the life of the server
- Server administrator, application administrator, etc. use the configuration management agent to adjust settings
This model provides for initial configuration and consistent life cycle management. It does mean your configuration management agent does the heavy lifting instead of your provisioning agent.
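The split of work in this scenario can be sketched in a few lines. This is only an illustration under my own assumptions: the setting names, the roles, and the `provision` and `converge` functions are hypothetical and do not correspond to any particular product.

```python
from dataclasses import dataclass, field
from typing import Dict

# Hypothetical catalogs that a configuration management system would hold.
BASE_SETTINGS = {"dns": "10.0.0.2", "ntp": "10.0.0.3"}
ROLE_SETTINGS = {
    "sql": {"service": "sqlserver", "port": "1433"},
    "apache": {"service": "httpd", "port": "80"},
}


@dataclass
class Server:
    name: str
    role: str
    settings: Dict[str, str] = field(default_factory=dict)


def provision(name: str, role: str) -> Server:
    """Provisioning step: deliver a bare OS and record only the server role."""
    return Server(name=name, role=role)


def desired_state(server: Server) -> Dict[str, str]:
    """Base settings plus the settings for this server's role."""
    return {**BASE_SETTINGS, **ROLE_SETTINGS[server.role]}


def converge(server: Server) -> Dict[str, str]:
    """Configuration management step: fix every setting that differs from
    the desired state and return what was changed."""
    changes = {}
    for key, value in desired_state(server).items():
        if server.settings.get(key) != value:
            server.settings[key] = value
            changes[key] = value
    return changes


server = provision("web01", "apache")
first_run = converge(server)        # initial configuration: everything is set
server.settings["dns"] = "8.8.8.8"  # ad hoc change made during an outage
second_run = converge(server)       # only the drifted setting is corrected
third_run = converge(server)        # nothing left to change
```

The provisioning step stays thin and knows nothing about roles beyond the label it passes along. The same `converge` call handles both day one and every day after, which is the life cycle piece that cloning alone does not give you.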
What about physical?
The model above also works for physical. You have to move away from cloning and back to provisioning an operating system via PXE boot, but it works very well. Now you can provision both physical and virtual from the same cloud agent using consistent life cycle management.
What is the challenge?
For me the challenge has been that whenever I discuss configuration management, it gets confused with compliance management. I believe that configuration management can and should be used for compliance management, but that is not its primary role. Compliance is about meeting security standards. Configuration is about ensuring that configuration settings are correct and, if they are not, correcting them. I can identify compliance issues and apply the resolution via configuration management. I can also use the configuration management engine to identify things that were out of compliance and that I have now changed to meet compliance.
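One way to picture the difference is as two modes over the same policy: an audit that only reports, and an enforcement pass that corrects and records what it changed. The policy keys and function names below are hypothetical, and this is a rough sketch rather than how any specific tool works.

```python
from typing import Dict, List

# Hypothetical security policy expressed as required setting values.
POLICY = {"ssh_root_login": "no", "password_min_length": "14"}


def audit(settings: Dict[str, str], policy: Dict[str, str]) -> List[str]:
    """Compliance view: report which settings are out of policy, change nothing."""
    return sorted(key for key, value in policy.items()
                  if settings.get(key) != value)


def enforce(settings: Dict[str, str], policy: Dict[str, str]) -> List[str]:
    """Configuration view: correct out-of-policy settings and return what changed."""
    changed = audit(settings, policy)
    for key in changed:
        settings[key] = policy[key]
    return changed


settings = {"ssh_root_login": "yes", "password_min_length": "14"}
findings = audit(settings, POLICY)      # what is out of compliance right now
remediated = enforce(settings, POLICY)  # what configuration management fixed
remaining = audit(settings, POLICY)     # what is still out of compliance
```

The list returned by `enforce` is the record of things that were out of compliance and have now been changed, which is the report a compliance team usually asks for.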