Operational Debt: the lead weight around IT’s neck

Over the past three years I have had the opportunity to work on IT strategy with many different Fortune 500 companies in my role as a Staff Solutions Architect at VMware.  Across this wide exposure a few patterns have emerged.  One consistent pattern is the focus on initial state.  Much of IT’s effort goes into creating a consistent, stable, and secure initial state.  This is a holdover from the days before technology was prevalent in every aspect of business.  While initial state is a critical aspect of time to market, it does not represent the realities of IT.  Every time IT releases an initial state, it creates operational debt.  This operational debt is complicated by the dynamic nature of any product, creating compounding interest on the debt.  This mountain of obligation accounts for roughly 70% of operational spend in the average IT shop.  The cost of operations quickly becomes the true limiter to innovation and agility.  Three factors are pushing this problem to a new breaking point:

  • Automation of provisioning
  • Public cloud
  • Dynamic nature of containers

Automation of provisioning

Being agile for the business is top of mind for most IT executives.  This is typically addressed by some type of provisioning automation, an effort that seeks to reduce the total time to deliver assets to development or production.  Focused effort can produce dramatic reductions in total process time.  However, a determined focus on enabling self-service initial state can miss the governance required to be successful long term.  Accelerated self-service and an increased number of consumers only expedite the compounding of operational debt.

Public Cloud

Public cloud adoption initially began as a cost-savings measure.  After the initial waves of adoption, many organizations recoiled from their stated “all in” cloud strategy when they discovered the real cost.  Today organizations are adopting public cloud for three reasons:

  • Removal of operational debt of infrastructure
  • Public cloud unique services (machine learning, FaaS, WAF, etc.)
  • Data gravity (their data exists in the cloud)

All are valid business reasons for public cloud adoption.  The removal of operational debt is enabled by software-abstracted infrastructure and limited catalog options.  The cloud’s accelerated speed of consumption for infrastructure services naturally puts pressure on private cloud.  Public cloud trades cost for reduced infrastructure debt.  It also introduces unique features designed to limit your ability to move away from the chosen cloud; public clouds are not incentivized to ease your movement away from their service.  The real benefit of public cloud is the abstraction of infrastructure components into software, allowing for automation.  Yet public cloud today is highly focused on initial state and largely ignores the long-term operational cost of a service.

Dynamic Nature of containers

Containers and their immutability concepts immediately look like liberation from operational debt.  Throwing away misbehaving resources and replacing them with perfect copies does appear to solve operational debt.  Moving to a declarative approach to IT continues to abstract away the individual value of infrastructure components.  Container orchestrators add an element of maintaining a designated declarative state, which addresses some of the operational debt.  Containers’ lack of persistence does not remove all debt; instead it illustrates how much debt is ignored.  The removal of patching and troubleshooting does not offset the increased observability requirements and complexity.  The average three-tier application replaced by microservices can balloon from twenty managed entities to hundreds.  The average life of a container is measured in minutes and hours, making the operational challenge even more acute.  When Google first started deploying containers at scale, they quickly identified that their operational team could not scale to meet the new operational demand.  To resolve this scale issue, they created the site reliability engineer (SRE) role.  The SRE is a developer who spends half their time working operations and half automating operations.  This has allowed Google to operate their global platforms with only 1,500 operational staff.  This is the first factor that is not singularly focused on initial state.

Lessons learned from the three factors:

The three factors provide some valuable capabilities to be considered while attempting to solve operational debt:

  • Automation is required for both initial and operational state
  • Software abstraction of infrastructure is critical to enable automation
  • Declarative models allow us to enforce initial state post deployment

Moving forward

With these new climate pressures, the time to address the amplifying operational debt is now.  Your strategy cannot simply include automating operations; you have to create the correct landscape to overcome your organizational inertia.  Operational debt investments scale independently of the location or size of the organization.  They are the only investment that will continue to reduce operating expense year over year.  Operational debt can be divided into two categories: common and organizationally unique.  Common operational debt includes credential management, patching, hardware refreshes, code promotion to production, monitoring, architecture changes, and break/fix.  Common operational debt accounts for roughly 70% of operational cost and is common across all applications.  These tasks represent toil: work that normally has zero value when done by a human operator and only adds latency.

Quantifying toil tasks

Identification of toil tasks should be evaluated based upon three factors:

  • Repeated – use your ticket system to track the commonality of the task and identify how much time is spent on the task in each iteration
  • Requires no human judgement – many organizations throw out potential toil work because their complex processes seem to require humans, when in fact the human step is non-essential.  Do not evaluate the current process; instead focus on the desired outcome
  • Is interrupt driven – if you take the action after receiving a ticket or notification, it is likely toil work

Once you have identified toil tasks, use frequency as a guide to create your hit list of things to automate.
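Frequency can often be pulled straight out of a ticket export.  A minimal sketch, assuming a hypothetical tickets.csv export with a header row and the task category in the second column:

```shell
# Rank toil candidates by frequency from a ticket-system CSV export.
# tickets.csv is a hypothetical export with columns: id,category,minutes
tail -n +2 tickets.csv |  # skip the header row
  cut -d, -f2 |           # keep only the category column
  sort | uniq -c |        # count occurrences of each category
  sort -rn | head         # most frequent tasks first
```

The top of this list is your automation hit list; combine the counts with the time spent per iteration to estimate payback.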

Service Orientation

Service-oriented models approach operational debt as part of the whole service.  Operations should be automated as part of the service deployment, removing all toil work before consumption.  While I believe service-oriented models and governance are the key to removing future operational debt, they don’t address the current legacy challenge.

Roadmap

The steps include:

  1. Implement software abstraction – as mentioned, the key element of the public cloud’s ability to deliver infrastructure without debt is its use of software abstraction.  Without this base for automation, your debt-reduction efforts will hit inertia that cannot be overcome.
  2. Identify repeat toil – use your ticketing system to identify the commonality of toil tasks; this provides a prioritized hit list of tasks.
  3. Automate repeat toil – begin automating toil tasks by removing all human interaction.  It is simple to create a queue in your ticket system that is serviced by automation as you transition toil from humans to automation.
  4. Move to declarative models – declarative models provide the ability to check for and potentially enforce expected state.  This shift for future development is enabled by software-abstracted infrastructure and reduces future debt.
  5. Continue to remove toil – declarative models do not remove all toil; they only make software enforcement easier.  Continued effort to remove debt is required to avoid the exponential increase.
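The queue-serviced-by-automation idea in step 3 can be sketched concretely.  This is a minimal, hypothetical example: it assumes toil tickets are exported to a flat file toil_queue.txt with one “task target” pair per line, and the echo lines stand in for real automation hooks:

```shell
#!/bin/sh
# Drain a toil queue with automation instead of human operators.
# Each line of toil_queue.txt: "<task> <target>" (hypothetical format).
while read -r task target; do
  case "$task" in
    restart-service)   echo "automated: restarting $target" ;;
    rotate-credential) echo "automated: rotating credential on $target" ;;
    *)                 echo "escalate to human: $task $target" ;;
  esac
done < toil_queue.txt
```

Anything falling through to the escalation branch is a candidate for the next round of automation.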

Organizations that have implemented this operational debt reduction strategy have seen operational cost reductions of up to 50%.  This reduction allows for innovation and agility, driving additional revenue.  Debt reduction is a key step that moves IT from cost center to business partner.

Operational Docker: Removing the extra layers when you build containers

Building containers creates many different layers. When you change an element in the build, all following layers have to be rebuilt because they cannot be taken from cache. Here is a simple example I have used before that has far too many layers:

docker build -t test .
Sending build context to Docker daemon  5.632kB
Step 1/7 : FROM ubuntu:latest
 ---> 7698f282e524
Step 2/7 : RUN echo "Test"
 ---> Using cache
 ---> 75ac3bfbeaba
Step 3/7 : COPY 1 .
 ---> Using cache
 ---> d457a7492d2c
Step 4/7 : ADD 2 .
 ---> Using cache
 ---> 1c3284c1e6a0
Step 5/7 : ADD 3 .
 ---> Using cache
 ---> 96ab91bcf3df
Step 6/7 : ADD 4 .
 ---> Using cache
 ---> 2889643b631b
Step 7/7 : ADD 5 .
 ---> Using cache
 ---> adb6797fe48a
Successfully built adb6797fe48a
Successfully tagged test:latest

If you examine the number of dangling layers (layers not connected to an active image) for my build, there are none:

docker images -f dangling=true
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE

Now I am going to modify one of the files being added, which triggers a rebuild of the step that adds it (step 4/7 below) and every step after it:

docker build -t test .
Sending build context to Docker daemon   5.12kB
Step 1/7 : FROM ubuntu:latest
 ---> 7698f282e524
Step 2/7 : RUN echo "Test"
 ---> Using cache
 ---> 75ac3bfbeaba
Step 3/7 : COPY 1 .
 ---> Using cache
 ---> d457a7492d2c
Step 4/7 : ADD 2 .
 ---> a93a35a37ebe
Step 5/7 : ADD 3 .
 ---> b90d01ee0806
Step 6/7 : ADD 4 .
 ---> 43af309b28d8
Step 7/7 : ADD 5 .
 ---> 540007e7c833
Successfully built 540007e7c833
Successfully tagged test:latest

Now if we check for dangling layers, we have one: the old final image that was replaced by the rebuild:

docker images -f dangling=true
REPOSITORY          TAG                 IMAGE ID            CREATED             SIZE
<none>              <none>              adb6797fe48a        10 minutes ago      69.9MB

You can identify where these dangling layers are stored with docker image inspect:

docker image inspect test
[
    {
        "Id": "sha256:540007e7c833d09da0edaad933d4126075bffa32badb1170da363e1e1f220c4c",
        "RepoTags": [
            "test:latest"
        ],
        "RepoDigests": [],
        "Parent": "sha256:43af309b28d81faa983b87d2db2f64b27b7658f93639f10b07d53b50dded7c45",
        "Comment": "",
        "Created": "2019-06-19T02:46:38.5065201Z",
        "Container": "",
        "ContainerConfig": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/sh",
                "-c",
                "#(nop) ADD file:f5b9e73db3de1fef2d430837d0d31af0af9c9405c64349def90530b2fc8ca6d2 in . "
            ],
            "ArgsEscaped": true,
            "Image": "sha256:43af309b28d81faa983b87d2db2f64b27b7658f93639f10b07d53b50dded7c45",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "DockerVersion": "18.09.2",
        "Author": "",
        "Config": {
            "Hostname": "",
            "Domainname": "",
            "User": "",
            "AttachStdin": false,
            "AttachStdout": false,
            "AttachStderr": false,
            "Tty": false,
            "OpenStdin": false,
            "StdinOnce": false,
            "Env": [
                "PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ],
            "Cmd": [
                "/bin/bash"
            ],
            "ArgsEscaped": true,
            "Image": "sha256:43af309b28d81faa983b87d2db2f64b27b7658f93639f10b07d53b50dded7c45",
            "Volumes": null,
            "WorkingDir": "",
            "Entrypoint": null,
            "OnBuild": null,
            "Labels": null
        },
        "Architecture": "amd64",
        "Os": "linux",
        "Size": 69859108,
        "VirtualSize": 69859108,
        "GraphDriver": {
            "Data": {
                "LowerDir": "/var/lib/docker/overlay2/77fcf68e04407dd297d93203e242e52cff942b2d8508b3b6e4db2f62d53f38bc/diff:/var/lib/docker/overlay2/dd3f244fb32b847a5d8b2b18e219142bbe3b2e61fe107e290c144154b4513d5e/diff:/var/lib/docker/overlay2/839011eb0edb75c6fedb5e4a9155de2e4bd6305d233ed085d40a4e5166328736/diff:/var/lib/docker/overlay2/3dc3a1f4d37b525f672196b55033c6b82e4ddfc20de16aa803c36fe2358bcb32/diff:/var/lib/docker/overlay2/6e11b07d20377b78ee134a037fc6e661364d273d861419eb77126d0d228abbf0/diff:/var/lib/docker/overlay2/f8c5f20e6ebd1ec759101d926a5101a36ef2378af828ef57a0f8e4a8a467f76f/diff:/var/lib/docker/overlay2/77a101af01c69427ced57be20f01d4a6a688ff2b13d50260be7a7fda1bd7fbf5/diff",
                "MergedDir": "/var/lib/docker/overlay2/28f3e078ee98a327274c59c653858559ad81b866a40e62a70cca989ee403f2a6/merged",
                "UpperDir": "/var/lib/docker/overlay2/28f3e078ee98a327274c59c653858559ad81b866a40e62a70cca989ee403f2a6/diff",
                "WorkDir": "/var/lib/docker/overlay2/28f3e078ee98a327274c59c653858559ad81b866a40e62a70cca989ee403f2a6/work"
            },
            "Name": "overlay2"
        },
        "RootFS": {
            "Type": "layers",
            "Layers": [
                "sha256:02571d034293cb241c078d7ecbf7a84b83a5df2508f11a91de26ec38eb6122f1",
                "sha256:270f934787edf0135132b6780cead0f12ca11690c5d6a5d395e44d290912100a",
                "sha256:8d267010480fed7e616b9b7861854042aad4ef5e55f8771f2c738061640d2cb0",
                "sha256:ea9703e9d50c6fdd693103fee05c65e8cc25be44c6e6587dd89c6559d8df2de7",
                "sha256:69d3f4708a57a9355cf65a99274e6b79788a052564c4fb0fd90f5283c109946a",
                "sha256:d18953dc7e1eef0e19b52db05c2ff34089e9f1166766c8f57b8475db5a3c79b8",
                "sha256:f1ce2d9ca96cc9cd13caab945986580eae2404e87d81b1b485b12ee242c37889",
                "sha256:aeb58c1f315c5baacbe4c2db1745dec548753197e2b251a958704addfd33a8c2"
            ]
        },
        "Metadata": {
            "LastTagTime": "2019-06-19T02:46:38.5547836Z"
        }
    }
]

You can see all the layers are stored under /var/lib/docker/overlay2; you can use the dangling layer ID to see how much space is now wasted on your hard drive. You can remove these dangling layers with:

docker rmi $(docker images -f dangling=true -q)

These dangling images can eat up a ton of space on your build machine, so you need to automate the cleanup process to avoid wasting space.
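One low-effort way to automate the cleanup (a sketch, assuming a cron-capable build host; the log path is arbitrary) is to schedule Docker’s built-in prune command, which removes all dangling images in one shot:

```
# crontab fragment: prune dangling images nightly at 02:00
0 2 * * * docker image prune -f >> /var/log/docker-prune.log 2>&1
```

docker image prune -f is equivalent to the rmi command above but needs no subshell and exits cleanly when there is nothing to remove.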

Learning Docker: Image Layers and Cache Best Practices

If you Google “Dockerfile” or “learning Docker” you will be assaulted with lots of example Dockerfiles to run in your environment.  Many are missing a basic understanding of how a Dockerfile operates.  Its layering technology and cache behavior suggest a host of best practices to consider when building your ideal state.

Layers:

Each layer of a container image is read-only; a final writable layer is added on top when the container is started with the docker run command.  In older versions of Docker it was critical to minimize the number of layers to ensure performance. Layers are added by the following instructions:

  • RUN, COPY, ADD, FROM

All other instructions just create intermediate images which are thrown away post build.  You can also use multi-stage builds to copy only the required artifacts into the end image.  A few examples illustrate the impact of layers:

First, start with a simple Dockerfile:

FROM ubuntu:latest

Create an image from this file:

docker build -t test .
Sending build context to Docker daemon  4.608kB
Step 1/1 : FROM ubuntu:latest
 ---> 7698f282e524
Successfully built 7698f282e524
Successfully tagged test:latest

We have a single step, which means only one layer, and that layer became our final image.  Time to add one more layer:

FROM ubuntu:latest
RUN echo "Test"

Creating the image, we now have two steps and two layers:

docker build -t test .
Sending build context to Docker daemon  4.608kB
Step 1/2 : FROM ubuntu:latest
 ---> 7698f282e524
Step 2/2 : RUN echo "Test"
 ---> Running in 7f4aba5459b1
Test
Removing intermediate container 7f4aba5459b1
 ---> 57fda831491f
Successfully built 57fda831491f
Successfully tagged test:latest

I created a number of zero-byte files using touch:

touch 1 2 3 4 5

Adding these one at a time using ADD or COPY creates multiple layers:

FROM ubuntu:latest
RUN echo "Test"
COPY 1 .
ADD 2 .
ADD 3 .
ADD 4 .
ADD 5 .

Building the image:

docker build -t test .
Sending build context to Docker daemon  4.608kB
Step 1/7 : FROM ubuntu:latest
 ---> 7698f282e524
Step 2/7 : RUN echo "Test"
 ---> Using cache
 ---> 57fda831491f
Step 3/7 : COPY 1 .
 ---> 1025060f36d4
Step 4/7 : ADD 2 .
 ---> 35cff57055a1
Step 5/7 : ADD 3 .
 ---> 0357c97e0c37
Step 6/7 : ADD 4 .
 ---> 389612774b90
Step 7/7 : ADD 5 .
 ---> de67547a97df
Successfully built de67547a97df
Successfully tagged test:latest

We now have seven layers.  These statements can be consolidated to reduce the layer count.  For this example I will only consolidate files 4 and 5.

FROM ubuntu:latest
RUN echo "Test"
COPY 1 .
ADD 2 .
ADD 3 .
ADD 4 5 /

Build the image:

docker build -t test .
Sending build context to Docker daemon  4.608kB
Step 1/6 : FROM ubuntu:latest
 ---> 7698f282e524
Step 2/6 : RUN echo "Test"
 ---> Using cache
 ---> 57fda831491f
Step 3/6 : COPY 1 .
 ---> Using cache
 ---> 1025060f36d4
Step 4/6 : ADD 2 .
 ---> Using cache
 ---> 35cff57055a1
Step 5/6 : ADD 3 .
 ---> Using cache
 ---> 0357c97e0c37
Step 6/6 : ADD 4 5 /
 ---> 856f9a3a90d8
Successfully built 856f9a3a90d8
Successfully tagged test:latest

As you can see, we have one fewer intermediate layer by combining the last two ADDs.   Many of the layers were pulled from cache (they didn’t change).   When we use a COPY command we have to be careful, because the cache is invalidated if the file changes.   I am going to add the text “hello” to file 1, which is being added via COPY.   Notice the impact on the other layers:

docker build -t test .
Sending build context to Docker daemon   5.12kB
Step 1/6 : FROM ubuntu:latest
 ---> 7698f282e524
Step 2/6 : RUN echo "Test"
 ---> Using cache
 ---> 57fda831491f
Step 3/6 : COPY 1 .
 ---> 2e9f2b068ab4
Step 4/6 : ADD 2 .
 ---> 7a8132435424
Step 5/6 : ADD 3 .
 ---> d6ced004f0e1
Step 6/6 : ADD 4 5 /
 ---> 1b2b9be67d0f
Successfully built 1b2b9be67d0f
Successfully tagged test:latest

Notice that every layer after step 3 cannot be built from cache, because the COPY file has changed, invalidating all later layers.  For this reason you should place COPY and ADD lines toward the end of a Dockerfile.  Building layers is an expensive, time-consuming operation, so we want to limit the number of layers that change.   The best version of this Dockerfile is:

FROM ubuntu:latest
RUN echo "Test"
COPY 1 2 3 4 5 /

I combined the ADD and COPY lines into a single COPY, since neither was doing anything different here (use COPY for local files; use ADD when the source is a remote URL or a tar archive).   When you build the image you have the fewest layers:

docker build -t test .
Sending build context to Docker daemon  5.632kB
Step 1/3 : FROM ubuntu:latest
 ---> 7698f282e524
Step 2/3 : RUN echo "Test"
 ---> Using cache
 ---> 57fda831491f
Step 3/3 : COPY 1 2 3 4 5 /
 ---> 0f500aea029d
Successfully built 0f500aea029d
Successfully tagged test:latest

Now we have only three layers doing the same thing as seven before.

Cache:

In the previous section we demonstrated how the cache gets used, but it’s important to understand what types of actions trigger a rebuild instead of a cache hit:

  • All cached layers are invalidated if a higher layer is considered changed (invalidation cascades down)
  • Any change to a RUN instruction invalidates the cache (RUN apt-get install bob -y and RUN apt-get install bob -yq force a rebuild)
  • For ADD and COPY, the contents of the files are checksummed to decide whether to invalidate the cache; last-modified and last-accessed times are not considered
  • Only RUN, COPY, ADD, and FROM create layers; all other instructions create temporary intermediate images
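These rules suggest an ordering pattern: put instructions that rarely change first so their layers stay cached, and put frequently changing files last.  A sketch (assuming a small C source file hello.c; the package names are illustrative):

```dockerfile
FROM ubuntu:latest
# Rarely changing: tool installation stays cached across code edits
RUN apt-get update -qy && apt-get install -qy build-essential
# Frequently changing: source files go last so only these layers rebuild
COPY hello.c .
RUN gcc -o hello -static hello.c
```

With this ordering, editing hello.c invalidates only the COPY and the final RUN, not the expensive package installation.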

This list illustrates one of the largest problems with the cache.   What ubuntu:latest refers to will change as new versions are released, but if you have it cached it will not be updated from the repository.   RUN commands whose text has not changed will not be re-executed.  For example, suppose you have the following in your Dockerfile:

RUN apt-get upgrade -qy

On the first run, Docker will execute that command in the container and cache the resulting layer.   This is a point-in-time cached layer.   If you rebuild a week from today, the image should change, yet because the layer is cached you don’t get the new updates.   This is the danger of the cache.   You can force a rebuild of cached layers with:

docker build --no-cache -t test .

One command that can help you understand the inner workings of your docker images is the history command:

docker history test
IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT
5b55f21f1701        21 minutes ago      /bin/sh -c #(nop) COPY multi:3b9dfb231e0b141…   12B
75ac3bfbeaba        21 minutes ago      /bin/sh -c echo "Test"                          0B
7698f282e524        4 weeks ago         /bin/sh -c #(nop)  CMD ["/bin/bash"]            0B
<missing>           4 weeks ago         /bin/sh -c mkdir -p /run/systemd && echo 'do…   7B
<missing>           4 weeks ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0B
<missing>           4 weeks ago         /bin/sh -c set -xe   && echo '#!/bin/sh' > /…   745B
<missing>           4 weeks ago         /bin/sh -c #(nop) ADD file:1f4fdc61e133d2f90…   69.9MB

Learning Docker: create your own micro-image

In the last article I wrote about how you can create your own small image using Docker’s scratch image. The scratch image can execute basic binary files. I assume that you have some code base that is compiled and then inserted into the scratch image. To do this, you can maintain a build machine that creates Linux executables, or you can use another Docker image to create the binary and copy it into the scratch image. This is known as a multi-stage build, and it produces the smallest possible end-state container. The whole process can be done in a single Dockerfile. Let’s start with a basic C program that prints “Hello from Docker” when executed:

#include <stdio.h>
  
int main() {
    printf("Hello from Docker\n");
    return 0;
}

This should be saved in the current directory as hello.c. We then need a machine with gcc to compile the C program into a binary. We will call this machine builder. The Dockerfile for builder looks like this:

FROM ubuntu:latest AS builder
# Install gcc
RUN apt-get update -qy
RUN apt-get upgrade -qy
RUN apt-get install build-essential -qy
COPY hello.c .
# Build a static binary named hello
RUN gcc -o hello -static hello.c

This does the following:

  • Use ubuntu:latest as the base image
  • RUN the commands to update and upgrade the base operating system (-qy runs quietly (-q) and answers yes (-y) to all questions)
  • RUN the command to install build-essential, which includes the gcc binary and libraries
  • COPY the file hello.c from the local file system into the current directory
  • RUN gcc to compile hello.c into hello – this step is critical because -static tells the compiler to include all required libraries; without it the executable will fail in scratch while looking for a dynamically linked library

Let’s manually test the static linking, starting from a small Dockerfile:

FROM ubuntu:latest

Now let’s turn this into a container and test our commands to ensure we have them in the correct order for our builder container:

docker build -t builder .

This will build the container image called builder from ubuntu:latest on Docker Hub. Now let’s run an instance of this container and give it a try.

docker run -it builder /bin/bash

You are now connected to the container, and you can test all your commands to ensure they work:

apt-get update -qy
apt-get upgrade -qy
apt-get install build-essential -qy
# We cannot run the COPY instruction inside a running container, so we
# will create the file with vi; install vim only for this test case
apt-get install vim -qy
# Copy the contents of hello.c into a file named hello.c
# COPY hello.c .
# Build the binary, named hello
gcc -o hello hello.c

Let’s check if hello has dependencies on dynamically linked libraries:

root@917d6b3c9ea9:/# ldd hello
	linux-vdso.so.1 (0x00007ffc35dbe000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fa76c376000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa76c969000)

As you can see, it has dynamically linked libraries; those will not work in scratch because the libraries will not exist there. Let’s statically link them using this command:

gcc -o hello -static hello.c
root@917d6b3c9ea9:/# ldd hello
	not a dynamic executable

As you can see, making sure we are not dynamically linking executables is critical. Now that we know we have a working builder, we can take the executable and copy it into the scratch container for a very small final image. This process could be used to make very fast-starting functions as a service on demand.

FROM scratch
# Copy our static executable.
COPY --from=builder hello /
# Run the hello binary.
ENTRYPOINT ["/hello"]

This takes the hello binary from the builder stage and puts it into our final image. Putting it all together in a single Dockerfile looks like this:

FROM ubuntu:latest AS builder
# Install gcc
RUN apt-get update -qy
RUN apt-get upgrade -qy
RUN apt-get install build-essential -qy
# COPY the hello.c file from the local file system
COPY hello.c .
# Build the binary.
RUN gcc -o hello -static hello.c
FROM scratch
# Copy our static executable.
COPY --from=builder hello /
# Run the hello binary.
ENTRYPOINT ["/hello"]

Build the container which we will call csample using this command:

docker build -t csample .

Sending build context to Docker daemon  3.584kB
Step 1/9 : FROM ubuntu:latest AS builder
 ---> 7698f282e524
Step 2/9 : RUN apt-get update -qy
 ---> Using cache
 ---> 04915027a821
Step 3/9 : RUN apt-get upgrade -qy
 ---> Using cache
 ---> 998ea043503f
Step 4/9 : RUN apt-get install build-essential -qy
 ---> Using cache
 ---> e8e3631eaba6
Step 5/9 : COPY hello.c .
 ---> Using cache
 ---> 406ad6aafe8f
Step 6/9 : RUN gcc -o hello -static hello.c
 ---> Using cache
 ---> 3ebd38451f71
Step 7/9 : FROM scratch
 ---> 
Step 8/9 : COPY --from=builder hello /
 ---> Using cache
 ---> 8e1bcbc0d012
Step 9/9 : ENTRYPOINT ["/hello"]
 ---> Using cache
 ---> 5beac5519b31
Successfully built 5beac5519b31
Successfully tagged csample:latest

Try starting csample with docker:

docker run csample
Hello from Docker

As you can see we have now used a container to build the executable for our container.

Learning Docker: creating your own base image

Docker images are compiled in layers using a set of instructions contained in a text file called a Dockerfile. Every container image starts with a base; in many cases this base image is pulled from Docker Hub or your own repository. When creating your own base image you have two choices: build one or use scratch.

Scratch

Scratch is built into Docker and is provided as a minimal Linux environment that cannot do anything on its own. If you have a compiled binary that will work in the container, scratch may be a perfect minimal base. Do not expect scratch to have a package manager or even a command line. For our example, let’s assume we have a basic C program called hello that has been compiled:

FROM scratch
ADD hello /
CMD ["/hello"]

This would start the container, run the executable, and end the container. My base container size with scratch alone is 1.84 kilobytes.

Building your own Image

Building your own image starts with installing the target operating system. Since Red Hat and Ubuntu seem to be the most common operating systems today, I’ll provide instructions for both. It is possible to build minimal containers without package managers, but these are multi-purpose base images. In both cases the process installs a minimal version of the operating system in a subdirectory and then compiles the Docker image from that directory.

Ubuntu

Debian-based systems make it really easy with the debootstrap command, which is available from the standard Ubuntu repositories. We will set up the image using Ubuntu 19.04 Disco Dingo.

sudo debootstrap disco disco > /dev/null
sudo tar -C disco -c . | docker import - disco

You now have a Docker image called disco that is a minimal Ubuntu 19.04. Verify with:

docker images

Redhat / Centos

I’ll use CentOS since I don’t personally own any Red Hat licenses, but the process is exactly the same. This will build the same version of the OS you are currently running. You will need to change the RPM-GPG-KEY to match your version of CentOS. I read about the CentOS process in this article.

# Create a folder for our new root structure
export centos_root='/image/rootfs'

mkdir -p $centos_root

rpm --root $centos_root --initdb

yum reinstall --downloadonly --downloaddir . centos-release

rpm --root $centos_root -ivh --nodeps centos-release*.rpm

rpm --root $centos_root --import  $centos_root/etc/pki/rpm-gpg/RPM-GPG-KEY-CentOS-7

yum -y --installroot=$centos_root --setopt=tsflags='nodocs' --setopt=override_install_langs=en_US.utf8 install yum  

sed -i "/distroverpkg=centos-release/a override_install_langs=en_US.utf8\ntsflags=nodocs" $centos_root/etc/yum.conf

cp /etc/resolv.conf $centos_root/etc

mount -o bind /dev $centos_root/dev

# Enter the image's file system to do a yum clean (remove cached data)
chroot $centos_root /bin/bash

# Run this command inside the chroot, then type exit to leave
yum clean all

rm -f $centos_root/etc/resolv.conf

umount $centos_root/dev

#Create the docker image
tar -C $centos_root -c . | docker import - centos

You now have an image called centos.

Put it together

Building your own images assures that no one can slip something unexpected into your image. Scratch is a great way to run very minimal containers that are very small. If you need a fuller operating system, you can use Ubuntu or CentOS.

Installing Docker on Linux

There are literally hundreds of guides on the internet for installing Docker on Linux. I wanted to provide brief guides for CentOS and Ubuntu. You can always find the latest guides at docs.docker.com. This guide covers the community edition of Docker. In some cases you might want to install a vendor-provided version (for example, Red Hat’s); in that case follow your vendor’s recommendations.

Install Docker CE on CentOS (RedHat)

First install dependencies:

sudo yum install -y yum-utils device-mapper-persistent-data lvm2

Then add the repository for Docker CE:

sudo yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo

Now install the server, CLI, and drivers:

sudo yum install docker-ce docker-ce-cli containerd.io

Start up the server:

sudo systemctl start docker

Enable the server to start at boot time:

sudo systemctl enable docker

Installing Docker CE on Ubuntu

First install dependencies:

sudo apt-get install apt-transport-https ca-certificates curl software-properties-common

Then add the repository for Docker CE:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"

Now update the package index and install the server, CLI, and container runtime

sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io

Start up the server

sudo systemctl start docker

Enable server to start at boot time

sudo systemctl enable docker
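On both distributions, the docker CLI talks to the daemon over a root-owned socket, so every command above needs sudo. If you want to run docker as a regular user, the usual optional step is to add your user to the docker group, then log out and back in:

```shell
# Allow the current user to talk to the Docker daemon without sudo
# (note: members of the docker group effectively have root access on the host)
sudo usermod -aG docker $USER
```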

Testing Docker

Test docker by checking the installed version

docker --version

and running a basic container

docker run hello-world
The hello-world container pulls a tiny image and runs it, confirming that the daemon, image pulls, and container execution are all working.

Architecture of Docker

I have been spending a lot of my free time over the last few months learning Kubernetes. Currently most implementations of Kubernetes use Docker as their container runtime, so I wanted to share some of the knowledge I gained along the way. Since I claim to be an architect, I'll start with the basic architecture of Docker.

What is a container?

A container is a segmented process that contains only the elements required to do its job. While a normal operating system ships with many libraries to stay flexible, a container has only the runtime and libraries needed for its function. This reduced scope keeps containers small and independent of the host operating system. The segmentation is enforced by the container server, which runs as a process on a host operating system.

Architecture of Docker

Docker is a server that runs a process called dockerd. This server provides a REST API for creating, managing and running containers. For ease of management, Docker provides the docker command line interface to interact with the REST API. There is a company called Docker that provides a supported version called Docker Enterprise. Most people seem to use Docker Community Edition, which is licensed under the Apache 2.0 license.

What is a registry?

A registry is a place to store container images. Docker maintains Docker Hub, a huge public registry. Anyone can publish an image to Docker Hub for anyone else to consume. Many companies choose to run a private registry to protect their company data and applications. Docker provides two registry operations, push and pull:

  • Push – sends a local image to the registry
  • Pull – asks for the image to be stored locally
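As a sketch of the two operations (registry.example.com is a placeholder standing in for your private registry):

```shell
# Pull an image from Docker Hub into the local cache
docker pull nginx:latest

# Re-tag it for the private registry
docker tag nginx:latest registry.example.com/myteam/nginx:latest

# Push the re-tagged image to the private registry
docker push registry.example.com/myteam/nginx:latest
```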

What is a docker image?

Docker images are built in layers and are read-only. Each layer in an image is based upon a previous layer or some unique customization. Images are built from a set of instructions stored in a file called a Dockerfile.

Basic Dockerfile

This Dockerfile defines a basic image that does nothing but ping google.com forever. When compiled this image has three layers:

Layer 1: FROM ubuntu:latest

  • Use the ubuntu base operating system with the tag of latest

Layer 2: RUN apt-get update -q && apt-get install -qy iputils-ping

  • Execute the command listed above that updates the operating system and installs iputils-ping

Layer 3: CMD ["ping", "google.com"]

  • Run the command ping google.com forever

Once compiled, this new image can be uploaded to a repository as a new image.
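Putting the three layers together, the Dockerfile described above looks like this:

```dockerfile
FROM ubuntu:latest
RUN apt-get update -q && apt-get install -qy iputils-ping
CMD ["ping", "google.com"]
```

Compile it with docker build -t pinger . and run it with docker run pinger; each instruction in the file becomes one layer of the image.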

What is a container?

A container is a runnable instance of an image. Images can be stored locally or in a remote repository. Once you start an image, it becomes a unique, writable container. All changes are unique to that instance of the container and do not alter the image. You can spawn hundreds or thousands of containers from a single image.

What about isolation?

Isolation is critical otherwise the container is just a process on an operating system. This isolation in docker is provided by three things:

  • namespaces – makes a container look and feel like a separate machine
  • cgroups – A way to group processes together and apply resource limits
  • capabilities – superuser privileges that can be enabled or disabled for a process

So cgroups group processes together and apply resource limits, while namespaces create isolated instances of different resources such as the network stack, process tree, and mount points. Together they provide the impression of an isolated machine.
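You can inspect these building blocks on any Linux host, no Docker required, since every process already belongs to a set of namespaces and cgroups:

```shell
# Each symlink here is a namespace the current shell belongs to;
# a container gets fresh copies of most of these (net, pid, mnt, uts, ipc)
ls /proc/self/ns

# cgroup membership of the current process; resource limits attach to these groups
cat /proc/self/cgroup
```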

What about networking?

Containers need to talk to the outside world, so networking is implemented as a network namespace alongside the other Linux namespaces Docker uses. Initial Docker networking was very limited; as an active open source project it continues to get better. I will skip the deep dive on Docker networking since Kubernetes mostly replaces it with its own model.

Why do I care?

An honest question. Containers enable very rapid deployment of new code. They enable microservices, which in turn should improve the rate at which new features ship. So it's really about speed. As a simple comparison: I was able to set up this WordPress blog in 15 seconds with Docker, which should give you a sense of the speed involved.

Imperative vs Declarative IT

This seems to come up a lot in discussions, so I wanted to provide my view on the differences. Imperative is focused on the steps required to reach an outcome. Declarative is focused on defining the end state without specifying the steps. To illustrate the differences I like to use visuals.

Imperative

In the imperative model we build our lunch by assembling the various components ourselves. In this model we can have many specialists involved to assure we have the best ingredients in our lunch. We have a cheese specialist ensuring awesome cheese. We have a meat specialist choosing prime cuts. When someone comes to the bar to assemble their lunch, chaos becomes reality. I may not fully understand the flavors chosen by the specialists and assemble a mess. If I include every specialist in the assembly I am likely to get a great sandwich, but that process cannot scale. The imperative model is thus focused on the individual steps that produce an outcome.

Declarative

In the declarative model the end state is defined and the system is trusted to produce the outcome. The meal above represents my request for a good dinner. I was not concerned with plating or cooking; I just wanted a meal.

Why should you care about my lunch/dinner?

Allow me to illustrate the value of declarative models in detail. Let us assume you have two switches and two routers in your network:

In imperative models we are hyper-focused on the steps to make this networking construct redundant. It may involve linking switches and using BGP or OSPF to ensure optimal paths. This added human complexity provides a job for many people. Now let's examine the functional differences between two options:

Or

Functionally, assuming the user can detect upstream failures, there is no difference: you avoid a single point of failure and communication continues. In a declarative model, all we would need to define is IP connectivity from the user to the router with no single point of failure. Kubernetes implements a declarative model that creates the initial state and then ensures the desired state continues until changed, which is the real power of declarative models. For example, let's look at this application definition:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: site
spec:
  replicas: 2
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      containers:
        - name: front-end
          image: nginx
          ports:
            - containerPort: 80
        - name: reader
          image: nginx
          ports:
            - containerPort: 88

This declarative YAML creates a deployment of two pods, each running two containers (front-end and reader). If you manually remove one of these pods, a new pod is deployed to restore the declared state. When we implement declarative models we can ensure desired state long after an imperative process would have finished.
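Assuming the definition above is saved as site.yaml and you have a working cluster, the self-healing loop can be watched directly (pod names will differ on your system):

```shell
# Create the deployment from the declarative definition
kubectl apply -f site.yaml

# The deployment keeps two replicas running
kubectl get pods -l app=web

# Delete one pod; the controller immediately creates a replacement
kubectl delete pod <one-of-the-pod-names>
kubectl get pods -l app=web
```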

How to recover a manually deleted worker node in Enterprise PKS

One of the most powerful features of Enterprise PKS is its capability to provide desired state management for Kubernetes clusters. This capability is provided in part by BOSH. A simple node failure, like a kubelet agent crash or power issue, can be automatically recovered by the PKS system. You can simulate this recovery by powering off a worker node in vSphere. I wanted to push the limits of the PKS system by manually deleting a worker node to see what happens. I have to provide a caution before I begin:

Caution: DON’T MANUALLY DELETE ANY NODES MANAGED BY PKS. DELETING THE MASTER NODES MAY RESULT IN DATA LOSS.

Enterprise PKS automatically replaces worker nodes that have failed as part of its desired state management. Enterprise PKS is a full platform management suite for Kubernetes-based workloads, and operators should not manually modify Kubernetes constructs inside vSphere. While testing the desired state management capabilities of Enterprise PKS we ran into a slight problem: manually deleting a worker node creates a situation where Enterprise PKS cannot recover without manual intervention.

Start out with a healthy three node cluster:

root@cli-vm:~/PKS-Lab# kubectl get nodes
NAME                                   STATUS   ROLES    AGE    VERSION
55b8512f-7469-4562-90c1-e4f133cd333a   Ready    <none>   19m    v1.12.4
9c8f3f5c-c9d8-478d-9784-a13b3a128dbe   Ready    <none>   11m    v1.12.4
c14736b9-2b54-484c-b783-a79453e28804   Ready    <none>   166m   v1.12.4

Locating a worker node, we powered it off and deleted it, after confirming twice with BOSH that we wanted to take this action. Inside Kubernetes there is a problem:

root@cli-vm:~/PKS-Lab# kubectl get nodes
NAME                                   STATUS   ROLES    AGE   VERSION
55b8512f-7469-4562-90c1-e4f133cd333a   Ready    <none>   21m   v1.12.4
9c8f3f5c-c9d8-478d-9784-a13b3a128dbe   Ready    <none>   13m   v1.12.4

We are missing a node. Normally we would expect BOSH to deploy a replacement node after the five-minute timeout. In this case BOSH will not recreate the node no matter how long you wait. The failure to resolve the situation automatically is caused by the persistent volume attached to each worker node. When BOSH replaces a powered-off worker node, it detaches the persistent storage volume before deleting the virtual machine, then mounts the detached volume to the new node. The persistent volume is not required for Kubernetes worker nodes; it is more an artifact of how BOSH operates. BOSH will not recreate the deleted node because it is concerned about data loss on the persistent volume. You can safely deploy a new worker node manually using BOSH commands. If you remove the storage from the powered-off worker before you delete it, BOSH will automatically deploy a new worker node.

Process to manually recover a deleted worker node and its persistent volume

Since BOSH is responsible for the desired state management of the cluster, you use BOSH commands to recreate the deleted volume and node.

Gather the bosh Uaa Admin User Credentials

  • Login to Opsman via the web console
  • Click on the BOSH tile
  • Click on credentials tab
  • Locate the Uaa Admin User Credentials
  • Click on get credentials
  • Cut and paste the password section; in my case it's HYmb4WAuvnWuGLzmAFoSTlrSv4_Qj4Vk

Resolve using the Opsman virtual machine and BOSH commands

  • Use ssh to login to Opsman virtual machine as the user ubuntu
  • Create a new alias for the environment using the following command on a single line (replace the IP address with the IP address or DNS name of your PKS server)
bosh alias-env pks -e 172.31.0.2 --ca-cert /var/tempest/workspaces/default/root_ca_certificate

Using environment '172.31.0.2' as anonymous user

Name      p-bosh
UUID      ee537142-1370-4fee-a6c2-741c0cf66fdf
Version   268.2.1 (00000000)
CPI       vsphere_cpi
Features  compiled_package_cache: disabled
          config_server: enabled
          local_dns: enabled
          power_dns: disabled
          snapshots: disabled
User      (not logged in)

Succeeded
  • Use BOSH and the alias to log in to the PKS environment using the username admin and the Uaa Admin User Credentials password
bosh -e pks login

Email (): admin
Password ():

Successfully authenticated with UAA

Succeeded

Use BOSH commands to locate current deployments:
bosh -e pks deployments
  • Identify your failed deployment using the deployments command (you need the service name)
ubuntu@opsman-corp-local:~$ bosh -e pks deployments
Using environment '172.31.0.2' as user 'admin' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Name                                                   Release(s)                               Stemcell(s)                                      Team(s)
harbor-container-registry-99b2c77d387b6caae53b         bosh-dns/1.10.0                          bosh-vsphere-esxi-ubuntu-xenial-go_agent/97.52   -
                                                       harbor-container-registry/1.6.3-build.3
pivotal-container-service-bf45f9e2177d5da24998         backup-and-restore-sdk/1.8.0             bosh-vsphere-esxi-ubuntu-xenial-go_agent/170.15  -
                                                       bosh-dns/1.10.0
                                                       bpm/0.13.0
                                                       cf-mysql/36.14.0
                                                       cfcr-etcd/1.8.0
                                                       docker/33.0.2
                                                       harbor-container-registry/1.6.3-build.3
                                                       kubo/0.25.8
                                                       kubo-service-adapter/1.3.0-build.129
                                                       nsx-cf-cni/2.3.1.10693410
                                                       on-demand-service-broker/0.24.0
                                                       pks-api/1.3.0-build.129
                                                       pks-helpers/50.0.0
                                                       pks-nsx-t/1.19.0
                                                       pks-telemetry/2.0.0-build.113
                                                       pks-vrli/0.7.0
                                                       sink-resources-release/0.1.15
                                                       syslog/11.4.0
                                                       uaa/64.0
                                                       wavefront-proxy/0.9.0
service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e  bosh-dns/1.10.0                          bosh-vsphere-esxi-ubuntu-xenial-go_agent/170.15  pivotal-container-service-bf45f9e2177d5da24998
                                                       bpm/0.13.0
                                                       cfcr-etcd/1.8.0
                                                       docker/33.0.2
                                                       harbor-container-registry/1.6.3-build.3
                                                       kubo/0.25.8
                                                       nsx-cf-cni/2.3.1.10693410
                                                       pks-helpers/50.0.0
                                                       pks-nsx-t/1.19.0
                                                       pks-telemetry/2.0.0-build.113
                                                       pks-vrli/0.7.0
                                                       sink-resources-release/0.1.15
                                                       syslog/11.4.0
                                                       wavefront-proxy/0.9.0
  • There are three deployments listed on my system (PKS management, Harbor, PKS cluster); we will be using service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e, which is the PKS cluster with a deleted node
  • Review the virtual machines involved in the service instance:
ubuntu@opsman-corp-local:~$ bosh -e pks -d service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e vms
Using environment '172.31.0.2' as user 'admin' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Task 6913. Done

Deployment 'service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e'

Instance                                     Process State  AZ        IPs         VM CID                                   VM Type  Active
master/e24ccbc1-8b3b-460c-9162-7199d4d67674  running        PKS-COMP  172.15.0.2  vm-2ca9f83c-8d80-4e92-a1d5-ff0b3446c624  medium   true
worker/22be3ec4-7eae-4370-b6cc-d59bd7071f01  running        PKS-COMP  172.15.0.3  vm-bad946f5-5b51-40f4-acd4-29bcf3ad7e6a  medium   true
worker/35026d4b-fb24-4b05-8f33-a71dbebf03e7  running        PKS-COMP  172.15.0.4  vm-b54970b7-1984-4d74-9285-48d28f308c0b  medium   true
  • BOSH is aware of three total nodes (one master and two workers); our expected state is three worker nodes
  • Running a BOSH consistency check allows us to clean out the persistent disk metadata
ubuntu@opsman-corp-local:~$ bosh -e pks -d service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e cck
Using environment '172.31.0.2' as user 'admin' (bosh.*.read, openid, bosh.*.admin, bosh.read, bosh.admin)

Using deployment 'service-instance_84bc5c87-e480-4b17-97bc-afed45ab4a6e'

Task 6920

Task 6920 | 19:26:07 | Scanning 4 VMs: Checking VM states (00:00:18)
Task 6920 | 19:26:25 | Scanning 4 VMs: 3 OK, 0 unresponsive, 1 missing, 0 unbound (00:00:00)
Task 6920 | 19:26:25 | Scanning 4 persistent disks: Looking for inactive disks (00:00:38)
Task 6920 | 19:27:03 | Scanning 4 persistent disks: 3 OK, 1 missing, 0 inactive, 0 mount-info mismatch (00:00:00)

Task 6920 Started  Tue May 21 19:26:07 UTC 2019
Task 6920 Finished Tue May 21 19:27:03 UTC 2019
Task 6920 Duration 00:00:56
Task 6920 done

#   Type          Description
48  missing_vm    VM for 'worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2 (2)' missing.
49  missing_disk  Disk 'disk-0500a7de-10e2-414c-8b19-091147c58a98' (worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2, 102400M) is missing

2 problems

1: Skip for now
2: Recreate VM without waiting for processes to start
3: Recreate VM and wait for processes to start
4: Delete VM reference
VM for 'worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2 (2)' missing. (1): 4

1: Skip for now
2: Delete disk reference (DANGEROUS!)
Disk 'disk-0500a7de-10e2-414c-8b19-091147c58a98' (worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2, 102400M) is missing (1): 2

Continue? [yN]: y

Task 6928

Task 6928 | 19:29:49 | Applying problem resolutions: VM for 'worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2 (2)' missing. (missing_vm 13): Delete VM reference (00:00:00)
Task 6928 | 19:29:49 | Applying problem resolutions: Disk 'disk-0500a7de-10e2-414c-8b19-091147c58a98' (worker/8eef54b7-eef0-4d95-b09d-8aeb551846c2, 102400M) is missing (missing_disk 6): Delete disk reference (DANGEROUS!) (00:00:07)

Task 6928 Started  Tue May 21 19:29:49 UTC 2019
Task 6928 Finished Tue May 21 19:29:56 UTC 2019
Task 6928 Duration 00:00:07
Task 6928 done
  • The process requires that we delete the entry for the worker node and the missing disk. Notice the big warning about data loss when deleting a volume. In this case we are only deleting BOSH metadata, because the volume itself is already gone.

Once the BOSH metadata is removed, BOSH will automatically deploy a new worker node and join it to the cluster. Enterprise PKS is flexible enough to handle the normal operational tasks of managing and scaling Kubernetes in the enterprise while ensuring you don't lose data.
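Condensed, the recovery procedure above is just a handful of BOSH commands (the IP address comes from the example; substitute your own, and the elided service-instance ID, where shown as <id>):

```shell
# Point BOSH at the PKS director and log in as admin
bosh alias-env pks -e 172.31.0.2 --ca-cert /var/tempest/workspaces/default/root_ca_certificate
bosh -e pks login

# Find the cluster's service-instance deployment, then run cloud-check
bosh -e pks deployments
bosh -e pks -d service-instance_<id> vms
bosh -e pks -d service-instance_<id> cck   # delete the stale VM and disk references
```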

Thanks to Matt Cowger from Pivotal for helping with the recovery process.