Learning Docker Image Layers and Cache Best practices

If you Google Dockerfile or learning docker you will be assaulted with lots of examples of Dockerfiles to run on your environment.    Many are missing the basic understanding of how Dockerfile operates.  It’s laying technology and cache provide a host of best practices to consider when building your ideal state.  

Layers:

Each layer of a container is readonly except the final layer which is applied during the docker run command.   In older versions of Docker it was critical to minimize the layers to ensure performance. Layers are added by the following commands:

  • RUN, COPY, ADD, FROM

All other commands just create intermediate images which are thrown away post build.  You can also use multi-stage builds to just copy the required artifacts into the end image.   A few examples to illustrate the impact of layers:

First start with a simple Dockerfile

FROM
ubuntu:latest

Create an image from this file:

docker build -t test .

Sending build context to Docker daemon  4.608kB

Step 1/1 : FROM ubuntu:latest

 —> 7698f282e524

Successfully built 7698f282e524

Successfully tagged test:latest

We have a single step and that means only one layer and that layer became our final image.  Time to add one more layer:

FROM ubuntu:latest

RUN echo “Test”

Creating the image we now have two steps and two layers:

docker build -t test .

Sending build context to Docker daemon  4.608kB

Step 1/2 : FROM ubuntu:latest

 —> 7698f282e524

Step 2/2 : RUN echo “Test”

 —> Running in 7f4aba5459b1

Test

Removing intermediate container 7f4aba5459b1

 —> 57fda831491f

Successfully built 57fda831491f

Successfully tagged test:latest

I created a number of zero-byte files using touch

touch 1 2 3 4 5

Adding these one at a time using ADD or COPY creates multiple layers:

FROM ubuntu:latest

RUN echo “Test”

COPY 1 .

ADD 2 .

ADD 3 .

ADD 4 .

ADD 5 .

Building the image:

docker build -t test .

Sending build context to Docker daemon  4.608kB

Step 1/7 : FROM ubuntu:latest

 —> 7698f282e524

Step 2/7 : RUN echo “Test”

 —> Using cache

 —> 57fda831491f

Step 3/7 : COPY 1 .

 —> 1025060f36d4

Step 4/7 : ADD 2 .

 —> 35cff57055a1

Step 5/7 : ADD 3 .

 —> 0357c97e0c37

Step 6/7 : ADD 4 .

 —> 389612774b90

Step 7/7 : ADD 5 .

 —> de67547a97df

Successfully built de67547a97df

Successfully tagged test:latest

We now have seven layers of images.  These statements can be consolidated down to reduce the layers.  For this example I will only consolidate 4 and 5. 

FROM ubuntu:latest

RUN echo “Test”

COPY 1 .

ADD 2 .

ADD 3 .

ADD 4 5 /

Build image:

docker build -t test .

Sending build context to Docker daemon  4.608kB

Step 1/6 : FROM ubuntu:latest

 —> 7698f282e524

Step 2/6 : RUN echo “Test”

 —> Using cache

 —> 57fda831491f

Step 3/6 : COPY 1 .

 —> Using cache

 —> 1025060f36d4

Step 4/6 : ADD 2 .

 —> Using cache

 —> 35cff57055a1

Step 5/6 : ADD 3 .

 —> Using cache

 —> 0357c97e0c37

Step 6/6 : ADD 4 5 /

 —> 856f9a3a90d8

Successfully built 856f9a3a90d8

Successfully tagged test:latest

As you can see we have one less inbetween layer by combining the last two.   Many of the layers were pulled from cache (didn’t change).   When we use a COPY command we have to be careful because the cache will expire if the file changes.   I am going to add the text “hello” to the file 1 that is being added via COPY.   Notice the impact on the other layers:

docker build -t test .

Sending build context to Docker daemon   5.12kB

Step 1/6 : FROM ubuntu:latest

 —> 7698f282e524

Step 2/6 : RUN echo “Test”

 —> Using cache

 —> 57fda831491f

Step 3/6 : COPY 1 .

 —> 2e9f2b068ab4

Step 4/6 : ADD 2 .

 —> 7a8132435424

Step 5/6 : ADD 3 .

 —> d6ced004f0e1

Step 6/6 : ADD 4 5 /

 —> 1b2b9be67d0f

Successfully built 1b2b9be67d0f

Successfully tagged test:latest

Notice that every layer after 3 cannot be built from cache because the COPY file has changed invalidating all later layers.  For this reason, you should place COPY and ADD lines to the end of a Dockerfile.  Building the layers is an expensive time-consuming operation so we need to limit the number of layers that change.   The best version of this Dockerfile is this:

FROM ubuntu:latest

RUN echo “Test”

COPY 1 2 3 4 5 /

I combined add and copy because neither was doing something different (use COPY when it’s a local file / ADD when it’s remote or a tar archive).   When you build the image you have the least amount of layers:

docker build -t test .

Sending build context to Docker daemon  5.632kB

Step 1/3 : FROM ubuntu:latest

 —> 7698f282e524

Step 2/3 : RUN echo “Test”

 —> Using cache

 —> 57fda831491f

Step 3/3 : COPY 1 2 3 4 5 /

 —> 0f500aea029d

Successfully built 0f500aea029d

Successfully tagged test:latest

Now we only have three layers doing the same thing as seven before.  

Cache:

In the previous section we demonstrated how cache gets used but it’s important to understand what type of actions trigger a rebuild instead of cache usage:

  • All cached layers are invalidated if the higher up layer is considered changed (cascaded down)
  • Change in RUN instructions force a invalid cache (RUN apt-get install bob -y and RUN apt-get install bob -yq force a rebuild)
  • For ADD and COPY the contents of files are examined against checksum and last-accessed and modified times are considered to trigger an invalidation of cache
  • Only RUN, COPY, ADD create layers all others create temporary intermediate images

This list illustrates one of the largest problems with cache.   Using Ubuntu:latest will change depending on the current latest version but if you have it cached it will not be updated from the repository.   RUN commands that have not had a syntax change will not be updated.  For example if you have the following in your Dockerfile:

RUN apt-get upgrade -qy

On the first run will executive that command on the container and cache the output layer.   This is a point in time cached layer.   If you run upgrade a week from today the image should change yet because it’s a cached layer you don’t get the new updates.   This is the danger of the cache.   You can force a rebuild of cache layers with:

–no-cache

One command that can help you understand the inner workings of your docker images is the history command:

docker history test

IMAGE               CREATED             CREATED BY                                      SIZE                COMMENT

5b55f21f1701        21 minutes ago      /bin/sh -c #(nop) COPY multi:3b9dfb231e0b141…   12B                

75ac3bfbeaba        21 minutes ago      /bin/sh -c echo “Test”                          0B                 

7698f282e524        4 weeks ago         /bin/sh -c #(nop)  CMD [“/bin/bash”]            0B                 

<missing>           4 weeks ago         /bin/sh -c mkdir -p /run/systemd && echo ‘do…   7B                 

<missing>           4 weeks ago         /bin/sh -c rm -rf /var/lib/apt/lists/*          0B                 

<missing>           4 weeks ago         /bin/sh -c set -xe   && echo ‘#!/bin/sh’ > /…   745B               

<missing>           4 weeks ago         /bin/sh -c #(nop) ADD file:1f4fdc61e133d2f90…   69.9MB             

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.