Using Docker for Reproducible Build Environments



While most people think of Docker as a deployment environment, I’ve found that it’s a wonderful tool for interactively creating and maintaining build environments as well. Maintaining build machines for applications is not only challenging but also time-consuming. The process usually begins with someone creating a Linux build VM and then installing all the tools and libraries needed to compile the application or interpreter. While the initial exercise is relatively painless, subsequent management quickly becomes a nightmare. Someone submits a change request for a library or a module in the system, and the build environment then has to be backed up, cloned, and modified, and a new build created and tested, before the old environment can be retired, if it ever can be. And since legacy application versions are still being maintained, the older machines need to live on as well.

Things quickly get complicated when we have multiple teams adding things to different copies of the initial build environment. Soon we have versions of code tied to specific variants of the build box, and converging code means converging the modified build environments as well.

Another problem I’ve been trying to solve is determining exactly what is installed on a given build machine in terms of libraries, modules, and so on. Since the machine is created and maintained by different developers over time, it becomes extremely difficult to audit.

Not just deployment

At work, this kind of build machine maintenance simply wasn’t sustainable, and we started looking for answers in our newfound love for Docker. Although we had recently migrated our cloud application to Docker, we were still using our existing build environments to create some of the artifacts needed for the Docker build, so we decided it was time to clean up our build workflow.

For the record, we use Python in some of our application containers, and we have some specific requirements to use custom-built Python linked against custom-built OpenSSL. We also use some homegrown Python modules written in C, which made our build environment a little tricky to create.

The typical process for creating our build machine is this:

  1. Create a VM with the required OS (Ubuntu 14.04 in our case).
  2. Install the compiler tools and libraries needed to compile OpenSSL from sources.
  3. Compile OpenSSL and install it at a custom location.
  4. Install the required libraries and -dev packages needed to build Python from sources.
  5. Compile Python from sources, pointing it to our custom-installed OpenSSL.
  6. Compile all needed Python modules and dependent libraries that are linked against OpenSSL, again pointing them to our custom OpenSSL.
  7. Build the application and verify that it is correctly linked to our custom OpenSSL.

Along the way, some libraries are built from source, some are installed pre-compiled, and some come from system packages. Doing this manually each time is error-prone. If you make a mistake early in the process and only discover it later, undoing all the changes to fix it is sometimes impossible. It’s an interactive process, and once you get it right, you could script it completely, but getting it right takes time.

Building using Docker

Although Docker is technically a deployment environment and a set of tools around it, it’s almost tailor-made for interactively creating build environments. Docker images are typically built from a Dockerfile, a file that acts as a recipe for creating the image. Each instruction in the Dockerfile translates into either image metadata or one filesystem layer in the final image. Here’s what a simple Dockerfile looks like:

FROM ubuntu:14.04
RUN apt-get update
RUN apt-get install -y python python-pip
RUN pip install cherrypy

You start from a base image layer and control everything that goes into it. (See the Dockerfile reference for the full list of directives.) If you have spent some time writing Dockerfiles, this will look rather inefficient. For a build environment that you develop iteratively, though, keeping one step per RUN line is actually more efficient, because Docker can cache each step separately. An optimized Dockerfile would typically roll multiple RUN lines into a single RUN directive to reduce the number of layers in the final image, but in our case we don’t care about the number of layers because we never ship this image. Instead, we put all the tools and libraries needed to build the application into the image and use a container spawned from it as our build machine to compile and produce the artifacts.

The best part is that you can create the Dockerfile line by line, testing it at each step without redoing work that’s already been done, thanks to Docker’s build cache. Each time you add a new line to the Dockerfile, you can run docker build again. If the preceding lines haven’t changed, Docker typically reuses the cached filesystem layers for all of them and executes only the new line. If you make a mistake in one of the earlier lines, you can simply edit that line and re-run the build. Because Docker checks each instruction against the image cache in order, the changed line invalidates the cache from that point on: Docker re-executes the changed line and every line after it, while everything before it still comes from the cache.
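
To make that invalidation rule concrete, here is a toy Python model of it. It is a simplification, not Docker’s actual implementation: it assumes an instruction is served from the cache only when it and every instruction before it match the previous build textually, and it ignores the file checksums Docker also takes into account for ADD and COPY. The function name plan_build is hypothetical.

```python
# Toy model of Docker's build cache (a simplification, not Docker's code).
from typing import List, Tuple


def plan_build(previous: List[str], current: List[str]) -> Tuple[List[str], List[str]]:
    """Split `current` into (reused, rerun) instruction lists.

    Assumes each layer is keyed on its parent layer plus the instruction
    text, so only the longest unchanged prefix of the Dockerfile can be
    served from the cache.
    """
    reused: List[str] = []
    for old, new in zip(previous, current):
        if old.strip() != new.strip():
            break  # first changed line: it and every line after it must run again
        reused.append(new)
    return reused, current[len(reused):]


if __name__ == "__main__":
    before = [
        "FROM ubuntu:14.04",
        "RUN apt-get update",
        "RUN apt-get install -y python python-pip",
        "RUN pip install cherrypy",
    ]
    # Editing the third line invalidates it and the line after it.
    after = before[:2] + ["RUN apt-get install -y python python-pip python-dev"] + before[3:]
    reused, rerun = plan_build(before, after)
    print(len(reused), len(rerun))  # 2 2
```

Appending a fifth line to the same list would instead reuse all four cached steps and run only the new one, which is what makes the line-by-line workflow cheap.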

Build vs. runtime containers

Since our build requirements pulled in lots of libraries and development tools that we didn’t need to actually run the application, our build and runtime environments became two separate container images. The runtime image doesn’t need any of the library development headers, or even a compiler, so it is much lighter than the build image. Here’s a simplified version of our build environment Dockerfile:

FROM ubuntu:14.04
ENV HOME /root
ENV DEBIAN_FRONTEND noninteractive
ENV TERM=xterm
ENV CONFIGURE_OPTS="CFLAGS=-I/usr/local/ssl/include CPPFLAGS=-I/usr/local/ssl/include LDFLAGS=-L/usr/local/ssl/lib --enable-shared"
ENV PYENV_ROOT="/usr/local/pyenv"
ENV PYENV_BUILD_ROOT="/usr/local/src"
ENV PATH="/usr/local/pyenv/bin:$PATH"
# The custom OpenSSL goes into /usr/local/ssl, matching the flags above.
# OPENSSL_VERSION is assumed to be set as well (for example with another ENV line).
ENV OPENSSL_INSTALLDIR=/usr/local/ssl
WORKDIR /usr/local/src
RUN apt-get -qqy update
RUN apt-get -qqy install wget build-essential git libreadline-dev
RUN apt-get -qqy install libz-dev libbz2-dev libjpeg-dev
RUN mkdir -p /usr/local/src/openssl && cd /usr/local/src/openssl &&\
wget -qO - $OPENSSL_VERSION.tar.gz | tar -zxf - &&\
cd openssl-$OPENSSL_VERSION &&\
./config shared --openssldir=$OPENSSL_INSTALLDIR &&\
make depend && make all && make install
# Assumption: pyenv itself is checked out into $PYENV_ROOT, e.g. from GitHub.
RUN git clone https://github.com/pyenv/pyenv.git $PYENV_ROOT
RUN CFLAGS=-I/usr/local/ssl/include \
CPPFLAGS=-I/usr/local/ssl/include \
LDFLAGS=-L/usr/local/ssl/lib \
LD_LIBRARY_PATH="/usr/local/pyenv/versions/2.7.8/lib" \
pyenv install 2.7.8
RUN . /usr/local/pyenv.vars && \
cd /bld && pip install -r requirements.txt

At the end of the Docker build, /usr/local/ssl inside the image should contain the custom-built OpenSSL, and /usr/local/pyenv should contain a Python environment with all the modules needed to compile and run the final application.
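
Step 7 of the manual process, verifying that the application really uses the custom OpenSSL, is still worth automating inside the build container. One way to do it is sketched below under a few assumptions: it runs on Linux, where /proc/self/maps lists the shared libraries loaded into the current process, and it treats the /usr/local/ssl prefix used above as the only acceptable location. It imports ssl and then checks where libssl and libcrypto were loaded from. The helper names are hypothetical, and the code avoids Python 3-only syntax so it can also run under the 2.7.8 interpreter built here.

```python
import ssl  # importing ssl loads the _ssl extension, which maps libssl/libcrypto

# Assumption: matches the --openssldir used in the build Dockerfile.
CUSTOM_PREFIX = "/usr/local/ssl/"


def openssl_paths(maps_lines):
    """Return distinct libssl/libcrypto file paths found in /proc/<pid>/maps lines."""
    paths = []
    for line in maps_lines:
        # Fields: address perms offset dev inode [pathname]; the path may contain spaces.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping with no backing file
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        if name.startswith(("libssl", "libcrypto")) and path not in paths:
            paths.append(path)
    return paths


def verify_custom_openssl(prefix=CUSTOM_PREFIX, maps_path="/proc/self/maps"):
    """Return (ok, paths); ok is True only if OpenSSL libraries were found
    and every one of them was loaded from under `prefix`."""
    with open(maps_path) as maps:
        paths = openssl_paths(maps)
    ok = bool(paths) and all(p.startswith(prefix) for p in paths)
    return ok, paths
```

A build script could call verify_custom_openssl() with the freshly built interpreter and abort if ok comes back false; printing ssl.OPENSSL_VERSION next to the paths is a quick extra sanity check. If OpenSSL were linked statically, no libssl mapping would appear and the check would report failure, so it only suits shared builds like the ./config shared build above.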

Using the built container

The next steps are to mount our sources into a container created from the build image, build any custom modules there, and then extract the built binaries from the container. The extracted binaries can then be added to the runtime image when it is built.

We use something like this to build the binaries:

docker run -it -v "$(pwd)"/../../src:/src -v "$(pwd)":/out pyenv mybuildenv:sometag /src/

Here, mybuildenv is the name of the build image we just created, and sometag is a timestamp passed to the docker build command in the previous step. The script configures and builds the binaries from source, tars up the artifacts, including our custom OpenSSL library and our Python environment, and drops the archive into /out, which maps to the directory where docker run is invoked. All of this is scripted as part of a top-level build script.
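
The packaging script itself isn’t shown here, so the following is only a sketch of that last step in Python, with hypothetical names: it archives /usr/local/ssl and /usr/local/pyenv (the locations from the Dockerfile above) into a gzipped tarball in /out. Entries are stored relative to the filesystem root, so unpacking the archive at / restores the original paths.

```python
import os
import tarfile

# Assumed artifact locations, taken from the build Dockerfile above.
ARTIFACT_DIRS = ("/usr/local/ssl", "/usr/local/pyenv")


def package_artifacts(dirs=ARTIFACT_DIRS, out_dir="/out",
                      name="build-artifacts.tar.gz"):
    """Write a gzipped tarball of `dirs` into `out_dir` and return its path.

    The archive name is hypothetical. Each directory is stored under its
    absolute path minus the leading '/', so extracting the archive at '/'
    puts the files back where they were built.
    """
    archive = os.path.join(out_dir, name)
    with tarfile.open(archive, "w:gz") as tar:
        for directory in dirs:
            # tar.add() recurses into subdirectories by default.
            tar.add(directory, arcname=directory.lstrip("/"))
    return archive
```

Because Docker’s ADD instruction unpacks local tar archives automatically, a runtime Dockerfile could then bring the artifacts in with a single line such as ADD build-artifacts.tar.gz /.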

Not just Python

Although this example uses a Python environment, the same approach should work for any language. Think of Docker in this case as a portable, cross-platform build environment, much like a cross-compilation environment.

One big benefit we’ve reaped since integrating the Python build environment into the Docker build workflow is that the environment is now part of the source code, and therefore trivial to track and manage using our SCM. When developers need to change something in the build or runtime environment, they simply make the changes in their source code branches, and the builds produced from those branches just work. When the branches merge into trunk, the build-environment changes are merged just like any other code. We no longer need to maintain separate build boxes, and any machine with Docker can produce a build. Plus, the build and runtime environments are now completely auditable, since they are explicitly declared in Dockerfiles and shell scripts. Our build management has never been easier.

Faisal is an engineer working with Druva. He has, over the last 17 years, worked in development, IT and operations.

