Good Dockerfile Practices - NASA-PDS/nasa-pds.github.io GitHub Wiki

These are some good practices I've¹ seen going around and they've worked for me.

Images should be small

Choose a base image carefully: do you need a full ubuntu (114MB) when debian (72.9MB) will work fine? Or better yet, consider alpine, which is one of the smallest Linux bases out there (5.61MB).

☝️ Always clean up after yourself. If you did something like:

# Refresh the package index first; most base images ship without one
RUN apt-get update && apt-get install -y maven subversion &&\
    svn checkout http://old.repo/old/code/tags/1.0.0 proj &&\
    cd proj && mvn install &&\
    cd .. && rm -r proj

and you don't need svn or mvn for the container to actually run, remove them in the same RUN statement, so they never end up in a saved layer:

# Install the build tools, build, then purge the tools within a single layer
RUN apt-get update && apt-get install -y maven subversion &&\
    svn checkout http://old.repo/old/code/tags/1.0.0 proj &&\
    cd proj && mvn install &&\
    cd .. && rm -r proj &&\
    apt-get purge -y --auto-remove subversion maven && rm -rf /var/lib/apt/lists/*

☝️ Use as few RUN statements as possible. Each RUN adds a filesystem layer to the image, and anything a later RUN deletes still takes up space in the earlier layer. So instead of:

RUN cp ${PWD}/override.xml ./pom.xml
RUN mvn clean install
RUN cp target/whatever.jar ${WEBAPPS}
RUN mvn site
RUN cp -R target/site ${WEBAPPS}/docs
RUN mvn clean

do:

RUN cp ${PWD}/override.xml ./pom.xml &&\
    mvn clean install &&\
    cp target/whatever.jar ${WEBAPPS} &&\
    mvn site &&\
    cp -R target/site ${WEBAPPS}/docs &&\
    mvn clean

(And, as discussed further down: don't run mvn in Dockerfiles if you don't have to! 😝)
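To see what each instruction actually costs, `docker image history` lists an image's layers with their sizes. The following is a minimal sketch, assuming the docker CLI is installed and the image has already been built locally; the helper name `show_layers` and the tag `app:latest` are illustrative only.

```shell
#!/bin/sh
# show_layers IMAGE: print the size and creating instruction of each layer.
# Assumes the docker CLI is installed and IMAGE exists locally.
show_layers() {
    image="$1"
    if ! command -v docker >/dev/null 2>&1; then
        echo "docker CLI not found" >&2
        return 1
    fi
    if ! docker image inspect "$image" >/dev/null 2>&1; then
        echo "image '$image' not found locally; build it first" >&2
        return 1
    fi
    # One line per layer, newest first: size, then the instruction that made it.
    docker image history --format '{{.Size}}\t{{.CreatedBy}}' "$image"
}

# Example (hypothetical tag):
# show_layers app:latest
```

Running it before and after merging RUN statements should show fewer layers, and no layer that still carries files a later step removed.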

Images should be based on the "grandest" component you're using

For example, if the first thing you do in your image is yum install openjdk, then instead of FROM centos do FROM openjdk. Or, if the first thing you do after that is apt-get install -y tomcat, then use FROM tomcat:9.0.43-jdk15-openjdk-buster.

Let others worry about how to package the JDK or Tomcat; you focus on your app. ☺️

☝️ Not everyone is a smart packager, though! You may have to use a lower-level base image anyway because of VOLUME statements or non-obvious and/or broken ways of adding extensions to a base package. (The openldap image, for example, is frustrating in this regard.)

When building an image, the build context includes the given directory

And that's sent to the Docker daemon (minus anything listed in .dockerignore) so there's usually no need to have git or mvn as steps in the Dockerfile. This means that instead of:

RUN git clone https://github.com/org/repo.git &&\
    cd repo &&\
    mvn package &&\
    cp target/app.war ${CATALINA_BASE}/webapps

you can simply do:

COPY target/app.war ${CATALINA_BASE}/webapps/app.war

and run:

$ ls -F
docker/    pom.xml    target/
$ mvn package
…
$ docker image build --tag app:latest --file docker/Dockerfile .

This also makes subsequent image builds reproducible. With the git clone approach, if someone were to commit a new change to main (master) on org/repo between two runs of docker image build, those runs could produce two different images 😮

☝️ Actually, they won't necessarily! Because each layer gets cached during the build, a single docker image build will save the filesystem layer containing the result of git clone https://github.com/org/repo.git as of the time it ran. That means:

If you change org/repo, rebuilding the image won't pick up the change unless you run docker image build --no-cache. You'd have to tell everyone, prominently in the documentation, to build with --no-cache.
If the image is later published and pulled by other Docker users, then when they try to build the image with your layers as a cache, they'll get your stale layer! The results will be non-obvious and confusing! 😱

💁‍♀️ So, in general, don't pull external resources with RUN git clone or RUN curl or ADD unless they're well-defined, static, and unchanging.
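When an external download can't be avoided, one way to make it "well-defined, static, and unchanging" is to pin it to a checksum and fail the build on a mismatch. Here is a minimal sketch of such a check, assuming `sha256sum` (GNU coreutils or BusyBox) is available in the image; the helper name `verify_sha256`, the URL, and the digest in the usage comment are placeholders.

```shell
#!/bin/sh
# verify_sha256 FILE EXPECTED_HEX: succeed only if FILE's SHA-256 matches.
verify_sha256() {
    file="$1"
    expected="$2"
    # sha256sum prints "<hex digest>  <file name>"; keep only the digest.
    actual="$(sha256sum "$file" | cut -d ' ' -f 1)"
    if [ "$actual" != "$expected" ]; then
        echo "checksum mismatch for $file" >&2
        return 1
    fi
}

# Hypothetical use (URL and digest are placeholders):
#   curl -fsSLo dep.tar.gz https://example.com/dep-1.0.0.tar.gz &&
#   verify_sha256 dep.tar.gz <expected-sha256-hex>
```

Inside a RUN statement the same check is often written inline, piping a `<digest>  <file>` line into `sha256sum -c -`. Either way, a changed upstream file breaks the build loudly instead of silently changing the image.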

☝️ Use a .dockerignore file to speed up your image builds. Suppose you have this situation:

$ du -sh *
16K   Dockerfile
12K   INSTALL.md
8.0K  docker-compose.yaml
8.2K  pom.xml
91K   target
9.3G  art-assets

And the art-assets aren't needed to build the image, so there's no need to send 9.3 gigabytes of data to the Docker daemon on every build! Just list art-assets in a .dockerignore file at the root of the build context (here, next to the Dockerfile) and build with docker image build .
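For the listing above, the ignore file needs a single pattern. Here is a minimal sketch that writes it from the shell, assuming it is run from the build context root; the `CONTEXT_DIR` variable is a hypothetical convenience for pointing at a different directory. Note that it overwrites any existing .dockerignore.

```shell
#!/bin/sh
# Directory passed to `docker image build` (defaults to the current directory).
context_dir="${CONTEXT_DIR:-.}"

# Each non-comment line is a pattern excluded from the build context.
# This overwrites any existing .dockerignore in that directory.
cat > "$context_dir/.dockerignore" <<'EOF'
# Large assets that no COPY or ADD instruction needs
art-assets
EOF
```

With that file in place, `docker image build .` should no longer send the art-assets directory to the daemon.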

🦶 Notes

¹Sean Kelly