Good Dockerfile Practices - NASA-PDS/nasa-pds.github.io GitHub Wiki
These are some good practices I've¹ seen going around and they've worked for me.
Images should be small
Choose a base image carefully: do you need a full ubuntu (114MB) when debian (72.9MB) will work fine? Or better yet, consider alpine, which is one of the smallest Linux bases out there (5.61MB).
☝️ Always clean up after yourself. If you did something like:
RUN apt-get install -y maven subversion &&\
svn checkout http://old.repo/old/code/tags/1.0.0 proj &&\
cd proj && mvn install &&\
cd .. && rm -r proj
and you don't need svn or mvn for the container to actually run, remove them:
RUN apt-get install -y maven subversion &&\
svn checkout http://old.repo/old/code/tags/1.0.0 proj &&\
cd proj && mvn install &&\
cd .. && rm -r proj &&\
apt-get purge -y --auto-remove subversion maven && rm -rf /var/lib/apt/lists/*
☝️ Use as few RUN statements as possible. Each RUN adds a filesystem layer to the image, and files created in one layer still count toward the image size even if a later RUN deletes them. So instead of:
RUN cp ${PWD}/override.xml ./pom.xml
RUN mvn clean install
RUN cp target/whatever.jar ${WEBAPPS}
RUN mvn site
RUN cp -R target/site ${WEBAPPS}/docs
RUN mvn clean
do:
RUN cp ${PWD}/override.xml ./pom.xml &&\
mvn clean install &&\
cp target/whatever.jar ${WEBAPPS} &&\
mvn site &&\
cp -R target/site ${WEBAPPS}/docs &&\
mvn clean
(And later on: don't run mvn in Dockerfiles if you don't have to!)
Images should be based on the "grandest" component you're using
For example, if the first thing you do in your image is yum install openjdk, then instead of FROM centos do FROM openjdk. Or, if the first thing you do after that is apt-get install -y tomcat, then use FROM tomcat:9.0.43-jdk15-openjdk-buster.
Let others worry about how to package the JDK or Tomcat; you focus on your app. ☺️
☝️ Not everyone is a smart packager, though! You may have to use a lower-level base image anyway because of VOLUME statements or non-obvious and/or broken ways of including extensions to a base package. (The openldap image, for example, is frustrating in this respect.)
When building an image, the build context includes the given directory
And that's sent to the Docker daemon (minus anything listed in .dockerignore), so there's usually no need to have git or mvn as steps in the Dockerfile. This means that instead of:
RUN git clone https://github.com/org/repo.git &&\
cd repo &&\
mvn package &&\
cp target/app.war ${CATALINA_BASE}/webapps
you can simply do:
COPY target/app.war ${CATALINA_BASE}/webapps/app.war
and run:
$ ls -F
docker/ pom.xml target/
$ mvn package
…
$ docker image build --tag app:latest --file docker/Dockerfile .
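The host-side workflow above can be sketched as a small shell function that refuses to build the image when the artifact is missing. This is only an illustration: the function name build_image and the DOCKER override variable are assumptions introduced here (the override exists so the command can be dry-run with echo), not part of the original setup.

```shell
# build_image: build the image only if the Maven artifact already exists.
# DOCKER is a hypothetical override, e.g. DOCKER=echo for a dry run.
build_image() {
    artifact=target/app.war
    if [ ! -f "$artifact" ]; then
        echo "missing $artifact; run 'mvn package' first" >&2
        return 1
    fi
    # Same command as in the session above; "." is the build context.
    "${DOCKER:-docker}" image build --tag app:latest --file docker/Dockerfile .
}
```

Run it from the project root after mvn package. With DOCKER=echo it only prints the command it would run.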
Copying a locally built artifact also makes subsequent image builds idempotent. With the git clone approach, if someone were to commit a new change to main (master) on org/repo, then two runs of docker image build could produce two different images 😮
☝️ Actually, it won't necessarily! Because each layer gets cached during the build, a single docker image build will save the filesystem layer containing the result of git clone https://github.com/org/repo.git as it was at the time it was run. That means:
- If you change org/repo, rebuilding the image won't pick up the change unless you do docker image build --no-cache. You'd have to advise everyone, prominently in the documentation, to run docker image build --no-cache.
- If the image is later published and pulled by other Docker users, then when they try to build the image, they'll get your cached layer! The results will be non-obvious and confusing! 😱
🙅‍♀️ So, in general, don't pull exterior resources with RUN git clone or RUN curl or ADD unless they're well-defined, static, and unchanging.
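One way to treat a downloaded resource as "static and unchanging" is to check it against a known SHA-256 digest before it goes into the build context. The helper below is a minimal sketch, assuming GNU sha256sum is available. The name verify_sha256, the example URL, and the digest placeholder are hypothetical.

```shell
# verify_sha256 FILE EXPECTED_HEX
# Succeeds only if FILE exists and its SHA-256 digest equals EXPECTED_HEX.
verify_sha256() {
    actual=$(sha256sum "$1" | cut -d ' ' -f 1)
    [ -n "$actual" ] && [ "$actual" = "$2" ]
}

# Hypothetical usage before "docker image build" (URL and digest are placeholders):
#   curl -fsSL -o vendor/dep.tar.gz https://example.com/dep-1.0.0.tar.gz
#   verify_sha256 vendor/dep.tar.gz "<expected digest>" || exit 1
```

If the upstream file ever changes, the check fails loudly instead of silently producing a different image.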
☝️ Use the .dockerignore file to speed up your image builds. Suppose you have this situation:
$ du -sh *
16K Dockerfile
12K INSTALL.md
8.0K docker-compose.yaml
8.2K pom.xml
91K target
9.3G art-assets
If the art-assets aren't needed to build the image, there's no need to send 9.3 gigabytes of data to the Docker daemon all the time! Just put art-assets in your .dockerignore file next to the Dockerfile (that is, at the root of the build context) and build with docker image build .
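As a minimal sketch of that last step, the snippet below writes such a .dockerignore from the shell. Only the art-assets entry comes from the listing above. The .git entry is an assumed extra that many projects also exclude.

```shell
# Write a .dockerignore at the root of the build context (the directory
# passed to "docker image build"). "art-assets" is from the listing above;
# ".git" is an assumed extra entry.
cat > .dockerignore <<'EOF'
art-assets
.git
EOF

# Show what will be left out of the build context.
cat .dockerignore
```

Each line is a pattern matched relative to the context root, so art-assets excludes that whole directory.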