Keeping a persisted directory of data from a Docker container using volumes - lmmx/devnotes GitHub Wiki

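When a Docker container is run, files modified during its session live in the container's own file system, and are lost once the container is removed (which happens as soon as it exits if it was started with --rm, as in the commands below).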

If you want to keep the changes made to the container's file system, you can use the host file system as 'storage' (i.e. 'persist' the changes 'locally').

docker ps will list the running Docker containers along with their CONTAINER ID and NAMES

This Q&A describes ways to "explore a docker container's file system"

  • Snapshotting
  • ssh
  • nsenter

More simply, since docker exec was introduced, you can docker exec <container> ls <dir path> (which seems preferable to creating a static image)

Much like virtual machines, you can mount the host's directories in a container and these will persist unlike the rest of the container's file system.

In my case I am spinning up the image coatwork/dockerimage:latest for the 2020 Combinatorial Optimisation summer school

docker run --rm -it -p 9001:9001 coatwork/dockerimage:latest

and this creates a persisted directory:

docker run --rm -it -v /home/louis/co/testing/files/:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest

I knew the working directory was /jupyter/ because I looked at the Dockerfile on Docker Hub (here) and read the line:

WORKDIR /jupyter

which is run before the Jupyter server is launched (see the sections on WORKDIR in Docker's documentation and note on best practices)

So passing /jupyter/persist as the volume location means it will be created as a directory alongside the notebook directories.
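If the Dockerfile is not to hand, the working directory can also be read from the image metadata. This is a minimal sketch: image_workdir is a hypothetical helper name, and it assumes the image has already been pulled and that the Docker daemon is running.

```shell
# Hypothetical helper: print the WORKDIR recorded in an image's metadata.
image_workdir() {
    # .Config.WorkingDir is the field that a Dockerfile WORKDIR line sets.
    docker image inspect --format '{{.Config.WorkingDir}}' "$1"
}

# Usage (not run here, it needs Docker and the image pulled locally):
# image_workdir coatwork/dockerimage:latest
```

For the image above this should report the same /jupyter path as the Dockerfile line, provided nothing later in the Dockerfile changes the working directory.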

You can check the permissions for the directory with ls -ld and then convert these into the numeric code at chmod-calculator.com

sudo docker exec silly_johnson ls -ld "/jupyter/persist"

drwxr-xr-x 3 1000 1000 4096 Sep 14 12:34 /jupyter/persist

gives the permissions for the directory: drwxr-xr-x, which is 755 in octal ("silly_johnson" is the name of the active container found by sudo docker ps)

  • the three parts of the permissions are: owners, group members, other users
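As an alternative to converting the listing by hand, stat can print the numeric code directly. The sketch below runs on the host against a scratch directory and assumes GNU coreutils stat (the -c flag); the same stat call could be passed to docker exec, assuming the container image ships GNU stat.

```shell
# Create a scratch directory with the same mode as in the listing above.
demo_dir="$(mktemp -d)"
chmod 755 "$demo_dir"

# %A prints the symbolic form, %a the octal form (GNU coreutils stat).
perms="$(stat -c '%A %a' "$demo_dir")"
echo "$perms"   # drwxr-xr-x 755

rmdir "$demo_dir"
```

Reading left to right after the leading d: rwx (7) for the owner, r-x (5) for the group, r-x (5) for other users.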

To make the directory writable by both host and container, you'd want to chmod +w it but that doesn't work with docker exec

  • Note that if /jupyter/ is used as the volume location, not /jupyter/persist (which was a location I invented for this purpose), then the container will mirror the empty host directory and no notebooks appear there
    • this is probably not a file permission mismatch: a bind mount hides whatever the image already has at the mount point, so an empty host directory shows up as an empty /jupyter/

To set permissions, it seems you have to set them before using the directory, as I'll now demonstrate.


The following section is a demo of what doesn't work, feel free to skip to the next line break

To step into the container running Jupyter, as shown here

docker exec -it xenodochial_kilby bash

(Above, the container name was silly_johnson, but I have since started a new container, and Docker has given it a new random name: xenodochial_kilby)

This is a lot like ssh, and shows a command line from within the container's file system, running the bash shell.

www-data@85dd6fb1b423:/jupyter$ ls -ld ./*/
drwxr-xr-x 1 www-data www-data 4096 Aug 14 17:34 ./2020-09-14_monday/
drwxr-xr-x 1 www-data www-data 4096 Sep  9 22:15 ./2020-09-15_tuesday/
drwxr-xr-x 1 www-data www-data 4096 Aug 27 08:51 ./2020-09-16_wednesday/
drwxr-xr-x 1 www-data www-data 4096 Sep  7 10:05 ./2020-09-17_thursday/
drwxr-xr-x 1 www-data www-data 4096 Sep  9 17:58 ./2020-09-18_friday/
drwxr-xr-x 1 www-data www-data 4096 Sep  1 16:13 ./2020-09-22_tuesday/
drwxr-xr-x 1 www-data www-data 4096 Sep  7 11:14 ./2020-09-23_wednesday/
drwxr-xr-x 1 www-data www-data 4096 Sep  1 16:13 ./2020-09-24_thursday/
drwxr-xr-x 1 www-data www-data 4096 Aug 14 15:15 ./local/
drwxr-xr-x 3     1000     1000 4096 Sep 14 12:34 ./persist/

So now I'll try to change the permissions of the volume at persist/ to 777

www-data@85dd6fb1b423:/jupyter$ chmod 777 persist/
chmod: changing permissions of 'persist/': Operation not permitted

It's still not possible: you need to change it before using it.


To create a volume, you run docker volume create; however, this defaults to storing the volume under /var/lib/docker/volumes/, whereas I want to store it at /home/louis/co/testing/files/.

To specify a custom location, you need to pass the -o or --opt flag to docker volume create; the options are handed on to the unix mount tool (which the default "local" driver uses to manage volumes here).

It's not made very clear how this is derived from mount exactly (docker's docs are not friendly, frustratingly, and just say that you can find "the complete list of options" in the man pages for mount) but:

--opt device=/home/louis/co/testing/files/

seems to be the way to specify a custom location in this way

docker volume list will show any volumes created, so now I'll try to create a docker volume with custom permissions

So the command would be something like

docker volume create --opt device=/home/louis/co/testing/files/ persistent_files

It says I need to specify a type (this isn't documented!)

So if I try to go back to the start and create a test volume:

docker volume create test_vol
docker volume inspect test_vol

[
    {
        "CreatedAt": "2020-09-14T15:46:26+01:00",
        "Driver": "local",
        "Labels": {},
        "Mountpoint": "/var/lib/docker/volumes/test_vol/_data",
        "Name": "test_vol",
        "Options": {},
        "Scope": "local"
    }
]

So there's nothing to go off in the options for a standard/template volume. Not very useful.

Since my Linux host uses an ext4 filesystem, I'd think type=ext4 might work

docker volume create --opt type=ext4 --opt device=/home/louis/co/testing/files/ persistent_files

It worked! (Or at least, the volume was created without complaint.)

Docker doesn't point this out exactly but the mount options for ext4 filesystems are in the ext4 man pages

Importantly, no directory has yet been created at /home/louis/co/testing/files/

So now we change the command which was previously:

docker run --rm -it -v /home/louis/co/testing/files/:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest

to:

docker run --rm -it -v persistent_files:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest

but this gives the error:

docker: Error response from daemon: error while mounting volume '/var/lib/docker/volumes/persistent_files/_data': failed to mount local volume: mount /home/louis/co/testing/files/:/var/lib/docker/volumes/persistent_files/_data: no such file or directory.

I interpret this to mean I need to create the directory ~/co/testing/files/ beforehand, and I'll set the permissions to the most permissive possible:

mkdir ~/co/testing/files/
chmod 777 ~/co/testing/files/

This time I get an error saying the directory is not a "block device"

docker: Error response from daemon: error while mounting volume '/var/lib/docker/volumes/persistent_files/_data': failed to mount local volume: mount /home/louis/co/testing/files/:/var/lib/docker/volumes/persistent_files/_data: block device required.

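So the device argument seems to be wrong: with type=ext4, mount expects device to be a block device (such as a disk partition) carrying an ext4 filesystem, not a directory. On top of that, the volume's mount point is still under /var/lib/docker/volumes/ rather than at my custom location, so I think this approach is all wrong.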
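For what it's worth, the usual way to point a named volume of the local driver at an existing host directory is to ask mount for a bind mount rather than a filesystem type. The following is a minimal sketch and was not part of the session above: create_bind_volume is a hypothetical helper name, and it assumes the default local driver on Linux.

```shell
# Hypothetical helper: create a named volume that bind-mounts a host directory.
create_bind_volume() {
    volume_name="$1"
    host_dir="$2"

    # The directory has to exist before the volume is first mounted,
    # otherwise mounting fails with "no such file or directory".
    mkdir -p "$host_dir"

    # type=none with o=bind requests a bind mount of a directory,
    # instead of a filesystem on a block device (which type=ext4 implies).
    docker volume create \
        --opt type=none \
        --opt o=bind \
        --opt device="$host_dir" \
        "$volume_name"
}

# Usage (not run here, it needs a Docker daemon):
# create_bind_volume persistent_files /home/louis/co/testing/files
```

A volume made this way still reports a mount point under /var/lib/docker/volumes/, but the data itself lives in the host directory given as device.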

docker volume rm persistent_files

To try again, I'll just use it with docker run

docker run --rm -it -v /home/louis/co/testing/files/:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest

and then run the following on the host filesystem, from ~/co/testing (not from within docker!)

sudo chown -R louis files
sudo chmod -R 755 files

So in summary, to run it again, clear the persistent directory (rm -rf ~/co/testing/files) and start again:

docker run --rm -it -v /home/louis/co/testing/files/:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest

Immediately there'll now be a files/ directory inside ~/co/testing/ (empty)

To step into the docker container running Jupyter:

sudo docker exec -it $(sudo docker ps -n 1 --format='{{.Names}}') bash
  • Save a notebook from a killed kernel to overwrite the localhost (unmodified template)
  • The Docker container is now ready to be 'persisted' into the volume contained at /jupyter/persist/

! ! !

While looking this up I came across the docs for docker cp which copies a file and changes the file system permissions appropriately between container and host file system!

I tried to run docker cp and discovered that the copied files are owned by whichever user ran docker cp. Even though I'm running docker run without sudo, I still need sudo for docker cp, so the files end up owned by root and would then need their ownership changed back to normal.

There are 2 solutions available:

  • use tar instead of docker cp (more accurately going via docker cp but piping through tar)
  • use "overlayfs" (overlay filesystem), available with the driver overlay2 on Linux 4.0 kernels and above

There's an example of the tar method in this Q&A ("File ownership after docker cp"), but it's going the other way: sending files into a container and setting permissions there.

tar -cf - foo.bar --mode u=+r,g=-rwx,o=-rwx --owner root --group root | docker cp - nginx:/

It does illustrate however how the permissions work: rather than the octal (numeric, 0-7) representation, the permissions are set as --mode u=...,g=...,o=... (u for the user who owns the file, g for members of the file's group, o for all other users)

--mode specifies the permissions for the target. Similar to chmod, they can be given in symbolic notation or as an octal number.
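To make the two notations concrete, here is a small sketch using chmod on a scratch file (not tar, and nothing from the container); it assumes GNU coreutils stat for reading the mode back.

```shell
demo_file="$(mktemp)"

# Symbolic notation: owner read/write, group read, others nothing.
chmod u=rw,g=r,o= "$demo_file"
symbolic="$(stat -c '%a' "$demo_file")"

# Reset, then set the same permissions in octal notation.
chmod 600 "$demo_file"
chmod 640 "$demo_file"
octal="$(stat -c '%a' "$demo_file")"

echo "$symbolic $octal"   # 640 640

rm -f "$demo_file"
```

Both forms end up with the same mode; the symbolic one just spells out each of the three parts.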


Regarding the more complicated overlay setup:


So let's start again. This time, no mounted volume (as docker cp won't work with those)

Here's what I attempted (and adding the -a flag to docker cp did nothing so here I omit it)

  • This was run in ~/co/testing on the host filesystem:
sudo docker cp $(sudo docker ps -n 1 --format='{{.Names}}'):/jupyter/ file_copies

I'm going to re-execute this and expect to get the directory ~/co/testing/file_copies/ filled with all of the notebooks, but each individual notebook will have the wrong permissions.

I no longer need the mounted volume (because I'm copying through docker cp rather than using a volume to bridge the two filesystems), so I'll re-start the docker container with docker run without the -v flag

The sudo docker cp... command in the block just above, which wrote to file_copies, can now be carried out and will create the directory ~/co/testing/file_copies/. However, we don't need all of that!

For now it'll do. Here's ls from the host file system into that directory:

ls file_copies/
2020-09-14_monday  2020-09-15_tuesday  2020-09-16_wednesday  2020-09-17_thursday  2020-09-18_friday  2020-09-22_tuesday  2020-09-23_wednesday  2020-09-24_thursday  local  Test_me.ipynb

and if I get the file permissions:

ls -l
total 48
drwxr-xr-x 3 root root 4096 Sep 14 21:21 2020-09-14_monday
drwxr-xr-x 2 root root 4096 Sep  9 23:15 2020-09-15_tuesday
...
-rw-r--r-- 1 root root  774 Sep  9 17:02 Test_me.ipynb

It's visible that all group and user names are root (when they should be louis)

I'm going to erase this directory again, file_copies/ which was the docker cp mirror of the container's /jupyter/ directory. Instead I'm going to create an empty directory jup which will have the notebook directories underneath, but I'll put them there one at a time (on the evening of each day of the summer school)

sudo rm -rf ~/co/testing/file_copies/
mkdir ~/co/testing/jup/

So obviously this directory will have normal host system permissions

ls -ld jup/
drwxr-xr-x 2 louis louis 4096 Sep 14 21:33 jup/

The first day I'm going to try and copy over is 2020-09-14_monday, so here's the standard version which will again give it the wrong permissions:

sudo docker cp $(sudo docker ps -n 1 --format='{{.Names}}'):/jupyter/2020-09-14_monday jup

As expected, this gives

ls -l jup/
total 4
drwxr-xr-x 3 root root 4096 Sep 14 21:21 2020-09-14_monday

So now I'm again going to delete this directory to clean up before trying again

sudo rm -rf jup/*/

...and only now can I make an attempt at copying it over via tar

To be very clear: I am modifying the following command:

sudo docker cp $(sudo docker ps -n 1 --format='{{.Names}}'):/jupyter/2020-09-14_monday jup

specifically to write to STDOUT (as docker cp <source> -, which emits a tar archive) and then pipe that to tar, and only then extract into a directory beneath jup.

To recap, here was the tar command with docker cp in the opposite direction:

tar -cf - foo.bar --mode u=+r,g=-rwx,o=-rwx --owner root --group root | docker cp - nginx:/

With both of those in mind, here's my first guess at how this'd work:

sudo docker cp $(sudo docker ps -n 1 --format='{{.Names}}'):/jupyter/2020-09-14_monday - | tar Cxf jup - --mode u=+r,g=-rwx,o=-rwx --owner louis --group louis

and we have a winner !!!

ls -l jup/
total 4
drwxr-xr-x 3 louis louis 4096 Sep 14 21:21 2020-09-14_monday

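The directory is readable, writable and executable (traversable) by its owner, which is now louis. The mode is unchanged from the container (drwxr-xr-x), so --mode seems to have had no effect on extraction (GNU tar documents it for creating archives); the louis ownership is what tar gives by default when it extracts as a non-root user.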

ls -l jup/2020-09-14_monday/
total 20
-rw-r--r-- 1 louis louis 12096 Sep 14 21:21 exercise1.ipynb

The files are readable and writable by their owner.
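The ownership behaviour can be reproduced without Docker at all. In this sketch a local tar stream with entries recorded as root-owned stands in for the output of docker cp; the scratch directories and file contents are made up for illustration, and GNU tar and GNU stat are assumed.

```shell
# Scratch directories standing in for the container (src) and jup/ (dest).
src="$(mktemp -d)"
dest="$(mktemp -d)"
mkdir "$src/2020-09-14_monday"
echo '{}' > "$src/2020-09-14_monday/exercise1.ipynb"

# Stand-in for `sudo docker cp <container>:/jupyter/2020-09-14_monday -`:
# a tar stream on STDOUT whose entries are recorded as owned by root (uid 0).
# On extraction, --no-same-owner ignores the recorded owner and gives the
# files to whoever runs tar (the default for non-root users anyway).
tar -C "$src" --owner=0 --group=0 -cf - 2020-09-14_monday |
    tar -C "$dest" --no-same-owner -xf -

extracted_uid="$(stat -c '%u' "$dest/2020-09-14_monday/exercise1.ipynb")"
echo "file uid: $extracted_uid, my uid: $(id -u)"

rm -rf "$src" "$dest"
```

The two uids printed at the end match, which is the same effect as the louis ownership seen in the listing above.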

So now I can run this after a day of modifying notebooks and keep the directory safe on my hard drive. Hooray!
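For the daily routine, the winning command can be wrapped in a small function. This is only a sketch: persist_day is a hypothetical name, the default destination is the jup directory used above, and it assumes the most recently created container is the Jupyter one (as docker ps -n 1 did throughout).

```shell
# Hypothetical helper: copy one day's notebook directory out of the container,
# extracting via tar so that the files belong to the current user, not root.
persist_day() {
    day="$1"                            # e.g. 2020-09-14_monday
    dest="${2:-$HOME/co/testing/jup}"   # host directory to extract into

    # Name of the most recently created container, as used earlier.
    container="$(sudo docker ps -n 1 --format '{{.Names}}')"

    mkdir -p "$dest"
    # docker cp with "-" as destination writes a tar archive to STDOUT.
    sudo docker cp "$container:/jupyter/$day" - | tar -C "$dest" -xf -
}

# Usage (not run here, it needs the container to be up):
# persist_day 2020-09-14_monday
```

The --mode, --owner and --group flags are left out here since plain extraction as a non-root user already gives the files to that user.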
