Keeping a persisted directory of data from a Docker container using volumes
If a Docker container is run with --rm (as in the commands below), the files modified during its session are discarded once the container shuts down.
To keep the changes made to the container's file system, you can use the host file system as 'storage' (i.e. 'persist' the changes 'locally').
docker ps
will list the processes of running docker containers with their CONTAINER_ID
This Q&A describes ways to "explore a docker container's file system"
- Snapshotting
- ssh
- nsenter
More simply, since docker exec was introduced, you can run docker exec <container> ls <dir path> (which seems preferable to creating a static image)
Much like virtual machines, you can mount the host's directories in a container and these will persist unlike the rest of the container's file system.
In my case I am spinning up the image coatwork/dockerimage:latest
for the 2020 Combinatorial Optimisation summer school
docker run --rm -it -p 9001:9001 coatwork/dockerimage:latest
and this variant mounts a host directory into the container, creating a persisted directory:
docker run --rm -it -v /home/louis/co/testing/files/:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest
I knew the working directory was /jupyter/
because I looked at the Dockerfile on Docker Hub (here) and read the line:
WORKDIR /jupyter
which is run before the Jupyter server is launched (see the sections on WORKDIR
in Docker's documentation and note on best practices)
So passing /jupyter/persist
as the volume location means it will be created as a directory alongside the notebook directories.
You can check the permissions for the directory with ls -ld
and then convert these into the numeric code at chmod-calculator.com
sudo docker exec silly_johnson ls -ld "/jupyter/persist"
⇣
drwxr-xr-x 3 1000 1000 4096 Sep 14 12:34 /jupyter/persist
gives the permissions string for the directory ("silly_johnson" is the name of the active container found by sudo docker ps)
- the three parts of the permissions (after the leading d, which marks a directory) are: owner, group members, other users
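As an alternative to an online calculator, the numeric code can be read directly with stat. The sketch below assumes GNU coreutils (as found on most Linux hosts) and uses a throwaway directory from mktemp rather than the real mount point; the variable names are only illustrative.

```shell
# Create a throwaway directory so the example does not touch real data.
demo_dir="$(mktemp -d)"
chmod 755 "$demo_dir"

# GNU stat: %A prints the symbolic permissions, %a the octal (numeric) code.
perms_symbolic="$(stat -c '%A' "$demo_dir")"
perms_octal="$(stat -c '%a' "$demo_dir")"
echo "$perms_symbolic is $perms_octal"
```

The same stat call could presumably be run through docker exec in place of ls -ld, assuming the image includes GNU stat.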
To make the directory writable by both host and container, you'd want to chmod +w it, but that doesn't work with docker exec
- Note that if /jupyter/ is used as the volume location, not /jupyter/persist (which was a location I invented for this purpose), then the container will mirror that empty directory and no notebooks appear there. I first suspected a file permission mismatch, but the likelier cause is that a bind mount over a non-empty container directory obscures the contents already there.
To set permissions, it seems you have to set them before using the directory, as I'll now demonstrate.
The following section is a demo of what doesn't work, feel free to skip to the next line break
To step into the container running Jupyter, as shown here
docker exec -it xenodochial_kilby bash
(Above, the container name was silly_johnson, but now it's xenodochial_kilby: Docker gives each new container a fresh random name.)
This is a lot like ssh, and shows a command line from within the container's file system, running the bash shell.
www-data@85dd6fb1b423:/jupyter$ ls -ld ./*/
drwxr-xr-x 1 www-data www-data 4096 Aug 14 17:34 ./2020-09-14_monday/
drwxr-xr-x 1 www-data www-data 4096 Sep 9 22:15 ./2020-09-15_tuesday/
drwxr-xr-x 1 www-data www-data 4096 Aug 27 08:51 ./2020-09-16_wednesday/
drwxr-xr-x 1 www-data www-data 4096 Sep 7 10:05 ./2020-09-17_thursday/
drwxr-xr-x 1 www-data www-data 4096 Sep 9 17:58 ./2020-09-18_friday/
drwxr-xr-x 1 www-data www-data 4096 Sep 1 16:13 ./2020-09-22_tuesday/
drwxr-xr-x 1 www-data www-data 4096 Sep 7 11:14 ./2020-09-23_wednesday/
drwxr-xr-x 1 www-data www-data 4096 Sep 1 16:13 ./2020-09-24_thursday/
drwxr-xr-x 1 www-data www-data 4096 Aug 14 15:15 ./local/
drwxr-xr-x 3 1000 1000 4096 Sep 14 12:34 ./persist/
So now I'll try to change the permissions of the volume at persist/
to 777
www-data@85dd6fb1b423:/jupyter$ chmod 777 persist/
chmod: changing permissions of 'persist/': Operation not permitted
It's still not possible: you need to change it before using it.
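The failure is consistent with the ownership shown in the listing: persist/ belongs to UID 1000, while the shell runs as www-data, and chmod is normally only allowed for a file's owner or root. Here is a minimal sketch of that check, using a hypothetical helper named can_chmod and GNU stat. It tests ownership only and ignores finer-grained Linux capabilities.

```shell
# Hypothetical helper: succeeds if the current user is root or owns the path,
# which are the usual conditions for chmod to be permitted.
can_chmod() {
  owner_uid="$(stat -c '%u' "$1")"   # numeric UID of the owner (GNU stat)
  current_uid="$(id -u)"
  [ "$current_uid" -eq 0 ] || [ "$current_uid" -eq "$owner_uid" ]
}

# Demonstrate on a scratch directory that the current user has just created.
scratch_dir="$(mktemp -d)"
if can_chmod "$scratch_dir"; then
  chmod 777 "$scratch_dir"
  echo "chmod allowed on $scratch_dir"
else
  echo "chmod would fail with 'Operation not permitted'"
fi
```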
To create a volume, you run docker volume create; however, this defaults to storing the volume under /var/lib/docker/volumes/ whereas I want to store it at /home/louis/co/testing/files/.
To specify a custom location, you need to pass the -o or --opt flag to docker volume create, and the options you set are modelled on those of the unix mount tool (the "local" driver managing volumes here passes them on to it).
It's not made very clear how this is derived from mount exactly (Docker's docs are not friendly, frustratingly, and just say that you can find "the complete list of options" in the man pages for mount) but:
--opt device=/home/louis/co/testing/files/
seems to be the way to specify a custom location in this way
docker volume list
will show any volumes created, so now I'll try to create a docker volume with custom permissions
So the command would be something like
docker volume create --opt device=/home/louis/co/testing/files/ persistent_files
It says I need to specify a type
(this isn't documented!)
So if I try to go back to the start and create a test volume:
docker volume create test_vol
docker volume inspect test_vol
⇣
[
{
"CreatedAt": "2020-09-14T15:46:26+01:00",
"Driver": "local",
"Labels": {},
"Mountpoint": "/var/lib/docker/volumes/test_vol/_data",
"Name": "test_vol",
"Options": {},
"Scope": "local"
}
]
So there's nothing to go off in the options for a standard/template volume. Not very useful.
Since my Linux host uses an ext4 filesystem, I'd think type=ext4 might work
docker volume create --opt type=ext4 --opt device=/home/louis/co/testing/files/ persistent_files
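It worked! (At least, the volume was created without complaint; mounting it is another matter, as the errors further down show.)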
Docker doesn't point this out exactly, but the mount options for ext4 filesystems are in the ext4 man pages
Importantly, no directory has yet been created at /home/louis/co/testing/files/
So now we change the command which was previously:
docker run --rm -it -v /home/louis/co/testing/files/:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest
to:
docker run --rm -it -v persistent_files:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest
but this gives the error:
docker: Error response from daemon: error while mounting volume '/var/lib/docker/volumes/persistent_files/_data': failed to mount local volume: mount /home/louis/co/testing/files/:/var/lib/docker/volumes/persistent_files/_data: no such file or directory.
I interpret this to mean I need to create the directory ~/co/testing/files/
beforehand, and I'll set the permissions to the most permissive possible:
mkdir ~/co/testing/files/
chmod 777 ~/co/testing/files/
This time I get an error saying the directory is not a "block device"
docker: Error response from daemon: error while mounting volume '/var/lib/docker/volumes/persistent_files/_data': failed to mount local volume: mount /home/louis/co/testing/files/:/var/lib/docker/volumes/persistent_files/_data: block device required.
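So it seems the device argument is wrong: with type=ext4, mount expects device to be a block device holding an ext4 filesystem, not a directory. The device option was what I needed for the custom location, yet the volume still seems to be getting put in /var/lib/docker/volumes/ anyway, so I think this is all wrong.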
docker volume rm persistent_files
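For what it's worth, the form usually suggested for backing a named volume with an existing host directory is a bind mount through the local driver: type=none with o=bind, rather than a filesystem type such as ext4. This is untested against the setup above, so treat it as an assumption. The sketch only prints the command (via a hypothetical helper, bind_volume_cmd) so that it can be inspected before being run; the directory named by device must already exist.

```shell
# Hypothetical helper: print (not run) a docker volume create command that
# bind-mounts an existing host directory as a named volume.
bind_volume_cmd() {
  host_dir="$1"
  volume_name="$2"
  printf 'docker volume create --driver local --opt type=none --opt o=bind --opt device=%s %s\n' \
    "$host_dir" "$volume_name"
}

volume_cmd="$(bind_volume_cmd /home/louis/co/testing/files persistent_files)"
echo "$volume_cmd"
```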
To try again, I'll just mount the host directory directly with docker run
docker run --rm -it -v /home/louis/co/testing/files/:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest
and then run the following on the host filesystem (not from within docker!)
sudo chown -R louis files
sudo chmod -R 755 files
So in summary, to run it again, clear the persistent directory (rm -rf ~/co/testing/files) and start again:
docker run --rm -it -v /home/louis/co/testing/files/:/jupyter/persist -p 9001:9001 coatwork/dockerimage:latest
Immediately there'll now be a files/
directory inside ~/co/testing/
(empty)
To step into the docker container running Jupyter:
sudo docker exec -it $(sudo docker ps -n 1 --format='{{.Names}}') bash
- Save a notebook from a killed kernel to overwrite the localhost (unmodified template)
- The Docker container is now ready to be 'persisted' into the volume mounted at /jupyter/persist/!
While looking this up I came across the docs for docker cp, which copies a file and changes the file system permissions appropriately between container and host file system!
I tried to run docker cp and discovered that it gives the files the same owner as the user who executed docker, and even though I'm running docker run without sudo, I still need sudo elevated permissions for docker cp, which then means the files get the root owner, which would then need to be modified to go back to normal.
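Going 'back to normal' would mean handing the copied files back to the regular user with chown. Here is a minimal sketch on a scratch directory (the paths are made up for illustration). On files really copied out by sudo docker cp, the chown would itself need sudo, since they belong to root.

```shell
# Scratch stand-in for a directory copied out of the container.
copied_dir="$(mktemp -d)"
mkdir "$copied_dir/2020-09-14_monday"
touch "$copied_dir/2020-09-14_monday/exercise1.ipynb"

# Recursively hand everything to the current user and their primary group.
# (Prefix with sudo when the files are currently owned by root.)
chown -R "$(id -u):$(id -g)" "$copied_dir"

owner_uid="$(stat -c '%u' "$copied_dir/2020-09-14_monday/exercise1.ipynb")"
echo "owner UID is now $owner_uid"
```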
There are 2 solutions available:
- use tar instead of docker cp (more accurately going via docker cp but piping through tar)
- use "overlayfs" (overlay filesystem), available with the driver overlay2 on Linux 4.0 kernels and above
There's an example of the tar method in this Q&A ("File ownership after docker cp"), but it's going the other way: sending files into a container and setting permissions there.
tar -cf - foo.bar --mode u=+r,g=-rwx,o=-rwx --owner root --group root | docker cp - nginx:/
It does illustrate however how the permissions work: rather than the octal (numeric, 0-7) representation, the permissions are set as --mode u=...,g=...,o=... (u for the user who owns the file, g for members of the file's group, o for all other users)
--mode specifies the permissions for the target. Similar to chmod, they can be given in symbolic notation or as an octal number.
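To see the two notations side by side, the sketch below archives a scratch file twice, once with the symbolic mode from the command above and once with what should be its octal equivalent (0400), then lists each archive. It assumes GNU tar; the file and variable names are illustrative.

```shell
# Scratch file to archive.
scratch_dir="$(mktemp -d)"
echo "hello" > "$scratch_dir/foo.bar"

# Symbolic notation: owner may read, group and others get nothing.
listing_symbolic="$(tar -C "$scratch_dir" -cf - --mode='u=+r,g=-rwx,o=-rwx' \
  --owner=root --group=root foo.bar | tar -tvf -)"

# Octal notation for the same permissions.
listing_octal="$(tar -C "$scratch_dir" -cf - --mode=0400 \
  --owner=root --group=root foo.bar | tar -tvf -)"

# Each listing shows the mode and owner recorded inside the archive.
echo "$listing_symbolic"
echo "$listing_octal"
```

Listing the archive rather than extracting it keeps the example independent of the umask and of who runs it.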
Regarding the more complicated overlay setup:
- Here's the overlay and overlay2 driver docs
- Here's the docs for storage drivers
- Here's a page on example overlayfs usage
- Here's a cartoon followed by a demo explaining overlayfs and docker's usage (a little bit)
  - I started following this but then realised I wanted to try the docker cp + tar variant instead.
So let's start again. This time, no mounted volume (as docker cp won't work with those).
Here's what I attempted (adding the -a flag to docker cp did nothing, so I omit it here)
- This was run in ~/co/testing on the host filesystem:
sudo docker cp $(sudo docker ps -n 1 --format='{{.Names}}'):/jupyter/ file_copies
I'm going to re-execute this and expect to get the directory ~/co/testing/file_copies/ filled with all of the notebooks, but each individual notebook will have the wrong owner.
I no longer need the mounted volume (because I'm copying through docker cp rather than using a volume to bridge the two filesystems), so I'll restart the docker container with docker run without the -v flag.
The sudo docker cp... command in the block just above, which wrote to file_copies, can now be carried out and will create the directory ~/co/testing/file_copies/; however, we don't need all of that!
For now it'll do. Here's ls from the host file system into that directory:
ls file_copies/
2020-09-14_monday 2020-09-15_tuesday 2020-09-16_wednesday 2020-09-17_thursday 2020-09-18_friday 2020-09-22_tuesday 2020-09-23_wednesday 2020-09-24_thursday local Test_me.ipynb
and if I get the file permissions:
ls -l
total 48
drwxr-xr-x 3 root root 4096 Sep 14 21:21 2020-09-14_monday
drwxr-xr-x 2 root root 4096 Sep 9 23:15 2020-09-15_tuesday
...
-rw-r--r-- 1 root root 774 Sep 9 17:02 Test_me.ipynb
It's visible that all group and user names are root (when they should be louis).
I'm going to erase this directory again, file_copies/, which was the docker cp mirror of the container's /jupyter/ directory. Instead I'm going to create an empty directory jup which will have the notebook directories underneath, but I'll put them there one at a time (on the evening of each day of the summer school)
sudo rm -rf ~/co/testing/file_copies/
mkdir ~/co/testing/jup/
So obviously this directory will have normal host system permissions
ls -ld jup/
drwxr-xr-x 2 louis louis 4096 Sep 14 21:33 jup/
The first day I'm going to try and copy over is 2020-09-14_monday, so here's the standard version which will again give it the wrong owner:
sudo docker cp $(sudo docker ps -n 1 --format='{{.Names}}'):/jupyter/2020-09-14_monday jup
As expected, this gives
ls -l jup/
total 4
drwxr-xr-x 3 root root 4096 Sep 14 21:21 2020-09-14_monday
So now I'm again going to delete this directory to clean up before trying again
sudo rm -rf jup/*/
...and only now can I make an attempt at copying it over via tar
To be very clear: I am modifying the following command:
sudo docker cp $(sudo docker ps -n 1 --format='{{.Names}}'):/jupyter/2020-09-14_monday jup
specifically to write a tar archive to STDOUT (by giving - as the destination of docker cp), then piping that to tar, and only then extracting into a directory beneath jup.
To recap, here was the tar command with docker cp in the opposite direction:
tar -cf - foo.bar --mode u=+r,g=-rwx,o=-rwx --owner root --group root | docker cp - nginx:/
With both of those in mind, here's my first guess at how this'd work:
sudo docker cp $(sudo docker ps -n 1 --format='{{.Names}}'):/jupyter/2020-09-14_monday - | tar Cxf jup - --mode u=+r,g=-rwx,o=-rwx --owner louis --group louis
and we have a winner !!!
ls -l jup/
total 4
drwxr-xr-x 3 louis louis 4096 Sep 14 21:21 2020-09-14_monday
The directory is readable, writable and executable by its owner, louis
ls -l jup/2020-09-14_monday/
total 20
-rw-r--r-- 1 louis louis 12096 Sep 14 21:21 exercise1.ipynb
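The files are readable and writable by louis. (According to the GNU tar manual, --mode, --owner and --group take effect when adding files to an archive, so on extraction they probably did nothing here: the files belong to louis because tar itself ran without sudo, and they keep the permissions recorded in the archive, subject to the umask.)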
So now I can run this after a day of modifying notebooks and keep the directory safe on my hard drive. Hooray!
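A small experiment, needing neither Docker nor sudo, may help show what determines ownership when extracting with GNU tar. The archive below records root as the owner, yet the extracted file should belong to whoever runs tar, with no --owner flag on the extracting side. File and variable names are illustrative.

```shell
src_dir="$(mktemp -d)"
dest_dir="$(mktemp -d)"
echo "{}" > "$src_dir/exercise1.ipynb"

# Record root as the owner inside the archive (this only sets archive
# metadata), then extract it as the current user.
tar -C "$src_dir" -cf - --owner=root --group=root exercise1.ipynb \
  | tar -C "$dest_dir" -xf -

# Numeric UID of the extracted file's owner (GNU stat).
extracted_owner_uid="$(stat -c '%u' "$dest_dir/exercise1.ipynb")"
echo "extracted file is owned by UID $extracted_owner_uid (current UID: $(id -u))"
```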