Docker Recovery From Full Disk

This emergency recovery was triggered by issue #375.

The tzvolcano portal ran out of disk space. These are the relevant factors:

  • The m3.large instance creates a root disk of 8G. We typically bump this up to 20G for tzvolcano.
  • However, the m3.large instance type had created a root volume of type 'standard' (magnetic). For other instance types, the volumes page normally shows a volume type of 'gp2'.
  • It appears that you can't resize this type of volume in place (see the CLI check sketched below this list).
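
To confirm what kind of root volume an instance actually has, the console volumes page works, or the AWS CLI can be queried. This is a minimal sketch, with vol-xxxxxxxx as a placeholder for the real root volume ID ('standard' is the previous-generation magnetic type, which is presumably why an in-place resize wasn't an option here):

# Show the root volume's type, size, and availability zone (volume ID is a placeholder):
aws ec2 describe-volumes --volume-ids vol-xxxxxxxx \
  --query 'Volumes[].{ID:VolumeId,Type:VolumeType,Size:Size,AZ:AvailabilityZone}' \
  --output table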

I was able to log into tzvolcano, but I couldn't do anything with docker, due to the lack of disk space. To increase the disk space, I did the following from the EC2 console (here is guidance on changing storage types); an equivalent AWS CLI sketch follows the list:

  • Stop the tzvolcano instance.
  • Write down the availability zone of the instance (us-west-2b).
  • Write down the root device name for the instance (/dev/sda1).
  • Make a snapshot of the original volume.
  • Create a new volume from the snapshot. The new volume was 20G, on SSD (gp2) media, in the same availability zone as before.
  • Detach the old volume from the instance.
  • Attach the new volume to the instance, with the same disk device name.
  • Restart the instance.
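
For the record, the same volume swap could be scripted with the AWS CLI. This is a hedged sketch rather than what was actually run; the instance, volume, and snapshot IDs are placeholders:

# IDs below are placeholders for the real instance, old volume, snapshot, and new volume.
aws ec2 stop-instances --instance-ids i-0123456789abcdef0
# Snapshot the original root volume and wait for it to finish:
aws ec2 create-snapshot --volume-id vol-OLD --description "tzvolcano root before resize"
aws ec2 wait snapshot-completed --snapshot-ids snap-NEW
# Create a 20G gp2 (SSD) volume from the snapshot, in the same availability zone:
aws ec2 create-volume --snapshot-id snap-NEW --volume-type gp2 --size 20 --availability-zone us-west-2b
aws ec2 wait volume-available --volume-ids vol-NEW
# Swap the root volumes, keeping the same device name:
aws ec2 detach-volume --volume-id vol-OLD
aws ec2 attach-volume --volume-id vol-NEW --instance-id i-0123456789abcdef0 --device /dev/sda1
# Start the instance again:
aws ec2 start-instances --instance-ids i-0123456789abcdef0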

I ssh'ed back into tzvolcano. It now had plenty of free disk space. However, docker was completely hosed, apparently because the devicemapper thin pool had run out of free data blocks. Any docker command would produce a result such as:

[root@ip-172-31-45-116 ~]# docker pull alpine
Using default tag: latest
latest: Pulling from library/alpine
88286f41530e: Extracting [==================================================>]  1.99 MB/1.99 MB
failed to register layer: devmapper: Thin Pool has 47639 free data blocks which is less than minimum required 163840 free data blocks. Create more free space in thin pool or use dm.min_free_space option to change behavior
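
If you want to see how close the thin pool is to its limit before it reaches that point, the devicemapper usage numbers are visible from the host. A diagnostic sketch, not part of the original recovery:

# Show the devicemapper data/metadata space numbers reported by the daemon:
docker info | grep -iA 12 'storage driver'
# Or ask device-mapper directly about the docker thin pool:
sudo dmsetup status | grep thin-pool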

The usual advice for this error is to completely remove docker and its storage. But this would wipe out the tzvolcano data. Also, I didn't want to lose the CHORDS image that was currently running, since it was tagged 'latest', and a fresh pull would have fetched a much newer image. To deal with these issues, the following was performed:

# Save the CHORDS image: 
docker save <image_id> > ~/chords.img
# Save the CHORDS volumes: 
tar -cvf ~/volumes.tar -C /var/lib/docker volumes
# Uninstall docker: 
service docker stop 
yum remove docker
# Get rid of docker artifacts: 
mv /var/lib/docker /var/lib/docker.save
# Reinstall docker:
yum install docker
# Restore the volumes: 
# (recreate /var/lib/docker in case the fresh install has not)
mkdir -p /var/lib/docker
cd /var/lib/docker
tar -xvf ~/volumes.tar
# Restart docker: 
service docker start
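
A few sanity checks are worth running at this point (my own sketch, not copied from the original session):

# Confirm the daemon is up and see which storage driver the fresh install chose:
docker info | grep -i 'storage driver'
# Confirm the named volumes came back from the tarball:
docker volume ls
# The old state is still in /var/lib/docker.save; it can be deleted later to reclaim space:
du -sh /var/lib/docker.save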

At this point, docker was functioning properly. I was able to pull the alpine image, and docker volume ls showed the CHORDS volumes were intact.

Finally, it was time to restore the CHORDS images:

# Pull the images
cd
docker-compose -p chords pull
# replace ncareol/chords:latest with the saved version
docker rmi ncareol/chords:latest
docker load < ~/chords.img
docker images
docker tag <image_id> ncareol/chords:latest
# Verify
docker images
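
Because the image was saved by ID, docker load brings it back untagged, which is why the manual docker tag step is needed. If you want to script that step, one option is to pick up the dangling image ID directly; a sketch that assumes the restored image is the only untagged one on the host:

# Grab the ID of the (only) untagged image and re-tag it as ncareol/chords:latest:
IMAGE_ID=$(docker images --filter dangling=true --quiet | head -n 1)
docker tag "$IMAGE_ID" ncareol/chords:latest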

After verifying that the images were correct, the portal was restarted:

docker-compose -p chords up -d
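
To confirm the portal actually came back, a couple of quick checks (a sketch; adjust the URL if CHORDS is not published on port 80):

# List the compose services and their state:
docker-compose -p chords ps
# Hit the portal locally; assumes it is published on port 80:
curl -I http://localhost/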

A critical takeaway from this exercise was that the named volumes could be saved and restored simply by copying the existing contents of /var/lib/docker/volumes into the new /var/lib/docker/volumes.
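
For future backups, a single named volume can also be archived without touching /var/lib/docker at all, by mounting it into a throwaway container. This is a general pattern, not something done during this recovery, and the volume name chords-mysql-data is a placeholder for whatever the CHORDS compose file actually creates:

# Back up one named volume to a tarball in the current directory (volume name is a placeholder):
docker run --rm -v chords-mysql-data:/source:ro -v "$(pwd)":/backup alpine \
  tar -czf /backup/chords-mysql-data.tar.gz -C /source .
# Restore it into an (empty) volume of the same name:
docker run --rm -v chords-mysql-data:/dest -v "$(pwd)":/backup alpine \
  tar -xzf /backup/chords-mysql-data.tar.gz -C /dest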
