Backup and reinstall 4CAT - digitalmethodsinitiative/4cat GitHub Wiki

Overview

To back up 4CAT and later reinstall it in its previous state, you need the following:

  • 4CAT database
  • 4CAT dataset files
  • 4CAT config files, including the previous version information (so you can either reinstall the same version or upgrade from the previous one)

Where to find these files depends on how you set up and installed 4CAT; see the sections below for details.

Back up only certain datasets

It is also possible to export any specific dataset to a ZIP file. The ZIP file will contain your dataset, its analyses, and metadata files with the information normally stored in the database. These exports can then be imported into another instance of 4CAT.


Note: not all versions of 4CAT are compatible with each other (primarily due to database changes), so check the 4CAT version of both instances before importing. The version is shown in the bottom right corner of the 4CAT interface.


The version is also recorded in the exported metadata files.

Back up 4CAT

Docker installation

The location of your 4CAT data is defined in the docker-compose.yml file used to set up 4CAT (or the docker-compose_build.yml file if it was used). This is visible in the "volumes" section of each service.

For mounts in the 4cat/data/ directory, it looks like this:

db:
    volumes:
      - ./data/postgres/:/var/lib/postgresql/data/
backend:
    volumes:
      - ./data/datasets/:/usr/src/app/data/
      - ./data/config/:/usr/src/app/config/
      - ./data/logs/:/usr/src/app/logs/
frontend:
    volumes:
      - ./data/datasets/:/usr/src/app/data/
      - ./data/config/:/usr/src/app/config/
      - ./data/logs/:/usr/src/app/logs/

The backend and frontend share three mounts and the database (db) has one, all in the ./data/ directory.
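To confirm which host directories or named volumes a running 4CAT deployment actually uses, you can ask Docker directly. The following is a minimal sketch. It assumes you run it from the 4CAT folder that contains the compose file the stack was started with. If you used docker-compose_build.yml, add `-f docker-compose_build.yml` to the `docker compose` call. The function name `show_4cat_mounts` is hypothetical.

```shell
# Hypothetical helper: list the mounts of every container in the current
# compose project. Bind mounts print their host path (Source),
# named volumes print their volume name.
show_4cat_mounts() {
  for id in $(docker compose ps -q); do
    # .Name outside the range is the container name; inside it, the mount's name
    docker inspect --format '{{.Name}}{{range .Mounts}}
  {{.Type}}: {{if .Name}}{{.Name}}{{else}}{{.Source}}{{end}} -> {{.Destination}}{{end}}' "$id"
  done
}
```

Each container is printed on its own line, followed by one indented line per mount.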

Docker volumes

By default, with docker-compose.yml, 4CAT uses Docker's volume management to store all your data. The volume names are set in your .env file and can also be found with docker volume ls and docker inspect. Docker has more information here on backing up and restoring volumes. There are four 4CAT volumes, listed and named at the end of the docker-compose.yml file:

volumes:
  4cat_db:
    name: ${DOCKER_DB_VOL} # default in .env: 4cat_4cat_db
  4cat_data:
    name: ${DOCKER_DATA_VOL} # default in .env: 4cat_4cat_data
  4cat_config:
    name: ${DOCKER_CONFIG_VOL} # default in .env: 4cat_4cat_config
  4cat_logs:
    name: ${DOCKER_LOGS_VOL} # default in .env: 4cat_4cat_logs

  • 4cat_data contains the dataset files
  • 4cat_db contains the database with dataset metadata as well as 4CAT settings, users, etc.
  • 4cat_config is used by 4CAT to track its version and handle updates
  • 4cat_logs contains all logged information and errors

Essentially, you can archive, copy, and store the contents of these volumes as you wish. How you restore them depends on whether you want to keep using Docker volumes or instead mount the restored directories directly. Docker volumes are easy to deploy and are managed by Docker. If you want to back up, restore, and fine-tune storage yourself, however, mounts may suit your setup better.
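The archives can be created with a short-lived container that mounts each volume read-only. This follows the pattern Docker's documentation describes for backing up volumes. The following is a minimal sketch, with these assumptions:

- Volume names come from the `DOCKER_*_VOL` variables in .env, falling back to the defaults listed above.
- The containers are stopped first (for example with `docker compose stop`), so the database files do not change during the copy.
- The backup folder is passed as an absolute path.

The function name is hypothetical. The archive names match the ones used in the restore steps further down this page.

```shell
# Hypothetical helper: archive each 4CAT volume to <backup_dir>/<name>.tar.gz.
# Volume names come from the .env variables if they are exported,
# otherwise the default names from .env are used.
backup_4cat_volumes() {
  backup_dir="${1:-$(pwd)}"
  for pair in \
    "${DOCKER_DATA_VOL:-4cat_4cat_data}:4cat_data" \
    "${DOCKER_LOGS_VOL:-4cat_4cat_logs}:4cat_logs" \
    "${DOCKER_CONFIG_VOL:-4cat_4cat_config}:4cat_config" \
    "${DOCKER_DB_VOL:-4cat_4cat_db}:4cat_database"; do
    vol="${pair%%:*}"      # part before the colon: volume name
    archive="${pair#*:}"   # part after the colon: archive name
    # Mount the volume read-only at /data and the backup folder at /backup
    docker run --rm -v "$vol":/data:ro -v "$backup_dir":/backup alpine \
      tar -czf "/backup/$archive.tar.gz" -C /data .
  done
}
```

If your .env consists of plain KEY=value lines, you can export its variables first with `set -a; . ./.env; set +a`. Then run `docker compose stop`, call `backup_4cat_volumes /absolute/path/to/backups`, and restart with `docker compose start`.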

Mounts

docker-compose_build.yml by default mounts the data directory in your 4CAT folder, as shown in the example above. You can back up those directories however you like; be sure to store all four of them.
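The following is a minimal sketch of such a backup. It assumes the default layout from the example above (./data/postgres, ./data/datasets, ./data/config and ./data/logs). It also assumes the containers are stopped first, so PostgreSQL is not writing to its files. The function and archive names are hypothetical. The PostgreSQL files are owned by the container's database user, so you may need to run it with sudo.

```shell
# Hypothetical helper: write one 4cat_<dir>.tar.gz per mounted directory.
backup_4cat_mounts() {
  src="${1:-./data}"        # folder holding the mounted directories
  dest="${2:-./backup}"     # where the archives are written
  mkdir -p "$dest"
  for dir in postgres datasets config logs; do
    if [ -d "$src/$dir" ]; then
      # -C stores paths relative to the directory, so the archive can be
      # extracted straight into an empty replacement directory later
      tar -czf "$dest/4cat_$dir.tar.gz" -C "$src/$dir" .
    else
      echo "warning: $src/$dir not found, skipped" >&2
    fi
  done
}
```

To use it, run `docker compose stop`, then `backup_4cat_mounts ./data /path/to/backup`, then `docker compose start`.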

Back up a manually installed 4CAT

This should be relatively straightforward and primarily depends on the choices you made during installation. Make sure you store the following:

  • The PostgreSQL database used by 4CAT
    • Database information can be found in your config/config.ini file in the DATABASE section
    • The database contains all metadata around your datasets; without it 4CAT will not recognize the type of data collected and be unable to display or interact directly with the dataset files.
  • 4CAT's dataset files
    • Dataset files are flat files located in the directory specified as path_data in the config/config.ini file in the PATH section
    • These are your actual dataset files
  • 4CAT's version and configuration data
    • These files are stored in the 4CAT config/ directory; we recommend storing the entire directory
    • In addition to the config.ini file which relates to your setup, there will be two files .current-version and .current-version-frontend which track your 4CAT version and are used in the upgrade process.

That's all you need. Optionally, you can also store the log files from the path_logs directory.
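The following is a minimal sketch for a manual install. It assumes the PostgreSQL client tools (pg_dump) are installed. DB_NAME, DB_USER, DB_HOST, DB_PORT, DATA_PATH and FOURCAT_DIR are hypothetical variable names; fill them in from the DATABASE and PATH sections of config/config.ini and from the location of your 4CAT folder. The password is taken from the usual PostgreSQL sources, such as the PGPASSWORD environment variable or ~/.pgpass.

```shell
# Hypothetical helper: dump the database and archive the data and config.
# Required variables (copy the values from config/config.ini):
#   DB_NAME, DB_USER, DATA_PATH (path_data), FOURCAT_DIR (the 4CAT folder)
# Optional: DB_HOST (default localhost), DB_PORT (default 5432)
backup_4cat_manual() {
  out="${1:-./4cat-backup}"
  mkdir -p "$out" || return 1
  # Custom-format dump (-Fc); restore it later with pg_restore
  pg_dump -h "${DB_HOST:-localhost}" -p "${DB_PORT:-5432}" -U "$DB_USER" \
    -Fc -f "$out/4cat_database.dump" "$DB_NAME" || return 1
  # The dataset files in path_data
  tar -czf "$out/4cat_data.tar.gz" -C "$DATA_PATH" . || return 1
  # The whole config/ folder, including the .current-version files
  tar -czf "$out/4cat_config.tar.gz" -C "$FOURCAT_DIR" config || return 1
}
```

A custom-format dump is used here because pg_restore can restore it selectively. A plain SQL dump (the pg_dump default) would also work and is restored with psql instead.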

Installing 4CAT using previous data

You may wish to redeploy 4CAT in the future using previously collected data. This is possible in principle, though we cannot guarantee full compatibility of datasets and analyses over time. As a general principle, we keep collected datasets as close as possible to their original sources and do not modify them. That said, the original sources sometimes change their data, so datasets may contain different information depending on when they were collected.

Required data

The two most important pieces are your 4CAT database and the accompanying dataset files. There are also version files that tell the migration process which version your previous 4CAT instance needs to be updated from.

  • 4CAT database
  • 4CAT dataset files
  • 4CAT config/.current-version and config/.current-version-frontend files
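Before restoring, it can help to check which version a backup came from. The following is a minimal sketch. It assumes the backed-up config folder has been extracted somewhere readable. The function name is hypothetical.

```shell
# Hypothetical helper: print the version files from a (backed-up) config folder.
# Returns non-zero if either file is missing.
show_4cat_versions() {
  config_dir="${1:-config}"
  status=0
  for f in .current-version .current-version-frontend; do
    if [ -f "$config_dir/$f" ]; then
      printf '%s: %s\n' "$f" "$(cat "$config_dir/$f")"
    else
      printf '%s: missing\n' "$f"
      status=1
    fi
  done
  return "$status"
}
```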

Docker Install

Depending on your installation and backup method from above, you should have the four main directories. The log directory is ultimately optional (nothing will fail without it), as it contains only logged information.

How you restore 4CAT depends on whether you intend to use volumes or mounts, as explained above and in the installation section.

Docker Volumes

If Docker on your system still has the 4CAT volumes (docker volume ls lists existing volumes), make sure they have the same names as those in your .env file. Then run docker compose up -d as normal, and Docker should use those volumes (assuming the "volumes" sections of docker-compose.yml have not been changed). 4CAT will use the existing data and, if you have pulled a newer version of 4CAT, should also apply any necessary updates. Check the logs for any warnings or errors.
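To check that the volumes Docker knows about match the names in .env before starting 4CAT, here is a minimal sketch. It assumes .env contains plain KEY=value lines such as DOCKER_DB_VOL=4cat_4cat_db. The function name is hypothetical.

```shell
# Hypothetical helper: report whether each volume named in .env exists.
check_4cat_volumes() {
  env_file="${1:-.env}"
  status=0
  for var in DOCKER_DB_VOL DOCKER_DATA_VOL DOCKER_CONFIG_VOL DOCKER_LOGS_VOL; do
    # Take the value after "VAR=" (the last occurrence wins)
    name=$(sed -n "s/^$var=//p" "$env_file" | tail -n 1)
    # docker volume inspect exits non-zero if the volume does not exist
    if [ -n "$name" ] && docker volume inspect "$name" >/dev/null 2>&1; then
      echo "ok: $var=$name"
    else
      echo "missing: $var=$name"
      status=1
    fi
  done
  return "$status"
}
```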

If you backed up the data from the volumes, you can do one of two things. You can extract the data into folders and mount those folders directly instead of using Docker volumes (see this example). Alternatively, you can create named Docker volumes and extract the data into them. See Docker's current instructions here; the process looks roughly like this:

  1. Create the named volumes (using the names from your .env file):
docker volume create 4cat_4cat_data
docker volume create 4cat_4cat_logs
docker volume create 4cat_4cat_config
docker volume create 4cat_4cat_db
  2. Extract the data into each volume. This assumes the archives (e.g. 4cat_data.tar.gz) are in the current directory, which is mounted at /backup inside the container:
docker run --rm -v 4cat_4cat_data:/data -v "$(pwd)":/backup alpine tar -xzf /backup/4cat_data.tar.gz -C /data
docker run --rm -v 4cat_4cat_logs:/data -v "$(pwd)":/backup alpine tar -xzf /backup/4cat_logs.tar.gz -C /data
docker run --rm -v 4cat_4cat_config:/data -v "$(pwd)":/backup alpine tar -xzf /backup/4cat_config.tar.gz -C /data
docker run --rm -v 4cat_4cat_db:/data -v "$(pwd)":/backup alpine tar -xzf /backup/4cat_database.tar.gz -C /data
  3. Verify that the data was extracted:
# List volumes
docker volume ls

# Inspect volume contents (optional)
docker run --rm -v 4cat_4cat_data:/data alpine ls -la /data
  4. Deploy 4CAT: docker compose up -d
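One extra check is worth making before deploying. A PostgreSQL data directory contains a PG_VERSION file that records the major version that created it, and PostgreSQL only opens data directories from its own major version. The following is a minimal sketch for reading that file from the restored database volume. It assumes, as in the default compose file, that the volume is mounted as PostgreSQL's data directory. The function name is hypothetical. Compare the result with the major version of the postgres image referenced in your docker-compose.yml.

```shell
# Hypothetical helper: print the PostgreSQL major version stored in a volume.
db_volume_pg_version() {
  vol="${1:-4cat_4cat_db}"
  # Mount read-only; we only need to read one file
  docker run --rm -v "$vol":/data:ro alpine cat /data/PG_VERSION
}
```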

Mounted directories

This should be relatively straightforward. Make sure your backed-up data is extracted into a folder structure that matches the volume paths in your docker-compose.yml file.

For example, this 4CAT folder structure:

4cat/
  data/
    postgres/
    datasets/
    config/
    logs/

should correspond to the following volume sections in the docker-compose.yml file:

db:
    volumes:
      - ./data/postgres/:/var/lib/postgresql/data/
backend:
    volumes:
      - ./data/datasets/:/usr/src/app/data/
      - ./data/config/:/usr/src/app/config/
      - ./data/logs/:/usr/src/app/logs/
frontend:
    volumes:
      - ./data/datasets/:/usr/src/app/data/
      - ./data/config/:/usr/src/app/config/
      - ./data/logs/:/usr/src/app/logs/
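Extracting archives into that layout could look like the minimal sketch below. It assumes one archive per directory, named 4cat_<dir>.tar.gz (for example 4cat_postgres.tar.gz). It also assumes each archive was created from inside its directory with tar's -C option. The function and archive names are hypothetical. Running it as root (for example with sudo) lets tar keep the original file ownership, which matters for the PostgreSQL files.

```shell
# Hypothetical helper: extract 4cat_<dir>.tar.gz archives into ./data/<dir>.
restore_4cat_mounts() {
  archives="${1:?folder containing the 4cat_*.tar.gz archives}"
  dest="${2:-./data}"
  for dir in postgres datasets config logs; do
    archive="$archives/4cat_$dir.tar.gz"
    if [ -f "$archive" ]; then
      mkdir -p "$dest/$dir"
      tar -xzf "$archive" -C "$dest/$dir" || return 1
    else
      # logs are optional, so a missing archive is only a warning
      echo "warning: $archive not found, skipped" >&2
    fi
  done
}
```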

You should then be able to run docker compose up -d, and Docker and 4CAT will pick up the rest.

Manual Install

  1. 4CAT database: you created this PostgreSQL database yourself; edit the DATABASE section of config/config.ini so that 4CAT connects to it.
  2. 4CAT dataset files: place these in the path_data folder listed in config/config.ini.
  3. 4CAT config/.current-version file: this file is stored in the config folder of your 4CAT working directory (config/.current-version, alongside config/.current-version-frontend). Once you have these files, you can upgrade by following these instructions. The migrate.py step will ensure your database is upgraded from the version listed in .current-version.
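If the database was saved with pg_dump in custom format, restoring the pieces might look like the minimal sketch below, with these assumptions:

- The PostgreSQL client tools are installed.
- The database user already exists and is allowed to create databases.
- The backup folder contains the files named in the comments.

DB_NAME, DB_USER, DB_HOST, DB_PORT, DATA_PATH, FOURCAT_DIR and the file names are hypothetical.

```shell
# Hypothetical helper: recreate the database and unpack data and config.
# Expects in the backup folder: 4cat_database.dump (pg_dump -Fc),
# 4cat_data.tar.gz (contents of path_data), 4cat_config.tar.gz (config/).
restore_4cat_manual() {
  backup="${1:?backup folder}"
  host="${DB_HOST:-localhost}"
  port="${DB_PORT:-5432}"
  # Create an empty database, then load the dump into it
  createdb -h "$host" -p "$port" -U "$DB_USER" "$DB_NAME" || return 1
  pg_restore -h "$host" -p "$port" -U "$DB_USER" -d "$DB_NAME" \
    --no-owner "$backup/4cat_database.dump" || return 1
  # Dataset files go back into path_data
  mkdir -p "$DATA_PATH" || return 1
  tar -xzf "$backup/4cat_data.tar.gz" -C "$DATA_PATH" || return 1
  # Restores config/, including config.ini and the .current-version files
  tar -xzf "$backup/4cat_config.tar.gz" -C "$FOURCAT_DIR" || return 1
}
```

This overwrites config/config.ini with the backed-up copy. If the database details or paths differ on the new machine, edit it afterwards, then run the migrate step from the upgrade instructions.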