Project location and backup policy - GlascherLab/LabWiki GitHub Wiki
Each project in the Glascher Lab should be organized according to the following principles (details explained below):
- Management of the project using DataLad
- Folder and file organization according to BIDS
- Mirroring of project to GIN
- Sufficient backups of all files (see below)
Original data location
The original data are the most valuable and should be stored in a location where they are backed up automatically, so that they can be retrieved if the working directory of a project is compromised (for whatever reason).
MRI data
The DICOM images obtained in the scanner are transferred into the DICOM database by the MR technicians after the session is completed. The database is backed up multiple times. DICOM images can be retrieved with the dicq command line tool on a computer connected to the ISN LAN.
Retrieving and converting DICOM data
EEG data
EEG data are not automatically transferred to the ISN network. It is the responsibility of the project leader to copy the original data files (*.bdf/xdf/edf) to a safe location. To do so, please create a folder in /common/raw/eeg/<your project name> and copy all the raw data files there. There is no required folder structure, but it may make sense to create a subfolder for each subject (or group of subjects). Please make sure that the folder is readable by other users as well (e.g. check the read permissions for others, or change the user group of the project folder to your project group and set the gid bit; see here for details).
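The folder setup described above can be scripted. The following is a minimal Python sketch, not an official lab tool: the function name `make_raw_folder`, the constant `SHARED_MODE`, and the mode `0o2775` (read/write for owner and group, read-only for others, plus the setgid bit) are assumptions chosen for illustration. Adjust them to your project's needs.

```python
import os
import shutil
import stat
from pathlib import Path

# rwxrwxr-x plus the setgid bit, so new files inherit the folder's group
# (assumed mode for illustration; tighten or loosen as required)
SHARED_MODE = stat.S_ISGID | 0o775


def make_raw_folder(base, project, group=None):
    """Create <base>/<project> and make it readable by other users.

    `group` is an optional group name (hypothetical, e.g. your project
    group); changing the group requires that you are a member of it.
    """
    folder = Path(base) / project
    folder.mkdir(parents=True, exist_ok=True)
    if group is not None:
        shutil.chown(folder, group=group)
    # set the mode last, after any group change
    os.chmod(folder, SHARED_MODE)
    return folder
```

For example, `make_raw_folder("/common/raw/eeg", "my_project")` would create the folder for a project called `my_project` (a placeholder name) and set its permissions in one step.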
Behavioral and physiological data
Behavioral (e.g. log files, also from the scanner) and physiological data (eye-tracking, skin conductance, heart rate) should be copied to folders named /common/raw/behave/<your project name> and /common/raw/phys/<your project name>, respectively.
If all your neural and physiological data are in the same data file (edf/xdf) because you have used Lab Streaming Layer, you should write a small README file (stored in the eeg folder) explaining which data sources are collected in the edf/xdf file.
Project location
The project folder is the main working folder, where data processing, computational modeling, and statistical analyses are run. The organization of this folder (with DataLad and BIDS) is described in the following sections of this Wiki page. However, the first question to decide is where the project folder should be located. There are 2-3 options:
- On your local laptop
- On the group server (in our case dendrite:/projects/crunchie/<your username>/<your project name>)
- As a project mirror on GIN (German Infrastructure for Neuroscience) under the Lab's GIN account (glaescher)
Pros and Cons for each location
Project folder on a local laptop
- firewall should be enabled, but you are responsible for it
- laptop can be stolen (need for maintaining a good backup strategy)
- mobile computing: you can run analyses anywhere
- fast computations (sometimes): modern laptop CPUs (esp. the M1/M2 chips from Apple) are often faster than the CPUs on the dendrite server
- limited RAM: usually not more than 16 GB of RAM
- backups have to be run by the project leader (no automatic backup)
Project folder on the dendrite server
- data is stored in a firewalled location
- 1.5 TB of RAM available
- 32-core CPU: not the fastest anymore, but good for parallelization of jobs
- can be accessed via VNC, also remotely (see ISN wiki)
- backups have to be run by the project leader (no automatic backup)
Project location on GIN
- good for sharing with remote collaborators who also work with the data
- project needs to be mirrored locally (laptop, server) because there are no computing services on GIN (it's just a storage site)
Backup policy
Only a few folders in the ISN network are backed up automatically. These include the data in /common/raw and $HOME (your home directory, which should not exceed 20 GB). The reason is that only a few hours are available every night for backups, so we cannot include everything.
As a consequence, your project folder, whether on your local laptop or on the server, has to be backed up regularly by the project leader (i.e. YOU), both on GIN and on an external hard drive. This page describes how to use rsync as a backup tool on Linux/Mac to copy changes in your project folder to a hard drive.
If you set up your DataLad infrastructure with a mirror to GIN, then you can back up to an external disk only now and then (e.g. every 2-3 months). However, if you haven't set up a GIN mirror, you should back up to an external hard disk more regularly. In that case, I recommend keeping the backup disk on your desk and using rsync from your local machine, although that requires that the laptop be kept connected to the ISN LAN for the duration of the backup.
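A backup run with rsync might look like the following Python sketch. It is an illustration under assumptions, not the lab's official procedure: the function names, the default settings, and the choice of options are mine (`-a` for archive mode, `-v` for verbose output, `--delete` to remove files from the backup that no longer exist in the project, `--dry-run` to preview the transfer without copying anything). It assumes that rsync is installed and that the external disk is already mounted.

```python
import subprocess


def build_rsync_command(project_dir, backup_dir, dry_run=True, delete=False):
    """Return the rsync argument list for mirroring project_dir into backup_dir."""
    cmd = ["rsync", "-av"]
    if delete:
        # also remove files from the backup that were deleted in the project
        cmd.append("--delete")
    if dry_run:
        # only report what would be transferred, copy nothing
        cmd.append("--dry-run")
    # a trailing slash on the source copies its contents, not the folder itself
    cmd.append(str(project_dir).rstrip("/") + "/")
    cmd.append(str(backup_dir))
    return cmd


def run_backup(project_dir, backup_dir, dry_run=True, delete=False):
    """Run rsync and raise CalledProcessError if it exits with an error."""
    cmd = build_rsync_command(project_dir, backup_dir, dry_run, delete)
    subprocess.run(cmd, check=True)
```

The dry run is the default here so that a first call only lists the planned changes; pass `dry_run=False` once the output looks right. Building the command as a list, rather than a shell string, avoids quoting problems with paths that contain spaces.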