Download and store data on Albiorix - The-Bioinformatics-Group/Albiorix GitHub Wiki

Introduction [Work in progress]

Moving data to Albiorix securely and storing it safely are tasks that every user of the system should understand, at least at a basic level. This page covers the following three subjects:

How to download data from a remote server
Long-term storage of data
Where to store files you are working with

Download and long-term storage

On Albiorix, create an SGE script containing an rsync command that fetches the remote data and stores it in a location designated for long-term data storage (e.g. /nobackup/data4/DATA). Good examples of how to run the rsync program can be found here.
Verify that the data are not corrupted, using the MD5 checksum files provided with the data.
Write a README.md file that describes the data (when and from where it was downloaded, which project it belongs to, which species it comes from, etc.).
If needed, compress the files with the gzip command.
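The checksum step in the list above can be sketched in Python as follows. This is a minimal sketch, not an Albiorix tool. It assumes that the provider's checksum file uses the usual md5sum format (a 32-character hex digest, whitespace, then a file name, optionally prefixed with `*`) and that the file names are relative to the directory containing the checksum file. The function names `md5_of_file` and `verify_md5_file` are hypothetical.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file.

    By default the file is read in 1 MiB chunks, so large sequence
    files never have to fit in memory.
    """
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5_file(checksum_file: Path) -> dict[str, bool]:
    """Check every entry of an md5sum-style checksum file.

    Assumption: each line is '<32-char hex digest> <whitespace> <file name>',
    where the file name may be prefixed with '*' (md5sum's binary-mode marker)
    and is relative to the directory that contains the checksum file.
    Returns {file name: True if the file exists and its digest matches}.
    """
    results: dict[str, bool] = {}
    base_dir = checksum_file.parent
    for raw_line in checksum_file.read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            raise ValueError(f"Malformed checksum line: {raw_line!r}")
        expected, name = parts[0].lower(), parts[1].lstrip("*")
        target = base_dir / name
        # A missing file counts as a failed check instead of raising an error.
        results[name] = target.is_file() and md5_of_file(target) == expected
    return results
```

For example, `verify_md5_file(Path("/nobackup/data4/DATA/my_project/md5sums.txt"))` returns a dictionary mapping each listed file to True or False. The project directory and checksum file name in this example are hypothetical. Running `md5sum -c md5sums.txt` from the data directory performs the same check on the command line.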

Safely working with the data

It is important to keep at least one copy of your raw data files in safe storage (e.g. in a designated directory in /nobackup) and to work on, and modify, only a separate copy of the files. This working copy can be placed in your home directory, in a designated directory in /nobackup, or on the compute nodes.
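A minimal Python sketch of this workflow follows. It assumes that the raw files stay in a dedicated directory and are copied into a separate working directory before any processing. The function names `make_working_copy` and `protect_raw_file` are hypothetical. Removing write permission from the raw file is an extra safeguard suggested here, not an Albiorix requirement, and it does not replace keeping a separate safe copy.

```python
import shutil
import stat
from pathlib import Path


def make_working_copy(raw_file: Path, work_dir: Path) -> Path:
    """Copy a raw data file into a working directory and return the copy's path.

    The raw file is only read, never written; all further processing
    should use the returned path.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    working_copy = work_dir / raw_file.name
    if working_copy.exists():
        raise FileExistsError(f"{working_copy} already exists; refusing to overwrite it")
    shutil.copy2(raw_file, working_copy)  # copy2 also preserves timestamps and mode bits
    # copy2 copies the permission bits, so re-add owner write permission
    # in case the raw file was read-only.
    mode = stat.S_IMODE(working_copy.stat().st_mode)
    working_copy.chmod(mode | stat.S_IWUSR)
    return working_copy


def protect_raw_file(raw_file: Path) -> None:
    """Remove all write permission bits from a raw file (a safeguard, not a backup)."""
    mode = stat.S_IMODE(raw_file.stat().st_mode)
    raw_file.chmod(mode & ~(stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
```

For example, `make_working_copy(Path("/nobackup/data4/DATA/my_project/sample.fq.gz"), Path.home() / "work" / "my_project")` copies the file into a working directory in your home directory and leaves the raw copy untouched. Both paths in this example are hypothetical.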

Storing data on the compute nodes

Different data sources

Data sources: 1KP Transcriptome; UF Research Computing Research Collaboration Data Server (https://bio.rc.ufl.edu/)
Tools: cURL, PuTTY, Wget, rsync, FileZilla, WinSCP (see Mats' slides)