Download and store data on Albiorix - The-Bioinformatics-Group/Albiorix GitHub Wiki
Introduction [Work in progress]
Moving data to Albiorix in a secure way and storing it safely are important tasks that any user of the system should have at least some basic knowledge of. This post will discuss the following three subjects:
- How to download data from a remote server
- Long-term storage of data
- Where to store files you are working with
Download and long-term storage
- On Albiorix, create an SGE script with an `rsync` command that fetches the remote data and stores it in a place designated for long-term storage of data (e.g. `/nobackup/data4/DATA`). Good examples of how to run the `rsync` program can be found here.
- Make sure the data is uncorrupted with the help of the md5 checksum files provided with the data.
- Write a `README.md` file that describes the data (when it was downloaded, where it was downloaded from, which project it belongs to, which species it comes from, etc.).
- Compress the files if needed, with the `gzip` command.
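The steps above can be sketched as a single SGE job script. This is a minimal sketch, not the group's actual script: the job name, the remote address `user@remote.example.org:/path/to/data/`, the project directory under `/nobackup/data4/DATA` and the function names are all hypothetical placeholders. It assumes key-based (passwordless) SSH access to the remote server, and that the checksum files end in `.md5`, sit in the top level of the downloaded directory and use the format written by `md5sum`. Checksums are verified before compression, because they describe the uncompressed files.

```shell
#!/bin/bash
#$ -N fetch_data      # job name (hypothetical)
#$ -cwd               # run from the submission directory
#$ -j y               # merge stderr into stdout
#$ -S /bin/bash

# Copy a remote directory to long-term storage.
# $1 = rsync source (placeholder: user@remote.example.org:/path/to/data/)
# $2 = destination directory (e.g. a project directory under /nobackup/data4/DATA)
fetch_data() {
    mkdir -p "$2" || return 1
    # -a keeps timestamps and permissions, -v lists transferred files,
    # --partial keeps partly transferred files so a rerun can resume
    rsync -av --partial "$1" "$2"
}

# Check every *.md5 file in the top level of directory $1.
# Assumes md5sum's own format: "<hash>  <filename>".
verify_md5() {
    md5_status=0
    for sumfile in "$1"/*.md5; do
        [ -e "$sumfile" ] || continue
        ( cd "$1" && md5sum -c "$(basename "$sumfile")" ) || md5_status=1
    done
    return "$md5_status"
}

# Write a short README.md describing the data.
# $1 = data directory, $2 = source, $3 = project, $4 = species
write_readme() {
    cat > "$1/README.md" <<EOF
# $3

- Downloaded: $(date +%Y-%m-%d)
- Source: $2
- Project: $3
- Species: $4
EOF
}

# Compress the data files, leaving README.md and checksum files readable.
compress_data() {
    find "$1" -type f ! -name '*.gz' ! -name '*.md5' ! -name 'README.md' \
        -exec gzip {} +
}

# Run the whole workflow only when all four arguments are given.
if [ "$#" -eq 4 ]; then
    fetch_data "$1" "$2" &&
        verify_md5 "$2" &&
        write_readme "$2" "$1" "$3" "$4" &&
        compress_data "$2"
fi
```

Saved as, say, `fetch_data.sh`, the script could be submitted with `qsub fetch_data.sh user@remote.example.org:/path/to/data/ /nobackup/data4/DATA/my_project "My project" "Species name"`. Note that `verify_md5` returns success when no `.md5` files are found, so it is worth checking the job output to confirm that files were actually tested.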
Safely working with the data
It is important that you keep at least one copy of your raw data files in safe storage someplace (e.g. in a designated directory in `/nobackup`) and that you only work on and alter a different copy of the files. The latter can be placed in your home directory, in a designated directory in `/nobackup` or on the compute nodes.
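As a rough illustration of this raw-copy/working-copy split, the sketch below copies a raw data directory to a working location. The function name and the example paths are hypothetical. Removing write permission from the raw copy is an extra precaution that is an assumption of this sketch, not an established Albiorix practice, and it does not stop the root user.

```shell
# Create a writable working copy of a raw data directory.
# $1 = raw data directory (e.g. a project directory under /nobackup)
# $2 = working directory (e.g. under your home directory)
make_working_copy() {
    mkdir -p "$2" || return 1
    # "$1/." copies the directory contents, including hidden files;
    # -p preserves timestamps and permissions
    cp -Rp "$1/." "$2/" || return 1
    # Make sure the working copy is writable even if the raw copy was not
    chmod -R u+w "$2" || return 1
    # Assumed extra precaution: write-protect the raw copy
    chmod -R a-w "$1"
}

# Example call with hypothetical paths:
# make_working_copy /nobackup/data4/DATA/my_project "$HOME/work/my_project"
```

All analysis steps would then read from and write to the working directory only, so the raw files stay as they were downloaded.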
Storing data on the compute nodes
Different data sources
1KP Transcriptome; UF Research Computing Research Collaboration Data Server (https://bio.rc.ufl.edu/); cURL; PuTTY; Wget; rsync; FileZilla; WinSCP (Mats slides)