06 Moving Data Around - NU-CPGME/quest_genomics_2025 GitHub Wiki
February 2025
Egon A. Ozer, MD PhD ([email protected])
Ramon Lorenzo Redondo, PhD ([email protected])
A graphical server browser is probably the most familiar way to move files between your computer or fsmresfiles and Quest. These programs let you browse Quest's filesystem graphically and drag and drop files or folders back and forth. This works pretty well in most cases.
Options:
Cyberduck (This one is pretty, but will ask you for money)
FileZilla (This one is free)
rsync is my most-used tool for moving data between my computer, Quest, and fsmresfiles. Its greatest strengths are that it can pick up a transfer that was interrupted and that it detects differences between the source and destination, so only files not already at the destination are transferred.
The general structure of the command is rsync -a -L --progress [source] [destination]
-a: The a stands for "archive". It applies a set of options in the background that preserve dates and other file information during the transfer.
-L: Copies links. Adding this option converts linked files (created using ln -s) at the source into full files at the destination. Each file at the destination keeps the name of the link at the source, which is not necessarily the original file's name.
--progress: Prints the progress of the transfer.
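The effect of -a and -L described above can be checked on your own computer before touching a remote server. The following is a minimal sketch, assuming rsync is installed locally; the scratch directory and the file names original.txt and link.txt are invented for illustration.

```shell
# Hypothetical scratch directories, created only for this demonstration.
demo=$(mktemp -d)
mkdir -p "$demo/src" "$demo/dest"
echo "ACGT" > "$demo/src/original.txt"
ln -s original.txt "$demo/src/link.txt"   # symbolic link at the source

if command -v rsync >/dev/null 2>&1; then
    # The trailing slash on src/ copies the directory's contents
    # rather than the directory itself.
    rsync -a -L "$demo/src/" "$demo/dest/"
    # With -L, link.txt arrives as a regular file holding the linked content.
    ls -l "$demo/dest"
else
    echo "rsync is not installed on this machine" >&2
fi
```

In the listing, link.txt at the destination should appear as an ordinary file and keep the link's name, as the -L description says.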
You can use rsync to move files and directories on your own computer. To transfer to or from a remote server (like Quest), put your NetID (or other login if it's not Quest), an at sign (@), and the server address before the source or destination path, separated from the path by a colon (:). For example: [email protected]:/path/to/directory/file.txt
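The pieces of a remote location (login, @, server address, colon, path) can be assembled as in this minimal sketch. The function name remote_spec, the login abc1234, and the host quest.example.edu are placeholders invented for illustration, not real account details.

```shell
# Build an rsync remote location of the form login@server:path.
# remote_spec is a hypothetical helper, not part of rsync.
remote_spec() {
    printf '%s@%s:%s\n' "$1" "$2" "$3"
}

# Placeholder login and server address; substitute your own.
remote_spec abc1234 quest.example.edu /path/to/directory/file.txt
```

The printed string is what you would pass to rsync as either the source or the destination.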
Rsync examples:
Transferring all reads files in a directory on your computer into a directory on Quest:
rsync -a -L --progress /home/reads/*.fastq.gz [email protected]:"/projects/b1042/OzerLab/reads_test/"
Downloading all fasta files in a directory on Quest into the current folder on your computer (the period . indicates the current working directory from which you are running the rsync command):
rsync -a -L --progress [email protected]:"/projects/b1042/OzerLab/results/*.fasta" .
Moving the entire directory from Quest to your computer:
rsync -a -L --progress [email protected]:"/projects/b1042/OzerLab/results" .
NOTE FOR QUEST: You cannot access an external server using rsync from Quest (or at least I haven't figured out how to do it). Always run rsync in a terminal on your own computer, whether transferring to or from Quest.
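rsync's habit of sending only files that are not already at the destination can be previewed locally before a large transfer. This is a minimal sketch, assuming rsync is installed on your computer; the scratch directory and fasta file names are invented for illustration. It uses two standard rsync options not shown above: --dry-run, which reports what would be transferred without copying anything, and --itemize-changes, which prints one line per item that differs.

```shell
# Hypothetical scratch directories, created only for this demonstration.
work=$(mktemp -d)
mkdir -p "$work/source" "$work/destination"
echo ">seq1" > "$work/source/old.fasta"

preview=""
if command -v rsync >/dev/null 2>&1; then
    # First transfer: old.fasta is copied to the destination.
    rsync -a -L "$work/source/" "$work/destination/"

    # A new file appears at the source after the first transfer.
    echo ">seq2" > "$work/source/new.fasta"

    # Preview the second transfer without copying anything.
    preview=$(rsync -a -L --dry-run --itemize-changes "$work/source/" "$work/destination/")
    echo "$preview"
else
    echo "rsync is not installed on this machine" >&2
fi
```

The preview should list new.fasta but not old.fasta, since the latter is already at the destination with the same size and date. Removing --dry-run performs the actual transfer.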
The more you can avoid OneDrive or SharePoint for storing or transferring data to and from Quest, the better. There are some options for uploading and downloading, like rclone, but they're far from straightforward or easy to navigate. You're probably best off downloading from OneDrive to your own computer and then using rsync or a server browser program like FileZilla or Cyberduck to transfer.
For genome or gene sequence information, use NCBI Datasets. Use a web browser to find sequence information on NCBI Genomes, Taxonomy, or other databases.
## Activate the datasets Conda environment
conda activate /projects/p30002/condaenvs/ncbi_datasets_env
## Download a single genome sequence with gene and protein sequences and annotations
datasets download genome accession GCF_000006765.1 --filename PAO1.zip --include genome,cds,protein,gbff
## Download all genome sequences for a species
datasets download genome taxon "Human respiratory syncytial virus" --filename RSV.zip
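The downloads above arrive as zip archives that need to be unpacked before use. The sketch below is one way to do that, assuming the archive unpacks into an ncbi_dataset/data/ directory with nucleotide FASTA files ending in .fna; check the README inside your own archive if the layout differs. The function name list_nucleotide_fastas and the output directory PAO1 are hypothetical choices for illustration.

```shell
# List nucleotide FASTA files inside an unpacked NCBI Datasets archive.
# list_nucleotide_fastas is a hypothetical helper, not part of the datasets tool.
# Assumes sequences sit under <dir>/ncbi_dataset/data/ with a .fna extension.
list_nucleotide_fastas() {
    find "$1/ncbi_dataset/data" -type f -name '*.fna' | sort
}

# Unpack only if the archive from the download command above is present.
if [ -f PAO1.zip ] && command -v unzip >/dev/null 2>&1; then
    unzip -o -q PAO1.zip -d PAO1     # -o overwrite existing files, -q quiet
    list_nucleotide_fastas PAO1
fi
```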
Online instructions here: https://www.ncbi.nlm.nih.gov/sra/docs/sradownload/
First use a web browser to search SRA for run IDs.
## Activate the Conda environment
conda activate /projects/p30002/condaenvs/sratools_env
## Get read sequences from one sequencing run:
prefetch SRR11192680
fasterq-dump SRR11192680
## Get read sequences from a list of run IDs
prefetch --option-file sra_list.txt
fasterq-dump SRR100332*
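The wildcard in the last command only works when every run ID in the list shares the same prefix. As an alternative sketch, the same list file can drive a loop that converts one run at a time. The function names read_run_ids and dump_runs and the DRY_RUN variable are hypothetical, invented for this example; with DRY_RUN=1 the commands are printed rather than run, so you can check them first.

```shell
# Read run IDs from a list file, skipping blank lines and '#' comment lines.
read_run_ids() {
    grep -v -e '^[[:space:]]*$' -e '^[[:space:]]*#' "$1"
}

# Run fasterq-dump once per run ID in the list file given as $1.
# With DRY_RUN=1 the commands are only printed, not executed.
dump_runs() {
    read_run_ids "$1" | while IFS= read -r run; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "fasterq-dump $run"
        else
            fasterq-dump "$run"
        fi
    done
}

# Preview the commands for the list used with prefetch above, if it exists.
if [ -f sra_list.txt ]; then
    DRY_RUN=1 dump_runs sra_list.txt
fi
```

Calling dump_runs sra_list.txt without DRY_RUN set runs the conversions for real, after the Conda environment is activated and prefetch has finished.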
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.