Globus - core-unit-bioinformatics/knowledge-base GitHub Wiki

Globus

What is Globus?

Globus describes itself and its funding as follows:

"Globus is a non-profit service for secure, reliable research data management.
With Globus, subscribers can move, share, & discover data via a single
interface – whether your files live on a supercomputer, lab cluster,
tape archive, public cloud or your laptop, you can manage this data
from anywhere, using your existing identities, via just a web browser.
[...]
Globus is a group at the University of Chicago that develops and operates
a non-profit service for use by the research community.
[...]
Globus products and services are developed and operated by the
University of Chicago and Argonne National Laboratory,
supported by funding from the Department of Energy,
the National Science Foundation, and the National Institutes of Health"

Problem: no folders other than $HOME are accessible

By default, only the $HOME folder is readable/writable after setting up a new personal endpoint on a machine (say, your laptop). You can make additional folders accessible for Globus by editing this file:

$HOME/.globusonline/lta/config-paths
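As a rough sketch of such an edit: to my understanding of Globus Connect Personal on Linux, each line of config-paths has the form path,sharing flag,read-write flag, for example /data/projects,0,1 for a folder that is not shared and is writable. Treat that line format as an assumption and check it against the entries already present in your file. The helper name add_globus_path and the folder /data/projects are made up for illustration. The personal endpoint probably has to be restarted before the change is picked up.

```shell
# Hypothetical helper: append a folder to a Globus Connect Personal
# config-paths file unless the same entry is already listed.
# Assumed line format: <path>,<sharing flag>,<read-write flag>
add_globus_path() {
    config_file="$1"        # e.g. $HOME/.globusonline/lta/config-paths
    folder="$2"             # absolute path of the folder to expose
    entry="${folder},0,1"   # assumed meaning: 0 = not shared, 1 = writable
    # -x matches whole lines only, -F treats the entry as a literal string
    if ! grep -qxF "$entry" "$config_file" 2>/dev/null; then
        printf '%s\n' "$entry" >> "$config_file"
    fi
}

# Usage (the folder is a placeholder):
# add_globus_path "$HOME/.globusonline/lta/config-paths" /data/projects
```

Calling the helper twice with the same folder leaves only one entry in the file.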

How to batch-download files on the command line?

  1. make sure the globus executable is available
    • for example, on HPC environments, this could require loading an environment module like this: module load Globus
  2. run globus login --no-local-server and follow the login instructions
  3. search for the endpoint ID of the source for the transfer:
    • globus endpoint search "endpoint keywords"
    • for example: globus endpoint search "shared ebi public"
    • result UUID: fd9c190c-b824-11e9-98d7-0a63aa6b37da
    • we will refer to this as UUID-SRC in the following
  4. for convenience, export the UUID as a variable
    • export SOURCE=UUID-SRC
  5. search for the endpoint ID of the target for the transfer
    • use the same search command as in step 3
    • for example: globus endpoint search "hilbert storage"
    • result UUID: 51a7a935-3716-4096-8d1c-b3f5abbc7544
    • we will refer to this as UUID-TRG in the following
  6. for convenience, export the UUID as a variable
    • export TARGET=UUID-TRG
  7. prepare a file listing source/target path pairs:
    • one file pair per line, with the two paths separated by whitespace
    • first, specify the source path, i.e., the path to the file on the source Globus endpoint
    • second, specify the target path, i.e., the path to the file on the target (destination) endpoint
    • example line: source/path/file target/path/file
    • note that the target path is interpreted relative to the directory given after $TARGET: in the transfer command of step 8, which is the working directory ($PWD) in which you execute globus
    • save all entries in a text file, e.g., file_pairs.txt
  8. initiate the transfer as follows:
    • globus transfer $SOURCE $TARGET:$PWD --batch --dry-run < file_pairs.txt
    • because of --dry-run, this command only prints the complete list of file transfers and does not start them
  9. omit the --dry-run option to actually start the transfer
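
Writing the file of path pairs from step 7 by hand gets tedious for many files. The following is a minimal sketch of how it could be generated, assuming that every file should keep its original name in the target directory and that no path contains whitespace. The function name make_file_pairs is hypothetical and not part of the Globus CLI.

```shell
# Hypothetical helper: print one "source target" pair per line for a list
# of source paths; each target is the bare file name of its source path,
# so the file ends up directly in the target directory.
make_file_pairs() {
    for src in "$@"; do
        printf '%s %s\n' "$src" "$(basename "$src")"
    done
}

# Usage:
# make_file_pairs /gridftp/ena/fastq/ERR323/000/ERR3239740/ERR3239740_1.fastq.gz > file_pairs.txt
```

The resulting file_pairs.txt can then be passed to the transfer command of step 8.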

Common problem: finding the file path on Globus

EBI/ENA

The FTP paths of the files are listed in the file report of an accessioned entry and look like this:

ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR323/000/ERR3239740/ERR3239740_1.fastq.gz

The same file can be located on the EBI/ENA Globus endpoint under this path:

/gridftp/ena/fastq/ERR323/000/ERR3239740/ERR3239740_1.fastq.gz

Only the path prefix differs between the two locations: replace ftp://ftp.sra.ebi.ac.uk/vol1/ with /gridftp/ena/ to get from the FTP path to the Globus path.
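
For many files, the prefix change can be scripted. This is a small sketch based only on the single example pair shown above; the function name ftp_to_globus is hypothetical, and other ENA paths are assumed (not verified) to follow the same pattern.

```shell
# Hypothetical helper: rewrite an ENA FTP URL into the corresponding
# path on the EBI/ENA Globus endpoint by swapping the path prefix.
ftp_to_globus() {
    printf '%s\n' "$1" | sed 's|^ftp://ftp\.sra\.ebi\.ac\.uk/vol1/|/gridftp/ena/|'
}

ftp_to_globus "ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR323/000/ERR3239740/ERR3239740_1.fastq.gz"
# prints: /gridftp/ena/fastq/ERR323/000/ERR3239740/ERR3239740_1.fastq.gz
```

Input that does not start with the FTP prefix is printed unchanged, so unexpected paths are easy to spot in the output.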
