Globus - core-unit-bioinformatics/knowledge-base GitHub Wiki
Their own description and basic funding info:
"Globus is a non-profit service for secure, reliable research data management.
With Globus, subscribers can move, share, & discover data via a single
interface – whether your files live on a supercomputer, lab cluster,
tape archive, public cloud or your laptop, you can manage this data
from anywhere, using your existing identities, via just a web browser.
[...]
Globus is a group at the University of Chicago that develops and operates
a non-profit service for use by the research community.
[...]
Globus products and services are developed and operated by the
University of Chicago and Argonne National Laboratory,
supported by funding from the Department of Energy,
the National Science Foundation, and the National Institutes of Health"
By default, only the $HOME folder is readable/writable after setting up a new personal endpoint on a machine (say, your laptop). You can make additional folders accessible to Globus by editing this file:
$HOME/.globusonline/lta/config-paths
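As a minimal sketch of such an edit: the snippet below assumes the line format `path,sharing flag,read/write flag` used by Globus Connect Personal for this file. The helper name `add_globus_path` and the example folder are hypothetical and only serve as illustration.

```shell
# Append a folder to the Globus Connect Personal path configuration.
# Assumed line format: <path>,<sharing flag>,<read/write flag>
#   sharing flag:    1 = sharing allowed, 0 = sharing not allowed
#   read/write flag: 1 = read/write,      0 = read-only
# add_globus_path is a hypothetical helper, not part of Globus.
add_globus_path() {
    dir="$1"
    # Second argument is optional; defaults to the file named above
    config="${2:-$HOME/.globusonline/lta/config-paths}"
    entry="${dir},0,1"
    # Do nothing if an identical entry already exists
    if [ -f "$config" ] && grep -qxF -- "$entry" "$config"; then
        return 0
    fi
    printf '%s\n' "$entry" >> "$config"
}

# Example call (hypothetical folder):
# add_globus_path /data/projects
```

The endpoint software probably needs to be restarted before the new folder becomes visible.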
- make sure the globus executable is available
- for example, on HPC environments, this could require loading an environment module like this:
module load Globus
- log in with the following command and follow the login instructions:
globus login --no-local-server
- search for the endpoint ID of the source for the transfer:
globus endpoint search "endpoint keywords"
- for example:
globus endpoint search "shared ebi public"
- result UUID:
fd9c190c-b824-11e9-98d7-0a63aa6b37da
- we will refer to this as UUID-SRC in the following
- for convenience, export the UUID as a variable
export SOURCE=UUID-SRC
- search for the endpoint ID of the target for the transfer
- (same as above)
- for example:
globus endpoint search "hilbert storage"
- result UUID:
51a7a935-3716-4096-8d1c-b3f5abbc7544
- we will refer to this as UUID-TRG in the following
- for convenience, export the UUID as a variable
export TARGET=UUID-TRG
- prepare a file listing input/output pairs:
- one file pair per line
- first, specify the source path, i.e. the path to the file on the source Globus endpoint
- second, specify the target path, i.e., the path to the file on the target (destination) endpoint
source/path/file target/path/file
- note that the target path is relative to the working directory in which you run the transfer command, because the command below specifies the target as $TARGET:$PWD
- save all entries in a text file, e.g.,
file_pairs.txt
- initiate the transfer as follows:
globus transfer $SOURCE $TARGET:$PWD --batch --dry-run < file_pairs.txt
- with --dry-run, this command only prints the complete list of file transfers
- omit the --dry-run option to actually start the transfer
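The pair file described above can also be generated instead of written by hand. The following is a minimal sketch under two assumptions: paths contain no whitespace, and every file should keep its name inside one target folder. The helper `make_file_pairs` and the file name `source_paths.txt` are hypothetical.

```shell
# Read source paths from stdin (one per line) and print
# "source/path/file target/path/file" pairs to stdout.
# make_file_pairs is a hypothetical helper, not part of the Globus CLI.
make_file_pairs() {
    target_dir="$1"
    while IFS= read -r src; do
        # Skip empty lines
        [ -n "$src" ] || continue
        # Keep the original file name below the target folder
        printf '%s %s/%s\n' "$src" "$target_dir" "$(basename "$src")"
    done
}

# Example (source_paths.txt is a hypothetical list of source paths):
# make_file_pairs fastq < source_paths.txt > file_pairs.txt
```

Checking the generated file with the --dry-run transfer command before starting the real transfer is a reasonable precaution.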
File paths can be found in the file report of an accessioned entry, and look like this:
ftp://ftp.sra.ebi.ac.uk/vol1/fastq/ERR323/000/ERR3239740/ERR3239740_1.fastq.gz
The same file can be located on the EBI/ENA Globus endpoint under this path:
/gridftp/ena/fastq/ERR323/000/ERR3239740/ERR3239740_1.fastq.gz
Note the simple change in the path prefix that distinguishes the FTP from the Globus location: in this example, ftp://ftp.sra.ebi.ac.uk/vol1/ becomes /gridftp/ena/.
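The prefix change can be scripted when many files are involved. This sketch assumes that the mapping seen in the single example above holds for all FTP paths of interest, which should be verified for other file types. The function name `ftp_to_globus` and the file name `ftp_urls.txt` are hypothetical.

```shell
# Convert EBI/ENA FTP URLs (stdin, one per line) into paths on the
# EBI/ENA Globus endpoint (stdout).
# Assumption: the prefix mapping is inferred from one example only.
# ftp_to_globus is a hypothetical helper.
ftp_to_globus() {
    sed 's#^ftp://ftp\.sra\.ebi\.ac\.uk/vol1/#/gridftp/ena/#'
}

# Example (ftp_urls.txt is a hypothetical list taken from a file report):
# ftp_to_globus < ftp_urls.txt
```

The resulting paths can serve as the source paths in file_pairs.txt.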