Globus on dtn - shawfdong/hyades GitHub Wiki

GridFTP is an extension of the standard FTP protocol for high-speed, reliable, and secure data transfer[1]. It achieves high performance by using parallel TCP streams and multi-node transfers. GridFTP separates control and data channels. By default, the control channel listens on TCP port 2811.

GridFTP provides much of the underlying framework for Globus Connect and is part of the Globus Toolkit. Among other things, the Globus Toolkit provides:

  • globus-gridftp-server: a server implementation of the GridFTP protocol
  • globus-url-copy: a scriptable command line client

globus-url-copy

dtn.ucsc.edu runs Globus Toolkit 5.2.5. Here are some sample commands using the client:

Listing a directory on an ESnet Data Transfer Node[2]:

$ globus-url-copy -list ftp://lbl-diskpt1.es.net:2811/data1/

Copying a file from the ESnet DTN at LBL (Berkeley, CA) to the UCSC DTN (4 streams):

$ export GLOBUS_TCP_PORT_RANGE=50000,51000
$ globus-url-copy -vb -fast -p 4 ftp://lbl-diskpt1.es.net:2811/data1/10G.dat file:///data/shaw/10G.dat
We've observed an average transfer speed of 740 MB/s between the two Data Transfer Nodes.

Copying a file from the ESnet DTN at BNL (near NYC, NY) to the UCSC DTN (8 streams):

$ export GLOBUS_TCP_PORT_RANGE=50000,51000
$ globus-url-copy -vb -fast -p 8 ftp://bnl-diskpt1.es.net:2811/data1/10G.dat file:///data/shaw/10G.dat
We've observed an average transfer speed of 470 MB/s between the two Data Transfer Nodes.
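
As a rough cross-check of the two measurements above, the sketch below converts an average rate into an expected wall-clock time. The helper name transfer_seconds is hypothetical, and the file size is an assumption: 10G.dat is taken to hold 10,000 MB, which the page does not state.

```shell
#!/bin/sh
# Hypothetical helper: seconds needed to move SIZE_MB at RATE_MB_PER_S.
transfer_seconds() {
    # LC_ALL=C keeps the decimal separator a dot regardless of locale.
    LC_ALL=C awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate }'
}

size_mb=10000   # assumption: 10G.dat holds 10,000 MB
echo "LBL to UCSC at 740 MB/s: $(transfer_seconds "$size_mb" 740) s"
echo "BNL to UCSC at 470 MB/s: $(transfer_seconds "$size_mb" 470) s"
```

Under that size assumption, the two transfers would take roughly 13.5 and 21.3 seconds respectively.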

Note:

Somewhat like active FTP, the globus-url-copy client first connects from a random unprivileged port to the GridFTP server's control channel port (2811 by default). The client then starts listening on a random unprivileged port for the data channel. Finally, the server connects back to the client's specified data port and starts transferring data.

The data port can be any unprivileged port, such as 52235, but the firewall on dtn.ucsc.edu only opens the port range 50000 – 51000. As a result, globus-url-copy can fail with the misleading error "No route to host". The solution is to set the environment variable:

$ export GLOBUS_TCP_PORT_RANGE=50000,51000
such that globus-url-copy listens on a port between 50000 and 51000.
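
To make the firewall constraint concrete, here is a minimal sketch that checks whether a given data port falls inside the range named by GLOBUS_TCP_PORT_RANGE. The function name port_in_globus_range is hypothetical, and the sketch assumes the variable uses the "min,max" form shown above.

```shell
#!/bin/sh
# Hypothetical helper: succeed if PORT lies inside GLOBUS_TCP_PORT_RANGE ("min,max").
port_in_globus_range() {
    port=$1
    range=${GLOBUS_TCP_PORT_RANGE:-}
    # With the variable unset, no range is enforced on the client side.
    [ -n "$range" ] || return 1
    min=${range%%,*}
    max=${range##*,}
    [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
}

export GLOBUS_TCP_PORT_RANGE=50000,51000
for p in 50500 52235; do
    if port_in_globus_range "$p"; then
        echo "$p: inside the range opened by the firewall"
    else
        echo "$p: outside the range, the server could not connect back"
    fi
done
```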

globus-gridftp-server

SSHFTP

I'll use one of NERSC's Data Transfer Nodes to test SSHFTP (GridFTP-over-SSH), which uses SSH for authentication, on dtn.ucsc.edu[3].

$ ssh dtn01.nersc.gov
[NERSC]$ module load globus
[NERSC]$ globus-url-copy -list sshftp://[email protected]/data/shaw/
[NERSC]$ cd $GSCRATCH
[NERSC]$ globus-url-copy -vb -fast -p 8 sshftp://[email protected]/data/shaw/100G.dat 100G.dat
I observed a very respectable average transfer speed of about 315 MB/s, with a peak of almost 800 MB/s, when I ran the test!

GSIFTP

Unfortunately, dtn still uses a self-signed certificate generated by the Globus Toolkit. Consequently, the procedure is considerably more involved.

$ openssl x509 -in 3f63c941.0 -subject -noout
subject= /C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d
NOTE: The CN of the self-signed certificate is 30450880-21b9-11e4-b5bd-12313940394d, not dtn.ucsc.edu!
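
The mismatch noted above can be checked mechanically. The following is a minimal sketch, not part of the Globus tooling: subject_cn is a hypothetical helper that pulls the CN component out of an OpenSSL-style subject string so it can be compared with the host name.

```shell
#!/bin/sh
# Hypothetical helper: print the CN component of a "/C=.../CN=..." subject string.
subject_cn() {
    case $1 in
        *CN=*)
            cn=${1##*CN=}                 # drop everything through the last "CN="
            printf '%s\n' "${cn%%/*}"     # drop any component that follows the CN
            ;;
        *)
            return 1                      # no CN present
            ;;
    esac
}

subject='/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d'
host=dtn.ucsc.edu
if [ "$(subject_cn "$subject")" = "$host" ]; then
    echo "CN matches $host"
else
    echo "CN does not match $host; the subject must be given explicitly"
fi
```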

I'll use Hyades to test GSIFTP, which uses digital signatures for authentication, on dtn.

Download the Globus Connect CA certificate to Hyades and create symbolic links:

[hyades]$ cd ~/.globus/certificates
[hyades]$ scp [email protected]:/usr/lib/python2.6/site-packages/globus/connect/security/go-ca-cert.* .
[hyades]$ openssl x509 -in go-ca-cert.pem -hash -noout
7a42187f
[hyades]$ ln -s go-ca-cert.pem 7a42187f.0
[hyades]$ ln -s go-ca-cert.signing_policy 7a42187f.signing_policy
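
The hash-and-link steps above can be wrapped in a small helper so the hash does not have to be copied by hand. This is only a sketch: link_ca_files is a hypothetical name, and the file names are the ones used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper: create the hash-named links in directory $1 for hash $2.
link_ca_files() {
    ca_dir=$1
    ca_hash=$2
    # Relative targets keep the links valid if the directory is relocated.
    ln -sf go-ca-cert.pem "$ca_dir/$ca_hash.0" &&
    ln -sf go-ca-cert.signing_policy "$ca_dir/$ca_hash.signing_policy"
}

cert_dir="$HOME/.globus/certificates"
# Only act when the CA certificate has already been copied over.
if [ -f "$cert_dir/go-ca-cert.pem" ]; then
    ca_hash=$(openssl x509 -in "$cert_dir/go-ca-cert.pem" -hash -noout)
    link_ca_files "$cert_dir" "$ca_hash"
fi
```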

Download my CILogon certificate (usercred-cilogon.p12) and copy it to Hyades. Then run:

[hyades]$ cd ~/.globus/
[hyades]$ mv usercred-cilogon.p12 usercred.p12
[hyades]$ chmod 0600 usercred.p12

Generate a new proxy certificate:

[hyades]$ grid-proxy-init
which will generate, from my CILogon certificate, a short-lived proxy certificate, along with its private key, and save them at /tmp/x509up_u${UID}. I'll use this proxy certificate to authenticate onto dtn.
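
Before starting a transfer it can help to confirm that the proxy file is actually in place. The sketch below is an illustration only: proxy_path is a hypothetical helper, and honoring the X509_USER_PROXY environment variable as an override of the default location is an assumption drawn from general Globus conventions, not something this page states.

```shell
#!/bin/sh
# Hypothetical helper: print where the proxy certificate is expected to live.
proxy_path() {
    # Assumption: X509_USER_PROXY, when set, overrides the default /tmp location.
    printf '%s\n' "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
}

proxy=$(proxy_path)
if [ -s "$proxy" ]; then
    echo "proxy found at $proxy"
else
    echo "no proxy at $proxy; run grid-proxy-init first"
fi
```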

List the files on dtn:

[hyades]$ globus-url-copy \
    -ss "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d" \
    -list gsiftp://[email protected]:2811/
NOTE: We have to use the -ss option to specify the certificate subject of the source server (dtn), because the aforementioned certificate does not have dtn.ucsc.edu as its CN!

Copy a 10GB file from dtn to Hyades:

[hyades]$ globus-url-copy \
    -ss "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d"  \
    -vb -fast -p 4 \
    gsiftp://[email protected]:2811/data/shaw/10G.dat \
    /scratch/tmp/10G.dat
The average transfer speed was 520 MB/s when I did the test.

Third Party Transfer

GridFTP allows remote transfer between two servers to be initiated by a third party. For example, on Hyades, we can initiate a file transfer from an ESnet Data Transfer Node to dtn:

[hyades]$ globus-url-copy \
    -ds "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d" \
    -vb -fast -p 4  \
    ftp://lbl-diskpt1.es.net:2811/data1/10G.dat \
    gsiftp://[email protected]:2811/data/shaw/10G.dat
NOTE: Here we use the -ds option to specify the certificate subject of the destination server (dtn).

Globus Online

Globus is a fast, reliable file transfer service that makes it easy for users to move data between two GridFTP servers or between a GridFTP server and a user’s machine (Windows, Mac or Linux). Globus automates the activity of managing file transfers: monitoring performance, retrying failed transfers, recovering from faults automatically whenever possible, and reporting status. For example, the number of parallel streams depends on the size of the files being transferred: 2 streams for files smaller than 50MB, 4 streams for files between 50MB and 250MB, and 8 streams for files larger than 250MB.
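
The size-based rule described above can be written out as a short sketch. This is not Globus code: streams_for_size_mb is a hypothetical helper, sizes are given in whole megabytes, and placing files of exactly 50MB and exactly 250MB in the middle tier is an assumption, since the description leaves the boundaries open.

```shell
#!/bin/sh
# Hypothetical helper: print the stream count for a file size given in whole MB.
streams_for_size_mb() {
    size_mb=$1
    if [ "$size_mb" -lt 50 ]; then
        echo 2        # small files: fewer than 50MB
    elif [ "$size_mb" -le 250 ]; then
        echo 4        # medium files: 50MB through 250MB (boundaries assumed)
    else
        echo 8        # large files: more than 250MB
    fi
}

for size in 10 100 10000; do
    echo "${size}MB file: $(streams_for_size_mb "$size") streams"
done
```

A manual globus-url-copy run could pass the same number to the -p option used in the earlier examples.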

Globus CLI

The Globus endpoint name for dtn.ucsc.edu used to be jsonstro#dtn, which still works. To allow for better discoverability, I've added a new endpoint name for dtn, ucsc#dtn, using the Globus Command Line Interface (CLI)[4]. Here are the steps I took:

Sign up for a Globus account named ucsc at https://www.globus.org/SignUp.

Generate a pair of SSH authentication keys:

$ cd ~/.ssh
$ ssh-keygen -t rsa -b 2048 -f globus

Upload the public key, globus.pub, to my Globus account, by visiting the Manage Identities page and clicking "add linked identity", followed by "Add SSH Public Key".

Log in to cli.globusonline.org, then add and configure the endpoint:

$ ssh -i ~/.ssh/globus -l ucsc cli.globusonline.org
$ endpoint-add -p dtn.ucsc.edu -s "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d" dtn
$ endpoint-modify --public dtn
$ endpoint-modify --myproxy-oauth-server=cilogon.org dtn
$ endpoint-modify --default-directory=/data/ dtn
$ endpoint-list -vp ucsc#dtn
Name                    : ucsc#dtn
Host(s)                 : gsiftp://dtn.ucsc.edu:2811
Subject(s)              : /C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d
Target Endpoint         : n/a
Default Directory       : /data/
Force Encrypted Transfer: No
Disable Verify          : No
MyProxy Server          : n/a
MyProxy DN              : n/a
MyProxy OAuth Server    : cilogon.org
Port Range              : n/a
Credential Status       : n/a
Credential Expires      : n/a
Credential Time Left    : n/a
Credential Subject      : n/a
S3 URL                  : n/a
Owner Activated         : No
Managed Endpoint        : No
Provider Subscription   : n/a

References

  1. GridFTP Key Concepts
  2. ESnet Data Transfer Nodes
  3. Globus Toolkit 5.2.5 - Admin Guide
  4. Using the Globus Command Line Interface (CLI)