Globus on dtn - shawfdong/hyades GitHub Wiki
GridFTP is an extension of the standard FTP protocol for high-speed, reliable, and secure data transfer[1]. It achieves high performance by using parallel TCP streams and multi-node transfers. GridFTP separates control and data channels. By default, the control channel listens on TCP port 2811.
GridFTP provides much of the underlying framework for Globus Connect and is part of the Globus Toolkit. Among other things, the Globus Toolkit provides:
- globus-gridftp-server: a server implementation of the GridFTP protocol
- globus-url-copy: a scriptable command line client
On dtn.ucsc.edu, the version of Globus Toolkit is 5.2.5. Here are some sample commands, using the client:
Listing a directory on an ESnet Data Transfer Node[2]:
$ globus-url-copy -list ftp://lbl-diskpt1.es.net:2811/data1/
Copying file from the ESnet DTN at LBL (Berkeley, CA) to UCSC DTN (4 streams):
$ export GLOBUS_TCP_PORT_RANGE=50000,51000
$ globus-url-copy -vb -fast -p 4 ftp://lbl-diskpt1.es.net:2811/data1/10G.dat file:///data/shaw/10G.dat

We've observed an average transfer speed of 740 MB/sec between the 2 Data Transfer Nodes.
Copying file from the ESnet DTN at BNL (near NYC, NY) to UCSC DTN (8 streams):
$ export GLOBUS_TCP_PORT_RANGE=50000,51000
$ globus-url-copy -vb -fast -p 8 ftp://bnl-diskpt1.es.net:2811/data1/10G.dat file:///data/shaw/10G.dat

We've observed an average transfer speed of 470 MB/sec between the 2 Data Transfer Nodes.
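To put the two averages in perspective, here is a small sketch that converts a transfer rate into wall-clock time. The helper name transfer_seconds is hypothetical, and the size of 10G.dat is assumed to be 10,000 MB; the source does not state the exact byte count.

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds needed to move SIZE_MB megabytes at RATE_MB_S MB/s.
# Usage: transfer_seconds SIZE_MB RATE_MB_S
transfer_seconds() {
    awk -v size="$1" -v rate="$2" 'BEGIN { printf "%.1f\n", size / rate }'
}

# Assumption: 10G.dat holds 10,000 MB; the rates are the observed averages.
echo "LBL -> UCSC at 740 MB/s: $(transfer_seconds 10000 740) s"
echo "BNL -> UCSC at 470 MB/s: $(transfer_seconds 10000 470) s"
```

Under that size assumption the LBL transfer takes roughly 13.5 seconds and the BNL transfer roughly 21.3 seconds.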
Note:
Somewhat like active FTP, the client globus-url-copy first connects from a random unprivileged port to the GridFTP server's control channel port (2811 by default). Next, the client starts listening on a random unprivileged port for the data channel. The server then connects back to the client's specified data port and starts to transfer data.
The data port can be anything like 52235, but the firewall on dtn.ucsc.edu only opens the port range 50000 – 51000. Thus globus-url-copy can fail with the misleading error of No route to host! The solution here is to set the environment variable:
$ export GLOBUS_TCP_PORT_RANGE=50000,51000

so that globus-url-copy listens on a port between 50000 and 51000.
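The failure mode described in the note can be illustrated with a small check. This is only a sketch: port_in_range is a hypothetical helper, not part of the Globus Toolkit, and it assumes the firewall range is inclusive at both ends.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not a Globus command): succeed when PORT lies inside a
# "min,max" range. The range defaults to the value of GLOBUS_TCP_PORT_RANGE.
# Usage: port_in_range PORT [RANGE]
port_in_range() {
    local port=$1
    local range=${2:-${GLOBUS_TCP_PORT_RANGE:-}}
    local min=${range%%,*}   # text before the comma
    local max=${range##*,}   # text after the comma
    [ -n "$range" ] && [ "$port" -ge "$min" ] && [ "$port" -le "$max" ]
}

export GLOBUS_TCP_PORT_RANGE=50000,51000
for port in 50500 52235; do
    if port_in_range "$port"; then
        echo "$port: inside the range opened on dtn"
    else
        echo "$port: outside the range; the server cannot connect back"
    fi
done
```

A data port such as 52235 falls outside the opened range, which is the situation that surfaces as "No route to host".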
I'll use one of NERSC's Data Transfer Nodes to test SSHFTP (GridFTP-over-SSH), which uses SSH for authentication, on dtn.ucsc.edu[3].
$ ssh dtn01.nersc.gov
[NERSC]$ module load globus
[NERSC]$ globus-url-copy -list sshftp://[email protected]/data/shaw/
[NERSC]$ cd $GSCRATCH
[NERSC]$ globus-url-copy -vb -fast -p 8 sshftp://[email protected]/data/shaw/100G.dat 100G.dat

I observed a very respectable average transfer speed of about 315 MB/sec, with a peak speed of almost 800 MB/sec, when I did the test!
Unfortunately, dtn still uses a self-signed certificate, generated by the Globus Toolkit. Consequently, the procedure for using GSIFTP is a lot more involved.
$ openssl x509 -in 3f63c941.0 -subject -noout
subject= /C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d

NOTE: The CN of the self-signed certificate is 30450880-21b9-11e4-b5bd-12313940394d, not dtn.ucsc.edu!
I'll use Hyades to test GSIFTP, which uses digital signatures for authentication, on dtn.
Download the Globus Connect CA certificate to Hyades and create symbolic links:
[hyades]$ cd ~/.globus/certificates
[hyades]$ scp [email protected]:/usr/lib/python2.6/site-packages/globus/connect/security/go-ca-cert.* .
[hyades]$ openssl x509 -in go-ca-cert.pem -hash -noout
7a42187f
[hyades]$ ln -s go-ca-cert.pem 7a42187f.0
[hyades]$ ln -s go-ca-cert.signing_policy 7a42187f.signing_policy
Download my CILogon certificate (usercred-cilogon.p12) and copy it to Hyades. Then run:
[hyades]$ cd ~/.globus/
[hyades]$ mv usercred-cilogon.p12 usercred.p12
[hyades]$ chmod 0600 usercred.p12
Generate a new proxy certificate:
[hyades]$ grid-proxy-init

which will generate, from my CILogon certificate, a short-lived proxy certificate, along with its private key, and save them at /tmp/x509up_u${UID}. I'll use this proxy certificate to authenticate onto dtn.
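Before attempting a transfer, it can help to confirm that the proxy file is where the tools will look for it. The sketch below uses two hypothetical helpers, proxy_path and proxy_usable. It assumes the default location shown above, which the X509_USER_PROXY environment variable overrides when set. It only checks that the file exists, is non-empty, and is readable; it does not check whether the proxy has expired.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the path where the proxy certificate is expected.
# X509_USER_PROXY takes precedence; otherwise fall back to /tmp/x509up_u<uid>.
proxy_path() {
    echo "${X509_USER_PROXY:-/tmp/x509up_u$(id -u)}"
}

# Hypothetical helper: succeed when the proxy file is non-empty and readable.
# This does NOT verify that the proxy is still within its lifetime.
proxy_usable() {
    local path
    path=$(proxy_path)
    [ -s "$path" ] && [ -r "$path" ]
}

if proxy_usable; then
    echo "proxy found at $(proxy_path)"
else
    echo "no proxy at $(proxy_path); run grid-proxy-init first"
fi
```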
List the files on dtn:
[hyades]$ globus-url-copy \
    -ss "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d" \
    -list gsiftp://[email protected]:2811/

NOTE: We have to use the -ss option to specify the certificate subject of the source server (dtn), because the aforementioned certificate does not have dtn.ucsc.edu as its CN!
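The reason for the subject override can be sketched as a simple comparison between the certificate's CN and the host name. The helpers subject_cn and needs_subject_override are hypothetical, and they assume the CN is the last component of the subject string. The real host-identity check performed by the Globus security layer is more elaborate than a string comparison, so treat this as an illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CN from a subject such as "/C=US/O=.../CN=name".
# Assumes CN is the final component of the subject.
subject_cn() {
    printf '%s\n' "${1##*/CN=}"
}

# Hypothetical helper: succeed when the CN differs from the host name, which is
# when an explicit subject (-ss for the source, -ds for the destination) is needed.
needs_subject_override() {
    [ "$(subject_cn "$1")" != "$2" ]
}

subject="/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d"
if needs_subject_override "$subject" dtn.ucsc.edu; then
    echo "CN '$(subject_cn "$subject")' does not match dtn.ucsc.edu; pass the subject explicitly"
fi
```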
Copy a 10GB file from dtn to Hyades:
[hyades]$ globus-url-copy \
    -ss "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d" \
    -vb -fast -p 4 \
    gsiftp://[email protected]:2811/data/shaw/10G.dat \
    /scratch/tmp/10G.dat

The average transfer speed was 520 MB/s when I did the test.
GridFTP allows remote transfer between two servers to be initiated by a third party. For example, on Hyades, we can initiate a file transfer from an ESnet Data Transfer Node to dtn:
[hyades]$ globus-url-copy \
    -ds "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d" \
    -vb -fast -p 4 \
    ftp://lbl-diskpt1.es.net:2811/data1/10G.dat \
    gsiftp://[email protected]:2811/data/shaw/10G.dat

NOTE: Here we use the -ds option to specify the certificate subject of the destination server (dtn).
Globus is a fast, reliable file transfer service that makes it easy for users to move data between two GridFTP servers or between a GridFTP server and a user's machine (Windows, Mac or Linux). Globus automates the management of file transfers: monitoring performance, retrying failed transfers, recovering from faults automatically whenever possible, and reporting status. For example, the number of parallel streams depends on the size of the files being transferred: 2 streams for files smaller than 50MB, 4 streams for files between 50MB and 250MB, and 8 streams for files larger than 250MB.
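The size-based rule for parallel streams can be written down as a short sketch. The function name streams_for_size is hypothetical and this is not how Globus itself is implemented. It assumes decimal megabytes (1 MB = 1,000,000 bytes) and places a file of exactly 250MB in the 4-stream tier.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the number of parallel streams for a file size
# given in bytes, following the tiers described above.
# Assumption: 1 MB = 1,000,000 bytes.
streams_for_size() {
    local bytes=$1
    if [ "$bytes" -lt 50000000 ]; then
        echo 2      # smaller than 50MB
    elif [ "$bytes" -le 250000000 ]; then
        echo 4      # between 50MB and 250MB
    else
        echo 8      # larger than 250MB
    fi
}

for size in 1000000 100000000 10000000000; do
    echo "$size bytes: $(streams_for_size "$size") streams"
done
```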
The Globus endpoint name for dtn.ucsc.edu used to be jsonstro#dtn, which still works. To allow for better discoverability, I've added a new endpoint name for dtn, ucsc#dtn, using the Globus Command Line Interface (CLI)[4]. Here are the steps I took:
Sign up for a Globus account named ucsc at https://www.globus.org/SignUp.
Generate a pair of SSH authentication keys:
$ cd ~/.ssh
$ ssh-keygen -t rsa -b 2048 -f globus
Upload the public key, globus.pub, to my Globus account, by visiting the Manage Identities page and clicking "add linked identity", followed by "Add SSH Public Key".
Log in to cli.globusonline.org and add the endpoint:
$ ssh -i ~/.ssh/globus -l ucsc cli.globusonline.org
$ endpoint-add -p dtn.ucsc.edu -s "/C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d" dtn
$ endpoint-modify --public dtn
$ endpoint-modify --myproxy-oauth-server=cilogon.org dtn
$ endpoint-modify --default-directory=/data/ dtn
$ endpoint-list -vp ucsc#dtn
Name                    : ucsc#dtn
Host(s)                 : gsiftp://dtn.ucsc.edu:2811
Subject(s)              : /C=US/O=Globus Consortium/OU=Globus Connect Service/CN=30450880-21b9-11e4-b5bd-12313940394d
Target Endpoint         : n/a
Default Directory       : /data/
Force Encrypted Transfer: No
Disable Verify          : No
MyProxy Server          : n/a
MyProxy DN              : n/a
MyProxy OAuth Server    : cilogon.org
Port Range              : n/a
Credential Status       : n/a
Credential Expires      : n/a
Credential Time Left    : n/a
Credential Subject      : n/a
S3 URL                  : n/a
Owner Activated         : No
Managed Endpoint        : No
Provider Subscription   : n/a