AWS Data Transfer - VertebrateResequencing/vr-pipe GitHub Wiki

VRPipe has no built-in way of fetching input data or sending output data. It only works with files on disc, and those files must be visible to all instances in your cluster.

So before you can use VRPipe for real work, you need to transfer some input data to the shared filesystem on the instance your vrpipe-server is running on. And before you terminate all your VRPipe-related instances, you need to transfer the result files somewhere they won't be deleted.

There are many ways to do these transfers, including plain ftp, sftp and scp. There are also ways to get data in and out of Amazon's Simple Storage Service (S3). Use whichever of these mechanisms you are familiar with, as appropriate.

If you have files on a local disc and want to upload them to your EC2 instance, or download files from the instance to your local disc, we recommend using FDT, which is essentially a faster scp.

  1. Install FDT on the EC2 instance:
    cd    # go to ec2-user's home directory
    wget http://monalisa.cern.ch/FDT/lib/fdt.jar
  2. Open up inbound TCP port 54321 in the security group used by your EC2 instance:
    enter details and add rule
    apply rule changes
  3. Get FDT for your local machine:
    wget http://monalisa.cern.ch/FDT/lib/fdt.jar
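
If you would rather do step 2 from the command line, the rule can probably also be added with the AWS CLI. The following is only a sketch under stated assumptions: it assumes the `aws` CLI is installed and configured for your account, and the security group ID and source address range are placeholders you must replace with your own. The helper name `open_fdt_port` and the `DRY_RUN` variable are made up for this example and are not part of VRPipe, FDT or the AWS CLI.

```shell
#!/bin/sh
# Sketch: open inbound TCP port 54321 (the port from step 2) in a security group.
# open_fdt_port and DRY_RUN are illustrative names, not part of any tool.
FDT_PORT=54321

open_fdt_port() {
    group_id=${1:-}   # placeholder, e.g. sg-0123456789abcdef0
    cidr=${2:-}       # address range allowed to connect, e.g. 203.0.113.5/32
    if [ -z "$group_id" ] || [ -z "$cidr" ]; then
        echo "usage: open_fdt_port <security-group-id> <cidr>" >&2
        return 1
    fi
    # Build the command as positional parameters so quoting is preserved.
    set -- aws ec2 authorize-security-group-ingress \
        --group-id "$group_id" --protocol tcp --port "$FDT_PORT" --cidr "$cidr"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"     # dry run (the default): only print the command
    else
        "$@"          # actually call the AWS CLI
    fi
}
```

Calling `open_fdt_port sg-0123456789abcdef0 203.0.113.5/32` prints the command it would run; set `DRY_RUN=0` to execute it. Restricting the rule to your own machine's address, rather than opening the port to everyone, is a design choice of this sketch and not something the steps above require.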

To transfer over a single file:
java -jar fdt.jar -sshKey <ec2_keypair_name>.pem ./local.data ec2-user@<ec2_public_ip_address>:destinationDir
(destinationDir here is relative to ec2-user's home directory; you can also specify an absolute path. This example creates ~ec2-user/destinationDir/local.data on the EC2 instance.)

To transfer over an entire directory and all its contents (recursively), add the -r option:
java -jar fdt.jar -sshKey <ec2_keypair_name>.pem -r ./local_dir ec2-user@<ec2_public_ip_address>:destinationDir

To retrieve a directory from the EC2 instance and save its contents locally, swap the source and destination:
java -jar fdt.jar -sshKey <ec2_keypair_name>.pem -r ec2-user@<ec2_public_ip_address>:resultsDir ./
(This example creates a folder called resultsDir in the current directory on your local machine.)
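
If you run these transfers often, the three command lines above can be wrapped in a small helper. This is a minimal sketch, not something shipped with FDT or VRPipe: the function name `fdt_transfer` and the variables `FDT_JAR`, `EC2_KEY`, `EC2_HOST` and `DRY_RUN` are hypothetical, and the default key file and IP address are placeholders. It only uses the `-sshKey` and `-r` options shown in the examples above.

```shell
#!/bin/sh
# Sketch: build the FDT command lines shown above for uploads and downloads.
# fdt_transfer, FDT_JAR, EC2_KEY, EC2_HOST and DRY_RUN are illustrative names.
FDT_JAR=${FDT_JAR:-fdt.jar}
EC2_KEY=${EC2_KEY:-mykey.pem}         # placeholder for <ec2_keypair_name>.pem
EC2_HOST=${EC2_HOST:-203.0.113.10}    # placeholder for <ec2_public_ip_address>

# usage: fdt_transfer up|down <source> <destination> [-r]
fdt_transfer() {
    direction=${1:-}; src=${2:-}; dest=${3:-}; recursive=${4:-}
    if [ -z "$src" ] || [ -z "$dest" ]; then
        echo "usage: fdt_transfer up|down <source> <destination> [-r]" >&2
        return 1
    fi
    case $direction in
        up)   dest="ec2-user@$EC2_HOST:$dest" ;;   # local -> EC2
        down) src="ec2-user@$EC2_HOST:$src" ;;     # EC2 -> local
        *)    echo "direction must be 'up' or 'down'" >&2; return 1 ;;
    esac
    # Any non-empty fourth argument adds -r for recursive directory transfers.
    set -- java -jar "$FDT_JAR" -sshKey "$EC2_KEY" ${recursive:+-r} "$src" "$dest"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$@"     # dry run (the default): only print the command
    else
        "$@"          # actually start the transfer
    fi
}
```

For example, `fdt_transfer up ./local_dir destinationDir -r` prints the recursive upload command, and `fdt_transfer down resultsDir ./ -r` prints the retrieval command. Set `DRY_RUN=0` once the printed commands look right.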