Assembling with Shasta on EC2 - Green-Biome-Institute/AWS GitHub Wiki

This page explains how to run Shasta to assemble a genome on an AWS EC2 instance.

Shasta on Ubuntu 20.04

Note: The Shasta Linux executable used on this page is built for x86 (64-bit) processors, so choose an x86 instance rather than an ARM one. If you have questions that aren't answered here, the software's homepage or GitHub repository will usually document its use. Like wtdbg2, Shasta has thorough documentation on its GitHub; there is a link to it in the resources at the bottom of this page.

If you are using an instance that already has Shasta installed, you can skip the installation commands and go straight to running the assembly once your data is on the instance. As of 04/20/21, the custom GBI Shasta AMI has the ID ami-0a5ff45378d20cc4a and the name GBI_ShastaAssembler_Ubuntu_x86_r5.xlarge. To create an instance from it, follow the instructions on the EC2 page.

Start Ubuntu Server 20.04 Instance with a 64-bit (x86) processor.
Log in from your local terminal: ssh -i /path/to/keypairs/keypair.pem ubuntu@[instance-public-DNS]

Ex.: ssh -i /Users/flintmitchell/AWS_keypairs/flints-keypair-1.pem ubuntu@[instance-public-DNS]
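A key pair file downloaded to your computer may be readable by other users, and ssh ignores a private key whose permissions are too open. A minimal sketch, assuming bash on your local machine; `secure_key` and `connect` are hypothetical helper names, and the key path and host are placeholders:

```shell
# Hypothetical helpers (not part of AWS or OpenSSH).

# Make the key pair readable only by its owner; ssh rejects keys that are too open.
secure_key() {
  local key="${1:?usage: secure_key /path/to/keypair.pem}"
  chmod 400 "$key"
}

# Tighten the key's permissions, then open an ssh session to the instance.
connect() {
  local key="${1:?usage: connect KEY USER@HOST}"
  local host="${2:?usage: connect KEY USER@HOST}"
  secure_key "$key"
  ssh -i "$key" "$host"
}
```

Usage: connect /path/to/keypairs/keypair.pem ubuntu@[instance-public-DNS]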

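If you are using S3, install the AWS CLI so the instance can access your S3 storage buckets: sudo apt update && sudo apt install awscli. Then give it credentials, either by running aws configure or by attaching an IAM role with S3 access to the instance.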
Make a folder to organize your data and the assembly results, then move into it: mkdir data_folder_name && cd data_folder_name

Ex.: mkdir my_genome_assembly && cd my_genome_assembly

Copy your sequencing data to your data folder, either from S3 or from your local computer:

S3 (run on the instance): aws s3 cp s3://[bucket-name]/[desired-file] [path/to/instance/location]
SCP (run on your local computer): scp -i /path/my-key-pair.pem /local_path/file.filename ubuntu@[instance-public-DNS]:ec2_path/destination
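Sequencing files are often several gigabytes, so it can be worth confirming that a copy arrived intact by comparing checksums on both ends. A minimal sketch, assuming bash; `file_sha256` is a hypothetical helper that uses sha256sum (standard on Ubuntu) and falls back to shasum -a 256 (available on macOS):

```shell
# Hypothetical helper: print the SHA-256 hash of a file.
file_sha256() {
  local f="${1:?usage: file_sha256 FILE}"
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$f" | awk '{print $1}'
  elif command -v shasum >/dev/null 2>&1; then
    shasum -a 256 "$f" | awk '{print $1}'
  else
    echo "error: no SHA-256 tool found" >&2
    return 1
  fi
}
```

Run it on the original file and on the copy on the instance; the two hashes should be identical.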

Shasta is distributed as a single executable with all of its dependencies included, so installing it takes only these two commands (from the Shasta documentation):

curl -O -L https://github.com/chanzuckerberg/shasta/releases/download/0.7.0/shasta-Linux-0.7.0
chmod ugo+x shasta-Linux-0.7.0
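If you want to try a different release, the two commands can be wrapped in a function that takes the version number. This is a sketch that assumes other releases use the same shasta-Linux-<version> file name as 0.7.0; check the Shasta releases page on GitHub before relying on it. `shasta_url` and `install_shasta` are hypothetical helper names:

```shell
# Hypothetical helpers; assumes every release follows the 0.7.0 naming pattern.
shasta_url() {
  local version="${1:?usage: shasta_url VERSION}"
  echo "https://github.com/chanzuckerberg/shasta/releases/download/${version}/shasta-Linux-${version}"
}

install_shasta() {
  local version="${1:-0.7.0}"
  local binary="shasta-Linux-${version}"
  # --fail makes curl return an error instead of saving an HTTP error page.
  curl --fail -O -L "$(shasta_url "$version")" || return 1
  chmod ugo+x "$binary"
  # Succeed only if a non-empty, executable file was downloaded.
  [ -s "$binary" ] && [ -x "$binary" ]
}
```

Usage: install_shasta 0.7.0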

That's all! Now you can do an assembly with the following:

./shasta-Linux-0.7.0 --input [path/to/your-sequencing-data.FASTA]
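Before starting a long assembly, a few quick checks can save a wasted run. This sketch checks that the input starts like a FASTA file, refuses to start if a ShastaRun folder is already present (so results from an earlier run are not mixed up with the new ones), and then runs the same command as above. `looks_like_fasta` and `run_shasta` are hypothetical helper names, and the SHASTA_BIN override is our own convention, not a Shasta setting:

```shell
# Hypothetical helpers, assuming bash.

# Return success if the first non-blank line of the file starts with '>'.
looks_like_fasta() {
  local f="${1:?usage: looks_like_fasta FILE}"
  local first
  [ -s "$f" ] || return 1
  first="$(grep -m 1 -v '^[[:space:]]*$' "$f")" || return 1
  case "$first" in
    '>'*) return 0 ;;
    *) return 1 ;;
  esac
}

# Validate the input, make sure no old ShastaRun folder is present, then run Shasta.
run_shasta() {
  local input="${1:?usage: run_shasta READS.fasta}"
  local shasta="${SHASTA_BIN:-./shasta-Linux-0.7.0}"
  if ! looks_like_fasta "$input"; then
    echo "error: $input does not look like a FASTA file" >&2
    return 1
  fi
  if [ -e ShastaRun ]; then
    echo "error: ShastaRun already exists here; move or rename it first" >&2
    return 1
  fi
  "$shasta" --input "$input"
}
```

Usage, from your data folder: run_shasta my-reads.fasta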

Shasta writes all of its result files to a folder named "ShastaRun" in the directory you ran it from.
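A quick way to sanity-check the result is to count the contigs and bases in the assembled FASTA. This sketch assumes the contigs are in ShastaRun/Assembly.fasta, which is the file name we expect Shasta to use; look at the contents of your ShastaRun folder and pass a different path if needed. `fasta_summary` is a hypothetical helper name:

```shell
# Hypothetical helper: report contig count, total bases, and longest contig.
# Default path is an assumption; pass another FASTA path as the first argument.
fasta_summary() {
  local f="${1:-ShastaRun/Assembly.fasta}"
  awk '
    /^>/ { if (len > max) max = len; n++; len = 0; next }
    {
      gsub(/[ \t\r]/, "")      # ignore stray whitespace in sequence lines
      l = length($0)
      len += l
      total += l
    }
    END {
      if (len > max) max = len
      printf "contigs=%d total_bases=%d longest=%d\n", n, total, max
    }
  ' "$f"
}
```

Usage: fasta_summary (or fasta_summary path/to/other.fasta)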

Downloading your results

To copy the results to your local computer, use scp again from your local terminal, as when copying data to the instance, but add the -r flag so scp copies every file in the results folder recursively. Flags can be combined, so -r and -i become -ri. The order matters: -i takes the key file as its value, so it must come last; with -ir, scp would read "r" as the key file name. The general form is scp -ri keypair results-on-ec2-instance local-destination:
scp -ri /path/to/keypairs/keypair.pem ubuntu@[instance-public-DNS]:~/data_folder_name/results_folder_name local/path/to/results_folder
Ex.: scp -ri /Users/flintmitchell/Desktop/GBI/AWS_keypairs/flints-keypair-1.pem ubuntu@[instance-public-DNS]:~/ecoli-data/ecoli-ont /Users/flintmitchell/Desktop/GBI/Results
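The reason -ri works and -ir does not comes from how getopt-style parsers read options: an option that takes a value, like -i, uses the rest of its cluster as that value, or the next argument if the cluster ends there. This sketch is only an illustration: it uses bash's built-in getopts with a made-up `parse_flags` function that accepts -r and -i the same way, not scp's own code:

```shell
# Hypothetical illustration of getopt-style parsing for "-r" and "-i VALUE".
parse_flags() {
  local OPTIND=1 opt recursive=no identity=""
  while getopts "ri:" opt "$@"; do
    case "$opt" in
      r) recursive=yes ;;
      i) identity="$OPTARG" ;;   # value is the rest of the cluster or the next argument
      *) return 1 ;;
    esac
  done
  echo "recursive=$recursive identity=$identity"
}
```

Here parse_flags -ri key.pem prints recursive=yes identity=key.pem, while parse_flags -ir key.pem prints recursive=no identity=r, because "r" was taken as the value of -i.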

As with the other assemblers, I will update this page with more information on how Shasta works and on the parameters we can change to optimize our assemblies.

Resources for Shasta:

Shasta Quick Start (Linux): https://chanzuckerberg.github.io/shasta/QuickStart.html#QuickStartLinux

Shasta paper (bioRxiv preprint): https://www.biorxiv.org/content/10.1101/715722v1.full.pdf
