Assembling with Shasta on EC2 - Green-Biome-Institute/AWS GitHub Wiki
This page will help you if you would like to run Shasta to assemble a genome in AWS.
Shasta on Ubuntu 20.04
Note: The precompiled Linux release of Shasta used below is built for 64-bit x86, so we need an instance with an x86 architecture. And remember, if you ever have questions that aren't answered here, the software's homepage or GitHub repository will usually document how to use it. Like wtdbg2, Shasta has great documentation on its GitHub. A link to it is in the resources at the bottom of this page.
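To confirm that an instance has the right architecture, you can run a quick check like the following after logging in. This is a small sketch of our own, not part of the Shasta documentation, and the function name is_x86_64 is hypothetical. It relies only on uname -m, which prints x86_64 on 64-bit x86 machines.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the machine architecture and
# returns 0 only when it is 64-bit x86 (x86_64).
is_x86_64() {
    local arch
    arch="$(uname -m)"
    echo "Machine architecture: ${arch}"
    [ "${arch}" = "x86_64" ]
}

# Warn (but do not exit) if this machine cannot run the x86_64 Linux binary.
if ! is_x86_64; then
    echo "The shasta-Linux binary is built for x86_64; choose an x86 instance type." >&2
fi
```

If the warning appears, terminate the instance and launch a new one with an x86 instance type.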
If you are using an instance that already has Shasta set up, you can skip ahead to the step that runs the assembly. As of 04/20/21, the current custom EC2 Shasta AMI for GBI has the ID ami-0a5ff45378d20cc4a and the name GBI_ShastaAssembler_Ubuntu_x86_r5.xlarge. To create an instance from this AMI, follow the instructions on the EC2 page.
- Start an Ubuntu Server 20.04 instance with a 64-bit (x86) processor.
- Log in through a terminal:
ssh -i /path/to/keypairs/keypair.pem ubuntu@[your-instance-public-DNS]
- Ex.:
ssh -i /Users/flintmitchell/AWS_keypairs/flints-keypair-1.pem [email protected]
- If you are using S3, install the AWS CLI (awscli) so you can access your S3 storage buckets:
sudo apt update && sudo apt install awscli
- Make a folder to organize your data and the results that will come from the assembly:
mkdir data_folder_name
- example:
mkdir my_genome_assembly
- Copy your data from your local computer or from S3 into your data folder:
- S3:
aws s3 cp s3://[bucket-name]/[desired-file] [path/to/instance/location]
- SCP (run this on your local computer, not on the instance):
scp -i /path/my-key-pair.pem /local_path/file.filename ubuntu@[your-instance-public-DNS]:ec2_path/destination
- Shasta is distributed as a single executable that includes all of its dependencies. All you need to do is download it and make it executable with the following commands (from the documentation):
curl -O -L https://github.com/chanzuckerberg/shasta/releases/download/0.7.0/shasta-Linux-0.7.0
chmod ugo+x shasta-Linux-0.7.0
- That's all! Now you can do an assembly with the following:
- ./shasta-Linux-0.7.0 --input [path/to/your-sequencing-data.FASTA]
- All the result files will be returned in a folder named "ShastaRun".
- Download your results:
- We can once again use the scp command from the data-copying step, with a slight change, to copy the results to our local storage. Run it on your local computer. We also add the flag -r, which copies all of the files in a given folder recursively. Two flags can be combined, so -r and -i become -ri. Note that it is -ri and not -ir: order matters because -i takes the key file as its argument, so it must come last in the group. The pieces are scp, -ri, keypair, results folder on the EC2 instance, local destination:
scp -ri /path/to/keypairs/keypair.pem ubuntu@[your-instance-public-DNS]:~/data_folder_name/results_folder_name local/path/to/results_folder
- Ex.
scp -ri /Users/flintmitchell/Desktop/GBI/AWS_keypairs/flints-keypair-1.pem [email protected]:~/ecoli-data/ecoli-ont /Users/flintmitchell/Desktop/GBI/Results
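The steps above can be sketched as a small set of shell functions to run on the instance. This is a minimal sketch of our own, not part of the Shasta documentation. The function names, the S3 bucket/key arguments and the reads file name are hypothetical placeholders. We also assume that Shasta writes the assembled contigs to ShastaRun/Assembly.fasta, so check the Shasta documentation for your version.

```shell
#!/usr/bin/env bash
# Minimal sketch of the Shasta-on-EC2 steps above. Function names, the S3
# bucket/key arguments, and file names are hypothetical placeholders.

SHASTA_VERSION="0.7.0"
SHASTA_BIN="shasta-Linux-${SHASTA_VERSION}"

# Download the Shasta release binary and make it executable (same commands as above).
download_shasta() {
    curl -O -L "https://github.com/chanzuckerberg/shasta/releases/download/${SHASTA_VERSION}/${SHASTA_BIN}"
    chmod ugo+x "${SHASTA_BIN}"
}

# Copy one reads file from S3 into the current folder: $1 = bucket, $2 = key.
fetch_reads_from_s3() {
    aws s3 cp "s3://$1/$2" .
}

# Count records in a FASTA file: each record starts with a '>' header line.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Total number of bases in a FASTA file, skipping header lines.
fasta_total_length() {
    awk '!/^>/ { total += length($0) } END { print total + 0 }' "$1"
}

# Shasta writes its results to ./ShastaRun by default; move any earlier run
# aside (with a timestamp suffix) so the new run starts from a clean folder.
backup_previous_run() {
    if [ -d ShastaRun ]; then
        mv ShastaRun "ShastaRun.$(date +%Y%m%d%H%M%S)"
    fi
}

# Run the assembly on one FASTA file and report basic sizes.
# Assumes the contigs are written to ShastaRun/Assembly.fasta.
run_assembly() {
    local reads="$1"
    echo "Reads in ${reads}: $(count_fasta_records "${reads}")"
    backup_previous_run
    "./${SHASTA_BIN}" --input "${reads}" || return 1
    echo "Contigs: $(count_fasta_records ShastaRun/Assembly.fasta)"
    echo "Assembled bases: $(fasta_total_length ShastaRun/Assembly.fasta)"
}
```

For example, save it in your data folder as shasta_steps.sh (a hypothetical name), then run source shasta_steps.sh, download_shasta and run_assembly my_reads.fasta. Keeping each step in its own function lets you rerun a single step, such as the assembly, without downloading Shasta again.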
Just like with the other assemblers, I will be updating this page with more information on how Shasta actually works and more about the parameters that we can change to optimize our assemblies.
Resources for Shasta:
https://chanzuckerberg.github.io/shasta/QuickStart.html#QuickStartLinux