Analysis pipeline: From scratch to assembled genome on AWS

Go back to GBI AWS Wiki

This page is a quick map of the procedure for going from raw sequencing data to an assembled genome.

To start, we have our raw data on our local computer. First things first, we need to set up AWS.

  1. Log into your AWS account. If you don't already have one, contact the person in charge of your lab or class, who can set you up with an account.

  2. Navigate to the EC2 dashboard.

  3. Launch an EC2 instance from the GBI AMI, choosing an instance type and storage size with enough memory, compute, and disk space for your dataset. Save the key pair (.pem file) to a secure location.

  4. Wait for the instance to enter the running state, then note its public IP address. Open Terminal (macOS/Linux) or PowerShell (Windows) and SSH into the instance (see the SSH example below).

  5. Copy your data from your local computer onto the EC2 instance using SCP (see the scp example below).

  6. Run FastQC on the reads, trim adapters and low-quality bases, and do any other pre-processing necessary (see the FastQC/trimming sketch below).

  7. Use an assembly program to assemble your trimmed sequencing reads (see the assembly sketch below).

  8. Analyze the quality of the assembled genome with QUAST and BUSCO (see the QC example below).

  9. If necessary, run further polishing or scaffolding steps on the assembled genome and re-run the QC checks (see the polishing sketch below).

  10. If you plan to annotate the genome, mask repetitive regions and then carry out structural and functional annotation (see the repeat-masking example below).
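
The examples below are hedged sketches for steps 4–10. Filenames, usernames, IP addresses, thread counts, and memory values are placeholders to adapt to your own instance and data; the specific tools shown for trimming, assembly, polishing, and masking are common choices, not requirements.

For step 4, SSH into the instance with the key pair you saved at launch. The key filename and IP address are placeholders; the login username depends on the AMI's base image (`ubuntu` for Ubuntu-based AMIs, `ec2-user` for Amazon Linux).

```bash
# Restrict permissions on the key pair file (ssh refuses world-readable keys)
chmod 400 my-gbi-key.pem

# Connect to the instance; replace the username and IP with your own
ssh -i my-gbi-key.pem ubuntu@<EC2-public-IP>
```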
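
For step 5, `scp` copies files over SSH. Run this from your local machine; the read filenames and destination directory are placeholders.

```bash
# Copy paired-end read files from the local machine to a data directory on the instance
scp -i my-gbi-key.pem reads_R1.fastq.gz reads_R2.fastq.gz ubuntu@<EC2-public-IP>:~/data/
```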
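
For step 6, a sketch assuming paired-end Illumina reads and that FastQC and Trimmomatic are installed on the AMI (the trimmer choice is an assumption; Trim Galore or fastp work just as well, and the adapter file and thresholds shown are examples only).

```bash
# Initial quality report on the raw reads
mkdir -p fastqc_raw
fastqc reads_R1.fastq.gz reads_R2.fastq.gz -o fastqc_raw

# Adapter and quality trimming; adjust the adapter file path to your Trimmomatic install
trimmomatic PE -threads 8 \
  reads_R1.fastq.gz reads_R2.fastq.gz \
  reads_R1.trimmed.fastq.gz reads_R1.unpaired.fastq.gz \
  reads_R2.trimmed.fastq.gz reads_R2.unpaired.fastq.gz \
  ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 SLIDINGWINDOW:4:20 MINLEN:50

# Re-run FastQC on the trimmed reads to confirm the trimming helped
mkdir -p fastqc_trimmed
fastqc reads_R1.trimmed.fastq.gz reads_R2.trimmed.fastq.gz -o fastqc_trimmed
```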
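
For step 7, the assembler depends on your read type and genome size. As one example, a SPAdes run on the trimmed paired-end reads; the thread and memory values are placeholders to match your instance type.

```bash
# De novo assembly of the trimmed paired-end reads
spades.py \
  -1 reads_R1.trimmed.fastq.gz \
  -2 reads_R2.trimmed.fastq.gz \
  -o spades_assembly \
  -t 16 -m 120
```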
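
For step 8, example QUAST and BUSCO runs on the assembly. The BUSCO lineage dataset shown (`eudicots_odb10`) is only an example; pick the lineage closest to your organism.

```bash
# Contiguity metrics (N50, number of contigs, total length, etc.)
quast.py spades_assembly/scaffolds.fasta -o quast_results

# Gene-space completeness; choose a lineage appropriate to your species
busco -i spades_assembly/scaffolds.fasta -l eudicots_odb10 -m genome -o busco_results -c 16
```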
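
For step 9, one common short-read polishing route is to map the trimmed reads back to the assembly and run Pilon. This is a sketch under the assumption that bwa, samtools, and Pilon are available on the instance; the Pilon jar path and memory value are placeholders.

```bash
# Map trimmed reads back to the assembly and sort/index the alignments
bwa index spades_assembly/scaffolds.fasta
bwa mem -t 16 spades_assembly/scaffolds.fasta \
  reads_R1.trimmed.fastq.gz reads_R2.trimmed.fastq.gz |
  samtools sort -@ 8 -o aligned.sorted.bam
samtools index aligned.sorted.bam

# Polish the assembly with the mapped reads (adjust the path to pilon.jar on your AMI)
java -Xmx100G -jar pilon.jar \
  --genome spades_assembly/scaffolds.fasta \
  --frags aligned.sorted.bam \
  --output polished_assembly

# Then re-run QUAST and BUSCO on polished_assembly.fasta to confirm the polish helped
```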
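
For step 10, a RepeatMasker soft-masking run is one common first step before annotation. The species name and thread count are placeholders; choose a repeat library or species appropriate to your genome.

```bash
# Soft-mask repeats (lowercase) so downstream gene predictors can ignore them
RepeatMasker -pa 8 -species arabidopsis -xsmall -dir repeatmasker_out polished_assembly.fasta
```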

Go back to GBI AWS Wiki