Assembling with NECAT on EC2 - Green-Biome-Institute/AWS GitHub Wiki

Go back to GBI AWS Wiki

Using NECAT to assemble a genome on an EC2 instance

Note: this software must be run on an x86 infrastructure. It will not work using ARM.

Most of the following instructions come from the NECAT documentation: https://github.com/xiaochuanle/necat

If you are using an instance that is already set up to run NECAT, start at step 8. The current custom EC2 NECAT AMI for GBI has the ID ami-008ba3f536f6e3c07 and the name GBI_NECATAssembler_Ubuntu_x86_r5.xlarge. To create an instance from it, follow the instructions on the EC2 page.

  1. Start an Ubuntu instance with a 64-bit (x86) processor
  2. Log in through terminal: ssh -i /path/to/keypairs/keypair.pem [email protected] example: ssh -i /Users/flintmitchell/AWS_keypairs/flints-keypair-1.pem [email protected]
  3. If you are using S3, download awscli to gain access to S3 storage buckets: sudo apt install awscli
  4. Make a folder to organize your data and the results that will come from the assembly: mkdir data_folder_name
  • example: mkdir my_genome_assembly
  5. Copy data from your local machine or S3 to your data folder:
  • S3: aws s3 cp s3://[bucket-name]/[desired-file] [path/to/instance/location]
  • SCP: scp -i /path/my-key-pair.pem /local_path/file.filename [email protected]:ec2_path/destination
  6. Make sure perl is newer than 5.24
  • perl -v
  7. Download NECAT, extract the archive, and add the bin folder within NECAT to the PATH
  • wget https://github.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz

  • tar xzvf necat_20200803_Linux-amd64.tar.gz

  • cd NECAT/Linux-amd64/bin

  • export PATH=$PATH:$(pwd)
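
The `export` above only lasts for the current shell session, so it has to be repeated after every new login. A minimal sketch of one way to make it permanent is to append the same line to a shell startup file, once. The function name `persist_path_entry` is hypothetical, and `~/.bashrc` as the startup file is an assumption about the default Ubuntu shell setup.

```shell
# Append an "export PATH" line for a directory to a startup file,
# skipping the write if the exact line is already there.
persist_path_entry() {
    dir="$1"      # directory to add, e.g. the NECAT bin folder
    rcfile="$2"   # startup file, e.g. ~/.bashrc (assumed)
    line="export PATH=\$PATH:$dir"
    # -x matches the whole line, -F treats the pattern as a fixed string
    if ! grep -qxF "$line" "$rcfile" 2>/dev/null; then
        echo "$line" >> "$rcfile"
    fi
}

# Example call, run from inside NECAT/Linux-amd64/bin:
#   persist_path_entry "$(pwd)" "$HOME/.bashrc"
```
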

  8. Create a directory to contain the assembly
  • mkdir my-assembly
  9. Create a config file template using the following command:
  • necat.pl config my-assembly-config.txt
  10. Fill in the relevant fields of the config file:
  • vim my-assembly-config.txt, then press i to enter insert mode and fill in:

PROJECT= your assembly project name

ONT_READ_LIST=read_list.txt

GENOME_SIZE=

THREADS=

MIN_READ_LENGTH=

  • Press escape, then type :wq and press enter to save the file and exit.
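
If you would rather not edit the file by hand, the same fields can be set from the command line. This is a rough sketch using `sed`, assuming the template contains one `KEY=` line per field as listed above; `set_config_value` is a helper invented for this example, and the values in the usage comments are placeholders, not recommendations.

```shell
# Set KEY=VALUE in a config file by replacing whatever follows "KEY=".
# The value must not contain "|" or "&", which are special to this sed command.
set_config_value() {
    key="$1"
    value="$2"
    file="$3"
    # Fail loudly if the key is missing instead of silently changing nothing.
    grep -q "^${key}=" "$file" || { echo "key $key not found in $file" >&2; return 1; }
    sed -i "s|^${key}=.*|${key}=${value}|" "$file"
}

# Example calls (placeholder values):
#   set_config_value PROJECT my-project my-assembly-config.txt
#   set_config_value ONT_READ_LIST read_list.txt my-assembly-config.txt
#   set_config_value THREADS 4 my-assembly-config.txt
```
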

Using vim once again, go into the read_list.txt file:

vim read_list.txt

Press i to enter insert mode and add the full path of each data file, including the file name itself (fasta or fastq), to this document. If you have multiple files, put each one on its own line (press return before pasting the next path/filename). Once again, press escape, type :wq, and then press enter to save and exit this file.
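
With many read files, typing each path is error-prone. The sketch below builds the list automatically, writing one absolute path per line. The function name `make_read_list` and the set of file extensions it looks for are assumptions for illustration; adjust the patterns to match your own file names.

```shell
# Write the absolute path of every FASTA/FASTQ file under a data folder
# to a read list, one path per line, in sorted order.
make_read_list() {
    data_dir="$1"   # folder holding the read files
    out="$2"        # read list to create, e.g. read_list.txt
    # Resolve the folder to an absolute path so find prints full paths.
    abs_dir=$(cd "$data_dir" && pwd) || return 1
    find "$abs_dir" -type f \( -name '*.fastq' -o -name '*.fq' \
        -o -name '*.fasta' -o -name '*.fa' \) | sort > "$out"
}

# Example call:
#   make_read_list ~/my_genome_assembly read_list.txt
```

It is worth opening the resulting file once to confirm it lists only the reads you intend to assemble.
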

  11. Assembly time! Correct the raw reads:
  • necat.pl correct my-assembly-config.txt
  12. Assemble the corrected reads:
  • necat.pl assemble my-assembly-config.txt
  13. Bridge the newly-created contigs:
  • necat.pl bridge my-assembly-config.txt
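
The three stages can take a long time, and each one depends on the one before it. A small wrapper, sketched here, runs them in order and stops at the first failure so a later stage never starts on incomplete output. The function name `run_necat_stages` and the `NECAT_BIN` variable are hypothetical conveniences for this example; by default the wrapper simply calls `necat.pl` from the PATH.

```shell
# Run the correct, assemble, and bridge stages in order with one config file,
# stopping as soon as any stage exits with an error.
run_necat_stages() {
    config="$1"
    # NECAT_BIN is an optional override for this sketch; normally necat.pl is used.
    necat="${NECAT_BIN:-necat.pl}"
    for stage in correct assemble bridge; do
        echo "[$(date)] starting stage: $stage"
        "$necat" "$stage" "$config" || { echo "stage $stage failed" >&2; return 1; }
    done
    echo "[$(date)] all stages finished"
}

# Example call, using the config file created earlier:
#   run_necat_stages my-assembly-config.txt
```
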

Your results will be in the file ./project-name/6-bridge_contigs/bridged_contigs.fasta, where project-name is the PROJECT value you set in the config file.
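
Before downloading anything, a quick sanity check on the output file can save a transfer. This sketch counts the sequences and total bases in a FASTA file using only standard tools; `fasta_summary` is a name made up for this example.

```shell
# Summarize a FASTA file: number of sequences (header lines start with ">")
# and total number of bases across all sequence lines.
fasta_summary() {
    fasta="$1"
    [ -s "$fasta" ] || { echo "missing or empty: $fasta" >&2; return 1; }
    contigs=$(grep -c '^>' "$fasta")
    # Drop header lines and line breaks, then count the remaining characters.
    bases=$(grep -v '^>' "$fasta" | tr -d '\n\r' | wc -c | tr -d ' ')
    echo "contigs=$contigs bases=$bases"
}

# Example call:
#   fasta_summary ./project-name/6-bridge_contigs/bridged_contigs.fasta
```

If the total base count is far from the GENOME_SIZE you set in the config file, it is worth looking at the logs before going further.
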

  14. Download your results. The results and all other information produced by NECAT (logs, intermediate files, etc.) are put into a folder named after the PROJECT value in your config file, inside the directory where you ran the necat.pl commands. So if you ran them in ~/data_folder_name, the results will be in ~/data_folder_name/project-name. We can once again use the scp command from step 5 (with a slight change) to copy the results to our local storage. We will also use the flag -r, which copies all the files in a given folder recursively. The two flags can be combined as -ri (not -ir: order matters, because -i must come directly before the key pair path). The general form is scp -ri keypair results-on-ec2-instance local-path:

scp -ri /path/to/keypairs/keypair.pem [email protected]:~/data_folder_name/results_folder_name local/path/to/results_folder

Example: scp -ri /Users/flintmitchell/Desktop/GBI/AWS_keypairs/flints-keypair-1.pem [email protected]:~/lambda-phage-data/lambda-phage-ont /Users/flintmitchell/Desktop/GBI/Results
