Assembling with NECAT on EC2 - Green-Biome-Institute/AWS GitHub Wiki
Using NECAT to assemble a genome on an EC2 instance
Note: this software must be run on an x86 infrastructure. It will not work using ARM.
Most of the following instructions come from the NECAT documentation: https://github.com/xiaochuanle/necat
If you are using an instance that is already set up to run NECAT, start at step 8 (creating a directory to contain the assembly). The current custom EC2 NECAT AMI for GBI has the ID ami-008ba3f536f6e3c07 and the name GBI_NECATAssembler_Ubuntu_x86_r5.xlarge.
To create an instance from this AMI, follow the instructions on the EC2 page.
- Start an Ubuntu instance with a 64-bit (x86) processor
- Log in through terminal:
ssh -i /path/to/keypairs/keypair.pem [email protected]
- example:
ssh -i /Users/flintmitchell/AWS_keypairs/flints-keypair-1.pem [email protected]
- If you are using S3, install awscli to gain access to S3 storage buckets:
sudo apt install awscli
- Make a folder to organize your data and the results that will come from the assembly:
mkdir data_folder_name
- example:
mkdir my_genome_assembly
- Copy data from local or S3 to your data folder:
- S3:
aws s3 cp s3://[bucket-name]/[desired-file] [path/to/instance/location]
- SCP:
scp -i /path/my-key-pair.pem /local_path/file.filename [email protected]:ec2_path/destination
- Make sure perl is newer than 5.24
perl -v
- Download NECAT, extract the archive, and add the bin folder within NECAT to the PATH:
wget https://github.com/xiaochuanle/NECAT/releases/download/v0.0.1_update20200803/necat_20200803_Linux-amd64.tar.gz
tar xzvf necat_20200803_Linux-amd64.tar.gz
cd NECAT/Linux-amd64/bin
export PATH=$PATH:$(pwd)
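Note that an export typed at the prompt only lasts for the current login session, so after reconnecting to the instance necat.pl would no longer be found. One possible way to make the change permanent is to append the export line to a shell startup file. This is a minimal sketch; persist_path is a hypothetical helper, and using ~/.bashrc assumes the default bash shell on Ubuntu.

```shell
# Append a PATH export for directory $1 to startup file $2,
# unless the identical line is already present.
persist_path() {
    dir="$1"
    rc_file="$2"
    line="export PATH=\$PATH:$dir"
    touch "$rc_file"
    grep -qxF "$line" "$rc_file" || echo "$line" >> "$rc_file"
}

# Example call, run from inside NECAT/Linux-amd64/bin:
# persist_path "$(pwd)" "$HOME/.bashrc"
```

After the example call, running command -v necat.pl in a new session should print the full path to the script.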
- Create a directory to contain the assembly
mkdir my-assembly
- Create a config file template using the following command:
necat.pl config my-assembly-config.txt
- Modify the relevant settings:
vim my-assembly-config.txt
Press i to insert text, then fill in the following fields:
PROJECT= your assembly project name
ONT_READ_LIST=read_list.txt
GENOME_SIZE=
THREADS=
MIN_READ_LENGTH=
- Press escape, then type :wq and press enter to save the file and exit.
- Using vim once again, open the read_list.txt file:
vim read_list.txt
Type i to insert, then add the full path of each data file, including the file name itself (fasta or fastq). If you have multiple files, put each one on a new line (press return before copying the next path/filename). Once again, press escape, type :wq, and then press enter to save and exit this file.
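If you would rather not edit these files by hand, the same two edits can be scripted. The sketch below is one possible approach, not something from the NECAT documentation: make_read_list and set_config_value are hypothetical helpers, the file extensions are assumed to be .fastq or .fasta, and the values in the example calls are placeholders to replace with your own.

```shell
# Write the absolute path of every .fastq/.fasta file in directory $1
# to the file $2, one path per line.
make_read_list() {
    data_dir="$(cd "$1" && pwd)" || return 1
    find "$data_dir" -maxdepth 1 -type f \
        \( -name '*.fastq' -o -name '*.fasta' \) | sort > "$2"
}

# Set KEY=VALUE in a config file: $1 = key, $2 = value, $3 = config file.
# Only a line that already starts with "KEY=" is rewritten.
set_config_value() {
    sed "s|^$1=.*|$1=$2|" "$3" > "$3.tmp" && mv "$3.tmp" "$3"
}

# Example calls (placeholder values, adjust for your genome and instance):
# make_read_list ~/my_genome_assembly read_list.txt
# set_config_value PROJECT my-project my-assembly-config.txt
# set_config_value GENOME_SIZE 4600000 my-assembly-config.txt
# set_config_value THREADS 4 my-assembly-config.txt
```

A THREADS value of 4 would match the 4 vCPUs of an r5.xlarge instance; check the result with cat read_list.txt and cat my-assembly-config.txt before starting the assembly.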
- Assembly time! Correct raw reads, using the config file created above:
necat.pl correct my-assembly-config.txt
- Assemble the corrected raw reads:
necat.pl assemble my-assembly-config.txt
- Bridge the newly-created contigs:
necat.pl bridge my-assembly-config.txt
Your results will be in the file ./project-name/6-bridge_contigs/bridged_contigs.fasta, where project-name is the PROJECT value set in the config file.
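The three stages above can also be chained so that a later stage only starts if the previous one succeeded, with all output kept in a log file. This is a rough sketch under two assumptions: that necat.pl returns a non-zero exit status when a stage fails, and that the hypothetical NECAT_CMD variable is only there so the function can be tried with a stand-in command.

```shell
# Run the NECAT stages in order, stopping at the first failure.
# $1 = config file, $2 = log file.
# NECAT_CMD (hypothetical) overrides the command; it defaults to necat.pl.
run_necat_stages() {
    config="$1"
    log="$2"
    for stage in correct assemble bridge; do
        echo "[$(date)] starting stage: $stage" >> "$log"
        "${NECAT_CMD:-necat.pl}" "$stage" "$config" >> "$log" 2>&1 || {
            echo "stage '$stage' failed; see $log" >&2
            return 1
        }
    done
    echo "[$(date)] all stages finished" >> "$log"
}

# Example call:
# run_necat_stages my-assembly-config.txt necat.log
```

Because an assembly can run for a long time, it may be worth starting it inside a tool such as tmux or screen so that it keeps running if the ssh connection drops.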
- Downloading your results. The results and all information produced by NECAT (logs, intermediate files, etc.) are put into a folder named after the PROJECT value in your config file, inside the directory where you ran necat.pl. So if you ran NECAT from ~/data_folder_name with PROJECT=results_folder_name, the results are in ~/data_folder_name/results_folder_name. We can once again use the scp command from step 5 (with a slight change) to copy the results to our local storage. We will also use the flag -r, which copies all the files in a given folder recursively. Two flags can be sent together, so -r and -i become -ri (note: not -ir; order matters, because -i must be followed directly by the keypair path). The general form is scp -ri keypair results-on-ec2-instance local-destination:
scp -ri /path/to/keypairs/keypair.pem [email protected]:~/data_folder_name/results_folder_name local/path/to/results_folder
Example: scp -ri /Users/flintmitchell/Desktop/GBI/AWS_keypairs/flints-keypair-1.pem [email protected]:~/lambda-phage-data/lambda-phage-ont /Users/flintmitchell/Desktop/GBI/Results