Chloroplast and Mitochondrial Assemblies from reads - Green-Biome-Institute/AWS GitHub Wiki

To assemble a chloroplast or mitochondrial genome from our sequencing reads, we use a program called GetOrganelle. We chose it based on the comparison of chloroplast genome assemblers cited below (Freudenthal et al., 2020). This program and its default databases are already installed on the GBI Bioinformatics AMI. If you need help installing it somewhere else, please follow the instructions on the GetOrganelle GitHub page.

The executable within GetOrganelle that we will be using is a Python script named get_organelle_from_reads.py.

There are many examples of this command's usage, so I will show just three here: a chloroplast assembly and a mitochondrial assembly, both using the default embryophyta database, and a chloroplast assembly using a custom seed database.

The general command looks like:

get_organelle_from_reads.py -1 [first FASTQ file] -2 [second FASTQ file] -t [number of desired threads] -o [desired output directory name] -F [target organelle type, e.g. embplant_pt or embplant_mt] -R [maximum number of extension rounds] -k [desired k-mer value or series of k-mer values] -s [custom seed database (FASTA file), if necessary]
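To make the role of each flag explicit, the general command can be assembled from named values. The sketch below only prints the command, it does not run it. The helper name build_getorganelle_cmd and the sample file names are made up for illustration and are not part of GetOrganelle. It assumes file names without spaces.

```shell
# Hypothetical helper (not part of GetOrganelle): print the command line
# that would be run, given the values for each flag.
# Arguments: reads1 reads2 threads outdir organelle_type max_rounds kmers [seed_fasta]
build_getorganelle_cmd() {
    cmd="get_organelle_from_reads.py -1 $1 -2 $2 -t $3 -o $4 -F $5 -R $6 -k $7"
    # -s is optional: append it only when a custom seed FASTA is given
    if [ -n "${8:-}" ]; then
        cmd="$cmd -s $8"
    fi
    printf '%s\n' "$cmd"
}

# Print the command for a chloroplast assembly with placeholder file names
build_getorganelle_cmd sample_R1.fq.gz sample_R2.fq.gz 8 sample_PT embplant_pt 15 21,45,65,85,105
```

Printing the command first is a cheap way to check that the organelle type, output directory name and seed file agree with each other before starting a long run.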

Now for examples:

Chloroplast assembly:

get_organelle_from_reads.py -1 GBI-2_Cmolli_S78_R1_001_val_1.fq.gz -2 GBI-2_Cmolli_S78_R2_001_val_2.fq.gz -t 8 -o CMollis_PT_embplantseed -F embplant_pt -R 15 -k 21,45,65,85,105

Mitochondrial assembly:

get_organelle_from_reads.py -1 GBI-2_Cmolli_S78_R1_001_val_1.fq.gz -2 GBI-2_Cmolli_S78_R2_001_val_2.fq.gz -t 8 -o CMollis_MT_embplantseed -F embplant_mt -R 30 -k 21,45,65,85,105

Chloroplast assembly with a custom seed database named Clinariifolia_NADH_HQ384819.fasta:

get_organelle_from_reads.py -1 GBI-2_Cmolli_S78_R1_001_val_1.fq.gz -2 GBI-2_Cmolli_S78_R2_001_val_2.fq.gz -t 8 -o CMollis_PT_customseed -F embplant_pt -R 15 -k 21,45,65,85,105 -s Clinariifolia_NADH_HQ384819.fasta
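Because these runs can take a long time, it may be worth confirming that the read files (and the seed file, if one is used) exist and are non-empty before starting. The following is a minimal sketch of such a check. The function name check_inputs is our own and is not something GetOrganelle provides.

```shell
# Hypothetical pre-flight check (not part of GetOrganelle): return non-zero
# if any of the given files is missing or empty.
check_inputs() {
    status=0
    for f in "$@"; do
        if [ ! -s "$f" ]; then
            echo "missing or empty input: $f" >&2
            status=1
        fi
    done
    return "$status"
}

# Check the files used in the custom seed example above
if check_inputs GBI-2_Cmolli_S78_R1_001_val_1.fq.gz GBI-2_Cmolli_S78_R2_001_val_2.fq.gz Clinariifolia_NADH_HQ384819.fasta; then
    echo "inputs found"
else
    echo "fix the paths listed above before starting the assembly"
fi
```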

To see an example of running the three commands above in one script that then stops the EC2 instance it is running on, see the following bash script: https://github.com/Green-Biome-Institute/AWS/blob/master/get_organelles.sh
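The idea behind such a script can be sketched as follows. This is not the linked script, only a minimal illustration of the pattern: run the assemblies one after another, then power the machine off. The names DRY_RUN and run_step are our own. The sketch assumes the instance's shutdown behavior is set to "stop" (the EC2 default), so that an operating-system shutdown stops the instance rather than terminating it.

```shell
# Sketch only: DRY_RUN and run_step are illustrative names, not part of GetOrganelle.
# With DRY_RUN=1 (the default here) each step is printed instead of executed.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        printf 'would run: %s\n' "$*"
    else
        "$@"
    fi
}

R1=GBI-2_Cmolli_S78_R1_001_val_1.fq.gz
R2=GBI-2_Cmolli_S78_R2_001_val_2.fq.gz

# Chloroplast, then mitochondrial assembly, with the settings from the examples above
run_step get_organelle_from_reads.py -1 "$R1" -2 "$R2" -t 8 -o CMollis_PT_embplantseed -F embplant_pt -R 15 -k 21,45,65,85,105
run_step get_organelle_from_reads.py -1 "$R1" -2 "$R2" -t 8 -o CMollis_MT_embplantseed -F embplant_mt -R 30 -k 21,45,65,85,105

# Power off once the assemblies have finished, whether or not they succeeded,
# so the instance does not keep running after the work is done.
run_step sudo shutdown -h now
```

Running it as shown only prints the three steps. Setting DRY_RUN=0 executes them, including the shutdown, so it should only be done on an instance you intend to stop.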

Citations:

Freudenthal, J.A., Pfaff, S., Terhoeven, N. et al. A systematic comparison of chloroplast genome assembly tools. Genome Biol 21, 254 (2020). https://doi.org/10.1186/s13059-020-02153-6