01c_Annotation of bins using antiSMASH - esogin/seagrassOmics GitHub Wiki
Annotate select MAGs with antiSMASH
EM Sogin
Created: May 22, 2019
Updated: May 22, 2019
Annotate MAGs with antiSMASH
After identifying some bins of interest based on their differential abundance inside vs. outside the seagrass meadow, I wanted to determine whether secondary metabolite gene clusters are present in these MAGs. Initially I downloaded the antiSMASH Conda environment, predicted genes in each bin with Prodigal, and annotated the bins with antiSMASH. My first try failed: I gave antiSMASH the coordinates file instead of the genes file (this is what the antiSMASH people said to try).
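for i in *fa; do
# ${i%%.fa} strips the .fa suffix, giving the bin name used for the output folder
mkdir -p "${i%%.fa}"
# Prodigal: -o gene coordinates (GenBank-like format by default), -a protein translations, -d nucleotide gene sequences
prodigal -i "$i" -o "${i%%.fa}/${i%%.fa}_coords.gbk" -a "${i%%.fa}/${i%%.fa}_orfs.faa" -d "${i%%.fa}/${i%%.fa}_genes.fa"
# antiSMASH is given the coordinates file here, which is the input that failed
antismash "${i%%.fa}/${i%%.fa}_coords.gbk" -c 48 --taxon bacteria --outputfolder "${i%%.fa}" --full-hmmer
done
# Removes the bin FASTA files from the working directory; only safe if they are copies
rm *fa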
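Since the first try failed on the input file, a quick check of what each file actually contains can save a long run. The sketch below is only a rough sanity check, assuming that a usable input has either FASTA headers (`>`) or a GenBank `ORIGIN` section, and that a coordinates-only file has neither. The helper name `has_sequence` is made up for illustration, and the check does not distinguish nucleotide from protein records.

```shell
# Hypothetical helper: succeed only if the file appears to contain sequence records.
# FASTA records start with '>'; GenBank files that carry sequence have an ORIGIN line.
has_sequence() {
    grep -q -e '^>' -e '^ORIGIN' "$1"
}

# Flag any per-bin coordinates file that has no sequence in it.
for f in */*_coords.gbk; do
    [ -e "$f" ] || continue        # skip the literal pattern when nothing matches
    has_sequence "$f" || echo "$f: no sequence records found"
done
```

Running this in the folder that holds the per-bin output directories lists the files that would probably not work as antiSMASH input on their own.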
For my second try, I decided to first run a single bin with the predicted genes and compare the results with those from the web interface:
antismash ${i%%.fa}/${i%%.fa}_coords.gbk -c 48 --taxon bacteria --outputfolder ${i%%.fa} --full-hmmer
The output was somewhat encouraging (3 clusters compared with 5 from the web version). The difference is likely due to the different software versions (v4.1 on the command line vs. v5.0 on the web interface). A new command-line version of antiSMASH should become available in the future, so it is worth keeping an eye out for it.
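To make this kind of comparison for more than one bin, the per-bin result folders can be tallied. This is a minimal sketch, assuming each antiSMASH result folder holds one GenBank file per predicted cluster, named `*.clusterNNN.gbk` (which I believe is the v4 pattern) or `*.regionNNN.gbk` (v5); both patterns should be checked against a real output folder first. The helper name `count_clusters` is made up for illustration.

```shell
# Hypothetical helper: count per-cluster GenBank files in one antiSMASH result folder.
# Assumed naming: *.cluster001.gbk (v4) or *.region001.gbk (v5).
count_clusters() {
    find "$1" -maxdepth 1 -type f \
        \( -name '*.cluster[0-9][0-9][0-9].gbk' -o -name '*.region[0-9][0-9][0-9].gbk' \) \
        | wc -l | tr -d ' '
}

# Print a tab-separated table: bin name, number of cluster files.
for dir in */; do
    [ -d "$dir" ] || continue      # skip the literal pattern when there are no folders
    printf '%s\t%s\n' "${dir%/}" "$(count_clusters "$dir")"
done
```

The counts are only roughly comparable between versions, since v4 "clusters" and v5 "regions" are not necessarily delimited in the same way.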
For now, I decided to upload each bin to the web interface.
Here are some notes:
Notes on antiSMASH Analysis
- The above script currently works only with antiSMASH 4.1. Instead, upload the MAGs to the web server interface and save the results for interpretation later. (This is OK for only 25 MAGs; with more, consider reimplementing.)
- Initial impressions
2a. MAG metabat.232 contains 67 gene clusters with domains identified by antiSMASH. This is pretty high, given that most MAGs (in general) have far fewer or no antibiotic gene clusters. This MAG is a Cellvibrionaceae classified within the genus Teredinibacter; it would be interesting to see how closely related it is to other members of the group. However, its abundance is elevated in only one library within the meadow. Check 16S reads.
2b. In general, our MAGs have biosynthetic gene clusters for natural products detected by antiSMASH. Many fall in the terpene and NRPS (nonribosomal peptide synthetase) categories. Terpenes are interesting as they are largely produced by plants. It might be worthwhile to check the other MAGs, from outside the meadow, for similar gene clusters.
2c. Idea: check the seagrass genome paper for terpenes in the genome report. Do we know whether seagrasses produce them? Has this been characterized? The genome paper suggests that terpenoid genes are reduced to only 2 genes.
Next steps:
- Do a similar analysis for MAGs from outside or at the edge of the meadow sediment: is secondary metabolite synthesis a common feature of marine sediment bacteria?
- Check 16S PacBio reads for the Cellvibrionaceae 16S sequence ✅ - Cellvibrionaceae is in the ASV list but is not a factor driving the discrimination between habitats (which follows from the visualization results).