Methodology - Silantoi33/Genomic-Analysis-of-Carbapenem-Resistant-Pseudomonas-aeruginosa GitHub Wiki

Step 1: Downloading Sequence Data

We obtained the sequences of all our strains from NCBI-GenBank by utilizing the provided accession numbers. The sequences were previously assembled, and unfortunately, we were unable to locate raw data for further analysis. The data retrieval was conducted through the Ubuntu terminal.

Setup

sudo apt install python3-pip
pip install ncbi-acc-download

ncbi-acc-download Version=0.2.8

Bash script

!/bin/bash
#Authors: Silantoi & Kevin
#Date: 27/11/2023
#Modified Date: 07/12/2023
#Script name:fasta.acc.sh
#Use: bash fasta.acc.sh

#loops through each file
#links

for file in $(cat /home/sequser/Downloads/GACRPA-Project/fasta.acc.list.txt)
do
#Download fasta files 
ncbi-acc-download --format fasta $file
done

#Download fasta links using the links

for a in $(cat /home/sequser/Downloads/GACRPA-Project/links.txt)
do
wget $a

#gunzip .gz files

gunzip *.gz
done

Output

/home/sequser/Downloads/GACRPA-Project/fasta_files AE004091.2.fa  
                                                   CP014999.1.fa  
                                                   CP015001.1.fa  
                                                   CP015003.1.fa  
                                                   CP021380.2.fa  
                                                   LVXB01.1.fsa_nt  
                                                   PHSS01.1.fsa_nt  
                                                   PHST01.1.fsa_nt

Step 2: Running strain typing (MLST)

Bash script

#!/bin/bash
#Authors: Silantoi & Kevin
#Date: 27/11/2023
#Modified Date: 07/12/2023
#Script name:gacrpa_mlst.sh
#Usage: bash gacrpa_mlst.sh
#Description: For loop bash script to identify the multi locus sequence type (MLST)

# do mlst

conda activate mlst

for a in *.fa
do
mlst $a | cut -f 1,2,3 >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_mlst.tsv
done

for b in *.fsa_nt
do
mlst $b | cut -f 1,2,3 >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_mlst.tsv
done

Output

AE004091.2.fa	paeruginosa	-
CP014999.1.fa	paeruginosa	277
CP015001.1.fa	paeruginosa	277
CP015003.1.fa	paeruginosa	277
CP021380.2.fa	paeruginosa	277
LVXB01.1.fsa_nt	paeruginosa	277
PHSS01.1.fsa_nt	paeruginosa	277
PHST01.1.fsa_nt	paeruginosa	277

Step 3: Screening for AMR, Virulence and Plasmid determinants

Abricate

Setup

conda create -n abricate -c bioconda abricate

List dependencies;

conda activate abricate

Version= abricate 1.0.1

abricate --list

all databases available in abricate will be listed

Output

DATABASE	SEQUENCES	DBTYPE	DATE
ecoli_vf	2701	nucl	2021-Mar-27
resfinder	3142	nucl	2023-Nov-10
ecoh	597	nucl	2021-Mar-27
megares	6635	nucl	2021-Mar-27
ncbi	5386	nucl	2021-Mar-27
card	2631	nucl	2021-Mar-27
plasmidfinder	460	nucl	2021-Mar-27
argannot	2223	nucl	2021-Mar-27
vfdb	2597	nucl	2021-Mar-27

We will use CARD, ResFinder, & Vfdb

CARD

The Comprehensive Antibiotic Resistance Database (CARD; http://arpcard.mcmaster.ca) is a manually curated resource containing high quality reference data on the molecular basis of antimicrobial resistance (AMR), with an emphasis on the genes, proteins and mutations involved in AMR.

ResFinder

ResFinder uses BLAST for identification of acquired antimicrobial resistance genes in whole-genome data.

Vfdb

The virulence factor database (VFDB, http://www.mgc.ac.cn/VFs/) provides up-to-date knowledge of virulence factors (VFs) of various bacterial pathogens.

Bash script

#!/bin/bash
#Authors: Silantoi & Kevin
#Date: 27/11/2023
#Modified Date: 07/12/2023
#Script name: gacrpa.abricate.sh
#Usage: bash script name
#Decription:For loop bash script to screen for AMR, Virulence and plasmids determinants
#Tool: ABRICATE
#Database: Resfinder and CARD for AMR genes, VFDB - Virulence factors and Plasmidfinder - plasmids 


#do resfinder first

for a in *.fa
do
abricate --db resfinder --csv $a >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_resfinder.csv
done

for b in *.fsa_nt
do
abricate --db resfinder --csv $b >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_resfinder.csv
done

#do now card

for c in *.fa
do
abricate --db card --csv $c >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_card.csv
done

for d in *.fsa_nt
do
abricate --db card --csv $d >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_card.csv
done

#do virulence

for e in *.fa
do
abricate --db vfdb --csv $e >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_vfdb.csv
done

for f in *.fsa_nt
do
abricate --db vfdb --csv $f >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_vfdb.csv
done

# do plasmidfinder

for g in *.fa
do
abricate --db plasmidfinder --csv $g >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_plasmids.csv
done

for h in *.fsa_nt
do
abricate --db plasmidfinder --csv $h >> /home/sequser/Downloads/GACRPA-Project/gacrpa_results/gacrpa_plasmids.csv
done

Results

Resfinder

card

vfdb

Step 4: Pangenome Annotation using Prokka 1.11

Genome annotation is the process of identifying functional elements along the sequence of a genome, thus giving meaning to it. It is necessary because the sequencing of DNA produces sequences of unknown function.

Setup

conda create -n prokka

conda activate prokka

conda install -c bioconda prokka

Version=prokka 1.11

Script

#!/bin/bash
#Authors: Silantoi & Kevin
#Date: 27/11/2023
#Modified Date: 07/12/2023
#Script name: gacrpa.prokka.sh
#Usage: bash gacrpa.prokka.sh
#Description: Annotation of gemones using PROKKA: rapid prokaryotic genome annotation tool

for file in *.fa
do 
tag=${file%.fa}
prokka --cpus 16 --prefix "$tag" --locustag "$tag" --outdir "$tag"_prokka "$file"
done


for file in *.fsa_nt
do
tag=${file%.fsa_nt}
prokka --cpus 16 --prefix "$tag" --locustag "$tag" --outdir "$tag"_prokka "$file"
done

#move output to another directory

mv *_prokka ./gacrpa_results/prokka

Output

AE004091.2_prokka
CP014999.1_prokka  
CP015001.1_prokka  
CP015003.1_prokka  
CP021380.2_prokka 
LVXB01.1_prokka  
PHSS01.1_prokka  
PHST01.1_prokka

Step 5: Gene alignment using Roary 3.13.0

#Roary-is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome

Setup

conda create -n roary

conda activate roary

mamba install roary
Version=roary 3.13.0

Prokka output files are in the.gff file format. The .gff files are used in roary pipeline to align genes.

We cd to all prokka results directory and copied the .gff files to new directory called GFF

GFF output

AE004091.2.gff  
CP014999.1.gff  
CP015001.1.gff  
CP015003.1.gff  
CP021380.2.gff  
LVXB01.1.gff  
PHSS01.1.gff  
PHST01.1.gff

Roary Command to alignment .gff files

roary -f roaryresults -e -n -v -mafft -p 16 *.gff

Output

accessory_binary_genes.fa         
accessory.header.embl          
clustered_proteins          
core_accessory.tab          
gene_presence_absence.csv       
number_of_genes_in_pan_genome.Rtab  
pan_genome_reference.fa
accessory_binary_genes.fa.newick  
accessory.tab                  
core_accessory_graph.dot    
core_alignment_header.embl  
gene_presence_absence.Rtab      
number_of_new_genes.Rtab            
summary_statistics.txt
accessory_graph.dot               
blast_identity_frequency.Rtab  
core_accessory.header.embl  
core_gene_alignment.aln     
number_of_conserved_genes.Rtab  
number_of_unique_genes.Rtab

The file that has aligned sequences is core_gene_alignment.aln

Step 6: Phylogenetic construction using RAxML-NG 0.9.0

Setup

conda create -n raxml

conda activate raxml

conda install raxml-ng

Run raxml-ng

raxml-ng --all --msa core_gene_alignment.aln --model LG+G8+F --tree pars{10} --bs-trees 200

Output

core_gene_alignment.aln.raxml.bestModel  
core_gene_alignment.aln.raxml.bootstraps  
core_gene_alignment.aln.raxml.mlTrees  
core_gene_alignment.aln.raxml.startTree
core_gene_alignment.aln.raxml.bestTree   
core_gene_alignment.aln.raxml.log         
core_gene_alignment.aln.raxml.rba      
core_gene_alignment.aln.raxml.support

Visualization of Phylogenetic tree using (Interactive Tree Of Life) ITOL

Upload this output file from raxml-ng core_gene_alignment.aln.raxml.bestTree
Once uploaded, on the left side of your screen, on Branch lengths tap "ignore" to visualize your tree.

Results

Tree

CSI phylogeny tree

Step 7: Comparative Genomic Analysis Using BRIG 0.95

We followed a youtube tutorial to use BRIG

To install we follow this youtube tutorial to install BRIG. Click here
After installing we still used youtube tutorial to run BRIG. Click here

Results

BRIG IMAGE