5. Generation of read mapping plasmid figure - maxlcummins/APEC-MGEN-2018 GitHub Wiki
APEC samples commonly carry large IncF virulence plasmids. Recently the complete sequence of an Australian ColV virulence plasmid, pCERC4, was published. This plasmid was decided as an appropriate reference sequence that could be used to investigate the possibility that our APEC strains carry a virulence plasmid similar to pCERC4.
We used BWA and samtools to map the reads of our APEC sequences to pCERC4, determined the median read depth across short bin sizes (length 10), and plotted these bins as present or absent depending on whether they had a read depth greater than or equal to 10. This allows visualisation of regions of plasmid homology on a strain by strain basis. Addition of an annotated schematic of the genetic content of pCERC4 shows which regions of pCERC4 are likely to be present within the APEC strains under analysis.
After downloading pCERC4 via NCBI, the figure was produced as follows:
bwa index pCERC4_KU578032.fasta
An array was defined as below to facilitate the use of a for loop
array=($(ls read_directory/*.fastq.gz))
A for loop was used to feed read pairs into an executable 'bwa.qsub'.
for ((i=0; i<${#array[@]}; i+=2));
do qsub -v REF=pCERC4_KU578032.fasta,R1=${array[i]},R2=${array[i+1]},OUT=${array[i]%R1_001.fastq.gz} bwa.qsub;
done
'bwa.qsub' contained, along with the appropriate job submission syntax, the following
bwa mem -t16 -MY $REF $R1 $R2 | samtools view -ubS -F 0x904 - | samtools sort -@8 -T $REF - -o ${R1}.bam           
for fn in `ls ./*.bam`;
do samtools depth ${fn} > ${fn%.bam}_coverage.txt;
done;
mkdir Coverage_files;
mv *_coverage.txt Coverage_files
A custom script based on that produced by Matt DeMaere was used to plot the heatmap. Here the binsize and ticksize were 250 and 100, respectively.
python chooklord.py -b <binsize> -t <ticksize> Coverage_files pCERC4_plasmidmap_b10_t100.pdf
Manual image manipulation was required to combine a simplified schematic of pCERC4 that visualises some genetic elements of interest associated with virulence and microbial evolution. This schematic was generated in SnapGene.
This SnapGene schematic was then added atop the plasmid read-map plot via an image editor to produce the following final image. You can find a PDF version of this final image here
- Python 2.7
- bwa - 0.7.17
- matplotlib - 2.1.0
- numpy - 1.13.3
- pandas - 0.20.3
- samtools - 1.9
- scipy - 0.19.1
Many thanks to Matt DeMaere for his efforts in putting together the original script and helping to adapt and repurpose it for our investigation.