PathoMap - PathoScope/PathoScope GitHub Wiki

3. The PathoMap module

This module takes your reads and maps them against a target library. It then maps the reads that aligned to the target against a filter library. Any read that maps to the filter library with a score equal to or greater than its mapping score against the target library is discarded. Reads that map with a better score to the target library are used downstream by the PathoID module.
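The subtraction rule above can be sketched in a few lines of Python. This is only an illustration of the rule as described, not PathoMap's actual code: the function name `subtract_filter_hits`, the read IDs, and the scores are all hypothetical, and each read is reduced to a single best score per library.

```python
def subtract_filter_hits(target_scores, filter_scores):
    """Keep reads whose target score strictly beats their filter score.

    Both arguments map a read ID to its best alignment score
    (hypothetical data layout, chosen for illustration only).
    """
    kept = {}
    for read_id, target_score in target_scores.items():
        filter_score = filter_scores.get(read_id)
        # A read with no filter hit is kept; a tie or a better
        # filter score means the read is discarded.
        if filter_score is None or target_score > filter_score:
            kept[read_id] = target_score
    return kept


# Made-up scores: read2 ties against the filter library, so it is dropped.
target_scores = {"read1": 180, "read2": 150, "read3": 120}
filter_scores = {"read2": 150, "read3": 90}
kept = subtract_filter_hits(target_scores, filter_scores)
print(sorted(kept))
```

Note that the comparison is strict: a read that scores equally well against both libraries is removed, which matches the "equal to or greater than" wording above.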

This computational subtraction mapping approach starts by building genome indices for your target and filter libraries, splitting the libraries as necessary to comply with the Bowtie2 architecture (in files of up to 4.3 Gb; Step 1). Then, PathoMap maps your reads (fastq) against each of the indices and creates a single merged sam file (Step 2). The resulting file is a filtered sam file, which serves as input for PathoID.

Diagram 2:

As we stated previously, we will be using reads derived from a Titi Monkey outbreak of unknown viral origin (Chen et al. 2012). Here, we are asking what kind of virus is likely responsible for the outbreak by mapping all the reads against our target library. This library contains all the sequence data from GenBank under the virus taxonomy ID, and it was obtained using PathoScope’s library module. Originally, the authors didn’t know the etiologic agent of the outbreak. Our case is different because we have included the culprit’s genome in the target library. Let’s see how it goes.

First, let’s map the reads from SRR167721 to our Target library...

python pathoscope.py MAP -U ../data/SRR167721.fastq -targetRefFiles viral -filterRefFiles human.fa,phix174.fa  -outDir ../results -outAlign SRR167721.sam  -expTag tutorial

Let’s dissect the process. We told PathoMap that our input file is SRR167721.fastq (-U), and we indicated our Target and Filter files (-targetRefFiles, -filterRefFiles), the output directory (-outDir), the output filename (-outAlign), and an experiment tag (-expTag). PathoMap is designed so that you can ‘enter’ it at any point. For example, if you already have indices built, you can simply specify their prefixes (-targetIndexPrefixes, -filterIndexPrefixes), or if you already have alignment files (sam) for your data against the Target and Filter libraries, you can start from there as well (-targetAlignFiles, -filterAlignFiles).
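To make the "enter at any point" idea concrete, here is a minimal Python sketch that assembles the MAP command line from either reference files or prebuilt index prefixes. The helper `build_map_args` is hypothetical and not part of PathoScope. It uses only the flags named above, and it assumes that index prefixes are passed as a comma-separated list in the same way as reference files, which you should check against the program's help output.

```python
def build_map_args(reads, out_dir, out_align, exp_tag,
                   target_refs=None, filter_refs=None,
                   target_index_prefixes=None, filter_index_prefixes=None):
    """Return the MAP command as an argument list (hypothetical helper)."""
    args = ["python", "pathoscope.py", "MAP", "-U", reads]
    # Prefer prebuilt indices when given; otherwise fall back to reference files.
    if target_index_prefixes:
        args += ["-targetIndexPrefixes", ",".join(target_index_prefixes)]
    elif target_refs:
        args += ["-targetRefFiles", ",".join(target_refs)]
    if filter_index_prefixes:
        args += ["-filterIndexPrefixes", ",".join(filter_index_prefixes)]
    elif filter_refs:
        args += ["-filterRefFiles", ",".join(filter_refs)]
    args += ["-outDir", out_dir, "-outAlign", out_align, "-expTag", exp_tag]
    return args


# Reproduces the tutorial command, starting from reference files.
cmd = build_map_args(
    "../data/SRR167721.fastq", "../results", "SRR167721.sam", "tutorial",
    target_refs=["viral"], filter_refs=["human.fa", "phix174.fa"],
)
print(" ".join(cmd))
```

The returned list can be handed to `subprocess.run(cmd, check=True)` from the directory that contains pathoscope.py.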

In this example, we use the -U option to specify our input file. If you have paired-end data, you will probably want to use the -1 and -2 options instead, as in the example below:

python pathoscope.py MAP -1 ../data/SRR167721_R1.fastq -2 ../data/SRR167721_R2.fastq -targetRefFiles viral -filterRefFiles human.fa,phix174.fa  -outDir ../results -outAlign SRR167721.sam  -expTag tutorial

As you can see from the screenshot above, PathoMap checks whether our target and filter libraries need to be split and whether indices have already been built, and then proceeds with the mapping. By default, the Bowtie2 settings are -p8 --very-sensitive-local -k 100 --score-min L,0,1.2. A score-min of L,0,1.2 corresponds to about 90% identity: with a match bonus of 2 and a mismatch penalty of 6, a read of length L with 90% matches and 10% mismatches scores 0.9 * 2 * L - 0.1 * 6 * L = 1.2 * L. We selected this default to give some leeway for the case when there is a gap, which has a penalty of 11, and to have some minimum cutoff; it should work in most cases. If you wish to override the default, you can provide custom Bowtie2 parameters.

Our reads file SRR167721.fastq had overall alignment rates of 5.56%, 77.46%, and 0.00% against the viral, human, and PhiX174 databases, respectively. Let’s move on to PathoID!