MGmapper - ACHG2018/metagenomics-classification-tools GitHub Wiki
Installation
To run MGmapper on a personal computer, KyotoCabinet must be installed first; MGmapper uses it to retrieve data from its databases.
The package can then be downloaded with the following command:
git clone https://bitbucket.org/genomicepidemiology/mgmapper.git
After cloning the repository, MGmapper.init needs to be set up properly.
The content of the initialization file is listed below:
export MGmap_HOME=/Users/juichanglu/AcademicProject/AHG/mgmapper
export MGmap_MAIN_DB=/Users/juichanglu/AcademicProject/AHG/mgmapper/db
export MGmap_PYTHON=/Users/juichanglu/anaconda3/envs/py27/bin/python
export MGmap_BWA=/usr/local/bin/bwa
export MGmap_SAMTOOLS=/Users/juichanglu/anaconda3/envs/py27/bin/samtools
export MGmap_BEDTOOLS=/Users/juichanglu/anaconda3/envs/py27/bin/bedtools
export MGmap_PIGZ=/Users/juichanglu/anaconda3/envs/py27/bin/pigz
export MGmap_CUTADAPT=/Users/juichanglu/anaconda3/envs/py27/bin/cutadapt
export MGmap_MAKEBLASTDB=/Users/juichanglu/anaconda3/envs/py27/bin/makeblastdb
export MGmap_BLASTDBCMD=/Users/juichanglu/anaconda3/envs/py27/bin/blastdbcmd
After editing the file so that each variable points to the correct binary on your own machine, run the following command:
source MGmapper.init
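Before going further, it can help to confirm that each tool variable set in MGmapper.init really points to an executable. The following is a minimal sketch, not part of MGmapper: the function name `check_mgmap_tools` is hypothetical, and the variable names are taken from the file shown above.

```shell
# Hypothetical helper (not part of MGmapper): report every tool variable from
# MGmapper.init that is unset or does not point to an executable file.
check_mgmap_tools() (
    missing=0
    for name in MGmap_PYTHON MGmap_BWA MGmap_SAMTOOLS MGmap_BEDTOOLS \
                MGmap_PIGZ MGmap_CUTADAPT MGmap_MAKEBLASTDB MGmap_BLASTDBCMD; do
        eval "tool_path=\${$name:-}"   # look up the value of the variable named in $name
        if [ -z "$tool_path" ]; then
            echo "UNSET $name"
            missing=1
        elif [ ! -f "$tool_path" ] || [ ! -x "$tool_path" ]; then
            echo "MISSING $name=$tool_path"
            missing=1
        fi
    done
    [ "$missing" -eq 0 ]               # exit status is 0 only if every tool was found
)
```

After `source MGmapper.init`, calling `check_mgmap_tools` prints nothing when all paths are valid and prints one line per problem otherwise.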
After initialization, the official databases can be downloaded for testing:
./MGmapper_makedb.pl -d 0,1 -r unix
This command downloads the bacteria and virus databases on a Unix system. On a personal computer, downloading them is not realistic: the bacteria database alone exceeds 400 GB.
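Given the size involved, a quick free-space check before starting a download can save time. This is a rough sketch only: `has_free_gb` is a hypothetical helper, not an MGmapper command, and it assumes the POSIX `df -Pk` output layout, where the fourth column is the available space in 1024-byte blocks.

```shell
# Hypothetical helper: succeed only if the filesystem holding DIR has at
# least MIN_GB gigabytes available.
has_free_gb() (
    dir=$1
    min_gb=$2
    # Second line of `df -Pk` describes the filesystem; field 4 is available KB.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $((min_gb * 1024 * 1024)) ]
)
```

For example, `has_free_gb "$MGmap_MAIN_DB" 400 || echo "not enough space for the bacteria database"`.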
The open issue is therefore the database itself: I need a working custom database so that I can test the tool with our own reference.
I tried to construct the database using the following commands:
samtools faidx AHG_REF
bwa index AHG_REF
../scripts/tax.pl -t taxid
../scripts/tax.pl -i REF_AHG.inp -p ../databases/mirror/taxonomy -t taxid -o AHG_REF.tax
The database directory contains the following files:
AHG_REF
AHG_REF.fai
AHG_REF.sa
AHG_REF.pac
AHG_REF.ann
AHG_REF.bwt
AHG_REF.tax
AHG_REF.kch
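A small check that all of the files listed above are present for a given database prefix might look like the sketch below. The function name `check_db_files` is hypothetical. The extension list simply mirrors the listing above; note that `bwa index` normally also writes an `.amb` file, which this sketch does not check because it does not appear in that listing.

```shell
# Hypothetical helper: print every expected file that is absent for a
# database prefix such as .../AHG_REF/AHG_REF.
check_db_files() (
    prefix=$1
    rc=0
    [ -f "$prefix" ] || { echo "missing: $prefix"; rc=1; }
    for ext in fai sa pac ann bwt tax kch; do
        [ -f "$prefix.$ext" ] || { echo "missing: $prefix.$ext"; rc=1; }
    done
    [ "$rc" -eq 0 ]                    # exit status is 0 only if nothing is missing
)
```

It would be called with the same prefix that goes into the database list, for example `check_db_files "$MGmap_MAIN_DB/db/AHG_REF/AHG_REF"`.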
After the database has been constructed successfully, its path must be added to mgmapper_root/database.txt. The file contains the following content:
# Databases to be use with the MGmapper programs.
# Column 1: Full path to database - exclude suffix names to a database
# Column 2: Name of database to be used in output - can be any name you want. Names in column 2 must be unique.
# The first database phiX174 must be present in the first line - it will not show up when running MGmapper_PE.pl -h or MGmapper_SE.pl -h
# The phiX174 database will be used as first step to clean reads for potential positive control reads
#
/Users/juichanglu/AcademicProject/AHG/mgmapper/db/db/phiX174/phiX174 phiX174
/Users/juichanglu/AcademicProject/AHG/mgmapper/db/db/AHG_REF/AHG_REF AHG_REF
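The rules stated in the file's comments (two columns, unique names in column 2, phiX174 first) can be checked mechanically. The sketch below is an illustration under those stated rules only; `check_db_list` is a hypothetical name and MGmapper itself may validate the file differently.

```shell
# Hypothetical helper: validate a database list file against the rules in
# its header comments.
check_db_list() {
    awk '
        /^[[:space:]]*(#|$)/ { next }  # skip comment lines and blank lines
        {
            n++
            if (NF != 2) {
                printf "line %d: expected 2 columns, got %d\n", NR, NF; bad = 1
            }
            if (n == 1 && $2 != "phiX174") {
                printf "line %d: first database must be phiX174\n", NR; bad = 1
            }
            if (seen[$2]++) {
                printf "line %d: duplicate name %s\n", NR, $2; bad = 1
            }
        }
        END { exit bad + 0 }           # non-zero exit status if any rule failed
    ' "$1"
}
```

Any stray line that is not a comment and does not have exactly two columns is reported with its line number.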
With paired-end FASTQ reads, MGmapper_PE.pl is used to map the reads to the database we constructed. When the run finished, the log contained the following messages:
[M::bwa_idx_load_from_disk] read 0 ALT contigs
[E::main_mem] fail to open file `-p'.
Furthermore, I dug into the code and found that bwa is executed with the following command:
/Users/juichanglu/AcademicProject/AHG/mgmapper/scripts/readsInCommon_interleaved.pl -f mapper_7498/misc/all.F.fq -r mapper_7498/misc/all.R.fq -I | /usr/local/bin/bwa mem -t 1 -p /Users/juichanglu/AcademicProject/AHG/mgmapper/db/db/phiX174/phiX174 -|/Users/juichanglu/anaconda3/envs/py27/bin/samtools view -f1 -f12 -Sb - > mapper_7498/bam/cleaned.nophiX.bam
The error is that bwa cannot open a file named -p. The cause is in MGmapper_PE.pl, line 2612, where the -p argument is placed after the database name, so bwa reads '-p' as the reads file.
I changed the line to the following:
$cmd = "$prog_readsInCommon -f $fastqF -r $fastqR -I | $prog_bwa mem -t $cores -p $dbName -| $prog_samtools view -f1 -f12 -Sb - > $outFile";
With this change, the problem is fixed.
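One plausible explanation for why the argument order matters, not confirmed by the MGmapper or bwa sources here, is that option parsing on this system stops at the first non-option argument, so any flag placed after the database name is handed over as an ordinary file name. The shell's own `getopts` behaves the same way and can illustrate the effect. The function `parse_mem_args` is purely illustrative and does not call bwa.

```shell
# Illustration only: getopts stops at the first non-option argument, so a
# flag that follows the database name ends up among the positional arguments.
parse_mem_args() {
    OPTIND=1
    interleaved=0
    while getopts "pt:" opt; do
        case $opt in
            p) interleaved=1 ;;        # flag seen before the database name
            t) ;;                      # thread count, ignored in this sketch
            *) return 2 ;;
        esac
    done
    shift $((OPTIND - 1))
    echo "interleaved=$interleaved positional=$*"
}

parse_mem_args -t 1 db -p -    # interleaved=0 positional=db -p -
parse_mem_args -t 1 -p db -    # interleaved=1 positional=db -
```

In the first call `-p` is never recognized as an option, which matches the symptom in the log where `-p` was opened as a file.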
The command is run as follows:
perl MGmapper_PE.pl -i ../data/Challenge_Data_NTC_POS/C16_R1.fastq -j ../data/Challenge_Data_NTC_POS/C16_R2.fastq -c 4 -F 1