Preparing a custom database
You might want to download a custom set of reference genomes and use it for taxonomic profiling with SLIMM. For that, you need a corresponding SLIMM database file, which can be obtained via the slimm_build program.
CASE 1: you have your own set of reference genomes as a FASTA file.
Let's assume you have a multi-FASTA file, custom_refs.fna, as a set of reference genomes.
- Download the nodes.dmp and names.dmp taxonomy files from NCBI
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/taxdump.tar.gz
mkdir -p taxdump && tar -xzvf taxdump.tar.gz -C taxdump
- Download the accession2taxid files from NCBI
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/accession2taxid/{dead_nucl,nucl_wgs,nucl_gb}.accession2taxid.gz
gunzip {dead_nucl,nucl_wgs,nucl_gb}.accession2taxid.gz
- Use slimm_build to build your SLIMM database
./bin/slimm_build -v -b 10000000 -nm taxdump/names.dmp -nd taxdump/nodes.dmp -o slimm_db_custom.sldb custom_refs.fna *.accession2taxid
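Before building, it can help to check that every sequence in custom_refs.fna can be mapped to a taxon. The sketch below is not part of SLIMM; `missing_accessions` is a hypothetical helper name. It assumes that the first whitespace-delimited token of each FASTA header is an accession that should appear in the first (accession) or second (accession.version) column of an uncompressed accession2taxid table. How slimm_build itself matches headers is not confirmed here.

```shell
# Hypothetical helper: list FASTA sequence IDs that have no entry in the
# given (uncompressed) accession2taxid tables.
# Usage: missing_accessions refs.fna mapping1 [mapping2 ...]
missing_accessions() {
    fasta=$1; shift
    awk -F'\t' '
        # First file (the FASTA): collect header IDs up to the first whitespace.
        FNR == NR {
            if (/^>/) { id = substr($0, 2); sub(/[ \t].*/, "", id); want[id] = 1 }
            next
        }
        # Mapping files: drop any ID found in the accession or
        # accession.version column (assumed to be columns 1 and 2).
        ($1 in want) { delete want[$1] }
        ($2 in want) { delete want[$2] }
        # Whatever is left had no mapping.
        END { for (id in want) print id }
    ' "$fasta" "$@" | sort
}

# Example call, using the file names from the steps above:
# missing_accessions custom_refs.fna *.accession2taxid
```

An empty result means every header was found. Only the FASTA IDs are held in memory, so the large mapping tables are streamed once.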
CASE 2: you just have a SAM/BAM file and you don't know the reference genomes
You can create a dummy representative FASTA file for the reference genomes that were used to produce your SAM/BAM file. For example, if you have the file SRR_0921301.bam, you can use the command below to get a toy reference FASTA file.
samtools view -H SRR_0921301.bam|grep 'SN:'|awk -F":" '{print ">"$2}' ORS="\nACGT\n" > SRR_0921301_references.fna
Afterwards, you can follow the steps above to get your SLIMM database from SRR_0921301_references.fna.
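The samtools one-liner above splits each header line on ":", so the second field still carries the text that follows the reference name (typically a tab and "LN"), and any header line containing "SN:" is matched. As a hedged alternative, the sketch below reads only @SQ lines and takes the reference name from the tab-separated SN tag. `sam_header_to_fasta` is a hypothetical helper name, and the placeholder sequence ACGT is kept from the original command.

```shell
# Hypothetical helper: read a SAM header on stdin and write one placeholder
# FASTA record per @SQ line, using the SN tag as the sequence ID.
sam_header_to_fasta() {
    awk -F'\t' '
        $1 == "@SQ" {
            for (i = 2; i <= NF; i++)
                if ($i ~ /^SN:/) {
                    print ">" substr($i, 4)   # text after "SN:"
                    print "ACGT"              # dummy sequence, as in the one-liner
                }
        }
    '
}

# Example call (requires samtools; file names follow the example above):
# samtools view -H SRR_0921301.bam | sam_header_to_fasta > SRR_0921301_references.fna
```

Because only the SN tag is printed, reference names that themselves contain ":" are kept intact.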