Example yara mapper - seqan/slimm GitHub Wiki

  1. Install slimm using your method of choice. We recommend conda:

     conda install -c bioconda slimm

    Other choices are:

  2. Create a fresh directory called slimm_tutorial

     mkdir ~/slimm_tutorial
     cd ~/slimm_tutorial
    
  3. Download and extract the virus-only version of the yara index (V_genomes_indices_20180924_yara.zip) from the URL below. This yara index contains a reference genome database of different viral species.

     wget https://ftp.mi.fu-berlin.de/pub/dadi/slimm/V_genomes_indices_20180924_yara.zip
     unzip V_genomes_indices_20180924_yara.zip 
    
  4. Download the corresponding SLIMM database (slimm_db_20180924.sldb) from the URL below. The same SLIMM database can be used for other groups such as Archaea, Bacteria, Fungi, and Viruses, as well as their combinations.

     wget https://ftp.mi.fu-berlin.de/pub/dadi/slimm/slimm_db_20180924.sldb
    
  5. Create two new directories named alignment_files and slimm_reports, and place your metagenomic sequencing reads under a directory named mg_reads. If you don't have any reads yet, you may download SRR1057982 and use it; the commands in the following steps assume paired files named SRR1748536_1.fastq and SRR1748536_2.fastq, so adjust the file names to match your data. You should now have the following directory structure in your working directory:

       Working Directory
       │
       ├── V_genomes_indices_20180924_yara (indexed reference genomes)
       │    ├── V_genomes.lf.drs
       │    ├── V_genomes.lf.drv
       │    ├── V_genomes.rid.concat
       │    ├── ...
       │
       ├── slimm_db_20180924.sldb (SLIMM taxonomic database)
       │
       ├── alignment_files (alignment files will be stored here)
       │
       ├── slimm_reports (slimm taxonomic reports will be stored here)
       │
       └── mg_reads (metagenomic sequencing reads)
            ├── SRR1748536_1.fastq
            └── SRR1748536_2.fastq
    
  6. Use yara_mapper to map the metagenomic reads against the reference genomes and produce alignment files.

     yara_mapper -v -t 30 -s 2 -sa record \
             -o ./alignment_files/SRR1748536.bam \
             ./V_genomes_indices_20180924_yara/V_genomes \
             ./mg_reads/SRR1748536_1.fastq \
             ./mg_reads/SRR1748536_2.fastq
    
  7. Run SLIMM on the output of the read mapper (SAM/BAM files)

     slimm -w 1000 \
           -o slimm_reports/ \
            slimm_db_20180924.sldb \
            alignment_files/SRR1748536.bam
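
Before running steps 6 and 7, it can help to create the directories from step 5 and confirm that the expected inputs are in place. The following is a minimal sketch, not part of SLIMM itself: `check_inputs` is a hypothetical helper name, and the file names are the ones used in this tutorial, so adjust them if your reads are named differently.

```shell
# Create the directories from step 5 (existing directories are left untouched).
mkdir -p alignment_files slimm_reports mg_reads

# check_inputs is a hypothetical helper: it prints every given path that does
# not exist and returns non-zero if at least one path is missing.
check_inputs() {
    missing=0
    for path in "$@"; do
        if [ ! -e "$path" ]; then
            echo "missing: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Inputs expected by the yara_mapper and slimm commands in this tutorial.
check_inputs \
    V_genomes_indices_20180924_yara \
    slimm_db_20180924.sldb \
    mg_reads/SRR1748536_1.fastq \
    mg_reads/SRR1748536_2.fastq \
    || echo "Fix the entries listed above before running yara_mapper."
```

Run it from the working directory (~/slimm_tutorial in this tutorial). If nothing is reported as missing, the layout matches the tree shown in step 5.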
    

You will find a taxonomic profile of your sample under the directory slimm_reports/ with the name SRR1748536_profile.tsv. The file contains a multi-level taxonomic profile of the sample that SLIMM reported. The first column indicates the taxonomic rank for easy filtering.
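
Because the rank is in the first column, the profile can be filtered with standard tools. The sketch below assumes the file is tab-separated (as the .tsv extension suggests) and that rank labels are written the same way as the values accepted by -r (for example `genus` or `species`); check your own report to confirm both. `filter_rank` is a hypothetical helper name, not a SLIMM command.

```shell
# filter_rank is a hypothetical helper: print the rows of a tab-separated
# profile whose first column equals the requested rank.
# Usage: filter_rank <rank> <profile.tsv>
filter_rank() {
    awk -F '\t' -v rank="$1" '$1 == rank' "$2"
}

# Show only the genus-level rows of the tutorial profile, if it exists.
profile=slimm_reports/SRR1748536_profile.tsv
if [ -f "$profile" ]; then
    filter_rank genus "$profile"
fi
```

Note that any header line is dropped by this filter, since its first field does not match a rank name.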

You can also tell SLIMM to report only a single rank using the -r parameter. For example,

    slimm -w 1000 -r species \
          -o slimm_reports/ \
           slimm_db_20180924.sldb \
           alignment_files/SRR1748536.bam

would generate a species-level report only.

(see slimm --help for more details)
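
If mg_reads holds several paired-end samples, the two commands from steps 6 and 7 can be generated once per sample. This is a sketch under stated assumptions: read files are named `<sample>_1.fastq` and `<sample>_2.fastq`, the options are copied unchanged from the commands above, and `print_commands` is a hypothetical helper that only prints the commands so they can be reviewed before being run.

```shell
# print_commands is a hypothetical helper: for every <sample>_1.fastq in the
# given reads directory, print the yara_mapper and slimm commands used in this
# tutorial with the sample name substituted. Nothing is executed.
print_commands() {
    reads_dir=$1
    for r1 in "$reads_dir"/*_1.fastq; do
        # If no file matches, the pattern is left as-is; skip it.
        [ -e "$r1" ] || continue
        sample=$(basename "$r1" _1.fastq)
        echo "yara_mapper -v -t 30 -s 2 -sa record -o ./alignment_files/${sample}.bam ./V_genomes_indices_20180924_yara/V_genomes ${r1} ${reads_dir}/${sample}_2.fastq"
        echo "slimm -w 1000 -o slimm_reports/ slimm_db_20180924.sldb alignment_files/${sample}.bam"
    done
}

print_commands ./mg_reads
```

Once the printed commands look right, they can be pasted into the terminal or piped to `sh`.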