Silva database - MeryemAk/dRNASeq GitHub Wiki

Silva database

The SILVA ARB files are curated rRNA sequence datasets formatted for use with the ARB software, mainly for phylogenetic and taxonomic analysis.

Main datasets:

  • SSU Ref NR 99: Non-redundant 16S/18S rRNA sequences (99% identity filtered).
  • SSU Ref: High-quality full-length SSU sequences.
  • LSU Ref NR 99: Non-redundant 23S/28S rRNA sequences (99% identity filtered).
  • LSU Ref: Full-length LSU sequences.

Note:
Loading the ARB files requires sufficient RAM. FASTA versions of the datasets are also available, and those are the files used below.

The SSU Ref NR 99 and LSU Ref NR 99 datasets are used to filter rRNA reads out of the FASTQ files by mapping the reads against them with minimap2.
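
The filtering command itself is not shown on this page. As a rough sketch, assuming samtools is installed alongside minimap2 (samtools is an assumption, it is not part of these instructions) and that the rRNA_database.mmi index from the steps below already exists, reads that do not map to the rRNA database could be kept as follows. The function name filter_rrna and the file names in the usage line are hypothetical.

```shell
# Sketch only: keep reads that do NOT map to the SILVA rRNA index.
# Assumes minimap2 and samtools are on PATH; filter_rrna is a hypothetical helper.
filter_rrna() {
    if [ "$#" -ne 3 ]; then
        echo "usage: filter_rrna <index.mmi> <reads.fastq> <out.fastq>" >&2
        return 2
    fi
    index="$1"; reads="$2"; out="$3"
    # Refuse to run if the index or the reads are missing or empty.
    if [ ! -s "$index" ] || [ ! -s "$reads" ]; then
        echo "filter_rrna: index or reads file is missing or empty" >&2
        return 1
    fi
    # -a makes minimap2 write SAM; samtools fastq -f 4 keeps only
    # unmapped reads (SAM flag 0x4), i.e. the non-rRNA reads.
    minimap2 -ax map-ont "$index" "$reads" | samtools fastq -f 4 - > "$out"
}

# Usage (hypothetical file names):
# filter_rrna rRNA_database.mmi sample.fastq sample.non_rRNA.fastq
```

Keeping the unmapped reads rather than the mapped ones is the design choice here: anything that aligns to an SSU or LSU reference is treated as rRNA and dropped.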

Download through the Linux terminal:

Step 1: Navigate to the reference_genomes/ folder and download the gzipped FASTA files.

cd $HOME/dRNASeq/reference_genomes/
wget https://www.arb-silva.de/fileadmin/silva_databases/release_138_2/Exports/SILVA_138.2_LSURef_NR99_tax_silva.fasta.gz
wget https://www.arb-silva.de/fileadmin/silva_databases/release_138_2/Exports/SILVA_138.2_SSURef_NR99_tax_silva.fasta.gz
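
Large downloads occasionally arrive truncated, so it can be worth testing the archives before decompressing them. The sketch below uses gzip -t, which checks the integrity of a compressed file without extracting it. The helper name check_archives is hypothetical.

```shell
# check_archives (hypothetical helper): test each given .gz file with gzip -t
# and report any file that is missing or corrupt.
check_archives() {
    failed=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $f"
        else
            echo "FAILED  $f" >&2
            failed=1
        fi
    done
    return "$failed"
}

# Usage, run inside reference_genomes/ after the downloads finish:
# check_archives SILVA_138.2_*_tax_silva.fasta.gz
```

If a file is reported as FAILED, downloading it again with wget is usually the simplest fix.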

Step 2: Decompress the fasta.gz files.

gunzip SILVA_138.2_*.fasta.gz

Step 3: Concatenate both files into a single file named rRNA_database.fasta.

cat SILVA_138.2_*.fasta > rRNA_database.fasta
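
A quick sanity check after concatenating is to confirm that the combined file holds as many sequences as the two source files together. The sketch below counts FASTA header lines (lines starting with ">"). The helper name count_seqs is hypothetical.

```shell
# count_seqs (hypothetical helper): print the total number of FASTA records
# across all given files by counting header lines that start with ">".
# cat is used so that grep reports one total instead of one count per file.
count_seqs() {
    cat "$@" | grep -c '^>'
}

# Usage, run in the directory that holds the FASTA files:
# parts=$(count_seqs SILVA_138.2_*_tax_silva.fasta)
# whole=$(count_seqs rRNA_database.fasta)
# [ "$parts" -eq "$whole" ] && echo "OK: $whole sequences in rRNA_database.fasta"
```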

Step 4: Activate the Conda environment (if it is not already active) and index the rRNA_database.fasta file with minimap2.

conda activate dRNAseq
minimap2 -x map-ont -d rRNA_database.mmi rRNA_database.fasta

Download through the SILVA website:

Step 1: Go to the SILVA website and click on File archive.

Step 2: Scroll down to find the latest release (at the time of writing: 138.2) and click on it.

Step 3: Click on Exports.

Step 4: Find the fasta.gz files for SSU Ref NR 99 and LSU Ref NR 99 and download each one by clicking on its link.

Step 5: In the Linux terminal, navigate to the downloads directory and list its contents. Both the SSU and LSU fasta.gz files should be present.

cd $HOME/Downloads/
ls

Step 6: Decompress both files with:

gunzip SILVA_138.2_*.fasta.gz

The same files now appear with only a .fasta extension.

Step 7: Concatenate both files into a single file named rRNA_database.fasta.

cat SILVA_138.2_*.fasta > rRNA_database.fasta

Step 8: Activate the Conda environment (if it is not already active) and index the rRNA_database.fasta file with minimap2.

conda activate dRNAseq
minimap2 -x map-ont -d rRNA_database.mmi rRNA_database.fasta

Step 9: Move the index file (rRNA_database.mmi) to the reference_genomes/ directory within the dRNASeq folder.

mv rRNA_database.mmi $HOME/dRNASeq/reference_genomes/