Silva database - MeryemAk/dRNASeq GitHub Wiki
Silva database
The SILVA ARB files are curated rRNA sequence datasets formatted for use with the ARB software, mainly for phylogenetic and taxonomic analysis.
Main datasets:
- SSU Ref NR 99: Non-redundant 16S/18S rRNA sequences (99% identity filtered).
- SSU Ref: High-quality full-length SSU sequences.
- LSU Ref NR 99: Non-redundant 23S/28S rRNA sequences.
- LSU Ref: Full-length LSU sequences.
Note:
Sufficient RAM is needed. FASTA versions are also available.
The SSU Ref NR 99 and LSU Ref NR 99 datasets will be used to filter rRNA sequences from the FASTQ files through mapping with minimap2.
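The filtering step described above could look roughly like the sketch below. It is not the pipeline's actual command. It assumes samtools is installed, which this page does not mention, and the function name filter_rrna and the file names in the comments are hypothetical. Reads that fail to map to the rRNA index (SAM flag 4, unmapped) are kept as the non-rRNA fraction.

```shell
#!/usr/bin/env bash
# Sketch only: keep reads that do NOT map to the SILVA rRNA index.
# Assumptions: samtools is available; function and file names are hypothetical.
filter_rrna() {
    local index="$1"    # minimap2 index, e.g. rRNA_database.mmi
    local reads="$2"    # input FASTQ file
    local out="$3"      # output FASTQ holding the unmapped (non-rRNA) reads
    # -a produces SAM output; -x map-ont is the preset used for indexing on this page.
    # samtools fastq -f 4 writes only reads flagged as unmapped.
    minimap2 -ax map-ont "$index" "$reads" | samtools fastq -f 4 - > "$out"
}
```

A call might look like `filter_rrna rRNA_database.mmi sample.fastq sample.non_rRNA.fastq`, where both FASTQ names are placeholders.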
Download through Linux terminal:
Step 1: Navigate to the reference_genomes/ folder and download the zipped fasta files.
cd $HOME/dRNASeq/reference_genomes/
wget https://www.arb-silva.de/fileadmin/silva_databases/release_138_2/Exports/SILVA_138.2_LSURef_NR99_tax_silva.fasta.gz
wget https://www.arb-silva.de/fileadmin/silva_databases/release_138_2/Exports/SILVA_138.2_SSURef_NR99_tax_silva.fasta.gz
Step 2: Unzip the fasta.gz files
gunzip SILVA_138.2_*
Step 3: Concatenate both files into one big file and call it rRNA_database.fasta.
cat SILVA_138.2_* > rRNA_database.fasta
Step 4: Activate the Conda environment (if not active already), and index the rRNA_database.fasta file with minimap2.
conda activate dRNAseq
minimap2 -x map-ont -d rRNA_database.mmi rRNA_database.fasta
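A quick sanity check after concatenation is to confirm that no records were lost: the number of FASTA header lines (those starting with >) in rRNA_database.fasta should equal the sum over the two SILVA files. The following is a minimal sketch using only grep. The helper names count_records and check_concat are hypothetical.

```shell
#!/usr/bin/env bash
# Count FASTA records by counting header lines that start with '>'.
count_records() {
    grep -c '^>' "$1"
}

# check_concat: hypothetical helper. The first argument is the combined file,
# the remaining arguments are the input files it was built from.
check_concat() {
    local combined="$1"; shift
    local total=0 f
    for f in "$@"; do
        total=$(( total + $(count_records "$f") ))
    done
    if [ "$(count_records "$combined")" -eq "$total" ]; then
        echo "OK: $total records"
    else
        echo "MISMATCH: expected $total records" >&2
        return 1
    fi
}
```

After Step 3 this could be run as `check_concat rRNA_database.fasta SILVA_138.2_*`.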
Download through the SILVA website:
Step 1: Go to the SILVA website and click on File archive.
Step 2: Scroll down to find the latest release (at the time of writing: 138.2) and click on it.
Step 3: Click on Exports.
Step 4: Find the fasta.gz files for SSU and LSU Ref NR 99 and download them by clicking on the link.
Step 5: In the Linux terminal, navigate to the downloads directory and list its contents. Both the SSU and LSU fasta.gz files should be present.
cd Downloads/
ls
Step 6: Unzip both files with:
gunzip SILVA_138.2_*
Now the same files appear but with only a .fasta extension.
Step 7: Concatenate both files into one big file and call it rRNA_database.fasta.
cat SILVA_138.2_* > rRNA_database.fasta
Step 8: Activate the Conda environment (if not active already), and index the rRNA_database.fasta file with minimap2.
conda activate dRNAseq
minimap2 -x map-ont -d rRNA_database.mmi rRNA_database.fasta
Step 9: Move the indexed file to the reference_genomes/ directory within the dRNASeq folder.
mv rRNA_database.mmi $HOME/dRNASeq/reference_genomes/
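Browser downloads can be interrupted, so it may be worth testing the archives before unzipping them in Step 6. The following is a minimal sketch using gzip -t, which checks the compressed data without extracting it. The helper name verify_gz is hypothetical.

```shell
#!/usr/bin/env bash
# verify_gz: hypothetical helper that reports whether each .gz archive is intact.
# gzip -t tests the compressed data without writing any output file.
verify_gz() {
    local f status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "intact: $f"
        else
            echo "corrupt or incomplete: $f" >&2
            status=1
        fi
    done
    return "$status"
}
```

From the downloads directory this could be called as `verify_gz SILVA_138.2_*.fasta.gz`. A non-zero exit status suggests downloading the affected file again.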