Kraken database - MeryemAk/dRNASeq GitHub Wiki

Downloading the Vaginal Microbiome Genome Collection database

The Vaginal Microbiome Genome Collection (VMGC) is a comprehensive reference genome resource focused on the human vaginal microbiome. This collection contains over 33,000 genomes from various microbial groups, including prokaryotes, fungi, and viruses. For this guide, we will specifically cover downloading the prokaryotic database, which consists of 786 genomes.

Source Repositories

Get started

1. Create a kraken directory

Move into the dRNASeq directory, create a new directory called 7.kraken, and move into it.

cd $HOME/dRNASeq
mkdir 7.kraken
cd 7.kraken

2. Download the Kraken database

Download the VMGC Kraken dataset into the newly created 7.kraken folder. The download takes roughly 8 minutes, depending on your connection.

wget --content-disposition "https://zenodo.org/records/10457006/files/VMGC_prokaryote_SGB_KrakenDB.tar.gz?download=1"

Note: if the link doesn't work, open the Zenodo record (https://zenodo.org/records/10457006) and download VMGC_prokaryote_SGB_KrakenDB.tar.gz from its file list.
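
If the download is interrupted, it can be resumed rather than restarted, and the finished archive can be checked before extraction. The following is a minimal sketch, not part of the original workflow: the URL and file name come from the step above, while the function names download_vmgc and verify_archive are hypothetical helpers introduced here for illustration.

```shell
#!/usr/bin/env bash
# Sketch: resumable download plus a basic integrity check.
# URL and file name are from the download step; function names are hypothetical.

VMGC_URL='https://zenodo.org/records/10457006/files/VMGC_prokaryote_SGB_KrakenDB.tar.gz?download=1'
VMGC_ARCHIVE='VMGC_prokaryote_SGB_KrakenDB.tar.gz'

# Download the archive, continuing a partial file if one exists (-c).
# -O fixes the local file name so a rerun appends to the same file.
download_vmgc() {
    wget -c -O "$VMGC_ARCHIVE" "$VMGC_URL"
}

# Return 0 if the gzip stream decompresses cleanly, non-zero if the
# file is truncated or corrupted. Nothing is written to disk.
verify_archive() {
    gzip -t "$1" 2>/dev/null
}
```

A typical call would be `download_vmgc && verify_archive "$VMGC_ARCHIVE" && echo "archive looks complete"`. The gzip test only catches truncation and corruption of the compressed stream. If the Zenodo record lists a checksum for the file, comparing it against the output of `md5sum` is a stronger check.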

3. Extract the database

After downloading, extract the database. This should only take a few minutes.

tar -xvf VMGC_prokaryote_SGB_KrakenDB.tar.gz
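
Before extracting, it can be useful to see which top-level directory the archive will create, so that nothing in 7.kraken is overwritten by surprise. The sketch below assumes a gzip-compressed tar archive, as the .tar.gz extension suggests. The function name archive_top_dirs is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: list the top-level entries of a .tar.gz archive without extracting it.
# archive_top_dirs is a hypothetical helper name.
archive_top_dirs() {
    # -t lists members, -z reads gzip, -f names the archive file;
    # cut keeps the first path component, sort -u removes duplicates.
    tar -tzf "$1" | cut -d/ -f1 | sort -u
}
```

For example, `archive_top_dirs VMGC_prokaryote_SGB_KrakenDB.tar.gz` should print a single directory name if the archive is packed the way the structure listing in the next step suggests.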

4. Structure of the database

The database includes the following files:

VMGC_prokaryote_SGB_KrakenDB/
      ├── database150mers.kmer_distrib
      ├── database150mers.kraken
      ├── database.kraken
      ├── hash.k2d
      ├── library
      │   └── added
      │       ├── nFPH00iyWZ.fna
      │       ├── nFPH00iyWZ.fna.masked
      │       ├── prelim_map.txt
      │       └── prelim_map_ZLPtIxoxoZ.txt
      ├── opts.k2d
      ├── seqid2taxid.map
      ├── taxo.k2d
      └── taxonomy
          ├── db.accession2taxid
          ├── names.dmp
          ├── nodes.dmp
          └── prelim_map.txt
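
Of the files listed above, Kraken 2 loads the three .k2d files (hash.k2d, opts.k2d, taxo.k2d) when classifying reads. The database150mers.kmer_distrib file follows the naming that Bracken uses for a read length of 150. A quick way to confirm that extraction produced a usable database is to check that the three .k2d files exist and are not empty. This is a sketch only, and check_kraken_db is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a directory contains the three files Kraken 2
# needs at classification time. check_kraken_db is a hypothetical name.
check_kraken_db() {
    local db_dir="$1" missing=0 f
    for f in hash.k2d opts.k2d taxo.k2d; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_kraken_db VMGC_prokaryote_SGB_KrakenDB` from inside 7.kraken returns 0 when all three files are present. The same directory path is what would later be passed to Kraken 2 through its `--db` option.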

Additional Notes

  • Make sure you have sufficient free storage space before downloading (approximately 2 GB).
  • Ensure a stable internet connection to prevent incomplete downloads.
  • For more details on the database, refer to the official VMGC Nature article.
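
The storage requirement in the notes above can be checked from the command line before starting the download. The sketch below is one possible way to do it, using the POSIX output format of `df`. The function name has_free_space is hypothetical, and the 2 GB threshold in the usage example is simply the figure quoted above.

```shell
#!/usr/bin/env bash
# Sketch: test whether a directory's filesystem has at least N GB free.
# has_free_space is a hypothetical helper name.
has_free_space() {
    local dir="$1" required_gb="$2" avail_kb
    # df -P prints one line per filesystem in a fixed column layout;
    # -k reports sizes in 1024-byte blocks. Column 4 is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ "$avail_kb" -ge $((required_gb * 1024 * 1024)) ]
}
```

For example, `has_free_space "$HOME/dRNASeq/7.kraken" 2 || echo "not enough free space"` prints a warning when less than 2 GB is available. Leaving extra headroom is sensible, because the compressed archive and the extracted directory sit side by side on disk until the archive is deleted.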