Kraken database - MeryemAk/dRNASeq GitHub Wiki

Downloading the Vaginal Microbiome Genome Collection database

The Vaginal Microbiome Genome Collection (VMGC) is a comprehensive reference genome resource focused on the human vaginal microbiome. The collection contains over 33,000 genomes from several microbial groups, including prokaryotes, fungi, and viruses. This guide covers only the prokaryotic Kraken database, which consists of 786 genomes.

Source Repositories

Get started

1. Download the Kraken database

To get started, download the VMGC Kraken dataset:

wget -O VMGC_prokaryote_SGB_KrakenDB.tar.gz "https://zenodo.org/records/10457006/files/VMGC_prokaryote_SGB_KrakenDB.tar.gz?download=1"

2. Extract the database

After downloading, extract the database using:

tar -xvf VMGC_prokaryote_SGB_KrakenDB.tar.gz
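A truncated download usually only shows up as an error partway through extraction, so it can help to test the archive first. The following is a minimal sketch, assuming `gzip` and `tar` are installed; the helper name `extract_db` and its optional target-directory argument are hypothetical and not part of Kraken or the VMGC release.

```shell
#!/usr/bin/env bash
# Sketch: verify a .tar.gz archive, then extract it into a chosen directory.
# extract_db is a hypothetical helper name used only for illustration.
extract_db() {
    local archive="$1"      # path to the downloaded .tar.gz
    local dest="${2:-.}"    # extraction target (defaults to the current directory)

    # gzip -t tests the compressed stream without writing any output.
    if ! gzip -t "$archive"; then
        echo "archive is corrupt or incomplete: $archive" >&2
        return 1
    fi

    mkdir -p "$dest"
    # -x extract, -z gunzip, -f archive file, -C switch to the target directory first.
    tar -xzf "$archive" -C "$dest"
}

# Example call (commented out so sourcing this snippet has no side effects):
# extract_db VMGC_prokaryote_SGB_KrakenDB.tar.gz .
```

If the integrity test fails, downloading the archive again is normally the simplest fix.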

3. Structure of the database

The extracted database should contain the following files:

KBdb/
      ├── database150mers.kmer_distrib
      ├── database150mers.kraken
      ├── database.kraken
      ├── hash.k2d
      ├── library
      │   └── added
      │       ├── nFPH00iyWZ.fna
      │       ├── nFPH00iyWZ.fna.masked
      │       ├── prelim_map.txt
      │       └── prelim_map_ZLPtIxoxoZ.txt
      ├── opts.k2d
      ├── seqid2taxid.map
      ├── taxo.k2d
      └── taxonomy
          ├── db.accession2taxid
          ├── names.dmp
          ├── nodes.dmp
          └── prelim_map.txt
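Of the files listed above, Kraken 2 reads the three `.k2d` files (`hash.k2d`, `opts.k2d`, `taxo.k2d`) at classification time, so a quick check that they exist and are non-empty catches most incomplete extractions. This is a sketch under that assumption; the function name `check_kraken_db` is hypothetical, and `KBdb` is taken from the tree shown here.

```shell
#!/usr/bin/env bash
# Sketch: confirm the files Kraken 2 needs for classification are present.
# check_kraken_db is a hypothetical helper name used only for illustration.
check_kraken_db() {
    local db="${1:-KBdb}"   # database directory (defaults to KBdb)
    local missing=0
    local f

    for f in hash.k2d opts.k2d taxo.k2d; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$db/$f" ]; then
            echo "missing or empty: $db/$f" >&2
            missing=1
        fi
    done

    return "$missing"
}

# Example call (reads.fastq is a placeholder input, not a file from this guide):
# check_kraken_db KBdb && kraken2 --db KBdb --report sample.report --output sample.kraken reads.fastq
```

The function returns a non-zero status when any of the three files is missing, so it can gate a classification run as in the commented example.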

Additional Notes

  • Make sure you have sufficient storage space before downloading.
  • Ensure a stable internet connection to prevent incomplete downloads.
  • For more details on usage, refer to the official VMGC repository.
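The first two notes can be turned into quick commands. The sketch below assumes a POSIX `df` and GNU `wget`; the helper names `free_kb` and `fetch_archive` are hypothetical. It reports free space in the target directory and downloads the archive in a way that can resume after an interruption.

```shell
#!/usr/bin/env bash
# Sketch: check free disk space and download the archive with resume support.
# free_kb and fetch_archive are hypothetical helper names used only for illustration.

# Print the free space, in KiB, of the filesystem that holds the given directory.
free_kb() {
    # -P forces the POSIX output layout; -k reports sizes in 1024-byte blocks.
    # Line 2, column 4 of that layout is the available space.
    df -Pk "${1:-.}" | awk 'NR==2 {print $4}'
}

# Download the VMGC prokaryote Kraken database, resuming a partial file if present.
fetch_archive() {
    local url="https://zenodo.org/records/10457006/files/VMGC_prokaryote_SGB_KrakenDB.tar.gz?download=1"
    local out="VMGC_prokaryote_SGB_KrakenDB.tar.gz"
    # -c continues a partially downloaded file; -O sets the local file name.
    wget -c -O "$out" "$url"
}

# Example calls (commented out so sourcing this snippet has no side effects):
# echo "free space here: $(free_kb .) KiB"
# fetch_archive
```

Keep in mind that the extracted database needs space in addition to the archive itself, so compare the reported free space against both.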