Kraken database - MeryemAk/dRNASeq GitHub Wiki
Downloading the Vaginal Microbiome Genome Collection database
The Vaginal Microbiome Genome Collection (VMGC) is a comprehensive reference genome resource focused on the human vaginal microbiome. This collection contains over 33,000 genomes from various microbial groups, including prokaryotes, fungi, and viruses. For this guide, we will specifically cover downloading the prokaryotic database, which consists of 786 genomes.
Source Repositories
- GitHub Repository: VMGC
- Dataset Download: Zenodo Record
Get started
1. Download the Kraken database
To get started, download the VMGC Kraken dataset:
wget -O VMGC_prokaryote_SGB_KrakenDB.tar.gz "https://zenodo.org/records/10457006/files/VMGC_prokaryote_SGB_KrakenDB.tar.gz?download=1"
2. Extract the database
After downloading, extract the database using:
tar -xvf VMGC_prokaryote_SGB_KrakenDB.tar.gz
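An interrupted download leaves a truncated archive, so it can help to test the file before unpacking it. The following is a minimal sketch, not part of the VMGC instructions: `verify_and_extract` is a hypothetical helper name, and it assumes `gzip` and `tar` are available on the system.

```shell
# Sketch: confirm a .tar.gz archive is intact before extracting it.
# verify_and_extract is a hypothetical helper, not part of VMGC or Kraken.
verify_and_extract() {
    local archive="$1"
    local dest="${2:-.}"

    # gzip -t tests the compressed stream without writing any output.
    if ! gzip -t "$archive" 2>/dev/null; then
        echo "Archive is missing, truncated, or corrupt: $archive" >&2
        return 1
    fi

    mkdir -p "$dest"
    # -C extracts into the destination directory instead of the current one.
    tar -xzf "$archive" -C "$dest"
}

# Example call (assumes the archive sits in the current directory):
# verify_and_extract VMGC_prokaryote_SGB_KrakenDB.tar.gz .
```

If the check fails, downloading the archive again is usually the simplest fix.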
3. Structure of the database
After extraction, the database directory should contain the following files:
KBdb/
├── database150mers.kmer_distrib
├── database150mers.kraken
├── database.kraken
├── hash.k2d
├── library
│   └── added
│       ├── nFPH00iyWZ.fna
│       ├── nFPH00iyWZ.fna.masked
│       ├── prelim_map.txt
│       └── prelim_map_ZLPtIxoxoZ.txt
├── opts.k2d
├── seqid2taxid.map
├── taxo.k2d
└── taxonomy
    ├── db.accession2taxid
    ├── names.dmp
    ├── nodes.dmp
    └── prelim_map.txt
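Of the files listed, Kraken 2 reads the three `.k2d` files (`hash.k2d`, `opts.k2d`, `taxo.k2d`) when classifying, so these are the ones worth checking first. The sketch below tests that all three exist and are non-empty. `check_kraken_db` is a hypothetical helper name, and the `KBdb` directory name is taken from the listing.

```shell
# Sketch: check that a Kraken 2 database directory holds its three index files.
# check_kraken_db is a hypothetical helper, not a Kraken 2 command.
check_kraken_db() {
    local db_dir="$1"
    local missing=0
    local f

    for f in hash.k2d opts.k2d taxo.k2d; do
        # -s is true only if the file exists and is larger than zero bytes.
        if [ ! -s "$db_dir/$f" ]; then
            echo "Missing or empty: $db_dir/$f" >&2
            missing=1
        fi
    done

    # Returns 0 when all three files are present, 1 otherwise.
    return "$missing"
}

# Example call (assumes the archive unpacked to ./KBdb):
# check_kraken_db KBdb && echo "KBdb looks complete"
```

Once the check passes, the directory can be passed to `kraken2` through its `--db` option.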
Additional Notes
- Make sure you have enough free storage for both the compressed archive and the extracted database before downloading.
- Ensure a stable internet connection to prevent incomplete downloads.
- For more details on usage, refer to the official VMGC repository.
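The storage note can be turned into a quick pre-download check. This is a rough sketch under stated assumptions: `have_free_space` is a hypothetical helper, and the required size is a placeholder that should be taken from the file size shown on the Zenodo record, since it is not stated here.

```shell
# Sketch: succeed only if a directory's filesystem has at least the
# requested amount of free space (given in 1024-byte blocks).
# have_free_space is a hypothetical helper name.
have_free_space() {
    local dir="$1"
    local required_kb="$2"
    local available_kb

    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    available_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')

    [ "$available_kb" -ge "$required_kb" ]
}

# Example call (REQUIRED_KB is a placeholder, not the real database size):
# have_free_space . "$REQUIRED_KB" || echo "Not enough free space" >&2
```

For unstable connections, `wget` also offers a `-c` (`--continue`) option that resumes a partially downloaded file instead of starting over.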