Kraken database - MeryemAk/dRNASeq GitHub Wiki
Downloading the Vaginal Microbiome Genome Collection database
The Vaginal Microbiome Genome Collection (VMGC) is a comprehensive reference genome resource for the human vaginal microbiome. It contains over 33,000 genomes from prokaryotes, fungi, and viruses. This guide covers only the prokaryotic Kraken database, which represents 786 prokaryotic species-level genome bins (SGBs).
Source Repositories
- GitHub Repository: VMGC
- Dataset Download: Zenodo Record (https://zenodo.org/records/10457006)
Get started
1. Create a kraken directory
Move into the dRNASeq directory, create a new directory called 7.kraken, and move into it.
cd $HOME/dRNASeq
mkdir 7.kraken
cd 7.kraken
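If you re-run this step, `mkdir` fails because 7.kraken already exists. A minimal idempotent variant, assuming the same `$HOME/dRNASeq` layout as above, uses `mkdir -p`. Note that `-p` also creates `$HOME/dRNASeq` itself if it does not exist yet. `KRAKEN_DIR` is just an illustrative variable name.

```shell
#!/usr/bin/env bash
# Idempotent version of step 1: safe to run again if 7.kraken already exists.
# Assumes the pipeline lives in $HOME/dRNASeq, as in the commands above.
set -euo pipefail

KRAKEN_DIR="$HOME/dRNASeq/7.kraken"

# -p creates missing parent directories and does not fail if the directory exists
mkdir -p "$KRAKEN_DIR"
cd "$KRAKEN_DIR"
echo "Working directory: $(pwd)"
```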
2. Download the Kraken database
Download the VMGC Kraken database into the newly created 7.kraken folder. The download takes about 8 minutes, depending on your connection.
wget --content-disposition "https://zenodo.org/records/10457006/files/VMGC_prokaryote_SGB_KrakenDB.tar.gz?download=1"
Note: if the link does not work, open the Zenodo record in your browser and download VMGC_prokaryote_SGB_KrakenDB.tar.gz manually.
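An interrupted download leaves a truncated archive that only fails later, during extraction. A minimal sketch of a resumable download followed by an integrity check is shown below. The helper names `download_vmgc_db` and `archive_is_complete` are hypothetical. `wget -c` can only resume if the server supports HTTP range requests, and `-O` is used here instead of `--content-disposition` so the local file name is fixed in advance.

```shell
#!/usr/bin/env bash
# Sketch: resumable download plus an integrity check for the VMGC archive.
# download_vmgc_db and archive_is_complete are hypothetical helper names.
set -euo pipefail

VMGC_URL="https://zenodo.org/records/10457006/files/VMGC_prokaryote_SGB_KrakenDB.tar.gz?download=1"
VMGC_ARCHIVE="VMGC_prokaryote_SGB_KrakenDB.tar.gz"

# -c continues a partially downloaded file instead of starting over;
# -O writes to a fixed local file name.
download_vmgc_db() {
    wget -c -O "$VMGC_ARCHIVE" "$VMGC_URL"
}

# Succeeds only if the file is a complete, uncorrupted gzip stream.
# gzip -t reads the whole file and discards the decompressed output.
archive_is_complete() {
    gzip -t "$1" 2>/dev/null
}
```

Run `download_vmgc_db`, then `archive_is_complete VMGC_prokaryote_SGB_KrakenDB.tar.gz && echo "archive OK"`. Testing a ~2 GB archive takes a little while. If the download was interrupted, running `download_vmgc_db` again continues it. Zenodo also lists an MD5 checksum for each file, which you can compare against the output of `md5sum`.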
3. Extract the database
After the download finishes, extract the archive. This should take only a few minutes.
tar -xvf VMGC_prokaryote_SGB_KrakenDB.tar.gz
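While extracting, both the archive and its unpacked contents take up disk space. Once extraction has succeeded, the .tar.gz file is no longer needed. The optional sketch below removes the archive only when `tar` exits successfully; `extract_and_clean` and `KEEP_ARCHIVE` are hypothetical names. To preview the contents without extracting, `tar -tzf VMGC_prokaryote_SGB_KrakenDB.tar.gz | head` lists the first entries.

```shell
#!/usr/bin/env bash
# Sketch: extract the database archive and remove it only after tar succeeds.
# extract_and_clean and KEEP_ARCHIVE are hypothetical names, not part of the pipeline.
set -euo pipefail

extract_and_clean() {
    local archive="$1"
    # -x extract, -z decompress gzip, -f read from the given file
    # (-v from the command above is left out to keep the log short)
    if tar -xzf "$archive"; then
        # Free the space taken by the archive unless KEEP_ARCHIVE=1 is set
        if [ "${KEEP_ARCHIVE:-0}" != "1" ]; then
            rm -- "$archive"
        fi
        echo "Extracted $archive"
    else
        echo "Extraction failed; keeping $archive for inspection" >&2
        return 1
    fi
}
```

Usage: `extract_and_clean VMGC_prokaryote_SGB_KrakenDB.tar.gz`, or `KEEP_ARCHIVE=1 extract_and_clean VMGC_prokaryote_SGB_KrakenDB.tar.gz` to keep the download.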
4. Structure of the database
The database includes the following files:
VMGC_prokaryote_SGB_KrakenDB/
├── database150mers.kmer_distrib
├── database150mers.kraken
├── database.kraken
├── hash.k2d
├── library
│   └── added
│       ├── nFPH00iyWZ.fna
│       ├── nFPH00iyWZ.fna.masked
│       ├── prelim_map.txt
│       └── prelim_map_ZLPtIxoxoZ.txt
├── opts.k2d
├── seqid2taxid.map
├── taxo.k2d
└── taxonomy
    ├── db.accession2taxid
    ├── names.dmp
    ├── nodes.dmp
    └── prelim_map.txt
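Kraken 2 loads the three .k2d files (hash.k2d, opts.k2d, taxo.k2d) at classification time; according to the Kraken 2 manual, library/ and taxonomy/ are only needed to build a database. database150mers.kmer_distrib is the Bracken k-mer distribution file generated for a read length of 150 bp. The following sketch checks that these files are present and non-empty after extraction. `check_kraken_db` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: verify that an extracted Kraken 2 database directory is usable.
# check_kraken_db is a hypothetical helper name.
set -euo pipefail

check_kraken_db() {
    local db="$1"
    local missing=0
    # The three index files Kraken 2 needs for classification
    for f in hash.k2d opts.k2d taxo.k2d; do
        if [ -s "$db/$f" ]; then
            echo "ok       $f"
        else
            echo "MISSING  $f" >&2
            missing=1
        fi
    done
    # Bracken file; only needed for abundance re-estimation with 150 bp reads
    if [ -s "$db/database150mers.kmer_distrib" ]; then
        echo "ok       database150mers.kmer_distrib (Bracken, 150 bp)"
    else
        echo "note     database150mers.kmer_distrib not found (Bracken only)" >&2
    fi
    # Non-zero exit status if any Kraken 2 index file is missing or empty
    return "$missing"
}
```

Usage: `check_kraken_db "$HOME/dRNASeq/7.kraken/VMGC_prokaryote_SGB_KrakenDB" || echo "database incomplete"`.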
Additional Notes
- Make sure you have sufficient storage space before downloading: the archive is about 2 GB, and extracting it needs additional space for the unpacked files.
- Ensure a stable internet connection to prevent incomplete downloads.
- For more details on the database, refer to the official VMGC Nature article.
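The free space can be checked before starting the download. This is a minimal sketch using `df` and `awk`; `has_free_space` is a hypothetical helper. The threshold is your choice. This page gives about 2 GB for the archive, and the extracted files need room on top of that, so the 5 GB in the usage example is only an illustrative margin.

```shell
#!/usr/bin/env bash
# Sketch: check free disk space before downloading and extracting the database.
# has_free_space is a hypothetical helper name.
set -euo pipefail

# Succeeds if the filesystem containing DIR has at least REQUIRED_GB GiB available.
has_free_space() {
    local dir="$1"
    local required_gb="$2"
    local avail_kb
    # -P: POSIX output format (one line per filesystem), -k: sizes in 1024-byte blocks.
    # Field 4 of the data line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    [ "$avail_kb" -ge $(( required_gb * 1024 * 1024 )) ]
}
```

Usage: `has_free_space "$HOME/dRNASeq" 5 || echo "Less than 5 GiB free in $HOME/dRNASeq"`.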