Kraken2 Database - jacksonhturner/orthogarden GitHub Wiki
Running OrthoGarden with Kraken2
OrthoGarden can decontaminate Illumina short reads using a Kraken2 database of your choice. To pre-process reads with Kraken2, you must provide a locally downloaded Kraken2 database.
Thankfully, it's easy to find pre-made Kraken2 databases, such as those hosted on Ben Langmead's GitHub. Be sure to check the index size in GB before downloading a database. If space is limited on your system, consider a size-capped database: for example, the PlusPFP database from December 2024 is 195.2GB, while the PlusPFP-16 database is 14.9GB. The database you choose should reflect the contamination you expect to encounter in your reads.
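Before downloading, it can help to confirm that the target filesystem has enough room. The snippet below is a minimal sketch: `has_free_space` is a hypothetical helper (not part of OrthoGarden or Kraken2) that compares the available space reported by `df` against a threshold in GB. The 30GB figure in the usage line is an assumption, chosen as roughly twice the 14.9GB index size because the archive and the extracted files coexist until the tarball is deleted.

```shell
# Hypothetical helper: succeeds only if the filesystem holding DIR
# has at least NEED_GB gigabytes available.
has_free_space() {
    dir=$1
    need_gb=$2
    # df -Pk prints POSIX-format output in 1024-byte blocks;
    # the fourth column of the second line is the available space.
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    need_kb=$((need_gb * 1024 * 1024))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, `has_free_space . 30 && echo "enough room for PlusPFP-16"` prints the message only when the current directory's filesystem has at least 30GB free.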
Hypothetical download of a Kraken2 database (assuming you want to remove RefSeq archaea, bacteria, viral, plasmid, human, UniVec_Core, protozoa, fungi, and plant sequences from your reads):
mkdir kraken2_pluspfp_16gb_20241228
cd kraken2_pluspfp_16gb_20241228
wget https://genome-idx.s3.amazonaws.com/kraken/k2_pluspfp_16gb_20241228.tar.gz
tar -xvzf k2_pluspfp_16gb_20241228.tar.gz
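After extraction, a quick sanity check can catch a truncated download before the pipeline runs. A usable Kraken2 database directory contains the files `hash.k2d`, `opts.k2d`, and `taxo.k2d`. The function below is a sketch under that assumption; `check_kraken_db` is a hypothetical name, not an OrthoGarden or Kraken2 command.

```shell
# Hypothetical helper: verify that a directory holds the three core
# Kraken2 index files and that none of them is empty.
check_kraken_db() {
    db_dir=$1
    for f in hash.k2d opts.k2d taxo.k2d; do
        # -s is true only if the file exists and has a size greater than zero.
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f" >&2
            return 1
        fi
    done
    echo "ok: $db_dir"
}
```

Running `check_kraken_db /path/to/kraken2_pluspfp_16gb_20241228` reports the first missing file on standard error and returns a non-zero status, so it can gate a later step with `&&`.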
Running OrthoGarden with the Kraken2 database is as easy as including this parameter on your pipeline run:
--kraken_db /path/to/kraken2_pluspfp_16gb_20241228