4. Setting species of interest

To define your species of interest, create a text file with one species name per line, for example:

Limosilactobacillus fermentum
Fusobacterium animalis
Akkermansia muciniphila
Paraburkholderia humisilvae
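
Before handing the file to the tool, it can help to check it for blank lines, stray whitespace, and duplicate entries. The following is a minimal sketch of such a check, not part of nanometa_live; the function names `parse_species_text` and `read_species_file` are hypothetical and chosen only for illustration.

```python
from pathlib import Path


def parse_species_text(text):
    """Return unique species names from one-name-per-line text, in order.

    Hypothetical helper, not part of nanometa_live.
    """
    names = []
    seen = set()
    for raw_line in text.splitlines():
        # Collapse repeated spaces and strip leading/trailing whitespace.
        name = " ".join(raw_line.split())
        if not name or name in seen:
            continue  # skip blank lines and repeated entries
        seen.add(name)
        names.append(name)
    return names


def read_species_file(path):
    """Read a species list file and return its cleaned species names."""
    return parse_species_text(Path(path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    example = "Limosilactobacillus fermentum\nFusobacterium animalis\n"
    for species in parse_species_text(example):
        print(species)
```

The sketch only tidies the list. It does not verify that a name exists in NCBI or GTDB.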

When you run the nanometa-new command, the tool writes the specified species names into the configuration file. This step matters because the species names need to line up with the taxonomic nomenclature used by your chosen database, whether NCBI or GTDB.

Next, run the nanometa-prepare command. Depending on the selected mode of operation, it fetches the corresponding taxonomic IDs, downloads the required reference genomes, and builds BLAST databases for subsequent validation. The mode of operation can be set to one of the following:

  1. gtdb-api: This is the default mode. It fetches genome information via the GTDB API, providing an automated and streamlined way to retrieve data directly from GTDB.

  2. gtdb-file: In this mode, the tool reads genome information from a GTDB metadata file. The file can be local or hosted online.

  3. local-species: When set to this mode, the tool retrieves genome data from a specified local directory. It uses the species names present in the file names as a reference, making it suitable for scenarios where genome data is already organized by species.

  4. local-taxid: Similar to the local-species mode, but instead of using species names, it uses taxonomy IDs in file names for retrieving genome data. This mode is particularly useful when dealing with a large number of genomes where taxonomic IDs are a more practical reference than species names.

Choose the mode that matches where your genome data lives: the GTDB API, a GTDB metadata file, or a local directory whose files are named by species or by taxonomy ID.
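
For the local-species and local-taxid modes described above, it can be useful to check in advance which genome files would correspond to which entries. The sketch below is illustrative only and does not reproduce how nanometa_live matches files. It assumes that file names contain the species name with underscores in place of spaces (for example `Akkermansia_muciniphila.fasta`), or contain the taxonomy ID as a standalone run of digits. Both function names and both naming conventions are assumptions.

```python
import re


def find_genomes_by_species(species_names, file_names):
    """Map each species name to the file names that appear to contain it.

    Assumed convention (hypothetical): spaces in the species name may be
    written as underscores in the file name; matching ignores case.
    """
    def normalise(text):
        return text.lower().replace("_", " ")

    matches = {}
    for species in species_names:
        key = normalise(species)
        matches[species] = [f for f in file_names if key in normalise(f)]
    return matches


def find_genomes_by_taxid(taxids, file_names):
    """Map each taxonomy ID to file names containing it as a whole number.

    Assumed convention (hypothetical): the ID appears in the file name as
    a complete run of digits, so ID 123 does not match '1234.fasta'.
    """
    matches = {}
    for taxid in taxids:
        wanted = str(taxid)
        matches[taxid] = [
            f for f in file_names if wanted in re.findall(r"\d+", f)
        ]
    return matches


if __name__ == "__main__":
    files = ["Akkermansia_muciniphila.fasta", "Fusobacterium_animalis.fna"]
    wanted = ["Akkermansia muciniphila", "Paraburkholderia humisilvae"]
    for species, hits in find_genomes_by_species(wanted, files).items():
        # An empty list flags a species with no matching local genome file.
        print(species, hits)
```

An empty list for a species or ID indicates that, under these assumed conventions, no local file would be found for it.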