Usage Guide - prekijpatel/MetaMiner GitHub Wiki

Welcome to MetaMiner! This guide will walk you through installing, setting up, and using the MetaMiner tool for metadata exploration.

Minimum System Requirements

Component | Minimum Requirement | Notes
CPU | Intel Core i5 (8th Generation) | -
RAM | 8 GB | 16 GB or higher is recommended
Storage | At least 2 GB free disk space | SSD recommended for faster file operations
Internet | Required for metadata download | Offline use possible after download

Note: For larger datasets (E. coli, Salmonella, or any other taxon with 50,000+ genomes), a system with 16 GB of RAM or more is required.

Installation🛠️

This repository provides two prebuilt installers📦: a Windows installer (.exe) and a Linux package (.deb).

To install:

  1. Download the appropriate installer for your system.
  2. For Windows: double-click the .exe file and follow the instructions. For Linux, install using:
    sudo dpkg -i MetaMiner.deb
    
  3. Once installed, launch MetaMiner as administrator from the Start Menu or Desktop.

Running the app

On first launch, three windows will appear:

  1. MetaMiner Software Window

    • This is your main working interface.

    [Screenshot: MetaMiner GUI]

  2. Log Window📜

    • This window keeps a live log of all operations (downloading, transforming, and analyzing) so you can monitor progress and spot any errors.

    [Screenshot: Log Window]

  3. Folder Selection Prompt📂

    • You’ll be asked to choose a location to save the data generated during your work. Select your preferred folder.

    [Screenshot: Select Folder Window]


Get Started!

MetaMiner operates in three stages:

  1. Downloading and/or Loading metadata
  2. Transforming metadata from JSON/JSONL to DataFrame for easier manipulation
  3. Normalizing metadata: cleaning, sorting, categorizing, etc.

Let's walk through this with an example. Suppose we want to find all Acinetobacter baumannii genomes from isolates that come from the United States and were isolated from blood🩸.

1. Downloading and/or Loading Metadata

MetaMiner can be used in two ways: it can load metadata that you have already obtained from NCBI, or, if you don’t have it yet, download it for you.

To load existing metadata, choose the file you want to load and press Analyze...{}. Then go to Step 3.

If you don’t have the metadata, you can download it from the NCBI server:

  • In the Enter details: field, type the taxon name inside double quotes:

    "Acinetobacter baumannii"
    

Cautions:

  • An incorrect input fails to fetch proper metadata and pauses the GUI (we are working on improving this behavior).
  • Always use full scientific names at a valid taxonomic level (Genus, Species, Family, etc.).

✅ Correct Input Examples | ❌ Incorrect Input Examples
"Acinetobacter baumannii" (full species) | "baumannii" (species epithet only)
"Acinetobacter" (full genus) | "A. baumannii" (short form)
"Moraxellaceae" (family) | "Acineto family" (wrong taxon name)
"Enterobacteriaceae" (family) | "Enterobacteriacaeae" (spelling error)
1280 (taxid for S. aureus) | -

  • After entering the correct taxon, press the Download JSON {} button.
  • The Log Window will start showing messages, including the elapsed time.
  • When the download is finished, a completion message will appear in the Log Window.

[Screenshot: Downloading Completed Log]
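Because an incorrect input pauses the GUI, it can help to sanity-check the text before pasting it into the field. The sketch below is a hypothetical helper (`classify_taxon_input` is not part of MetaMiner) that only mirrors the format rules from the cautions above: a bare number is treated as a taxid, and a name must be double-quoted, unabbreviated, and capitalized. It cannot catch misspelled or non-existent taxon names such as "Acineto family".

```python
import re


def classify_taxon_input(text: str) -> str:
    """Return 'taxid', 'name', or 'invalid' for text typed into the taxon field.

    Hypothetical helper: these rules only mirror the examples in this guide
    and are not MetaMiner's own validation logic.
    """
    text = text.strip()
    # A bare integer is treated as an NCBI taxid, e.g. 1280.
    if text.isascii() and text.isdigit():
        return "taxid"
    # Names must be wrapped in double quotes, e.g. "Acinetobacter baumannii".
    match = re.fullmatch(r'"([^"]+)"', text)
    if match is None:
        return "invalid"
    name = match.group(1).strip()
    # Reject abbreviated forms such as "A. baumannii".
    if "." in name:
        return "invalid"
    # Genus and family names start with a capital; a lone epithet does not.
    if not name[:1].isupper():
        return "invalid"
    return "name"


for candidate in ['"Acinetobacter baumannii"', "1280", '"A. baumannii"', '"baumannii"']:
    print(candidate, "->", classify_taxon_input(candidate))
```

A check like this only guards the input format; whether the name exists at NCBI is still decided when the download runs.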

2. Transforming Metadata

  • After the download has finished, press the Analyze...{} button just below the Download JSON {} button.

  • Once that is pressed, MetaMiner will:

    • Read the downloaded JSON files.
    • Extract the metadata of individual genomes and put it into an easy-to-use DataFrame format (rows/columns).
    • Remove any redundant data.
    • Save the raw extracted metadata for your reference and future use.
    • Display the number of genomes for which metadata is available.
  • When the process is complete, the Log Window will show an update, and you’ll also see an Explore Data option at the bottom of the log window.

[Screenshot: Explore Data Button in Log Window]

Clicking Explore Data will take you to the next step.
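To make the transformation step more concrete, here is a minimal sketch of turning JSONL records into deduplicated rows with a fixed set of columns. It is not MetaMiner's actual code: the function name `jsonl_to_rows`, the flat record layout, the `country` field, and the accession values are all made up for illustration (real NCBI records are nested and have many more fields).

```python
import json


def jsonl_to_rows(lines, fields):
    """Flatten JSONL records into unique row dicts with the given columns."""
    rows, seen = [], set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Keep only the requested columns; missing keys become empty strings.
        row = {name: str(record.get(name, "")) for name in fields}
        key = tuple(row[name] for name in fields)
        if key in seen:
            continue  # drop exact duplicate rows
        seen.add(key)
        rows.append(row)
    return rows


# Made-up sample records; the second line duplicates the first.
sample = [
    '{"accession_id": "GCF_000000001.1", "country": "United States"}',
    '{"accession_id": "GCF_000000001.1", "country": "United States"}',
    '{"accession_id": "GCF_000000002.1"}',
]
rows = jsonl_to_rows(sample, ["accession_id", "country"])
print(len(rows), "genomes with metadata")  # prints: 2 genomes with metadata
```

Each resulting dict corresponds to one row of the table, so the list can be written to a TSV or handed to a DataFrame library.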

3. Normalizing metadata

The entire normalization process runs in the background. Once it finishes, MetaMiner opens a dashboard with the metadata plotted in one or more forms.

[Screenshot: Dash Started Log]

[Screenshot: Dashboard]

From the dashboard, apply the following filters:

  1. Isolation Source Filter:
    • Choose Blood from the multi-select dropdown menu.
  2. Country Filter:
    • Choose United States.
  3. Save the result. Done! Voila!🎉
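For readers who prefer to see the logic, the two dashboard filters amount to keeping rows that match both selections. The sketch below shows that idea on plain dictionaries; the column names `isolation_source` and `country`, the helper `filter_rows`, and the sample rows are assumptions for illustration and may not match the names MetaMiner uses.

```python
def filter_rows(rows, isolation_sources, countries):
    """Keep rows whose isolation source AND country are among the selections."""
    sources = {s.lower() for s in isolation_sources}
    places = {c.lower() for c in countries}
    return [
        row
        for row in rows
        if (row.get("isolation_source") or "").lower() in sources
        and (row.get("country") or "").lower() in places
    ]


# Made-up rows for illustration only.
rows = [
    {"accession_id": "GCF_000000001.1", "isolation_source": "blood", "country": "United States"},
    {"accession_id": "GCF_000000002.1", "isolation_source": "sputum", "country": "United States"},
    {"accession_id": "GCF_000000003.1", "isolation_source": "Blood", "country": "India"},
]
selected = filter_rows(rows, ["Blood"], ["United States"])
print([row["accession_id"] for row in selected])  # prints: ['GCF_000000001.1']
```

The comparison is case-insensitive here so that "blood" and "Blood" are treated alike; normalized metadata may already make this unnecessary.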

From the saved TSV, you can extract the accession_id of each genome of interest and download those genomes from NCBI for further analysis. Note: We're working on streamlining this process as well. Please bear with us, and we’ll get it to you soon!😄
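Until that is streamlined, pulling the accessions out of the saved file takes only a few lines. This is a minimal sketch that assumes the TSV has a header row containing an `accession_id` column, as mentioned above; the helper name `accession_ids_from_tsv` and the demo contents are made up for illustration.

```python
import csv
import io


def accession_ids_from_tsv(handle):
    """Return the unique, non-empty accession_id values in file order."""
    reader = csv.DictReader(handle, delimiter="\t")
    ids = []
    for row in reader:
        accession = (row.get("accession_id") or "").strip()
        if accession and accession not in ids:
            ids.append(accession)
    return ids


# In-memory stand-in for a saved TSV (made-up values).
demo = io.StringIO(
    "accession_id\tcountry\n"
    "GCF_000000001.1\tUnited States\n"
    "GCF_000000002.1\tUnited States\n"
)
print(accession_ids_from_tsv(demo))
```

For a real file, open it with `open(path, newline="", encoding="utf-8")` and pass the handle to the function, where `path` is wherever you saved the TSV.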