Usage Guide - prekijpatel/MetaMiner GitHub Wiki
Welcome to MetaMiner! This guide will walk you through installing, setting up, and using the MetaMiner tool for metadata exploration.
Minimum System Requirements
Component | Minimum Requirement | Notes |
---|---|---|
CPU | Intel Core i5 (8th Generation) | - |
RAM | 8 GB | 16 GB or higher is recommended |
Storage | At least 2 GB free disk space | SSD recommended for faster file operations |
Internet | Required for metadata download | Offline use possible after download |
⚡ Note: For larger datasets (for example E. coli, Salmonella, or any other taxon with 50,000 or more genomes), a system with at least 16 GB of RAM is required.
Installation🛠️
This repository provides two prebuilt installers📦:
- Windows: MetaMiner.exe
- Linux: MetaMiner.deb (under development!)
To install:
- Download the appropriate installer for your system.
- For Windows: double-click the .exe file and follow the instructions.
- For Linux: install using: sudo dpkg -i MetaMiner.deb
- Once installed, launch MetaMiner as administrator from the Start Menu or Desktop.
Running the app
On first launch, three windows will appear:
- MetaMiner Software Window
  - This is your main working interface.
- Log Window📜
  - This window keeps a live log of all operations (downloading, transforming, and analyzing) so you can monitor progress and spot any errors.
- Folder Selection Prompt📂
  - You’ll be asked to choose a location to save the data generated during your work. Select your preferred folder.
Get Started!
MetaMiner operates in three stages:
- Downloading and/or loading metadata
- Transforming metadata from JSON/JSONL to a DataFrame for easier manipulation
- Normalizing metadata (cleaning, sorting, categorizing, etc.)
Let's understand this using an example. Suppose we want to pick out all Acinetobacter baumannii genomes that come from the United States and were isolated from blood🩸.
1. Downloading and/or Loading Metadata
MetaMiner can be used in two ways. It can load metadata that you have already obtained from NCBI, or, if you don’t have it, MetaMiner can download it for you.
In the former case, simply choose the file that you want to load and press Analyze...{}. Then go to Step 3.
If you don’t have the metadata, you can download it from the NCBI server:
- In the Enter details: field, type the taxon name inside inverted commas (double quotes): "Acinetobacter baumannii"
⚡ Cautions:
- Incorrect inputs will fail to fetch the proper metadata and will pause the GUI (we are working on improving this behavior).
- Always use full scientific names at a valid taxonomic level (genus, species, family, etc.).
✅ Correct Input Examples | ❌ Incorrect Input Examples |
---|---|
"Acinetobacter baumannii" (full species) | "baumannii" (only species) |
"Acinetobacter" (full genus) | "A. baumannii" (short form) |
"Moraxellaceae" (family) | "Acineto family" (wrong tax name) |
"Enterobacteriaceae" (family) | "Enterobacteriacaeae" (spelling error) |
1280 (taxid for S. aureus) | - |
- After entering the correct taxon, press the Download JSON {} button.
- The Log Window will start showing messages, including the elapsed time.
- When downloading is finished, a completion message will appear in the Log Window.
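Because an incorrect entry can pause the GUI, it may help to sanity-check the text before pasting it in. The sketch below is only a rough pre-check of the input rules listed above; it is not MetaMiner's own validation, and the function name precheck_taxon is hypothetical. It can catch missing quotes, abbreviations, and a missing genus, but it cannot detect spelling errors or made-up names such as "Acineto family".

```python
import re

def precheck_taxon(text):
    """Heuristic pre-check of a taxon entry (NOT MetaMiner's own validation)."""
    entry = text.strip()
    # A bare integer is treated as an NCBI taxid, e.g. 1280.
    if entry.isdigit():
        return True, "taxid"
    # Names are expected inside double quotes.
    if len(entry) < 3 or not (entry.startswith('"') and entry.endswith('"')):
        return False, "wrap the name in double quotes"
    name = entry[1:-1].strip()
    # Short forms such as "A. baumannii" contain a period.
    if "." in name:
        return False, "abbreviated name; spell out the full name"
    words = name.split()
    # A lone species epithet such as "baumannii" starts with a lowercase letter.
    if not words or not re.fullmatch(r"[A-Z][a-z]+", words[0]):
        return False, "first word should be a capitalized genus or higher-rank name"
    return True, "name"

for candidate in ['"Acinetobacter baumannii"', '"A. baumannii"', '"baumannii"', "1280"]:
    print(candidate, precheck_taxon(candidate))
```

A passing result only means the entry has the expected shape; the name still has to exist in the NCBI taxonomy for the download to succeed.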
2. Transforming Metadata
- After the download has finished, press the Analyze...{} button just below the Download JSON {} button.
- Once that is pressed, MetaMiner will:
  - Read the downloaded JSON files.
  - Extract the metadata of individual genomes and put it into an easy-to-use DataFrame format (rows/columns).
  - Remove any redundant data.
  - Save the raw extracted metadata for your reference and future use.
  - Display the number of genomes for which metadata is available.
- When the process is complete, the Log Window will show an update, and you’ll also see an Explore Data option at the bottom of the Log Window.
Clicking Explore Data will take you to the next step.
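To give a feel for what the transformation step does conceptually, here is a minimal sketch that turns JSONL records into flat rows, drops duplicates, and writes a TSV. It is not MetaMiner's actual code: it uses plain dictionaries and the Python standard library rather than a DataFrame, and the field names (accession, biosample, country, source) and accession values are made up for illustration.

```python
import csv
import io
import json

def flatten(record, prefix=""):
    """Flatten nested dicts into one level, joining keys with dots."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat

def jsonl_to_rows(lines, id_field="accession"):
    """Parse JSONL lines into flat rows, keeping the first row per identifier."""
    rows, seen = [], set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        row = flatten(json.loads(line))
        key = row.get(id_field)
        if key is not None and key in seen:
            continue  # redundant record for a genome already seen
        seen.add(key)
        rows.append(row)
    return rows

def write_tsv(rows, handle):
    """Write rows as tab-separated text with a sorted union of all columns."""
    columns = sorted({col for row in rows for col in row})
    writer = csv.DictWriter(handle, fieldnames=columns, delimiter="\t",
                            restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)

# Made-up records standing in for downloaded metadata (hypothetical fields).
sample = [
    '{"accession": "EXAMPLE_1", "biosample": {"country": "USA", "source": "blood"}}',
    '{"accession": "EXAMPLE_1", "biosample": {"country": "USA", "source": "blood"}}',
    '{"accession": "EXAMPLE_2", "biosample": {"country": "India"}}',
]
rows = jsonl_to_rows(sample)
buffer = io.StringIO()
write_tsv(rows, buffer)
print(f"metadata available for {len(rows)} genomes")
print(buffer.getvalue())
```

Missing fields become empty cells in the TSV, which keeps every genome on one row even when records do not share the same keys.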
3. Normalizing metadata
This entire normalization process runs in the background. Once it finishes, a dashboard opens with the entire metadata plotted in one or more forms.
From the meta-mined dashboard, use:
- Isolation Source Filter:
  - Choose Blood from the multi-select dropdown menu.
- Country Filter:
  - Choose United States.
- and Save! Done! Voila!🎉
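The two dashboard filters amount to keeping only rows that match every selected value. The sketch below shows that logic on a few made-up rows; it is an illustration rather than MetaMiner's implementation, and the column names isolation_source and country, as well as the placeholder accession values, are assumptions.

```python
def matches(row, column, allowed):
    """True if the row's value in `column` is one of the allowed values (case-insensitive)."""
    value = (row.get(column) or "").strip().lower()
    return value in {item.lower() for item in allowed}

def filter_rows(rows, filters):
    """Keep rows that satisfy every filter; `filters` maps column -> accepted values."""
    return [
        row for row in rows
        if all(matches(row, column, allowed) for column, allowed in filters.items())
    ]

# Made-up rows with hypothetical column names and placeholder accessions.
rows = [
    {"accession_id": "EXAMPLE_1", "isolation_source": "Blood", "country": "United States"},
    {"accession_id": "EXAMPLE_2", "isolation_source": "Sputum", "country": "United States"},
    {"accession_id": "EXAMPLE_3", "isolation_source": "blood", "country": "India"},
]

selected = filter_rows(rows, {
    "isolation_source": ["Blood"],
    "country": ["United States"],
})
print([row["accession_id"] for row in selected])
```

Each filter takes a list of accepted values, which mirrors a multi-select dropdown: several choices within one filter are alternatives, while separate filters must all be satisfied.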
From the saved TSV file, one can simply extract the accession_id of each genome of interest and download those genomes from NCBI for further analysis.
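As a minimal sketch of that extraction, the snippet below reads the accession_id column from tab-separated text and returns the unique values in order. The other column names and the placeholder accessions are assumptions; the exact layout of the saved file may differ.

```python
import csv
import io

def read_accessions(handle, column="accession_id"):
    """Return unique, non-empty values of `column` from a TSV, in file order."""
    reader = csv.DictReader(handle, delimiter="\t")
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in TSV header")
    seen, ordered = set(), []
    for row in reader:
        accession = (row[column] or "").strip()
        if accession and accession not in seen:
            seen.add(accession)
            ordered.append(accession)
    return ordered

# In-memory stand-in for the saved TSV (placeholder values, hypothetical columns).
saved_tsv = io.StringIO(
    "accession_id\tisolation_source\tcountry\n"
    "EXAMPLE_1\tBlood\tUnited States\n"
    "EXAMPLE_3\tBlood\tUnited States\n"
    "EXAMPLE_1\tBlood\tUnited States\n"
)
accessions = read_accessions(saved_tsv)
print("\n".join(accessions))
```

With a real file, the same function can be called on a handle opened with open(path, newline="", encoding="utf-8"), where path points to the TSV you saved. Writing the result one accession per line gives a plain list that can be used when fetching the genomes from NCBI.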
Note: We're working on streamlining this process as well. Please bear with us, and we’ll get it to you soon!😄