How Does MetaMiner Work? - prekijpatel/MetaMiner GitHub Wiki

This section of the wiki walks you through what MetaMiner does behind the scenes. Our goal is to explain not just what it does but also how it does it, so that you can better understand the process and we can receive more targeted feedback on the methods and algorithms we use. 🙌

We’ve done our best to keep it simple, but if anything’s unclear or you’d like more details, feel free to reach out!


Overall, MetaMiner works in three main steps:

  1. Retrieves or loads your data: If you don’t already have the metadata locally, MetaMiner downloads it for you. If you do, it simply loads it from your system.

  2. Turns it into a DataFrame: The raw data usually comes in JSON or JSONL format, which isn’t always easy to work with, so MetaMiner converts it into a clean, tabular DataFrame that is much easier to explore, filter, and process.

  3. Cleans and normalizes the data: This is the core of MetaMiner and where most of the work happens. MetaMiner organizes and standardizes several fields so that values are consistent across the dataset, leaving everything tidy and ready for downstream tasks.
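The three steps above can be sketched roughly as follows. This is a minimal illustration assuming pandas and JSON/JSONL input, not MetaMiner’s actual code: the function names `retrieve_or_load`, `to_dataframe` and `normalize`, the optional `url` parameter, the `sample.country` field, and the choice of whitespace trimming plus lowercasing as "normalization" are all hypothetical.

```python
import json
from pathlib import Path
from typing import List, Optional
from urllib.request import urlretrieve

import pandas as pd


def retrieve_or_load(local_path: str, url: Optional[str] = None) -> List[dict]:
    """Step 1 (hypothetical): reuse a local copy if present, otherwise download it first."""
    path = Path(local_path)
    if not path.exists():
        if url is None:
            raise FileNotFoundError(f"{path} does not exist and no URL was given")
        urlretrieve(url, path)  # save the download so later runs can skip it
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        # JSONL: one JSON object per non-empty line
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # Plain JSON: assumed here to be a single array of record objects
    return json.loads(text)


def to_dataframe(records: List[dict]) -> pd.DataFrame:
    """Step 2: one row per record; nested keys become dotted column names."""
    return pd.json_normalize(records)


def normalize(df: pd.DataFrame, columns: List[str]) -> pd.DataFrame:
    """Step 3 (illustrative only): trim whitespace and lowercase selected text fields."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in columns:
        # The "string" dtype keeps missing values as <NA> instead of the text "nan"
        out[col] = out[col].astype("string").str.strip().str.lower()
    return out


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "metadata.jsonl"  # hypothetical local metadata file
        sample.write_text(
            '{"id": 1, "sample": {"country": "  USA "}}\n'
            '{"id": 2, "sample": {"country": "usa"}}\n',
            encoding="utf-8",
        )
        df = normalize(to_dataframe(retrieve_or_load(str(sample))), ["sample.country"])
        print(df)
```

Here the two spellings of the same value collapse to a single form after step 3. Keeping normalization separate from loading makes it easy to swap in whatever field-specific rules a real pipeline needs.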

The first two steps are explained here: Metadata Retrieval and Transformation

The third step, normalization, involves several processes. We’ve explained three major ones in detail: