Home - wkiri/MTE GitHub Wiki

Welcome to the MTE wiki!

The MTE system has three components:

  1. The MTE database (stored as an SQLite database following the MTE Database Schema)
  2. The MTE ingestion pipeline (which populates the database)
  3. The MTE user interface (website) - used internally at JPL; not part of this repository
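
Because the database is a plain SQLite file, it can be inspected with standard tooling. The sketch below is a minimal illustration that uses only Python's built-in `sqlite3` module; it lists whatever tables a database contains rather than assuming anything about the MTE Database Schema. The helper name `list_tables` is hypothetical and not part of the MTE code.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in an open SQLite connection, sorted."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    # Each row is a 1-tuple holding the table name.
    return [name for (name,) in rows]
```

To try it on a generated database, open the file with `sqlite3.connect("mte.db")` (the file name here is an assumption) and pass the connection to `list_tables`.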

Ingestion Pipeline

  • The pipeline takes a PDF file as input and applies these steps:
    • Convert PDF to text (using Tika)
    • Obtain document information such as title, authors, etc. (look up via the ADS API, falling back to extraction from the text content using Grobid)
    • Extract (recognize) named entities, including Targets, Elements, Minerals, and Properties
    • Extract relations between entities (e.g., "contains") (using jSRE)
  • To generate an MTE database: see detailed instructions
  • MTE Wishlist - ideas for future improvements
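
The order of the pipeline steps listed above can be sketched in a few lines of Python. This is only a toy illustration of the data flow, not the MTE implementation: it starts from already-extracted text (so the Tika step is skipped), takes the metadata lookup and fallback as caller-supplied functions standing in for ADS and Grobid, uses a tiny made-up vocabulary in place of the trained entity recognizers, and replaces jSRE with a simple same-sentence co-occurrence rule. All names in it are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Made-up sample vocabulary; the real pipeline uses trained recognizers.
VOCAB = {
    "Target": {"Windjana", "Cumberland"},
    "Element": {"Fe", "Mn"},
    "Mineral": {"hematite", "jarosite"},
}


@dataclass
class Document:
    text: str
    metadata: dict = field(default_factory=dict)
    entities: list = field(default_factory=list)   # (label, name, sentence index)
    relations: list = field(default_factory=list)  # (target, "contains", name)


def get_metadata(text, lookup, fallback):
    """Try the primary lookup first; use the fallback if it returns nothing."""
    meta = lookup(text)
    return meta if meta else fallback(text)


def extract_entities(text):
    """Tag vocabulary words and record which sentence each one occurs in."""
    entities = []
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    for idx, sentence in enumerate(sentences):
        for token in re.findall(r"\w+", sentence):
            for label, words in VOCAB.items():
                if token in words:
                    entities.append((label, token, idx))
    return entities


def extract_relations(entities):
    """Stand-in for jSRE: link a Target to components in the same sentence."""
    relations = []
    for label, target, idx in entities:
        if label != "Target":
            continue
        for other_label, name, other_idx in entities:
            if other_idx == idx and other_label in ("Element", "Mineral"):
                relations.append((target, "contains", name))
    return relations


def run_pipeline(text, lookup, fallback):
    """Run the metadata, entity, and relation steps in order."""
    doc = Document(text=text)
    doc.metadata = get_metadata(text, lookup, fallback)
    doc.entities = extract_entities(text)
    doc.relations = extract_relations(doc.entities)
    return doc
```

Passing the lookup and fallback as functions keeps the sketch independent of any particular service; a real run would plug in calls to the actual metadata sources and recognizers described in the detailed instructions.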

Howto