Getting Started for Developers - everycure-org/matrix-validator GitHub Wiki

Getting started for Developers

Prerequisites

  • Python ≥ 3.11
  • Source data in KGX format (.tsv for nodes and edges).
    • The nodes file must include: id, name, category (and others as defined in schema)
    • The edges file must include: subject, predicate, object, provided_by, etc.
  • Poetry installed
  • Run make install to set up the Poetry environment
  • Run make run_small_tests to check that the installation worked
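
Before running the tool on your own data, it can help to confirm that the KGX files carry the columns listed above. The following is a minimal sketch using only the Python standard library. The function name missing_columns and the two constants are hypothetical and not part of matrix-validator. The column sets cover only the names mentioned in the prerequisites, and the actual schema defines more.

```python
import csv
import io

# Column names taken from the prerequisites above; the real schema defines more.
REQUIRED_NODE_COLUMNS = {"id", "name", "category"}
REQUIRED_EDGE_COLUMNS = {"subject", "predicate", "object", "provided_by"}


def missing_columns(tsv_handle, required):
    """Return the required column names absent from the TSV header row."""
    reader = csv.reader(tsv_handle, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return sorted(required - set(header))


# Tiny in-memory examples standing in for real nodes and edges files.
nodes_tsv = io.StringIO("id\tname\tcategory\nCHEBI:1\twater\tbiolink:SmallMolecule\n")
edges_tsv = io.StringIO("subject\tpredicate\tobject\nCHEBI:1\tbiolink:related_to\tCHEBI:2\n")

print(missing_columns(nodes_tsv, REQUIRED_NODE_COLUMNS))  # []
print(missing_columns(edges_tsv, REQUIRED_EDGE_COLUMNS))  # ['provided_by']
```

For files on disk, pass an open file handle (for example open(path, newline="")) instead of the io.StringIO objects.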

The tool is currently divided into the following files (basic layout):

  • src/matrix_validator/cli.py contains all click-based CLI commands and should contain nothing but CLI boilerplate (in particular, no IO).
  • src/matrix_validator/validator.py contains the abstract validation class.
  • src/matrix_validator/datamodels.py contains the edge and nodes schemas.
  • src/matrix_validator/util.py contains any utility methods that we might need.
  • We are currently experimenting with several implementations:
    • src/matrix_validator/validator_polars.py: A very efficient pure Polars implementation.
    • src/matrix_validator/validator_purepython.py: A pure Python implementation.
    • src/matrix_validator/validator_schema.py: A schema-based validation approach built on LinkML-generated pandera schemas.
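
To illustrate how an abstract validation class and its interchangeable implementations could fit together, here is a rough sketch. Everything in it is an assumption made for illustration: the class names Validator and PurePythonValidator, the validate signature, and the single check shown (every edge endpoint must appear in the nodes file) are hypothetical and may not match what validator.py or validator_purepython.py actually define.

```python
import csv
import io
from abc import ABC, abstractmethod


class Validator(ABC):
    """Hypothetical abstract base: each implementation supplies validate()."""

    @abstractmethod
    def validate(self, nodes, edges):
        """Return a list of human-readable violation messages."""


class PurePythonValidator(Validator):
    """Hypothetical standard-library implementation of one illustrative check."""

    def validate(self, nodes, edges):
        # Collect every node id, then flag edge endpoints that are unknown.
        node_ids = {row["id"] for row in csv.DictReader(nodes, delimiter="\t")}
        violations = []
        for row in csv.DictReader(edges, delimiter="\t"):
            for column in ("subject", "object"):
                if row[column] not in node_ids:
                    violations.append(
                        f"edge {column} {row[column]!r} is not in the nodes file"
                    )
        return violations


# In-memory stand-ins for KGX nodes and edges files.
nodes = io.StringIO("id\tname\tcategory\nA:1\tfoo\tbiolink:Gene\n")
edges = io.StringIO(
    "subject\tpredicate\tobject\tprovided_by\n"
    "A:1\tbiolink:related_to\tA:2\tdemo\n"
)
print(PurePythonValidator().validate(nodes, edges))
# ["edge object 'A:2' is not in the nodes file"]
```

With a shared base class like this, a CLI command would only need to pick an implementation and call validate, which is one way to keep cli.py free of IO and validation logic.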