Getting Started for Developers - everycure-org/matrix-validator GitHub Wiki
Getting started for Developers
Prerequisites
- Python ≥ 3.11
- Source data in KGX format (.tsv for nodes and edges).
  - The nodes file must include: `id`, `name`, `category` (and others as defined in the schema)
  - The edges file must include: `subject`, `predicate`, `object`, `provided_by`, etc.
- Make sure you have poetry installed
- Run `make install` to install the poetry environment
- Run `make run_small_tests` to see if it worked
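As a quick pre-flight check on the KGX prerequisites listed above, the sketch below reads only the header row of a TSV file and reports which required columns are missing. It is not part of matrix-validator: the function name `missing_columns`, the two constants, and the sample rows are hypothetical and made up for illustration, and the column sets cover only the fields named on this page (the schema defines more).

```python
import csv
import io

# Columns named in the prerequisites on this page; the real schema defines more.
REQUIRED_NODE_COLUMNS = {"id", "name", "category"}
REQUIRED_EDGE_COLUMNS = {"subject", "predicate", "object", "provided_by"}


def missing_columns(tsv_file, required):
    """Return the sorted list of required columns absent from the TSV header row."""
    reader = csv.reader(tsv_file, delimiter="\t")
    header = next(reader, [])  # an empty file yields an empty header
    return sorted(set(required) - set(header))


# Made-up in-memory stand-ins for real files.
# With files on disk, pass open(path, newline="") instead.
nodes_tsv = io.StringIO("id\tname\tcategory\nEX:1\texample node\tbiolink:NamedThing\n")
edges_tsv = io.StringIO("subject\tpredicate\tobject\nEX:1\tbiolink:related_to\tEX:2\n")

print(missing_columns(nodes_tsv, REQUIRED_NODE_COLUMNS))  # []
print(missing_columns(edges_tsv, REQUIRED_EDGE_COLUMNS))  # ['provided_by']
```

A header-only check like this is cheap even on very large files, but it says nothing about the values in each column; that is what the validator itself is for.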
The tool is currently divided into the following files (basic layout):
- `src/matrix_validator/cli.py` contains all CLI methods (click-based) and should not contain any code other than CLI boilerplate (in particular, no IO).
- `src/matrix_validator/validator.py` contains the abstract validation class.
- `src/matrix_validator/datamodels.py` contains the edge and node schemas.
- `src/matrix_validator/util.py` contains any utility methods that we might need.
- We are currently experimenting with a number of different implementations:
  - `src/matrix_validator/validator_polars.py`: A very efficient pure polars implementation.
  - `src/matrix_validator/validator_purepython.py`: A pure Python implementation.
  - `src/matrix_validator/validator_schema.py`: A schema-based validation approach based on LinkML-generated pandera schemas.
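The layout above, with one abstract validation class and several interchangeable implementations, can be illustrated with a minimal sketch. Every name here (`Validator`, `PurePythonValidator`, `validate`, `run_validation`) is hypothetical and chosen for illustration; the actual interface in `validator.py` may look quite different, and the referential check shown is only an example rule, not necessarily one the tool enforces.

```python
from abc import ABC, abstractmethod


class Validator(ABC):
    """Hypothetical stand-in for an abstract validation class."""

    @abstractmethod
    def validate(self, nodes, edges):
        """Return a list of human-readable violation messages."""


class PurePythonValidator(Validator):
    """Hypothetical backend: checks that edges only reference known node ids."""

    def validate(self, nodes, edges):
        known_ids = {node["id"] for node in nodes}
        violations = []
        for edge in edges:
            for field in ("subject", "object"):
                if edge[field] not in known_ids:
                    violations.append(f"{field} {edge[field]!r} is not a known node id")
        return violations


def run_validation(validator, nodes, edges):
    """Thin entry point that only delegates, as a CLI layer without IO might."""
    return validator.validate(nodes, edges)


# Made-up sample data: the edge points at a node id that does not exist.
nodes = [{"id": "EX:1"}, {"id": "EX:2"}]
edges = [{"subject": "EX:1", "predicate": "biolink:related_to", "object": "EX:3"}]
print(run_validation(PurePythonValidator(), nodes, edges))
# ["object 'EX:3' is not a known node id"]
```

Keeping the entry point dependent only on the abstract class means a different backend can be swapped in without touching the calling code, which is presumably what makes it practical to experiment with several implementations side by side.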