Python API Architecture - GenomicsDB/GenomicsSampleAPIs GitHub Wiki

Architecture

The figure below shows the overall architecture of the Python API.
[Figure: Architecture]

  1. Backend - consists of two databases.
    1. Tile DB - the heart of the system, where the variant information is efficiently stored and retrieved.
    2. Meta DB - currently built in JSON format. It stores the following information:
      • Contig names, lengths, and offsets, where the offset is the contig's column offset in Tile DB, since the contigs are flattened into columns one after another.
      • Sample name to sample ID mapping, where the sample ID is the Tile DB row in which the sample's information is stored.
  2. Middleware - abstracts and implements the functionality to manage data between the frontend and the databases.
    1. C++ library - interfaces with Tile DB to query data and returns the results.
    2. MPI Layer - partitions the work based on the loaded configuration and sends it to the C++ library. It then collects the data from the C++ library and translates it into a JSON string, which the API receives and makes available to the user through the Pyro4 interface.
  3. Frontend - provides the APIs that are detailed in the API Specification section.
  4. Pyro4 - a Python library that provides remote access to Python objects. This lets users fetch data remotely from the Tile DB server nodes without being tied to a Tile DB node.
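
The Meta DB mappings described above can be illustrated with a minimal sketch. The JSON layout, the field names (`contigs`, `samples`, `offset`), and the helper functions below are all assumptions made for illustration; the actual Meta DB schema is not shown on this page and may differ.

```python
import json
from bisect import bisect_right

# Hypothetical Meta DB content; the real JSON schema may differ.
META_JSON = """
{
  "contigs": [
    {"name": "1", "length": 1000, "offset": 0},
    {"name": "2", "length": 800, "offset": 1000},
    {"name": "X", "length": 500, "offset": 1800}
  ],
  "samples": {"sampleA": 0, "sampleB": 1}
}
"""

meta = json.loads(META_JSON)
contigs_by_name = {c["name"]: c for c in meta["contigs"]}
contigs_by_offset = sorted(meta["contigs"], key=lambda c: c["offset"])
offsets = [c["offset"] for c in contigs_by_offset]


def to_column(contig_name, position):
    """Map (contig, 0-based position) to a flattened column index."""
    contig = contigs_by_name[contig_name]
    if not 0 <= position < contig["length"]:
        raise ValueError("position lies outside the contig")
    return contig["offset"] + position


def to_contig(column):
    """Map a flattened column index back to (contig, position)."""
    idx = bisect_right(offsets, column) - 1
    if idx < 0:
        raise ValueError("column precedes the first contig")
    contig = contigs_by_offset[idx]
    position = column - contig["offset"]
    if position >= contig["length"]:
        raise ValueError("column falls past the end of the contig")
    return contig["name"], position


def sample_row(sample_name):
    """Look up the row (sample ID) that holds a sample's data."""
    return meta["samples"][sample_name]
```

In this sketch, each contig's offset equals the sum of the lengths of the contigs before it, which is one way to read "flattened as columns one after another"; a real deployment could leave gaps between contigs.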
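
The partition-and-collect role of the MPI layer can be sketched in plain Python. This is not the project's implementation and does not use MPI: the partition table, `split_query`, `fake_library_query`, and the record fields are hypothetical stand-ins, assuming each partition owns a contiguous, half-open range of columns.

```python
import json

# Hypothetical loaded configuration: each partition owns a half-open
# column range [start, end). The real configuration format may differ.
PARTITIONS = [(0, 1000), (1000, 2000), (2000, 3000)]


def split_query(start, end, partitions=PARTITIONS):
    """Split the query range [start, end) into per-partition pieces."""
    pieces = []
    for rank, (p_start, p_end) in enumerate(partitions):
        lo, hi = max(start, p_start), min(end, p_end)
        if lo < hi:  # keep only partitions that overlap the query
            pieces.append((rank, lo, hi))
    return pieces


def fake_library_query(lo, hi):
    """Stand-in for the C++ library call; returns placeholder records."""
    return [{"start": lo, "end": hi}]


def run_query(start, end):
    """Dispatch each piece, collect the records, and return a JSON string."""
    results = []
    for rank, lo, hi in split_query(start, end):
        for record in fake_library_query(lo, hi):
            record["partition"] = rank
            results.append(record)
    return json.dumps(results)
```

For example, `split_query(900, 2100)` touches all three partitions, and `run_query` returns a single JSON string that a frontend could parse with `json.loads`.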