SystemArchitecture - bandrewfox/exprdb_docs GitHub Wiki

Overview of system architecture

/images/1713345680-sys-arch.png

As seen above, the app has 5 main components. The design pattern is to keep a relatively limited set of connections between components. Here are the components:

  • SQL database (MySQL, PostgreSQL, or AWS RDS): Most of the gene expression data viewable in the app is stored in an SQL database.
  • REST API: The API handles requests from the client (which could be the app itself, a different app, or an analysis script written by a bioinformatics scientist), fetches the data from the SQL database, and returns it to the client. Any access to the data must be made via the API (i.e. the main client app does not access the SQL database directly for data). Many of the API calls are handled by Django REST Framework (DRF), although some are handled by custom Django code for efficiency.
  • Browse and Query: This is the main user interface for biologists to browse and query the gene expression data. It uses API calls to fetch the data from the REST API. It also uses the R language to draw plots. [Note: for the Docker-based deployment, there is a separate Docker container for R which responds to requests via Flask to make plots and run any other R script. This keeps the R environment separate from the Python/Django/webserver environment.] [Note 2: some of the user/group authorization info is handled directly between the main user interface and the SQL database, bypassing the REST API.]
  • Sessions: This began as the data staging area for manipulating gene expression datasets so they could be loaded to the SQL database via the REST API. Each folder on the filesystem was a different session (dataset). Over time, this became a permanent area on the filesystem so that end users could analyze full datasets without needing to access the data via the REST API. [Note: custom R scripts can be deployed within a session so that the user can run the R script via a web form and view the results.]
  • Data sources: A super-user can get new data into a session through several methods: upload data from their laptop, reach out to GEO or arbitrary URLs to download data, or access the Needle Genomics Data Library.
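
To make the "limited set of connections" idea concrete, here is a minimal sketch that writes the connections described in the list above as a small Python mapping. The component keys and the `direct_db_clients` helper are hypothetical names chosen for illustration, and the mapping is derived from the text only, so the diagram may show connections that are not listed here.

```python
# Which component talks to which, as described in the component list.
# Keys and values are illustrative names, not identifiers from the app.
CONNECTIONS = {
    "data_sources": {"sessions"},        # uploads, GEO/URL downloads, data library
    "sessions": {"rest_api"},            # staged datasets are loaded via the API
    "browse_and_query": {
        "rest_api",                      # all gene expression data
        "r_service",                     # plots (separate container in Docker deployments)
        "sql_database",                  # user/group authorization info only
    },
    "rest_api": {"sql_database"},
    "r_service": set(),
    "sql_database": set(),
}


def direct_db_clients(connections):
    """Return the components that connect to the SQL database directly."""
    return sorted(
        name for name, targets in connections.items() if "sql_database" in targets
    )


if __name__ == "__main__":
    print(direct_db_clients(CONNECTIONS))
```

In this sketch only the REST API and the user interface touch the database directly, and the user interface does so only for the authorization info mentioned in Note 2.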

REST API

There are 3 main DRF endpoints: https://[your-server]/series_api, https://[your-server]/feature_api, https://[your-server]/user_content_api. If you open these endpoints on your server in a web browser, you get a browsable API which should contain enough help to use all the API functionality. [Note: these API docs contain enough information for ChatGPT to access the data correctly.]

Each series is a separate gene expression dataset. Each series has gene expression data for all the genes in that dataset, and the genes are called "features". The feature_api stores additional information about each feature, including whether it is a gene or some other type of measurable entity. The user_content_api has additional information provided by the users, which is incorporated into various aspects of browsing the app.
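
As a rough illustration of how an analysis script might reach these endpoints, here is a minimal Python sketch using only the standard library. The three endpoint names come from this page; the `endpoint_url` and `fetch_json` helpers, the placeholder server name, and the lack of authentication are assumptions, so check the browsable API on your own server for the actual sub-paths, parameters, and login requirements.

```python
import json
import urllib.request

# Endpoint names are taken from this page; everything else is illustrative.
ENDPOINTS = ("series_api", "feature_api", "user_content_api")


def endpoint_url(base_url: str, endpoint: str) -> str:
    """Build the root URL of one of the three DRF endpoints."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    return f"{base_url.rstrip('/')}/{endpoint}"


def fetch_json(url: str, timeout: float = 30.0):
    """GET a URL and decode the JSON body.

    The Accept header asks DRF for JSON rather than the browsable HTML page.
    Authentication is omitted here (assumption); a real server may require it.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    base = "https://your-server.example"  # placeholder, replace with your server
    for name in ENDPOINTS:
        # Only prints the URLs; call fetch_json(url) against a real server.
        print(endpoint_url(base, name))
```

A script would typically start with `fetch_json(endpoint_url(base, "series_api"))` and follow whatever links or fields the response exposes.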