Application Design
Website Server
The hypatiacatalog.com website server is built from four networked Docker containers hosted on an Amazon Web Services (AWS) virtual machine (VM). Modular containers, one for each of the four sub-applications, were chosen initially to solve the problem of getting what worked on one developer's computer to work on the server. The uniformity of the container environments solves the problem of setting up each environment on any host system; the containers are the same on Linux, Mac, and Windows development machines. For more on these design choices, see our paper [PASP, arXiv] Database Design for SpExoDisks: A Database & Web Portal for Spectra of Exoplanet-Forming Disks, written for our sibling database spexodisks.com.
While the exact structure and networking of the HypatiaCatalog containers are controlled with the Docker Compose commands, the .env file configuration (more on .env file configurations), and the compose.yaml file, a schematic representation is shown in the Figure at the bottom of this section.
Four Docker containers work in concert for the Hypatia Server:
- An NGINX web server that routes URL traffic and serves files.
- A Web2py frontend with the HTML and JavaScript files of the HypatiaCatalog's user interface, plus user-settings storage.
- A Python Django API that serves database queries via custom URL patterns.
- A MongoDB database that houses the indexes and Hypatia data, providing fast search, retrieval, and calculations.
Aside from the Docker containers, each running a single piece of software, we also have several shared file systems, called volumes in Docker. These allow us to share files and keep persistent storage between container builds/upgrades. We use several volumes in the compose.yaml file, but we show only one in the Figure below: the volume that stores the User Settings, which save the state of the plots and data table at hypatiacatalog.com/hypatia/default/launch. There is also persistent storage on the file system of the AWS VM, used to house the raw files for our MongoDB database.
While each container is modularized to run only a single piece of software, the containers are networked together so that the HypatiaCatalog website operates as a whole application. Specific ports are configured for inter-container communication. Outside ports on the AWS server pass HTTPS (port 443) and HTTP (port 80) traffic, as well as a port for SSH connections and a port that interfaces with the MongoDB database container for data uploads from our processing pipeline.
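To make the inter-container networking concrete, below is a minimal sketch of how a Python container, such as the Django API, could reach the MongoDB container by its Compose service name rather than a public address. The service name mongo-db, the environment-variable names, and the omission of authentication are all assumptions for illustration, not our actual settings.

```python
import os

from pymongo import MongoClient

# Inside the Docker network, containers resolve each other by their Compose
# service names, so no public hostname is needed for container-to-container
# traffic. Every name and default below is a hypothetical placeholder.
MONGO_HOST = os.environ.get("MONGO_HOST", "mongo-db")    # Compose service name
MONGO_PORT = int(os.environ.get("MONGO_PORT", "27017"))  # MongoDB default port

client = MongoClient(host=MONGO_HOST, port=MONGO_PORT)

# 'ping' is a standard MongoDB admin command: a cheap connectivity check.
client.admin.command("ping")
print("Connected to MongoDB at %s:%d" % (MONGO_HOST, MONGO_PORT))
```

Reading the host and port from environment variables is what lets the same code run unchanged whether the values come from the .env file on the server or from a developer's local shell.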
Data Pipeline
Using a single remotely accessible database was a key upgrade for the HypatiaCatalog in 2024. Previously, each HypatiaCatalog developer kept a separate copy of intermediate processed data, such as star names from SIMBAD queries, Hypatia-formatted NASA Exoplanet Archive (NEA) data, and other contextual data from the GAIA, TIC, & Pastel catalogs. The screenshot below shows a list of the intermediate data products the HypatiaCatalog uses. In MongoDB terminology, the metadata database houses several collections of intermediate data products, which are processed versions of other databases optimized for the HypatiaCatalog's usage; see the sketch after the list below.
Our intermediate data products include:
- star_names, linking SIMBAD name data that unites the various HypatiaCatalog data products.
- nea, which holds data from the NASA Exoplanet Archive.
- gaia, with data in multiple formats, one for each data release.
- tic, for the TESS Input Catalog.
- pastel, with data from the Pastel catalog, including effective temperature and log surface gravity for many stars.
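As a hedged sketch of how a pipeline script might read these shared collections with pymongo, consider the example below; the connection URI, the metadata database name, and the query field are illustrative assumptions, not our exact schema.

```python
from pymongo import MongoClient

# Connection details vary by setup; this localhost URI is a placeholder.
client = MongoClient("mongodb://localhost:27017")
metadata_db = client["metadata"]  # the metadata database described above

# List the intermediate data-product collections.
print(metadata_db.list_collection_names())
# expected to include: star_names, nea, gaia (per release), tic, pastel

# Look up one star's SIMBAD-linked name record; the field name 'aliases'
# is a hypothetical placeholder for however the names are actually stored.
star_doc = metadata_db["star_names"].find_one({"aliases": "HD 140283"})
print(star_doc)
```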
Before the 2024 upgrades, each intermediate data product was stored in its own "mini-database" of custom text files and was updated only on a given developer's computer. This created extra steps, resulting in redundant queries to online catalogs and versioning conflicts. Now all of our developers share a single central database, and updates made by one developer are immediately visible to all. For example, one HypatiaCatalog developer might want to refresh the NEA data to get the latest exoplanets. If new exoplanets have been added, that automatically triggers queries to SIMBAD for name information and to GAIA for stellar properties. So in this example, first the processed NEA data is updated in the online MongoDB database, then the SIMBAD data in the star_names collection, and then the multiple GAIA data collections receive updates. A second developer working on the HypatiaCatalog pipeline, say uploading more stellar abundance measurements, will automatically use these same data products to make data associations in their work. As a result, HypatiaCatalog developers can share intermediate data-processing products via the MongoDB database, in addition to posting the final HypatiaCatalog data, which is publicly accessible from the website's API. This can be seen in the Figure at the bottom of this section.
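The refresh cascade in that example might look roughly like the sketch below. Every function, field, and collection name here is a hypothetical placeholder standing in for real pipeline steps; only the order of operations reflects the description above.

```python
from pymongo import MongoClient


def fetch_nea_hosts():
    """Placeholder for a query to the NASA Exoplanet Archive."""
    return [{"name": "HD 219134"}]  # illustrative record only


def update_star_names(db, star_name):
    """Placeholder for a SIMBAD query feeding the star_names collection."""
    db["star_names"].update_one(
        {"name": star_name}, {"$set": {"name": star_name}}, upsert=True
    )


def update_gaia(db, star_name):
    """Placeholder for GAIA queries feeding the per-release collections."""
    db["gaia_dr3"].update_one(
        {"name": star_name}, {"$set": {"name": star_name}}, upsert=True
    )


def refresh_nea(db):
    known = {doc["name"] for doc in db["nea"].find({}, {"name": 1})}
    for host in fetch_nea_hosts():
        if host["name"] not in known:
            db["nea"].insert_one(host)           # 1. processed NEA data first
            update_star_names(db, host["name"])  # 2. then SIMBAD name data
            update_gaia(db, host["name"])        # 3. then GAIA properties


refresh_nea(MongoClient("mongodb://localhost:27017")["metadata"])
```

Because all three writes land in the shared database, the second developer in the example sees the refreshed records without rerunning any of these queries.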
Another key design feature is that our pipeline is designed to work in a Docker Python environment or in any other Python environment available on a given developer's computer; see the Figure at the bottom of this section. This solves two issues:
- Our developers can use Python in their favorite way: conda, a virtual environment, or a monolithic system installation.
- Docker provides a standard environment across all platforms that can be used to debug issues across all developer computers.
For example, at the HypatiaCatalog, we like using the latest Python versions for their feature improvements. As a result, we do not maintain backward compatibility with older versions of Python. This is usually fine for our student developers or anyone installing Python for the first time. However, if you have used Python for years on many projects, you may not want to upgrade your favorite conda environment and risk breaking something else. This is where Docker is helpful, as it allows a developer to run Python and the HypatiaCatalog code in the same environment that supports the website's API. Once a developer is done with the HypatiaCatalog Docker environment, those files can be deleted, and no changes are ever needed to other Python environments.
Test Website
Before adding new data or new features to the public HypatiaCatalog website, we have one final step: viewing the data on a test website. We have many quality checks throughout the build process, but this is a way to see all the systems working together and to see the data as it will appear on hypatiacatalog.com.
Since the entire system is deployed in containers, we can make instances of all the containers and run them locally on our computers. In practice, we do this with one difference: we still use the MongoDB database on the live HypatiaCatalog website. We configure the local API's network connection to point to a staging location where the HypatiaCatalog pipeline has uploaded new data. This is a separate copy from the publicly visible data, but it is often structurally similar (unless we are testing data-structure changes). See the Figure at the bottom of this section for a visual representation of this configuration.
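A minimal sketch of how that pointer could be configured is shown below, assuming the local API reads its database target from environment variables; the variable names and the staging/public database names are illustrative assumptions, not our actual settings.

```python
import os

from pymongo import MongoClient

# On the live server the API would read the public database; a local test
# instance overrides these values to point at the staged copy. All names
# below are hypothetical placeholders.
MONGO_HOST = os.environ.get("MONGO_HOST", "hypatiacatalog.com")
DB_NAME = os.environ.get("HYPATIA_DB", "staging")  # 'public' on the live site

db = MongoClient(host=MONGO_HOST)[DB_NAME]
print("Serving API queries from database:", DB_NAME)
```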
Using Docker containers and reconfigurable network connections allows us to create any number of configurations, including a full server simulation, multiple APIs, or a local frontend only. However, the test website described here is often the default choice for testing most new features and for visualizing a data upload before making it public. After verification on the test website, we usually rebuild the server containers with any new updates and migrate the staged test data to publicly viewable paths within the database.
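That final migration step could be as simple as MongoDB's renameCollection admin command, which moves a collection between databases on the same server. This is a sketch of one way to do it, not necessarily our exact procedure, and the database and collection names below are hypothetical placeholders.

```python
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # connection details vary

# Swap the verified staged data into the publicly served path. The
# 'staging' and 'public' database names are placeholders for illustration.
client.admin.command(
    "renameCollection",
    "staging.hypatia_data",
    to="public.hypatia_data",
    dropTarget=True,  # replace the previously public copy, if one exists
)
```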