Our Story

We have been working on the Phenopolis Browser (PB) platform (www.phenopolis.org) since 2017, whilst working at UCL. We initially started as part of a charity-funded project (Retina UK) to develop a database hosting genetic and clinical (phenotypic) data for patients with inherited retinal diseases (Retina UK - Jing YU). We were inspired by the [ExAC browser](https://github.com/konradjk/exac_browser), but we wanted to make our software more useful for rare disease researchers. We therefore added support for Human Phenotype Ontology (HPO) terms and made it possible to explore the entirety of an individual’s genetic data rather than just aggregate information. We published our research in the journal Bioinformatics later that year (Pontikos et al. 2017) and organised a conference to promote community building around the HPO. PB was cited in the 2017 HPO publication (Köhler et al. 2017) and has so far been cited over 30 times.

Since then we have been maintaining and improving our software mostly with our own money and with funds from our software company Phenopolis Ltd (£15,000 in salary costs and £3,000 per annum in AWS cloud costs). We have also received support funding (£5,000) from the Japan Society for the Promotion of Science to develop PB further for Japanese researchers. In 2018, to enable the continued funding of PB developers, we established Phenopolis Ltd (www.phenopolis.com), which is committed to supporting the continued development of the open-source PB. In 2020, Nikolas Pontikos established his lab at UCL, which now also supports the development of PB through academic collaborations.

The PB software has gone through a number of technological iterations. Because of the large and complex nature of the genomic data and the type of analysis we perform, we have explored several designs in search of improved efficiency, including document databases (MongoDB) and graph databases (Neo4j) (Mughal et al. 2017; Neo4j Online Meetup #12: Pheno4J: A Gene To Phenotype Graph Database). We have now settled on PostgreSQL as the most efficient database solution, and we are privileged to benefit from the expertise of the psycopg lead developer Daniele Varrazzo, who has joined our collaboration (see letter of support). We are currently sponsoring Daniele through GitHub Sponsors using our own money (https://github.com/dvarrazzo). We have also settled on a decoupled server architecture: a frontend implemented in React (mostly developed by Yuan Tian) and an API developed in Python with Flask (mostly developed by Alan Wilter, Nikolas Pontikos and Pablo Priesgo, with oversight from Daniele Varrazzo). However, we intend to move our API to FastAPI before the start of this award. FastAPI offers a number of advantages in parameter validation and automatic generation of API documentation, which will facilitate robust software development and external contributions.

The mission of PB was, and still is, to give clinical and genetic researchers who do not have programming or bioinformatics skills the means to analyse their genetic and phenotypic data. PB now benefits a community of researchers in the UK and in Japan and is part of the Global Alliance for Genomics and Health (GA4GH). It allows fast querying of genetic and phenotypic data and advanced analysis to find gene-phenotype associations. This helps researchers to better understand the function of certain genes and to identify the likely genetic cause of disease in patients.

Unfortunately, PB is not currently financially sustainable, as there is neither a way of running the analysis pipelines automatically nor a mechanism for billing users to recover costs.

Hence, if this project is funded, we plan to develop a system that automatically runs pipelines on AWS and bills its users to recover costs.

We also plan to develop collaboration tools so that users can cooperatively curate the genetic data and more effectively identify the genetic causes of rare diseases. We will also follow the GA4GH API standards to allow communication with other rare disease genetics databases.

Finally, we will carry out further outreach to grow our user community, with a particular focus on bioinformaticians and software developers who will contribute to our software and build bespoke analytics and visualisation tools. By writing extensive documentation and tutorials, we hope to encourage other researchers to set up their own PB instances, creating a global network of PBs that interface via GA4GH.