Annotation Resource: Human Phenotype Ontology - bcb420-2025/Clare_Gillis GitHub Wiki
Human Phenotype Ontology (HPO)
1. What sort of data is it? What sort of information does it offer us?
- HPO provides a standardized vocabulary of phenotypic abnormalities and their associations with genes and diseases. The dataset contains genes (denoted by an NCBI gene identifier and a HUGO gene symbol), phenotypic abnormalities (denoted by a term identifier and name), and diseases (denoted by a disease identifier and name). Users can search by gene, phenotype, or disease to find the genes, phenotypes, or diseases associated with their search term (e.g. the phenotypes and diseases associated with a gene).
2. When and where was it published?
- HPO was originally published in 2008 in the American Journal of Human Genetics ("The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease") and has had many updates since then. The most recent publication cited here is from 2024, in Nucleic Acids Research ("The Human Phenotype Ontology in 2024: phenotypes around the world").
Robinson, P. N., Köhler, S., Bauer, S., Seelow, D., Horn, D., & Mundlos, S. (2008). The Human Phenotype Ontology: a tool for annotating and analyzing human hereditary disease. American journal of human genetics, 83(5), 610–615. https://doi.org/10.1016/j.ajhg.2008.09.017
Gargano, M. A., Matentzoglu, N., Coleman, B., Addo-Lartey, E. B., Anagnostopoulos, A. V., Anderton, J., Avillach, P., Bagley, A. M., Bakštein, E., Balhoff, J. P., Baynam, G., Bello, S. M., Berk, M., Bertram, H., Bishop, S., Blau, H., Bodenstein, D. F., Botas, P., Boztug, K., Čady, J., … Robinson, P. N. (2024). The Human Phenotype Ontology in 2024: phenotypes around the world. Nucleic acids research, 52(D1), D1333–D1346. https://doi.org/10.1093/nar/gkad1005
3. Is this annotation set updated regularly or is it a static source?
- This dataset is updated regularly: new releases of the terms, annotations, and API come out every few months.
4. Where can I find this data?
- HPO data (annotations and ontology) can be downloaded directly from the HPO website (download ontology, download annotations) or from the project's GitHub repository.
5. How is the data formatted and released? Does it exist in some sort of standard file format?
- Phenotype info is stored in HPOA format (a specific file format meant to hold HPO phenotype data), and associations (genes_to_phenotype, phenotype_to_genes, etc.) are stored in TXT files. More info about these formats can be found in the HPO documentation.
- Ontology data can be downloaded in OBO (human readable), OWL (computer readable), or JSON (lightweight, computer readable) formats.
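As a rough illustration of working with one of the association TXT files, the sketch below groups phenotypes by gene. It assumes the file is tab-separated with a header row, and that the columns are named `ncbi_gene_id`, `gene_symbol`, `hpo_id`, `hpo_name`, and `disease_id`. Those column names, the function name `phenotypes_by_gene`, and the sample rows are assumptions made for illustration, so check them against the header of the release you actually download.

```python
import csv
import io
from collections import defaultdict

# Illustrative rows only, not copied from an HPO release.
# The column names are an assumption about the file's header.
SAMPLE = (
    "ncbi_gene_id\tgene_symbol\thpo_id\thpo_name\tdisease_id\n"
    "2200\tFBN1\tHP:0001166\tArachnodactyly\tOMIM:154700\n"
    "2200\tFBN1\tHP:0001083\tEctopia lentis\tOMIM:154700\n"
    "1080\tCFTR\tHP:0002110\tBronchiectasis\tOMIM:219700\n"
)


def phenotypes_by_gene(handle):
    """Map each gene symbol to the set of (hpo_id, hpo_name) pairs listed for it."""
    reader = csv.DictReader(handle, delimiter="\t")
    result = defaultdict(set)
    for row in reader:
        result[row["gene_symbol"]].add((row["hpo_id"], row["hpo_name"]))
    return dict(result)


# io.StringIO stands in for an opened file; with a real download you would
# pass open("genes_to_phenotype.txt", newline="") instead.
index = phenotypes_by_gene(io.StringIO(SAMPLE))
for gene in sorted(index):
    print(gene, sorted(index[gene]))
```

Using a set per gene collapses rows where the same gene and phenotype pair appears under more than one disease.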
6. What identifiers are associated with these annotations?
- Genes: NCBI gene identifier and HUGO gene symbol
- Phenotypes: HPO term identifier and term name
- Diseases: OMIM (Online Mendelian Inheritance in Man) and ORPHA (Orphanet) identifiers
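When mixing these identifier types in one analysis, it can help to tell them apart by their shape. The sketch below is one possible way to do that and is not part of any HPO tooling. It assumes HPO term identifiers look like `HP:` followed by seven digits, disease identifiers carry an `OMIM:` or `ORPHA:` prefix followed by digits, and NCBI gene identifiers are plain integers. The function name `classify_identifier` and the category labels are hypothetical.

```python
import re

# Assumed identifier shapes; verify against the files you are working with.
PATTERNS = {
    "phenotype": re.compile(r"HP:\d{7}"),
    "disease_omim": re.compile(r"OMIM:\d+"),
    "disease_orpha": re.compile(r"ORPHA:\d+"),
    "gene_ncbi": re.compile(r"\d+"),
}


def classify_identifier(identifier):
    """Return the first category whose pattern matches the whole string, else None."""
    for kind, pattern in PATTERNS.items():
        if pattern.fullmatch(identifier):
            return kind
    return None


for example in ["HP:0001166", "OMIM:154700", "ORPHA:558", "2200", "FBN1"]:
    print(example, classify_identifier(example))
```

Gene symbols such as FBN1 have no fixed prefix, so this sketch deliberately leaves them unclassified rather than guessing.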