2024_PGO_V1.0 _summary_meetings - Pistoia-Alliance-Inc/Pistoia-Alliance-PGO GitHub Wiki

PGO V1.0 Phase 1 (2024-2025): Summary of Alignment Meetings

Top of the document


Summary of the “Assay/Bioassay” Alignment Meeting

Date: 2025-05-08 (May 8, 2025)

This meeting aimed to reach consensus on the appropriate ontological representation of the concept currently referred to as “Assay” or “Bioassay” within the Pistoia Alliance’s Pharma General Ontology (PGO). Discussions focused on definitions, hierarchical relationships with related terms (e.g., “experiment”, “study”, “test”), and domain-specific applicability.

Conceptual Clarification

Participants discussed the semantic boundaries and overlaps between assay, bioassay, experiment, and test. Several distinctions and relationships were noted.

  • An assay is generally defined as a planned process that measures a specific activity or property (e.g., biological, chemical, physicochemical, sequence-based), often following a defined protocol.
  • An experiment is a broader, less formalized concept that may include multiple assays and can span various designs and methodologies.
  • A study may encompass multiple experiments or be used synonymously with an experiment in some contexts.
  • Runs refer to repeated executions of the same assay or experiment.
  • A bioassay is considered a subset of assays involving biological materials or reagents, or measuring biological properties.
  • The terms test and assay are sometimes used interchangeably in practical and regulatory contexts (e.g., pharmacopeial standards).

Definition Evaluation

Several candidate definitions were proposed and evaluated for suitability.

  • NCIT_C16341 (Assay): Widely used in cancer research and pharmacology. Positively received by multiple experts for its comprehensiveness. Links conceptually to the PGO core concept “substance”. Endorsed by multiple participants ([PMQ], [BM]).
  • BAO_15 (Bioassay Ontology): Proposed by at least two project partners. Critiqued for being overly complex or broad, especially with inclusion of SOP-level information.
  • NCIT_C60819: Considered too narrow, focused on measuring quantities or drug effects.
  • OBI_0000070: Used extensively in biomedical domains (Ontology for Biomedical Investigations). Considered sufficiently flexible and formally defined.

Ultimately, the group converged on the OBI_0000070 definition to represent the concept of “Assay”: “A planned process with the objective to produce information about the material entity that is the evaluant, by physically examining it or its proxies.”

Recommendations to the PGO Steering Group

  1. Change of Label: Update the existing core concept label from “Bioassay” or “Assay/Bioassay” to “Assay” to improve clarity and generality.
  2. Adopt the definition from OBI_0000070 as the formal representation of “Assay” in the PGO.
  3. Future Concept Expansion: Consider the addition of a distinct core concept for “Bioassay” in future ontology releases to better capture biological assay specificity.
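
As a minimal sketch of how recommendations 1 and 2 could be recorded, the Python snippet below models a core concept as a label plus an adopted source term and its text definition. The class `CoreConcept` and its field names are hypothetical and not part of the PGO; only the labels, the OBI identifier, and the definition text are taken from this summary.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CoreConcept:
    """Hypothetical record for one PGO core concept (illustration only)."""
    label: str         # human-readable core concept label
    source_term: str   # identifier of the adopted source term, if any
    definition: str    # text definition quoted from the source


# State before the meeting: the label under discussion, no adopted source yet.
before = CoreConcept(label="Assay/Bioassay", source_term="", definition="")

# Recommendations 1 and 2: relabel to "Assay" and adopt the OBI definition.
after = replace(
    before,
    label="Assay",
    source_term="OBI_0000070",
    definition=(
        "A planned process with the objective to produce information about "
        "the material entity that is the evaluant, by physically examining "
        "it or its proxies."
    ),
)

print(after.label, after.source_term)
```

Keeping the record immutable and deriving the updated version with `replace` is one way to leave the earlier state available for comparison; other designs would serve equally well.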


Summary of the “Biomarker” Alignment Meeting

Date: 2025-05-15 (May 15, 2025)

The alignment meeting focused on refining and selecting an ontological definition for the core concept of “biomarker” within the Pistoia Alliance’s Pharma General Ontology (PGO). Despite the recognised complexity and multidimensionality of the term, only two candidate definitions were formally proposed and evaluated.

Conceptual Context

The term biomarker has been acknowledged in earlier PGO discussions as representing a role, rather than a concrete material entity. This ontological nuance may necessitate additional care in representation, potentially through modelling techniques such as LinkML, to ensure semantic precision and interoperability.

Evaluation of Candidate Definitions

Two primary ontology entries were considered.

  • NCIT_C16342 (Biomarker, NCI Thesaurus): Broadly defined and widely used in clinical and translational research contexts. Received multiple endorsements ([PMQ], [BM], [DH]). Recognised as sufficiently inclusive of the diverse modalities of biomarkers, encompassing molecular, physiological, digital, and behavioural indicators. The definition references core functions such as monitoring normal or pathogenic biological processes, assessing disease risk or prognosis, and predicting or evaluating therapeutic response. While the wording was deemed largely appropriate, some participants suggested that the definition could be further refined to explicitly list specific examples (e.g., “digital biomarkers”, “imaging biomarkers”).

  • CHEBI_59163 (Biomarker, ChEBI): Also proposed by multiple partners, but ultimately considered overly narrow. The definition is constrained by ChEBI’s chemical substance-centric framework, limiting its applicability to non-molecular or non-substance-based biomarkers (e.g., behavioural or digital indicators).

Agreed Text Definition

The following text definition, drawn from NCIT_C16342, was proposed for adoption: “A characteristic that can be objectively measured and serves as an indicator for normal biologic processes, pathogenic processes, state of health or disease, the risk for disease development and/or prognosis, or responsiveness to a particular therapeutic intervention.”

This definition aligns with established biomedical usage and supports a variety of research and clinical applications.

Recommendations to the PGO Steering Group

  1. Definition Adoption: Officially adopt NCIT_C16342 as the definition for “biomarker” within the PGO.
  2. Implementation Considerations: Acknowledge the conceptualisation of biomarker as a role, and ensure that implementation models reflect this distinction for ontological consistency and extensibility.
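
The role-versus-entity distinction in recommendation 2 can be illustrated with a small Python sketch. Everything here is an assumption made for illustration: the `Kind` categories, the class names, and the example bearer and context are hypothetical and do not reflect an agreed PGO or LinkML model. The sketch only shows the idea that a biomarker is something a measured characteristic is assigned as, rather than a thing in its own right.

```python
from dataclasses import dataclass
from enum import Enum


class Kind(Enum):
    """Hypothetical top-level categories, used only in this sketch."""
    MATERIAL_ENTITY = "material entity"
    ROLE = "role"


@dataclass(frozen=True)
class Concept:
    label: str
    kind: Kind


@dataclass(frozen=True)
class RoleAssignment:
    """States that a bearer plays a role in a given context."""
    bearer: str     # free-text name of whatever plays the role
    role: Concept   # must be a concept modelled as a role
    context: str    # free-text description of the context of use

    def __post_init__(self) -> None:
        # Reject assignments whose "role" is not actually modelled as a role.
        if self.role.kind is not Kind.ROLE:
            raise ValueError(f"{self.role.label!r} is not modelled as a role")


biomarker = Concept(label="biomarker", kind=Kind.ROLE)
placeholder_entity = Concept(label="example entity", kind=Kind.MATERIAL_ENTITY)

# Illustrative assignment: a measured characteristic bears the biomarker role.
assignment = RoleAssignment(
    bearer="blood glucose level",
    role=biomarker,
    context="glycaemic monitoring",
)
print(f"{assignment.bearer} bears the {assignment.role.label} role")
```

Attempting `RoleAssignment(bearer="x", role=placeholder_entity, context="y")` raises a `ValueError`, which is the kind of consistency check the recommendation hints at.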


Summary of the "Biospecimen" Alignment Discussion

Date: 2025-05-08 (May 8, 2025)

Definition Evaluation

The experts reached a consensus on adopting the NCIT_C70699 definition from the NCI Thesaurus as the preferred representation of the core concept “biospecimen”. This is specifically in the context of Research and Early Development (R&ED) in pharmaceutical domains.

The group emphasised that this definition effectively excludes purely physical-chemical specimens or samples not derived from biological entities, thereby offering necessary conceptual precision. Although it broadly aligns with the term 'sample', its use is more targeted and appropriate for biomedical research involving biological material.

The NCIT_C70699 definition is “Any material sample taken from a biological entity for testing, diagnostic, propagation, treatment or research purposes, including a sample obtained from a living organism or taken from the biological object after halting of all its life functions. Biospecimen can contain one or more components including but not limited to cellular molecules, cells, tissues, organs, body fluids, embryos, and body excretory products.”

Recommendation to the PGO Steering Group

It is recommended that the PGO Steering Group formally adopt NCIT_C70699 as the standard definition of biospecimen, given its alignment with pharmaceutical R&ED usage, conceptual clarity, and institutional support.


Summary of the “Cell” Alignment Meeting

Date: 2024-11-21 (November 21, 2024)

The alignment meeting focused on the selection and harmonisation of a suitable ontological definition for the core concept “Cell” within the Pharma General Ontology (PGO). As one of the foundational biological units, “Cell” was among the initial core concepts identified by the PGO working group, with related concepts such as “Cell type” and “Cell line” emerging in subsequent discussions.

Conceptual Scope and Clarifications

Participants agreed that the term “Cell” should refer to the highest-level concept, distinct from its more specific derivatives (“cell type” and “cell line”), each of which warrants its own dedicated definition and ontological treatment. The distinction allows for clearer semantic modeling and better alignment with experimental and biomedical data usage patterns.

Evaluation of Candidate Definitions

Three candidate definitions were discussed, each reflecting usage within major biomedical ontology frameworks:

  • EFO_0000324 (Experimental Factor Ontology): Commonly employed in the annotation of experimental datasets. Relevant to data-driven contexts but may lack the broader biological specificity preferred for a core concept.
  • NCIT_C12508 (NCI Thesaurus): Widely used in cancer research and clinical domains. Offers a clear, biologically grounded definition with appropriate granularity for foundational biomedical modeling.
  • CL_0000000 (Cell Ontology): Extensively used in cell biology and bioinformatics. Provides a highly structured, hierarchically organized ontology of cell types.

All three definitions were deemed acceptable within specific use cases. However, for the purposes of PGO’s general ontology layer, a single, broadly applicable definition was required.

The NCIT definition was preferred due to several factors.

  • Frequency of use within relevant biomedical applications.
  • The need to anchor PGO in a generalizable and foundational biological concept.
  • The ability to accommodate related but distinct sub-concepts such as cell type and cell line under separate entries.
  • The existing support for alternative definitions and annotations through PGO’s planned LinkML framework, allowing for ontological flexibility and layered specificity.

Recommended Text Definition

The NCIT definition was endorsed: “The smallest units of living structure capable of independent existence, composed of a membrane-enclosed mass of protoplasm and containing a nucleus or nucleoid.”

This formulation reflects a generalizable understanding of cellular structures across domains.

Recommendations to the PGO Steering Group

  1. Definition Adoption: Adopt NCIT_C12508 from the NCI Thesaurus as the reference definition for “Cell” in the PGO.
  2. Extensions: Consider defining “cell type” and “cell line” separately, as distinct core concepts, in subsequent ontology development phases (please refer to the discussion on “Cell” on 2024-11-21). These core concepts could be added to the scope of Phase 2.


Summary of the "Clinical Study" Alignment Discussion

Meetings on 2024-07-03 and 2024-08-01

Meeting notes

Two expert meetings were held on 3 July and 1 August 2024 as part of the Pistoia Alliance’s Pharma General Ontology (PGO) initiative to converge on a standardised definition of the concept “clinical study.” The discussions demonstrated a strong consensus among experts on adopting the definition provided by the NCI Thesaurus (NCIT), specifically concept code C15206 (NCIT_C15206).

During the 3 July meeting, it was noted that this definition aligns with the terminology requirements of several pharmaceutical partners. AstraZeneca emphasised the importance of establishing a common identifier source for clinical trials to ensure unambiguous referencing across systems. GSK proposed that a preference hierarchy for resolution services (e.g., prioritising identifiers.org) may be required. Merck KGaA expressed a preference for using the original source text of definitions, a position unanimously endorsed by the group as a guiding principle.
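
The preference hierarchy for resolution services suggested by GSK can be pictured as an ordered list of resolvers that is tried from most to least preferred. The Python sketch below is only an illustration: the function names are hypothetical, and both the ordering and the URL templates (identifiers.org compact identifiers and OBO-style PURLs) are assumptions rather than decisions recorded in the meeting. It builds candidate URLs without contacting any service.

```python
from __future__ import annotations

# Ordered by preference. The order and the URL templates are assumptions
# made for this sketch, not choices agreed by the expert group.
RESOLVER_TEMPLATES = [
    ("identifiers.org", "https://identifiers.org/{prefix}:{local_id}"),
    ("OBO PURL", "http://purl.obolibrary.org/obo/{PREFIX}_{local_id}"),
]


def split_term(term: str) -> tuple[str, str]:
    """Split 'NCIT_C15206' or 'NCIT:C15206' into ('NCIT', 'C15206')."""
    for sep in (":", "_"):
        if sep in term:
            prefix, local_id = term.split(sep, 1)
            if prefix and local_id:
                return prefix, local_id
    raise ValueError(f"cannot parse term identifier: {term!r}")


def candidate_urls(term: str) -> list[tuple[str, str]]:
    """Return (resolver name, URL) pairs in order of preference."""
    prefix, local_id = split_term(term)
    return [
        (
            name,
            template.format(
                prefix=prefix.lower(), PREFIX=prefix.upper(), local_id=local_id
            ),
        )
        for name, template in RESOLVER_TEMPLATES
    ]


for name, url in candidate_urls("NCIT_C15206"):
    print(f"{name}: {url}")
```

A caller would try the URLs in order and keep the first one that resolves; whether a given service actually resolves a given term has to be checked separately.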

At the 1 August meeting (GitHub issue #34), experts considered input from the Pistoia Alliance’s Clinical Operations (ClinOps) group, which utilises a use-case-specific definition from the Unified Study Definition Model (USDM): “involves research using human volunteers (also called participants) that is intended to add to medical knowledge.” PGO experts concluded that this definition is not in conflict with NCIT_C15206. The more expansive second part of the NCIT definition—covering development of new technologies and various research domains—was viewed as descriptive commentary rather than essential for semantic interoperability. The equivalence of terms such as “human subject,” “volunteer,” and “participant” was acknowledged as an open point but not a critical obstacle.

Recommendation to the PGO Steering Group

The expert group recommends the adoption of NCIT_C15206 from the NCI Thesaurus as the preferred definition of clinical study within the PGO controlled vocabulary. The recommended definition is “Research conducted with human subjects or on material of human origin in which an investigator directly interacts with human subjects; includes development of new technologies, study of mechanisms of human diseases, therapy, clinical trials, epidemiologic, behaviour, and health services research.”


Summary of the “Site” alignment discussions

Dates of meetings and updates: 2025-01-30, 2025-05-28, and 2025-06-05

The PGO expert group convened to assess candidate definitions for the core ontological concept of “Site”, with specific focus on its relevance for pharmaceutical research and early development (R&ED). The discussion revealed significant conceptual complexity around the term “site,” which spans anatomical, geographic, institutional, and procedural domains. Consequently, experts distinguished between a high-level generic concept and more context-specific subtypes.

1. High-Level Concept: “Site”

Several ontological sources were considered for defining “Site”.

  • [BFO:0000029] defines a site as a “three-dimensional immaterial entity… bounded by a material entity.” While philosophically rigorous and broadly applicable, its abstractness was perceived as a barrier to practical implementation in PGO.
  • OMG Commons defines “Site” as “a place, setting, or context in which something is situated or to which something is, or may be, bound.” This was generally preferred for its clarity and pragmatic scope. However, its implementation is currently impeded by unresolved IRIs (Issue #48), rendering it unsuitable for immediate adoption in the PGO.

The group recommended deferring a decision on the general “Site” concept to Phase 2, pending the availability of resolvable identifiers. For a definition of the general “Site” concept, the experts favour the OMG definition, which is also used by IDMP-O: “place, setting, or context in which something is situated or to which something is, or may be, bound.” There is, however, an open issue (GitHub issue #48): as of 2025-06-05, the OMG URL/IRI cannot be resolved.

2. Domain-Specific Concepts for "Site"

Recognising the limitations of a generic “Site” definition for some practical use cases in the R&ED domain, the experts proposed a stratified approach based on domain specificity.

  • “Study_Site” example source – [NCIT_C80403], described as a generic facility or institution where study-related activities occur. Experts found this definition sufficiently flexible for both clinical and non-clinical research contexts.
  • “Clinical_Study_Site” example source – [NCIT_C70777], which defines a site as “a healthcare organisation, an institution, a facility, a healthcare provider, or a part or a constituent of any of the above entities directly involved in conducting a particular clinical study.” This definition was strongly endorsed for representing the real-world entities engaged in clinical research and was considered well aligned with industry practices.

Recommendations to the PGO Steering Committee

  1. Defer the adoption of a high-level “Site” definition until a resolvable and implementable URI is available (notably for the OMG Site), with a definition such as: “place, setting, or context in which something is situated or to which something is, or may be, bound.”
  2. Adopt a new core concept labeled “Clinical_Study_Site”, with the definition from NCIT_C70777: “A healthcare organisation, an institution, a facility, a healthcare provider, or a part or a constituent of any of the above entities directly involved in conducting a particular clinical study.”
  3. Include additional context-specific core concepts in Phase 2:
  • “Study_Site”: as per NCIT_C80403.
  • “Anatomical_Site”: relevant for study subjects and tissue-specific applications.
  • “Binding_Site”: applicable to molecular and compound interaction contexts.

These labels are provisional and may require revision during expert alignment discussions.
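
The stratified outcome of the “Site” discussion can be summarised as a small table of proposals and their recommended status. The Python sketch below is an illustration only: the tuple layout and the status strings ("defer", "adopt", "phase 2") are hypothetical, while the labels and source terms are those named in this summary. A source of `None` means no source term was named here.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

# (provisional label, source named in the summary, recommended status)
Proposal = Tuple[str, Optional[str], str]

SITE_PROPOSALS: List[Proposal] = [
    ("Site", "OMG Commons", "defer"),            # IRI not resolvable yet
    ("Clinical_Study_Site", "NCIT_C70777", "adopt"),
    ("Study_Site", "NCIT_C80403", "phase 2"),
    ("Anatomical_Site", None, "phase 2"),
    ("Binding_Site", None, "phase 2"),
]


def group_by_status(proposals: List[Proposal]) -> Dict[str, List[str]]:
    """Group proposal labels by their recommended status."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for label, _source, status in proposals:
        grouped[status].append(label)
    return dict(grouped)


for status, labels in group_by_status(SITE_PROPOSALS).items():
    print(f"{status}: {', '.join(labels)}")
```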


Summary of the "Compound" Alignment Discussion

Meeting of 2025-03-25

In the meeting held on March 25, 2025, the working group evaluated multiple candidate definitions for the core concept “compound”, with particular attention to its usage in pharmaceutical and regulatory contexts. Among the options assessed, the definition provided by NCIT_C43366—“A substance formed by chemical union of two or more elements or ingredients in definite proportion by weight”—emerged as the least controversial and most broadly accepted. It was endorsed by at least three partner organisations and positively appraised by participants ([BM]: “most appropriate”; [PMQ]: “best definition out of those available”).

Alternative definitions, including those derived from IDMP-O (specifically, Hydrate), ISO 11238, and ChEBI (CHEBI:59999), were discussed but ultimately found less suitable for the PGO context:

  • The IDMP-O Hydrate concept was deemed too specific, referring to hydrated compounds and lacking a generalised definition of “compound.”
  • The ISO 11238/11615 formulation was considered overly abstract, with a manufacturing-oriented framing that blends substance, product, and packaging components.
  • The ChEBI definition of chemical substance (CHEBI:59999) was viewed as too broad or generic, lacking the necessary precision for this context.

“Compound” appears to denote a more precise and chemically defined composition than “substance”. Incidentally, “chemical compound” may be a better label, and it is compatible with [NCIT_C43366] (https://evsexplore.semantics.cancer.gov/evsexplore/concept/ncit/C43366).

Despite general agreement on NCIT_C43366, some reservations were noted concerning the imprecision of constituent terms such as “substance”, “element”, and “ingredient”, which are not formally defined within the NCIT framework and may introduce ambiguity in high-stakes regulatory or ontological mappings.

Recommendation to the PGO Steering Group

  1. Consider changing the label from “compound” to “chemical_compound”.
  2. Adopt NCIT_C43366 as the working definition of “compound” (or “chemical_compound”). While not ideal in its terminological rigor, it currently represents the most contextually appropriate, consensus-driven option available, offering sufficient clarity for practical implementation in R&ED.


Summary of the “Device” Alignment Meeting

Date: 2025-02-13 (February 13, 2025)

The alignment meeting focused on defining the core concept “Device” for the Pharma General Ontology (PGO), addressing its foundational role in pharmaceutical research, development, and production. Given the heterogeneity of devices used across laboratory research, diagnostics, therapeutics, and manufacturing processes, the working group acknowledged the need for a broad and inclusive ontological treatment of the term.

Scope and Context for "device"

The term “Device” was reaffirmed as a necessary high-level entry point within the PGO structure. Participants emphasised that this concept should not be limited to medical or investigational contexts alone but should instead encompass a wide array of equipment and tools—including laboratory instruments, production machinery, and diagnostic platforms. A more specific concept such as “Medical Device” could be modelled as a subclass within this hierarchy.

The discussion also acknowledged the critical role of context in interpreting device-related data in R&ED workflows, especially in relation to regulatory, clinical, and manufacturing use cases.

Evaluation of Candidate Ontology Definitions

Six definitions from established ontologies were evaluated for suitability:

  • OBI:0000968 (Ontology for Biomedical Investigations): Recognised for BFO alignment but criticised for being overly narrow, tied to research investigations, and not easily generalisable to manufacturing or therapeutic devices.
  • NCIT_C62103 (NCI Thesaurus): Broadly defined and pragmatic. Considered sufficiently flexible to accommodate a wide range of device types, including research and medical contexts. Seen as the most appropriate candidate for the core concept “Device” due to its existing vocabulary structure and potential for subclassing.
  • IDMP-O_Medical_device: Offers an FDA-aligned definition suitable for “Medical Device” as a subclass. Detailed and specific, but viewed as too narrow for the overarching “Device” concept.
  • NCIT_C16830 (Medical Device): Suitable as a “child class” under the broader NCIT_C62103 “Device” definition.
  • FHIR:Device: Considered broad but potentially too imprecise in its scope, with overlapping non-medical interpretations.
  • NCIT_C19238: Describes manufactured objects used in diagnostic, therapeutic, or research activities. Excludes manufacturing equipment, and thus not considered comprehensive enough for the PGO core concept.

Preferred Text Definition (NCIT_C62103): “A device used in medical, surgical, laboratory, or production settings, including instruments, apparatus, implements, or machines involved in diagnostic, therapeutic, or research activities.”

This definition provides adequate breadth to accommodate the multifunctional nature of devices in pharmaceutical settings, while also supporting future hierarchical expansion.

Recommendations to the PGO Steering Committee

  1. Core Concept Adoption: Adopt the label “Device” as a high-level core concept in the PGO.
  2. Use NCIT_C62103 as the reference definition for this concept.
  3. Consider additional related core concepts:
  • “Medical Device”: Suggested reference NCIT_C16830.
  • “Instrument”: Encompassing laboratory measurement tools and scientific equipment.
  • “Manufacturing Equipment”: Including pharmaceutical production machinery (e.g., tablet presses), to be included in future PGO extensions beyond R&ED.
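
The proposed hierarchy, with “Medical Device” as a child class of the broader “Device”, can be sketched as a parent map in Python. The data structures and function names below are hypothetical; only the “Medical Device” to “Device” relationship and the two NCIT identifiers come from this summary. “Instrument” and “Manufacturing Equipment” are left out because their placement in the hierarchy is not stated.

```python
from typing import Dict, List

# child label -> parent label (only this edge is stated in the summary)
PARENT: Dict[str, str] = {"Medical Device": "Device"}

# label -> suggested reference term
SOURCE: Dict[str, str] = {
    "Device": "NCIT_C62103",
    "Medical Device": "NCIT_C16830",
}


def ancestors(label: str) -> List[str]:
    """Return the chain of parents for a label, nearest parent first."""
    chain: List[str] = []
    seen = {label}
    while label in PARENT:
        label = PARENT[label]
        if label in seen:  # guard against accidental cycles in the map
            raise ValueError(f"cycle detected at {label!r}")
        seen.add(label)
        chain.append(label)
    return chain


def is_a(child: str, parent: str) -> bool:
    """True if child is the same concept as parent or a descendant of it."""
    return child == parent or parent in ancestors(child)


print(is_a("Medical Device", "Device"))
print(is_a("Device", "Medical Device"))
```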

Summary of the “Disease” Alignment Meeting

Date: 2024-12-04 (December 4, 2024)

The purpose of this alignment meeting was to evaluate existing definitions of the concept “disease” across multiple biomedical ontologies and controlled vocabularies, with the goal of recommending a harmonised definition suitable for adoption within the Pharma General Ontology (PGO). The discussion also addressed the potential need to distinguish “disease” from the related but broader concept of “condition.”

Key Discussion Points

The working group evaluated different proposed sources which are currently used within pharmaceutical contexts:

  • NCIT:C2991 (National Cancer Institute Thesaurus) is widely used in the clinical domain.
  • DOID:4 (Disease Ontology) is used in the research domain and supported by at least two steering group members.
  • MeSH D004194 (Medical Subject Headings) is used to a lesser extent across both domains but is valued for the clarity of its definition.

MeSH D004194 was generally preferred for its concise and clinically meaningful text definition: “A definite pathologic process with a characteristic set of signs and symptoms. It may affect the whole body or any of its parts, and its etiology, pathology, and prognosis may be known or unknown.”

  • NCIT:C2991, while favoured by three steering group members for its ontological hierarchy and integration in clinical workflows, was criticised for being overly generic and potentially conflating “disease” with broader “conditions”.
  • DOID:4 was supported for its relevance to the research domain, but its definition was not discussed in depth during this session.

The group highlighted persistent ambiguity in the use of “disease” and “condition” in biomedical contexts. This led to a consensus that these concepts should be explicitly distinguished within the PGO to avoid confusion and improve semantic precision in downstream applications.

Further, following a conversation in April 2025, “disease” and “condition” are to be considered neither “material entities” (also referred to as “things” in the expert exchanges) nor “roles”, but rather “processes”. This distinction may play a role in future versions of the PGO.

Recommendations to the PGO Steering Group

  1. Definition Selection: Adopt the MeSH D004194 definition as the core definition of “disease” in the PGO, due to its clarity, clinical relevance, and expert consensus.
  2. Conceptual Separation: Introduce a distinct core concept labeled “Condition”, with a separate definition, to better capture broader or less well-defined health states that may not meet the criteria of a pathologic process.


Summary of the "Drug" Alignment Discussions

Date of meeting: 2025-02-27

The group has undertaken a comparative analysis of existing ontological and semantic representations of the term “drug”, highlighting its definitional complexity and contextual variability across biomedical domains. While the term is widely used, it lacks a universally accepted definition that captures both its pharmacological characteristics and the intentionality of use (e.g., therapeutic, performance-enhancing, diagnostic). The discussion emphasises that the concept of “drug” may evolve depending on the stage of the biomedical pipeline (e.g., research, development, regulatory submission, post-market surveillance), and is often embodied in physical products containing active substances.

Several reference models and ontologies were reviewed:

  • CHEBI_23888: Used in chemical biology, this definition is chemically precise and versatile but lacks explicit mention of therapeutic intent. While it offers a good balance of broadness and precision, some critique its inclusion of “abused substances” and the absence of use-purpose, which could cause ambiguity in regulatory and clinical contexts. Despite this, it remains a strong candidate for use, especially as a short-form, ontology-aligned option.
  • NCIT_C1909 (Pharmacologic Substance): Defined by the National Cancer Institute Thesaurus (NCIT), this term is widely used in oncology and pharmacology. It is comprehensive for cancer-related substances and integrates “drug” as an alternative label. Reviewers noted it provides a good balance between specificity and breadth, making it a preferred choice for contexts demanding clarity and precision. However, some flagged its verbose hierarchy as potentially cumbersome for practical integration.
  • DRON_00000005 (Drug Product, Drug Ontology): This definition is very precise, focusing specifically on the product-level aspects of drugs, including dosage forms, routes of administration, and packaging. While useful in pharmaceutical manufacturing contexts, its narrow scope and conflation of “product” may be too limiting or confusing for broader biomedical applications.
  • BioLink Model: Drug: Compatible with the FDA’s regulatory definition, this term emphasises intentional therapeutic use, diagnosis, and structural/function modification, excluding devices. It provides a clear framework aligned with legal and clinical standards. This definition is useful in translational research and regulatory informatics, but some members found it too narrow, particularly in terms of broader biomedical data modelling.
  • Schema.org Drug: Designed primarily for web-based medical schemas, this definition was considered redundant and less precise than alternatives like NCIT_C1909. Its narrow utility and alignment with “medical therapy” contexts limit its applicability for comprehensive biomedical modelling.

Further, several PGO Core Concepts (Compound, Product) are noted as closely related to “drug” but do not fully resolve its semantic ambiguity. The group acknowledges the importance of aligning terminology across related concepts while striving for definitional clarity.

Recommendations to the PGO Steering Group

  1. The expert team recommends adopting CHEBI_23888 as the working definition of “drug” within the Pharma General Ontology. This term provides a widely accepted and practical balance between generality and specificity for use in life sciences and pharmaceutical contexts.
  2. However, it is also recommended that:
  • The limitations of the CHEBI_23888 definition around intentionality and therapeutic purpose be explicitly noted.
  • “Drug” be treated as a synonym for more specific terms such as “pharmacologic substance” (NCIT_C1909) when contextual clarity is required.
  • A future proposal may be submitted to CHEBI to revise or clarify the definition to better accommodate contextual and regulatory dimensions.


Summary of the "Gene" Alignment Discussions

Meeting date: 2024-08-29

The primary goal was to identify a broadly accepted, ontologically sound, and biologically accurate definition of “gene” that aligns with the needs of both research and applied pharmaceutical contexts.

Three candidate definitions were reviewed:

  1. NCIT_C16612 (National Cancer Institute Thesaurus): “A functional unit of heredity which occupies a specific position on a particular chromosome and serves as the template for a product that contributes to a phenotype or a biological function.” Source: NCIT_C16612
  2. NCIT Definition: “A functional unit of heredity that occupies a specific position (locus) on a particular chromosome, is capable of reproducing itself exactly at each cell division, and directs the formation of a protein or other product.” This version emphasises heritability and expression mechanisms.
  3. Wikipedia/Schema.org Definition: “A discrete unit of inheritance which affects one or more biological traits.” (Source; see also Schema.org) This definition was noted as overly general and lacking specificity regarding molecular substrates (DNA/RNA).

The expert group agreed that while additional clarification on the molecular basis (DNA, RNA) may be beneficial, the least controversial and most widely accepted definition is the one provided by NCIT_C16612. This definition effectively balances biological precision with general applicability across research and clinical domains.

Recommendation to the PGO Steering Committee:

  1. The expert group recommends the following definition from NCIT_C16612 as the preferred representation of “gene” within the PGO framework: “A functional unit of heredity which occupies a specific position on a particular chromosome and serves as the template for a product that contributes to a phenotype or a biological function.”


Summary of the “Indication” Alignment Meeting

Date: 2024-09-11

The alignment meeting focused on defining the core concept “Indication” within the Pharma General Ontology (PGO), a term central to the regulated lifecycle of medicinal products and clinical research. The group evaluated a series of definitions sourced from authoritative biomedical ontologies and regulatory glossaries to identify one that is broad, resolvable via a URI, and suitable for multi-context use in pharmaceutical R&D.

Conceptual Considerations

The group acknowledged the complexity of the term “Indication”, which spans both pre-approval (e.g., clinical investigation) and post-approval (e.g., regulatory labelling) contexts. Core challenges in terminology harmonisation included:

  • The restrictive use of terms such as “abnormal” or “pathological”, which may unnecessarily narrow the scope.
  • Differentiation between “sign” and “symptom”.
  • The regulatory distinction between investigational and approved uses of a medicinal product.
  • The need for a resolvable and publicly licensed URI, a key requirement for incorporation into PGO.

While definitions from IDMP-O and CDISC were considered valuable, the absence of resolvable unique identifiers in some cases limited their immediate applicability. Notably, the CDISC glossary version of “Indication” appears to reuse the NCIt concept C41184, but with a modified definition—raising concerns about versioning and semantic traceability.

Evaluation of Candidate Definitions

The following definitions were examined:

  • IDMP-O_Indication: Provides a structured approach, classifying indications as disease/symptom/procedure. Treated more as a classifier than a semantically rich standalone concept.
  • NCIT_C41184: Viewed as a robust, semantically acceptable candidate.
  • CDISC Glossary (no URI): Offers a practical and clinically oriented definition.

Text Definition and Label

The NCIT_C41184 definition was retained as the preferred reference due to its formal status, semantic breadth, and identifier accessibility. The CDISC phrasing was also recognised as valuable, especially for clinical trial contexts, and can be treated as an alternative expression of the recommended definition.

Recommendations to the PGO Steering Group

  1. Adopt NCIT_C41184 as the authoritative definition for the core concept “Indication” in the PGO.
  2. Acknowledge alternate phrasings, such as those from CDISC and IDMP-O, as contextually valid variants; both are resolvable by the same URL.


Summary of the “Molecular Target” Alignment Meeting

Date: 2025-04-10 (April 10, 2025)

The alignment meeting focused on evaluating definitions of the term “target” proposed from existing ontologies and controlled vocabularies, in the context of pharmaceutical research and early development (R&ED). The objective was to identify a suitable reference concept that reflects both scientific precision and applicability across research and clinical domains.

The expert group emphasised that “molecular target” is a more appropriate and precise term than the broader “target”, particularly in R&ED contexts. The term “target” was considered too vague and potentially misleading (see previous discussion, April 10 2025).

Controlled Vocabulary, Thesaurus and Ontology Term Evaluation

Multiple sources were reviewed:

  • NCIT:C16128 (NCI Thesaurus): Preferred by multiple participants for its alignment with therapeutic and pharmacological applications. However, the entry lacks a formal definition and is framed as a descriptive statement, with emphasis on cellular processes and malignancies, thereby limiting its generalisability.
  • BAO_0003064 (BioAssay Ontology): Considered too specific to assay design; not suitable for broader R&ED or clinical trial contexts.
  • GO:0003674 (Gene Ontology): While it offers a strong definition of molecular function, its focus on gene products excludes other molecular entities (e.g., DNA), limiting its relevance.
  • DTO_00000004 (Drug Target Ontology): Appropriate for drug development, yet overly narrow for exploratory research purposes.
  • NCIT_C25702 (NCI Thesaurus, general definition of “drug target”): Provides a material-entity-based framing but lacks detailed annotation or critique.

Conceptual Considerations

The notion of molecular target appears to function more as a “role” than as a material entity, or “thing”. This is consistent with other domain concepts such as drug and compound, which also carry role-based semantics. This ontological distinction poses challenges for standardisation and definition consistency across domains.

Recommendations to the PGO Steering Group

  1. Label Adjustment: Replace the core concept label “target” with the more precise “molecular_target”.
  2. Definition Selection: Provisionally adopt NCIT_C16128 as the working definition of molecular target, acknowledging its current limitations in scope and formalism.
  3. Future Recommended Actions:
     • Consider minting a new, role-aware definition that more accurately captures the semantics of molecular target across R&ED, clinical, and regulatory domains.
     • Request modifications or enhancements to existing ontology definitions (e.g., NCI Thesaurus).


Summary of the “Product” Alignment Discussion

Date: 2025-05-23 (May 23, 2025)

The meeting aimed to evaluate and refine the conceptual and semantic alignment of the term “Product”. This discussion was framed by the need for clarity and specificity in labelling within the PGO and broader ontological and terminology models used across the pharmaceutical and biomedical domains. Several alignment meetings related to the core concepts "compound", "substance" and "drug" also had elements related to this discussion.

Source Definitions Considered

  • OMG Product Term: The IDMP Ontology (IDMP-O) builds on the OMG definition of “Product,” with specific refinements such as “pharmaceutical product” and “medicinal product.” It distinguishes between physical instances (e.g., “physical pharmaceutical product”) and specification-level abstractions. The IDMP-O terms are currently reused in ClinOps for regulatory and clinical operations purposes.
  • NCIT C42639: Sourced from CDISC and based on ISO 11615:2017 (3.1.60), this definition is widely accepted but may exclude non-medicinal or device-inclusive scenarios. Stakeholders noted this as the most promising candidate, though its current scope may require supplementation.
  • IDMP Medicinal Product: Also considered a robust definition, but viewed as potentially too narrow for inclusion of certain use cases such as combination products or non-drug entities.

Meeting notes

  • The expert group addressed the PGO Steering Committee's recommendation (Cambridge, UK, March 2025) to provide clarity and granularity on the concept of “product”, which is rather generic.
  • There is broad consensus that a more precise label such as “Pharmaceutical_Product” is preferred over the generic term “Product” to reduce semantic ambiguity, particularly for use in early R&D, regulatory documentation, and CMC (Chemistry, Manufacturing, and Controls) processes.
  • It was discussed whether medical devices, especially those integrated with drugs (e.g., inhalers, EpiPens), should fall within the “Product” concept and whether their inclusion would help to clarify it. IDMP-O, being designed for medicinal products, currently does not encompass such hybrid or device-only entities. IDMP-O definitions also overlap with “substance”.
  • The CDISC-derived NCIT definition was identified as suitable for most PGO use cases. However, digital therapeutics, implanted mechanical devices, and companion diagnostics are not covered by any of the current definitions and may require a broader core concept or an additional ontology term to address their commercial and functional relevance.
  • A need was identified to recognise the commercial aspect of “product,” which includes digital solutions and other non-pharmaceutical items marketed by life science companies.

Recommendations to the PGO Steering Group

  1. Adopt as a label: "Pharmaceutical_Product" instead of "Product".
  2. Adopt the NCIT_C42639/CDISC/ISO 11615 definition of "Pharmaceutical_Product": “Qualitative and quantitative composition of a medicinal product in the dose form authorised by the regulatory authority for administration to patients, and as represented with any corresponding regulated product information. NOTE: A medicinal product may contain one or more pharmaceutical products. In many instances, the pharmaceutical product is the manufactured item. However, there are instances where the manufactured item undergoes further preparation before being administered to the patient (as the pharmaceutical product).”
  3. Note that digital therapeutics, implanted mechanical devices, and companion diagnostics are not covered by any of the current definitions and may require a broader core concept or an additional ontology term to address their commercial and functional relevance.


Summary of "Protein" Alignment Discussions

Meetings on 2024-08-01 and 2024-08-15

During the meetings held on 1st and 15th August 2024, participants reviewed multiple definitions of “protein” from various ontological and organisational sources to inform the PGO initiative’s conceptual alignment. A majority of funding partners (three out of the group) expressed preference for the NCI Thesaurus ID: C17021 (http://purl.obolibrary.org/obo/NCIT_C17021), which was recognised as providing a broadly applicable and general definition of proteins. However, discussions also identified several limitations and points of divergence related to this definition.

Key areas of definitional variation highlighted by PGO partners include:

  1. Synthesis route: a distinction between naturally occurring and synthetically produced proteins.
  2. Functional role: ambiguity over whether proteins should be classified as therapeutic agents (e.g., monoclonal antibodies).
  3. Ontological role: whether a protein is conceptualised as a substance or as a target entity in biomedical contexts.

Further, the IDMP Ontology (IDMP-O) defines “protein substance” as “a single substance with a defined sequence of alpha-amino acids connected through peptide bonds” (https://spec.pistoiaalliance.org/idmp/ontology/ISO/ISO11238-Substances/ProteinSubstance). However, this specification-oriented approach may limit its utility for the broader semantic scope required by PGO. The relevance of adjacent domains, such as production, was also noted.

Organisational perspectives added additional nuances:

  • AstraZeneca (AZ) provides a classical biochemical definition, emphasising proteins as nitrogenous organic compounds comprising amino acid chains, critical to structural and enzymatic functions, though no URI was provided.
  • Roche offers a more ontologically grounded definition, adapted from NCIt, focusing on proteins as genetically encoded entities with specific structural and functional properties, ranging significantly in molecular weight.

The follow-up meeting on 15th August 2024 identified specific issues with NCIT_C17021, such as its focus on naturally occurring proteins, plural group-level semantics, and explicit mention of L- versus D-amino acids. As an alternative, PR:000000001 (Protein Ontology) was discussed, which defines a protein as “an amino acid chain produced de novo by ribosome-mediated translation of genetically-encoded mRNA,” though its emphasis on natural biosynthesis was seen as a constraint.

There was a general consensus among experts to prefer definitions that describe the intrinsic substance-based identity of proteins, rather than definitions constrained by the method of synthesis.

The definition of “protein” from the Molecular Interactions controlled vocabulary was noted for its general applicability, broad expert support, and compatibility with existing ontologies such as IDMP-O. It represented the most favoured formulation within the expert group.

Proposal and Recommendation to the PGO Steering Committee

Adopt the definition of "Protein" from the Molecular Interactions controlled vocabulary, namely: “A linear polymer of amino acids joined by peptide bonds in a specific sequence.”


Summary of “Species” Alignment Discussions

Date of meetings: 2024-08-29 and 2024-09-11 (29 August 2024 and 11 September 2024)

First meeting notes 2024-08-29 (August 29 2024)

A series of expert consultations and meetings were conducted to refine the definition of the core concept “Species” within the Pharma General Ontology (PGO). The discussions focused particularly on the challenge of inclusively representing viruses, which are critical to pharmaceutical contexts yet often excluded by conventional species definitions.

Several concerns were raised regarding existing definitions:

  • The criterion “capable of breeding and producing fertile offspring” was deemed overly restrictive, particularly in the context of microorganisms and viruses.
  • The use of the term “living” risks excluding viruses, which are not universally considered living entities but are nonetheless central to pharmaceutical research.
  • The inclusion of viruses in the concept of species was emphasised as essential.

Additional questions arose concerning taxonomic rank granularity (e.g., Phylum, Class, Species) and the appropriateness of existing ontology references for accommodating viruses.

Experts from the PGO Expert Group were consulted on 30 August 2024. They were asked to recommend a reference definition of “species” that could inclusively represent viruses. Respondents included representatives from Bayer and Novo Nordisk.

  • Bayer noted the philosophical complexity of virus inclusion and advocated for a pragmatic solution that accepts viruses within the concept of species despite their taxonomic ambiguity (e.g., as discussed in https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4222810/).
  • Novo Nordisk indicated reliance on the NCBITaxon vocabulary, although acknowledging its limitations regarding explicit virus inclusion.

Second meeting notes 2024-09-11 (September 11 2024)

The follow-up meeting addressed persistent ambiguities:

  • The NCBITaxon vocabulary, while globally recognised, does not provide a formal definition of “species” and inherently excludes viruses in its definition but includes them in practice.
  • Attempts to reference conceptual frameworks such as Kevin De Queiroz’s work on species delimitation (Systematic Biology, Volume 56, Issue 6, December 2007, Pages 879–886; https://doi.org/10.1016%2Fj.sjbs.2017.04.013) were found insufficient due to their self-referential nature.
  • A proposed definition — “a type of taxonomic rank qualifying a living entity or virus” — lacks a citable public reference.

Recommendation to the PGO Steering Group

  1. To reconcile conceptual inclusivity with practical needs, the expert group recommends adopting the UniProt definition Uniprot_Taxon for the PGO core concept labelled "species": “An element of a taxonomy for classifying life forms.”
  2. It is also advised that the inclusion of viruses be explicitly acknowledged in the PGO’s application of the concept.


Summary of the “Study_Subject” Alignment Meeting

Date: 2025-03-12

The purpose of this alignment meeting was to evaluate appropriate ontological definitions and labels for the concept currently referred to as “Subject-Person” within the context of the Pharma General Ontology (PGO). The expert group aimed to harmonise terminology relevant to both research and early development (R&ED) and clinical domains, while accounting for use cases involving non-human entities.

Label and Scope Clarification

The group noted that the term “Study Subject” must be sufficiently general to encompass a wide range of entities observed or manipulated in the course of a scientific study. These entities may include:

  • Human participants (clinical trials)
  • Animal models
  • Environmental systems (e.g., soil, water, air)
  • Microbiome-related samples
  • Cell cultures and other laboratory models

Thus, the term “subject” should not be restricted to humans. This led to consensus that the existing label “Subject-Person” is overly narrow and should be replaced.

Definition Selection

Multiple ontological sources were reviewed:

  • OMG (Object Management Group) term “subject” was dismissed as unsuitable due to its narrow interpretation as an “area of interest or expertise”.
  • NCIT_C14225 and NCIT_C25190 were considered too broad or not directly applicable to the investigative context of studies.
  • NCIT_C70668 (Clinical Study Subject, from CDISC) was endorsed for the clinical domain but seen as overly specific for broader R&ED purposes.
  • NCIT_C41189 was recommended as the most appropriate definition for general study subject use, with the text: “A matter or an individual that is observed, analyzed, examined, investigated, experimented upon, or/and treated in the course of a particular study.”

Conceptual Separation Between Domains

It was agreed that a distinction should be made between general “Study Subjects” and “Clinical Study Subjects” due to domain-specific requirements:

  • “Study_Subject” will serve as a general core concept for R&ED, inclusive of all types of observed entities.
  • “Clinical_Study_Subject” will be added as a distinct core concept to represent human participants in clinical trials, adopting NCIT_C70668 as its definition.

Data Provenance Considerations

The importance of provenance tracking for study subjects was emphasized. It was recommended that identifiers for “Study_Subject” should be structured to capture associations with study protocols, samples, data sets, and dates (e.g., using concatenated Subject_ID and Study_ID values to create unique identifiers).
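
The concatenation approach mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not a PGO specification: the class `StudySubjectRef`, the function `parse_unique_id`, the "|" separator and the sample ID values are all hypothetical.

```python
from dataclasses import dataclass

SEPARATOR = "|"  # assumed separator; the source does not prescribe one


@dataclass(frozen=True)
class StudySubjectRef:
    """Hypothetical link between a subject and the study it belongs to."""
    study_id: str
    subject_id: str

    def unique_id(self) -> str:
        # Reject empty parts and parts containing the separator, so the
        # combined identifier can always be split back unambiguously.
        for part in (self.study_id, self.subject_id):
            if not part or SEPARATOR in part:
                raise ValueError(f"invalid identifier part: {part!r}")
        return f"{self.study_id}{SEPARATOR}{self.subject_id}"


def parse_unique_id(value: str) -> StudySubjectRef:
    """Recover the Study_ID and Subject_ID from a combined identifier."""
    study_id, found, subject_id = value.partition(SEPARATOR)
    if not found or not study_id or not subject_id or SEPARATOR in subject_id:
        raise ValueError(f"not a combined study/subject identifier: {value!r}")
    return StudySubjectRef(study_id, subject_id)


ref = StudySubjectRef(study_id="STUDY-001", subject_id="SUBJ-0042")
print(ref.unique_id())
```

Keeping the two parts recoverable means the same Subject_ID can be reused across studies without collisions, while the study association stays traceable from the identifier alone.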

Recommendations to the PGO Steering Group

  1. Label change: Replace the label “Subject-Person” with “Study_Subject”.
  2. Adopt the definition from NCIT_C41189.
  3. Introduce a clinical-specific concept: Add “Clinical_Study_Subject” as a distinct core concept, and adopt the definition from NCIT_C70668.
  4. Document provenance and identifier best practices: Ensure traceability of study subjects by linking them to relevant study metadata via structured identifiers.


Summary of the “Substance” Alignment Meeting

Date: 2025-03-06, 2025-03-27 and 2025-05-23

The purpose of these alignment meetings was to evaluate candidate definitions for the concept of “Substance” within the Pharma General Ontology (PGO), in order to ensure semantic consistency and interoperability across pharmaceutical research, development, and regulatory domains. The meetings on 2025-03-27 and 2025-05-23 examined the definitions of "Compound" and "Product" (later relabelled "Pharmaceutical_Product"), respectively.

Scope and Conceptual Considerations

The concept of “Substance” plays a foundational role in biomedical ontologies and regulatory vocabularies, yet its precise definition varies by context. Participants acknowledged the term’s broad use—ranging from material sciences to regulated pharmaceutical entities—and emphasized the need for a definition that balances scientific accuracy, regulatory relevance, and practical applicability.

The discussion addressed several tensions:

  • Material vs. informational definitions: The IDMP-O representation of “Substance” appears to frame it as a specification or information entity, diverging from ontologies such as NCIT that treat it as a material entity.
  • Semantic overlap with related terms such as Compound, Drug, and Product, which may act as roles or subclasses depending on usage and context.
  • The customary and functional use of “substance” in research settings to refer to materials (e.g., a 'plastic' cup, a tongue depressor made of 'wood').

Evaluation of Candidate Definitions

The following sources were critically reviewed:

  • IDMP-O_Substance: Utilized in regulated medicinal product contexts. Two partners indicated active use of this definition. Concern raised regarding its conceptual shift from material to specification-based definition.
  • NCIT_C1913: Broad inclusion but limited by scope (focused on naturally occurring substances). Intended for use in structural categorization within the NCIt hierarchy.
  • CHEBI_24431: Extensive chemical coverage but deemed too expansive; includes atoms as substances, which participants found semantically unsuitable for PGO use.
  • schema.org_Substance: Appears to reuse the NCIT definition; no additional unique value identified.
  • FHIR_Substance: “A homogeneous material with a definite composition.” Considered too broad and lacking specificity.
  • NCIT_C45306: Gained broad support from participating experts. Three partners confirmed alignment with existing usage. Considered both inclusive and sufficiently precise to support PGO implementation. Preferred over alternatives due to clarity, material-based interpretation, and resolvable URI.

Related Discussions

The experts noted that:

  • “Substance” should be distinguished from Compound and Product, though overlaps exist.
  • “Compound” was evaluated in a follow-up session on 27 March 2025, confirming that while some “substances” are “compounds”, not all are.
  • The evaluation of “Product” in relation to “Substance” was addressed on 23 May 2025, indicating that while there is semantic overlap, not all “substances” are “products” (re-labelled “pharmaceutical product”). Further examples are needed to illustrate substances that are not compounds, drugs, or pharmaceutical products, to clarify usage boundaries.

Text Definition: NCIT_C45306 “Any matter of defined composition that has discrete existence, whose origin may be biological, mineral or chemical.”

This definition offers both breadth and specificity, suitable for research and regulated domains alike.

Recommendations to the PGO Steering Group

  1. Retain “Substance” as a core concept, recognising its general relevance across the R&D continuum.
  2. Adopt NCIT_C45306 as the definition of “Substance” for the Pharma General Ontology.
  3. Acknowledge semantic relationships with Compound and Product, which may be further modelled in a future version of PGO but maintained as distinct concepts.


Summary of the "Unit" Alignment Discussions

Date of Meeting: 2024-10-24 and 2025-01-03 (24 October 2024 and 3 January 2025)

Notes of the meeting: 2024-10-24

The definitions under discussion were found to be broadly aligned, with no substantive disagreements noted. A preference was expressed for the QUDT Unit definition, due to its greater accuracy and precision.

The QUDT ontology, as used in the CMC Process Ontology, also defines a similar concept. Unless conflicting input is received from the CMC Process Ontology, the proposed approach is to adopt the QUDT Unit definition for the term unit. Should any such conflict arise, further consultation and alignment with the expert group will be necessary.

Additional Expert Input 2025-03-01

[IDMP-O – Elisa Kendall] In the IDMP-O initiative, QUDT was not adopted due to its initial lack of logical consistency. Although improvements have been made—particularly through efforts by the Industrial Ontologies Foundry (IOF)—some issues remain unresolved. Instead, the OMG Commons Ontology for Quantities and Units is being used in IDMP-O. This ontology is grounded in work from the systems engineering community for SysML v2 and has been developed collaboratively over several years. Participants include the European Space Agency, NASA/JPL, NIST, John Deere, Boeing, Airbus, Dassault Systèmes, and others.

Currently, the model supports scalar quantities, with plans to incorporate tensor and vector quantities. Additionally, a comprehensive library of over 700 units of measure will be released as a reference data vocabulary based on the Commons ontology. Importantly, QUDT has recently transitioned from OWL to SHACL, making it increasingly incompatible with IDMP-O. It is therefore recommended that the PGO and CMC initiatives review the IDMP-O work, including the available extensions and examples on GitHub, as a potential foundation for future ontology development.

[CMC Ontology – Birthe Nielsen, Project Manager, Pistoia Alliance] A related concept within the Process CMC Ontology is measurement process, defined as “Description of a planned process to determine the value of an attribute (specifically dependent continuant or temporal region or process characteristic) of an entity of interest.”

The CMC Process Ontology currently imports from QUDT to define quantities and units of measure consistently. However, the term unit is not explicitly defined in the current MVP version. As a result, the group has no objections to adopting the QUDT definition for unit.

Recommendation to the PGO Steering Group

The expert group recommends that the definition of unit from the QUDT ontology (http://qudt.org/schema/qudt/Unit) be adopted, with the label standardised as "unit".
