Identifiers: the ISO/IEC 11179-3 Scoped_Identifier Model - Hadden-Industries/universal-ontology GitHub Wiki

1. Introduction: The Deceptive Simplicity of Identifiers

In any data system, the identifier is the bedrock of identity. It is the primary key, the unique handle, the supposedly immutable string of characters that allows us to distinguish one entity from another. The common practice in application development is to treat this as a solved problem: generate a UUID, create an auto-incrementing integer, or define a product_code column, and the task is complete. This approach, however, reveals its profound inadequacy when systems must scale, integrate, and interoperate. In a complex enterprise landscape, the question "What is the ID for this item?" is rarely simple and almost never has a single, universal answer.

The challenges of managing identifiers mirror those of managing names, but with higher stakes. While an ambiguous name can cause confusion, a mismanaged identifier can lead to catastrophic data corruption. The key failure modes are:

Identifier Collision: This is the most obvious risk. Two systems developed in isolation might each assign the identifier 12345 to a completely different thing: one to a customer, the other to a purchase order. When these systems are integrated, the two records can no longer be told apart by ID, and data integrity is compromised.
Context-Dependency: An identifier is rarely global. A person has a government-issued national insurance number, a company-issued employee ID, and a customer number for their bank. Each of these is a valid identifier, but only within its specific issuing context. A model that stores only one person_id cannot accommodate this reality.
Lack of Governance: Who has the authority to issue identifiers? What is their format? Are they versioned? A simple string or integer column in a database captures none of this essential governance metadata, leaving the rules for identification implicit and unenforceable.
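The collision failure mode can be reproduced in a few lines. The following is a minimal sketch, not taken from any real system: the system names `crm` and `purchasing` and the record contents are hypothetical, chosen only to match the customer and purchase-order example above.

```python
# Hypothetical extracts from two systems that each issue bare integer IDs.
crm_records = {12345: "customer: Jane Doe"}
purchasing_records = {12345: "purchase order: 40 laptops"}

# A naive merge keyed on the bare identifier silently overwrites one record.
naive_merge = {**crm_records, **purchasing_records}

# Qualifying each key with its issuing system keeps both records distinct.
scoped_merge = {("crm", key): value for key, value in crm_records.items()}
scoped_merge.update(
    {("purchasing", key): value for key, value in purchasing_records.items()}
)

print(len(naive_merge))   # 1
print(len(scoped_merge))  # 2
```

The second dictionary previews the idea developed in the rest of this page: an identifier only becomes safe to integrate once it carries its issuing context with it.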

To navigate this complexity, a data model must treat identification not as a simple attribute but as a managed, contextual, and governed relationship between an entity and its identifier. The ISO/IEC 11179-3 metamodel provides exactly this. It deconstructs the act of identification into its fundamental components, offering a robust and theoretically sound framework for managing identifiers at enterprise scale. This analysis will explore the model's core construct, the Scoped_Identifier, and its relationship with neighbouring concepts like Namespace to demonstrate how it provides a superior solution for real-world data identification challenges.

2. The Core Principle: Separating the 'What' from the 'How'

Just as the ISO/IEC 11179-3 Designation model separates the concept from its name, the identification model begins with a crucial separation of concerns. It distinguishes the abstract "thing" being identified from the mechanism of its identification. This is achieved through a chain of three distinct classes:

Item: This is the abstract superclass for "anything perceivable or conceivable" that the registry needs to manage. It is the conceptual entity itself—the product, the customer, the regulation—independent of any name or identifier.
Identified_Item: This class acts as an intermediary, representing the fact that an Item is being identified. It is linked to the Item via the item_identification association. This may seem like an unnecessary layer, but it is a deliberate design choice that allows the model to cleanly separate the core Item from the administrative details of its registration and identification.
Scoped_Identifier: This is the class that holds the identifier itself. It contains the actual string of characters and its version, and it is linked to the Identified_Item it identifies.

This three-part structure forces a disciplined approach. Instead of a Product table with a product_id column, the model establishes a relationship: an Item (the product concept) is identified by an Identified_Item, which in turn has one or more Scoped_Identifiers. This relational structure correctly models reality: an identifier is not an intrinsic property of an item but a separate entity that points to it within a specific context. This architecture is what enables the model to handle the real-world complexity of multiple, context-dependent identifiers for a single conceptual thing.
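The three-class chain can be sketched with plain Python dataclasses. This is an illustrative simplification rather than a faithful rendering of the metamodel: the Python class and field names, the `description` field, and the sample product values are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    """The conceptual thing itself; note that it carries no identifier."""
    description: str  # hypothetical field, for illustration only


@dataclass
class ScopedIdentifier:
    """Holds the identifier string and its version."""
    identifier: str
    version: str


@dataclass
class IdentifiedItem:
    """Intermediary recording that an Item is being identified."""
    item: Item  # stands in for the item_identification association
    identifiers: list[ScopedIdentifier] = field(default_factory=list)


product = Item("Widget, 10 mm")
identified = IdentifiedItem(item=product)
identified.identifiers.append(ScopedIdentifier("W-10", "1.0"))
identified.identifiers.append(ScopedIdentifier("0042", "1.0"))

print(len(identified.identifiers))  # 2
```

Because `Item` has no identifier field of its own, attaching a second or third identifier never requires changing the item, only adding another `ScopedIdentifier`.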

3. The Anatomy of an Identifier: The `Scoped_Identifier` Class

The Scoped_Identifier class is where the actual identifier is modelled. Its design is lean but powerful, focusing on the essential characteristics of an identifier in a governed system.

The key attributes are:

Attribute	Datatype	Description
`identifier`	`String`	The actual sequence of characters used to identify the item within a given scope. Unlike a name, an identifier is linguistically neutral.
`version`	`String`	A string that uniquely identifies the version of the identifier. This is a critical feature for managing the lifecycle of metadata, allowing for clear distinction between, for example, version 1.0 and 1.1 of a data standard.

However, the true innovation of the model is not contained within the Scoped_Identifier class itself, but in its mandatory relationship with the Namespace class. The very name Scoped_Identifier signals its core design principle: an identifier has no meaning outside of its scope.

4. The Solution to Ambiguity and Collision: The `Namespace` Class

The Namespace class is the model's primary mechanism for managing the context and scope of identifiers, thereby preventing collisions and resolving ambiguity. A Namespace is a "scoping construct" that groups a set of identifiers for a particular business need. Every Scoped_Identifier must be associated with exactly one Namespace via the identifier_scope association.

This single relationship solves the identifier collision problem by design. The identifier 12345 from the "HR System Namespace" is a completely distinct Scoped_Identifier instance from the identifier 12345 in the "Finance System Namespace," even though their identifier attribute strings are identical. Uniqueness is only ever evaluated within a Namespace, never across Namespaces, so the two cannot be confused.
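A minimal sketch of this scoping behaviour follows, assuming a simplified representation in which a Scoped_Identifier is compared by the pair (Namespace, identifier string). The Python class names and the registry contents are illustrative, not part of the standard.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    name: str


@dataclass(frozen=True)
class ScopedIdentifier:
    namespace: Namespace  # identifier_scope: exactly one Namespace
    identifier: str


hr = Namespace("HR System Namespace")
finance = Namespace("Finance System Namespace")

hr_id = ScopedIdentifier(hr, "12345")
finance_id = ScopedIdentifier(finance, "12345")

# Same string, different scope: the two instances are not equal,
# so both can be used as keys in one registry without colliding.
registry = {hr_id: "employee record", finance_id: "purchase order"}

print(hr_id == finance_id)  # False
print(len(registry))        # 2
```

Making the dataclasses frozen gives them value-based equality and hashing, which is what lets the namespace participate in the identity of each identifier.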

Furthermore, the Namespace class provides a powerful set of governance attributes that allow an organisation to enforce specific rules for the identifiers within its scope:

naming_authority: An attribute that links to the Organization responsible for assigning identifiers within that Namespace. This makes governance explicit: we know who "owns" the identifiers.
one_name_per_item_indicator: A boolean flag that, when TRUE, enforces that a single Item can have only one identifier within this Namespace. This prevents aliases and is crucial for domains requiring absolute precision, such as pharmaceutical drug identifiers or financial instrument codes.
one_item_per_name_indicator: A boolean flag that, when TRUE, enforces that a given identifier string can point to only one Item within the Namespace. This prevents the same ID from being accidentally reused for different concepts, eliminating ambiguity.

These features transform identification from an uncontrolled, application-level concern into a governed, enterprise-wide capability. The business can now model complex, real-world identification scenarios with precision.
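How the two indicator flags might be enforced can be sketched as follows. The attribute names come from the list above; everything else is an assumption made for illustration, including the `assign` method, the in-memory `_bindings` set, the string item keys, and the representation of naming_authority as a plain string rather than a link to an Organization.

```python
from dataclasses import dataclass, field


@dataclass
class Namespace:
    name: str
    naming_authority: str
    one_name_per_item_indicator: bool = False
    one_item_per_name_indicator: bool = False
    # Hypothetical storage for this sketch: (identifier, item_key) pairs.
    _bindings: set = field(default_factory=set, repr=False)

    def assign(self, identifier: str, item_key: str) -> None:
        """Bind an identifier to an item, honouring the two indicator flags."""
        if (identifier, item_key) in self._bindings:
            return  # already bound; nothing to do
        if self.one_item_per_name_indicator and any(
            ident == identifier for ident, _ in self._bindings
        ):
            raise ValueError(
                f"{identifier!r} already identifies another item in {self.name}"
            )
        if self.one_name_per_item_indicator and any(
            key == item_key for _, key in self._bindings
        ):
            raise ValueError(
                f"{item_key!r} already has an identifier in {self.name}"
            )
        self._bindings.add((identifier, item_key))


employee_ids = Namespace(
    name="ACME Corp Employee IDs",
    naming_authority="ACME Corp HR Department",
    one_name_per_item_indicator=True,
    one_item_per_name_indicator=True,
)
employee_ids.assign("EMP98765", "jane-doe")

try:
    employee_ids.assign("EMP98765", "john-roe")  # same ID, different item
except ValueError as err:
    print(err)

try:
    employee_ids.assign("EMP00001", "jane-doe")  # second ID for the same item
except ValueError as err:
    print(err)
```

With both flags left at FALSE the same calls would succeed, which is the point of holding the rules on the Namespace: each scope states its own uniqueness policy instead of relying on application code to remember it.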

Real-World Example: Identifying a Person

Consider a single conceptual Item representing a person, Jane Doe. This single Item can be linked to multiple Scoped_Identifier instances, each within a different Namespace, to capture their various official identifiers:

`Scoped_Identifier.identifier`	`Namespace`	`Namespace.naming_authority`	Notes
"AB123456C"	"UK National Insurance Number"	"HM Revenue & Customs"	The identifier is unique and unambiguous within the national system.
"EMP98765"	"ACME Corp Employee IDs"	"ACME Corp HR Department"	The identifier is unique within the company's HR system. The `one_item_per_name_indicator` would be TRUE.
"CUST001-45B"	"Global Bank Customer IDs"	"Global Bank"	The identifier is unique within the bank's global customer database.

This model correctly and elegantly represents the reality that Jane Doe has multiple, valid identifiers. It avoids the flawed approach of trying to pick one "primary" ID or adding multiple id_type_1, id_type_2 columns to a Person table. The model provides the structure to ask precise questions like, "What is the National Insurance Number for the person identified as 'EMP98765' in the ACME HR system?"
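The cross-namespace question above can be answered with a simple lookup once identifiers are stored together with their Namespace. The sketch below uses the identifier values from the table; the item key `jane-doe`, the dictionary layout, and the function names `resolve`, `identifiers_in`, and `translate` are hypothetical.

```python
# (namespace name, identifier) -> item key. Identifier values are taken from
# the table above; the item key "jane-doe" is a hypothetical stand-in for the Item.
identifiers = {
    ("UK National Insurance Number", "AB123456C"): "jane-doe",
    ("ACME Corp Employee IDs", "EMP98765"): "jane-doe",
    ("Global Bank Customer IDs", "CUST001-45B"): "jane-doe",
}


def resolve(namespace, identifier):
    """Return the item key bound to an identifier in a namespace, or None."""
    return identifiers.get((namespace, identifier))


def identifiers_in(namespace, item_key):
    """Return every identifier the item has within the given namespace."""
    return [
        ident
        for (ns, ident), key in identifiers.items()
        if ns == namespace and key == item_key
    ]


def translate(from_namespace, identifier, to_namespace):
    """Map an identifier in one namespace to the same item's identifiers in another."""
    item_key = resolve(from_namespace, identifier)
    if item_key is None:
        return []
    return identifiers_in(to_namespace, item_key)


print(translate("ACME Corp Employee IDs", "EMP98765", "UK National Insurance Number"))
# ['AB123456C']
```

The translation goes through the item rather than directly from one identifier to another, which mirrors the model: identifiers never refer to each other, only to the thing they identify.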

5. Conclusion: A Robust Framework for Enterprise Identity

The ISO/IEC 11179-3 model for identification, centred on the Scoped_Identifier and Namespace classes, provides a robust and sophisticated framework that is far superior to simplistic, attribute-based approaches. Its strength lies in its adherence to sound data modelling principles and its direct acknowledgment of the complexities of managing identity in the real world.

The key advantages of the model are:

It Prevents Collision by Design: By making the Namespace a mandatory context for every Scoped_Identifier, the model eliminates the risk of identifier collision between different systems or authorities.
It Makes Governance Explicit: Through the Namespace class's attributes (naming_authority, one_name_per_item_indicator, and one_item_per_name_indicator), the model provides machine-readable rules for authority and uniqueness, moving governance from policy documents into the operational metadata layer.
It Reflects Reality: The model's relational structure correctly represents the fact that a single conceptual entity can have multiple valid identifiers issued by different authorities for different purposes.
It is Universally Applicable: Like the Designation model, the use of the abstract Item superclass means this identification framework can be applied to any type of asset an organisation needs to manage, from data elements and business processes to physical devices and legal contracts.

Adopting this model is a strategic decision to treat identification as a first-class data governance discipline. It provides the necessary architectural foundation to build integrated, interoperable, and trustworthy data ecosystems, ensuring that when a system asks for an item by its ID, it gets the right thing, every time, without ambiguity.