Declarative Knowledge Bases - joehubert/ai-agent-design-patterns GitHub Wiki

Classification

Intent

To provide agents with structured, factual information repositories that can be queried when specific domain knowledge is needed, enhancing accuracy and reducing hallucinations by grounding agent responses in verified facts.

Also Known As

Knowledge Graphs
Fact Stores
Domain Knowledge Repositories
Semantic Knowledge Bases

Motivation

Large language models (LLMs) encode a great deal of knowledge in their parameters, but that knowledge has limitations:

It may be incomplete or outdated
It can't be directly verified or audited
It often lacks domain-specific expertise
Updates require retraining or fine-tuning the entire model

Consider a healthcare assistant that needs to provide accurate information about medications, treatments, and diseases. Relying solely on the LLM's internal knowledge:

Risks providing outdated or incorrect medical advice
Cannot guarantee compliance with the latest clinical guidelines
Has no structured way to verify information sources
Makes it difficult to incorporate new medical research

A Declarative Knowledge Base addresses these challenges by creating an external, structured repository of facts that:

Contains verified, authoritative information from trusted sources
Can be queried precisely when domain knowledge is needed
Provides explicit provenance for factual information
Can be updated independently of the underlying LLM
Enables transparent auditing of knowledge sources

Applicability

Use the Declarative Knowledge Base pattern when:

Your application requires high accuracy for specific domains (medicine, law, finance, science)
You need verifiable sources for factual claims
Knowledge must be frequently updated without retraining the underlying model
Your application serves regulated industries with compliance requirements
Domain expertise is specialized and unlikely to be well represented in general LLM training data
You need to manage conflicting information from different authoritative sources
Your application must reason over a large body of structured facts

Structure

To do...

Components

Knowledge Repository: The primary storage mechanism holding structured facts, which may be implemented as:
- A graph database storing entities and relationships
- A vector database for semantic retrieval
- A SQL or NoSQL database holding structured records
- An ontology with formal logic rules
Knowledge Schema: The organizational framework defining:
- Entity types and their properties
- Relationship types between entities
- Constraints and validation rules
- Confidence scores and uncertainty metrics
Knowledge Acquisition System: Mechanisms for populating and updating the knowledge base:
- Manual curation by domain experts
- Automated extraction from authoritative documents
- Integration with existing knowledge bases
- Feedback loops to correct and refine knowledge
Query Interface: Methods for the agent to access the knowledge:
- Natural language to structured query conversion
- Semantic search capabilities
- Relevance ranking algorithms
- Support for complex logical queries
Knowledge Integrator: Components that blend knowledge base results with LLM capabilities:
- Context window optimization for retrieved facts
- Reasoning mechanisms over retrieved facts
- Source attribution and citation generation
- Confidence evaluation of knowledge base outputs
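
As a rough illustration of how the repository and schema components might fit together, here is a minimal sketch. It assumes facts are stored as subject-predicate-object triples with provenance metadata. The names `Fact`, `KnowledgeSchema`, and `InMemoryRepository` are hypothetical and do not come from any library, and the source label is a placeholder rather than a real reference.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Fact:
    """One subject-predicate-object triple plus provenance metadata."""
    subject: str
    predicate: str
    obj: str
    source: str          # where the fact came from (placeholder below)
    confidence: float    # 0.0 (unknown) to 1.0 (fully verified)
    last_verified: date


class KnowledgeSchema:
    """Hypothetical schema: the set of predicates the repository accepts."""

    def __init__(self, allowed_predicates):
        self.allowed_predicates = set(allowed_predicates)

    def validate(self, fact):
        if fact.predicate not in self.allowed_predicates:
            raise ValueError(f"unknown predicate: {fact.predicate!r}")
        if not 0.0 <= fact.confidence <= 1.0:
            raise ValueError("confidence must be between 0.0 and 1.0")


class InMemoryRepository:
    """Toy stand-in for a graph or SQL store, indexed by (subject, predicate)."""

    def __init__(self, schema):
        self.schema = schema
        self._index = {}

    def add(self, fact):
        # Reject facts that violate the schema before storing them.
        self.schema.validate(fact)
        self._index.setdefault((fact.subject, fact.predicate), []).append(fact)

    def query(self, subject, predicate):
        return list(self._index.get((subject, predicate), []))


schema = KnowledgeSchema({"interacts_with", "treats"})
repo = InMemoryRepository(schema)
repo.add(Fact("warfarin", "interacts_with", "aspirin",
              source="example-formulary-2024",  # placeholder source label
              confidence=0.95, last_verified=date(2024, 1, 15)))
print(repo.query("warfarin", "interacts_with"))
```

Keeping validation in the schema object rather than in the repository means the same rules could be reused by an acquisition workflow before anything is written to storage.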

Interactions

When an agent needs domain-specific information, it formulates a knowledge query based on the current context and user request.
The Query Interface translates the natural language or semantic query into a structured format that matches the Knowledge Schema.
The Knowledge Repository processes the query against its stored facts, retrieving relevant information.
Retrieved facts include metadata about their sources, confidence levels, and last verification dates.
The Knowledge Integrator combines these facts with the agent's reasoning capabilities to produce a response that is:
- Grounded in verified information
- Properly attributed to sources
- Clear about confidence levels and any knowledge gaps
User feedback about the accuracy or utility of provided information may be routed back to the Knowledge Acquisition System to improve the knowledge base.
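
The flow above can be sketched end to end in a few functions. This is only an illustration: the keyword-based query translation stands in for whatever parser or LLM call a real system would use, and the function names, fact records, and source label are all hypothetical.

```python
# Hypothetical fact records: each carries source and confidence metadata.
FACTS = [
    {"subject": "warfarin", "predicate": "interacts_with", "object": "aspirin",
     "source": "example-formulary-2024", "confidence": 0.95,
     "last_verified": "2024-01-15"},
]


def to_structured_query(question):
    """Toy Query Interface: map a question to a (subject, predicate) pair.

    A real system might use an LLM or a semantic parser here; this keyword
    match only illustrates the hand-off.
    """
    text = question.lower()
    predicate = "interacts_with" if "interact" in text else None
    subject = next((f["subject"] for f in FACTS if f["subject"] in text), None)
    return subject, predicate


def retrieve(subject, predicate):
    """Toy Knowledge Repository lookup."""
    return [f for f in FACTS
            if f["subject"] == subject and f["predicate"] == predicate]


def build_grounded_context(question, facts):
    """Toy Knowledge Integrator: format facts with citations for the prompt."""
    if not facts:
        return f"Question: {question}\nNo verified facts found; state the gap."
    lines = [
        f"- {f['subject']} {f['predicate']} {f['object']} "
        f"[source: {f['source']}, confidence: {f['confidence']:.2f}, "
        f"verified: {f['last_verified']}]"
        for f in facts
    ]
    return f"Question: {question}\nVerified facts:\n" + "\n".join(lines)


question = "Does warfarin interact with aspirin?"
context = build_grounded_context(question, retrieve(*to_structured_query(question)))
print(context)
```

The resulting text would be placed in the agent's prompt, so the model sees each fact together with its source, confidence, and verification date.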

Consequences

Benefits

Enhanced Accuracy: Grounds agent responses in verified facts, reducing hallucinations
Transparency: Enables clear attribution of sources for factual claims
Maintainability: Knowledge can be updated independently of the LLM
Specialization: Supports deep domain expertise beyond general LLM training
Regulatory Compliance: Provides audit trails for factual information in regulated domains
Consistency: Ensures coherent information across multiple agent interactions
Scalability: Knowledge can grow without increasing model size

Limitations

Integration Complexity: Requires sophisticated mechanisms to blend LLM reasoning with structured knowledge
Maintenance Overhead: Knowledge bases require ongoing curation and updates
Coverage Gaps: Difficult to achieve comprehensive coverage of all potentially relevant facts
Query Limitations: Not all natural language queries map cleanly to the knowledge schema
Schema Design Challenges: Creating effective knowledge schemas is difficult and usually takes several iterations
Cold Start Problem: Initial population of the knowledge base requires significant effort
Reasoning Complexity: Some types of knowledge are difficult to represent as discrete facts

Performance Implications

Query Latency: Knowledge base queries add processing time to agent responses
Storage Requirements: Comprehensive knowledge bases may require substantial storage
Scaling Considerations: Without proper indexing, performance may degrade as the knowledge base grows
Caching Strategies: Frequently accessed facts should be cached for performance
Batch Updates: Knowledge base updates may need to be performed in batches to minimize disruption
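
One simple way to combine caching with batch updates is to memoize lookups and clear the cache once per batch. The sketch below uses Python's standard `functools.lru_cache`. The store contents, the `slow_lookup` and `apply_batch_update` helpers, and the cache size are assumptions for illustration only.

```python
from functools import lru_cache

LOOKUPS = {"count": 0}  # counts how often the slow store is actually hit

# Hypothetical backing store keyed by (subject, predicate).
_STORE = {("aspirin", "drug_class"): ("NSAID",)}


def slow_lookup(subject, predicate):
    """Stand-in for a database round trip."""
    LOOKUPS["count"] += 1
    return _STORE.get((subject, predicate), ())


@lru_cache(maxsize=1024)
def cached_lookup(subject, predicate):
    """Memoize lookups; results are tuples so cached values are immutable."""
    return slow_lookup(subject, predicate)


def apply_batch_update(updates):
    """Apply a batch of updates, then invalidate the cache once."""
    _STORE.update(updates)
    cached_lookup.cache_clear()


cached_lookup("aspirin", "drug_class")
cached_lookup("aspirin", "drug_class")  # second call is served from the cache
print(LOOKUPS["count"], cached_lookup.cache_info().hits)
```

Clearing the whole cache after each batch is the bluntest possible invalidation policy. A larger system would more likely evict only the keys touched by the update.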

Implementation

Define Knowledge Scope:
- Identify the specific domains requiring declarative knowledge
- Determine the appropriate level of granularity for facts
- Establish fact verification and confidence assessment protocols
Design Knowledge Schema:
- Create entity-relationship models for the domain
- Define properties, relationships, and constraints
- Establish provenance and metadata requirements
Select Storage Technology:
- Choose appropriate databases based on query patterns and scale
- Implement indexing strategies for efficient retrieval
- Set up backup and recovery mechanisms
Develop Knowledge Acquisition Workflows:
- Build tools for expert knowledge entry and verification
- Implement automated extraction from trusted sources
- Create validation processes for new knowledge entries
Create Query Mechanisms:
- Develop natural language to structured query conversion
- Implement relevance ranking and semantic matching
- Build support for complex logical queries
Integrate with Agent System:
- Design prompting strategies that incorporate retrieved facts
- Implement attribution and confidence signaling
- Create fallback mechanisms when knowledge is incomplete
Establish Maintenance Processes:
- Schedule regular reviews and updates
- Create version control for knowledge base changes
- Develop metrics for knowledge base health and coverage
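
The attribution, confidence signaling, and fallback items in the integration step could look something like the following. This is a sketch under stated assumptions: the function name `answer_with_fallback`, the three response modes, and the 0.8 confidence threshold are hypothetical choices, not part of the pattern itself.

```python
def answer_with_fallback(facts, min_confidence=0.8):
    """Return a (mode, text) pair describing how well grounded the answer is.

    `facts` is a list of dicts with 'statement', 'source', and 'confidence'
    keys. The threshold is an illustrative default.
    """
    trusted = [f for f in facts if f["confidence"] >= min_confidence]
    if trusted:
        # Grounded answer: cite every trusted fact.
        cited = "; ".join(f"{f['statement']} [{f['source']}]" for f in trusted)
        return "grounded", cited
    if facts:
        # Something was found, but nothing clears the threshold: say so.
        best = max(facts, key=lambda f: f["confidence"])
        text = (f"Unverified: {best['statement']} [{best['source']}] "
                f"(confidence {best['confidence']:.2f})")
        return "low_confidence", text
    # Nothing retrieved: state the knowledge gap instead of guessing.
    return "no_knowledge", "The knowledge base has no verified information on this."


print(answer_with_fallback(
    [{"statement": "X holds", "source": "source-a", "confidence": 0.9}]))
print(answer_with_fallback(
    [{"statement": "Y holds", "source": "source-b", "confidence": 0.4}]))
print(answer_with_fallback([]))
```

Returning the mode alongside the text lets the calling agent decide whether to answer directly, add a caveat, or hand off to another knowledge source.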

Code Examples

To do...

Variations

Domain-Specific Knowledge Graphs

Specialized knowledge structures optimized for particular domains, such as:

Medical knowledge graphs with detailed relationships between conditions, treatments, and outcomes
Legal knowledge bases capturing statutes, case law, and jurisdictional variations
Scientific knowledge repositories organizing research findings and experimental data

Federated Knowledge Bases

Systems that query multiple specialized knowledge sources and aggregate the results:

Distribute queries across domain-specific repositories
Resolve conflicts between different authoritative sources
Maintain separate update cycles for different knowledge domains
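
A federated lookup with a simple conflict policy might be sketched as follows. The priority-based resolution rule, the `federated_query` function, and the repository names and values are all assumptions made for illustration. Real systems may prefer recency, jurisdiction, or human review instead.

```python
from typing import NamedTuple, Optional


class Answer(NamedTuple):
    value: Optional[str]
    source: Optional[str]
    conflicts: list  # (source, value) pairs that disagree with the winner


def federated_query(repositories, key):
    """Query every repository and resolve disagreements by priority.

    `repositories` is a list of (name, priority, mapping) tuples; the lowest
    priority number wins. This policy is an assumption for illustration.
    """
    hits = [(priority, name, data[key])
            for name, priority, data in repositories if key in data]
    if not hits:
        return Answer(None, None, [])
    hits.sort(key=lambda hit: hit[0])
    _, best_source, best_value = hits[0]
    conflicts = [(name, value) for _, name, value in hits[1:]
                 if value != best_value]
    return Answer(best_value, best_source, conflicts)


# Hypothetical repositories holding made-up values.
repos = [
    ("regional_guideline", 2, {"filing_deadline_days": "45"}),
    ("national_guideline", 1, {"filing_deadline_days": "30"}),
]
result = federated_query(repos, "filing_deadline_days")
print(result)
```

Returning the losing answers alongside the winner keeps the conflict visible, so the agent can mention the disagreement rather than silently hiding it.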

Probabilistic Knowledge Bases

Repositories that explicitly model uncertainty in factual information:

Assign confidence scores to facts and relationships
Track conflicting claims from different sources
Update beliefs based on new evidence using Bayesian methods
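
The Bayesian update mentioned above can be written directly from Bayes' rule. In this sketch the belief in a fact is a single probability, and each piece of evidence is described by how likely it would be if the fact were true versus false. The function name `bayes_update` and the reliability figures are illustrative assumptions.

```python
def bayes_update(prior, p_evidence_if_true, p_evidence_if_false):
    """Return P(fact | evidence) using Bayes' rule.

    prior: current belief that the fact is true, between 0 and 1.
    p_evidence_if_true: probability of seeing this evidence if the fact is true.
    p_evidence_if_false: probability of seeing it if the fact is false.
    """
    numerator = p_evidence_if_true * prior
    denominator = numerator + p_evidence_if_false * (1.0 - prior)
    if denominator == 0:
        return prior  # the evidence is impossible under both hypotheses
    return numerator / denominator


belief = 0.5
# A source assumed to be right 90% of the time supports the fact.
belief = bayes_update(belief, p_evidence_if_true=0.9, p_evidence_if_false=0.1)
after_support = belief
# A source assumed to be right 80% of the time contradicts the fact.
belief = bayes_update(belief, p_evidence_if_true=0.2, p_evidence_if_false=0.8)
print(round(after_support, 3), round(belief, 3))
```

Treating each source as independent is a strong simplification. Sources that copy from one another would need to be modeled jointly to avoid counting the same evidence twice.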

Hybrid Semantic-Vector Approaches

Systems that combine structured knowledge representation with vector-based retrieval:

Use vector embeddings for semantic similarity queries
Fall back to strict logical queries when precision is critical
Blend retrieved information from both approaches
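
A minimal sketch of the hybrid idea, assuming each entry has both an identifier for exact lookup and an embedding for similarity search. The three-dimensional vectors are hand-written toy values rather than output from a real embedding model, and `hybrid_lookup` is a hypothetical helper.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy entries; the embeddings are made up for illustration.
ENTRIES = [
    {"id": "aspirin", "embedding": [0.9, 0.1, 0.0]},
    {"id": "ibuprofen", "embedding": [0.8, 0.2, 0.1]},
    {"id": "statute_12", "embedding": [0.0, 0.1, 0.9]},
]


def hybrid_lookup(entries, query_vec, exact_id=None,
                  precision_critical=False, k=2):
    """Use an exact match when precision matters, else the top-k by similarity."""
    if precision_critical:
        # Strict path: only an exact identifier match is acceptable.
        return [e for e in entries if e["id"] == exact_id]
    ranked = sorted(entries,
                    key=lambda e: cosine(query_vec, e["embedding"]),
                    reverse=True)
    return ranked[:k]


fuzzy = hybrid_lookup(ENTRIES, [1.0, 0.0, 0.0])
strict = hybrid_lookup(ENTRIES, [1.0, 0.0, 0.0],
                       exact_id="ibuprofen", precision_critical=True)
print([e["id"] for e in fuzzy], [e["id"] for e in strict])
```

The strict path returns an empty list when nothing matches exactly, which gives the caller a clear signal to report a knowledge gap instead of substituting a merely similar entry.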

Real-World Examples

IBM Watson for Oncology: Uses a medical knowledge base containing information about cancer treatments, clinical guidelines, and research findings to provide evidence-based treatment recommendations to clinicians.
Legal Research Systems: Platforms like LexisNexis and Westlaw maintain extensive knowledge bases of case law, statutes, and legal commentary that can be queried to support legal research and case preparation.
Semantic Scholar: Academic search engine that maintains a knowledge graph of scientific papers, authors, and concepts to improve search relevance and support literature discovery.
Google's Knowledge Graph: Powers the knowledge panels shown in Google search results. It contains billions of facts about people, places, and things, which enrich search responses with structured information.

Related Patterns

Retrieval-Augmented Generation (RAG): While RAG typically works with unstructured documents, Declarative Knowledge Bases provide structured facts. The two patterns can be combined for both narrative and factual knowledge retrieval.
Episodic Memory: Complements Declarative Knowledge Bases by storing interaction history rather than domain facts. Together they provide both factual and conversational context.
Reflection: Agents can use a Declarative Knowledge Base to verify their reasoning against established facts, supporting more accurate self-assessment.
Fallback Chains: When the Declarative Knowledge Base is incomplete, systems can fall back to other knowledge sources or explicitly state knowledge limitations.
Decision Trail Recording: Can document which facts from the knowledge base were used in forming responses, supporting auditability.