Declarative Knowledge Bases - joehubert/ai-agent-design-patterns GitHub Wiki
Classification
Intent
To provide agents with structured, factual information repositories that can be queried when specific domain knowledge is needed, enhancing accuracy and reducing hallucinations by grounding agent responses in verified facts.
Also Known As
- Knowledge Graphs
- Fact Stores
- Domain Knowledge Repositories
- Semantic Knowledge Bases
Motivation
Large language models (LLMs) have impressive knowledge embedded in their parameters, but this knowledge has limitations:
- It may be incomplete or outdated
- It can't be directly verified or audited
- It often lacks domain-specific expertise
- Updates require retraining or fine-tuning the entire model
Consider a healthcare assistant that needs to provide accurate information about medications, treatments, and diseases. Relying solely on the LLM's internal knowledge:
- Risks providing outdated or incorrect medical advice
- Cannot guarantee compliance with the latest clinical guidelines
- Has no structured way to verify information sources
- Makes it difficult to incorporate new medical research
A Declarative Knowledge Base addresses these challenges by creating an external, structured repository of facts that:
- Contains verified, authoritative information from trusted sources
- Can be queried precisely when domain knowledge is needed
- Provides explicit provenance for factual information
- Can be updated independently of the underlying LLM
- Enables transparent auditing of knowledge sources
Applicability
Use the Declarative Knowledge Base pattern when:
- Your application requires high accuracy for specific domains (medicine, law, finance, science)
- You need verifiable sources for factual claims
- Knowledge must be frequently updated without retraining the underlying model
- Your application serves regulated industries with compliance requirements
- Domain expertise is specialized and unlikely to be well represented in general LLM training data
- You need to manage conflicting information from different authoritative sources
- Your application must reason over a large body of structured facts
Structure
To do...
Components
- Knowledge Repository: The primary storage mechanism holding structured facts, which may be implemented as:
  - A graph database storing entities and relationships
  - Vector databases for semantic retrieval
  - Structured data in SQL/NoSQL databases
  - Ontologies with formal logic rules
- Knowledge Schema: The organizational framework defining:
  - Entity types and their properties
  - Relationship types between entities
  - Constraints and validation rules
  - Confidence scores and uncertainty metrics
- Knowledge Acquisition System: Mechanisms for populating and updating the knowledge base:
  - Manual curation by domain experts
  - Automated extraction from authoritative documents
  - Integration with existing knowledge bases
  - Feedback loops to correct and refine knowledge
- Query Interface: Methods for the agent to access the knowledge:
  - Natural language to structured query conversion
  - Semantic search capabilities
  - Relevance ranking algorithms
  - Support for complex logical queries
- Knowledge Integrator: Components that blend knowledge base results with LLM capabilities:
  - Context window optimization for retrieved facts
  - Reasoning mechanisms over retrieved facts
  - Source attribution and citation generation
  - Confidence evaluation of knowledge base outputs
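The repository, schema, and query interface described above can be sketched in a few lines. This is a minimal in-memory illustration, not a recommended storage design: the `Fact` and `KnowledgeRepository` names, the allowed predicates, and the sample entries (`drug_a`, `condition_x`, and so on) are all hypothetical placeholders.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Fact:
    """One schema-conforming fact: a triple plus provenance metadata."""
    subject: str
    predicate: str
    obj: str
    source: str          # provenance: where the fact came from
    confidence: float    # 0.0 to 1.0, assigned during curation
    last_verified: date


class KnowledgeRepository:
    """In-memory stand-in for a graph or SQL store (illustrative only)."""

    # Hypothetical schema constraint: the only relationship types allowed.
    ALLOWED_PREDICATES = {"treats", "interacts_with", "contraindicated_for"}

    def __init__(self) -> None:
        self._facts: list[Fact] = []

    def add(self, fact: Fact) -> None:
        # Schema validation rules are enforced at write time.
        if fact.predicate not in self.ALLOWED_PREDICATES:
            raise ValueError(f"unknown predicate: {fact.predicate}")
        if not 0.0 <= fact.confidence <= 1.0:
            raise ValueError("confidence must be between 0 and 1")
        self._facts.append(fact)

    def query(self, subject: Optional[str] = None,
              predicate: Optional[str] = None,
              obj: Optional[str] = None,
              min_confidence: float = 0.0) -> list[Fact]:
        """Return matching facts, highest confidence first; None is a wildcard."""
        matches = [
            f for f in self._facts
            if (subject is None or f.subject == subject)
            and (predicate is None or f.predicate == predicate)
            and (obj is None or f.obj == obj)
            and f.confidence >= min_confidence
        ]
        return sorted(matches, key=lambda f: f.confidence, reverse=True)


# Placeholder data, not real medical facts.
repo = KnowledgeRepository()
repo.add(Fact("drug_a", "treats", "condition_x", "guideline-2024", 0.95, date(2024, 5, 1)))
repo.add(Fact("drug_a", "interacts_with", "drug_b", "label-text", 0.80, date(2023, 11, 12)))
results = repo.query(subject="drug_a", min_confidence=0.9)
print([(f.predicate, f.obj, f.source) for f in results])
```

Keeping source, confidence, and verification date on every fact is what lets the later components attribute and qualify an answer; a production system would replace the list scan with an indexed store.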
Interactions
- When an agent needs domain-specific information, it formulates a knowledge query based on the current context and user request.
- The Query Interface translates the natural language or semantic query into a structured format that matches the Knowledge Schema.
- The Knowledge Repository processes the query against its stored facts, retrieving relevant information.
- Retrieved facts include metadata about their sources, confidence levels, and last verification dates.
- The Knowledge Integrator combines these facts with the agent's reasoning capabilities to produce a response that is:
  - Grounded in verified information
  - Properly attributed to sources
  - Clear about confidence levels and any knowledge gaps
- User feedback about the accuracy or utility of provided information may be routed back to the Knowledge Acquisition System to improve the knowledge base.
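The integration step in this flow can be illustrated with a small prompt builder. It is a sketch under stated assumptions: `RetrievedFact`, `build_grounded_prompt`, the 0.7 confidence threshold, and the sample facts are hypothetical, and the actual LLM call is left out.

```python
from typing import NamedTuple


class RetrievedFact(NamedTuple):
    statement: str
    source: str
    confidence: float
    last_verified: str   # ISO date string


def build_grounded_prompt(question: str, facts: list[RetrievedFact],
                          min_confidence: float = 0.7) -> str:
    """Assemble an LLM prompt that cites retrieved facts and flags gaps."""
    usable = [f for f in facts if f.confidence >= min_confidence]
    if not usable:
        # No sufficiently trusted facts: instruct the model to state the gap.
        return (f"Question: {question}\n"
                "No verified facts were found. Say that the knowledge base "
                "does not cover this question.")
    lines = [
        f"[{i}] {f.statement} (source: {f.source}, "
        f"confidence: {f.confidence:.2f}, verified: {f.last_verified})"
        for i, f in enumerate(usable, start=1)
    ]
    return ("Answer using only the facts below and cite them by number.\n"
            + "\n".join(lines)
            + f"\nQuestion: {question}")


# Placeholder facts, not real medical data.
facts = [
    RetrievedFact("drug_a treats condition_x", "guideline-2024", 0.95, "2024-05-01"),
    RetrievedFact("drug_a interacts with drug_b", "forum-post", 0.40, "2022-01-10"),
]
prompt = build_grounded_prompt("What does drug_a treat?", facts)
print(prompt)
```

Numbering the facts gives the model something concrete to cite, and filtering by confidence before prompting keeps low-trust material out of the context window entirely.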
Consequences
Benefits
- Enhanced Accuracy: Grounds agent responses in verified facts, reducing hallucinations
- Transparency: Enables clear attribution of sources for factual claims
- Maintainability: Knowledge can be updated independently of the LLM
- Specialization: Supports deep domain expertise beyond general LLM training
- Regulatory Compliance: Provides audit trails for factual information in regulated domains
- Consistency: Ensures coherent information across multiple agent interactions
- Scalability: Knowledge can grow without increasing model size
Limitations
- Integration Complexity: Requires sophisticated mechanisms to blend LLM reasoning with structured knowledge
- Maintenance Overhead: Knowledge bases require ongoing curation and updates
- Coverage Gaps: Difficult to achieve comprehensive coverage of all potentially relevant facts
- Query Limitations: Not all natural language queries map cleanly onto the knowledge schema
- Schema Design Challenges: Creating an effective knowledge schema is difficult and may take several iterations
- Cold Start Problem: Initial population of the knowledge base requires significant effort
- Reasoning Complexity: Some types of knowledge are difficult to represent as discrete facts
Performance Implications
- Query Latency: Knowledge base queries add processing time to agent responses
- Storage Requirements: Comprehensive knowledge bases may require substantial storage
- Scaling Considerations: Without proper indexing, performance may degrade as the knowledge base grows
- Caching Strategies: Frequently accessed facts should be cached for performance
- Batch Updates: Knowledge base updates may need to be performed in batches to minimize disruption
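Caching and batch updates interact: a cached fact must not outlive an update to the underlying store. The sketch below shows one simple way to handle that with the standard library's `functools.lru_cache`, clearing the cache after each batch. The `_STORE` dictionary, its key, and the function names are made up for illustration.

```python
from functools import lru_cache
from typing import Optional

# Stand-in for a slow external knowledge store; contents are placeholders.
_STORE = {"drug_a:dosage_note": "placeholder text v1"}
backend_calls = 0  # counts how often the (simulated) backend is hit


@lru_cache(maxsize=1024)
def lookup(key: str) -> Optional[str]:
    """Fetch a fact, caching the result so repeated lookups skip the backend."""
    global backend_calls
    backend_calls += 1
    return _STORE.get(key)


def apply_batch_update(updates: dict) -> None:
    """Apply all changes at once, then drop cached entries that may be stale."""
    _STORE.update(updates)
    lookup.cache_clear()


first = lookup("drug_a:dosage_note")
second = lookup("drug_a:dosage_note")      # served from the cache
calls_before_update = backend_calls
apply_batch_update({"drug_a:dosage_note": "placeholder text v2"})
third = lookup("drug_a:dosage_note")       # cache was cleared, backend hit again
print(first, second, third, backend_calls)
```

Clearing the whole cache is the bluntest option; a larger system might invalidate only the updated keys or attach a time-to-live to each entry.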
Implementation
- Define Knowledge Scope:
  - Identify the specific domains requiring declarative knowledge
  - Determine the appropriate level of granularity for facts
  - Establish fact verification and confidence assessment protocols
- Design Knowledge Schema:
  - Create entity-relationship models for the domain
  - Define properties, relationships, and constraints
  - Establish provenance and metadata requirements
- Select Storage Technology:
  - Choose appropriate databases based on query patterns and scale
  - Implement indexing strategies for efficient retrieval
  - Set up backup and recovery mechanisms
- Develop Knowledge Acquisition Workflows:
  - Build tools for expert knowledge entry and verification
  - Implement automated extraction from trusted sources
  - Create validation processes for new knowledge entries
- Create Query Mechanisms:
  - Develop natural language to structured query conversion
  - Implement relevance ranking and semantic matching
  - Build support for complex logical queries
- Integrate with Agent System:
  - Design prompting strategies that incorporate retrieved facts
  - Implement attribution and confidence signaling
  - Create fallback mechanisms when knowledge is incomplete
- Establish Maintenance Processes:
  - Schedule regular reviews and updates
  - Create version control for knowledge base changes
  - Develop metrics for knowledge base health and coverage
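Of these steps, the fallback mechanism is the easiest to get wrong silently, so a small sketch may help. It assumes a hypothetical `Answer` type and an `answer_with_fallback` helper; the knowledge base is a plain dictionary and the fallback is any callable standing in for an ungrounded LLM response.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Answer:
    text: str
    grounded: bool                 # True only when backed by a knowledge base entry
    source: Optional[str] = None   # attribution, present only for grounded answers


def answer_with_fallback(key: str,
                         kb: dict[str, tuple[str, str]],
                         fallback: Callable[[str], str]) -> Answer:
    """Prefer the knowledge base; otherwise fall back and label the answer."""
    entry = kb.get(key)
    if entry is not None:
        text, source = entry
        return Answer(text=text, grounded=True, source=source)
    # Knowledge is incomplete: use the fallback but signal that it is unverified.
    return Answer(text=fallback(key) + " (unverified: not in the knowledge base)",
                  grounded=False)


# Placeholder knowledge base and fallback, for illustration only.
kb = {"drug_a:treats": ("drug_a treats condition_x", "guideline-2024")}
hit = answer_with_fallback("drug_a:treats", kb, lambda k: "model guess")
miss = answer_with_fallback("drug_z:treats", kb, lambda k: "model guess")
print(hit)
print(miss)
```

Carrying the `grounded` flag on the answer object, rather than only in the text, lets downstream code decide whether to show, soften, or suppress an unverified response.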
Code Examples
To do...
Variations
Domain-Specific Knowledge Graphs
Specialized knowledge structures optimized for particular domains, such as:
- Medical knowledge graphs with detailed relationships between conditions, treatments, and outcomes
- Legal knowledge bases capturing statutes, case law, and jurisdictional variations
- Scientific knowledge repositories organizing research findings and experimental data
Federated Knowledge Bases
Systems that query multiple specialized knowledge sources and aggregate the results:
- Distribute queries across domain-specific repositories
- Resolve conflicts between different authoritative sources
- Maintain separate update cycles for different knowledge domains
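A federated lookup can be sketched as a fan-out followed by conflict resolution. The example below assumes each repository is just a callable returning claims and resolves conflicts with a hypothetical integer `authority` rank; real systems may use richer policies such as recency or jurisdiction.

```python
from typing import Callable, NamedTuple, Optional


class Claim(NamedTuple):
    value: str
    source: str
    authority: int   # hypothetical ranking: higher means more authoritative


Repository = Callable[[str], list[Claim]]


def federated_query(key: str,
                    repositories: list[Repository]
                    ) -> tuple[Optional[Claim], list[Claim]]:
    """Query every repository; return the top claim plus any that disagree."""
    claims: list[Claim] = []
    for repo in repositories:
        claims.extend(repo(key))
    if not claims:
        return None, []
    best = max(claims, key=lambda c: c.authority)
    # Keep dissenting claims so the conflict stays visible to the caller.
    conflicts = [c for c in claims if c.value != best.value]
    return best, conflicts


# Two placeholder repositories that disagree about the same key.
national = {"limit": [Claim("value_1", "national-guideline", 3)]}
regional = {"limit": [Claim("value_2", "regional-guideline", 2)]}
repos = [lambda k: national.get(k, []), lambda k: regional.get(k, [])]

best, conflicts = federated_query("limit", repos)
print(best, conflicts)
```

Returning the conflicting claims alongside the winner, instead of discarding them, supports the auditing goals described earlier.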
Probabilistic Knowledge Bases
Repositories that explicitly model uncertainty in factual information:
- Assign confidence scores to facts and relationships
- Track conflicting claims from different sources
- Update beliefs based on new evidence using Bayesian methods
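One simple way to realize the Bayesian update is Bayes' rule in odds form, where new evidence is summarized by a likelihood ratio. The function name and the example numbers below are illustrative assumptions, not part of any particular knowledge base product.

```python
def update_confidence(prior: float, likelihood_ratio: float) -> float:
    """Update a fact's confidence with Bayes' rule in odds form.

    likelihood_ratio = P(evidence | fact is true) / P(evidence | fact is false).
    Values above 1 support the fact; values below 1 count against it.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    if likelihood_ratio <= 0.0:
        raise ValueError("likelihood_ratio must be positive")
    prior_odds = prior / (1.0 - prior)
    posterior_odds = prior_odds * likelihood_ratio
    return posterior_odds / (1.0 + posterior_odds)


# A fact held at 0.5 gains support from evidence that is three times
# more likely if the fact is true than if it is false.
supported = update_confidence(0.5, 3.0)
print(supported)  # 0.75

# Contradicting evidence of equal strength moves the belief back down.
revised = update_confidence(supported, 1.0 / 3.0)
print(round(revised, 6))
```

Working in odds keeps repeated updates multiplicative, which makes it straightforward to fold in evidence from several sources one at a time, provided the pieces of evidence are treated as independent.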
Hybrid Semantic-Vector Approaches
Systems that combine structured knowledge representation with vector-based retrieval:
- Use vector embeddings for semantic similarity queries
- Fall back to strict logical queries when precision is critical
- Blend retrieved information from both approaches
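The switch between similarity search and strict lookup might look like the sketch below. The `embed` function is a toy bag-of-words stand-in for a real embedding model, and `FACTS`, its keys, and the `retrieve` signature are hypothetical.

```python
import math
from collections import Counter
from typing import Optional


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Placeholder facts keyed by a structured identifier.
FACTS = {
    "drug_a:interacts_with": "drug_a interacts with drug_b",
    "drug_a:treats": "drug_a treats condition_x",
}


def retrieve(query: str, exact_key: Optional[str] = None,
             precision_critical: bool = False) -> list[str]:
    """Use strict key lookup when precision matters, else similarity ranking."""
    if precision_critical:
        # Strict path: return exactly the keyed fact or nothing at all.
        fact = FACTS.get(exact_key) if exact_key is not None else None
        return [fact] if fact is not None else []
    q = embed(query)
    scored = [(cosine(q, embed(f)), f) for f in FACTS.values()]
    scored = [pair for pair in scored if pair[0] > 0.0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [f for _, f in scored[:1]]


semantic = retrieve("drug_a treats what")
strict = retrieve("ignored", exact_key="drug_a:interacts_with", precision_critical=True)
strict_miss = retrieve("ignored", exact_key="drug_q:treats", precision_critical=True)
print(semantic, strict, strict_miss)
```

The strict path deliberately returns nothing on a miss instead of the nearest match, since a near match is exactly the failure mode that precision-critical queries need to avoid.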
Real-World Examples
- IBM Watson for Oncology: Uses a medical knowledge base containing information about cancer treatments, clinical guidelines, and research findings to provide evidence-based treatment recommendations to clinicians.
- Legal Research Systems: Platforms like LexisNexis and Westlaw maintain extensive knowledge bases of case law, statutes, and legal commentary that can be queried to support legal research and case preparation.
- Semantic Scholar: Academic search engine that maintains a knowledge graph of scientific papers, authors, and concepts to improve search relevance and support literature discovery.
- Google's Knowledge Graph: Contains billions of facts about people, places, and things, and powers the information boxes that enrich Google's search results with structured information.
Related Patterns
- Retrieval-Augmented Generation (RAG): While RAG typically works with unstructured documents, Declarative Knowledge Bases provide structured facts. The two patterns can be combined for both narrative and factual knowledge retrieval.
- Episodic Memory: Complements Declarative Knowledge by storing interaction history rather than domain facts. Together they provide both factual and conversational context.
- Reflection: Agents can use Declarative Knowledge to verify their reasoning against established facts, supporting more accurate self-assessment.
- Fallback Chains: When Declarative Knowledge is incomplete, systems can fall back to other knowledge sources or explicitly state knowledge limitations.
- Decision Trail Recording: Can document which facts from the knowledge base were used in forming responses, supporting auditability.