08 developer overview - the-omics-os/lobster-local GitHub Wiki
Developer Overview - Lobster AI Architecture
Overview
This guide provides a comprehensive introduction to developing within the Lobster AI codebase, covering architecture patterns, design principles, and development workflows. Lobster AI is a professional multi-agent bioinformatics analysis platform that combines specialized AI agents with proven scientific tools.
Core Design Principles
1. Agent-Based Architecture
- Specialized Agents: Each agent handles specific bioinformatics domains (transcriptomics, proteomics)
- Centralized Registry: Single source of truth for agent configuration via AGENT_REGISTRY
- Natural Language Interface: Users describe analyses in plain English
2. Modular Service Design
- Stateless Services: All analysis services are stateless and return (processed_adata, statistics_dict)
- Separation of Concerns: Agents coordinate workflows, services handle computation
- Reusable Components: Services can be used independently or composed in workflows
3. Multi-Modal Data Management
- DataManagerV2: Centralized orchestrator for multi-omics data with modality management
- Professional Naming: Consistent naming conventions for dataset versions and analysis stages
- Provenance Tracking: W3C-PROV compliant analysis history for reproducibility
4. Cloud/Local Hybrid Architecture
- BaseClient Interface: Consistent API for local and cloud execution
- Seamless Switching: Automatic detection and fallback between cloud and local modes
- Unified CLI: Single interface supporting both execution environments
Architecture Components
Core Directories
lobster/
├── agents/    # Specialized AI agents for bioinformatics domains
├── core/      # Data management, client infrastructure, interfaces
├── tools/     # Stateless analysis services
├── config/    # Configuration management and agent registry
├── cli.py     # Modern terminal interface with autocomplete
└── utils/     # Shared utilities and logging
Key Architectural Patterns
1. Agent Registry Pattern
# lobster/config/agent_registry.py
from dataclasses import dataclass
from typing import Optional

@dataclass
class AgentRegistryConfig:
    name: str                          # Unique identifier
    display_name: str                  # Human-readable name
    description: str                   # Agent capabilities
    factory_function: str              # Module path to factory
    handoff_tool_name: Optional[str]   # Auto-generated tool name

AGENT_REGISTRY = {
    'data_expert_agent': AgentRegistryConfig(...),
    'transcriptomics_expert': AgentRegistryConfig(...),
    'proteomics_expert': AgentRegistryConfig(...),
    # ... more agents
}
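Because factory_function is stored as a string, something has to turn that path into a callable before an agent can be built. The sketch below shows one way this could work with importlib. It assumes the path is a dotted "module.attribute" string; resolve_factory and the demo entry are hypothetical names used only for illustration, and the real registry may resolve factories differently.

```python
import importlib
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class AgentRegistryConfig:
    name: str
    display_name: str
    description: str
    factory_function: str  # assumed format: "package.module.attribute"
    handoff_tool_name: Optional[str]


def resolve_factory(config: AgentRegistryConfig) -> Callable[..., Any]:
    """Import the module named in factory_function and return the callable."""
    module_path, _, attr = config.factory_function.rpartition(".")
    if not module_path:
        raise ValueError(f"Not a dotted path: {config.factory_function!r}")
    module = importlib.import_module(module_path)
    factory = getattr(module, attr)
    if not callable(factory):
        raise TypeError(f"{config.factory_function!r} is not callable")
    return factory


# Hypothetical entry: a stdlib function stands in for a real agent factory.
demo = AgentRegistryConfig(
    name="demo_agent",
    display_name="Demo Agent",
    description="Placeholder entry used only in this sketch",
    factory_function="json.dumps",
    handoff_tool_name=None,
)
factory = resolve_factory(demo)
print(factory({"ok": True}))
```

A wrong path fails early with ImportError or AttributeError, which matches the "verify factory function paths" advice in the troubleshooting section.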
2. Service Pattern
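from typing import Dict, Tuple

import anndata

class QualityService:
    """Stateless service for data quality assessment."""

    def assess_quality(self, adata: anndata.AnnData, **params) -> Tuple[anndata.AnnData, Dict]:
        """
        Returns:
            Tuple of (processed_adata, statistics_dict)
        """
        # Stateless processing logic goes here.
        # Placeholder: copy the input and return empty statistics.
        processed_adata = adata.copy()
        statistics: Dict = {}
        return processed_adata, statistics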
3. Agent Tool Pattern
# Assumes data_manager, service, and formatted_response are defined
# in the enclosing scope.
@tool
def assess_data_quality(modality_name: str, **params) -> str:
    """Standard pattern for all agent tools."""
    # 1. Validate modality exists
    if modality_name not in data_manager.list_modalities():
        raise ModalityNotFoundError(f"Modality '{modality_name}' not found")

    # 2. Get data and call stateless service
    adata = data_manager.get_modality(modality_name)
    result_adata, stats = service.assess_quality(adata, **params)

    # 3. Store results with descriptive naming
    new_modality = f"{modality_name}_quality_assessed"
    data_manager.modalities[new_modality] = result_adata

    # 4. Log operation for provenance
    data_manager.log_tool_usage("assess_data_quality", params, stats)

    return formatted_response(stats, new_modality)
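The four steps of the tool pattern can be exercised end to end without the real platform. The following is a minimal sketch under stated assumptions: InMemoryDataManager and CountingQualityService are hypothetical stand-ins for DataManagerV2 and a real analysis service, and a plain list of counts takes the place of an AnnData object. Only the method names that appear in the snippet above (list_modalities, get_modality, modalities, log_tool_usage) are mirrored.

```python
from typing import Any, Dict, List, Tuple


class ModalityNotFoundError(Exception):
    """Raised when a requested modality is not loaded."""


class InMemoryDataManager:
    """Hypothetical stand-in for DataManagerV2, for illustration only."""

    def __init__(self) -> None:
        self.modalities: Dict[str, Any] = {}
        self.history: List[Dict[str, Any]] = []

    def list_modalities(self) -> List[str]:
        return list(self.modalities)

    def get_modality(self, name: str) -> Any:
        return self.modalities[name]

    def log_tool_usage(self, tool_name: str, parameters: Dict[str, Any],
                       stats: Dict[str, Any]) -> None:
        self.history.append(
            {"tool": tool_name, "parameters": parameters, "stats": stats}
        )


class CountingQualityService:
    """Hypothetical stateless service operating on a list of counts."""

    def assess_quality(self, data: List[int],
                       min_count: int = 1) -> Tuple[List[int], Dict[str, Any]]:
        kept = [x for x in data if x >= min_count]  # new list; input untouched
        return kept, {"n_input": len(data), "n_passing": len(kept)}


data_manager = InMemoryDataManager()
service = CountingQualityService()


def assess_data_quality(modality_name: str, **params: Any) -> str:
    # 1. Validate modality exists
    if modality_name not in data_manager.list_modalities():
        raise ModalityNotFoundError(f"Modality '{modality_name}' not found")
    # 2. Get data and call the stateless service
    data = data_manager.get_modality(modality_name)
    result, stats = service.assess_quality(data, **params)
    # 3. Store the result under a descriptive name
    new_modality = f"{modality_name}_quality_assessed"
    data_manager.modalities[new_modality] = result
    # 4. Log the operation for provenance
    data_manager.log_tool_usage("assess_data_quality", params, stats)
    return (f"Stored '{new_modality}': "
            f"{stats['n_passing']}/{stats['n_input']} entries passed")


data_manager.modalities["geo_gse12345"] = [0, 3, 5, 0, 2]
print(assess_data_quality("geo_gse12345", min_count=1))
```

The input modality is left unchanged and the result is stored under a new name, so each analysis stage remains available for later inspection.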
4. Client Adapter Pattern
# lobster/core/interfaces/base_client.py
from abc import ABC, abstractmethod
from typing import Any, Dict

class BaseClient(ABC):
    @abstractmethod
    def query(self, user_input: str, stream: bool = False) -> Dict[str, Any]:
        pass

    @abstractmethod
    def get_status(self) -> Dict[str, Any]:
        pass

# Implementations: AgentClient (local), CloudLobsterClient (cloud)
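To illustrate how a shared interface allows switching between local and cloud execution, here is a minimal sketch. It assumes that mode selection depends only on whether LOBSTER_CLOUD_KEY is set to a non-empty value; LocalStubClient, CloudStubClient, and select_client are hypothetical names, and the real AgentClient and CloudLobsterClient constructors may take different arguments.

```python
import os
from abc import ABC, abstractmethod
from typing import Any, Dict, Mapping, Optional


class BaseClient(ABC):
    """Same two-method interface as in the snippet above."""

    @abstractmethod
    def query(self, user_input: str, stream: bool = False) -> Dict[str, Any]:
        ...

    @abstractmethod
    def get_status(self) -> Dict[str, Any]:
        ...


class LocalStubClient(BaseClient):
    """Hypothetical stand-in for AgentClient."""

    def query(self, user_input: str, stream: bool = False) -> Dict[str, Any]:
        return {"mode": "local", "response": f"echo: {user_input}"}

    def get_status(self) -> Dict[str, Any]:
        return {"mode": "local"}


class CloudStubClient(BaseClient):
    """Hypothetical stand-in for CloudLobsterClient."""

    def __init__(self, api_key: str) -> None:
        self._api_key = api_key

    def query(self, user_input: str, stream: bool = False) -> Dict[str, Any]:
        return {"mode": "cloud", "response": f"echo: {user_input}"}

    def get_status(self) -> Dict[str, Any]:
        return {"mode": "cloud"}


def select_client(env: Optional[Mapping[str, str]] = None) -> BaseClient:
    """Pick cloud mode only when LOBSTER_CLOUD_KEY is set and non-empty."""
    env = os.environ if env is None else env
    api_key = env.get("LOBSTER_CLOUD_KEY", "").strip()
    return CloudStubClient(api_key) if api_key else LocalStubClient()


print(select_client({"LOBSTER_CLOUD_KEY": ""}).get_status())
print(select_client({"LOBSTER_CLOUD_KEY": "key"}).get_status())
```

Callers depend only on BaseClient, so the CLI code path stays the same in both modes.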
Development Setup
1. Environment Setup
# Clone repository
git clone <repository-url>
cd lobster
# Install development dependencies
make dev-install
# Activate environment
source .venv/bin/activate
# Verify installation
python -m lobster --help
2. Required Environment Variables
# Required API Keys
export AWS_BEDROCK_ACCESS_KEY="your-aws-access-key"
export AWS_BEDROCK_SECRET_ACCESS_KEY="your-aws-secret-key"
# Optional
export NCBI_API_KEY="your-ncbi-api-key"
export LOBSTER_CLOUD_KEY="your-cloud-api-key" # Enables cloud mode
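A quick pre-flight check can catch missing keys before the CLI starts. The sketch below is one possible helper, not part of Lobster AI itself: missing_required is a hypothetical function, and the variable names are taken from the exports listed above.

```python
import os
from typing import List, Mapping, Optional

# Names copied from the export lines above.
REQUIRED_VARS = ("AWS_BEDROCK_ACCESS_KEY", "AWS_BEDROCK_SECRET_ACCESS_KEY")
OPTIONAL_VARS = ("NCBI_API_KEY", "LOBSTER_CLOUD_KEY")


def missing_required(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


# Example with an explicit mapping instead of the real environment.
example_env = {"AWS_BEDROCK_ACCESS_KEY": "your-aws-access-key"}
print(missing_required(example_env))
```

Calling missing_required() with no argument inspects the current process environment.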
3. Development Commands
# Run all tests
make test
# Fast parallel testing
make test-fast
# Code formatting
make format
# Linting
make lint
# Type checking
make type-check
# Start CLI
lobster chat
Scientific Workflows
Professional Naming Convention
geo_gse12345                            # Raw downloaded data
├── geo_gse12345_quality_assessed       # QC metrics added
├── geo_gse12345_filtered_normalized    # Preprocessed data
├── geo_gse12345_doublets_detected      # Doublet annotations
├── geo_gse12345_clustered              # Leiden clustering + UMAP
├── geo_gse12345_markers                # Differential expression
├── geo_gse12345_annotated              # Cell type annotations
└── geo_gse12345_pseudobulk             # Aggregated for DE analysis
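The convention above amounts to "base dataset name plus a stage suffix". A small helper can keep names consistent and recover the base name from a derived one. This is a sketch only: stage_name and base_name are hypothetical helpers, and the suffix list is copied from the tree above rather than from the codebase.

```python
# Stage suffixes copied from the naming tree above.
STAGE_SUFFIXES = (
    "quality_assessed",
    "filtered_normalized",
    "doublets_detected",
    "clustered",
    "markers",
    "annotated",
    "pseudobulk",
)


def stage_name(base: str, stage: str) -> str:
    """Build a modality name such as 'geo_gse12345_clustered'."""
    if stage not in STAGE_SUFFIXES:
        raise ValueError(f"Unknown stage: {stage!r}")
    return f"{base}_{stage}"


def base_name(modality: str) -> str:
    """Strip a known stage suffix; return the name unchanged if none matches."""
    for suffix in STAGE_SUFFIXES:
        if modality.endswith("_" + suffix):
            return modality[: -(len(suffix) + 1)]
    return modality


print(stage_name("geo_gse12345", "clustered"))
print(base_name("geo_gse12345_filtered_normalized"))
```

Deriving every stage from the base name avoids chained suffixes when one stage is computed from another.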
Data Flow Architecture
User Input (CLI)
↓
LobsterClientAdapter → BaseClient (AgentClient | CloudLobsterClient)
↓
Agent Registry → Specialized Agent (data_expert, transcriptomics_expert, etc.)
↓
Agent Tools → Stateless Services (QualityService, ClusteringService, etc.)
↓
DataManagerV2 → Modality Management → Storage Backends (H5AD, MuData)
↓
Results → CLI Response with Visualizations
Code Style Guidelines
1. Python Standards
- Follow PEP 8 style guidelines
- Use type hints for all functions and methods
- Line length: 88 characters (Black formatting)
- Comprehensive docstrings for all public functions
2. Scientific Accuracy
- Prioritize scientific accuracy over performance optimizations
- Include comprehensive QC metrics at each analysis step
- Support batch effect detection and correction
- Implement proper missing value handling strategies
3. Error Handling
# Use specific exceptions
class ModalityNotFoundError(Exception):
    pass

class ServiceError(Exception):
    pass

# Proper error handling in tools (fragment from inside a tool function;
# assumes service, data, and logger are already defined)
try:
    result = service.process(data)
except ServiceError as e:
    logger.error(f"Service error: {e}")
    return f"Analysis failed: {e}"
Development Workflow
1. Adding New Features
- Design First: Consider how the feature fits into existing patterns
- Use Registry: For agents, add to AGENT_REGISTRY instead of manual graph edits
- Follow Patterns: Use established service, tool, and adapter patterns
- Test Thoroughly: Include unit, integration, and scientific validation tests
- Document: Update relevant documentation files
2. Code Quality Checklist
- Type hints on all functions
- Comprehensive docstrings
- Error handling with specific exceptions
- Unit tests with 80%+ coverage
- Integration tests with real data
- Scientific validation where applicable
- CLI compatibility (local and cloud)
3. Pre-commit Hooks
# Install pre-commit hooks
pre-commit install
# Run manually
pre-commit run --all-files
Performance Considerations
1. Memory Management
- Use memory-efficient data loading for large datasets
- Implement lazy loading where possible
- Monitor memory usage in long-running analyses
2. Computation Optimization
- Leverage GPU acceleration when available (scVI, RAPIDS)
- Use efficient algorithms for large-scale data
- Implement progress tracking for long operations
3. Caching Strategy
- File operations: 60s cache for cloud, 10s for local
- Intelligent caching for expensive computations
- Clear cache invalidation strategies
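The time-based caching described above can be sketched as a small TTL cache. This is an illustrative implementation under assumptions, not the project's actual cache: TTLCache, get_or_compute, and invalidate are hypothetical names, and only the 60 s and 10 s lifetimes come from the list above. The clock is injectable so expiry can be demonstrated without sleeping.

```python
import time
from typing import Any, Callable, Dict, Hashable, List, Tuple

CLOUD_TTL_SECONDS = 60.0  # file operations in cloud mode
LOCAL_TTL_SECONDS = 10.0  # file operations in local mode


class TTLCache:
    """Cache values for a fixed lifetime, recomputing after expiry."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # still fresh
        value = compute()
        self._entries[key] = (now, value)
        return value

    def invalidate(self, key: Hashable) -> None:
        """Explicitly drop an entry, e.g. after a file is written."""
        self._entries.pop(key, None)


# Demo with a fake clock so expiry is deterministic.
fake_now = [0.0]
calls: List[int] = []


def list_files() -> List[str]:
    calls.append(1)
    return ["example.h5ad"]


cache = TTLCache(LOCAL_TTL_SECONDS, clock=lambda: fake_now[0])
cache.get_or_compute("files", list_files)  # computed
fake_now[0] = 5.0
cache.get_or_compute("files", list_files)  # served from cache
fake_now[0] = 10.0
cache.get_or_compute("files", list_files)  # expired, recomputed
print(len(calls))
```

Pairing a time limit with an explicit invalidate call covers both stale reads and writes that should be visible immediately.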
Debugging and Troubleshooting
1. Common Issues
- Import Errors: Check environment activation and dependencies
- Agent Registry: Verify factory function paths are correct
- Data Loading: Check file permissions and formats
- Cloud Integration: Verify API keys and network connectivity
2. Debugging Tools
# Use structured logging
import logging

from lobster.utils.logger import get_logger
logger = get_logger(__name__)

# Enable debug mode
logger.setLevel(logging.DEBUG)

# Check system status (run in a terminal, then type /status at the prompt)
lobster chat
/status
3. Testing Connectivity
# Test agent registry
python -c "from lobster.config.agent_registry import AGENT_REGISTRY; print(list(AGENT_REGISTRY.keys()))"
# Test CLI with both clients
LOBSTER_CLOUD_KEY="" python -m lobster chat # Local mode
LOBSTER_CLOUD_KEY="key" python -m lobster chat # Cloud mode
Further Reading
- Creating Agents Guide - Detailed agent development
- Creating Services Guide - Service implementation patterns
- Creating Adapters Guide - Data adapter development
- Testing Guide - Comprehensive testing framework
- CLAUDE.md - Complete architectural documentation
Quick Reference
Key Files to Know
- lobster/config/agent_registry.py - Agent configuration registry
- lobster/core/interfaces/base_client.py - Client interface definition
- lobster/core/data_manager_v2.py - Multi-modal data orchestrator
- lobster/cli.py - CLI implementation with autocomplete
- tests/conftest.py - Test configuration and fixtures
Essential Commands
make dev-install # Development setup
make test # Run all tests
lobster chat # Start interactive CLI
/help # Show available commands
/status # System status
/files # List workspace files
This overview provides the foundation for contributing to Lobster AI. Each component follows established patterns that promote consistency, maintainability, and scientific rigor.