13 api overview - the-omics-os/lobster-local GitHub Wiki

API Reference Overview

Introduction

The Lobster AI API provides interfaces for multi-omics bioinformatics analysis built on an agent-based architecture. This reference covers the public APIs, classes, and interfaces available in the Lobster AI system.

API Organization

The Lobster AI API is organized into five main categories:

1. Core API (`lobster.core`)

The foundational layer providing data management, client interfaces, and system orchestration:

DataManagerV2: Multi-modal data orchestration with provenance tracking
Client Interfaces: BaseClient, AgentClient, APIAgentClient for local/cloud execution
Provenance System: W3C-PROV compliant tracking of analysis operations
Schema Validators: Data validation for transcriptomics and proteomics
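
To make the provenance idea concrete, here is a minimal sketch of how processing steps could be recorded with the W3C PROV vocabulary: an activity uses input entities and generates output entities, and lineage is recovered by walking those links backwards. `Activity`, `ProvenanceLog`, `record`, and `lineage` are hypothetical names for illustration. This is not the DataManagerV2 provenance API, and it omits PROV serialization.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Set


@dataclass
class Activity:
    """One processing step, loosely modelled on a W3C PROV activity (hypothetical)."""
    name: str                  # the service or tool that ran
    used: List[str]            # input entities (modality names), prov:used
    generated: List[str]       # output entities, prov:wasGeneratedBy this activity
    parameters: Dict[str, Any]
    ended_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


class ProvenanceLog:
    """Hypothetical in-memory log; not Lobster's actual provenance class."""

    def __init__(self) -> None:
        self.activities: List[Activity] = []

    def record(self, name: str, used: List[str], generated: List[str],
               **parameters: Any) -> Activity:
        activity = Activity(name, used, generated, parameters)
        self.activities.append(activity)
        return activity

    def lineage(self, entity: str) -> List[str]:
        """Follow generated/used links back from an entity to its sources."""
        steps: List[str] = []
        frontier = [entity]
        seen: Set[str] = set()
        while frontier:
            current = frontier.pop()
            if current in seen:
                continue
            seen.add(current)
            # The most recent activity that generated this entity wins.
            for act in reversed(self.activities):
                if current in act.generated:
                    steps.append(act.name)
                    frontier.extend(act.used)
                    break
        return steps


log = ProvenanceLog()
log.record("load_geo", used=[], generated=["gse194247"], accession="GSE194247")
log.record("filter_cells", used=["gse194247"], generated=["gse194247_filtered"],
           min_genes=200)
print(log.lineage("gse194247_filtered"))  # ['filter_cells', 'load_geo']
```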

2. Agent API (`lobster.agents`)

Specialized AI agents for different analytical domains:

SingleCell Expert: Single-cell RNA-seq analysis with formula-guided differential expression
Bulk RNA-seq Expert: Bulk RNA-seq analysis with pyDESeq2 integration
Proteomics Experts: Mass spectrometry proteomics (DDA/DIA workflows) and affinity proteomics analysis
Data Expert: Data loading, quality assessment, and GEO dataset management
Research Agent: Literature mining and computational parameter extraction
ML Expert: Machine learning transformations and model preparation

3. Services API (`lobster.tools`)

Stateless analysis services implementing scientific algorithms:

Transcriptomics Services: Preprocessing, clustering, differential analysis
Proteomics Services: Missing value handling, normalization, statistical testing
Utility Services: GEO downloading, publication mining, visualization

4. Interface Definitions (`lobster.core.interfaces`)

Abstract base classes defining system contracts:

BaseClient: Client interface for local/cloud consistency
IDataBackend: Storage backend abstraction (H5AD, MuData, cloud)
IModalityAdapter: Data format adapters for different modalities
IValidator: Flexible validation with warnings instead of hard failures
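
The backend abstraction can be illustrated with Python's abc module. In this sketch, `DataBackend`, `InMemoryBackend`, and the `save`/`load`/`exists` methods are hypothetical stand-ins for IDataBackend and its H5AD, MuData, or cloud implementations; the real method names may differ. The point is that calling code such as `persist` depends only on the abstract contract, so the storage format can change without touching it.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class DataBackend(ABC):
    """Hypothetical storage contract in the spirit of IDataBackend."""

    @abstractmethod
    def save(self, key: str, data: Any) -> None: ...

    @abstractmethod
    def load(self, key: str) -> Any: ...

    @abstractmethod
    def exists(self, key: str) -> bool: ...


class InMemoryBackend(DataBackend):
    """Toy backend; a real one might write H5AD or MuData files instead."""

    def __init__(self) -> None:
        self._store: Dict[str, Any] = {}

    def save(self, key: str, data: Any) -> None:
        self._store[key] = data

    def load(self, key: str) -> Any:
        if key not in self._store:
            raise KeyError(f"No data stored under {key!r}")
        return self._store[key]

    def exists(self, key: str) -> bool:
        return key in self._store


def persist(backend: DataBackend, key: str, data: Any) -> None:
    """Caller code depends only on the abstract contract, not the storage format."""
    backend.save(key, data)


backend = InMemoryBackend()
persist(backend, "gse194247", {"n_cells": 3})
print(backend.exists("gse194247"), backend.load("gse194247"))  # True {'n_cells': 3}
```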

5. Configuration API (`lobster.config`)

Agent registry and system configuration:

Agent Registry: Centralized agent configuration and discovery
Model Configuration: Per-agent LLM settings with fallback mechanisms
System Settings: Environment-based configuration management
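
A centralized registry can be as simple as a dictionary keyed by agent name. The sketch below assumes hypothetical `AgentConfig` fields, agent names, and placeholder model IDs; it shows the single-source-of-truth lookup pattern, not Lobster's actual registry module.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical registry entry; field names are illustrative, not Lobster's."""
    name: str
    description: str
    model: str                      # placeholder model ID
    temperature: float = 0.0
    tools: Tuple[str, ...] = ()


# Single source of truth: every agent is looked up here by name.
AGENT_REGISTRY: Dict[str, AgentConfig] = {}


def register_agent(config: AgentConfig) -> None:
    if config.name in AGENT_REGISTRY:
        raise ValueError(f"Agent {config.name!r} is already registered")
    AGENT_REGISTRY[config.name] = config


def get_agent(name: str) -> AgentConfig:
    try:
        return AGENT_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(AGENT_REGISTRY)) or "none"
        raise KeyError(f"Unknown agent {name!r}; registered: {known}") from None


register_agent(AgentConfig("data_expert", "Data loading and QC", model="model-a"))
register_agent(AgentConfig("singlecell_expert", "Single-cell RNA-seq", model="model-b",
                           temperature=0.2))
print(sorted(AGENT_REGISTRY))  # ['data_expert', 'singlecell_expert']
```

Registering each agent once and resolving it by name keeps discovery and configuration in one place; a duplicate or unknown name fails loudly instead of silently picking a default.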

Key Design Principles

Agent-Based Architecture

Specialist Agents: Each agent handles a specific biological domain
Tool Pattern: All agent tools follow the validate → service → store → log pattern
Centralized Registry: Single source of truth for agent configuration
Dynamic Handoffs: Automatic agent-to-agent task routing

Modular Data Management

Multi-Modal Support: Unified handling of transcriptomics, proteomics, and future modalities
Professional Naming: Consistent dataset naming conventions throughout the pipeline
Provenance Tracking: Complete audit trail of all processing operations
Schema Validation: Type-safe data handling with modality-specific requirements
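
As an illustration of a consistent naming convention, the hypothetical helper below builds modality names by appending each processing step to a normalized source identifier. The `<source>_<step>_<step>` scheme is an assumption for this sketch, not necessarily the convention Lobster uses.

```python
import re
from typing import Iterable


def modality_name(source: str, steps: Iterable[str] = ()) -> str:
    """Build a hypothetical descriptive name such as 'gse194247_filtered_normalized'.

    Each part is lower-cased and runs of non-alphanumeric characters become '_';
    processing steps are appended in order, so the name records the pipeline history.
    """
    parts = [source, *steps]
    cleaned = [re.sub(r"[^0-9a-z]+", "_", p.lower()).strip("_") for p in parts]
    return "_".join(p for p in cleaned if p)


print(modality_name("GSE194247"))                              # gse194247
print(modality_name("GSE194247", ["filtered", "normalized"]))  # gse194247_filtered_normalized
```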

Cloud/Local Hybrid Design

Interface Consistency: Same API works for local and cloud execution
Graceful Fallback: Automatic switching between execution modes
Unified CLI: Single command-line interface for all deployment types
Session Management: Consistent state handling across environments
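
The graceful-fallback behavior can be sketched as: try the cloud client when `LOBSTER_CLOUD_KEY` is set, and run locally if the key is absent or the cloud call fails. `LocalClient`, `UnreachableCloudClient`, and `query_with_fallback` are hypothetical stand-ins, not Lobster's `AgentClient` or `CloudLobsterClient`, and only `ConnectionError` is treated as recoverable here.

```python
from typing import Any, Callable, Dict, Mapping, Protocol


class Client(Protocol):
    """Shared interface: the same query() call works for local and cloud clients."""
    def query(self, text: str) -> Dict[str, Any]: ...


class LocalClient:
    """Hypothetical stand-in for a local client."""

    def query(self, text: str) -> Dict[str, Any]:
        return {"success": True, "response": f"local: {text}", "metadata": {"mode": "local"}}


class UnreachableCloudClient:
    """Hypothetical cloud client whose endpoint is down, to exercise the fallback."""

    def __init__(self, api_key: str) -> None:
        self.api_key = api_key

    def query(self, text: str) -> Dict[str, Any]:
        raise ConnectionError("cloud endpoint unreachable")


def query_with_fallback(text: str, env: Mapping[str, str],
                        cloud_factory: Callable[[str], Client],
                        local_factory: Callable[[], Client]) -> Dict[str, Any]:
    """Use cloud mode when LOBSTER_CLOUD_KEY is set; fall back to local on failure."""
    key = env.get("LOBSTER_CLOUD_KEY")
    if key:
        try:
            return cloud_factory(key).query(text)
        except ConnectionError:
            pass  # graceful fallback: continue with local execution
    return local_factory().query(text)


result = query_with_fallback("Run QC", {"LOBSTER_CLOUD_KEY": "demo"},
                             UnreachableCloudClient, LocalClient)
print(result["metadata"]["mode"])  # local
```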

Scientific Rigor

Publication Quality: Analyses and figures target scientific publication standards
Error Handling: Comprehensive validation with actionable error messages
Reproducibility: Complete provenance and parameter tracking
Best Practices: Implementation of current bioinformatics standards

API Conventions

Method Signatures

All service methods follow the stateless pattern:

def analyze_method(adata: anndata.AnnData, **params) -> Tuple[anndata.AnnData, Dict[str, Any]]: ...
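
A toy service that follows this signature is shown below. To keep the sketch dependency-free it uses a hypothetical `ToyAnnData` stand-in instead of `anndata.AnnData`, and `filter_cells` with its `min_counts` threshold is illustrative only. It shows the stateless contract: the input is left untouched, and a new data object plus a statistics dictionary are returned.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class ToyAnnData:
    """Hypothetical stand-in for anndata.AnnData: X[cell][gene] counts plus cell names."""
    X: List[List[float]]
    obs_names: List[str]


def filter_cells(adata: ToyAnnData, min_counts: float = 1.0) -> Tuple[ToyAnnData, Dict[str, Any]]:
    """Hypothetical service: keep cells whose total counts reach min_counts.

    Stateless: the input object is never modified; a new object and a stats dict are returned.
    """
    keep = [i for i, row in enumerate(adata.X) if sum(row) >= min_counts]
    result = ToyAnnData(
        X=[list(adata.X[i]) for i in keep],          # copy rows so the input stays intact
        obs_names=[adata.obs_names[i] for i in keep],
    )
    stats = {
        "n_cells_before": len(adata.obs_names),
        "n_cells_after": len(keep),
        "min_counts": min_counts,
    }
    return result, stats


raw = ToyAnnData(X=[[0, 1, 2], [0, 0, 0], [5, 0, 1]], obs_names=["c1", "c2", "c3"])
filtered, stats = filter_cells(raw, min_counts=1)
print(filtered.obs_names, stats["n_cells_after"])  # ['c1', 'c3'] 2
print(len(raw.obs_names))                          # 3 (input unchanged)
```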

Agent Tool Pattern

All agent tools follow the standard pattern:

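from langchain_core.tools import tool  # assumption: LangChain's @tool decorator

@tool
def agent_tool(modality_name: str, **params) -> str:
    """One-line description of the tool; @tool uses the docstring as its description."""
    # 1. Validate modality exists
    # 2. Call stateless service
    # 3. Store results with descriptive naming
    # 4. Log operation for provenance
    # 5. Return formatted response
    ...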
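
The five steps can be filled in with plain Python to show the control flow. In this sketch the module-level `MODALITIES` dictionary and `PROVENANCE` list stand in for DataManagerV2, and `scale_service` and `scale_tool` are hypothetical; the `@tool` decorator is omitted so the example runs without an agent framework.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical in-memory state standing in for DataManagerV2.
MODALITIES: Dict[str, List[float]] = {"gse194247": [3.0, 0.0, 6.0]}
PROVENANCE: List[Dict[str, Any]] = []


def scale_service(values: List[float], factor: float) -> Tuple[List[float], Dict[str, Any]]:
    """Hypothetical stateless service: returns new data plus statistics."""
    scaled = [v * factor for v in values]
    return scaled, {"n_values": len(scaled), "factor": factor}


def scale_tool(modality_name: str, factor: float = 2.0) -> str:
    # 1. Validate modality exists
    if modality_name not in MODALITIES:
        available = ", ".join(sorted(MODALITIES))
        return f"Modality '{modality_name}' not found. Available: {available}"
    # 2. Call stateless service
    result, stats = scale_service(MODALITIES[modality_name], factor)
    # 3. Store results with descriptive naming
    new_name = f"{modality_name}_scaled"
    MODALITIES[new_name] = result
    # 4. Log operation for provenance
    PROVENANCE.append({"tool": "scale_tool", "input": modality_name,
                       "output": new_name, "params": {"factor": factor}})
    # 5. Return formatted response for the LLM
    return (f"Scaled '{modality_name}' by {factor}; stored as '{new_name}' "
            f"({stats['n_values']} values).")


print(scale_tool("gse194247"))
```

Returning an error string rather than raising in step 1 keeps the response readable for the LLM, which can then pick a valid modality from the listed names.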

Error Handling

Specific Exceptions: Custom exception hierarchy for different error types
Validation Results: Flexible validation with errors, warnings, and info messages
Graceful Degradation: Continue analysis when possible, warn about limitations
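
The error-handling conventions can be sketched as a small exception hierarchy plus a result object that separates errors, warnings, and info messages. `LobsterError`, `ModalityNotFoundError`, `ValidationResult`, `validate_counts`, and the 100-cell warning threshold are assumptions for illustration, not names or values taken from the Lobster codebase.

```python
from dataclasses import dataclass, field
from typing import List


class LobsterError(Exception):
    """Hypothetical root of a custom exception hierarchy."""


class ModalityNotFoundError(LobsterError):
    """Hypothetical: raised when a requested modality is missing."""


@dataclass
class ValidationResult:
    errors: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)
    info: List[str] = field(default_factory=list)

    @property
    def is_valid(self) -> bool:
        # Only errors block analysis; warnings and info are reported but tolerated.
        return not self.errors


def validate_counts(n_cells: int, n_genes: int) -> ValidationResult:
    """Hypothetical check: empty data is an error, few cells only a warning."""
    result = ValidationResult()
    if n_cells == 0 or n_genes == 0:
        result.errors.append("matrix is empty")
    elif n_cells < 100:  # illustrative threshold
        result.warnings.append(f"only {n_cells} cells; clustering may be unstable")
    result.info.append(f"{n_cells} cells x {n_genes} genes")
    return result


check = validate_counts(50, 2000)
print(check.is_valid, check.warnings)  # True ['only 50 cells; clustering may be unstable']
```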

Return Types

Services: Return Tuple[AnnData, Dict] with processed data and statistics
Agent Tools: Return formatted strings for LLM consumption
Clients: Return structured dictionaries with success, response, and metadata
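
The client return shape can be written down as a `TypedDict`. Only the `success`, `response`, and `metadata` keys come from the description above; the `error` key inside `metadata` and the `summarize` helper are hypothetical.

```python
from typing import Any, Dict, TypedDict


class ClientResponse(TypedDict):
    """Assumed shape of a client result, using the three documented fields."""
    success: bool
    response: str
    metadata: Dict[str, Any]


def summarize(result: ClientResponse) -> str:
    """Hypothetical helper: turn a client result into a one-line status."""
    if result["success"]:
        return result["response"]
    error = result["metadata"].get("error", "unknown error")  # 'error' key is an assumption
    return f"Query failed: {error}"


ok: ClientResponse = {"success": True, "response": "QC complete", "metadata": {}}
failed: ClientResponse = {"success": False, "response": "", "metadata": {"error": "timeout"}}
print(summarize(ok))      # QC complete
print(summarize(failed))  # Query failed: timeout
```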

Data Flow Architecture

User Input (CLI/API)
       ↓
BaseClient Implementation
       ↓
Agent System (LangGraph)
       ↓
Agent Tools (@tool decorated)
       ↓
Stateless Services
       ↓
DataManagerV2 (storage)
       ↓
Backends (H5AD/MuData/Cloud)

Authentication & Configuration

Environment Variables

# Required for LLM access
AWS_BEDROCK_ACCESS_KEY=your-aws-key
AWS_BEDROCK_SECRET_ACCESS_KEY=your-aws-secret

# Optional for enhanced features
NCBI_API_KEY=your-ncbi-key
LOBSTER_CLOUD_KEY=your-cloud-key  # Enables cloud mode
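
One way an application could read these variables is shown below: required keys are checked up front, and cloud mode is switched on only when `LOBSTER_CLOUD_KEY` is present. `Settings` and `load_settings` are hypothetical names; this sketches the idea and is not how Lobster's own configuration module is implemented.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

# Variable names taken from the listing above.
REQUIRED = ("AWS_BEDROCK_ACCESS_KEY", "AWS_BEDROCK_SECRET_ACCESS_KEY")


@dataclass
class Settings:
    """Hypothetical settings object holding the optional keys."""
    ncbi_api_key: Optional[str]
    cloud_key: Optional[str]

    @property
    def cloud_mode(self) -> bool:
        return bool(self.cloud_key)


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Fail fast on missing required keys; optional keys default to None."""
    missing: List[str] = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing required environment variables: {', '.join(missing)}")
    return Settings(
        ncbi_api_key=env.get("NCBI_API_KEY"),
        cloud_key=env.get("LOBSTER_CLOUD_KEY"),
    )


demo_env = {"AWS_BEDROCK_ACCESS_KEY": "a", "AWS_BEDROCK_SECRET_ACCESS_KEY": "b"}
print(load_settings(demo_env).cloud_mode)  # False
```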

Model Configuration

Per-Agent Settings: Each agent can use different LLM configurations
Fallback Mechanisms: Automatic fallback to alternative models
Thinking Mode: Support for reasoning-capable models (Claude, GPT-4)
Temperature Control: Sampling temperature tuned per agent type
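
Per-agent settings with a fallback chain might look like the following sketch. `ModelSettings`, `invoke_with_fallback`, the `research_agent` entry, and the placeholder model IDs are all hypothetical, and `fake_call` stands in for a real LLM client; the loop tries each configured model in order and returns the first successful reply.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical per-agent LLM settings; model IDs are placeholders."""
    models: Sequence[str]          # primary first, then fallbacks
    temperature: float = 0.0


def invoke_with_fallback(settings: ModelSettings, prompt: str,
                         call: Callable[[str, str, float], str]) -> str:
    """Try each configured model in order; return the first successful reply."""
    errors: List[str] = []
    for model_id in settings.models:
        try:
            return call(model_id, prompt, settings.temperature)
        except Exception as exc:  # sketch only: real code should catch provider-specific errors
            errors.append(f"{model_id}: {exc}")
    raise RuntimeError("All models failed; " + "; ".join(errors))


AGENT_MODELS: Dict[str, ModelSettings] = {
    "research_agent": ModelSettings(models=("primary-model", "backup-model"), temperature=0.7),
}


def fake_call(model_id: str, prompt: str, temperature: float) -> str:
    # Hypothetical stand-in for an LLM client: the primary model is "unavailable".
    if model_id == "primary-model":
        raise TimeoutError("throttled")
    return f"[{model_id} @ T={temperature}] {prompt}"


print(invoke_with_fallback(AGENT_MODELS["research_agent"], "Summarize GSE194247", fake_call))
# [backup-model @ T=0.7] Summarize GSE194247
```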

Getting Started

Basic Client Usage

from lobster.core.client import AgentClient
from lobster.core.data_manager_v2 import DataManagerV2

# Initialize data manager
data_manager = DataManagerV2(workspace_path="./my_workspace")

# Create client
client = AgentClient(data_manager=data_manager)

# Query the system
result = client.query("Load GSE194247 and perform quality assessment")

Cloud Client Usage

import os

# Set environment variable
os.environ['LOBSTER_CLOUD_KEY'] = 'your-key'

# Import cloud client (external package)
from lobster_cloud.client import CloudLobsterClient

client = CloudLobsterClient()
result = client.query("Analyze my single-cell data")

CLI Usage

# Interactive mode with autocomplete
lobster chat

# Direct commands
lobster --help

API Documentation Structure

This API reference is organized as follows:

Core API Reference: DataManagerV2, clients, provenance
Agents API Reference: All agent tools and capabilities
Services API Reference: Analysis services and algorithms
Interfaces API Reference: Abstract base classes and contracts

Each section provides:

Class and method signatures with type hints
Parameter descriptions and expected types
Return value specifications
Usage examples and common patterns
Error conditions and exception handling
Integration notes for cloud/local environments

Version Compatibility

This API documentation reflects the current version of Lobster AI. The system maintains:

Backward Compatibility: Existing agent tools remain functional
Interface Stability: Core interfaces follow semantic versioning
Deprecation Warnings: Clear migration paths for deprecated features
Cloud Synchronization: API compatibility between local and cloud versions
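
Deprecation warnings that include a migration hint are commonly implemented with Python's warnings module. The decorator below is a generic sketch; `deprecated`, `load_dataset`, and `load_modality` are hypothetical names, not Lobster APIs.

```python
import functools
import warnings
from typing import Any, Callable, TypeVar

F = TypeVar("F", bound=Callable[..., Any])


def deprecated(replacement: str) -> Callable[[F], F]:
    """Hypothetical decorator: keep the old function working but name its replacement."""
    def decorator(func: F) -> F:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> Any:
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller, not this wrapper
            )
            return func(*args, **kwargs)
        return wrapper  # type: ignore[return-value]
    return decorator


@deprecated(replacement="load_modality")
def load_dataset(name: str) -> str:
    """Hypothetical old entry point kept for backward compatibility."""
    return f"loaded {name}"
```

Calling `load_dataset("gse194247")` still returns its result, so existing code keeps working, while the emitted DeprecationWarning tells users which function to migrate to.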