Testing Guide - Lobster AI Testing Framework
Overview
This guide provides comprehensive documentation for the Lobster AI testing framework, targeting 95%+ code coverage with scientifically accurate testing scenarios. The testing infrastructure covers unit tests, integration tests, system tests, and performance benchmarks across all bioinformatics workflows.
Testing Architecture
Test Categories
1. Unit Tests (tests/unit/)
- Purpose: Test individual functions, classes, and methods in isolation
- Duration: ~2 minutes for full suite
- Coverage: Individual components (services, agents, utilities)
- Execution:
pytest tests/unit/
2. Integration Tests (tests/integration/)
- Purpose: Test component interactions and workflows
- Duration: ~15 minutes for full suite
- Coverage: Agent-service integration, data flow validation
- Execution:
pytest tests/integration/
3. System Tests (tests/system/)
- Purpose: Test complete end-to-end workflows
- Duration: ~30 minutes for full suite
- Coverage: Full analysis pipelines, CLI interactions
- Execution:
pytest tests/system/
4. Performance Tests (tests/performance/)
- Purpose: Benchmark performance and memory usage
- Duration: ~45 minutes for full suite
- Coverage: Large dataset handling, algorithmic efficiency
- Execution:
pytest tests/performance/
Directory Structure
tests/
├── unit/                          # Unit tests (20+ files)
│   ├── core/                      # Core system components
│   │   ├── test_data_manager_v2.py
│   │   ├── test_client.py
│   │   └── test_adapters.py
│   ├── agents/                    # AI agent functionality
│   │   ├── test_data_expert.py
│   │   └── test_transcriptomics_expert.py
│   ├── services/                  # Analysis services (7+ files)
│   │   ├── test_quality_service.py
│   │   ├── test_clustering_service.py
│   │   └── test_differential_service.py
│   └── tools/                     # Analysis tools (12+ files)
├── integration/                   # Integration tests (5 files)
│   ├── test_agent_workflows.py
│   ├── test_data_pipelines.py
│   └── test_service_chains.py
├── system/                        # System tests (3 files)
│   ├── test_end_to_end.py
│   ├── test_cli_commands.py
│   └── test_multi_modal.py
├── performance/                   # Performance tests (3 files)
├── mock_data/                     # Synthetic data generation
│   ├── generators.py              # High-level data generators
│   ├── factories.py               # Data factory classes
│   └── base.py                    # Base configurations
├── conftest.py                    # Global fixtures and configuration
└── README.md                      # Testing documentation
Test Configuration
Pytest Configuration (pytest.ini)
[pytest]
minversion = 6.0
addopts =
--strict-markers
--disable-warnings
--verbose
--tb=short
--cov=lobster
--cov-report=html:htmlcov
--cov-report=xml:coverage.xml
--cov-report=term-missing
--cov-fail-under=80
--durations=10
testpaths = tests
markers =
unit: mark test as a unit test
integration: mark test as an integration test
system: mark test as a system test
performance: mark test as a performance benchmark
slow: mark test as slow running
requires_gpu: mark test as requiring GPU
requires_network: mark test as requiring network access
filterwarnings =
ignore::DeprecationWarning
ignore::PendingDeprecationWarning
ignore::FutureWarning:scanpy
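The markers registered above are attached to tests with @pytest.mark decorators and selected at the command line with -m expressions. A minimal illustration (the file name and test bodies are hypothetical placeholders):
# Illustrative marker usage; test bodies are placeholders
import pytest

@pytest.mark.unit
def test_fast_helper():
    # Selected by: pytest -m "unit and not slow"
    assert 1 + 1 == 2

@pytest.mark.performance
@pytest.mark.slow
def test_expensive_benchmark():
    # Deselected by: pytest -m "not slow"; selected by: pytest -m performance
    assert sum(range(1000)) == 499500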
Global Test Fixtures (conftest.py)
# Core fixtures available to all tests
import pytest
import tempfile
from pathlib import Path
from unittest.mock import Mock
from lobster.core.data_manager_v2 import DataManagerV2
from lobster.core.client import AgentClient
from tests.mock_data.generators import (
generate_synthetic_single_cell,
generate_synthetic_bulk_rnaseq,
generate_synthetic_proteomics
)
@pytest.fixture
def temp_workspace():
"""Create temporary workspace for testing."""
with tempfile.TemporaryDirectory(prefix="lobster_test_") as tmpdir:
workspace_path = Path(tmpdir)
yield workspace_path
@pytest.fixture
def mock_data_manager(temp_workspace):
"""Create DataManagerV2 instance with temporary workspace."""
return DataManagerV2(workspace_path=temp_workspace)
@pytest.fixture
def sample_single_cell_data():
"""Generate synthetic single-cell data for testing."""
return generate_synthetic_single_cell(
n_cells=100,
n_genes=50,
n_cell_types=3
)
@pytest.fixture
def sample_bulk_data():
"""Generate synthetic bulk RNA-seq data."""
return generate_synthetic_bulk_rnaseq(
n_samples=12,
n_genes=100
)
@pytest.fixture
def sample_proteomics_data():
"""Generate synthetic proteomics data."""
return generate_synthetic_proteomics(
n_samples=20,
n_proteins=80,
missing_rate=0.3
)
@pytest.fixture
def mock_agent_client(mock_data_manager):
"""Create mock AgentClient for testing."""
client = Mock(spec=AgentClient)
client.data_manager = mock_data_manager
return client
@pytest.fixture(scope="session")
def test_config():
"""Test configuration parameters."""
return {
'timeout': 300,
'max_memory': '2GB',
'test_data_size': 'medium',
'enable_gpu': False
}
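Any test can request these fixtures by listing them as parameters; pytest injects them by name. A short smoke test illustrating their use (the shape assertions assume the generator returns exactly the requested dimensions):
# Illustrative tests consuming the global fixtures
def test_sample_single_cell_fixture(sample_single_cell_data, test_config):
    """Fixtures are injected by matching parameter names."""
    adata = sample_single_cell_data
    assert adata.n_obs == 100   # n_cells requested by the fixture
    assert adata.n_vars == 50   # n_genes requested by the fixture
    assert test_config['timeout'] == 300

def test_workspace_isolation(temp_workspace, mock_data_manager):
    """Each test receives a fresh temporary workspace."""
    probe = temp_workspace / "probe.txt"
    probe.write_text("ok")
    assert probe.exists()
    assert mock_data_manager is not None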
Writing Unit Tests
Service Unit Test Template
# tests/unit/tools/test_your_service.py
import pytest
import numpy as np
import pandas as pd
from unittest.mock import Mock, patch, MagicMock
from lobster.tools.your_service import YourService, YourServiceError
from tests.mock_data.generators import generate_synthetic_single_cell
class TestYourService:
"""Comprehensive unit tests for YourService."""
@pytest.fixture
def service(self):
"""Create service instance for testing."""
return YourService()
@pytest.fixture
def mock_adata(self):
"""Create mock AnnData for testing."""
return generate_synthetic_single_cell(n_cells=50, n_genes=30)
def test_service_initialization(self, service):
"""Test service initializes correctly."""
assert service is not None
assert service.progress_callback is None
assert service.current_progress == 0
assert hasattr(service, 'total_steps')
def test_progress_callback_setting(self, service):
"""Test progress callback functionality."""
callback_calls = []
def mock_callback(progress, message):
callback_calls.append((progress, message))
service.set_progress_callback(mock_callback)
assert service.progress_callback is not None
# Test progress update
service.total_steps = 2
service._update_progress("Test step")
assert len(callback_calls) == 1
assert callback_calls[0][0] == 50 # 50% progress
assert "Test step" in callback_calls[0][1]
def test_main_analysis_success(self, service, mock_adata):
"""Test successful analysis execution."""
# Test with default parameters
result_adata, statistics = service.main_analysis_method(
mock_adata,
parameter1=1.0,
parameter2="default"
)
# Validate results structure
assert result_adata is not None
assert isinstance(statistics, dict)
# Validate AnnData structure preservation
assert result_adata.n_obs == mock_adata.n_obs
assert result_adata.n_vars == mock_adata.n_vars
# Validate statistics content
required_stats = ['n_observations', 'n_features', 'analysis_timestamp']
for stat in required_stats:
assert stat in statistics
# Validate analysis metadata stored
assert 'your_analysis' in result_adata.uns
assert result_adata.uns['your_analysis']['method'] == 'default'
def test_parameter_validation(self, service, mock_adata):
"""Test comprehensive parameter validation."""
# Test invalid parameter1 (should be positive)
with pytest.raises(ValueError, match="Parameter1 must be positive"):
service.main_analysis_method(mock_adata, parameter1=-1.0)
with pytest.raises(ValueError, match="Parameter1 must be positive"):
service.main_analysis_method(mock_adata, parameter1=0.0)
# Test invalid parameter2 (should be from allowed options)
with pytest.raises(ValueError, match="Invalid parameter2"):
service.main_analysis_method(mock_adata, parameter2="invalid_option")
# Test empty parameter3 list
with pytest.raises(ValueError, match="Parameter3 cannot be empty list"):
service.main_analysis_method(mock_adata, parameter3=[])
def test_empty_data_handling(self, service):
"""Test handling of edge cases in data."""
# Empty observations
empty_obs_adata = generate_synthetic_single_cell(n_cells=0, n_genes=10)
with pytest.raises(ValueError, match="Input data is empty"):
service.main_analysis_method(empty_obs_adata)
# Empty features
empty_vars_adata = generate_synthetic_single_cell(n_cells=10, n_genes=0)
with pytest.raises(ValueError, match="Input data has no features"):
service.main_analysis_method(empty_vars_adata)
def test_statistical_accuracy(self, service, mock_adata):
"""Test statistical calculations are mathematically correct."""
result_adata, statistics = service.main_analysis_method(mock_adata)
# Verify basic statistics match input data
assert statistics['n_observations'] == mock_adata.n_obs
assert statistics['n_features'] == mock_adata.n_vars
# Test statistical calculations if applicable
if 'mean_expression' in statistics:
expected_mean = np.mean(mock_adata.X)
np.testing.assert_almost_equal(
statistics['mean_expression'],
expected_mean,
decimal=5
)
def test_error_propagation(self, service, mock_adata, monkeypatch):
"""Test error handling and propagation."""
# Mock internal method to raise exception
def mock_preprocess_error(*args, **kwargs):
raise RuntimeError("Preprocessing failed")
monkeypatch.setattr(service, '_preprocess_data', mock_preprocess_error)
# Should wrap in service-specific error
with pytest.raises(YourServiceError, match="Unexpected error"):
service.main_analysis_method(mock_adata)
    def test_reproducibility(self, service, mock_adata):
        """Test that analyses are reproducible."""
        # Run the same analysis twice with identical parameters
        result1_adata, stats1 = service.main_analysis_method(
            mock_adata.copy(), parameter1=1.5, parameter2="option1"
        )
        result2_adata, stats2 = service.main_analysis_method(
            mock_adata.copy(), parameter1=1.5, parameter2="option1"
        )
        # Results should be deterministic for identical inputs
        np.testing.assert_array_almost_equal(result1_adata.X, result2_adata.X)
        assert stats1['n_observations'] == stats2['n_observations']
        assert stats1['n_features'] == stats2['n_features']
@pytest.mark.parametrize("param1,param2,expected_error", [
(-1.0, "default", ValueError),
(1.0, "invalid", ValueError),
(1.0, "default", None), # Should succeed
])
def test_parameter_combinations(self, service, mock_adata, param1, param2, expected_error):
"""Test various parameter combinations."""
if expected_error:
with pytest.raises(expected_error):
service.main_analysis_method(mock_adata, parameter1=param1, parameter2=param2)
else:
result_adata, statistics = service.main_analysis_method(
mock_adata, parameter1=param1, parameter2=param2
)
assert result_adata is not None
assert isinstance(statistics, dict)
def test_memory_efficiency(self, service):
"""Test memory usage doesn't grow excessively."""
import psutil
import os
process = psutil.Process(os.getpid())
initial_memory = process.memory_info().rss
# Run analysis on moderately sized data
large_adata = generate_synthetic_single_cell(n_cells=1000, n_genes=500)
result_adata, statistics = service.main_analysis_method(large_adata)
final_memory = process.memory_info().rss
memory_growth = final_memory - initial_memory
# Memory growth should be reasonable (less than 500MB for this test)
assert memory_growth < 500 * 1024 * 1024 # 500MB threshold
def test_concurrent_usage(self, service, mock_adata):
"""Test service can be used concurrently (stateless requirement)."""
import threading
results = []
errors = []
def run_analysis(data):
try:
result = service.main_analysis_method(data.copy())
results.append(result)
except Exception as e:
errors.append(e)
# Run multiple analyses concurrently
threads = []
for i in range(3):
thread = threading.Thread(target=run_analysis, args=(mock_adata,))
threads.append(thread)
thread.start()
# Wait for all to complete
for thread in threads:
thread.join()
# Should have no errors and expected number of results
assert len(errors) == 0
assert len(results) == 3
Agent Unit Test Template
# tests/unit/agents/test_your_agent.py
import pytest
from unittest.mock import Mock, patch, MagicMock
from lobster.agents.your_agent import your_agent_factory
from lobster.core.data_manager_v2 import DataManagerV2
class TestYourAgent:
"""Unit tests for YourAgent."""
@pytest.fixture
def mock_data_manager(self):
"""Create mock DataManagerV2."""
mock_dm = Mock(spec=DataManagerV2)
mock_dm.list_modalities.return_value = []
mock_dm.get_modality.return_value = None
mock_dm.log_tool_usage = Mock()
return mock_dm
@pytest.fixture
def agent(self, mock_data_manager):
"""Create agent instance for testing."""
with patch('lobster.agents.your_agent.get_settings') as mock_settings:
mock_settings.return_value.get_agent_llm_params.return_value = {
'model': 'test-model',
'temperature': 0.1
}
return your_agent_factory(mock_data_manager)
def test_agent_creation(self, mock_data_manager):
"""Test agent factory creates agent successfully."""
with patch('lobster.agents.your_agent.get_settings') as mock_settings:
mock_settings.return_value.get_agent_llm_params.return_value = {}
agent = your_agent_factory(mock_data_manager)
assert agent is not None
def test_agent_tools_available(self, agent):
"""Test that agent has expected tools."""
# Agent tools are typically accessible via agent.get_graph().nodes
# Implementation depends on LangGraph structure
# This is a conceptual test - actual implementation may vary
tools = getattr(agent, 'tools', [])
tool_names = [tool.name for tool in tools if hasattr(tool, 'name')]
expected_tools = ['check_available_modalities', 'perform_domain_analysis']
        if tool_names:  # Only assert when tools are discoverable on this agent object
            for expected_tool in expected_tools:
                assert expected_tool in tool_names
@patch('lobster.agents.your_agent.YourService')
def test_tool_service_integration(self, mock_service_class, agent, mock_data_manager):
"""Test that agent tools properly integrate with services."""
# Setup mock service
mock_service = Mock()
mock_service.perform_analysis.return_value = (Mock(), {'metric': 1.0})
mock_service_class.return_value = mock_service
# Setup data manager
mock_adata = Mock()
mock_data_manager.list_modalities.return_value = ['test_data']
mock_data_manager.get_modality.return_value = mock_adata
# This test would need to be implemented based on how tools are exposed
# in the actual LangGraph agent structure
Writing Integration Tests
Agent-Service Integration Template
# tests/integration/test_agent_service_integration.py
import pytest
from lobster.core.client import AgentClient
from lobster.core.data_manager_v2 import DataManagerV2
from tests.mock_data.generators import generate_synthetic_single_cell
class TestAgentServiceIntegration:
"""Integration tests for agent-service workflows."""
@pytest.fixture
def client_with_data(self, temp_workspace, sample_single_cell_data):
"""Create client with loaded data."""
data_manager = DataManagerV2(workspace_path=temp_workspace)
data_manager.modalities['test_data'] = sample_single_cell_data
client = AgentClient(data_manager=data_manager)
return client
def test_quality_assessment_workflow(self, client_with_data):
"""Test complete quality assessment workflow."""
# Request quality assessment
response = client_with_data.query(
"Assess the quality of test_data modality"
)
# Validate response structure
assert response['success'] is True
assert 'quality_assessed' in response['response']
# Check that new modality was created
modalities = client_with_data.data_manager.list_modalities()
assert any('quality_assessed' in mod for mod in modalities)
def test_clustering_analysis_workflow(self, client_with_data):
"""Test complete clustering workflow."""
# First, assess quality
quality_response = client_with_data.query(
"Assess quality of test_data"
)
assert quality_response['success']
# Then perform clustering
cluster_response = client_with_data.query(
"Perform clustering analysis on the quality-assessed data"
)
assert cluster_response['success']
assert 'clustered' in cluster_response['response']
def test_multi_agent_handoff(self, client_with_data):
"""Test handoffs between different agents."""
# Start with data expert
data_response = client_with_data.query(
"Load and prepare the test data for transcriptomics analysis"
)
assert data_response['success']
# Should handoff to transcriptomics expert
analysis_response = client_with_data.query(
"Now perform clustering and find marker genes"
)
assert analysis_response['success']
def test_error_handling_integration(self, client_with_data):
"""Test error handling across agent-service boundaries."""
# Request analysis on non-existent data
error_response = client_with_data.query(
"Analyze the modality called 'nonexistent_data'"
)
# Should handle error gracefully
assert error_response['success'] is False
assert 'not found' in error_response['response'].lower()
Writing System Tests
End-to-End Test Template
# tests/system/test_end_to_end.py
import pytest
from pathlib import Path
import tempfile
from lobster.core.client import AgentClient
from lobster.core.data_manager_v2 import DataManagerV2
class TestEndToEndWorkflows:
"""System tests for complete analysis workflows."""
@pytest.fixture
def full_client_setup(self):
"""Set up complete client with real configuration."""
with tempfile.TemporaryDirectory() as tmpdir:
workspace = Path(tmpdir)
data_manager = DataManagerV2(workspace_path=workspace)
client = AgentClient(data_manager=data_manager)
yield client
def test_complete_single_cell_pipeline(self, full_client_setup):
"""Test complete single-cell analysis pipeline."""
client = full_client_setup
# Step 1: Load synthetic data
response1 = client.query(
"Generate synthetic single-cell data with 500 cells and 100 genes"
)
assert response1['success']
# Step 2: Quality assessment
response2 = client.query(
"Assess the quality of the generated data"
)
assert response2['success']
# Step 3: Preprocessing
response3 = client.query(
"Filter and normalize the data based on quality metrics"
)
assert response3['success']
# Step 4: Clustering
response4 = client.query(
"Perform clustering analysis and generate UMAP visualization"
)
assert response4['success']
# Step 5: Marker gene analysis
response5 = client.query(
"Find marker genes for each cluster"
)
assert response5['success']
# Validate final state
modalities = client.data_manager.list_modalities()
expected_stages = ['generated', 'quality_assessed', 'normalized', 'clustered']
for stage in expected_stages:
assert any(stage in mod for mod in modalities)
def test_bulk_rnaseq_differential_expression(self, full_client_setup):
"""Test bulk RNA-seq differential expression workflow."""
client = full_client_setup
# Generate bulk RNA-seq data with conditions
response1 = client.query(
"Generate bulk RNA-seq data with treatment and control conditions"
)
assert response1['success']
# Perform differential expression
response2 = client.query(
"Perform differential expression analysis between conditions"
)
assert response2['success']
# Validate results contain expected elements
assert 'differential' in response2['response']
assert any('differential' in mod for mod in client.data_manager.list_modalities())
@pytest.mark.slow
def test_large_dataset_handling(self, full_client_setup):
"""Test system performance with larger datasets."""
client = full_client_setup
# Generate larger dataset
response = client.query(
"Generate single-cell data with 10000 cells and 2000 genes"
)
assert response['success']
# Perform computationally intensive analysis
cluster_response = client.query(
"Perform clustering with high resolution and generate comprehensive visualizations"
)
assert cluster_response['success']
Performance Testing
Performance Test Template
# tests/performance/test_service_performance.py
import pytest
import time
import psutil
import os
from pathlib import Path
from lobster.tools.clustering_service import ClusteringService
from tests.mock_data.generators import generate_synthetic_single_cell
class TestServicePerformance:
"""Performance benchmarks for services."""
@pytest.mark.performance
def test_clustering_performance_small(self, benchmark):
"""Benchmark clustering on small dataset."""
service = ClusteringService()
adata = generate_synthetic_single_cell(n_cells=1000, n_genes=500)
def run_clustering():
return service.cluster_and_visualize(adata)
result = benchmark(run_clustering)
# Validate performance
assert benchmark.stats.stats.mean < 30.0 # Should complete in <30 seconds
@pytest.mark.performance
@pytest.mark.slow
def test_clustering_performance_large(self, benchmark):
"""Benchmark clustering on larger dataset."""
service = ClusteringService()
adata = generate_synthetic_single_cell(n_cells=10000, n_genes=2000)
def run_clustering():
return service.cluster_and_visualize(adata, demo_mode=True)
result = benchmark(run_clustering)
# Should still complete within reasonable time in demo mode
assert benchmark.stats.stats.mean < 120.0 # 2 minutes max
def test_memory_usage_monitoring(self):
"""Monitor memory usage during analysis."""
service = ClusteringService()
process = psutil.Process(os.getpid())
initial_memory = process.memory_info().rss
# Run analysis
adata = generate_synthetic_single_cell(n_cells=5000, n_genes=1000)
result_adata, stats = service.cluster_and_visualize(adata)
peak_memory = process.memory_info().rss
memory_growth = peak_memory - initial_memory
# Memory growth should be reasonable (less than 2GB)
assert memory_growth < 2 * 1024**3
# Clean up and check memory returns close to initial
del adata, result_adata
import gc
gc.collect()
final_memory = process.memory_info().rss
memory_leak = final_memory - initial_memory
# Should not have significant memory leaks (less than 100MB)
assert memory_leak < 100 * 1024**2
Mock Data Generation
Synthetic Data Generators
# tests/mock_data/custom_generators.py
"""Custom generators for specific test scenarios."""
import numpy as np
import pandas as pd
import anndata as ad
from typing import Optional, Dict, Any
def generate_realistic_single_cell(
n_cells: int = 1000,
n_genes: int = 500,
cell_types: Optional[list] = None,
batch_effects: bool = False,
doublet_rate: float = 0.05
) -> ad.AnnData:
"""
Generate realistic single-cell data with biological features.
Args:
n_cells: Number of cells
n_genes: Number of genes
cell_types: List of cell type names
batch_effects: Whether to include batch effects
doublet_rate: Proportion of doublet cells
Returns:
AnnData with realistic single-cell features
"""
if cell_types is None:
cell_types = ['T_cells', 'B_cells', 'NK_cells', 'Monocytes']
# Generate base expression matrix with biological structure
np.random.seed(42)
# Create cell type-specific expression patterns
n_cell_types = len(cell_types)
cells_per_type = n_cells // n_cell_types
X = np.zeros((n_cells, n_genes))
cell_type_labels = []
for i, cell_type in enumerate(cell_types):
start_idx = i * cells_per_type
end_idx = start_idx + cells_per_type if i < n_cell_types - 1 else n_cells
# Base expression for this cell type
        # Cast to float so the marker-gene scaling below does not fail on in-place int/float casting
        base_expression = np.random.negative_binomial(
            10, 0.3, size=(end_idx - start_idx, n_genes)
        ).astype(np.float32)
# Add cell type-specific marker genes
marker_genes = slice(i * 50, (i + 1) * 50) # 50 markers per type
base_expression[:, marker_genes] *= np.random.uniform(2, 5, size=(end_idx - start_idx, 50))
X[start_idx:end_idx, :] = base_expression
cell_type_labels.extend([cell_type] * (end_idx - start_idx))
# Create AnnData object
adata = ad.AnnData(X=X.astype(np.float32))
# Add cell metadata
adata.obs['cell_type'] = cell_type_labels
adata.obs['n_genes'] = (adata.X > 0).sum(axis=1)
adata.obs['total_counts'] = adata.X.sum(axis=1)
# Add mitochondrial genes
mt_genes = np.random.choice(n_genes, size=int(0.05 * n_genes), replace=False)
adata.var['mt'] = False
adata.var.iloc[mt_genes, adata.var.columns.get_loc('mt')] = True
# Calculate mitochondrial percentage
adata.obs['pct_counts_mt'] = (
adata[:, adata.var['mt']].X.sum(axis=1) / adata.obs['total_counts'] * 100
)
# Add batch effects if requested
if batch_effects:
n_batches = 3
batch_assignments = np.random.choice(n_batches, size=n_cells)
adata.obs['batch'] = [f'batch_{i}' for i in batch_assignments]
# Apply batch-specific scaling
for batch_id in range(n_batches):
batch_mask = batch_assignments == batch_id
batch_effect = np.random.uniform(0.8, 1.2)
adata.X[batch_mask, :] *= batch_effect
# Add doublets if requested
if doublet_rate > 0:
n_doublets = int(n_cells * doublet_rate)
doublet_indices = np.random.choice(n_cells, size=n_doublets, replace=False)
adata.obs['is_doublet'] = False
adata.obs.iloc[doublet_indices, adata.obs.columns.get_loc('is_doublet')] = True
# Doublets have higher total counts
adata.X[doublet_indices, :] *= np.random.uniform(1.5, 2.0, size=(n_doublets, 1))
# Add gene metadata
gene_names = [f'Gene_{i:04d}' for i in range(n_genes)]
adata.var.index = gene_names
adata.var['highly_variable'] = False
# Mark some genes as highly variable
hv_genes = np.random.choice(n_genes, size=int(0.2 * n_genes), replace=False)
adata.var.iloc[hv_genes, adata.var.columns.get_loc('highly_variable')] = True
return adata
def generate_differential_expression_data(
n_samples_per_group: int = 6,
n_genes: int = 1000,
n_de_genes: int = 100,
effect_size: float = 2.0
) -> ad.AnnData:
"""Generate bulk RNA-seq data with known differential expression."""
n_samples = n_samples_per_group * 2
# Generate base counts
    # Cast to float so the effect-size scaling below does not fail on in-place int/float casting
    X = np.random.negative_binomial(20, 0.3, size=(n_samples, n_genes)).astype(np.float32)
# Add differential expression
de_gene_indices = np.random.choice(n_genes, size=n_de_genes, replace=False)
# Treatment group gets higher expression for DE genes
treatment_samples = slice(n_samples_per_group, n_samples)
X[treatment_samples, :][:, de_gene_indices] *= effect_size
# Create AnnData
adata = ad.AnnData(X=X.astype(np.float32))
# Add sample metadata
conditions = ['control'] * n_samples_per_group + ['treatment'] * n_samples_per_group
adata.obs['condition'] = conditions
adata.obs['sample_id'] = [f'sample_{i:02d}' for i in range(n_samples)]
# Add gene metadata
gene_names = [f'Gene_{i:04d}' for i in range(n_genes)]
adata.var.index = gene_names
adata.var['is_de'] = False
adata.var.iloc[de_gene_indices, adata.var.columns.get_loc('is_de')] = True
return adata
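A short usage sketch for these generators: because the ground truth is stored in .obs and .var, downstream tests can score analysis results against it. The assertions assume the default cell types and the exact flag counts produced above.
# Usage sketch for the custom generators defined above
adata_sc = generate_realistic_single_cell(
    n_cells=400, n_genes=300, batch_effects=True, doublet_rate=0.05
)
assert set(adata_sc.obs['cell_type']) == {'T_cells', 'B_cells', 'NK_cells', 'Monocytes'}
assert adata_sc.obs['pct_counts_mt'].max() <= 100

adata_de = generate_differential_expression_data(n_samples_per_group=6, n_de_genes=100)
# Known DE genes are flagged, so a differential expression service's hit list can be scored against them
assert int(adata_de.var['is_de'].sum()) == 100
assert (adata_de.obs['condition'] == 'treatment').sum() == 6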
Running Tests
Basic Test Execution
# Run all tests
make test
# Run specific test categories
pytest tests/unit/ # Unit tests only
pytest tests/integration/ # Integration tests only
pytest tests/system/ # System tests only
pytest tests/performance/ # Performance tests only
# Run tests with specific markers
pytest -m "unit and not slow" # Fast unit tests only
pytest -m "integration" # Integration tests
pytest -m "performance" # Performance benchmarks
# Run specific test files
pytest tests/unit/test_clustering_service.py
pytest tests/integration/test_agent_workflows.py
# Run with coverage reporting
pytest --cov=lobster --cov-report=html
# Run with performance benchmarks
pytest --benchmark-only # Only benchmark tests
pytest --benchmark-compare # Compare with previous runs
Advanced Test Options
# Run tests in parallel (requires pytest-xdist)
pytest -n auto # Auto-detect CPU cores
pytest -n 4 # Use 4 processes
# Run with specific output formats
pytest --tb=long # Long traceback format
pytest --tb=short # Short traceback format
pytest -v # Verbose output
pytest -s # Don't capture stdout
# Run tests matching pattern
pytest -k "test_clustering" # Tests with 'clustering' in name
pytest -k "not slow" # Skip slow tests
# Run with timeout (requires pytest-timeout)
pytest --timeout=300 # 5 minute timeout per test
# Generate test reports
pytest --html=reports/report.html # HTML report
pytest --junitxml=reports/junit.xml # JUnit XML report
End-to-End Acceptance Testing System
Lobster AI uses a three-tier testing architecture that includes an End-to-End acceptance testing system for validating complete user workflows.
Testing Architecture Overview
- Traditional pytest Framework - Unit/Integration/System/Performance tests
- End-to-End Acceptance Testing - Natural language workflow validation
- Hybrid Execution - Combined reporting and analytics
E2E Testing Components
Core Files
- tests/test_cases.json - 30+ realistic user scenarios with validation criteria
- tests/run_integration_tests.py - Advanced test runner with performance monitoring
- tests/run_tests.sh - User-friendly bash wrapper for easy execution
Key Features
- Natural Language Testing: Validates actual conversational interface users experience
- Performance Monitoring: Real-time CPU, memory, disk I/O tracking during execution
- Scalable Architecture: Tag-based filtering, priorities, parallel execution
- Response Validation: Keyword matching, length checks, error detection
- Workspace Management: Isolated test environments with automatic cleanup
Quick Start Commands
# User-friendly bash wrapper (recommended)
./tests/run_tests.sh # Run all scenarios sequentially
./tests/run_tests.sh --parallel # Run in parallel
./tests/run_tests.sh --parallel -w 8 # 8 parallel workers
# Advanced Python runner with full control
python tests/run_integration_tests.py --categories basic,advanced --parallel
python tests/run_integration_tests.py --performance-monitoring --workers 4
python tests/run_integration_tests.py --run-pytest-integration --output results.json
Test Categorization & Filtering
# Filter by categories
python tests/run_integration_tests.py --categories basic,advanced,performance,error_handling
# Filter by biological domains
python tests/run_integration_tests.py --tags geo,proteomics,multiomics,spatial,qc
# Filter by priority levels (1-5)
python tests/run_integration_tests.py --priorities 1,2,3
# Combine filters for targeted testing
python tests/run_integration_tests.py --categories advanced --tags geo,qc --parallel
Test Scenarios (30+ Realistic Workflows)
Categories:
- basic - Simple workflows (GEO download, basic QC)
- advanced - Complex analysis (multi-omics, trajectory analysis)
- performance - Large dataset processing
- error_handling - Edge cases and error recovery
Biological Domain Tags:
- geo - GEO dataset workflows
- qc - Quality control processes
- visualization - Plotting and visual analysis
- multiomics - Cross-platform integration
- spatial - Spatial transcriptomics
- proteomics - Mass spec and affinity proteomics
- clustering - Cell/sample grouping analysis
Example Test Scenarios:
{
"test_geo_download_with_qc_umap": {
"inputs": [
"Download GEO dataset GSE291670 and do the quality control",
"Generate the UMAP with resolution 0.7"
],
"category": "basic",
"description": "Test complete workflow from download to UMAP visualization",
"expected_duration": 120.0,
"timeout": 400.0,
"tags": ["geo", "qc", "umap", "visualization"],
"priority": 3,
"validation_criteria": {
"input_0": {
"required_keywords": ["quality control", "downloaded"],
"no_errors": true
},
"input_1": {
"required_keywords": ["UMAP", "resolution"],
"no_errors": true
}
}
}
}
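Conceptually, the runner replays each scenario's inputs through the conversational interface in order and checks every response against the matching validation_criteria entry. A simplified sketch of that loop (illustrative only; the real runner in tests/run_integration_tests.py adds timeouts, retries, parallel workers, and performance monitoring):
# Illustrative sketch of scenario execution, not the actual runner implementation
import json

def run_scenario(client, scenario: dict) -> bool:
    """Replay a scenario's natural-language inputs and check each response."""
    for i, prompt in enumerate(scenario["inputs"]):
        response = client.query(prompt)
        criteria = scenario.get("validation_criteria", {}).get(f"input_{i}", {})
        text = response.get("response", "")
        # Minimal keyword check; a fuller validation sketch appears later in this guide
        if not all(kw.lower() in text.lower() for kw in criteria.get("required_keywords", [])):
            return False
    return True

with open("tests/test_cases.json") as fh:
    cases = json.load(fh)
# run_scenario(client, cases["test_geo_download_with_qc_umap"])  # client: an AgentClient instance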
Performance Monitoring
The E2E system includes comprehensive performance monitoring:
# Enable performance monitoring
python tests/run_integration_tests.py --performance-monitoring
# Features monitored:
# - CPU usage percentage (average and peak)
# - Memory consumption (RSS, peak usage)
# - Disk I/O operations (read/write MB)
# - Network activity (sent/received MB)
# - Test execution duration vs expected
# - Resource usage trends across test categories
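Under the hood this kind of monitoring relies on process-level sampling; psutil is already used elsewhere in the test suite. A minimal sampler sketch (illustrative, not the runner's exact implementation):
# Minimal psutil-based sampler sketch; the runner's monitor also tracks disk and
# network I/O and aggregates results per test category
import os
import threading
import time
import psutil

class ResourceSampler:
    def __init__(self, interval: float = 0.5):
        self.process = psutil.Process(os.getpid())
        self.interval = interval
        self.cpu, self.rss_mb = [], []
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def _run(self):
        while not self._stop.is_set():
            self.cpu.append(self.process.cpu_percent(interval=None))
            self.rss_mb.append(self.process.memory_info().rss / 1024**2)
            time.sleep(self.interval)

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, *exc):
        self._stop.set()
        self._thread.join()

# Usage: wrap a scenario to capture peak memory and average CPU
# with ResourceSampler() as sampler:
#     client.query("Download GEO dataset GSE291670 and do the quality control")
# peak_mb = max(sampler.rss_mb, default=0)
# avg_cpu = sum(sampler.cpu) / max(len(sampler.cpu), 1)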
Hybrid pytest Integration
Combine traditional pytest tests with E2E scenarios for comprehensive validation:
# Run both pytest and E2E tests together
python tests/run_integration_tests.py --run-pytest-integration
# Features:
# - Unified success/failure reporting
# - Combined coverage analytics
# - Category-wise performance breakdowns
# - Comprehensive JSON output with both test types
Advanced E2E Features
Dependency Resolution
Tests can specify dependencies for automatic ordering:
{
"dependencies": ["test_geo_download", "test_basic_qc"],
"priority": 4
}
Retry Logic
Configurable retry attempts for flaky tests:
{
"retry_count": 2,
"timeout": 300.0
}
Response Validation
Validate AI responses with keyword matching, length checks, and error detection:
{
"validation_criteria": {
"input_0": {
"required_keywords": ["downloaded", "GSE109564"],
"forbidden_keywords": ["error", "failed"],
"min_length": 50,
"no_errors": true
}
}
}
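A minimal sketch of how such criteria could be applied to an agent response (a hypothetical helper, not the runner's actual validator):
# Hypothetical validator sketch; field names mirror the validation_criteria above
def validate_response(text: str, criteria: dict) -> bool:
    lowered = text.lower()
    if any(kw.lower() not in lowered for kw in criteria.get("required_keywords", [])):
        return False
    if any(kw.lower() in lowered for kw in criteria.get("forbidden_keywords", [])):
        return False
    if len(text) < criteria.get("min_length", 0):
        return False
    if criteria.get("no_errors") and ("error" in lowered or "traceback" in lowered):
        return False
    return True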
Adding New E2E Test Scenarios
Add realistic user scenarios to tests/test_cases.json:
{
"test_my_custom_workflow": {
"inputs": [
"Download GSE123456 and perform quality control",
"Apply batch correction using Harmony",
"Create publication-ready UMAP plot"
],
"category": "advanced",
"description": "Test batch correction workflow",
"tags": ["geo", "batch_correction", "visualization"],
"priority": 3,
"timeout": 600.0,
"expected_duration": 240.0,
"validation_criteria": {
"input_0": {
"required_keywords": ["downloaded", "quality control"],
"no_errors": true
},
"input_1": {
"required_keywords": ["batch correction", "Harmony"],
"no_errors": true
},
"input_2": {
"required_keywords": ["UMAP", "publication"],
"no_errors": true
}
}
}
}
E2E Test Results & Analytics
The E2E system generates comprehensive reports:
{
"summary": {
"test_execution_summary": {
"total_tests": 25,
"passed_tests": 23,
"failed_tests": 2,
"success_rate": 0.92,
"total_duration": 1800.0,
"average_duration": 72.0
},
"category_breakdown": {
"basic": {"passed": 8, "failed": 0, "total": 8},
"advanced": {"passed": 12, "failed": 2, "total": 14},
"performance": {"passed": 3, "failed": 0, "total": 3}
},
"performance_summary": {
"avg_cpu_percent": 15.2,
"avg_memory_mb": 1024.5,
"max_memory_mb": 2048.0
}
}
}
Coverage and Quality Metrics
Coverage Requirements
- Minimum Coverage: 80% (enforced by CI)
- Target Coverage: 95%
- Critical Components: 100% coverage required for core services and agents
Coverage Analysis
# Generate coverage report
pytest --cov=lobster --cov-report=html --cov-report=term
# View coverage in browser
open htmlcov/index.html
# Check coverage for specific modules
pytest --cov=lobster.tools --cov-report=term-missing
# Fail build if coverage below threshold
pytest --cov=lobster --cov-fail-under=80
Quality Metrics
- Test Execution Time: Unit tests <2min, Integration <15min, System <30min
- Memory Usage: No test should use >2GB RAM
- Test Reliability: <1% flaky test rate
- Scientific Accuracy: All biological algorithms must be validated
Debugging Tests
Common Debugging Techniques
# Add debug logging in tests
import logging
logging.basicConfig(level=logging.DEBUG)
# Use pytest debugging
pytest --pdb # Drop to debugger on failures
pytest --pdbcls=IPython.terminal.debugger:TerminalPdb # Use IPython debugger
# Add debug prints (use capsys to capture)
def test_with_debug(capsys):
print("Debug information here")
# ... test code ...
captured = capsys.readouterr()
print(f"Captured output: {captured.out}")
# Temporary test isolation
pytest -x # Stop on first failure
pytest --lf # Run only last failed tests
pytest --ff # Run failures first
Mock Data Debugging
# Inspect generated data
def test_inspect_mock_data():
adata = generate_synthetic_single_cell()
print(f"Shape: {adata.shape}")
print(f"Obs columns: {adata.obs.columns.tolist()}")
print(f"Var columns: {adata.var.columns.tolist()}")
print(f"Uns keys: {list(adata.uns.keys())}")
# Save for manual inspection
adata.write_h5ad('/tmp/debug_data.h5ad')
Best Practices Summary
Test Design Principles
- Test Isolation: Each test should be independent
- Realistic Data: Use biologically plausible synthetic data
- Scientific Validation: Verify biological correctness, not just code correctness
- Performance Awareness: Monitor memory and time usage
- Error Coverage: Test both success and failure paths
Naming Conventions
class TestServiceName:
def test_method_success_case(self): # Happy path
pass
def test_method_edge_case_empty_data(self): # Edge cases
pass
def test_method_error_invalid_params(self): # Error conditions
pass
def test_method_performance_large_data(self): # Performance
pass
Test Organization
- One test class per component being tested
- Group related tests in the same class
- Use descriptive test names that explain the scenario
- Keep tests focused - one concept per test
- Use fixtures for common setup/teardown
This comprehensive testing guide ensures that the Lobster AI platform maintains high quality, reliability, and scientific accuracy across all bioinformatics workflows.