workspace content service - the-omics-os/lobster-local GitHub Wiki
Workspace Content Service
Overview
The WorkspaceContentService provides structured, type-safe caching of research content (publications, datasets, metadata) in the DataManagerV2 workspace. Introduced in Lobster v0.2+, it replaces manual JSON file operations with a centralized service using Pydantic schemas for validation and enum-based type safety.
Key Benefits:
- Type Safety: Pydantic models validate all cached content
- Enum-Based Validation: ContentType and RetrievalLevel enums prevent string typos
- Automatic File Management: Professional naming conventions and directory organization
- Level-Based Retrieval: Flexible detail levels (summary/methods/samples/platform/full)
- Workspace Integration: Seamless integration with DataManagerV2 and research_agent tools
Two-Tier Architecture:
research_agent tools (write_to_workspace, get_content_from_workspace)
↓
WorkspaceContentService (validation, file I/O)
↓
DataManagerV2 workspace directory
↓
literature/ data/ metadata/ exports/ (JSON/CSV files)
Architecture
Content Types (Enum)
from lobster.tools.workspace_content_service import ContentType
class ContentType(str, Enum):
    PUBLICATION = "publication"              # Research papers (PubMed, PMC, bioRxiv)
    DATASET = "dataset"                      # GEO, SRA, PRIDE datasets
    METADATA = "metadata"                    # Sample mappings, validation results, QC reports
    EXPORTS = "exports"                      # Analysis results and data exports
    DOWNLOAD_QUEUE = "download_queue"        # Download queue entries (JSONL)
    PUBLICATION_QUEUE = "publication_queue"  # Publication queue entries (JSONL)
Workspace Directory Mapping:
- `ContentType.PUBLICATION` → `workspace/literature/*.json`
- `ContentType.DATASET` → `workspace/data/*.json`
- `ContentType.METADATA` → `workspace/metadata/*.json`
- `ContentType.EXPORTS` → `workspace/exports/*.*`
- `ContentType.DOWNLOAD_QUEUE` → `workspace/.lobster/queues/download_queue.jsonl`
- `ContentType.PUBLICATION_QUEUE` → `workspace/.lobster/queues/publication_queue.jsonl`
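The mapping above amounts to a simple lookup table. A standalone sketch (the names `CONTENT_TYPE_DIRS` and `content_dir` are illustrative assumptions, not the service's actual internals):

```python
from enum import Enum
from pathlib import Path

class ContentType(str, Enum):
    PUBLICATION = "publication"
    DATASET = "dataset"
    METADATA = "metadata"

# Hypothetical lookup table mirroring the mapping above
CONTENT_TYPE_DIRS = {
    ContentType.PUBLICATION: "literature",
    ContentType.DATASET: "data",
    ContentType.METADATA: "metadata",
}

def content_dir(workspace: Path, content_type: ContentType) -> Path:
    """Resolve the workspace subdirectory for a given content type."""
    return workspace / CONTENT_TYPE_DIRS[content_type]
```

Because `ContentType` mixes in `str`, each member also compares equal to its string value, which is what makes the string-to-enum mapping in the tool layer cheap.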
Retrieval Levels (Enum)
from lobster.tools.workspace_content_service import RetrievalLevel
class RetrievalLevel(str, Enum):
    SUMMARY = "summary"    # Key-value overview (title, authors, sample count)
    METHODS = "methods"    # Methods section (publications only)
    SAMPLES = "samples"    # Sample IDs and metadata (datasets only)
    PLATFORM = "platform"  # Platform/technology info (datasets only)
    FULL = "full"          # All available content
Level-Specific Fields:
| Content Type | Summary | Methods | Samples | Platform | Full |
|---|---|---|---|---|---|
| Publication | identifier, title, authors, journal, year, keywords | identifier, title, methods | N/A | N/A | All fields |
| Dataset | identifier, title, sample_count, organism | N/A | identifier, sample_count, samples | identifier, platform, platform_id | All fields |
| Metadata | identifier, content_type, description, related_datasets | N/A | N/A | N/A | All fields |
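The level-specific filtering in the table above can be pictured as a field whitelist keyed by (content type, level). This is a simplified illustration under that assumption, not the service's actual implementation:

```python
# Hypothetical per-(content type, level) whitelists mirroring the table above
LEVEL_FIELDS = {
    ("publication", "summary"): {"identifier", "title", "authors", "journal", "year", "keywords"},
    ("publication", "methods"): {"identifier", "title", "methods"},
    ("dataset", "summary"): {"identifier", "title", "sample_count", "organism"},
    ("dataset", "samples"): {"identifier", "sample_count", "samples"},
    ("dataset", "platform"): {"identifier", "platform", "platform_id"},
}

def filter_by_level(content: dict, content_type: str, level: str) -> dict:
    """Return only the fields visible at the requested retrieval level."""
    if level == "full":
        return content  # "full" passes everything through
    allowed = LEVEL_FIELDS[(content_type, level)]
    return {k: v for k, v in content.items() if k in allowed}
```

The payoff of level-based retrieval is context economy: an agent asking for a dataset summary never has to page through the full sample dictionary.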
Pydantic Content Schemas
PublicationContent
from lobster.tools.workspace_content_service import PublicationContent
pub = PublicationContent(
    identifier="PMID:35042229",
    title="Single-cell RNA-seq reveals...",
    authors=["Smith J", "Jones A"],
    journal="Nature",
    year=2022,
    abstract="We performed single-cell RNA-seq...",
    methods="Cells were processed using 10X Chromium...",
    full_text="...",  # Complete paper text
    keywords=["single-cell", "RNA-seq", "cancer"],
    source="PMC",  # PMC, PubMed, bioRxiv
    cached_at="2025-01-12T10:30:00",  # ISO 8601 timestamp
    url="https://pubmed.ncbi.nlm.nih.gov/35042229/"
)
Fields:
- `identifier` (required): PMID, DOI, or bioRxiv ID
- `title`, `authors`, `journal`, `year`: Bibliographic metadata
- `abstract`, `methods`, `full_text`: Content sections
- `keywords`: Publication keywords (MeSH terms, author keywords)
- `source` (required): Provider (PMC, PubMed, bioRxiv, medRxiv)
- `cached_at` (required): ISO 8601 timestamp
- `url`: Publication URL
DatasetContent
from lobster.tools.workspace_content_service import DatasetContent
dataset = DatasetContent(
    identifier="GSE123456",
    title="Single-cell RNA-seq of aging brain",
    platform="Illumina NovaSeq 6000",
    platform_id="GPL24676",
    organism="Homo sapiens",
    sample_count=12,
    samples={
        "GSM1": {"age": 25, "tissue": "brain"},
        "GSM2": {"age": 65, "tissue": "brain"}
    },
    experimental_design="Age comparison: young (n=6) vs old (n=6)",
    summary="Dataset comparing transcriptional changes...",
    pubmed_ids=["35042229"],
    source="GEO",  # GEO, SRA, PRIDE
    cached_at="2025-01-12T10:30:00",
    url="https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE123456"
)
Fields:
- `identifier` (required): GSE, SRA, PRIDE accession
- `title`, `summary`: Dataset descriptions
- `platform`, `platform_id`: Technology information
- `organism`: Species (e.g., Homo sapiens, Mus musculus)
- `sample_count` (required): Number of samples (≥0)
- `samples`: Dictionary mapping sample IDs to metadata
- `experimental_design`: Study design description
- `pubmed_ids`: Associated publications
- `source` (required): Repository (GEO, SRA, PRIDE)
- `cached_at` (required): ISO 8601 timestamp
- `url`: Dataset URL
MetadataContent
from lobster.tools.workspace_content_service import MetadataContent
metadata = MetadataContent(
    identifier="gse12345_to_gse67890_mapping",
    content_type="sample_mapping",
    description="Sample ID mapping between two datasets",
    data={
        "exact_matches": 10,
        "fuzzy_matches": 5,
        "unmapped": 2,
        "mapping_rate": 0.88
    },
    related_datasets=["GSE12345", "GSE67890"],
    source="SampleMappingService",
    cached_at="2025-01-12T10:30:00"
)
Fields:
- `identifier` (required): Unique metadata identifier
- `content_type` (required): Type descriptor (sample_mapping, validation, qc_report, etc.)
- `description`: Human-readable description
- `data` (required): Arbitrary JSON-serializable content
- `related_datasets`: Related dataset accessions
- `source` (required): Tool or service name
- `cached_at` (required): ISO 8601 timestamp
Service API
Initialization
from lobster.core.data_manager_v2 import DataManagerV2
from lobster.tools.workspace_content_service import WorkspaceContentService
data_manager = DataManagerV2(workspace_path="~/.lobster_workspace")
workspace_service = WorkspaceContentService(data_manager=data_manager)
Directory Structure Created:
workspace_path/
├── literature/ # Publications (PublicationContent)
├── data/ # Datasets (DatasetContent)
└── metadata/ # Metadata (MetadataContent)
Writing Content
from lobster.tools.workspace_content_service import (
    PublicationContent,
    ContentType,
    WorkspaceContentService
)
from datetime import datetime

# Create content model
pub_content = PublicationContent(
    identifier="PMID:35042229",
    title="Single-cell analysis of aging",
    authors=["Smith J", "Jones A"],
    journal="Nature",
    year=2022,
    abstract="Abstract text...",
    methods="Methods text...",
    source="PMC",
    cached_at=datetime.now().isoformat()
)

# Write to workspace
cache_path = workspace_service.write_content(
    content=pub_content,
    content_type=ContentType.PUBLICATION
)
# Returns: "/workspace/literature/pmid_35042229.json"
Naming Convention:
- Identifier sanitized: lowercase, special characters → underscores
  - `PMID:35042229` → `pmid_35042229.json`
  - `GSE123456` → `gse123456.json`
  - `DOI:10.1038/s41586-021-12345-6` → `doi_10_1038_s41586_021_12345_6.json`
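The examples above are consistent with a single rule: lowercase the identifier and collapse every run of non-alphanumeric characters into one underscore. A minimal sketch of that rule (`sanitize_identifier` is an illustrative name, not necessarily the service's helper):

```python
import re

def sanitize_identifier(identifier: str) -> str:
    """Lowercase and replace runs of non-alphanumeric characters with underscores."""
    return re.sub(r"[^a-z0-9]+", "_", identifier.lower()).strip("_")
```

The trailing `strip("_")` guards against identifiers that begin or end with punctuation producing filenames like `_gse123.json`.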
Reading Content
Basic Retrieval
from lobster.tools.workspace_content_service import ContentType, RetrievalLevel
# Read full content
full_content = workspace_service.read_content(
    identifier="PMID:35042229",
    content_type=ContentType.PUBLICATION,
    level=RetrievalLevel.FULL
)
# Returns: Dict with all fields

# Read summary only
summary = workspace_service.read_content(
    identifier="PMID:35042229",
    content_type=ContentType.PUBLICATION,
    level=RetrievalLevel.SUMMARY
)
# Returns: Dict with identifier, title, authors, journal, year, keywords

# Read methods section
methods = workspace_service.read_content(
    identifier="PMID:35042229",
    content_type=ContentType.PUBLICATION,
    level=RetrievalLevel.METHODS
)
# Returns: Dict with identifier, title, methods
Dataset Retrieval Examples
# Get dataset summary
summary = workspace_service.read_content(
    identifier="GSE123456",
    content_type=ContentType.DATASET,
    level=RetrievalLevel.SUMMARY
)
# Returns: identifier, title, sample_count, organism

# Get sample metadata
samples = workspace_service.read_content(
    identifier="GSE123456",
    content_type=ContentType.DATASET,
    level=RetrievalLevel.SAMPLES
)
# Returns: identifier, sample_count, samples, experimental_design

# Get platform information
platform = workspace_service.read_content(
    identifier="GSE123456",
    content_type=ContentType.DATASET,
    level=RetrievalLevel.PLATFORM
)
# Returns: identifier, platform, platform_id, organism
Listing Content
# List all cached content
all_content = workspace_service.list_content()
# Returns: List[Dict] with all publications, datasets, metadata

# List only publications
publications = workspace_service.list_content(
    content_type=ContentType.PUBLICATION
)
# Returns: List[Dict] with publication metadata

# List only datasets
datasets = workspace_service.list_content(
    content_type=ContentType.DATASET
)
# Returns: List[Dict] with dataset metadata
List Result Format:
[
    {
        "identifier": "PMID:35042229",
        "title": "Single-cell analysis...",
        "authors": ["Smith J", "Jones A"],
        "cached_at": "2025-01-12T10:30:00",
        "_content_type": "publication",  # Added by service
        "_file_path": "/workspace/literature/pmid_35042229.json"  # Added by service
    },
    # ... more items
]
Deleting Content
# Delete cached publication
deleted = workspace_service.delete_content(
    identifier="PMID:35042229",
    content_type=ContentType.PUBLICATION
)
# Returns: True if deleted, False if not found
Workspace Statistics
stats = workspace_service.get_workspace_stats()
# Returns:
# {
#     "total_items": 42,
#     "publications": 15,
#     "datasets": 20,
#     "metadata": 7,
#     "total_size_mb": 12.5,
#     "cache_dir": "/workspace/cache/content"
# }
Centralized Exports Directory (v1.0+)
As of version 1.0, all user-facing data exports (CSV, TSV, Excel) are written to a centralized exports directory for easy discovery.
Directory Structure:
workspace_path/
├── literature/ # Publications (PublicationContent)
├── data/ # Datasets (DatasetContent)
├── metadata/ # Metadata (MetadataContent)
└── exports/ # 🆕 User-facing CSV/TSV/Excel exports (v1.0+)
Why Centralized Exports?
- Single Location: Customers know exactly where to find exported files
- Easy Discovery: No hunting across multiple subdirectories
- Clean Organization: Separates cached JSON (metadata/) from final outputs (exports/)
- Predictable: All tools write to same location
Getting Exports Directory:
exports_dir = workspace_service.get_exports_directory(create=True)
# Returns: Path("workspace_path/exports")
Listing Export Files:
# List all exports
files = workspace_service.list_export_files()
# Returns: [
#     {
#         "name": "aggregated_samples.csv",
#         "path": Path("workspace_path/exports/aggregated_samples.csv"),
#         "size": 1024567,
#         "modified": "2025-01-12T14:30:00",
#         "category": "metadata"  # metadata, results, plots, custom
#     },
#     ...
# ]
# Filter by pattern
csv_files = workspace_service.list_export_files(pattern="*.csv")
# Filter by category
metadata_exports = workspace_service.list_export_files(category="metadata")
File Categorization: Files are automatically categorized based on naming conventions:
- `metadata_*` → "metadata" (sample tables, mappings)
- `results_*` → "results" (analysis outputs)
- `plot_*` → "plots" (visualizations)
- Other → "custom"
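The prefix conventions above can be sketched as a small classifier (the function name `categorize_export` is an assumption for illustration; the service's own categorizer may differ):

```python
def categorize_export(filename: str) -> str:
    """Bucket an export file by its name prefix, per the conventions above."""
    prefixes = [
        ("metadata_", "metadata"),
        ("results_", "results"),
        ("plot_", "plots"),
    ]
    for prefix, category in prefixes:
        if filename.startswith(prefix):
            return category
    return "custom"  # anything without a recognized prefix
```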
Usage in Custom Code:
# In execute_custom_code, OUTPUT_DIR variable is pre-configured
df.to_csv(OUTPUT_DIR / "my_results.csv") # Saves to workspace/exports/
Unified Metadata View:
The /metadata CLI command now shows exports alongside other sources:
sources = workspace_service.get_all_metadata_sources()
# Returns: {
#     "in_memory": [...],        # metadata_store entries
#     "workspace_files": [...],  # workspace/metadata/*.json
#     "exports": [...],          # workspace/exports/*.csv
#     "deprecated": [...]        # workspace/metadata/exports/*.csv (old location)
# }
Deprecation Warning:
The old workspace/metadata/exports/ location is deprecated. A warning is shown if files exist there:
⚠️ Found 3 files in deprecated location: workspace/metadata/exports/
New exports go to workspace/exports/. Consider migrating:
mv workspace/metadata/exports/* workspace/exports/
Integration with research_agent Tools
The research_agent provides two tools that use WorkspaceContentService under the hood:
write_to_workspace Tool
Purpose: Cache research content for persistent access and specialist handoff.
Usage Pattern:
# In research_agent tool
from lobster.tools.workspace_content_service import (
    ContentType,
    PublicationContent,
    WorkspaceContentService
)

@tool
def write_to_workspace(identifier: str, workspace: str, content_type: str = None) -> str:
    # 1. Initialize service
    workspace_service = WorkspaceContentService(data_manager=data_manager)

    # 2. Map workspace categories to ContentType enum
    workspace_to_content_type = {
        "literature": ContentType.PUBLICATION,
        "data": ContentType.DATASET,
        "metadata": ContentType.METADATA,
    }

    # 3. Validate workspace category
    if workspace not in workspace_to_content_type:
        return f"Error: Invalid workspace '{workspace}'"

    # 4. Retrieve content from data_manager
    if identifier in data_manager.metadata_store:
        content_data = data_manager.metadata_store[identifier]
    elif identifier in data_manager.list_modalities():
        adata = data_manager.get_modality(identifier)
        content_data = {...}  # Extract metadata
    else:
        return f"Error: Identifier '{identifier}' not found"

    # 5. Create Pydantic model
    content_model = PublicationContent(
        identifier=identifier,
        # ... populate fields
        cached_at=datetime.now().isoformat()
    )

    # 6. Write using service
    cache_path = workspace_service.write_content(
        content=content_model,
        content_type=workspace_to_content_type[workspace]
    )
    return f"Cached to {cache_path}"
Naming Conventions:
- Publications: `publication_PMID12345` or `publication_DOI...`
- Datasets: `dataset_GSE12345`
- Metadata: `metadata_GSE12345_samples`
Example:
# Cache publication after reading
> "I just read PMID:35042229. Please cache it for later."
→ write_to_workspace("publication_PMID35042229", workspace="literature", content_type="publication")
# Cache dataset metadata
> "Cache GSE123456 metadata for validation."
→ write_to_workspace("dataset_GSE123456", workspace="data", content_type="dataset")
get_content_from_workspace Tool
Purpose: Retrieve cached research content with flexible detail levels.
Unified Architecture (v2.6+)
As of version 2.6, get_content_from_workspace uses a unified adapter-based architecture that provides consistent behavior across all workspace types.
Key Improvements:
- Consistent API: All workspaces support the same operations (list, filter, retrieve)
- Unified Formatting: Status emojis, titles, and details formatted consistently
- Type Safety: Internal `WorkspaceItem` TypedDict ensures defensive field access
- Error Handling: No more `KeyError` crashes on missing fields
Architecture Diagram:
User Query → Dispatcher → Adapter → WorkspaceItem[] → Formatter → Markdown
↓ ↓ ↓ ↓
5 workspaces Normalize Unified Consistent
data types structure output
Adapters:
- `_adapt_general_content()` - literature, data, metadata workspaces
- `_adapt_download_queue()` - download queue entries
- `_adapt_publication_queue()` - publication queue entries
WorkspaceItem Structure:
class WorkspaceItem(TypedDict, total=False):
    identifier: str           # Primary ID
    workspace: str            # Category
    type: str                 # Item type
    status: Optional[str]     # For queues
    priority: Optional[int]   # For queues
    title: Optional[str]      # Display title
    cached_at: Optional[str]  # ISO timestamp
    details: Optional[str]    # Summary/metadata
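A minimal adapter in this style might normalize a raw cached dict into the unified item shape via `dict.get()`, which is what makes missing fields safe. This is a trimmed-down sketch using a subset of the `WorkspaceItem` fields shown above; the real adapters presumably handle queue-specific fields and richer detail strings:

```python
from typing import Optional, TypedDict

class WorkspaceItem(TypedDict, total=False):
    identifier: str
    workspace: str
    type: str
    title: Optional[str]
    cached_at: Optional[str]
    details: Optional[str]

def adapt_general_content(raw: dict, workspace: str) -> WorkspaceItem:
    """Normalize a raw cached dict; .get() keeps missing fields from raising KeyError."""
    return WorkspaceItem(
        identifier=raw.get("identifier", "unknown"),
        workspace=workspace,
        type=raw.get("_content_type", "unknown"),
        title=raw.get("title"),
        cached_at=raw.get("cached_at"),
        details=raw.get("summary"),
    )
```

Since `total=False`, every key is optional at the type level, so the formatter downstream must also read items defensively.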
Benefits:
- Agents can use same mental model for all workspaces
- No workspace-specific error handling needed
- Easy to add new workspace types (one adapter function)
- Backward compatible (same output format)
Usage Pattern (Simplified)
@tool
def get_content_from_workspace(
    identifier: str = None,
    workspace: str = None,
    level: str = "summary"
) -> str:
    # 1. Initialize service
    workspace_service = WorkspaceContentService(data_manager=data_manager)

    # 2. Map strings to enums
    workspace_to_content_type = {
        "literature": ContentType.PUBLICATION,
        "data": ContentType.DATASET,
        "metadata": ContentType.METADATA,
    }
    level_to_retrieval = {
        "summary": RetrievalLevel.SUMMARY,
        "methods": RetrievalLevel.METHODS,
        "samples": RetrievalLevel.SAMPLES,
        "platform": RetrievalLevel.PLATFORM,
        "metadata": RetrievalLevel.FULL,
    }

    # 3. List mode (no identifier)
    if identifier is None:
        content_type_filter = workspace_to_content_type[workspace] if workspace else None
        all_cached = workspace_service.list_content(content_type=content_type_filter)
        return format_list_response(all_cached)

    # 4. Retrieve mode (with identifier)
    retrieval_level = level_to_retrieval[level]

    # Try each content type if workspace not specified
    content_types_to_try = (
        [workspace_to_content_type[workspace]] if workspace
        else list(ContentType)
    )
    for content_type in content_types_to_try:
        try:
            cached_content = workspace_service.read_content(
                identifier=identifier,
                content_type=content_type,
                level=retrieval_level
            )
            return format_response(cached_content, level)
        except FileNotFoundError:
            continue

    return f"Error: Identifier '{identifier}' not found"
Examples:
# List all cached content
> "What content do I have cached?"
→ get_content_from_workspace()
# List publications only
> "Show me cached publications."
→ get_content_from_workspace(workspace="literature")
# Get publication methods section
> "Show methods from PMID:35042229."
→ get_content_from_workspace(
    identifier="publication_PMID35042229",
    workspace="literature",
    level="methods"
)

# Get dataset samples
> "Show sample IDs for GSE123456."
→ get_content_from_workspace(
    identifier="dataset_GSE123456",
    workspace="data",
    level="samples"
)

# Get full metadata
> "Show full metadata for my sample mapping."
→ get_content_from_workspace(
    identifier="metadata_gse12345_to_gse67890_mapping",
    workspace="metadata",
    level="metadata"
)
Common Workflows
Workflow 1: Cache Publication for Later Analysis
# 1. Search literature
search_literature("BRCA1 breast cancer", max_results=5)
# 2. Read full publication
read_full_publication("PMID:35042229")
# → Content automatically cached in metadata_store
# 3. Cache to workspace
write_to_workspace(
    identifier="publication_PMID35042229",
    workspace="literature",
    content_type="publication"
)

# 4. Later: Retrieve methods section
get_content_from_workspace(
    identifier="publication_PMID35042229",
    workspace="literature",
    level="methods"
)
Workflow 2: Cache Dataset Before Handoff to Specialist
# 1. Discover dataset
find_related_entries("PMID:35042229", entry_type="dataset")
# → Found: GSE123456
# 2. Get dataset metadata
get_dataset_metadata("GSE123456")
# → Metadata stored in metadata_store
# 3. Cache to workspace before handoff
write_to_workspace(
    identifier="dataset_GSE123456",
    workspace="data",
    content_type="dataset"
)

# 4. Hand off to metadata_assistant
handoff_to_metadata_assistant(
    instructions="Validate GSE123456 for treatment_response field. "
                 "Dataset cached in data workspace."
)
Workflow 3: Multiple Detail Levels
# Start with summary
get_content_from_workspace(
    identifier="dataset_GSE123456",
    workspace="data",
    level="summary"
)
# → Returns: title, sample_count, organism

# Need more details? Get samples
get_content_from_workspace(
    identifier="dataset_GSE123456",
    workspace="data",
    level="samples"
)
# → Returns: sample IDs and metadata

# Need platform info?
get_content_from_workspace(
    identifier="dataset_GSE123456",
    workspace="data",
    level="platform"
)
# → Returns: platform, platform_id, organism

# Need everything?
get_content_from_workspace(
    identifier="dataset_GSE123456",
    workspace="data",
    level="metadata"
)
# → Returns: all fields
Best Practices
Naming Conventions
Follow Professional Naming:
- Lowercase identifiers
- Underscores for separators
- Descriptive prefixes
# ✅ Good
"publication_PMID35042229"
"dataset_GSE123456"
"metadata_gse12345_to_gse67890_mapping"
# ❌ Bad
"PMID:35042229" # Contains colon
"GSE 123456" # Contains space
"mapping-12345" # Ambiguous prefix
Content Validation
Always Use Pydantic Models:
# ✅ Good - Validation enforced
pub_content = PublicationContent(
    identifier="PMID:35042229",
    source="PMC",
    cached_at=datetime.now().isoformat()
)
workspace_service.write_content(pub_content, ContentType.PUBLICATION)
# ❌ Bad - No validation
raw_dict = {"identifier": "PMID:35042229"} # Missing required fields
# Will fail validation
Error Handling
Handle FileNotFoundError:
from lobster.tools.workspace_content_service import ContentType, RetrievalLevel
try:
    content = workspace_service.read_content(
        identifier="publication_PMID12345",
        content_type=ContentType.PUBLICATION,
        level=RetrievalLevel.SUMMARY
    )
except FileNotFoundError as e:
    logger.warning(f"Content not found: {e}")
    # List available content
    available = workspace_service.list_content(ContentType.PUBLICATION)
    logger.info(f"Available publications: {[c['identifier'] for c in available]}")
Level Selection
Choose Appropriate Detail Level:
| Use Case | Recommended Level | Why |
|---|---|---|
| Quick overview | `SUMMARY` | Fast, minimal data transfer |
| Replication protocol | `METHODS` | Focused on procedures |
| Sample alignment | `SAMPLES` | Just sample metadata |
| Platform validation | `PLATFORM` | Technology compatibility check |
| Full export | `FULL` | Complete content for archival |
Workspace Organization
Categorize Content by Type:
# Literature review project
workspace_service.write_content(pub1, ContentType.PUBLICATION) # → literature/
workspace_service.write_content(pub2, ContentType.PUBLICATION) # → literature/
# Dataset analysis project
workspace_service.write_content(dataset1, ContentType.DATASET) # → data/
workspace_service.write_content(dataset2, ContentType.DATASET) # → data/
# Metadata operations
workspace_service.write_content(mapping, ContentType.METADATA) # → metadata/
Backward Compatibility
Maintain Tool Signatures:
- Both tools (`write_to_workspace`, `get_content_from_workspace`) keep their original signatures
- String-based parameters at the tool level
- Enum conversion happens internally
- Same response formats as before the refactoring
Performance Considerations
Caching Strategy
When to Cache:
- ✅ After expensive operations (PDF parsing, full-text extraction)
- ✅ Before handing off to other agents (context preservation)
- ✅ When content will be reused (literature reviews, multi-step workflows)
When NOT to Cache:
- ❌ Temporary scratch data
- ❌ Duplicates of in-memory modalities
- ❌ Large binary files (use modalities storage instead)
File Size Management
Monitor Workspace Size:
stats = workspace_service.get_workspace_stats()
if stats["total_size_mb"] > 100:
    logger.warning("Workspace size exceeding 100MB")
    # Consider cleaning old cached content
Delete Old Content:
# Remove cached content no longer needed
workspace_service.delete_content(
    identifier="old_publication_PMID12345",
    content_type=ContentType.PUBLICATION
)
Troubleshooting
Common Issues
Issue: "File has been modified since read"
- Cause: Auto-formatter/linter running between Read and Edit
- Solution: Read larger context window (400+ lines) before editing
Issue: "Invalid workspace 'xyz'"
- Cause: Typo in workspace parameter
- Solution: Use a supported workspace category: `"literature"`, `"data"`, or `"metadata"`
Issue: "Invalid detail level 'abc'"
- Cause: Unsupported level string
- Solution: Use a valid level: `"summary"`, `"methods"`, `"samples"`, `"platform"`, or `"metadata"`
Issue: "ValidationError: Field required"
- Cause: Missing required Pydantic fields
- Solution: Check schema requirements (identifier, source, cached_at)
Debugging
Enable Debug Logging:
import logging
logging.basicConfig(level=logging.DEBUG)
# Service operations will log:
# - File paths created
# - Content validated
# - Errors encountered
Inspect Workspace Contents:
# Check cached files
ls -lh ~/.lobster_workspace/literature/
ls -lh ~/.lobster_workspace/data/
ls -lh ~/.lobster_workspace/metadata/
# View JSON content
cat ~/.lobster_workspace/literature/pmid_35042229.json | jq .
Migration from Manual JSON Handling
Before (Manual Implementation)
# Old approach - manual file operations
import json
from pathlib import Path
cache_dir = Path(workspace_path) / "literature"
cache_file = cache_dir / f"{identifier.lower()}.json"
# Write
with open(cache_file, "w") as f:
    json.dump({"identifier": identifier, ...}, f)

# Read
with open(cache_file, "r") as f:
    content = json.load(f)
# List
cached_files = list(cache_dir.glob("*.json"))
After (WorkspaceContentService)
# New approach - service-based
from lobster.tools.workspace_content_service import (
    WorkspaceContentService,
    PublicationContent,
    ContentType,
    RetrievalLevel
)
workspace_service = WorkspaceContentService(data_manager=data_manager)
# Write
pub_content = PublicationContent(identifier=identifier, ...)
workspace_service.write_content(pub_content, ContentType.PUBLICATION)
# Read
content = workspace_service.read_content(
    identifier, ContentType.PUBLICATION, RetrievalLevel.SUMMARY
)
# List
cached_list = workspace_service.list_content(ContentType.PUBLICATION)
Benefits:
- ✅ Pydantic validation (catch errors early)
- ✅ Enum type safety (no string typos)
- ✅ Automatic directory management
- ✅ Level-based filtering (no manual if/elif chains)
- ✅ Professional naming (automatic sanitization)
Version History
| Version | Changes |
|---|---|
| v0.2+ | Initial implementation with Pydantic schemas, enum-based validation, two-tier architecture |
| v1.0+ | Centralized `exports/` directory for user-facing CSV/TSV/Excel exports |
| v2.6+ | Unified adapter-based architecture for `get_content_from_workspace` |
Related Documentation
- Data Management (DataManagerV2) - Multi-modal data orchestration
- Services API Reference - Service design patterns
- Creating Services - Service development guidelines
- Agent API Reference - research_agent tool integration
See Also
- WorkspaceContentService Source: `lobster/tools/workspace_content_service.py` (714 lines)
- Pydantic Schemas: PublicationContent, DatasetContent, MetadataContent
- Integration: research_agent tools (write_to_workspace, get_content_from_workspace)
- Testing: `tests/integration/test_workspace_content_service.py`