self_hosted_model_spec - mark-ik/graphshell GitHub Wiki
Date: 2026-02-26 Status: Proposed (design-ready) Scope: Defines the self-hosted local model environment for Graphshell/Verse: capability contracts, runtime binding, model and engram classification, mini-adapter and FLora tie-ins, cooperative multi-model execution, and UI-facing behavior contracts. Memory storage and extractor/ingestor plumbing live in the companion intelligence memory architecture doc. Path note: This file retains its existing path for continuity, but it replaces the earlier slot-centric plan. Related:
design_docs/verse_docs/research/2026-02-24_local_intelligence_research.mddesign_docs/verse_docs/implementation_strategy/2026-02-26_intelligence_memory_architecture_stm_ltm_engrams_plan.mddesign_docs/verse_docs/implementation_strategy/engram_spec.mddesign_docs/verse_docs/implementation_strategy/flora_submission_checkpoint_spec.mddesign_docs/TERMINOLOGY.mddesign_docs/graphshell_docs/technical_architecture/2026-02-27_presentation_provider_and_ai_orchestration.md
The earlier framing used fixed "slots" (text, embedding, vision, audio) as the primary abstraction. That was useful as a first pass, but it is too rigid.
Graphshell does not care how many models a user runs, or which brands they use. It cares whether the local runtime can satisfy the app's contractual requirements:
- what capability is needed
- at what latency / quality / safety level
- with what modality support
- using which local or remote artifacts
This spec therefore pivots from fixed slots to capability contracts.
The local self-hosted model environment is where:
- local/private data is processed
- mini-adapter tuning happens
- engrams are assembled and applied
- FLora contributions are prepared
- multiple models can cooperate or divide work by capability
The core metaphor is:
- engrams are the stored food and genetics
- FLora is the garden / community cultivation layer
- models are the fauna that consume different diets and perform useful work
-
Contracts, not named slots The runtime should bind models to capability contracts, not to a hard-coded list of sockets.
-
Capabilities over model identity The app evaluates what a model can do, not what it is called.
-
Self-hosted first The local runtime is the sovereign place where raw data, tuning, and high-risk transforms occur.
-
Engrams are the canonical payload Models consume engrams directly or engram-derived subsets. FLora and local tuning both revolve around engrams.
-
Different models eat different diets Not all models consume the same forms of derived knowledge. Some want adapter weights, some want retrieval memory, some want structured facts, and some can combine them.
-
Cooperation is first-class Multiple models may cooperate on the same UI feature, each handling the modalities or transforms it is best at.
-
UI is contract-gated A feature is available only when its contract is satisfiable by the current local model ecology.
- Capability Contract: A runtime requirement describing what the app needs performed, with quality/latency/resource boundaries.
- Model Provider: A local or remote model endpoint that declares capabilities and resource constraints.
- Model Runtime Contract: The negotiated agreement between a capability request and the provider(s) fulfilling it.
- Model Diet: The classes of engram-derived inputs a model can meaningfully consume.
- Cooperative Pipeline: A multi-provider execution plan where several models divide one user-facing task.
-
Engram: The portable customization payload (
TransferProfile+ optional memories). - Mini-Adapter Pipeline: The local process that converts user-controlled data into adapter-ready artifacts and engrams.
The app should expose requirements as contracts rather than rigid slots.
struct CapabilityContract {
contract_id: String,
required_capabilities: Vec<CapabilityId>,
preferred_capabilities: Vec<CapabilityId>,
accepted_modalities: Vec<ModalityId>,
accepted_engram_classes: Vec<EngramPayloadClass>,
accepted_derivation_types: Vec<EngramDerivationType>,
interface_version: u32,
latency_budget_ms: Option<u32>,
memory_budget_mb: Option<u32>,
quality_floor: Option<QualityTier>,
privacy_requirement: PrivacyRequirement,
}contract:workspace_summary
- capabilities:
text-generation,summarization - accepts: text-derived summaries, retrieval memories, adapter weights
- privacy: local-preferred
contract:semantic_neighbor_search
- capabilities:
embedding,semantic-similarity - accepts: retrieval engrams, vector memories, symbolic side-data
- privacy: local-only or trusted-peer only
contract:image_caption_and_tag
- capabilities:
vision-captioning,vision-labeling - accepts: image-derived adapters, visual embeddings, perceptual fingerprints as support metadata
contract:audio_transcript_index
- capabilities:
speech-to-text, optionalaudio-labeling - accepts: audio adapters, transcript summaries, retrieval memories
A contract is satisfiable if the runtime can produce a valid plan that:
- meets all required capabilities
- respects privacy constraints
- stays within resource budgets
- has conformance above the configured floor
The plan may use one provider or several.
Each model provider declares capabilities, modalities, resource profile, and diet compatibility.
struct ModelProviderDeclaration {
provider_id: String,
model_family: String,
architecture_family: String,
capabilities: Vec<CapabilityDeclaration>,
modalities: Vec<ModalityId>,
diet_profile: ModelDietProfile,
consumption_profile: ModelConsumptionProfile,
execution_mode: ExecutionMode, // local-native | local-onnx | local-service | remote-verse
resource_profile: ResourceProfile,
}One provider may:
- satisfy many contracts
- satisfy only part of one contract
- satisfy different contracts with different modality subsets
This avoids forcing a one-model-per-purpose architecture.
The local runtime needs a practical classification scheme for models.
struct SelfHostedModelClass {
model_family: String,
architecture_family: String,
modalities: Vec<ModalityId>,
preferred_diets: Vec<ModelDietKind>,
accepted_derivations: Vec<EngramDerivationType>,
adaptation_methods_supported: Vec<AdapterMethod>,
}-
Adapter-tunable text model
- accepts
AdapterWeights,SoftPrompt,DerivedSummary - good for summary, extraction, chat, labeling
- accepts
-
Retrieval-first semantic model
- accepts
EmbeddingVector,HashFingerprint,StructuredFact - good for search, clustering, dedup, routing
- accepts
-
Vision-capable model
- accepts image/visual adapters, visual embeddings, perceptual fingerprints as support signals
- good for captioning, labeling, region analysis
-
Audio-capable model
- accepts audio-derived adapters, transcript summaries, retrieval context
- good for STT and audio indexing
-
Multi-diet multimodal model
- accepts several derivation types
- can combine retrieval + symbolic + adapter conditioning
Hashes, perceptual hashes, and locality-sensitive hashes are useful for:
- deduplication
- provenance
- similarity lookup
- routing and retrieval
They are not direct substitutes for adapter weights.
Engrams are the canonical fuel and transfer object for the local model environment.
Models may consume:
- the full engram
- only the runtime-loadable subset
- derived memories extracted from the engram
-
Adapter: direct behavioral tuning -
Retrieval: external memory for search or augmentation -
Symbolic: structured fact and schema support -
Evaluation: conformance and trust gating -
Provenance: trust, legal-risk, and policy context -
Hybrid: combined packages
Minimal application often requires only:
AdapterWeightsAdapterManifest- compatibility and conformance checks
Richer application may also use:
- retrieval memories
- symbolic facts
- eval/profile guidance
- provenance and attestation for trust-sensitive features
The self-hosted model environment owns the local mini-adapter path.
- Gather local user-controlled material:
- notes
- graph-linked content
- imported corpora
- extracted metadata
- local clips or analysis artifacts
- Convert raw/private material into safer derived forms.
- Classify the material with UDC and other required metadata.
- Produce adapter deltas, retrieval memories, symbolic records, summaries, and evals as appropriate.
- Package the result as an engram.
- Keep it local, merge it into local history, or export a redacted version to FLora.
FLora is not separate from this; it is the community layer built on top of the same local conversion pipeline.
Local node:
- grows and tests engrams privately
- exports selected engrams to verses
- imports community checkpoints back into local use
That is why model capability, engram classification, and FLora policy must line up.
A user-facing task may be fulfilled by one provider or several cooperating providers.
struct CooperativeRuntimePlan {
plan_id: String,
contract_id: String,
stages: Vec<ExecutionStage>,
final_outputs: Vec<OutputKind>,
}
struct ExecutionStage {
provider_id: String,
capability_subset: Vec<CapabilityId>,
input_refs: Vec<String>,
output_refs: Vec<String>,
}Audio + text split
- audio-capable provider performs STT
- text model summarizes and extracts tasks from the transcript
Vision + text split
- vision model captions/images/regions
- text model converts those captions into semantic tags, edge labels, or summaries
Retrieval + text split
- embedding model finds relevant context
- text model answers or summarizes using retrieved context
Multimodal single-provider
- one provider handles image, audio, and text if it can satisfy the contract efficiently
Prefer:
- the fewest providers needed to satisfy the contract
- unless specialization materially improves quality, safety, or latency
This keeps the runtime pragmatic instead of over-orchestrated.
The self-hosted model environment exists to enable UI behaviors, not model inventory for its own sake.
- workspace chat / Q&A
- page and node summaries
- semantic search and related-node discovery
- edge-label suggestions
- extraction into structured fields
- image captioning and media tagging
- audio transcription and transcript search
- moderation or trust hints for imported/community engrams
- capability-aware command suggestions
A UI feature should declare:
- required contract
- acceptable degradation
- privacy requirement
- whether local-only is mandatory
Example:
- transcript search may degrade gracefully if only transcript text exists
- invoice extraction should not degrade into unreliable freeform guessing
Each contract should resolve to one of:
fullpartialnone
But degradation must also consider:
- trust of the active engrams
- legal-risk class
- privacy constraints
- whether the current provider is local or remote
A technically capable provider may still be rejected for a contract if:
- its privacy boundary is too weak
- its engram inputs are too low-trust
- its diet does not match the available engram forms
The system should not require a single monolithic "best" model.
A practical default ecology is:
- one local text-capable provider
- one local retrieval/embedding provider
- one local or optional vision-capable provider
- one local or optional audio-capable provider
But this is not a hard contract. A single multimodal provider may collapse these roles, or specialized providers may divide them more finely.
The app only cares whether the active contracts are satisfiable.
This spec owns:
- capability contracts
- provider declarations
- runtime binding and cooperation
- model classification and diets
- engram consumption semantics for local models
- mini-adapter and FLora-facing model behavior
- UI-facing feature contracts
This spec does not own:
- long-term memory storage layout
- extractor/ingestor persistence plumbing
- Verse economic policy
- community governance
Those remain in the companion Verse docs.
- Replace fixed slot assumptions with contract-based routing in future model-facing APIs.
- Require providers to declare accepted derivation types and preferred diets.
- Keep raw/private data processing local; export only redacted or derived engrams by default.
- Let one contract be satisfied by multiple cooperating providers.
- Gate UI affordances by contract satisfaction, conformance, trust, and privacy, not by model presence alone.
- Treat FLora as an extension of the same local engram pipeline, not a separate model world.
This keeps the self-hosted model layer flexible, privacy-respecting, and aligned with the broader Verse architecture.