verseblob_content_addressing_spec - mark-ik/graphshell GitHub Wiki
Date: 2026-02-28
Status: Proposed (canonical Tier 2 draft)
Scope: Defines the canonical VerseBlob envelope, content addressing rules, transport split, size classes, and retrieval expectations for Tier 2 Verse communities.
Related:
- `design_docs/verse_docs/technical_architecture/2026-02-23_verse_tier2_architecture.md`
- `design_docs/verse_docs/implementation_strategy/engram_spec.md`
- `design_docs/verse_docs/implementation_strategy/flora_submission_checkpoint_spec.md`
- `design_docs/verse_docs/implementation_strategy/2026-03-28_decentralized_storage_bank_spec.md`
Adopted standards (see 2026-03-04_standards_alignment_report.md §§3.11–3.13):
- IPFS CIDv1: canonical content address format; base32 text encoding; codec from IPFS codec table (dag-cbor for structured data, raw for opaque bytes); BLAKE3 hash function.
- W3C DID Core 1.0: `did:key` peer identity URI used in VerseBlob `issuer` and credential envelopes.
- W3C VC Data Model 2.0: VerseBlob payloads carrying attested knowledge objects use the VC envelope (`issuer`, `credentialSubject`, `proof`). The blob's CID is the credential's `id` field.
VerseBlob is the canonical content-addressed transport unit for Tier 2.
It exists to:
- advertise and retrieve immutable content by hash
- separate small pubsub control messages from large binary payloads
- support reuse across FLora, search, storage receipts, applets, and community metadata
- **CID-first.** Every `VerseBlob` should be identified by a CID-compatible content address.
- **CIDv1 base32 default.** Use CIDv1 with base32 text encoding as the portable canonical representation.
- **Pubsub carries manifests, not bulk bytes.** GossipSub messages should be compact announcements or manifests only.
- **Immutable payloads.** Once addressed, a blob is immutable. New state means a new blob.
- **Bounded decoding.** Receivers must enforce hard limits on decompression, nesting, and attachment expansion before fully decoding.
struct VerseBlob {
cid: Cid, // canonical CIDv1 identifier
schema_version: u32,
kind: VerseBlobKind,
codec: VerseBlobCodec,
author: PeerId,
created_at_ms: u64,
signature: Signature,
body: VerseBlobBody,
}
enum VerseBlobKind {
IntentDelta,
IndexSegment,
EngramEnvelope,
ReceiptBatch,
CommunityManifest,
GovernanceEvent,
AppletPackage,
FeedDelta,
Opaque,
}
enum VerseBlobCodec {
DagCbor,
Raw,
CarV1,
}
enum VerseBlobBody {
InlineBytes(Vec<u8>), // only for small control payloads
AttachmentManifest(BlobManifest),// primary mode for larger content
}
struct BlobManifest {
root_ref: BlobRef,
attachments: Vec<BlobRef>,
total_declared_bytes: u64,
}
struct BlobRef {
cid: Cid,
declared_bytes: u64,
media_type: String,
role: BlobRole,
}
enum BlobRole {
Root,
AdapterWeights,
EvalBundle,
DatasetSummary,
PromptBundle,
SignatureBundle,
Ancillary,
}
Recommended default:
- multihash: `sha2-256`
- CID version: `v1`
- textual form: base32 lower-case
This aligns with current IPFS interoperability norms and avoids base58btc-only legacy assumptions.
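To make the recommended default concrete, the sketch below assembles the binary form of a CIDv1 (version, codec, multihash header, digest) and renders it as lower-case base32 with the multibase `b` prefix. It is an illustration under stated assumptions, not the project's implementation: the function names `base32_lower_nopad` and `cid_v1_text` are hypothetical, the 32-byte sha2-256 digest is assumed to be computed elsewhere, and the codec is written as a single byte, which only holds for codec codes below `0x80` such as `raw` (`0x55`) and `dag-cbor` (`0x71`).

```rust
/// Multicodec codes from the public multicodec table.
const CODEC_RAW: u8 = 0x55;
const CODEC_DAG_CBOR: u8 = 0x71;
/// Multihash code and digest length for sha2-256.
const MH_SHA2_256: u8 = 0x12;
const SHA2_256_LEN: u8 = 32;

/// RFC 4648 base32 alphabet in lower case (multibase prefix `b`).
const BASE32_LOWER: &[u8; 32] = b"abcdefghijklmnopqrstuvwxyz234567";

/// Encode bytes as unpadded lower-case base32.
fn base32_lower_nopad(data: &[u8]) -> String {
    let mut out = String::with_capacity((data.len() * 8 + 4) / 5);
    let mut buffer: u32 = 0; // bit accumulator
    let mut bits: u32 = 0; // number of valid bits currently in `buffer`
    for &byte in data {
        buffer = (buffer << 8) | u32::from(byte);
        bits += 8;
        while bits >= 5 {
            bits -= 5;
            out.push(BASE32_LOWER[((buffer >> bits) & 0x1f) as usize] as char);
        }
        buffer &= (1u32 << bits) - 1; // keep only the leftover bits
    }
    if bits > 0 {
        // Left-align the remaining bits in a final 5-bit group.
        out.push(BASE32_LOWER[((buffer << (5 - bits)) & 0x1f) as usize] as char);
    }
    out
}

/// Hypothetical helper: text form of a CIDv1 for a precomputed sha2-256 digest.
/// Assumes `codec` is below 0x80 so it fits in a one-byte varint.
fn cid_v1_text(codec: u8, sha256_digest: &[u8; 32]) -> String {
    let mut bytes = Vec::with_capacity(4 + 32);
    bytes.push(0x01); // CID version 1
    bytes.push(codec); // content codec
    bytes.push(MH_SHA2_256); // multihash function code
    bytes.push(SHA2_256_LEN); // multihash digest length
    bytes.extend_from_slice(sha256_digest);
    format!("b{}", base32_lower_nopad(&bytes))
}
```

With these header bytes, every `raw` CID begins with `bafkrei` and every `dag-cbor` CID with `bafyrei`, which gives a quick visual check of the codec when reading logs or manifests.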
- `DagCbor`: structured envelopes and manifests
- `Raw`: direct binary attachments
- `CarV1`: bundled, multi-block transport archives when a submission or checkpoint needs a package of content-addressed blocks
Verse uses CID-compatible addressing, but a VerseBlob is not required to be globally pinned on IPFS.
Practical rule:
- if a node can resolve by local cache, trusted peer provider, or community provider table, that is sufficient
- optional IPFS pinning is allowed for portability and archival
enum BlobSizeClass {
InlineControl, // <= 64 KiB
NormalFetch, // <= 8 MiB
LargeFetch, // <= 256 MiB
ArchiveOnly, // > 256 MiB or policy-restricted
}
- `InlineControl`: may travel directly over pubsub if schema-validated
- `NormalFetch`: advertised on pubsub, fetched separately
- `LargeFetch`: never in pubsub; request-response or provider fetch only
- `ArchiveOnly`: off the live swarm path by default; require explicit opt-in fetch
This keeps pubsub traffic bounded and reduces amplification and memory-pressure risk.
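The size thresholds above map directly onto a small classifier. The following is a minimal sketch: `classify` and `may_travel_over_pubsub` are hypothetical names, and the `policy_restricted` flag is an assumed stand-in for whatever policy check a node applies to force a blob into `ArchiveOnly`.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum BlobSizeClass {
    InlineControl, // <= 64 KiB
    NormalFetch,   // <= 8 MiB
    LargeFetch,    // <= 256 MiB
    ArchiveOnly,   // > 256 MiB or policy-restricted
}

const KIB: u64 = 1024;
const MIB: u64 = 1024 * KIB;

/// Classify a blob by its declared size. `policy_restricted` is an
/// illustrative flag; the spec does not define how policy is expressed.
fn classify(declared_bytes: u64, policy_restricted: bool) -> BlobSizeClass {
    if policy_restricted {
        return BlobSizeClass::ArchiveOnly;
    }
    match declared_bytes {
        n if n <= 64 * KIB => BlobSizeClass::InlineControl,
        n if n <= 8 * MIB => BlobSizeClass::NormalFetch,
        n if n <= 256 * MIB => BlobSizeClass::LargeFetch,
        _ => BlobSizeClass::ArchiveOnly,
    }
}

/// Only schema-validated inline control payloads may ride on pubsub directly.
fn may_travel_over_pubsub(class: BlobSizeClass, schema_valid: bool) -> bool {
    class == BlobSizeClass::InlineControl && schema_valid
}
```

Classifying on the declared size, before any bytes are fetched, is what lets a receiver decide the transport path from the announcement alone.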
Recommended retrieval order:
- local content-addressed cache
- explicitly connected peer provider
- known community provider set
- DHT/provider lookup
- optional external IPFS pinning gateway or local IPFS node
- Never auto-fetch large attachments solely because a pubsub announcement was received.
- Validate the manifest first.
- Enforce max bytes and accepted media types before retrieval.
- Fetch on demand or under explicit policy.
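The retrieval order and the fetch rules above can be sketched as a policy gate followed by an ordered fallback. Everything named here is hypothetical (`ProviderTier`, `FetchPolicy`, `FetchRequest`, `fetch_allowed`, `resolve`); the per-tier lookup is passed in as a closure because the spec does not define a provider API, and the `allow_external_ipfs` switch is one assumed way to model the "optional" last tier.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum ProviderTier {
    LocalCache,
    ConnectedPeer,
    CommunityProviderSet,
    DhtLookup,
    ExternalIpfs,
}

/// Tiers in the recommended retrieval order.
const RETRIEVAL_ORDER: [ProviderTier; 5] = [
    ProviderTier::LocalCache,
    ProviderTier::ConnectedPeer,
    ProviderTier::CommunityProviderSet,
    ProviderTier::DhtLookup,
    ProviderTier::ExternalIpfs,
];

struct FetchPolicy {
    max_bytes: u64,
    accepted_media_types: Vec<String>,
    allow_external_ipfs: bool, // assumption: the optional IPFS tier is opt-in
}

#[derive(Clone, Copy)]
struct FetchRequest<'a> {
    declared_bytes: u64,
    media_type: &'a str,
    manifest_validated: bool,
    on_demand_or_policy: bool, // false when only a pubsub announcement was seen
}

/// Gate applied before any retrieval is attempted.
fn fetch_allowed(policy: &FetchPolicy, req: &FetchRequest) -> bool {
    req.manifest_validated
        && req.on_demand_or_policy
        && req.declared_bytes <= policy.max_bytes
        && policy.accepted_media_types.iter().any(|m| m == req.media_type)
}

/// Try each tier in order; `try_tier` is a caller-supplied lookup that
/// returns the blob bytes when that tier can serve them.
fn resolve<F>(policy: &FetchPolicy, mut try_tier: F) -> Option<(ProviderTier, Vec<u8>)>
where
    F: FnMut(ProviderTier) -> Option<Vec<u8>>,
{
    for tier in RETRIEVAL_ORDER.iter().copied() {
        if tier == ProviderTier::ExternalIpfs && !policy.allow_external_ipfs {
            continue;
        }
        if let Some(bytes) = try_tier(tier) {
            return Some((tier, bytes));
        }
    }
    None
}
```

A caller would run `fetch_allowed` first and only then call `resolve`, so that an announcement alone never triggers network retrieval.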
Before accepting or forwarding a blob:
- Verify CID matches the declared body bytes.
- Verify signature over `(cid, schema_version, kind, created_at_ms)`.
- Verify schema and kind-specific constraints.
- Verify attachment list is bounded.
- Verify declared byte counts are within policy.
Receivers should also enforce these limits:
- maximum nesting depth for manifests
- decompression ratio cap
- maximum attachment fan-out
- maximum number of unresolved missing refs before dropping
- duplicate CID suppression
These are straightforward anti-abuse controls used widely in content-addressed and pubsub systems.
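Several of these checks need no cryptography and can run on the manifest alone, before anything is fetched. The sketch below covers attachment fan-out, declared byte counts, and duplicate CIDs. It rests on assumptions the spec does not state: `Cid` is modeled as a `String`, `BlobRef` is reduced to two fields, `total_declared_bytes` is taken to include the root, and a CID repeated inside one manifest is treated as an error. `Limits`, `ManifestError`, and `check_manifest_bounds` are hypothetical names. CID and signature verification are out of scope here.

```rust
use std::collections::HashSet;

// Simplified stand-ins for the spec's types (Cid modeled as String).
struct BlobRef {
    cid: String,
    declared_bytes: u64,
}

struct BlobManifest {
    root_ref: BlobRef,
    attachments: Vec<BlobRef>,
    total_declared_bytes: u64,
}

/// Illustrative policy limits; real values are deployment-specific.
struct Limits {
    max_attachments: usize,
    max_total_bytes: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum ManifestError {
    TooManyAttachments,
    DuplicateCid(String),
    ByteCountMismatch,
    TotalBytesOverLimit,
}

fn check_manifest_bounds(m: &BlobManifest, limits: &Limits) -> Result<(), ManifestError> {
    // Fan-out is checked first so the loop below is bounded.
    if m.attachments.len() > limits.max_attachments {
        return Err(ManifestError::TooManyAttachments);
    }
    let mut seen: HashSet<&str> = HashSet::new();
    let mut sum: u64 = 0;
    for r in std::iter::once(&m.root_ref).chain(m.attachments.iter()) {
        if !seen.insert(r.cid.as_str()) {
            return Err(ManifestError::DuplicateCid(r.cid.clone()));
        }
        // Overflow means the declared sizes cannot be consistent.
        sum = sum
            .checked_add(r.declared_bytes)
            .ok_or(ManifestError::ByteCountMismatch)?;
    }
    if sum != m.total_declared_bytes {
        return Err(ManifestError::ByteCountMismatch);
    }
    if sum > limits.max_total_bytes {
        return Err(ManifestError::TotalBytesOverLimit);
    }
    Ok(())
}
```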
- root should be the serialized `TransferProfile`
- attachments may contain `AdapterWeights`, eval bundles, lineage summaries, or receipts
- root should be the queryable segment manifest
- large segment blocks should remain external attachments
- root should be a compact CBOR structure
- attachment use should be rare
- prefer `CarV1` packaging with explicit manifest and signature bundle
- no implicit execution on fetch
Nodes should track retention independently from addressing.
Suggested retention classes:
- `ephemeral`
- `cached`
- `pinned-local`
- `pinned-community`
- `archival`
Addressing says what the content is. Retention says how long the node keeps serving it.
For redundant hosting in the decentralized storage bank, each blob can be
described by a FragmentManifest that lists how its data is split across
providers:
struct FragmentManifest {
blob_cid: Cid, // the original blob CID
coding_scheme: CodingScheme,
fragments: Vec<FragmentEntry>,
k_required: u32, // fragments needed to reconstruct
m_total: u32, // total fragments produced
}
enum CodingScheme {
FullCopy, // v1: each fragment = the full blob
ReedSolomon { data: u32, parity: u32 }, // future: k-of-m erasure coding
}
struct FragmentEntry {
fragment_cid: Cid,
fragment_index: u32,
size_bytes: u64,
}
v1: `CodingScheme::FullCopy`. Each fragment is a complete copy of the
blob, with `k_required = 1` and `m_total = k_target`. This is naive k-replication
using the same CIDv1 addressing. Switching to Reed-Solomon later requires no
structural changes to the manifest or the storage bank protocols.
See 2026-03-28_decentralized_storage_bank_spec.md §7 for the full durability model.
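As a rough illustration of how the v1 full-copy case fits the manifest, the sketch below builds a `FullCopy` manifest and checks whether a set of reachable fragments is enough to reconstruct the blob. The helpers `full_copy_manifest` and `can_reconstruct` are hypothetical, `Cid` is modeled as a `String`, and numbering the full-copy entries `0..k_target` is an assumption: the spec does not say how manifest entries are indexed under `FullCopy`.

```rust
use std::collections::HashSet;

#[derive(Debug, Clone, PartialEq, Eq)]
enum CodingScheme {
    FullCopy,
    ReedSolomon { data: u32, parity: u32 },
}

struct FragmentEntry {
    fragment_cid: String, // Cid modeled as String for this sketch
    fragment_index: u32,
    size_bytes: u64,
}

struct FragmentManifest {
    blob_cid: String,
    coding_scheme: CodingScheme,
    fragments: Vec<FragmentEntry>,
    k_required: u32,
    m_total: u32,
}

/// Build a v1 manifest: every fragment is the whole blob, so each entry
/// carries the blob's own CID and size. Entry indexing is an assumption.
fn full_copy_manifest(blob_cid: &str, size_bytes: u64, k_target: u32) -> FragmentManifest {
    let fragments = (0..k_target)
        .map(|i| FragmentEntry {
            fragment_cid: blob_cid.to_string(),
            fragment_index: i,
            size_bytes,
        })
        .collect();
    FragmentManifest {
        blob_cid: blob_cid.to_string(),
        coding_scheme: CodingScheme::FullCopy,
        fragments,
        k_required: 1,
        m_total: k_target,
    }
}

/// True when at least `k_required` distinct fragments listed in the
/// manifest are available. Unknown or repeated indices do not count.
fn can_reconstruct(manifest: &FragmentManifest, available_indices: &[u32]) -> bool {
    let known: HashSet<u32> = manifest.fragments.iter().map(|f| f.fragment_index).collect();
    let distinct: HashSet<u32> = available_indices
        .iter()
        .copied()
        .filter(|i| known.contains(i))
        .collect();
    distinct.len() as u64 >= u64::from(manifest.k_required)
}
```

The same `can_reconstruct` check works unchanged for a `ReedSolomon` manifest with a larger `k_required`, which is the sense in which the manifest shape does not need to change.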
The storage bank uses three signed message types for health reporting and placement coordination. These are transported alongside VerseBlobs but are not VerseBlobs themselves — they are operational messages, not content-addressed immutable data.
struct StorageAnnouncement {
announcement_id: Uuid, // UUID v7
provider: Did,
community_id: CommunityId,
blob_cid: Cid,
fragment_index: u32, // for erasure coding; 0 for full-copy
announced_at_epoch: u64,
signature: Signature,
}
struct StorageHeartbeat {
heartbeat_id: Uuid, // UUID v7
provider: Did,
community_id: CommunityId,
held_blob_cids: Vec<Cid>,
available_bytes: u64,
uptime_epochs: u64,
heartbeat_epoch: u64,
signature: Signature,
}
struct StorageWithdrawal {
withdrawal_id: Uuid, // UUID v7
provider: Did,
community_id: CommunityId,
blob_cid: Cid,
withdrawal_epoch: u64,
signature: Signature,
}
These messages are signed by the provider's `did:key` and validated by
community peers. They drive the storage bank's health view and placement
protocol.
See 2026-03-28_decentralized_storage_bank_spec.md §6–§8 for the full placement, health, and repair model.
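One way a peer might derive a health view from heartbeats is to count, per blob, the distinct providers with a recent heartbeat. This is a sketch only, not the placement or repair protocol from the storage bank spec: `live_replica_counts`, `under_replicated`, and the `max_age_epochs` freshness window are hypothetical, `Did` and `Cid` are modeled as `String`, heartbeats are assumed to have passed signature validation already, and withdrawals are not modeled.

```rust
use std::collections::{HashMap, HashSet};

// Reduced stand-in for the spec's StorageHeartbeat (Did and Cid as String).
struct StorageHeartbeat {
    provider: String,
    held_blob_cids: Vec<String>,
    heartbeat_epoch: u64,
}

/// Count distinct providers per blob, using only fresh heartbeats.
fn live_replica_counts(
    heartbeats: &[StorageHeartbeat],
    current_epoch: u64,
    max_age_epochs: u64,
) -> HashMap<String, usize> {
    let mut holders: HashMap<&str, HashSet<&str>> = HashMap::new();
    for hb in heartbeats {
        // Skip heartbeats from the future and ones older than the window.
        if hb.heartbeat_epoch > current_epoch
            || current_epoch - hb.heartbeat_epoch > max_age_epochs
        {
            continue;
        }
        for cid in &hb.held_blob_cids {
            holders
                .entry(cid.as_str())
                .or_default()
                .insert(hb.provider.as_str());
        }
    }
    holders
        .into_iter()
        .map(|(cid, providers)| (cid.to_string(), providers.len()))
        .collect()
}

/// Blobs seen in the health view with fewer than `k_target` live holders,
/// sorted for stable output.
fn under_replicated(counts: &HashMap<String, usize>, k_target: usize) -> Vec<String> {
    let mut cids: Vec<String> = counts
        .iter()
        .filter(|(_, n)| **n < k_target)
        .map(|(cid, _)| cid.clone())
        .collect();
    cids.sort();
    cids
}
```

Note that a blob with no fresh heartbeat at all never appears in the map, so a real health view would also need the set of blobs the community expects to be stored.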
- Use CIDv1 base32 with `sha2-256`.
- Use `DagCbor` for envelopes and manifests.
- Inline only small control data.
- Use `CarV1` for multi-object bundled exports.
- Treat pubsub as announcement/manifest distribution, not large payload transport.
- Require strict validation and bounded decoding before forwarding.
These defaults make VerseBlob practical, interoperable, and resilient without forcing a full IPFS dependency model.