Performance and Benchmarks - Garume/Manifold GitHub Wiki

Performance and Benchmarks

Manifold is designed from the ground up for zero-allocation hot paths in both CLI command dispatch and MCP tool invocation. This page documents the architectural techniques that enable this — overlay structs, frozen dictionary dispatch, and aggressively-inlined fast paths — along with the benchmark infrastructure used to measure and compare Manifold against competing frameworks. For the broader system architecture, see Architecture Overview; for the result type details, see Result Types and Formatting.

The performance-critical code lives in the Manifold.Cli and Manifold.Mcp packages, with the source generator (Manifold.Generators) emitting specialized dispatch code at compile time. The benchmark suites reside in benchmarks/Manifold.Benchmarks and benchmarks/Manifold.Mcp.Benchmarks.

Zero-Allocation Fast-Path Design

The central performance goal in Manifold is to avoid heap allocations on the hot path for both CLI and MCP invocations. This is achieved through three interlocking mechanisms:

  1. Overlay value structs: FastCliInvocationResult and FastMcpInvocationResult use explicit struct layout to carry typed return values without boxing.
  2. Tiered invoker interfaces: synchronous fast paths (IFastSyncCliInvoker, IFastSyncMcpToolInvoker) avoid ValueTask overhead entirely; asynchronous fast paths (IFastCliInvoker, IFastMcpToolInvoker) use ValueTask<T> with completed-task short-circuits.
  3. Frozen dictionary / switch dispatch: CLI command lookup uses FrozenDictionary<string, CliCommandCandidate[]> for O(1) first-token resolution; MCP tool dispatch uses generated switch statements over tool names.
flowchart TD
    A[Incoming Request] --> B{Surface?}
    B -->|CLI| C[FrozenDictionary Lookup]
    B -->|MCP| D[Switch Dispatch]
    C --> E{Sync Fast Path?}
    D --> F{Sync Fast Path?}
    E -->|Yes| G[TryInvokeFastSync]
    E -->|No| H[TryInvokeFast]
    F -->|Yes| I[TryInvokeFastSync]
    F -->|No| J[TryInvokeFast]
    G --> K[FastCliInvocationResult]
    H --> K
    I --> L[FastMcpInvocationResult]
    J --> L
    K --> M[WriteFastResult]
    L --> N[WriteCallToolResponse]
    M --> O[Zero-Alloc Output]
    N --> O

Sources: Manifold/src/Manifold.Cli/CliApplication.cs:151-195, Manifold/src/Manifold.Cli/IFastCliInvoker.cs:1-19, Manifold/src/Manifold.Mcp/McpInvoker.cs:7-35

FastCliInvocationResult Overlay Struct

FastCliInvocationResult is a readonly struct that carries the result of a CLI operation without heap allocation. It stores a discriminated kind tag and a 16-byte value union.

Structure

public readonly struct FastCliInvocationResult
{
    private readonly FastCliInvocationValue value;  // 16-byte overlay union
    private readonly string? text;                   // reference, only for Text kind

    public FastCliInvocationKind Kind { get; }
}

The internal FastCliInvocationValue uses [StructLayout(LayoutKind.Explicit, Size = 16)] to overlay all supported primitive types at the same memory offset:

using System;
using System.Runtime.InteropServices;

// Excerpt: every field starts at offset 0, so all of them share the same 16 bytes.
[StructLayout(LayoutKind.Explicit, Size = 16)]
internal readonly struct FastCliInvocationValue
{
    [FieldOffset(0)] private readonly bool boolean;
    [FieldOffset(0)] private readonly int number;
    [FieldOffset(0)] private readonly long largeNumber;
    [FieldOffset(0)] private readonly double realNumber;
    [FieldOffset(0)] private readonly decimal preciseNumber;
    [FieldOffset(0)] private readonly Guid identifier;
    [FieldOffset(0)] private readonly DateTimeOffset timestamp;
}

All factory methods are marked with [MethodImpl(MethodImplOptions.AggressiveInlining)], which asks the JIT to inline them at the call site and so avoid call overhead.

Sources: Manifold/src/Manifold.Cli/FastCliInvocationResult.cs:6-103, Manifold/src/Manifold.Cli/FastCliInvocationResult.cs:118-239
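
The overlay pattern described above can be illustrated with a minimal sketch. The types below (DemoKind, DemoValue, DemoResult) are hypothetical stand-ins written for this page, not Manifold's actual source; they only show how a kind tag plus an explicit-layout union can return several primitive types from one struct without boxing.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical discriminant; values mirror a subset of the Kind table for readability.
public enum DemoKind : byte { None = 0, Boolean = 2, Number = 3, Identifier = 7 }

// Hypothetical union: all fields share offset 0, so the struct occupies 16 bytes in total.
[StructLayout(LayoutKind.Explicit, Size = 16)]
internal readonly struct DemoValue
{
    [FieldOffset(0)] public readonly bool Boolean;
    [FieldOffset(0)] public readonly int Number;
    [FieldOffset(0)] public readonly Guid Identifier;

    // "this = default" zeroes all 16 bytes before the active member is written.
    public DemoValue(bool value) { this = default; Boolean = value; }
    public DemoValue(int value) { this = default; Number = value; }
    public DemoValue(Guid value) { this = default; Identifier = value; }
}

public readonly struct DemoResult
{
    private readonly DemoValue value;

    public DemoKind Kind { get; }

    private DemoResult(DemoKind kind, DemoValue value)
    {
        Kind = kind;
        this.value = value;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static DemoResult FromBoolean(bool b) => new(DemoKind.Boolean, new DemoValue(b));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static DemoResult FromNumber(int n) => new(DemoKind.Number, new DemoValue(n));

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static DemoResult FromIdentifier(Guid g) => new(DemoKind.Identifier, new DemoValue(g));

    // Accessors check the tag so a caller cannot read the union through the wrong member.
    public bool Boolean => Kind == DemoKind.Boolean ? value.Boolean : throw WrongKind();
    public int Number => Kind == DemoKind.Number ? value.Number : throw WrongKind();
    public Guid Identifier => Kind == DemoKind.Identifier ? value.Identifier : throw WrongKind();

    private InvalidOperationException WrongKind() => new($"Result holds {Kind}.");
}
```

A caller would write `DemoResult.FromNumber(42)` and read `.Number` back; the value travels inside the struct, so no object is allocated for it.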

Kind Discriminant

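| Kind | Value | .NET Type | Size |
| --- | --- | --- | --- |
| None | 0 | | |
| Text | 1 | string? | Reference |
| Boolean | 2 | bool | 1 byte |
| Number | 3 | int | 4 bytes |
| LargeNumber | 4 | long | 8 bytes |
| RealNumber | 5 | double | 8 bytes |
| PreciseNumber | 6 | decimal | 16 bytes |
| Identifier | 7 | Guid | 16 bytes |
| Timestamp | 8 | DateTimeOffset | 16 bytes |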

Sources: Manifold/src/Manifold.Cli/FastCliInvocationResult.cs:105-116

classDiagram
    class FastCliInvocationResult {
        +FastCliInvocationKind Kind
        +string? Text
        +bool Boolean
        +int Number
        +long LargeNumber
        +double RealNumber
        +decimal PreciseNumber
        +Guid Identifier
        +DateTimeOffset Timestamp
        +FromText(string?) FastCliInvocationResult
        +FromBoolean(bool) FastCliInvocationResult
        +FromNumber(int) FastCliInvocationResult
    }
    class FastCliInvocationValue {
        <<StructLayout Explicit 16 bytes>>
        -bool boolean
        -int number
        -long largeNumber
        -double realNumber
        -decimal preciseNumber
        -Guid identifier
        -DateTimeOffset timestamp
    }
    FastCliInvocationResult *-- FastCliInvocationValue : contains

FastMcpInvocationResult Overlay Struct

FastMcpInvocationResult applies the same overlay pattern to the MCP surface and adds a Structured kind for returning serializable objects.

public readonly struct FastMcpInvocationResult
{
    private readonly FastMcpInvocationValue value;      // 16-byte overlay union
    private readonly object? reference;                 // for Text and Structured
    private readonly Type? structuredValueType;         // metadata for Structured
    public FastMcpInvocationKind Kind { get; }
}

The key difference from the CLI variant is the Structured kind (value 9), which carries an object? reference and its runtime Type for JSON serialization. All other kinds share the same zero-allocation overlay design.

Sources: Manifold/src/Manifold.Mcp/FastMcpInvocationResult.cs:6-119, Manifold/src/Manifold.Mcp/FastMcpInvocationResult.cs:121-133

Invoker Interface Hierarchy

Both CLI and MCP surfaces define a three-tier invoker hierarchy. The source generator emits a single class that implements all three tiers.

flowchart TD
    A[ICliInvoker] --> B[Standard Path]
    C[IFastCliInvoker] --> D[Async Fast Path]
    E[IFastSyncCliInvoker] --> F[Sync Fast Path]

    G[IMcpToolInvoker] --> H[Standard Path]
    I[IFastMcpToolInvoker] --> J[Async Fast Path]
    K[IFastSyncMcpToolInvoker] --> L[Sync Fast Path]

    B --> M[OperationInvocationResult]
    D --> N["ValueTask&lt;FastCliInvocationResult&gt;"]
    F --> O[FastCliInvocationResult]

    H --> P["ValueTask&lt;OperationInvocationResult&gt;"]
    J --> Q["ValueTask&lt;FastMcpInvocationResult&gt;"]
    L --> R[FastMcpInvocationResult]

CLI Invoker Interfaces

// Fastest: fully synchronous, no ValueTask overhead
public interface IFastSyncCliInvoker
{
    bool TryInvokeFastSync(
        string[] commandTokens,
        IServiceProvider? services,
        CancellationToken cancellationToken,
        out FastCliInvocationResult invocation);
}

// Fast: async but with completed-task short-circuit
public interface IFastCliInvoker
{
    bool TryInvokeFast(
        string[] commandTokens,
        IServiceProvider? services,
        CancellationToken cancellationToken,
        out ValueTask<FastCliInvocationResult> invocation);
}

Sources: Manifold/src/Manifold.Cli/IFastCliInvoker.cs:1-19

MCP Invoker Interfaces

// Fastest: fully synchronous
public interface IFastSyncMcpToolInvoker
{
    bool TryInvokeFastSync(
        string toolName, JsonElement? arguments,
        IServiceProvider? services, CancellationToken cancellationToken,
        out FastMcpInvocationResult invocation);
}

// Fast: async with ValueTask
public interface IFastMcpToolInvoker
{
    bool TryInvokeFast(
        string toolName, JsonElement? arguments,
        IServiceProvider? services, CancellationToken cancellationToken,
        out ValueTask<FastMcpInvocationResult> invocation);
}

Sources: Manifold/src/Manifold.Mcp/McpInvoker.cs:17-35

Frozen Dictionary Dispatch (CLI)

CliApplication builds a FrozenDictionary<string, CliCommandCandidate[]> at construction time from the registered operation descriptors. This provides O(1) command lookup by the first token of the command path.

private readonly FrozenDictionary<string, CliCommandCandidate[]> commandCandidatesByFirstToken;

The BuildCliState method processes all operations:

  1. Filters out MCP-only operations
  2. Groups command candidates by first token (case-insensitive)
  3. Sorts candidates within each group by path length (longest first, for greedy matching)
  4. Freezes the dictionary with OrdinalIgnoreCase comparer
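
The four steps above can be sketched as follows. DemoCandidate and DemoCliState are hypothetical names introduced for illustration (the real CliCommandCandidate and BuildCliState are not shown on this page); the sketch assumes .NET 8 or later, where System.Collections.Frozen is available.

```csharp
using System;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for CliCommandCandidate: a command path plus a CLI visibility flag.
public sealed record DemoCandidate(string[] Path, bool CliVisible);

public static class DemoCliState
{
    public static FrozenDictionary<string, DemoCandidate[]> Build(
        IEnumerable<DemoCandidate> candidates) =>
        candidates
            // 1. Drop operations that are not visible on the CLI surface.
            .Where(c => c.CliVisible && c.Path.Length > 0)
            // 2. Group by first token, ignoring case.
            .GroupBy(c => c.Path[0], StringComparer.OrdinalIgnoreCase)
            .ToFrozenDictionary(
                g => g.Key,
                // 3. Longest path first, so greedy matching tries the most specific command.
                g => g.OrderByDescending(c => c.Path.Length).ToArray(),
                // 4. Freeze with a case-insensitive comparer.
                StringComparer.OrdinalIgnoreCase);
}
```

Freezing costs more at construction than building a plain Dictionary, which fits a table that is built once and then only read.
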
flowchart TD
    A[OperationDescriptors] --> B[Filter CliVisible]
    B --> C[Group by First Token]
    C --> D[Sort by Path Length]
    D --> E[ToFrozenDictionary]
    E --> F["FrozenDictionary&lt;string, CliCommandCandidate[]&gt;"]
    F --> G[O_1 Lookup at Runtime]

The fast-path dispatch method (TryExecuteArrayFastPath) is marked [MethodImpl(MethodImplOptions.AggressiveInlining)] and follows a strict priority order:

  1. Try IFastSyncCliInvoker.TryInvokeFastSync() — zero-allocation synchronous path
  2. Try IFastCliInvoker.TryInvokeFast() — check IsCompletedSuccessfully for synchronous completion
  3. Fall back to full async AwaitFastInvocationAsync

Sources: Manifold/src/Manifold.Cli/CliApplication.cs:17-24, Manifold/src/Manifold.Cli/CliApplication.cs:151-195, Manifold/src/Manifold.Cli/CliApplication.cs:197-239
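
The priority order above can be sketched in a few lines. The interfaces and classes below (IDemoFastSyncInvoker, IDemoFastInvoker, DemoInvoker, DemoDispatcher) are simplified, hypothetical stand-ins that return int instead of FastCliInvocationResult and omit the services and cancellation parameters; they are not the body of TryExecuteArrayFastPath.

```csharp
using System;
using System.Threading.Tasks;

public interface IDemoFastSyncInvoker
{
    bool TryInvokeFastSync(string[] tokens, out int result);
}

public interface IDemoFastInvoker
{
    bool TryInvokeFast(string[] tokens, out ValueTask<int> result);
}

// Hypothetical invoker: "math add a b" completes synchronously, "slow n" completes later.
public sealed class DemoInvoker : IDemoFastSyncInvoker, IDemoFastInvoker
{
    public bool TryInvokeFastSync(string[] tokens, out int result)
    {
        if (tokens.Length == 4 && tokens[0] == "math" && tokens[1] == "add")
        {
            result = int.Parse(tokens[2]) + int.Parse(tokens[3]);
            return true;
        }
        result = 0;
        return false;
    }

    public bool TryInvokeFast(string[] tokens, out ValueTask<int> result)
    {
        if (tokens.Length == 2 && tokens[0] == "slow")
        {
            result = YieldThenReturn(int.Parse(tokens[1]));
            return true;
        }
        result = default;
        return false;
    }

    private static async ValueTask<int> YieldThenReturn(int value)
    {
        await Task.Yield(); // forces a genuinely asynchronous completion
        return value;
    }
}

public static class DemoDispatcher
{
    public static ValueTask<int> Dispatch(object invoker, string[] tokens)
    {
        // Tier 1: fully synchronous, no ValueTask produced by the invoker.
        if (invoker is IDemoFastSyncInvoker sync && sync.TryInvokeFastSync(tokens, out int value))
            return new ValueTask<int>(value);

        if (invoker is IDemoFastInvoker fast && fast.TryInvokeFast(tokens, out ValueTask<int> pending))
        {
            // Tier 2: the ValueTask already holds a result, so read it without awaiting.
            if (pending.IsCompletedSuccessfully)
                return new ValueTask<int>(pending.Result);

            // Tier 3: only now pay for an async state machine.
            return AwaitAsync(pending);
        }

        throw new InvalidOperationException("No fast path matched in this sketch.");
    }

    private static async ValueTask<int> AwaitAsync(ValueTask<int> pending) =>
        await pending.ConfigureAwait(false);
}
```

The async helper is kept in a separate method so the common synchronous cases never enter a method compiled as a state machine.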

Switch Dispatch (MCP)

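For MCP tool invocation, the source generator emits a switch statement over tool names in the GeneratedMcpInvoker class. This avoids dictionary lookups entirely: the C# compiler lowers a string switch into direct comparisons (typically keyed on string length or a computed hash when there are many cases), which the JIT then compiles as ordinary branching code.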

The generated invoker implements all three MCP invoker interfaces:

  • TryInvokeFastSync(): switch (toolName) dispatching to Invoke{Operation}FastSync()
  • TryInvokeFast(): switch (toolName) dispatching to Invoke{Operation}FastAsync()
  • TryInvoke(): switch (toolName) dispatching to Invoke{Operation}Async()

For tool discovery, GeneratedMcpCatalog exposes a static McpToolDescriptor[] array with an AsSpan() method for zero-allocation enumeration.

Sources: Manifold/src/Manifold.Generators/OperationDescriptorGenerator.cs:1919-1942
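
As a rough illustration of the shape such generated code might take, the sketch below hand-writes a switch over two tool names. DemoGeneratedMcpInvoker, the tool names math_add and math_negate, and the argument names x and y are all assumptions made for this example; the generator's real output is not shown on this page, and the result type is simplified to long.

```csharp
using System;
using System.Text.Json;

public static class DemoGeneratedMcpInvoker
{
    // Returns false for unknown tools so the caller can fall back to another path.
    public static bool TryInvokeFastSync(string toolName, JsonElement? arguments, out long result)
    {
        switch (toolName)
        {
            case "math_add":
                result = InvokeMathAddFastSync(arguments);
                return true;
            case "math_negate":
                result = InvokeMathNegateFastSync(arguments);
                return true;
            default:
                result = default;
                return false;
        }
    }

    private static long InvokeMathAddFastSync(JsonElement? arguments)
    {
        JsonElement args = Require(arguments);
        return args.GetProperty("x").GetInt64() + args.GetProperty("y").GetInt64();
    }

    private static long InvokeMathNegateFastSync(JsonElement? arguments) =>
        -Require(arguments).GetProperty("x").GetInt64();

    private static JsonElement Require(JsonElement? arguments) =>
        arguments ?? throw new ArgumentException("Arguments are required.", nameof(arguments));
}
```

Because the set of tool names is known at compile time, the lookup needs no runtime table and no per-call allocation.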

MCP Response Writer Fast Path

McpTextContentResponseWriter constructs MCP tools/call response JSON directly into an IBufferWriter<byte> without using Utf8JsonWriter for the most common result types.

public static void WriteCallToolResponse(
    IBufferWriter<byte> writer, in FastMcpInvocationResult invocation)
{
    switch (invocation.Kind)
    {
        case FastMcpInvocationKind.None:    WriteEmpty(writer);    return;
        case FastMcpInvocationKind.Boolean: WriteBoolean(writer, invocation.Boolean); return;
        case FastMcpInvocationKind.Number:  WriteInt32(writer, invocation.Number);     return;
        case FastMcpInvocationKind.LargeNumber: WriteInt64(writer, invocation.LargeNumber); return;
    }
    WriteSlow(writer, in invocation);
}

The fast-path methods write pre-computed UTF-8 prefix/suffix bytes (ResponsePrefix, ResponseSuffix) directly to the buffer using IBufferWriter<byte>.GetSpan(), and format numeric values in-place with Utf8Formatter.TryFormat. The slow path falls back to Utf8JsonWriter for the remaining kinds, such as Text, RealNumber, and Structured.

flowchart TD
    A[WriteCallToolResponse] --> B{Kind?}
    B -->|None| C[WriteEmpty]
    B -->|Boolean| D[WriteBoolean]
    B -->|Number| E[WriteInt32]
    B -->|LargeNumber| F[WriteInt64]
    B -->|Other| G[WriteSlow]
    C --> H[Direct UTF-8 Bytes]
    D --> H
    E --> I[Utf8Formatter]
    F --> I
    I --> H
    G --> J[Utf8JsonWriter]

Sources: Manifold/src/Manifold.Mcp/McpTextContentResponseWriter.cs:9-131
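
The direct-write technique can be sketched as below. DemoResponseWriter and its Prefix/Suffix bytes are hypothetical: the actual ResponsePrefix and ResponseSuffix contents are not reproduced on this page, so the JSON envelope here is only a placeholder. The sketch assumes C# 11 or later for the u8 literal syntax.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

public static class DemoResponseWriter
{
    // Placeholder envelope; u8 literals are stored as UTF-8 data in the assembly.
    private static ReadOnlySpan<byte> Prefix => "{\"content\":[{\"type\":\"text\",\"text\":\""u8;
    private static ReadOnlySpan<byte> Suffix => "\"}]}"u8;

    public static void WriteInt32(IBufferWriter<byte> writer, int value)
    {
        const int MaxInt32Length = 11; // length of "-2147483648"

        // Ask for one span large enough for prefix, digits, and suffix.
        Span<byte> span = writer.GetSpan(Prefix.Length + MaxInt32Length + Suffix.Length);

        Prefix.CopyTo(span);
        int written = Prefix.Length;

        // Format the digits straight into the output span, with no intermediate string.
        if (!Utf8Formatter.TryFormat(value, span.Slice(written), out int digits))
            throw new InvalidOperationException("Destination span too small.");
        written += digits;

        Suffix.CopyTo(span.Slice(written));
        written += Suffix.Length;

        // Commit exactly the bytes that were produced.
        writer.Advance(written);
    }

    // Convenience helper for inspecting the output as text.
    public static string Render(int value)
    {
        var buffer = new ArrayBufferWriter<byte>();
        WriteInt32(buffer, value);
        return Encoding.UTF8.GetString(buffer.WrittenSpan);
    }
}
```

Integers need no JSON escaping, which is why a writer like this can skip Utf8JsonWriter for them while still routing strings through it.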

Benchmark Infrastructure

Benchmark Projects

| Project | Location | Competing Frameworks |
| --- | --- | --- |
| Manifold.Benchmarks | benchmarks/Manifold.Benchmarks/ | System.CommandLine v2.0.5, ConsoleAppFramework v5.7.13 |
| Manifold.Mcp.Benchmarks | benchmarks/Manifold.Mcp.Benchmarks/ | ModelContextProtocol, McpToolkit v0.1.3, mcpdotnet v1.1.0.1 |

Both projects use BenchmarkDotNet with the following configuration attributes:

  • [MemoryDiagnoser] — tracks GC generations and byte allocations
  • [ShortRunJob] — reduced iteration count for faster development cycles
  • [Orderer(SummaryOrderPolicy.FastestToSlowest)] — results sorted by latency

Sources: Manifold/benchmarks/Manifold.Benchmarks/CliBenchmarks.cs:11-13, Manifold/benchmarks/README.md:1-111
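
A benchmark class using the three attributes listed above might look like the following sketch. It requires the BenchmarkDotNet NuGet package. DemoSink and DemoPositionalBenchmarks are hypothetical, and the benchmark body is a hand-written parse rather than a call into Manifold, since the real CliBenchmarks.cs is not reproduced here.

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Order;

// Hypothetical sink: storing the result in a static field keeps the work observable
// without doing any console I/O inside the measured method.
public static class DemoSink
{
    public static int Value;
}

[MemoryDiagnoser]                                  // report allocated bytes and GC counts
[ShortRunJob]                                      // fewer iterations, faster feedback
[Orderer(SummaryOrderPolicy.FastestToSlowest)]     // sort the summary by latency
public class DemoPositionalBenchmarks
{
    private string[] args = Array.Empty<string>();

    [GlobalSetup]
    public void Setup() => args = new[] { "math", "add", "4", "5" };

    [Benchmark]
    public int HandWrittenParse()
    {
        int sum = int.Parse(args[2]) + int.Parse(args[3]);
        DemoSink.Value = sum;
        return sum;
    }
}
```

Such a class would typically be run from a Release build with `BenchmarkRunner.Run<DemoPositionalBenchmarks>()` from the BenchmarkDotNet.Running namespace.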

CLI Benchmark Classes

flowchart TD
    A[CliBenchmarkBase] --> B[Setup: Initialize 3 Frameworks]
    A --> C[RunManifold]
    A --> D[RunSystemCommandLine]
    A --> E[RunConsoleAppFramework]
    B --> F[CliPositionalBenchmarks]
    B --> G[CliOptionBenchmarks]
    F --> H["math add 4 5"]
    G --> I["weather preview --city Tokyo --days 3"]

CliBenchmarkBase initializes all three frameworks in [GlobalSetup]:

  • Manifold: CliApplication with GeneratedOperationRegistry and GeneratedCliInvoker
  • System.CommandLine: Manually constructed RootCommand with Command, Argument<T>, and Option<T>
  • ConsoleAppFramework: ConsoleApp.Create() with lambda-based command registration

All three write results to a shared BenchmarkSink static field to ensure the comparison isolates parser and dispatcher overhead from I/O.

CliPositionalBenchmarks benchmarks parsing of ["math", "add", "4", "5"] (positional arguments).

CliOptionBenchmarks benchmarks parsing of ["weather", "preview", "--city", "Tokyo", "--days", "3"] (named options).

Sources: Manifold/benchmarks/Manifold.Benchmarks/CliBenchmarks.cs:14-161

MCP Benchmark Classes

The MCP benchmarks are organized into two tiers:

Microbenchmarks (measure isolated operations):

  • McpDiscoveryBenchmarks — tool catalog access latency
  • McpInvocationBenchmarks — local tool invocation overhead

Roundtrip-shape benchmarks (measure end-to-end response construction):

  • McpListToolsRoundtripBenchmarks — full tools/list JSON response generation
  • McpCallToolRoundtripBenchmarks — full tools/call JSON response generation
flowchart TD
    A[MCP Benchmarks] --> B[Microbenchmarks]
    A --> C[Roundtrip Benchmarks]
    B --> D[McpDiscoveryBenchmarks]
    B --> E[McpInvocationBenchmarks]
    C --> F[McpListToolsRoundtripBenchmarks]
    C --> G[McpCallToolRoundtripBenchmarks]
    D --> H[Catalog Access Latency]
    E --> I[Tool Dispatch Overhead]
    F --> J[JSON Response Generation]
    G --> J

The roundtrip benchmarks use ArrayBufferWriter<byte> and Utf8JsonWriter to produce JSON responses in memory, isolating server-side computation from transport and I/O.

Sources: Manifold/benchmarks/Manifold.Mcp.Benchmarks/McpBenchmarks.cs:19-273, Manifold/benchmarks/Manifold.Mcp.Benchmarks/McpRoundtripBenchmarks.cs:20-531

Benchmark Results

All results measured on Windows 11, .NET 10.0.1, BenchmarkDotNet ShortRunJob.

CLI Benchmark Results

| Scenario | Manifold | ConsoleAppFramework | System.CommandLine |
| --- | --- | --- | --- |
| Positional command | 22.61 ns / 0 B | 26.57 ns / 0 B | 1,730.82 ns / 4,688 B |
| Option-heavy command | 28.89 ns / 0 B | 24.76 ns / 0 B | 2,110.84 ns / 5,632 B |

Key observations:

  • Manifold and ConsoleAppFramework achieve zero heap allocations in both scenarios.
  • System.CommandLine allocates 4,688 to 5,632 bytes (roughly 4.6 to 5.5 KB) per invocation due to its object-model-based parsing architecture.
  • Manifold and ConsoleAppFramework are within ~4 ns of each other: Manifold is faster on the positional command and ConsoleAppFramework is faster on the option-heavy command. Both are roughly 65× to 85× faster than System.CommandLine.

MCP Microbenchmark Results

| Scenario | Manifold | ModelContextProtocol | McpToolkit | mcpdotnet |
| --- | --- | --- | --- | --- |
| Discovery | 0.9560 ns / 0 B | 1.0672 ns / 0 B | 0.9664 ns / 0 B | 0.9727 ns / 0 B |
| Invocation | 36.92 ns / 0 B | 0.0261 ns / 0 B* | 151.88 ns / 96 B | 51.75 ns / 256 B |

*At 0.0261 ns, the ModelContextProtocol invocation result falls in the range BenchmarkDotNet flags as ZeroMeasurement, meaning the measured work is indistinguishable from an empty method. Treat it as a weak comparison point.

MCP Roundtrip Benchmark Results

| Scenario | Manifold | ModelContextProtocol | McpToolkit | mcpdotnet |
| --- | --- | --- | --- | --- |
| tools/list response | 756.3 ns / 0 B | 754.6 ns / 0 B | 635.4 ns / 0 B | 816.2 ns / 0 B |
| tools/call response | 47.16 ns / 0 B | 68.56 ns / 0 B | 146.34 ns / 96 B | 93.20 ns / 256 B |

Key observations:

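  • For tools/call response construction, Manifold's latency is about 31% lower than ModelContextProtocol's (about 1.45× faster), and it is about 3.1× faster than McpToolkit and about 2× faster than mcpdotnet.
  • For tools/list response construction, the four frameworks are much closer together (635.4 ns to 816.2 ns); McpToolkit is fastest, and Manifold and ModelContextProtocol are within 2 ns of each other.
  • Manifold maintains zero allocations across all MCP scenarios.
  • In the invocation and tools/call scenarios, McpToolkit allocates 96 bytes per call and mcpdotnet allocates 256 bytes; neither allocates in the discovery or tools/list scenarios.
  • The roundtrip-shape benchmarks are more representative than microbenchmarks for real-world comparison, as they include JSON response construction.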

Sources: Manifold/benchmarks/README.md:22-97
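
The ratios quoted in the observations can be recomputed from the tables with a short helper. DemoRatios is a hypothetical name used only for this worked example; the inputs are the mean latencies listed above.

```csharp
using System;

public static class DemoRatios
{
    // How many times faster the candidate is than the baseline.
    public static double Speedup(double baselineNs, double candidateNs) =>
        baselineNs / candidateNs;

    // How much lower the candidate's latency is, as a percentage of the baseline.
    public static double LatencyReductionPercent(double baselineNs, double candidateNs) =>
        (1.0 - candidateNs / baselineNs) * 100.0;
}

// CLI, System.CommandLine as baseline:
//   Speedup(1730.82, 22.61)  is about 76.6  (Manifold, positional)
//   Speedup(1730.82, 26.57)  is about 65.1  (ConsoleAppFramework, positional)
//   Speedup(2110.84, 28.89)  is about 73.1  (Manifold, option-heavy)
//   Speedup(2110.84, 24.76)  is about 85.3  (ConsoleAppFramework, option-heavy)
// MCP tools/call, Manifold as candidate:
//   LatencyReductionPercent(68.56, 47.16) is about 31.2 (vs ModelContextProtocol)
//   Speedup(146.34, 47.16)   is about 3.10  (vs McpToolkit)
//   Speedup(93.20, 47.16)    is about 1.98  (vs mcpdotnet)
```

Note that "31% lower latency" and "1.45× faster" describe the same pair of numbers; stating which one is meant avoids ambiguity when comparing frameworks.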

Performance Comparison Summary

flowchart TD
    subgraph CLI["CLI Dispatch Latency"]
        A["Manifold: ~23 ns"]
        B["ConsoleAppFramework: ~27 ns"]
        C["System.CommandLine: ~1,730 ns"]
    end

    subgraph MCP["MCP tools/call Latency"]
        D["Manifold: ~47 ns"]
        E["ModelContextProtocol: ~69 ns"]
        F["mcpdotnet: ~93 ns"]
        G["McpToolkit: ~146 ns"]
    end

    subgraph Alloc["Heap Allocations"]
        H["Manifold: 0 B"]
        I["ConsoleAppFramework: 0 B"]
        J["System.CommandLine: 4,688 B"]
        K["McpToolkit: 96 B"]
        L["mcpdotnet: 256 B"]
    end

Running Benchmarks

Benchmarks are executed via the build/benchmark.ps1 PowerShell script:

# Run all benchmarks
./build/benchmark.ps1

# Run CLI benchmarks only
./build/benchmark.ps1 -Target cli

# Run MCP benchmarks only
./build/benchmark.ps1 -Target mcp

# Run with a BenchmarkDotNet filter
./build/benchmark.ps1 -Target cli -- --filter *Option*

The script:

  1. Selects the appropriate benchmark project(s) based on the -Target parameter (cli, mcp, or all)
  2. Runs via dotnet run -c Release
  3. Outputs results to .artifacts/benchmark-output/<guid>/<project-name>/
  4. Defaults to --filter * when no BenchmarkDotNet arguments are provided (non-interactive mode)

Sources: Manifold/build/benchmark.ps1:1-60

Techniques Summary

| Technique | Where Applied | Benefit |
| --- | --- | --- |
| [StructLayout(LayoutKind.Explicit)] union | FastCliInvocationValue, FastMcpInvocationValue | 16-byte overlay holds any supported primitive without boxing |
| readonly struct result types | FastCliInvocationResult, FastMcpInvocationResult | Returned by value, so no per-call heap allocation or GC pressure |
| [MethodImpl(MethodImplOptions.AggressiveInlining)] | Factory methods, dispatch, response writers | Asks the JIT to inline, reducing call-site overhead |
| FrozenDictionary | CliApplication.commandCandidatesByFirstToken | O(1) immutable lookup, optimized for reads by the runtime |
| Generated switch dispatch | GeneratedMcpInvoker | Compiler-lowered string switch, no dictionary overhead |
| Direct IBufferWriter<byte> writes | McpTextContentResponseWriter | Bypasses Utf8JsonWriter for common types |
| Utf8Formatter.TryFormat | WriteInt32, WriteInt64 in response writer | In-place UTF-8 formatting, no string intermediaries |
| Pre-computed UTF-8 literals | ResponsePrefix, ResponseSuffix ("..."u8) | UTF-8 bytes embedded at compile time, no encoding at runtime |
| ValueTask<T> with sync completion check | IFastCliInvoker, IFastMcpToolInvoker | Avoids the async state machine when the result is already available |
