Performance and Benchmarks - Garume/Manifold GitHub Wiki
Manifold is designed from the ground up for zero-allocation hot paths in both CLI command dispatch and MCP tool invocation. This page documents the architectural techniques that make this possible (overlay structs, frozen dictionary dispatch, and aggressively inlined fast paths), along with the benchmark infrastructure used to measure Manifold and compare it against competing frameworks. For the broader system architecture, see Architecture Overview; for details of the result types, see Result Types and Formatting.
The performance-critical code lives in the Manifold.Cli and Manifold.Mcp packages, with the source generator (Manifold.Generators) emitting specialized dispatch code at compile time. The benchmark suites reside in benchmarks/Manifold.Benchmarks and benchmarks/Manifold.Mcp.Benchmarks.
The central performance goal in Manifold is to avoid heap allocations on the hot path for both CLI and MCP invocations. This is achieved through three interlocking mechanisms:
- Overlay value structs: `FastCliInvocationResult` and `FastMcpInvocationResult` use explicit struct layout to carry typed return values without boxing.
- Tiered invoker interfaces: synchronous fast paths (`IFastSyncCliInvoker`, `IFastSyncMcpToolInvoker`) avoid `ValueTask` overhead entirely; asynchronous fast paths (`IFastCliInvoker`, `IFastMcpToolInvoker`) use `ValueTask<T>` with completed-task short-circuits.
- Frozen dictionary / switch dispatch: CLI command lookup uses `FrozenDictionary<string, CliCommandCandidate[]>` for O(1) first-token resolution; MCP tool dispatch uses generated `switch` statements over tool names.
```mermaid
flowchart TD
    A[Incoming Request] --> B{Surface?}
    B -->|CLI| C[FrozenDictionary Lookup]
    B -->|MCP| D[Switch Dispatch]
    C --> E{Sync Fast Path?}
    D --> F{Sync Fast Path?}
    E -->|Yes| G[TryInvokeFastSync]
    E -->|No| H[TryInvokeFast]
    F -->|Yes| I[TryInvokeFastSync]
    F -->|No| J[TryInvokeFast]
    G --> K[FastCliInvocationResult]
    H --> K
    I --> L[FastMcpInvocationResult]
    J --> L
    K --> M[WriteFastResult]
    L --> N[WriteCallToolResponse]
    M --> O[Zero-Alloc Output]
    N --> O
```
Sources: Manifold/src/Manifold.Cli/CliApplication.cs:151-195, Manifold/src/Manifold.Cli/IFastCliInvoker.cs:1-19, Manifold/src/Manifold.Mcp/McpInvoker.cs:7-35
FastCliInvocationResult is a readonly struct that carries the result of a CLI operation without heap allocation. It stores a discriminating kind tag, a 16-byte value union, and a string reference that is used only for the Text kind.
```csharp
public readonly struct FastCliInvocationResult
{
    private readonly FastCliInvocationValue value; // 16-byte overlay union
    private readonly string? text;                 // reference, only for Text kind
    public FastCliInvocationKind Kind { get; }
}
```

The internal FastCliInvocationValue uses [StructLayout(LayoutKind.Explicit, Size = 16)] to overlay all supported primitive types at the same memory offset:
```csharp
using System;
using System.Runtime.InteropServices;

// Every field starts at byte 0, so only one of them is meaningful at a time;
// FastCliInvocationResult.Kind records which one.
[StructLayout(LayoutKind.Explicit, Size = 16)]
internal readonly struct FastCliInvocationValue
{
    [FieldOffset(0)] private readonly bool boolean;
    [FieldOffset(0)] private readonly int number;
    [FieldOffset(0)] private readonly long largeNumber;
    [FieldOffset(0)] private readonly double realNumber;
    [FieldOffset(0)] private readonly decimal preciseNumber;
    [FieldOffset(0)] private readonly Guid identifier;
    [FieldOffset(0)] private readonly DateTimeOffset timestamp;
}
```

All factory methods are marked with [MethodImpl(MethodImplOptions.AggressiveInlining)] so that the JIT is encouraged to inline them, removing call overhead.
Sources: Manifold/src/Manifold.Cli/FastCliInvocationResult.cs:6-103, Manifold/src/Manifold.Cli/FastCliInvocationResult.cs:118-239
| Kind | Enum value | .NET type | Storage size |
|---|---|---|---|
| None | 0 | n/a | n/a |
| Text | 1 | `string?` | Reference (separate `text` field, not in the union) |
| Boolean | 2 | `bool` | 1 byte |
| Number | 3 | `int` | 4 bytes |
| LargeNumber | 4 | `long` | 8 bytes |
| RealNumber | 5 | `double` | 8 bytes |
| PreciseNumber | 6 | `decimal` | 16 bytes |
| Identifier | 7 | `Guid` | 16 bytes |
| Timestamp | 8 | `DateTimeOffset` | 16 bytes |
Sources: Manifold/src/Manifold.Cli/FastCliInvocationResult.cs:105-116
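The overlay pattern described above can be reproduced in a few dozen lines. The following is a minimal sketch, not Manifold's source: `SketchKind`, `SketchValue`, and `SketchResult` are hypothetical names, only a subset of the kinds in the table is modeled, and the tag check inside each accessor is an assumption of the sketch rather than documented Manifold behavior.

```csharp
#nullable enable
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical kind tag mirroring part of the table above.
public enum SketchKind : byte { None = 0, Text = 1, Boolean = 2, Number = 3, LargeNumber = 4, RealNumber = 5, PreciseNumber = 6 }

// All members share offset 0, so the union is as large as its widest member (decimal, 16 bytes).
[StructLayout(LayoutKind.Explicit, Size = 16)]
internal readonly struct SketchValue
{
    [FieldOffset(0)] internal readonly bool Boolean;
    [FieldOffset(0)] internal readonly int Number;
    [FieldOffset(0)] internal readonly long LargeNumber;
    [FieldOffset(0)] internal readonly double RealNumber;
    [FieldOffset(0)] internal readonly decimal PreciseNumber;

    // ": this()" zeroes all 16 bytes before a single member is written.
    internal SketchValue(bool value) : this() => Boolean = value;
    internal SketchValue(int value) : this() => Number = value;
    internal SketchValue(long value) : this() => LargeNumber = value;
    internal SketchValue(double value) : this() => RealNumber = value;
    internal SketchValue(decimal value) : this() => PreciseNumber = value;
}

public readonly struct SketchResult
{
    private readonly SketchValue value; // primitives live here, never boxed
    private readonly string? text;      // only populated for SketchKind.Text

    public SketchKind Kind { get; }

    private SketchResult(SketchKind kind, SketchValue value, string? text)
    {
        Kind = kind;
        this.value = value;
        this.text = text;
    }

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static SketchResult FromText(string? text) => new(SketchKind.Text, default, text);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static SketchResult FromBoolean(bool value) => new(SketchKind.Boolean, new SketchValue(value), null);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static SketchResult FromNumber(int value) => new(SketchKind.Number, new SketchValue(value), null);

    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static SketchResult FromPreciseNumber(decimal value) => new(SketchKind.PreciseNumber, new SketchValue(value), null);

    // Accessors check the tag so the same bytes are never reinterpreted as the wrong type.
    public string? Text => Kind == SketchKind.Text ? text : throw Mismatch(SketchKind.Text);
    public bool Boolean => Kind == SketchKind.Boolean ? value.Boolean : throw Mismatch(SketchKind.Boolean);
    public int Number => Kind == SketchKind.Number ? value.Number : throw Mismatch(SketchKind.Number);
    public decimal PreciseNumber => Kind == SketchKind.PreciseNumber ? value.PreciseNumber : throw Mismatch(SketchKind.PreciseNumber);

    private InvalidOperationException Mismatch(SketchKind expected) => new($"Result kind is {Kind}, not {expected}.");
}
```

Because both structs are value types, a call such as `SketchResult.FromNumber(42)` returns its result by value with no heap allocation; only the Text kind holds a reference.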
```mermaid
classDiagram
    class FastCliInvocationResult {
        +FastCliInvocationKind Kind
        +string? Text
        +bool Boolean
        +int Number
        +long LargeNumber
        +double RealNumber
        +decimal PreciseNumber
        +Guid Identifier
        +DateTimeOffset Timestamp
        +FromText(string?) FastCliInvocationResult
        +FromBoolean(bool) FastCliInvocationResult
        +FromNumber(int) FastCliInvocationResult
    }
    class FastCliInvocationValue {
        <<StructLayout Explicit 16 bytes>>
        -bool boolean
        -int number
        -long largeNumber
        -double realNumber
        -decimal preciseNumber
        -Guid identifier
        -DateTimeOffset timestamp
    }
    FastCliInvocationResult *-- FastCliInvocationValue : contains
```
FastMcpInvocationResult applies the same overlay pattern to the MCP surface (structs cannot inherit, so it is a parallel type rather than an extension). It adds a Structured kind for returning serializable objects.

```csharp
using System;

public readonly struct FastMcpInvocationResult
{
    private readonly FastMcpInvocationValue value; // 16-byte overlay union
    private readonly object? reference;            // for Text and Structured
    private readonly Type? structuredValueType;    // metadata for Structured
    public FastMcpInvocationKind Kind { get; }
}
```

The key difference from the CLI variant is the Structured kind (value 9), which carries an object? reference together with its runtime Type for JSON serialization. The primitive kinds share the same zero-allocation overlay design, while Text is held in the reference field rather than in the union.
Sources: Manifold/src/Manifold.Mcp/FastMcpInvocationResult.cs:6-119, Manifold/src/Manifold.Mcp/FastMcpInvocationResult.cs:121-133
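A minimal sketch of how an object reference plus its Type can drive JSON serialization for a Structured result. `StructuredSketch` and `WeatherSketch` are hypothetical names; whether Manifold captures the type with `typeof(T)` at the factory, as done here, is an assumption.

```csharp
#nullable enable
using System;
using System.Text.Json;

// Hypothetical stand-in for the Structured kind: an object reference plus the type to serialize it as.
public readonly struct StructuredSketch
{
    private readonly object? reference;
    private readonly Type? structuredValueType;

    private StructuredSketch(object? reference, Type? structuredValueType)
    {
        this.reference = reference;
        this.structuredValueType = structuredValueType;
    }

    // Capturing typeof(T) keeps the declared type available even when the value is null.
    public static StructuredSketch FromStructured<T>(T value) => new(value, typeof(T));

    // JsonSerializer.Serialize(object?, Type, JsonSerializerOptions?) uses the captured type.
    public string ToJson(JsonSerializerOptions? options = null) =>
        JsonSerializer.Serialize(reference, structuredValueType ?? typeof(object), options);
}

// Hypothetical payload type.
public sealed record WeatherSketch(string City, int Days);
```

Storing the `Type` alongside the reference lets the writer hand the serializer exact type metadata instead of rediscovering it from `object`.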
Both the CLI and MCP surfaces define a three-tier invoker hierarchy (standard, async fast, and sync fast). For each surface, the source generator emits a single class that implements all three tiers.
```mermaid
flowchart TD
    A[ICliInvoker] --> B[Standard Path]
    C[IFastCliInvoker] --> D[Async Fast Path]
    E[IFastSyncCliInvoker] --> F[Sync Fast Path]
    G[IMcpToolInvoker] --> H[Standard Path]
    I[IFastMcpToolInvoker] --> J[Async Fast Path]
    K[IFastSyncMcpToolInvoker] --> L[Sync Fast Path]
    B --> M[OperationInvocationResult]
    D --> N["ValueTask<FastCliInvocationResult>"]
    F --> O[FastCliInvocationResult]
    H --> P["ValueTask<OperationInvocationResult>"]
    J --> Q["ValueTask<FastMcpInvocationResult>"]
    L --> R[FastMcpInvocationResult]
```
```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Fastest: fully synchronous, no ValueTask overhead
public interface IFastSyncCliInvoker
{
    bool TryInvokeFastSync(
        string[] commandTokens,
        IServiceProvider? services,
        CancellationToken cancellationToken,
        out FastCliInvocationResult invocation);
}

// Fast: async, but callers can short-circuit an already-completed ValueTask
public interface IFastCliInvoker
{
    bool TryInvokeFast(
        string[] commandTokens,
        IServiceProvider? services,
        CancellationToken cancellationToken,
        out ValueTask<FastCliInvocationResult> invocation);
}
```

Sources: Manifold/src/Manifold.Cli/IFastCliInvoker.cs:1-19
```csharp
using System;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;

// Fastest: fully synchronous
public interface IFastSyncMcpToolInvoker
{
    bool TryInvokeFastSync(
        string toolName, JsonElement? arguments,
        IServiceProvider? services, CancellationToken cancellationToken,
        out FastMcpInvocationResult invocation);
}

// Fast: async with ValueTask
public interface IFastMcpToolInvoker
{
    bool TryInvokeFast(
        string toolName, JsonElement? arguments,
        IServiceProvider? services, CancellationToken cancellationToken,
        out ValueTask<FastMcpInvocationResult> invocation);
}
```

Sources: Manifold/src/Manifold.Mcp/McpInvoker.cs:17-35
CliApplication builds a FrozenDictionary<string, CliCommandCandidate[]> at construction time from the registered operation descriptors. This provides O(1) command lookup by the first token of the command path.
```csharp
private readonly FrozenDictionary<string, CliCommandCandidate[]> commandCandidatesByFirstToken;
```

The BuildCliState method processes all operations:
- Filters out MCP-only operations
- Groups command candidates by first token (case-insensitive)
- Sorts candidates within each group by path length (longest first, for greedy matching)
- Freezes the dictionary with an `OrdinalIgnoreCase` comparer
```mermaid
flowchart TD
    A[OperationDescriptors] --> B[Filter CliVisible]
    B --> C[Group by First Token]
    C --> D[Sort by Path Length]
    D --> E[ToFrozenDictionary]
    E --> F["FrozenDictionary<string, CliCommandCandidate[]>"]
    F --> G["O(1) Lookup at Runtime"]
```
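The build steps listed above (filter, group by first token, sort longest-first, freeze case-insensitively) can be sketched with LINQ and `System.Collections.Frozen` (.NET 8 or later). `CommandCandidateSketch` and `CommandTableSketch` are hypothetical; the greedy `Match` loop is an assumption about how sorted candidates are consumed, not Manifold's code.

```csharp
#nullable enable
using System;
using System.Collections.Frozen;
using System.Collections.Generic;
using System.Linq;

// Hypothetical candidate: a command path plus whether it is visible to the CLI.
public sealed record CommandCandidateSketch(string[] Path, bool CliVisible);

public static class CommandTableSketch
{
    public static FrozenDictionary<string, CommandCandidateSketch[]> Build(
        IEnumerable<CommandCandidateSketch> candidates) =>
        candidates
            .Where(c => c.CliVisible && c.Path.Length > 0)              // drop MCP-only operations
            .GroupBy(c => c.Path[0], StringComparer.OrdinalIgnoreCase)  // group by first token
            .ToFrozenDictionary(
                g => g.Key,
                g => g.OrderByDescending(c => c.Path.Length).ToArray(), // longest path first
                StringComparer.OrdinalIgnoreCase);                       // freeze, case-insensitive

    // Greedy match: the first (longest) candidate whose whole path prefixes the tokens wins.
    public static CommandCandidateSketch? Match(
        FrozenDictionary<string, CommandCandidateSketch[]> table, string[] tokens)
    {
        if (tokens.Length == 0 || !table.TryGetValue(tokens[0], out var group))
            return null;

        foreach (CommandCandidateSketch candidate in group)
        {
            if (candidate.Path.Length > tokens.Length)
                continue;

            // Token 0 already matched via the dictionary; compare the rest.
            bool matches = true;
            for (int i = 1; i < candidate.Path.Length && matches; i++)
                matches = string.Equals(candidate.Path[i], tokens[i], StringComparison.OrdinalIgnoreCase);

            if (matches)
                return candidate;
        }

        return null;
    }
}
```

Sorting longest-first is what makes the first prefix match the most specific command, so `math add 4 5` resolves to `math add` rather than a shorter `math` command.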
The fast-path dispatch method (TryExecuteArrayFastPath) is marked [MethodImpl(MethodImplOptions.AggressiveInlining)] and follows a strict priority order:
- Try `IFastSyncCliInvoker.TryInvokeFastSync()`, the zero-allocation synchronous path.
- Try `IFastCliInvoker.TryInvokeFast()` and check `IsCompletedSuccessfully` to detect synchronous completion.
- Fall back to the fully asynchronous `AwaitFastInvocationAsync`.
Sources: Manifold/src/Manifold.Cli/CliApplication.cs:17-24, Manifold/src/Manifold.Cli/CliApplication.cs:151-195, Manifold/src/Manifold.Cli/CliApplication.cs:197-239
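The priority order above can be sketched as follows. This is a simplified illustration under stated assumptions: the hypothetical interfaces return an `int` instead of `FastCliInvocationResult` and drop the services and cancellation parameters, and `SyncAddSketch` / `AsyncAddSketch` are made-up demo commands.

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical, simplified tiers (int stands in for FastCliInvocationResult).
public interface ISyncTierSketch
{
    bool TryInvokeFastSync(string[] tokens, out int result);
}

public interface IAsyncTierSketch
{
    bool TryInvokeFast(string[] tokens, out ValueTask<int> result);
}

public static class FastPathSketch
{
    public static ValueTask<int> Execute(object invoker, string[] tokens)
    {
        // 1. Fully synchronous tier: no Task and no state machine are involved.
        if (invoker is ISyncTierSketch sync && sync.TryInvokeFastSync(tokens, out int value))
            return new ValueTask<int>(value);

        // 2. Async tier: if the ValueTask already finished, unwrap it without awaiting.
        if (invoker is IAsyncTierSketch fast && fast.TryInvokeFast(tokens, out ValueTask<int> pending))
        {
            if (pending.IsCompletedSuccessfully)
                return new ValueTask<int>(pending.Result);

            // 3. Genuinely pending work: only now pay for an async state machine.
            return AwaitSlowAsync(pending);
        }

        // A standard (non-fast) invoker would take over here; out of scope for this sketch.
        throw new InvalidOperationException("No fast-path tier handled the command.");
    }

    private static async ValueTask<int> AwaitSlowAsync(ValueTask<int> pending) => await pending;
}

// Hypothetical command that always completes synchronously ("math add 4 5").
public sealed class SyncAddSketch : ISyncTierSketch
{
    public bool TryInvokeFastSync(string[] tokens, out int result)
    {
        result = int.Parse(tokens[2]) + int.Parse(tokens[3]);
        return true;
    }
}

// Hypothetical command that only offers the async tier and really suspends.
public sealed class AsyncAddSketch : IAsyncTierSketch
{
    public bool TryInvokeFast(string[] tokens, out ValueTask<int> result)
    {
        result = AddLaterAsync(int.Parse(tokens[2]), int.Parse(tokens[3]));
        return true;
    }

    private static async ValueTask<int> AddLaterAsync(int a, int b)
    {
        await Task.Yield(); // forces an asynchronous completion
        return a + b;
    }
}
```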
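For MCP tool invocation, the source generator emits a switch statement over tool names in the GeneratedMcpInvoker class. This avoids dictionary lookups entirely: the C# compiler lowers a switch over constant strings into direct comparisons or hash- and length-based branching, so no lookup structure has to be built or consulted at runtime.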
The generated invoker implements all three MCP invoker interfaces:
- `TryInvokeFastSync()`: `switch (toolName)` dispatching to `Invoke{Operation}FastSync()`
- `TryInvokeFast()`: `switch (toolName)` dispatching to `Invoke{Operation}FastAsync()`
- `TryInvoke()`: `switch (toolName)` dispatching to `Invoke{Operation}Async()`
For tool discovery, GeneratedMcpCatalog exposes a static McpToolDescriptor[] array with an AsSpan() method for zero-allocation enumeration.
Sources: Manifold/src/Manifold.Generators/OperationDescriptorGenerator.cs:1919-1942
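A hand-written approximation of what such generated code could look like. It is not the generator's output: the tool names `math.add` and `weather.preview`, the `int` arguments (standing in for `JsonElement?`), and all `*Sketch` types are hypothetical. It shows constant-string `switch` dispatch for two tiers and a span-based catalog.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical descriptor; the real McpToolDescriptor carries more metadata.
public sealed record ToolDescriptorSketch(string Name, string Description);

public static class GeneratedCatalogSketch
{
    private static readonly ToolDescriptorSketch[] Tools =
    {
        new("math.add", "Adds two integers."),
        new("weather.preview", "Previews a forecast."),
    };

    // A span over the static array: enumeration without copying or allocating an enumerator.
    public static ReadOnlySpan<ToolDescriptorSketch> AsSpan() => Tools;
}

public sealed class GeneratedInvokerSketch
{
    // Sync tier: only tools that can finish synchronously appear in this switch.
    public bool TryInvokeFastSync(string toolName, int a, int b, out int result)
    {
        switch (toolName)
        {
            case "math.add": result = a + b; return true;
            default: result = 0; return false; // unknown or async-only tool
        }
    }

    // Async tier: same dispatch shape, returning ValueTask for tools that may suspend.
    public bool TryInvokeFast(string toolName, int a, int b, out ValueTask<int> result)
    {
        switch (toolName)
        {
            case "math.add": result = new ValueTask<int>(a + b); return true;
            case "weather.preview": result = PreviewAsync(a); return true;
            default: result = default; return false;
        }
    }

    private static async ValueTask<int> PreviewAsync(int days)
    {
        await Task.Yield(); // simulates real asynchronous work
        return days;
    }
}
```

A `false` return from one tier simply tells the caller to try the next one, which is how a sync-first dispatcher can fall through to the async path.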
McpTextContentResponseWriter constructs MCP tools/call response JSON directly into an IBufferWriter<byte> without using Utf8JsonWriter for the most common result types.
```csharp
// Excerpt from McpTextContentResponseWriter (IBufferWriter<byte> lives in System.Buffers).
public static void WriteCallToolResponse(
    IBufferWriter<byte> writer, in FastMcpInvocationResult invocation)
{
    switch (invocation.Kind)
    {
        case FastMcpInvocationKind.None: WriteEmpty(writer); return;
        case FastMcpInvocationKind.Boolean: WriteBoolean(writer, invocation.Boolean); return;
        case FastMcpInvocationKind.Number: WriteInt32(writer, invocation.Number); return;
        case FastMcpInvocationKind.LargeNumber: WriteInt64(writer, invocation.LargeNumber); return;
    }

    // Every other kind goes through the Utf8JsonWriter-based path.
    WriteSlow(writer, in invocation);
}
```

The fast-path methods write pre-computed UTF-8 prefix and suffix bytes (ResponsePrefix, ResponseSuffix) directly into the span returned by IBufferWriter<byte>.GetSpan(), and format numeric values in place with Utf8Formatter.TryFormat. The slow path falls back to Utf8JsonWriter for all remaining kinds (Text, RealNumber, PreciseNumber, Identifier, Timestamp, and Structured).
```mermaid
flowchart TD
    A[WriteCallToolResponse] --> B{Kind?}
    B -->|None| C[WriteEmpty]
    B -->|Boolean| D[WriteBoolean]
    B -->|Number| E[WriteInt32]
    B -->|LargeNumber| F[WriteInt64]
    B -->|Other| G[WriteSlow]
    C --> H[Direct UTF-8 Bytes]
    D --> H
    E --> I[Utf8Formatter]
    F --> I
    I --> H
    G --> J[Utf8JsonWriter]
```
Sources: Manifold/src/Manifold.Mcp/McpTextContentResponseWriter.cs:9-131
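A minimal sketch of the direct-write technique: `u8` literals for the fixed JSON around the value, and `Utf8Formatter.TryFormat` writing digits straight into the destination span. The JSON shape used here (an MCP `CallToolResult` with a single text content item, without the JSON-RPC envelope) and the `TextContentWriterSketch` name are assumptions; the actual ResponsePrefix/ResponseSuffix bytes are not shown in the source. One plausible reason these kinds can skip Utf8JsonWriter is that digits, a minus sign, and `true`/`false` never need JSON string escaping.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;

public static class TextContentWriterSketch
{
    // u8 literals are UTF-8 bytes embedded at compile time; no array is allocated at runtime.
    private static ReadOnlySpan<byte> Prefix => "{\"content\":[{\"type\":\"text\",\"text\":\""u8;
    private static ReadOnlySpan<byte> Suffix => "\"}]}"u8;

    public static void WriteInt64(IBufferWriter<byte> writer, long value)
    {
        // 20 bytes fits any long in decimal, including "-9223372036854775808".
        Span<byte> span = writer.GetSpan(Prefix.Length + 20 + Suffix.Length);
        Prefix.CopyTo(span);
        int written = Prefix.Length;

        // Format the digits in place: no intermediate string, no Utf8JsonWriter.
        if (!Utf8Formatter.TryFormat(value, span.Slice(written), out int digits))
            throw new InvalidOperationException("Destination span too small.");
        written += digits;

        Suffix.CopyTo(span.Slice(written));
        written += Suffix.Length;
        writer.Advance(written);
    }

    public static void WriteBoolean(IBufferWriter<byte> writer, bool value)
    {
        ReadOnlySpan<byte> literal = value ? "true"u8 : "false"u8;
        int total = Prefix.Length + literal.Length + Suffix.Length;

        Span<byte> span = writer.GetSpan(total);
        Prefix.CopyTo(span);
        literal.CopyTo(span.Slice(Prefix.Length));
        Suffix.CopyTo(span.Slice(Prefix.Length + literal.Length));
        writer.Advance(total);
    }
}
```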
| Project | Location | Competing Frameworks |
|---|---|---|
| Manifold.Benchmarks | `benchmarks/Manifold.Benchmarks/` | System.CommandLine v2.0.5, ConsoleAppFramework v5.7.13 |
| Manifold.Mcp.Benchmarks | `benchmarks/Manifold.Mcp.Benchmarks/` | ModelContextProtocol, McpToolkit v0.1.3, mcpdotnet v1.1.0.1 |
Both projects use BenchmarkDotNet with the following configuration attributes:
- `[MemoryDiagnoser]`: reports GC collection counts per generation and bytes allocated per operation
- `[ShortRunJob]`: reduced iteration count for faster development cycles
- `[Orderer(SummaryOrderPolicy.FastestToSlowest)]`: results sorted from fastest to slowest
Sources: Manifold/benchmarks/Manifold.Benchmarks/CliBenchmarks.cs:11-13, Manifold/benchmarks/README.md:1-111
```mermaid
flowchart TD
    A[CliBenchmarkBase] --> B[Setup: Initialize 3 Frameworks]
    A --> C[RunManifold]
    A --> D[RunSystemCommandLine]
    A --> E[RunConsoleAppFramework]
    B --> F[CliPositionalBenchmarks]
    B --> G[CliOptionBenchmarks]
    F --> H["math add 4 5"]
    G --> I["weather preview --city Tokyo --days 3"]
```
CliBenchmarkBase initializes all three frameworks in [GlobalSetup]:
- Manifold: `CliApplication` with `GeneratedOperationRegistry` and `GeneratedCliInvoker`
- System.CommandLine: manually constructed `RootCommand` with `Command`, `Argument<T>`, and `Option<T>`
- ConsoleAppFramework: `ConsoleApp.Create()` with lambda-based command registration
All three write results to a shared BenchmarkSink static field to ensure the comparison isolates parser and dispatcher overhead from I/O.
CliPositionalBenchmarks benchmarks parsing of ["math", "add", "4", "5"] (positional arguments).
CliOptionBenchmarks benchmarks parsing of ["weather", "preview", "--city", "Tokyo", "--days", "3"] (named options).
Sources: Manifold/benchmarks/Manifold.Benchmarks/CliBenchmarks.cs:14-161
The MCP benchmarks are organized into two tiers:

Microbenchmarks (measure isolated operations):
- `McpDiscoveryBenchmarks`: tool catalog access latency
- `McpInvocationBenchmarks`: local tool invocation overhead

Roundtrip-shape benchmarks (measure end-to-end response construction):
- `McpListToolsRoundtripBenchmarks`: full `tools/list` JSON response generation
- `McpCallToolRoundtripBenchmarks`: full `tools/call` JSON response generation
```mermaid
flowchart TD
    A[MCP Benchmarks] --> B[Microbenchmarks]
    A --> C[Roundtrip Benchmarks]
    B --> D[McpDiscoveryBenchmarks]
    B --> E[McpInvocationBenchmarks]
    C --> F[McpListToolsRoundtripBenchmarks]
    C --> G[McpCallToolRoundtripBenchmarks]
    D --> H[Catalog Access Latency]
    E --> I[Tool Dispatch Overhead]
    F --> J[JSON Response Generation]
    G --> J
```
The roundtrip benchmarks use ArrayBufferWriter<byte> and Utf8JsonWriter to produce JSON responses in memory, isolating server-side computation from transport and I/O.
Sources: Manifold/benchmarks/Manifold.Mcp.Benchmarks/McpBenchmarks.cs:19-273, Manifold/benchmarks/Manifold.Mcp.Benchmarks/McpRoundtripBenchmarks.cs:20-531
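A minimal sketch of the in-memory roundtrip shape: one `ArrayBufferWriter<byte>` and one `Utf8JsonWriter` reused across iterations, so no I/O or transport enters the measurement. `InMemoryResponseSketch` is hypothetical, and the simplified `tools/list` payload (names only, no descriptions or schemas) is an assumption rather than the benchmark's actual output.

```csharp
using System;
using System.Buffers;
using System.Text.Json;

public sealed class InMemoryResponseSketch : IDisposable
{
    private readonly ArrayBufferWriter<byte> buffer = new(256);
    private readonly Utf8JsonWriter json;

    public InMemoryResponseSketch() => json = new Utf8JsonWriter(buffer);

    // The returned span is only valid until the next call, which reuses the buffer.
    public ReadOnlySpan<byte> WriteListTools(ReadOnlySpan<string> toolNames)
    {
        buffer.Clear();     // keep the capacity, discard the previous payload
        json.Reset(buffer); // rebind the writer instead of allocating a new one

        json.WriteStartObject();
        json.WriteStartArray("tools");
        foreach (string name in toolNames)
        {
            json.WriteStartObject();
            json.WriteString("name", name);
            json.WriteEndObject();
        }
        json.WriteEndArray();
        json.WriteEndObject();
        json.Flush();       // push buffered bytes into the ArrayBufferWriter

        return buffer.WrittenSpan;
    }

    public void Dispose() => json.Dispose();
}
```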
All results were measured on Windows 11 with .NET 10.0.1 using the BenchmarkDotNet ShortRunJob.

CLI benchmarks:

| Scenario | Manifold | ConsoleAppFramework | System.CommandLine |
|---|---|---|---|
| Positional command | 22.61 ns / 0 B | 26.57 ns / 0 B | 1,730.82 ns / 4,688 B |
| Option-heavy command | 28.89 ns / 0 B | 24.76 ns / 0 B | 2,110.84 ns / 5,632 B |
Key observations:
- Manifold and ConsoleAppFramework achieve zero heap allocations in both scenarios.
- System.CommandLine allocates roughly 4.7 to 5.6 KB per invocation (4,688 B and 5,632 B) due to its object-model-based parsing architecture.
- Manifold is within ~4 ns of ConsoleAppFramework (faster on the positional command, slower on the option-heavy one), and both frameworks are roughly 65 to 85× faster than System.CommandLine.
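The speedups relative to System.CommandLine follow directly from the mean times in the CLI table (positional scenario first, then option-heavy; Manifold before ConsoleAppFramework in each pair):

$$
\frac{1730.82}{22.61} \approx 76.6,\quad \frac{1730.82}{26.57} \approx 65.1,\quad \frac{2110.84}{28.89} \approx 73.1,\quad \frac{2110.84}{24.76} \approx 85.3
$$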
MCP microbenchmarks:

| Scenario | Manifold | ModelContextProtocol | McpToolkit | mcpdotnet |
|---|---|---|---|---|
| Discovery | 0.9560 ns / 0 B | 1.0672 ns / 0 B | 0.9664 ns / 0 B | 0.9727 ns / 0 B |
| Invocation | 36.92 ns / 0 B | 0.0261 ns / 0 B* | 151.88 ns / 96 B | 51.75 ns / 256 B |
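*The ModelContextProtocol invocation microbenchmark (0.0261 ns) is near BenchmarkDotNet's ZeroMeasurement threshold, meaning it is practically indistinguishable from an empty method, so it should be treated as a weak comparison point.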
MCP roundtrip-shape benchmarks:

| Scenario | Manifold | ModelContextProtocol | McpToolkit | mcpdotnet |
|---|---|---|---|---|
| `tools/list` response | 756.3 ns / 0 B | 754.6 ns / 0 B | 635.4 ns / 0 B | 816.2 ns / 0 B |
| `tools/call` response | 47.16 ns / 0 B | 68.56 ns / 0 B | 146.34 ns / 96 B | 93.20 ns / 256 B |
Key observations:
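- For `tools/call` response construction, Manifold takes about 31% less time than ModelContextProtocol (roughly 1.45× faster), and is about 3.1× faster than McpToolkit and 2× faster than mcpdotnet.
- For `tools/list` response construction, McpToolkit is fastest (635.4 ns); Manifold (756.3 ns) is on par with ModelContextProtocol (754.6 ns) and ahead of mcpdotnet (816.2 ns).
- Manifold maintains zero allocations across all MCP scenarios.
- McpToolkit allocates 96 bytes per invocation; mcpdotnet allocates 256 bytes.
- The roundtrip-shape benchmarks are more representative than the microbenchmarks for real-world comparison, because they include JSON response construction.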
Sources: Manifold/benchmarks/README.md:22-97
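The `tools/call` comparisons come from the roundtrip table: the time saved relative to ModelContextProtocol, then the speedup factors over ModelContextProtocol, McpToolkit, and mcpdotnet:

$$
1 - \frac{47.16}{68.56} \approx 0.312,\quad \frac{68.56}{47.16} \approx 1.45,\quad \frac{146.34}{47.16} \approx 3.10,\quad \frac{93.20}{47.16} \approx 1.98
$$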
```mermaid
flowchart TD
    subgraph CLI["CLI Dispatch Latency (positional command)"]
        A["Manifold: ~23 ns"]
        B["ConsoleAppFramework: ~27 ns"]
        C["System.CommandLine: ~1,730 ns"]
    end
    subgraph MCP["MCP tools/call Latency"]
        D["Manifold: ~47 ns"]
        E["ModelContextProtocol: ~69 ns"]
        F["mcpdotnet: ~93 ns"]
        G["McpToolkit: ~146 ns"]
    end
    subgraph Alloc["Heap Allocations per Invocation"]
        H["Manifold: 0 B"]
        I["ConsoleAppFramework: 0 B"]
        J["System.CommandLine: 4,688 B (positional)"]
        K["McpToolkit: 96 B"]
        L["mcpdotnet: 256 B"]
    end
```
Benchmarks are executed via the build/benchmark.ps1 PowerShell script:
```powershell
# Run all benchmarks
./build/benchmark.ps1

# Run CLI benchmarks only
./build/benchmark.ps1 -Target cli

# Run MCP benchmarks only
./build/benchmark.ps1 -Target mcp

# Run with a BenchmarkDotNet filter (arguments after -- are passed through)
./build/benchmark.ps1 -Target cli -- --filter *Option*
```

The script:
- Selects the appropriate benchmark project(s) based on the `-Target` parameter (`cli`, `mcp`, or `all`)
- Runs via `dotnet run -c Release`
- Outputs results to `.artifacts/benchmark-output/<guid>/<project-name>/`
- Defaults to `--filter *` when no BenchmarkDotNet arguments are provided (non-interactive mode)
Sources: Manifold/build/benchmark.ps1:1-60
| Technique | Where Applied | Benefit |
|---|---|---|
| `[StructLayout(LayoutKind.Explicit)]` union | `FastCliInvocationValue`, `FastMcpInvocationValue` | 16-byte overlay stores every supported primitive type without boxing |
| `readonly struct` result types | `FastCliInvocationResult`, `FastMcpInvocationResult` | Value types passed by value, so no per-result heap allocation or GC pressure |
| `[MethodImpl(AggressiveInlining)]` | Factory methods, dispatch, response writers | Encourages the JIT to inline, removing call overhead |
| `FrozenDictionary` | `CliApplication.commandCandidatesByFirstToken` | Immutable O(1) lookup, optimized at construction for fast reads |
| Generated `switch` dispatch | `GeneratedMcpInvoker` | Compiler-lowered string switch, no dictionary overhead |
| Direct `IBufferWriter<byte>` writes | `McpTextContentResponseWriter` | Bypasses `Utf8JsonWriter` for common types |
| `Utf8Formatter.TryFormat` | `WriteInt32`, `WriteInt64` in the response writer | In-place UTF-8 formatting, no intermediate strings |
| Pre-computed UTF-8 literals | `ResponsePrefix`, `ResponseSuffix` (`"..."u8`) | Bytes embedded at compile time, no encoding at runtime |
| `ValueTask<T>` with sync completion check | `IFastCliInvoker`, `IFastMcpToolInvoker` | Skips awaiting (and the async state machine) when the result is already available |
- Architecture Overview: system architecture and package relationships
- Core Contracts (Manifold Package): foundation types used by the fast-path invokers
- CLI Runtime (Manifold.Cli): full CLI dispatch pipeline, including fast-path details
- MCP Runtime (Manifold.Mcp): MCP invoker interfaces and response formatting
- Source Generator (Manifold.Generators): how `GeneratedCliInvoker` and `GeneratedMcpInvoker` are emitted
- Result Types and Formatting: detailed documentation of `FastCliInvocationResult` and `FastMcpInvocationResult`
- Build System and Scripts: the `benchmark.ps1` script and build infrastructure
- Project Structure and Tech Stack: benchmark project locations and dependencies