
Benchmarking Guide

Moca has a three-tier benchmarking suite that measures individual functions, composed request pipelines, and underlying infrastructure. CI automatically detects performance regressions on every pull request.


The Three Tiers

Tier 1 — Critical Hot Path

Benchmarks for functions that execute on every single API request. A regression here affects every user of every tenant.

| What's Tested | Why It Matters |
| --- | --- |
| MetaType registry lookup (sync.Map, Redis, PostgreSQL fallback) | Called on every document operation. L1 cache hit should be near-instant. |
| Document CRUD (Get, GetList, Insert) | The core read and write operations behind every API endpoint. |
| SQL query builder | Generates parameterized queries for every list view and document fetch. |
| HTTP middleware chain | RequestID, CORS, tenant resolution, auth, rate limiting; every request traverses the full chain. |
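
Tier 1 benchmarks follow the standard Go `testing.B` pattern with allocation tracking enabled, since the hot path is expected to stay at zero allocations. The sketch below is illustrative only; the registry type, constructor, and key are hypothetical stand-ins, not Moca's actual API.

```go
package registry_test

import (
	"sync"
	"testing"
)

// metaTypeRegistry is a hypothetical in-memory stand-in for the
// MetaType registry's L1 cache.
type metaTypeRegistry struct{ l1 sync.Map }

func (r *metaTypeRegistry) Get(name string) (any, bool) { return r.l1.Load(name) }

func BenchmarkRegistryGet_L1Hit(b *testing.B) {
	reg := &metaTypeRegistry{}
	reg.l1.Store("invoice", struct{}{}) // warm the L1 cache so every lookup is a hit

	b.ReportAllocs() // surfaces B/op and allocs/op even without -benchmem
	b.ResetTimer()   // exclude setup cost from the measurement
	for i := 0; i < b.N; i++ {
		if _, ok := reg.Get("invoice"); !ok {
			b.Fatal("expected cache hit")
		}
	}
}
```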

Tier 2 — Per-Request Components

Benchmarks for functions called within the hot path but not necessarily on every request.

| What's Tested | Why It Matters |
| --- | --- |
| Field validation | Type coercion and validation rules per field. Cost scales with field count. |
| Document naming | Pattern-based naming uses PostgreSQL sequences. Contention risk under concurrency. |
| Lifecycle event dispatch | Switch dispatch across 14 lifecycle events during document writes. |
| Hook resolution | Topological sort of hooks by dependency and priority. Cost grows with installed apps. |
| Rate limiting | Redis sliding-window commands. Network-bound. |
| Request/response transformation | Field filtering and aliasing. Cost scales with field and transformer count. |
| Database transactions | Begin, execute, commit/rollback overhead measurement. |
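
Several Tier 2 costs scale with field count, so those benchmarks are easiest to read as sub-benchmarks over increasing sizes. A minimal sketch of that pattern, using a hypothetical validateFields helper rather than Moca's real validator:

```go
package validation_test

import (
	"fmt"
	"testing"
)

// validateFields is a hypothetical stand-in for per-field validation.
func validateFields(fields map[string]string) error {
	for _, v := range fields {
		if v == "" {
			return fmt.Errorf("empty value")
		}
	}
	return nil
}

func BenchmarkFieldValidation(b *testing.B) {
	for _, n := range []int{10, 50, 100} {
		fields := make(map[string]string, n)
		for i := 0; i < n; i++ {
			fields[fmt.Sprintf("field_%d", i)] = "value"
		}
		// Sub-benchmarks make the scaling with field count visible in one run.
		b.Run(fmt.Sprintf("fields=%d", n), func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				if err := validateFields(fields); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```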

Tier 3 — Infrastructure

Benchmarks that measure the underlying systems in isolation. These establish baselines so that when a Tier 1 or Tier 2 benchmark regresses, you can determine whether the cause is in the application code or the infrastructure layer.

| What's Tested | Why It Matters |
| --- | --- |
| PostgreSQL round-trip (INSERT, SELECT) | Raw database latency baseline. Helps isolate DB vs app-layer regressions. |
| Redis GET/SET (single, pipeline, parallel) | Raw cache driver latency. Helps isolate Redis vs business-logic regressions. |
| Connection pool under load (1-500 goroutines) | Detects pool exhaustion and lock contention. Key for tuning max_conns. |
| DDL generation (10, 50, 100 fields) | Runs during migrations. Super-linear scaling indicates algorithmic problems. |
| Schema compilation (5, 50, 100 fields) | Runs during app install and cache rebuild. Regression slows cold starts. |
| Document insert concurrency (1-500 goroutines) | Detects bottlenecks in the full write path (naming, validation, transaction, hooks). |
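
The concurrency benchmarks in this tier use Go's parallel benchmark support to vary goroutine pressure. The sketch below shows the general shape with an assumed insertDocument stand-in; the real benchmarks exercise Moca's actual write path and connection pool.

```go
package document_test

import (
	"sync"
	"testing"
)

// insertDocument is a hypothetical stand-in for the full write path
// (naming, validation, transaction, hooks). A mutex models the shared
// resource where contention would show up.
var (
	mu       sync.Mutex
	inserted int
)

func insertDocument() {
	mu.Lock()
	inserted++
	mu.Unlock()
}

func BenchmarkDocumentInsert_Parallel(b *testing.B) {
	// SetParallelism multiplies GOMAXPROCS to approximate higher goroutine
	// counts (e.g. the 1-500 goroutine range used by the real benchmarks).
	b.SetParallelism(8)
	b.ReportAllocs()
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			insertDocument()
		}
	})
}
```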

How to Run Benchmarks

| Command | What It Does | Docker Required? |
| --- | --- | --- |
| make bench | Run all benchmarks that don't need external services (5 iterations) | No |
| make bench-integration | Start Docker services, run all benchmarks including DB/Redis (10 iterations) | Yes |
| make bench-compare | Run benchmarks and compare against saved baseline using benchstat | No |
| make bench-save-baseline | Run benchmarks and save results as the comparison baseline | No |
| make bench-profile | Capture CPU and memory profiles for a specific benchmark | No |

How to Read Results

A typical benchmark output line looks like:

```
BenchmarkRegistryGet_L1Hit-8    26492102    45.2 ns/op    0 B/op    0 allocs/op
```

| Column | Meaning |
| --- | --- |
| BenchmarkRegistryGet_L1Hit-8 | Benchmark name. -8 means it ran with GOMAXPROCS=8. |
| 26492102 | Number of iterations the benchmark ran. The framework chooses this automatically so the timed run is long enough to be statistically stable. |
| 45.2 ns/op | Nanoseconds per operation. The primary performance metric. Lower is better. |
| 0 B/op | Bytes allocated per operation. Tracks memory pressure. Lower is better. |
| 0 allocs/op | Heap allocations per operation. Each allocation adds GC pressure. Zero is ideal for hot paths. |

What "good" looks like

  • Tier 1 (L1 cache hits): < 200 ns/op, 0 allocs/op
  • Tier 1 (database operations): < 5 ms/op
  • Tier 2 (CPU-only): < 50 us/op
  • Tier 3 (infrastructure): Stable across runs. Used as a baseline, not judged in isolation.

Reading benchstat comparison output

When comparing two runs, benchstat shows:

```
name                old time/op    new time/op    delta
RegistryGet_L1Hit   45.2ns ± 2%   44.8ns ± 1%    ~  (p=0.421 n=10+10)
DocManagerInsert    1.23ms ± 3%   1.58ms ± 2%  +28.46%  (p=0.000 n=10+10)
```

  • ~ means no statistically significant change (good).
  • +28.46% means a 28% regression (investigate).
  • p=0.000 means the change is statistically significant (not noise).
  • n=10+10 means 10 samples from each run were compared.

CI Regression Detection

Every pull request that changes code in pkg/, internal/, or cmd/ triggers the benchmark workflow:

  1. Base branch benchmarks run in a git worktree at the PR's base commit (10 iterations)
  2. PR branch benchmarks run in a separate worktree at the PR's head commit (10 iterations)
  3. benchstat compares the two runs for statistically significant changes
  4. A structured PR comment is posted with:
    • Summary table grouped by tier (only changed benchmarks shown)
    • Status icons: :red_circle: regression >= 10%, :yellow_circle: regression 5-10%, :green_circle: improvement
    • Performance budget proximity table (how close critical paths are to their hard limits)
    • Full raw benchstat output in a collapsed section
  5. The check fails if any benchmark regresses by 10% or more

Performance Budgets

Four critical benchmarks have hard performance limits enforced as normal tests (run by make test, no Docker needed):

| Benchmark | Budget | What It Guards |
| --- | --- | --- |
| RegistryGet_L1Hit | 200 ns/op | sync.Map cache lookup must stay near-instant |
| GenerateTableDDL (10 fields) | 50 us/op | DDL generation must not slow migrations |
| TransformerChain_Response | 20 us/op | Response transformation must not add API latency |
| HookRegistryResolve_10Hooks | 5 us/op | Hook resolution must stay fast as apps are installed |

If any budget is exceeded, make test fails. These budgets catch absolute performance violations regardless of relative change: even when each individual regression is too small to trip the CI comparison, the accumulated drift is caught once it crosses a budget.
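
A budget check of this kind can be expressed as an ordinary test that runs the benchmark via testing.Benchmark and asserts on the measured ns/op. The sketch below illustrates the idea with a hypothetical lookup function and limit; Moca's actual budget tests and thresholds live in the repository.

```go
package budget_test

import (
	"testing"
	"time"
)

// lookup is a hypothetical stand-in for the code path under budget.
func lookup() {
	_ = time.Now()
}

// TestRegistryGetBudget runs as a normal test (no -bench flag needed) and
// fails if the measured cost exceeds the hard limit.
func TestRegistryGetBudget(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping budget check in -short mode")
	}
	const budget = 200 * time.Nanosecond // hypothetical hard limit

	res := testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			lookup()
		}
	})
	if got := time.Duration(res.NsPerOp()); got > budget {
		t.Fatalf("budget exceeded: %v per op, limit %v", got, budget)
	}
}
```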


Profiling a Regression

When a benchmark regresses, use profiling to find the cause:

```bash
# Interactive: prompts for benchmark pattern and package
make bench-profile

# Or run directly:
go test -run=^$ -bench=BenchmarkDocManagerInsert -cpuprofile=cpu.prof -memprofile=mem.prof -benchmem ./pkg/document/...

# View CPU profile in browser
go tool pprof -http=:8080 cpu.prof

# View memory profile in browser
go tool pprof -http=:8080 mem.prof
```

In the flame graph, look for:

  • Wide bars at the top = functions consuming the most time
  • Unexpected function calls in hot paths (e.g., reflection, JSON marshaling where there shouldn't be any)
  • Allocation-heavy functions in the memory profile = GC pressure sources

For concurrency issues, use the trace tool:

```bash
go test -run=^$ -bench=BenchmarkDocManagerInsert_Parallel -trace=trace.out ./pkg/document/...
go tool trace trace.out
```

Look for goroutine blocking, mutex contention, and scheduler delays in the trace viewer.
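
When the trace alone doesn't make the contention point obvious, it can help to wrap suspect spans in named trace regions so they show up under "User-defined regions" in go tool trace. A minimal sketch, assuming a hypothetical acquireSequence step inside the write path:

```go
package document

import (
	"context"
	"runtime/trace"
)

// insertWithTracing wraps a suspect span in a named trace region so it can be
// located in the go tool trace viewer under "User-defined regions".
func insertWithTracing(ctx context.Context) {
	// acquireSequence is a hypothetical stand-in for the PostgreSQL
	// sequence-based naming step suspected of contention.
	trace.WithRegion(ctx, "acquireSequence", func() {
		acquireSequence(ctx)
	})
}

func acquireSequence(ctx context.Context) {
	// ... naming logic under investigation ...
}
```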