# Benchmarking Guide
Moca has a three-tier benchmarking suite that measures individual functions, composed request pipelines, and underlying infrastructure. CI automatically detects performance regressions on every pull request.
## The Three Tiers
### Tier 1 — Critical Hot Path
Benchmarks for functions that execute on every single API request. A regression here affects every user of every tenant.
| What's Tested | Why It Matters |
|---|---|
| MetaType registry lookup (sync.Map, Redis, PostgreSQL fallback) | Called on every document operation. L1 cache hit should be near-instant. |
| Document CRUD (Get, GetList, Insert) | The core read and write operations behind every API endpoint. |
| SQL query builder | Generates parameterized queries for every list view and document fetch. |
| HTTP middleware chain | RequestID, CORS, tenant resolution, auth, rate limiting -- every request traverses this. |
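To make the tier concrete, here is a minimal sketch of what a Tier 1 micro-benchmark looks like: a pre-warmed `sync.Map` lookup measured with allocation reporting. The package, type, and benchmark names are illustrative stand-ins, not Moca's actual registry API.

```go
package bench

import (
	"sync"
	"testing"
)

// metaType is a placeholder for whatever the registry actually caches.
type metaType struct{ Name string }

// BenchmarkRegistryGet_L1Hit measures a lookup that always hits the
// in-process sync.Map cache; the loop body should stay allocation-free.
func BenchmarkRegistryGet_L1Hit(b *testing.B) {
	var cache sync.Map
	cache.Store("User", &metaType{Name: "User"}) // pre-warm the L1 cache

	b.ReportAllocs()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		if _, ok := cache.Load("User"); !ok {
			b.Fatal("expected an L1 cache hit")
		}
	}
}
```

Running `go test -bench=RegistryGet_L1Hit -benchmem ./...` prints the ns/op, B/op, and allocs/op columns explained in the results section below.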
### Tier 2 — Per-Request Components
Benchmarks for functions called within the hot path but not necessarily on every request.
| What's Tested | Why It Matters |
|---|---|
| Field validation | Type coercion and validation rules per field. Cost scales with field count. |
| Document naming | Pattern-based naming uses PostgreSQL sequences. Contention risk under concurrency. |
| Lifecycle event dispatch | Switch dispatch across 14 lifecycle events during document writes. |
| Hook resolution | Topological sort of hooks by dependency and priority. Cost grows with installed apps. |
| Rate limiting | Redis sliding-window commands. Network-bound. |
| Request/response transformation | Field filtering and aliasing. Cost scales with field and transformer count. |
| Database transactions | Begin, execute, commit/rollback overhead measurement. |
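Several Tier 2 costs scale with an input dimension such as field count, which Go benchmarks usually express with `b.Run` sub-benchmarks. The sketch below illustrates that pattern; `validateFields` is a hypothetical stand-in, not the real validation entry point.

```go
package bench

import (
	"fmt"
	"strconv"
	"testing"
)

// validateFields is a hypothetical stand-in for per-field type coercion and
// validation; its cost grows with the number of fields.
func validateFields(fields map[string]string) error {
	for _, v := range fields {
		if _, err := strconv.Atoi(v); err != nil {
			return err
		}
	}
	return nil
}

// BenchmarkFieldValidation measures validation cost at several field counts
// using sub-benchmarks, mirroring how Tier 2 scaling is reported.
func BenchmarkFieldValidation(b *testing.B) {
	for _, n := range []int{10, 50, 100} {
		fields := make(map[string]string, n)
		for i := 0; i < n; i++ {
			fields[fmt.Sprintf("field_%d", i)] = "42"
		}
		b.Run(fmt.Sprintf("%d_fields", n), func(b *testing.B) {
			b.ReportAllocs()
			for i := 0; i < b.N; i++ {
				if err := validateFields(fields); err != nil {
					b.Fatal(err)
				}
			}
		})
	}
}
```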
### Tier 3 — Infrastructure
Benchmarks that measure the underlying systems in isolation. These establish baselines so that when a Tier 1 or Tier 2 benchmark regresses, you can determine whether the cause is in the application code or the infrastructure layer.
| What's Tested | Why It Matters |
|---|---|
| PostgreSQL round-trip (INSERT, SELECT) | Raw database latency baseline. Helps isolate DB vs app-layer regressions. |
| Redis GET/SET (single, pipeline, parallel) | Raw cache driver latency. Helps isolate Redis vs business-logic regressions. |
| Connection pool under load (1-500 goroutines) | Detects pool exhaustion and lock contention. Key for tuning max_conns. |
| DDL generation (10, 50, 100 fields) | Runs during migrations. Super-linear scaling indicates algorithmic problems. |
| Schema compilation (5, 50, 100 fields) | Runs during app install and cache rebuild. Regression slows cold starts. |
| Document insert concurrency (1-500 goroutines) | Detects bottlenecks in the full write path (naming, validation, transaction, hooks). |
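The goroutine sweeps above are typically written as sub-benchmarks that tune `b.SetParallelism` before `b.RunParallel`. The sketch below uses an in-memory `insertDocument` stand-in; the real Tier 3 benchmarks exercise PostgreSQL and the connection pool.

```go
package bench

import (
	"fmt"
	"sync"
	"testing"
)

// insertDocument is an in-memory stand-in for the full write path (naming,
// validation, transaction, hooks); a single mutex models the shared resource.
var (
	mu     sync.Mutex
	nextID int
	docs   = map[int]string{}
)

func insertDocument(body string) {
	mu.Lock()
	defer mu.Unlock()
	nextID++
	docs[nextID] = body
}

// BenchmarkDocumentInsert_Parallel sweeps parallelism levels to surface lock
// or pool contention. RunParallel starts p*GOMAXPROCS goroutines, so the
// values below are approximate goroutine targets.
func BenchmarkDocumentInsert_Parallel(b *testing.B) {
	for _, p := range []int{1, 10, 100, 500} {
		b.Run(fmt.Sprintf("parallelism_%d", p), func(b *testing.B) {
			docs = map[int]string{} // reset shared state between sub-benchmarks
			b.SetParallelism(p)
			b.RunParallel(func(pb *testing.PB) {
				for pb.Next() {
					insertDocument("payload")
				}
			})
		})
	}
}
```

A flat ns/op curve across the sub-benchmarks suggests the shared resource is not the bottleneck; a sharp rise at higher parallelism points to contention worth profiling.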
## How to Run Benchmarks
| Command | What It Does | Docker Required? |
|---|---|---|
| `make bench` | Run all benchmarks that don't need external services (5 iterations) | No |
| `make bench-integration` | Start Docker services, run all benchmarks including DB/Redis (10 iterations) | Yes |
| `make bench-compare` | Run benchmarks and compare against the saved baseline using benchstat | No |
| `make bench-save-baseline` | Run benchmarks and save results as the comparison baseline | No |
| `make bench-profile` | Capture CPU and memory profiles for a specific benchmark | No |
## How to Read Results
A typical benchmark output line looks like:
```
BenchmarkRegistryGet_L1Hit-8    26492102    45.2 ns/op    0 B/op    0 allocs/op
```
| Column | Meaning |
|---|---|
| `BenchmarkRegistryGet_L1Hit-8` | Benchmark name. `-8` means it ran with `GOMAXPROCS=8`. |
| `26492102` | Number of iterations the benchmark ran. More iterations = more statistical confidence. |
| `45.2 ns/op` | Nanoseconds per operation. The primary performance metric. Lower is better. |
| `0 B/op` | Bytes allocated per operation. Tracks memory pressure. Lower is better. |
| `0 allocs/op` | Heap allocations per operation. Each allocation adds GC pressure. Zero is ideal for hot paths. |
### What "good" looks like
- Tier 1 (L1 cache hits): < 200 ns/op, 0 allocs/op
- Tier 1 (database operations): < 5 ms/op
- Tier 2 (CPU-only): < 50 us/op
- Tier 3 (infrastructure): Stable across runs. Used as a baseline, not judged in isolation.
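The zero-allocation target for hot paths can also be verified directly in a plain test with `testing.AllocsPerRun`. The sketch below assumes a hypothetical pre-warmed registry map; the real lookup lives in Moca's registry code.

```go
package bench

import (
	"sync"
	"testing"
)

// registry models a pre-warmed L1 cache.
var registry sync.Map

func init() {
	registry.Store("User", struct{}{})
}

// TestRegistryLookupAllocs asserts that the hot-path lookup stays at
// 0 allocs/op, matching the Tier 1 target above.
func TestRegistryLookupAllocs(t *testing.T) {
	avg := testing.AllocsPerRun(1000, func() {
		if _, ok := registry.Load("User"); !ok {
			t.Fatal("expected a cache hit")
		}
	})
	if avg != 0 {
		t.Fatalf("expected 0 allocs per lookup, got %.2f", avg)
	}
}
```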
### Reading benchstat comparison output
When comparing two runs, benchstat shows:
```
name                 old time/op    new time/op    delta
RegistryGet_L1Hit    45.2ns ± 2%    44.8ns ± 1%       ~     (p=0.421 n=10+10)
DocManagerInsert     1.23ms ± 3%    1.58ms ± 2%    +28.46%  (p=0.000 n=10+10)
```
- `~` means no statistically significant change (good).
- `+28.46%` means a 28% regression (investigate).
- `p=0.000` means the change is statistically significant (not noise).
- `n=10+10` means 10 samples from each run were compared.
## CI Regression Detection
Every pull request that changes code in `pkg/`, `internal/`, or `cmd/` triggers the benchmark workflow:
- Base branch benchmarks run in a git worktree at the PR's base commit (10 iterations)
- PR branch benchmarks run in a separate worktree at the PR's head commit (10 iterations)
- benchstat compares the two runs for statistically significant changes
- A structured PR comment is posted with:
  - Summary table grouped by tier (only changed benchmarks shown)
  - Status icons: :red_circle: regression >= 10%, :yellow_circle: regression 5-10%, :green_circle: improvement
  - Performance budget proximity table (how close critical paths are to their hard limits)
  - Full raw benchstat output in a collapsed section
- The check fails if any benchmark regresses by 10% or more
## Performance Budgets
Four critical benchmarks have hard performance limits enforced as normal tests (run by `make test`, no Docker needed):
| Benchmark | Budget | What It Guards |
|---|---|---|
| `RegistryGet_L1Hit` | 200 ns/op | `sync.Map` cache lookup must stay near-instant |
| `GenerateTableDDL` (10 fields) | 50 us/op | DDL generation must not slow migrations |
| `TransformerChain_Response` | 20 us/op | Response transformation must not add API latency |
| `HookRegistryResolve_10Hooks` | 5 us/op | Hook resolution must stay fast as apps are installed |
If any budget is exceeded, `make test` fails. These checks catch absolute performance violations regardless of relative change: even if each PR's regression is too small to trip the CI threshold, the accumulated slowdown is caught as soon as it crosses a budget.
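As a rough illustration of how a budget can be enforced from an ordinary test, the sketch below runs a benchmark in-process with `testing.Benchmark` and compares `NsPerOp` against the 200 ns limit. It reuses the `BenchmarkRegistryGet_L1Hit` stand-in sketched in the Tier 1 section; Moca's actual budget tests may be structured differently.

```go
package bench

import (
	"testing"
	"time"
)

// registryGetBudget is the hard limit from the table above.
const registryGetBudget = 200 * time.Nanosecond

// TestRegistryGetBudget runs the benchmark in-process and fails if the
// measured ns/op exceeds the budget, so a plain `go test` run catches
// absolute performance violations without Docker.
func TestRegistryGetBudget(t *testing.T) {
	if testing.Short() {
		t.Skip("skipping performance budget check in -short mode")
	}

	// BenchmarkRegistryGet_L1Hit is the hypothetical hot-path benchmark
	// sketched in the Tier 1 section.
	result := testing.Benchmark(BenchmarkRegistryGet_L1Hit)

	if got := time.Duration(result.NsPerOp()); got > registryGetBudget {
		t.Fatalf("RegistryGet_L1Hit over budget: %v/op > %v/op", got, registryGetBudget)
	}
}
```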
## Profiling a Regression
When a benchmark regresses, use profiling to find the cause:
```bash
# Interactive: prompts for benchmark pattern and package
make bench-profile

# Or run directly:
go test -run=^$ -bench=BenchmarkDocManagerInsert -cpuprofile=cpu.prof -memprofile=mem.prof -benchmem ./pkg/document/...

# View CPU profile in browser
go tool pprof -http=:8080 cpu.prof

# View memory profile in browser
go tool pprof -http=:8080 mem.prof
```
In the flame graph, look for:
- Wide bars at the top = functions consuming the most time
- Unexpected function calls in hot paths (e.g., reflection, JSON marshaling where there shouldn't be any)
- Allocation-heavy functions in the memory profile = GC pressure sources
For concurrency issues, use the trace tool:
```bash
go test -run=^$ -bench=BenchmarkDocManagerInsert_Parallel -trace=trace.out ./pkg/document/...
go tool trace trace.out
```
Look for goroutine blocking, mutex contention, and scheduler delays in the trace viewer.