Performance Benchmark Report Generator

A Python script that generates PDF reports comparing Unity Performance Test Framework results across different test scenarios.

Location

unity-explorer/
├── scripts/
│   ├── generate_perf_report.py      # Main script
│   ├── perf_report_config.json      # Configuration file
│   ├── requirements-perf-report.txt # Python dependencies
│   └── PERF_REPORT.md               # This documentation
└── Explorer/
    └── PerformanceTestResults.json  # Test output (generated)

Requirements

Windows

py -m pip install -r scripts\requirements-perf-report.txt

Or, if python is on your PATH:

python -m pip install -r scripts\requirements-perf-report.txt

macOS / Linux

pip3 install -r scripts/requirements-perf-report.txt

Or with virtual environment (recommended):

python3 -m venv venv
source venv/bin/activate
pip install -r scripts/requirements-perf-report.txt

CI (GitHub Actions)

- name: Set up Python
  uses: actions/setup-python@v5
  with:
    python-version: '3.11'

- name: Install dependencies
  run: pip install -r scripts/requirements-perf-report.txt

Usage

Standard Report

Generate a full PDF report with a summary page and detailed charts:

python generate_perf_report.py <input.json> [output.pdf]

Example:

python generate_perf_report.py PerformanceTestResults.json report.pdf

Summary Only

Generate only the summary page:

python generate_perf_report.py <input.json> <output.pdf> --summary-only

GitHub Actions Summary

Write a Markdown summary suitable for appending to $GITHUB_STEP_SUMMARY:

python generate_perf_report.py <input.json> <output.pdf> --github-summary summary.md

In CI workflow:

- name: Generate performance report
  run: |
    python scripts/generate_perf_report.py results.json report.pdf --github-summary summary.md
    cat summary.md >> $GITHUB_STEP_SUMMARY

Compare Two Reports

Compare results from two test runs side-by-side:

python generate_perf_report.py --compare <report1.json> <report2.json> <output.pdf> --fixture <TestFixtureName>

Options:

  • --fixture <name> - (Required) Filter to a specific test fixture class
  • --label1 <name> - Label for first report (default: "Report 1")
  • --label2 <name> - Label for second report (default: "Report 2")

Example:

python generate_perf_report.py --compare baseline.json new.json comparison.pdf \
  --fixture ProfilesPerformanceTest \
  --label1 "Before" \
  --label2 "After"

Configuration

The script loads its configuration from perf_report_config.json, located in the same directory as the script.

Configuration Structure

{
  "difference_thresholds": [...],
  "metrics": {...},
  "default_summary_metrics": [...],
  "parallelism": {...},
  "summary_cases": [...]
}

Difference Thresholds

Define how percentage differences are categorized and colored:

"difference_thresholds": [
  {
    "min": null,
    "max": -30,
    "label": "Major Improvement",
    "color": "#006400"
  },
  {
    "min": -10,
    "max": 10,
    "label": "Within Margin of Error",
    "color": "#808080"
  },
  {
    "min": 30,
    "max": null,
    "label": "Major Regression",
    "color": "#8B0000"
  }
]

  • min/max: Percentage bounds (use null for an unbounded side)
  • label: Human-readable category name
  • color: Hex color used in the PDF output

The listing above is an excerpt; the shipped configuration also defines intermediate categories (for example Slight Improvement and Slight Regression) between these bounds, as the output examples later on this page show.
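
For reference, a minimal sketch of how a percentage difference could be mapped onto these categories. It assumes the thresholds are read from perf_report_config.json, that a null bound means unbounded, and that the first matching range wins; the classify_difference helper is illustrative, not the script's actual function.

# Illustrative sketch only: map a percentage difference to a threshold entry.
# Assumes a null bound means "unbounded" and the first matching range wins.
import json

def classify_difference(pct, thresholds):
    for t in thresholds:
        low = t["min"] if t["min"] is not None else float("-inf")
        high = t["max"] if t["max"] is not None else float("inf")
        if low <= pct <= high:
            return t["label"], t["color"]
    return "Uncategorized", "#000000"

with open("perf_report_config.json") as f:
    config = json.load(f)

print(classify_difference(-66.7, config["difference_thresholds"]))
# e.g. ('Major Improvement', '#006400') with the excerpt above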

Metrics

Define human-readable names and descriptions for sample groups:

"metrics": {
  "WebRequest.Send": {
    "name": "Web Request Send Time",
    "description": "time to send the web request"
  },
  "Iteration Total Time": {
    "name": "Total Iteration Time",
    "description": "total time for the complete operation"
  }
}

Default Summary Metrics

Specify which metrics appear in the summary by default:

"default_summary_metrics": [
  "WebRequest.Send",
  "WebRequest.ProcessData",
  "Iteration Total Time"
]

Parallelism Categories

Group results by concurrency level:

"parallelism": {
  "no_concurrency": {
    "min": 1,
    "max": 1,
    "label": "No Concurrency"
  },
  "low_concurrency": {
    "min": 2,
    "max": 10,
    "label": "Low Concurrency"
  },
  "high_concurrency": {
    "min": 11,
    "max": null,
    "label": "High Concurrency"
  }
}

The script extracts the concurrency level from the first test case argument (the first value in the test method's argument list).
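
As an illustration of that rule, here is a hedged sketch that pulls the first method argument out of a Unity test name and buckets it using the parallelism configuration above; the helper names and the regex are assumptions, not the script's internals.

# Sketch only: extract the concurrency level (first test-method argument) from a
# test name and map it to a parallelism category.
import re

def extract_concurrency(test_name):
    match = re.search(r"\(([^()]*)\)$", test_name)  # argument list of the test method
    if not match:
        return None
    first_arg = match.group(1).split(",")[0].strip()
    return int(first_arg) if first_arg.isdigit() else None

def parallelism_label(concurrency, parallelism_config):
    for bucket in parallelism_config.values():
        low = bucket["min"] if bucket["min"] is not None else 1
        high = bucket["max"] if bucket["max"] is not None else float("inf")
        if low <= concurrency <= high:
            return bucket["label"]
    return "Unknown"

config_parallelism = {
    "no_concurrency": {"min": 1, "max": 1, "label": "No Concurrency"},
    "low_concurrency": {"min": 2, "max": 10, "label": "Low Concurrency"},
    "high_concurrency": {"min": 11, "max": None, "label": "High Concurrency"},
}
name = 'Tests.ProfilesPerformanceTest("https://x",True).PostProfilesAsync(10,42)'
print(extract_concurrency(name))                                   # 10
print(parallelism_label(extract_concurrency(name), config_parallelism))  # Low Concurrency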

Summary Cases

Configure which tests appear on the summary page and how they're compared.

Mode 1: Endpoint Comparison (within same test)

Compare one endpoint against others in the same test:

{
  "test": "ProfilesPerformanceTest.PostProfilesAsync",
  "endpoint": "asset-bundle-registry"
}

The output shows the percentage difference of the asset-bundle-registry endpoint relative to the other endpoints in the same test.
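
The exact aggregation behind "vs other endpoints" is not spelled out here; a plausible reading, shown below purely as an assumption, is to compare the endpoint's median against the mean of the other endpoints' medians within the same concurrency bucket.

# Hedged sketch of the Mode 1 comparison. Using the mean of the other endpoints'
# medians as the reference is an assumption, not confirmed script behaviour.
def percentage_difference(endpoint_median, other_medians):
    reference = sum(other_medians) / len(other_medians)
    return (endpoint_median - reference) / reference * 100.0

# e.g. 51.1 ms for asset-bundle-registry vs peers at 284.5 ms and 80.2 ms
print(round(percentage_difference(51.1, [284.5, 80.2]), 1))  # -72.0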

Mode 2: Cross-Test Comparison

Compare an endpoint from one test against endpoints from another test:

{
  "test": "ProfilesPerformanceTest.PostMetadataAsync",
  "endpoint": "asset-bundle-registry",
  "compare_test": "ProfilesPerformanceTest.PostProfilesAsync"
}

Output shows how PostMetadataAsync.asset-bundle-registry performs vs all PostProfilesAsync endpoints.

Mode 3: List Only (no comparison)

List raw metric values without comparison (when endpoint is omitted):

{
  "test": "RPCFriendsServiceBenchmark.GetFriendsAsync"
}

Output shows Median and P95 values for each scenario grouped by parallelism.
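
The Median comes straight from the sample group, while P95 (and P99 on the detail pages) has to be derived from the Samples array. A minimal sketch, assuming numpy is available and that samples are stored in microseconds as in the Input Format example below:

# Sketch: derive the values shown above from a sample group's raw samples.
# Assumes samples are in microseconds (Unit = 1) and are converted to ms for display.
import numpy as np

samples_us = [48_000, 51_000, 52_300, 49_800, 71_500]  # placeholder values
median_ms = np.median(samples_us) / 1000.0
p95_ms = np.percentile(samples_us, 95) / 1000.0
print(f"{median_ms:.1f}ms (P95: {p95_ms:.1f}ms)")  # "51.0ms (P95: 67.7ms)"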

Per-Case Metrics

Override default metrics for specific cases:

{
  "test": "ProfilesPerformanceTest.PostMetadataAsync",
  "endpoint": "asset-bundle-registry",
  "compare_test": "ProfilesPerformanceTest.PostProfilesAsync",
  "metrics": [
    "WebRequest.Send",
    "WebRequest.ProcessData",
    "Iteration Total Time",
    "Iteration Downloaded Data"
  ]
}

Input Format

The script expects Unity Performance Test Framework JSON output:

{
  "Results": [
    {
      "Name": "Namespace.TestClass(fixtureArg1,fixtureArg2).MethodName(testArg1,testArg2)",
      "SampleGroups": [
        {
          "Name": "WebRequest.Send",
          "Unit": 1,
          "Median": 12345.67,
          "Min": 1000.0,
          "Max": 50000.0,
          "StandardDeviation": 5000.0,
          "Samples": [1000, 2000, 3000, ...]
        }
      ]
    }
  ]
}
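
A minimal sketch of walking this structure; the field names follow the example above, while the file name and printing are just for illustration.

# Sketch: iterate the Unity Performance Test Framework output shown above.
import json

with open("PerformanceTestResults.json") as f:
    data = json.load(f)

for result in data["Results"]:
    print(result["Name"])
    for group in result["SampleGroups"]:
        print(f"  {group['Name']}: median={group['Median']}, "
              f"min={group['Min']}, max={group['Max']}, samples={len(group['Samples'])}")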

Unit Values

Value  Unit
0      Undefined
1      Microsecond
2      Millisecond
3      Second
4      Byte
5      Kilobyte
6      Megabyte
7      Gigabyte
8      Nanosecond
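
The same enum expressed as a Python mapping, for convenience when post-processing results (a reference sketch, not part of the script's API):

# SampleGroup Unit values as listed in the table above.
SAMPLE_UNITS = {
    0: "Undefined",
    1: "Microsecond",
    2: "Millisecond",
    3: "Second",
    4: "Byte",
    5: "Kilobyte",
    6: "Megabyte",
    7: "Gigabyte",
    8: "Nanosecond",
}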

Baseline Detection

The script determines the baseline scenario from the TestFixture arguments (see the sketch below):

  • It extracts the second parameter from the fixture argument list
  • If that parameter is True, the scenario is treated as the baseline
  • Example: TestClass("https://endpoint.com",True) marks this scenario as the baseline
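
A hedged sketch of that rule; the regex and argument handling are illustrative assumptions rather than the script's actual parsing code.

# Sketch: the second fixture argument, when it is True, marks the baseline scenario.
import re

def is_baseline(test_name):
    match = re.search(r"^[^(]+\(([^()]*)\)", test_name)  # fixture argument list
    if not match:
        return False
    args = [a.strip() for a in match.group(1).split(",")]
    return len(args) >= 2 and args[1] == "True"

print(is_baseline('Tests.TestClass("https://endpoint.com",True).PostProfilesAsync(1,0)'))  # True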

Scenario Labels

Labels are derived from the TestFixture URL arguments (a parsing sketch follows the examples):

  • https://peer-ap1.decentraland.org/... -> peer-ap1
  • https://asset-bundle-registry.decentraland.today/... -> asset-bundle-registry
  • https://gateway.decentraland.zone/... -> gateway.zone (includes TLD for distinction)
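
A hedged sketch of how such labels could be derived; keying on the .zone TLD reproduces the examples above, but the script's actual heuristic may differ.

# Sketch only: derive a scenario label from a fixture URL's hostname.
from urllib.parse import urlparse

def scenario_label(url):
    host = urlparse(url).hostname or ""   # e.g. "gateway.decentraland.zone"
    parts = host.split(".")
    subdomain, tld = parts[0], parts[-1]
    return f"{subdomain}.{tld}" if tld == "zone" else subdomain

print(scenario_label("https://peer-ap1.decentraland.org/content"))   # peer-ap1
print(scenario_label("https://gateway.decentraland.zone/explorer"))  # gateway.zone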

Output

PDF Report

  • Summary Page: Overview of configured test cases with percentage comparisons or raw metrics
  • Detail Pages: Per-method charts showing Median, P95, P99 with error bars
  • Baseline marked with "baseline" label
  • Percentage annotations colored by threshold category


GitHub Summary (Markdown)

The GitHub summary uses emoji indicators:

  • 🟢 Green: Improvement
  • 🔴 Red: Regression
  • ⚪ White: Within margin of error

Output Examples

Endpoint Comparison Mode

Config:

{
  "test": "ProfilesPerformanceTest.PostProfilesAsync",
  "endpoint": "asset-bundle-registry"
}

Output:

### ProfilesPerformanceTest.PostProfilesAsync
*Endpoint: `asset-bundle-registry` (performance vs other endpoints)*
**No Concurrency:**
- 🟢 **Web Request Send Time**: -65.1% (Major Improvement)
- 🟢 **Data Processing Time**: -15.7% (Slight Improvement)
- 🟢 **Total Iteration Time**: -66.7% (Major Improvement)

**Low Concurrency:**
- 🟢 **Web Request Send Time**: -52.6% (Major Improvement)
- 🔴 **Data Processing Time**: +144.7% (Major Regression)
- 🟢 **Total Iteration Time**: -53.3% (Major Improvement)

**High Concurrency:**
- 🟢 **Web Request Send Time**: -66.9% (Major Improvement)
- 🔴 **Data Processing Time**: +307.3% (Major Regression)
- 🟢 **Total Iteration Time**: -56.0% (Major Improvement)

Cross-Test Comparison Mode

Config:

{
  "test": "ProfilesPerformanceTest.PostMetadataAsync",
  "endpoint": "asset-bundle-registry",
  "compare_test": "ProfilesPerformanceTest.PostProfilesAsync",
  "metrics": [
    "WebRequest.Send",
    "WebRequest.ProcessData",
    "Iteration Total Time",
    "Iteration Downloaded Data"
  ]
}

Output:

### ProfilesPerformanceTest.PostMetadataAsync
*Endpoint: `asset-bundle-registry` vs `ProfilesPerformanceTest.PostProfilesAsync` endpoints*
**No Concurrency:**
- 🟢 **Web Request Send Time**: -53.3% (Major Improvement)
- 🟢 **Data Processing Time**: -91.2% (Major Improvement)
- 🟢 **Total Iteration Time**: -55.2% (Major Improvement)
- 🟢 **Downloaded Data**: -92.2% (Major Improvement)

**Low Concurrency:**
- 🟢 **Web Request Send Time**: -63.4% (Major Improvement)
- 🟢 **Data Processing Time**: -93.3% (Major Improvement)
- 🟢 **Total Iteration Time**: -66.2% (Major Improvement)
- 🟢 **Downloaded Data**: -92.2% (Major Improvement)

List-Only Mode (No Comparison)

Config:

{
  "test": "ProfilesPerformanceTest.PostProfilesAsync"
}

Output:

### ProfilesPerformanceTest.PostProfilesAsync
*Metrics by scenario*
**No Concurrency:**

**asset-bundle-registry:**
- Web Request Send Time: 51.1ms (P95: 71.3ms)
- Data Processing Time: 18.1ms (P95: 32.5ms)
- Total Iteration Time: 5554.4ms (P95: 5771.7ms)

**peer-ap1:**
- Web Request Send Time: 284.5ms (P95: 770.9ms)
- Data Processing Time: 21.5ms (P95: 34.2ms)
- Total Iteration Time: 33759.2ms (P95: 34983.2ms)

**peer-ec1:**
- Web Request Send Time: 80.2ms (P95: 162.7ms)
- Data Processing Time: 21.6ms (P95: 36.2ms)
- Total Iteration Time: 9017.6ms (P95: 10293.9ms)

**Low Concurrency:**

**asset-bundle-registry:**
- Web Request Send Time: 86.0ms (P95: 132.9ms)
- Data Processing Time: 57.7ms (P95: 124.6ms)
- Total Iteration Time: 1014.2ms (P95: 1362.6ms)

**peer-ap1:**
- Web Request Send Time: 299.6ms (P95: 767.2ms)
- Data Processing Time: 21.8ms (P95: 39.1ms)
- Total Iteration Time: 3791.6ms (P95: 4116.4ms)

Within Margin of Error

Config:

{
  "test": "AssetBundleRegistryPerformanceTests.GetEntitiesActive",
  "endpoint": "gateway.zone"
}

Output:

### AssetBundleRegistryPerformanceTests.GetEntitiesActive
*Endpoint: `gateway.zone` (performance vs other endpoints)*
**Low Concurrency:**
- ⚪ **Web Request Send Time**: +10.0% (Within Margin of Error)
- ⚪ **Data Processing Time**: +2.2% (Within Margin of Error)
- 🔴 **Total Iteration Time**: +12.3% (Slight Regression)

**High Concurrency:**
- ⚪ **Web Request Send Time**: +9.5% (Within Margin of Error)
- ⚪ **Data Processing Time**: -1.6% (Within Margin of Error)
- ⚪ **Total Iteration Time**: +9.7% (Within Margin of Error)

Dual Report Comparison (CLI)

Command:

python generate_perf_report.py --compare baseline.json new.json comparison.pdf \
  --fixture ProfilesPerformanceTest --label1 "Before" --label2 "After"

PDF Output (per chart):

  • Grouped bar chart with two bars per scenario
  • Blue bars: "Before" (Report 1)
  • Orange bars: "After" (Report 2)
  • Baseline scenario marked with "base" label on Report 1 bar
  • Report 2 bars show percentage difference vs Report 1 baseline
  • Charts for Median, P95, P99 side by side
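
For orientation, a rough matplotlib sketch of the chart layout described above (two bars per scenario and a percentage annotation on the second report's bars). Values, colors, and labels are placeholders, not the script's actual rendering code.

# Layout sketch only; the data below is made up.
import matplotlib.pyplot as plt
import numpy as np

scenarios = ["asset-bundle-registry", "peer-ap1", "peer-ec1"]
before = [51.1, 284.5, 80.2]   # Report 1 medians (ms)
after = [48.0, 260.0, 75.0]    # Report 2 medians (ms), placeholder values

x = np.arange(len(scenarios))
width = 0.35
fig, ax = plt.subplots()
ax.bar(x - width / 2, before, width, label="Before", color="tab:blue")
after_bars = ax.bar(x + width / 2, after, width, label="After", color="tab:orange")

# Annotate each "After" bar with its percentage difference vs "Before".
for rect, b, a in zip(after_bars, before, after):
    pct = (a - b) / b * 100.0
    ax.annotate(f"{pct:+.1f}%",
                (rect.get_x() + rect.get_width() / 2, rect.get_height()),
                ha="center", va="bottom")

ax.set_xticks(x)
ax.set_xticklabels(scenarios)
ax.set_ylabel("Median (ms)")
ax.legend()
plt.savefig("comparison_sketch.png")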

Examples

Full CI Pipeline

- name: Generate performance report
  run: |
    python scripts/generate_perf_report.py \
      Explorer/PerformanceTestResults.json \
      Explorer/PerformanceBenchmarkReport.pdf \
      --github-summary Explorer/summary.md
    cat Explorer/summary.md >> $GITHUB_STEP_SUMMARY

- name: Upload report
  uses: actions/upload-artifact@v4
  with:
    name: Performance Report
    path: Explorer/PerformanceBenchmarkReport.pdf

Compare Before/After

# Compare results from the main branch against a feature branch
python generate_perf_report.py --compare \
  results_main.json \
  results_feature.json \
  comparison.pdf \
  --fixture ProfilesPerformanceTest \
  --label1 "main" \
  --label2 "feature-branch"