Benchmarking Guide

Why benchmark?

Benchmarking is an essential part of any coding project, especially a performance-oriented one. While it can be a little daunting at first, this guide aims to show that benchmarking can be integrated into conjure-oxide and its workflows with relatively little effort.

criterion

By far the most popular benchmarking tool currently available for Rust is the criterion crate. Based on the Haskell library of the same name, it is a statistics-based tool which measures wall-clock time for individual functions. To get started with criterion benchmarking in a Rust project my_project, you first need to make a directory called benches, which Cargo will recognise as holding all benchmarking files. Let's now make a benchmark called my_bench.rs inside my_project/benches.

We now add the following to the crate's Cargo.toml file:

[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "my_bench"
harness = false

Suppose that we now want to benchmark a function which adds two numbers (which, as expected, should be very fast!). We add the following to my_bench.rs:

use criterion::{Criterion, black_box, criterion_group, criterion_main};

pub fn add(x: u64, y: u64) -> u64 {
    x + y
}

pub fn criterion_benchmark(c: &mut Criterion) {
    c.bench_function("add 20 + 20", |b| {
        b.iter(|| add(black_box(20), black_box(20)))
    });
}

criterion_group!(benches, criterion_benchmark);
criterion_main!(benches);

The .bench_function method creates an instance of a benchmark. The .iter method then tells Criterion to repeatedly execute the provided closure, and black_box is used to prevent the compiler from optimising away the code being benchmarked. To run the benchmark, simply run cargo bench. Among other things, your terminal should show a summary for each benchmark giving the average wall-clock time, together with some information on outliers and the change in performance since previous runs. For the full details, see target/criterion; the .html reports are especially good.
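
criterion can also parameterise a benchmark over several inputs, which is handy once you want to see how a function scales rather than timing a single call. Below is a minimal sketch using bench_with_input and BenchmarkId; the function name bench_add_inputs and the input sizes are arbitrary, chosen purely for illustration.

use criterion::{BenchmarkId, Criterion, black_box, criterion_group, criterion_main};

pub fn add(x: u64, y: u64) -> u64 {
    x + y
}

// Register one measurement per input size, all grouped under the name "add".
pub fn bench_add_inputs(c: &mut Criterion) {
    for size in [10_u64, 1_000, 1_000_000] {
        c.bench_with_input(BenchmarkId::new("add", size), &size, |b, &s| {
            b.iter(|| add(black_box(s), black_box(s)))
        });
    }
}

criterion_group!(benches, bench_add_inputs);
criterion_main!(benches);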

criterion is usually the right tool for most benchmarks, although there are issues. Due to the statistics-driven ethos of criterion, there is currently no one-shot support: 10 is the minimum number of samples for a benchmark. Wall-clock time also gives little insight into where your code is actually slowing down, and will not catch things like poor memory locality. Even more crucial, however, is how criterion performs in CI pipelines. The developers themselves say the following:

“You probably shouldn’t (or, if you do, don’t rely on the results). The virtualization used by Cloud-CI providers like Travis-CI and Github Actions introduces a great deal of noise into the benchmarking process, and Criterion.rs’ statistical analysis can only do so much to mitigate that. This can result in the appearance of large changes in the measured performance even if the actual performance of the code is not changing.”

As such, we need a metric other than wall-clock time if we still want to run benchmarks in a CI pipeline. This is where the iai-callgrind crate comes in.

iai-callgrind

iai-callgrind is a benchmarking framework which uses Valgrind's Callgrind (and other Valgrind tools) to provide extremely accurate and consistent measurements of Rust code. It does not report wall-clock time, focussing instead on metrics like instruction counts and memory hit rates. It is important to note that it will only run on Linux, due to the Valgrind dependency. Let us create a benchmark called iai-bench in the benches folder. We add the following to Cargo.toml:

[profile.bench]
debug = true

[dev-dependencies]
iai-callgrind = "0.14.0"
criterion = "0.3"

[[bench]]
name = "iai-bench"
harness = false

To get the benchmarking runner, we can quickly compile it from source with cargo install --version 0.14.0 iai-callgrind-runner. To benchmark add using iai-callgrind, we add the following to benches/iai-bench.rs:

use iai_callgrind::{library_benchmark, library_benchmark_group, main};
use std::hint::black_box;

fn add(x: u64, y: u64) -> u64 {
    x + y
}

// `name` is the id of this benchmark case; the arguments are passed to the function.
#[library_benchmark]
#[bench::name(20, 20)]
fn bench_add(x: u64, y: u64) -> u64 {
    black_box(add(x, y))
}

library_benchmark_group!(
    name = bench_add_group;
    benchmarks = bench_add
);

main!(library_benchmark_groups = bench_add_group);

Again, run using cargo bench. To run only this benchmark, we can instead do cargo bench --bench iai-bench. Upon running, the terminal prints, for each benchmark case, counts such as instructions executed and cache/RAM hits, together with the change since the previous run.

As you can see, iai-callgrind is lightweight, fast and can provide some really accurate statistics on instruction counts and memory hits. This makes it perfect for benchmarking in CI workflows!
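
Since the callgrind counts are deterministic, it is also cheap to register several input cases on the same function; each #[bench::<id>] attribute becomes a separate case in the report. Here is a minimal sketch building on the example above (the case ids small and large, the group name, and the input values are arbitrary, chosen just for illustration):

use iai_callgrind::{library_benchmark, library_benchmark_group, main};
use std::hint::black_box;

fn add(x: u64, y: u64) -> u64 {
    x + y
}

// Each #[bench::<id>] attribute registers a separate case, reported under its id.
#[library_benchmark]
#[bench::small(20, 20)]
#[bench::large(1_000_000, 2_000_000)]
fn bench_add(x: u64, y: u64) -> u64 {
    black_box(add(x, y))
}

library_benchmark_group!(
    name = bench_add_cases_group;
    benchmarks = bench_add
);

main!(library_benchmark_groups = bench_add_cases_group);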

Workflows

Once benchmarking is established, workflows are not too difficult to add either. As discussed above, iai-callgrind, and not criterion, should be used for CI workflows. Take the following example from the tree-morph crate. I will put the code below and then briefly explain each portion. It should not be too difficult to adapt to other benchmarks.

name: "iai tree-morph Benchmarks"

on:
  push:
    branches:
      - main 
      - auto-bench 
    paths:
      - conjure_oxide/**
      - solvers/**
      - crates/**
      - Cargo.*
      - conjure_oxide/tests/**
      - .github/workflows/iai-tree-morph-benches.yml
  pull_request:
    paths:
      - conjure_oxide/**
      - solvers/**
      - crates/**
      - Cargo.*
      - conjure_oxide/tests/**
      - .github/workflows/iai-tree-morph-benches.yml
  workflow_dispatch:



jobs:
  benches:
    name: "Run iai tree-morph benchmarks"
    runs-on: ubuntu-latest
    timeout-minutes: 10

    strategy:
      # run all combinations of the matrix even if one combination fails.
      fail-fast: false
      matrix:
        rust_release:
          - stable
          - nightly
    steps:
      - uses: actions/checkout@v4

      - uses: dtolnay/rust-toolchain@stable
        with:
          toolchain: ${{ matrix.rust_release }}

      - name: "Cache Rust dependencies"
        uses: actions/cache@v4
        with:
            path: |
              ~/.cargo/registry
              ~/.cargo/git
              target
            key: ${{ runner.os }}-cargo-${{ matrix.rust_release }}-${{ hashFiles('**/Cargo.lock') }}
            restore-keys: |
              ${{ runner.os }}-cargo-${{ matrix.rust_release }}-
      - name: Install Valgrind
        run: sudo apt-get update && sudo apt-get install -y valgrind

      - name: Install iai-callgrind-runner
        run: cargo install --version 0.14.0 iai-callgrind-runner

      - name: Run tree-morph benchmarks with iai-callgrind
        run: cargo bench --manifest-path crates/tree_morph/Cargo.toml --bench iai-factorial --bench iai-identity --bench iai-modify_leafs > iai_callgrind_output.txt

      - name: Upload artefact
        uses: actions/upload-artifact@v4
        with:
          name: iai-callgrind-results-${{ matrix.rust_release }}
          path: iai_callgrind_output.txt

Some comments:

  • name just tells GitHub what to call the workflow
  • on tells GitHub when to run the workflow
  • jobs is the core of the workflow:
    • strategy specifies that we want to run the benchmarks on both stable and nightly Rust
    • In steps, we first check out the repository code, set up the Rust toolchain given by the matrix variable, and cache the Rust dependencies. Next we install Valgrind and the iai-callgrind runner, before running the benchmarks. Finally, the benchmark output is redirected to a .txt file and uploaded as an artefact.