Benchmarking - coq/coq GitHub Wiki

Benchmarking

For changes based on master: Make a PR and have someone with the rights on github say @coqbot bench or someone with the rights on gitlab start the "bench" job (in this case it's possible to change the tested packages without pushing a commit by setting the coq_opam_packages environment variable when starting the job).

To compare arbitrary Coq commits: currently impractical.

The job itself produces a looooong log. At its end you should see the results rendered as a table:

benchmarking-results.0.png

The table is also posted on zulip.

The table and the major per-command timing changes are posted to the PR on github once the bench completes.

Each line shows the measurement for a single OPAM package.

Each measured/computed quantity has its own column.

The git commits, that were considered are stated explicitely below the table as NEW and OLD.

E.g., in the table shown above, we see that the compilation of coq-geocoq

  • originally took 2394.71 seconds
  • now it takes 2215.65
  • which means that it decreased by cca. 7%

The lines of the table are ordered wrt. improvements in the overall compilation times.

The artifacts for the job contain

  • build logs in _bench/logs. When a package fails to install the error should be in $package.NEW/OLD.opam_install.1.stdout.log (it's probably also in stderr.log but with less context).

    vosize.log contains the sizes of the generated .vo files.

    _bench/logs also contains pretty printed per-file timings as $package.BOTH.perfile_timings.1.log.

  • _bench/timings contains the per-package table as bench_summary, it also has tables for per-command timings looking like

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                  Significant line time changes in bench                                                                   │
│                                                                                                                                                                           │
│   OLD       NEW      DIFF      %DIFF     Ln                           FILE                                                                                                │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│  47.7800   45.3170  -2.4630     -5.15%    558  coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html                          │
│ 214.0560  212.7540  -1.3020     -0.61%    141  coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html                                                 │
│ 113.1280  112.3790  -0.7490     -0.66%    999  coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html                                            │
│ 113.5220  112.7780  -0.7440     -0.66%    968  coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html                                            │
│ 122.5760  121.8830  -0.6930     -0.57%    155  coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html                                                 │
│   3.3200    2.7230  -0.5970    -17.98%    487  coq-stdlib/Numbers/HexadecimalFacts.v.html                                                                                 
...
  • _bench/html contains rendered files to display the per-command changes graphically. They are linked from the timings tables.

  • _bench/opam.{NEW,OLD} contains the raw data for the per-command timings

Interpreting bench results

The coq-core job is the OCaml part of Coq and doesn't really matter.

The coq-stdlib job is run with parallel make (-j $nprocs) which causes noise, so should be ignored unless it has especially significant changes.

In general noise is in the +/-1% range. It can be more for shorter jobs (where 1s can be more than 1%) and for some reason coqprime. Changes above 5% are almost certainly not noise.

It can also be interesting to look at the proportion of jobs with positive vs negative time differences. For instance https://gitlab.com/coq/coq/-/jobs/1919336897 has only small PDIFFs but they clearly trend negative (ie faster).

Benchmarking (with "overlays")

The following two environment variables

  • new_coq_opam_archive_git_uri
  • new_coq_opam_archive_git_branch

enable us to define "overlays", i.e. tweak the definitions of OPAM packages to be used with new_coq_commit.

They can be set when starting the bench job.

Choosing opam package

The variable coq_opam_packages can be set to restrict the list of opam packages to bench.

Setting up new machines

Replace TOKEN by the one from https://gitlab.com/coq/coq/-/settings/ci_cd see https://docs.gitlab.com/runner/register/#linux

# opam, time and perf for the bench
# perl and gmp for zarith
# lua for timelog2html (NB it wants 5.1 specifically at time of writing this, see shebang)
# m4, autoconf and automake for hott and others
# jq for fiat-crypto and zulip posting
# python-is-python3 for test suite's timing table invocation
sudo apt-get install -y curl opam time linux-perf perl libgmp-dev lua5.1 m4 automake autoconf jq python-is-python3

sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c "echo 'kernel.perf_event_paranoid = -1' > /etc/sysctl.d/00-coq-bench.conf"

curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash

sudo env GITLAB_RUNNER_DISABLE_SKEL=true apt-get install -y gitlab-runner

sudo gitlab-runner register \
     --non-interactive \
     --url 'https://gitlab.com' \
     --registration-token TOKEN \
     --name "$(hostname -f)" \
     --tag-list timing \
     --executor shell

sudo gitlab-runner start