Benchmarking - rocq-prover/rocq GitHub Wiki

Benchmarking

For changes based on master: Make a PR and have someone with the rights on github say @coqbot bench or someone with the rights on gitlab start the "bench" job (in this case it's possible to change the tested packages without pushing a commit by setting the coq_opam_packages environment variable when starting the job).

To compare arbitrary Coq commits: currently impractical.

The job itself produces a looooong log. At its end you should see the results rendered as a table:

┌──────────────────────────┬───────────────────────┬─────────────────────────────────────┬──────────────────────────┐
│                          │     user time [s]     │          CPU instructions           │  max resident mem [KB]   │
│                          │                       │                                     │                          │
│       package_name       │  NEW     OLD    PDIFF │      NEW            OLD       PDIFF │   NEW      OLD    PDIFF  │
├──────────────────────────┼───────────────────────┼─────────────────────────────────────┼──────────────────────────┤
│                 coq-core │   2.98    3.02  -1.32 │   20652348297    20643391337   0.04 │   92152    91860    0.32 │
│        coq-metacoq-utils │  22.62   22.89  -1.18 │  149676333158   149801269733  -0.08 │  593776   594448   -0.11 │
│ coq-metacoq-translations │  16.73   16.87  -0.83 │  119913272164   121304824094  -1.15 │  766524   773868   -0.95 │
│     coq-metacoq-template │ 147.31  147.63  -0.22 │  994070735750   995041625581  -0.10 │ 1034592  1050076   -1.47 │
│       coq-metacoq-common │  66.83   66.76   0.10 │  435770086025   436982991560  -0.28 │ 1034788  1050288   -1.48 │
│      coq-metacoq-erasure │ 514.51  513.10   0.27 │ 3557161626336  3557955538910  -0.02 │ 1797852  1828656   -1.68 │
│  coq-metacoq-safechecker │ 345.34  344.33   0.29 │ 2618349133036  2616884721455   0.06 │ 1666244  1651488    0.89 │
│        coq-metacoq-pcuic │ 660.97  658.46   0.38 │ 4282611577498  4284738620863  -0.05 │ 2430836  2265416    7.30 │
│           rocq-equations │   8.62    8.58   0.47 │   59102168827    59111582307  -0.02 │  396380   396512   -0.03 │
│             rocq-runtime │  74.43   73.71   0.98 │  537538217178   537370067435   0.03 │  484384   486740   -0.48 │
│      coq-category-theory │ 594.58  587.42   1.22 │ 4398470158491  4383712971053   0.34 │  905540  1020072  -11.23 │
│                rocq-core │   6.55    6.43   1.87 │   40345935823    40320391231   0.06 │  437428   437856   -0.10 │
└──────────────────────────┴───────────────────────┴─────────────────────────────────────┴──────────────────────────┘

The table is also posted on zulip.

The table and the major per-command timing changes are posted to the PR on github once the bench completes.

Each line shows the measurement for a single OPAM package.

Each measured/computed quantity has its own column.

The git commits, that were considered are stated explicitely below the table as NEW and OLD.

E.g., in the table shown above, we see that the compilation of coq-metacoq-translations

  • originally took 16.87 seconds
  • now it takes 16.73 seconds
  • which means that it decreased by cca. 0.83%

The lines of the table are ordered by increasing user time PDIFF.

The artifacts for the job contain

  • build logs in _bench/logs. When a package fails to install the error should be in $package.NEW/OLD.opam_install.1.stdout.log (it's probably also in stderr.log but with less context).

    vosize.log contains the sizes of the generated .vo files.

    _bench/logs also contains pretty printed per-file timings as $package.BOTH.perfile_timings.1.log.

  • _bench/timings contains the per-package table as bench_summary, it also has tables for per-command timings looking like

┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│                                                                  Significant line time changes in bench                                                                   │
│                                                                                                                                                                           │
│   OLD       NEW      DIFF      %DIFF     Ln                           FILE                                                                                                │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│  47.7800   45.3170  -2.4630     -5.15%    558  coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html                          │
│ 214.0560  212.7540  -1.3020     -0.61%    141  coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html                                                 │
│ 113.1280  112.3790  -0.7490     -0.66%    999  coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html                                            │
│ 113.5220  112.7780  -0.7440     -0.66%    968  coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html                                            │
│ 122.5760  121.8830  -0.6930     -0.57%    155  coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html                                                 │
│   3.3200    2.7230  -0.5970    -17.98%    487  coq-stdlib/Numbers/HexadecimalFacts.v.html                                                                                 
...
  • _bench/html contains rendered files to display the per-command changes graphically. They are linked from the timings tables.

  • _bench/opam.{NEW,OLD} contains the raw data for the per-command timings

Interpreting bench results

The rocq-runtime and coq-core jobs are the OCaml part of Coq and don't really matter.

In general noise is in the +/-1% range. It can be more for shorter jobs (where 1s can be more than 1%) and for some reason coqprime. Changes above 5% are almost certainly not noise.

CPU instruction noise is in the +/-0.1% range.

Max memory can be very noisy.

It can also be interesting to look at the proportion of jobs with positive vs negative time differences. For instance https://gitlab.com/coq/coq/-/jobs/1919336897 has only small PDIFFs but they clearly trend negative (ie faster).

Benchmarking (with "overlays")

The following two environment variables

  • new_coq_opam_archive_git_uri
  • new_coq_opam_archive_git_branch

enable us to define "overlays", i.e. tweak the definitions of OPAM packages to be used with new_coq_commit.

If you only need to point the package at a branch without changing the build instructions, you can also use new_opam_override_urls and old_opam_override_urls. For instance new_opam_override_urls="coq-hott.dev git+https://github.com/some-user/coq-hott#some-branch" (make sure to include the version for the opam package, note that just https won't work).

They can be set when starting the bench job in the gitlab web interface.

Choosing opam packages

The variable coq_opam_packages can be set to restrict the list of opam packages to bench.

Setting up new machines

Replace TOKEN by the one from https://gitlab.com/coq/coq/-/settings/ci_cd see https://docs.gitlab.com/runner/register/#linux

# opam, time and perf for the bench
# perl and gmp for zarith
# zlib1g-dev for timelog2html
# m4, autoconf and automake for hott and others
# jq for fiat-crypto and zulip posting
# python-is-python3 for test suite's timing table invocation
sudo apt-get install -y curl opam time linux-perf perl libgmp-dev zlib1g-dev m4 automake autoconf jq python-is-python3

sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c "echo 'kernel.perf_event_paranoid = -1' > /etc/sysctl.d/00-coq-bench.conf"

curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash

sudo env GITLAB_RUNNER_DISABLE_SKEL=true apt-get install -y gitlab-runner

sudo gitlab-runner register \
     --non-interactive \
     --url 'https://gitlab.com' \
     --registration-token TOKEN \
     --name "$(hostname -f)" \
     --tag-list timing \
     --executor shell

sudo gitlab-runner start