Benchmarking - coq/coq GitHub Wiki
Benchmarking
For changes based on master:
Make a PR and have someone with the rights on github say @coqbot bench
or someone with the rights on gitlab start the "bench" job (in this case it's possible to change the tested packages without pushing a commit by setting the coq_opam_packages environment variable when starting the job).
To compare arbitrary Coq commits: currently impractical.
The job itself produces a looooong log. At its end you should see the results rendered as a table:
The table is also posted on zulip.
The table and the major per-command timing changes are posted to the PR on github once the bench completes.
Each line shows the measurement for a single OPAM package.
Each measured/computed quantity has its own column.
The git commits, that were considered are stated explicitely below the table as NEW and OLD.
E.g., in the table shown above, we see that the compilation of coq-geocoq
- originally took 2394.71 seconds
- now it takes 2215.65
- which means that it decreased by cca. 7%
The lines of the table are ordered wrt. improvements in the overall compilation times.
The artifacts for the job contain
-
build logs in
_bench/logs
. When a package fails to install the error should be in$package.NEW/OLD.opam_install.1.stdout.log
(it's probably also in stderr.log but with less context).vosize.log
contains the sizes of the generated.vo
files._bench/logs
also contains pretty printed per-file timings as$package.BOTH.perfile_timings.1.log
. -
_bench/timings
contains the per-package table asbench_summary
, it also has tables for per-command timings looking like
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Significant line time changes in bench │
│ │
│ OLD NEW DIFF %DIFF Ln FILE │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 47.7800 45.3170 -2.4630 -5.15% 558 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html │
│ 214.0560 212.7540 -1.3020 -0.61% 141 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │
│ 113.1280 112.3790 -0.7490 -0.66% 999 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │
│ 113.5220 112.7780 -0.7440 -0.66% 968 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │
│ 122.5760 121.8830 -0.6930 -0.57% 155 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │
│ 3.3200 2.7230 -0.5970 -17.98% 487 coq-stdlib/Numbers/HexadecimalFacts.v.html
...
-
_bench/html
contains rendered files to display the per-command changes graphically. They are linked from thetimings
tables. -
_bench/opam.{NEW,OLD}
contains the raw data for the per-command timings
Interpreting bench results
The coq-core
job is the OCaml part of Coq and doesn't really matter.
The coq-stdlib
job is run with parallel make (-j $nprocs
) which causes noise, so should be ignored unless it has especially significant changes.
In general noise is in the +/-1% range. It can be more for shorter jobs (where 1s can be more than 1%) and for some reason coqprime. Changes above 5% are almost certainly not noise.
It can also be interesting to look at the proportion of jobs with positive vs negative time differences. For instance https://gitlab.com/coq/coq/-/jobs/1919336897 has only small PDIFFs but they clearly trend negative (ie faster).
Benchmarking (with "overlays")
The following two environment variables
new_coq_opam_archive_git_uri
new_coq_opam_archive_git_branch
enable us to define "overlays", i.e. tweak the definitions of OPAM packages to be used with new_coq_commit
.
They can be set when starting the bench job.
Choosing opam package
The variable coq_opam_packages
can be set to restrict the list of opam packages to bench.
Setting up new machines
Replace TOKEN by the one from https://gitlab.com/coq/coq/-/settings/ci_cd see https://docs.gitlab.com/runner/register/#linux
# opam, time and perf for the bench
# perl and gmp for zarith
# lua for timelog2html (NB it wants 5.1 specifically at time of writing this, see shebang)
# m4, autoconf and automake for hott and others
# jq for fiat-crypto and zulip posting
# python-is-python3 for test suite's timing table invocation
sudo apt-get install -y curl opam time linux-perf perl libgmp-dev lua5.1 m4 automake autoconf jq python-is-python3
sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c "echo 'kernel.perf_event_paranoid = -1' > /etc/sysctl.d/00-coq-bench.conf"
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
sudo env GITLAB_RUNNER_DISABLE_SKEL=true apt-get install -y gitlab-runner
sudo gitlab-runner register \
--non-interactive \
--url 'https://gitlab.com' \
--registration-token TOKEN \
--name "$(hostname -f)" \
--tag-list timing \
--executor shell
sudo gitlab-runner start