Benchmarking - rocq-prover/rocq GitHub Wiki
Benchmarking
For changes based on master:
Make a PR and have someone with the rights on github say @coqbot bench
or someone with the rights on gitlab start the "bench" job (in this case it's possible to change the tested packages without pushing a commit by setting the coq_opam_packages environment variable when starting the job).
To compare arbitrary Coq commits: currently impractical.
The job itself produces a looooong log. At its end you should see the results rendered as a table:
┌──────────────────────────┬───────────────────────┬─────────────────────────────────────┬──────────────────────────┐
│ │ user time [s] │ CPU instructions │ max resident mem [KB] │
│ │ │ │ │
│ package_name │ NEW OLD PDIFF │ NEW OLD PDIFF │ NEW OLD PDIFF │
├──────────────────────────┼───────────────────────┼─────────────────────────────────────┼──────────────────────────┤
│ coq-core │ 2.98 3.02 -1.32 │ 20652348297 20643391337 0.04 │ 92152 91860 0.32 │
│ coq-metacoq-utils │ 22.62 22.89 -1.18 │ 149676333158 149801269733 -0.08 │ 593776 594448 -0.11 │
│ coq-metacoq-translations │ 16.73 16.87 -0.83 │ 119913272164 121304824094 -1.15 │ 766524 773868 -0.95 │
│ coq-metacoq-template │ 147.31 147.63 -0.22 │ 994070735750 995041625581 -0.10 │ 1034592 1050076 -1.47 │
│ coq-metacoq-common │ 66.83 66.76 0.10 │ 435770086025 436982991560 -0.28 │ 1034788 1050288 -1.48 │
│ coq-metacoq-erasure │ 514.51 513.10 0.27 │ 3557161626336 3557955538910 -0.02 │ 1797852 1828656 -1.68 │
│ coq-metacoq-safechecker │ 345.34 344.33 0.29 │ 2618349133036 2616884721455 0.06 │ 1666244 1651488 0.89 │
│ coq-metacoq-pcuic │ 660.97 658.46 0.38 │ 4282611577498 4284738620863 -0.05 │ 2430836 2265416 7.30 │
│ rocq-equations │ 8.62 8.58 0.47 │ 59102168827 59111582307 -0.02 │ 396380 396512 -0.03 │
│ rocq-runtime │ 74.43 73.71 0.98 │ 537538217178 537370067435 0.03 │ 484384 486740 -0.48 │
│ coq-category-theory │ 594.58 587.42 1.22 │ 4398470158491 4383712971053 0.34 │ 905540 1020072 -11.23 │
│ rocq-core │ 6.55 6.43 1.87 │ 40345935823 40320391231 0.06 │ 437428 437856 -0.10 │
└──────────────────────────┴───────────────────────┴─────────────────────────────────────┴──────────────────────────┘
The table is also posted on zulip.
The table and the major per-command timing changes are posted to the PR on github once the bench completes.
Each line shows the measurement for a single OPAM package.
Each measured/computed quantity has its own column.
The git commits, that were considered are stated explicitely below the table as NEW and OLD.
E.g., in the table shown above, we see that the compilation of coq-metacoq-translations
- originally took 16.87 seconds
- now it takes 16.73 seconds
- which means that it decreased by cca. 0.83%
The lines of the table are ordered by increasing user time PDIFF.
The artifacts for the job contain
-
build logs in
_bench/logs
. When a package fails to install the error should be in$package.NEW/OLD.opam_install.1.stdout.log
(it's probably also in stderr.log but with less context).vosize.log
contains the sizes of the generated.vo
files._bench/logs
also contains pretty printed per-file timings as$package.BOTH.perfile_timings.1.log
. -
_bench/timings
contains the per-package table asbench_summary
, it also has tables for per-command timings looking like
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Significant line time changes in bench │
│ │
│ OLD NEW DIFF %DIFF Ln FILE │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ 47.7800 45.3170 -2.4630 -5.15% 558 coq-fiat-crypto-with-bedrock/rupicola/bedrock2/bedrock2/src/bedrock2Examples/insertionsort.v.html │
│ 214.0560 212.7540 -1.3020 -0.61% 141 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │
│ 113.1280 112.3790 -0.7490 -0.66% 999 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │
│ 113.5220 112.7780 -0.7440 -0.66% 968 coq-performance-tests-lite/src/fiat_crypto_via_setoid_rewrite_standalone.v.html │
│ 122.5760 121.8830 -0.6930 -0.57% 155 coq-fiat-crypto-with-bedrock/src/UnsaturatedSolinasHeuristics/Tests.v.html │
│ 3.3200 2.7230 -0.5970 -17.98% 487 coq-stdlib/Numbers/HexadecimalFacts.v.html
...
-
_bench/html
contains rendered files to display the per-command changes graphically. They are linked from thetimings
tables. -
_bench/opam.{NEW,OLD}
contains the raw data for the per-command timings
Interpreting bench results
The rocq-runtime
and coq-core
jobs are the OCaml part of Coq and don't really matter.
In general noise is in the +/-1% range. It can be more for shorter jobs (where 1s can be more than 1%) and for some reason coqprime. Changes above 5% are almost certainly not noise.
CPU instruction noise is in the +/-0.1% range.
Max memory can be very noisy.
It can also be interesting to look at the proportion of jobs with positive vs negative time differences. For instance https://gitlab.com/coq/coq/-/jobs/1919336897 has only small PDIFFs but they clearly trend negative (ie faster).
Benchmarking (with "overlays")
The following two environment variables
new_coq_opam_archive_git_uri
new_coq_opam_archive_git_branch
enable us to define "overlays", i.e. tweak the definitions of OPAM packages to be used with new_coq_commit
.
If you only need to point the package at a branch without changing the build instructions, you can also use
new_opam_override_urls
and old_opam_override_urls
.
For instance new_opam_override_urls="coq-hott.dev git+https://github.com/some-user/coq-hott#some-branch"
(make sure to include the version for the opam package, note that just https won't work).
They can be set when starting the bench job in the gitlab web interface.
Choosing opam packages
The variable coq_opam_packages
can be set to restrict the list of opam packages to bench.
Setting up new machines
Replace TOKEN by the one from https://gitlab.com/coq/coq/-/settings/ci_cd see https://docs.gitlab.com/runner/register/#linux
# opam, time and perf for the bench
# perl and gmp for zarith
# zlib1g-dev for timelog2html
# m4, autoconf and automake for hott and others
# jq for fiat-crypto and zulip posting
# python-is-python3 for test suite's timing table invocation
sudo apt-get install -y curl opam time linux-perf perl libgmp-dev zlib1g-dev m4 automake autoconf jq python-is-python3
sudo bash -c 'echo -1 > /proc/sys/kernel/perf_event_paranoid'
sudo bash -c "echo 'kernel.perf_event_paranoid = -1' > /etc/sysctl.d/00-coq-bench.conf"
curl -L "https://packages.gitlab.com/install/repositories/runner/gitlab-runner/script.deb.sh" | sudo bash
sudo env GITLAB_RUNNER_DISABLE_SKEL=true apt-get install -y gitlab-runner
sudo gitlab-runner register \
--non-interactive \
--url 'https://gitlab.com' \
--registration-token TOKEN \
--name "$(hostname -f)" \
--tag-list timing \
--executor shell
sudo gitlab-runner start