Benchmarker

Tortoise has a benchmarking tool that allows you to run benchmarks in Node.js using the V8 JavaScript runtime engine. Benchmarking results are written to the engine-benchmarks.txt file at the root of the repository. To run the benchmarker, run the netLogoWeb/benchmark task in sbt.
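
For example, from the root of the repository you might run the default benchmarks and then look at the results file mentioned above:

  ./sbt.sh netLogoWeb/benchmark
  cat engine-benchmarks.txt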

The benchmarker also accepts some options to customize your benchmarking runs. They are as follows:

  • --comment/--comments
    • Takes one argument, which is a string comment that will be associated with your benchmarking run. Useful for remembering why you ran the benchmark, or for marking the "before" and "after" runs in a series of benchmarks.
  • --quick
    • Takes no arguments. Overrides all of the options listed below to force a fast benchmarking run (1 iteration of BZ Benchmark in V8).
  • --count/--iters/--num
    • Takes one argument, which is the number of times you would like the benchmarks run in each engine. The default value is 3.
  • --ticks
    • Takes one argument, which is the number of ticks to run a model's go procedure for when that model doesn't have its own benchmark procedure. The default is 100. Note that when this option is used, a setup procedure is also assumed to exist and is run first. Many models have a non-default number of ticks configured in the Model.scala file.
  • --engine
    • Takes one or more arguments, each of which indicates an engine in which the benchmarks should be run. node, v8, google, or chrome indicates V8; mozilla, firefox, or spidermonkey indicates SpiderMonkey; graal, java, oracle, rhino, or nashorn indicates the GraalVM JS engine. To use SpiderMonkey, install it locally and make sure it's on your PATH, but be aware that this setup hasn't been tested in a long time. Typically we only test V8 unless we have a good reason to check the others.
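  • --models
    • Takes one or more arguments, each of which is the name of a benchmark model to run, restricting the run to just those models (as used in the GraalVM example below).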

Here are some examples (assuming that these are being run from the root of the repository):

  • ./sbt.sh netLogoWeb/benchmark
    • Run the benchmarker with the default configuration (3 iterations, all models, all engines)
  • ./sbt.sh 'netLogoWeb/benchmark --quick --comment "Redesigned turtle jumping"'
    • Run the quick benchmarking mode (1 iteration, BZ Benchmark, only in V8), using the comment "Redesigned turtle jumping"
  • ./sbt.sh 'netLogoWeb/benchmark --iters 5 --engine graal --models "Wealth Benchmark" "Heatbugs Benchmark" "Erosion Benchmark"'
    • Run 5 iterations each of three different models in the GraalVM JS engine.
  • ./sbt.sh 'netLogoWeb/benchmark --count 9001 --engine oracle mozilla'
    • Run the benchmarker 9001 times in both the GraalVM JS engine and SpiderMonkey. (P.S. This will take forever.)

Tips

Benchmarking should be done on a system with no other activities going on. Browsing the web, working on other code, or watching videos will impact the results due to processor and memory contention. Disabling networking can help ensure no background process fires up to download updates or do other idle work. For the best results, run a large number of iterations and a high number of ticks; the faster a model finishes a single iteration, the more variance you're likely to see due to startup/warmup time. The benchmarker uses the same random seed for all runs by default, so the variance between runs should be low; a high variance is a good indication that something was competing with the run for resources.
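
As a sketch, a longer run intended for reliable results might look like the following; the iteration and tick counts here are illustrative rather than recommended values:

  ./sbt.sh 'netLogoWeb/benchmark --iters 10 --ticks 1000 --engine v8 --comment "quiet-machine baseline"'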

To put it another way, running a low number of iterations with a modest 100 ticks (the defaults) on a busy machine will only reliably show very large performance differences; anything smaller is likely to be within the margin of error from noise and variance. To get results that can be trusted for low-impact changes (say, a 1-5% change in performance), run a higher number of iterations and ticks on a very quiet machine.

Writing your benchmarking command(s) into a simple shell script makes it easy to run the "before" and "after" benchmarks sequentially (with the appropriate git commands in between) without user intervention, and keeps the benchmark repeatable as work continues.
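
A minimal sketch of such a script, assuming a hypothetical feature branch name and arbitrary iteration/tick counts (adjust the branch names, engine, and counts to your own workflow):

  #!/usr/bin/env bash
  set -e

  # "Before" run: benchmark the baseline branch.
  git checkout main
  ./sbt.sh 'netLogoWeb/benchmark --iters 10 --ticks 1000 --engine v8 --comment "before: main"'

  # "After" run: benchmark the branch with the change under test.
  git checkout my-feature-branch
  ./sbt.sh 'netLogoWeb/benchmark --iters 10 --ticks 1000 --engine v8 --comment "after: my-feature-branch"'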