Benchmark Organization Conventions

Every benchmark follows these conventions so that it can be integrated with the other tools of the suite. Requirements and guidelines for each file are given below.

short-name/                      % Unique, memorable, short name for the benchmark
    +- Makefile (with capital M) % Driver file for preparing the benchmark to be run, or compiled by other tools
    +- misc/                     % Benchmark-specific information and supporting files that may be used to generate inputs or may be common to all language implementations 
    +- input/                    % Generated directory with input data obtained running 'make input'
        +- small.txt                 
        +- medium.txt
        +- large.txt
    +- benchmark.json            % Meta-information about the benchmark, that can be processed by tools
    +- implementation-short-name/               % Source code for language 1
        +- Makefile (with capital M)
        +- implementation.json      
        +- core-computation.ext
        +- misc/                 % Implementation-specific and benchmark-specific information and support files
        +- compilers/
            +- compiler-short-name-1/
                +- compiler.json   % Benchmark-specific compilation options
                +- runner.ext      % Runner file
           
            +- compiler-short-name-2/
                +- ...
         
    +- implementation-short-name-2/
        +- ...
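
For example, a benchmark with short name 'lud' and C and JavaScript implementations might be laid out as follows. The compiler short names (gcc, browser) and the concrete file names (lud.c, runner.js) are hypothetical illustrations, and the input file extension assumes input-ext is set to "csv" in benchmark.json:

```
lud/
    +- Makefile
    +- misc/
    +- input/
        +- small.csv
        +- medium.csv
        +- large.csv
    +- benchmark.json
    +- c/
        +- Makefile
        +- implementation.json
        +- lud.c
        +- misc/
        +- compilers/
            +- gcc/
                +- compiler.json
                +- runner.c
    +- js/
        +- Makefile
        +- implementation.json
        +- lud.js
        +- misc/
        +- compilers/
            +- browser/
                +- compiler.json
                +- runner.js
```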

short-name

This is the name of the benchmark that people will repeat over and over again: to access it on the filesystem, to think about the results, and to explain empirical results orally in team meetings and presentations. Ideally it should be 2-3 syllables, short to type, and easy to remember. However, if the name comes from another benchmark suite and is well established, the established name takes priority over these guidelines.

The name should be all lowercase.

short-name/Makefile

The Makefile takes the source code, generates multiple versions according to experimentation parameters, and outputs the different versions required by particular experiments.

It needs to support at least the following targets:

| Target | Explanation |
| --- | --- |
| input/all | Generate the small, medium, and large input data files in the input directory. |
| input/small | Generate the small input in the input directory. |
| input/medium | Generate the medium input in the input directory. |
| input/large | Generate the large input in the input directory. |
| bench | Generate the runnable code for LANG=name, where name defaults to 'c'. |

It also needs to recognize the following option:

| Option | Explanation |
| --- | --- |
| RUN_DIR | Output directory for the generated runnable code. |
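
A minimal sketch of such a Makefile is shown below. The misc/generate_input.py generator script, the .csv input extension, the build/ output default, and the 'bench' target in the per-language sub-Makefile are assumptions made for illustration, not requirements of the convention:

```makefile
# Sketch of a top-level benchmark Makefile implementing the targets above.

# Language implementation to generate runnable code for (defaults to 'c').
LANG ?= c
# Output directory for the generated runnable code (this default is an assumption).
RUN_DIR ?= build/$(LANG)
# Should match the random-seed property in benchmark.json.
SEED ?= 1337

.PHONY: input/all input/small input/medium input/large bench

input/all: input/small input/medium input/large

# Static pattern rule: $* expands to small, medium, or large.
input/small input/medium input/large: input/%:
	mkdir -p input
	python3 misc/generate_input.py --size $* --seed $(SEED) --output input/$*.csv

bench:
	mkdir -p $(RUN_DIR)
	$(MAKE) -C $(LANG) bench RUN_DIR=$(abspath $(RUN_DIR))
```

A typical invocation would then be 'make input/all' to prepare the inputs, followed by 'make bench LANG=js RUN_DIR=build/js' to generate the runnable JavaScript version.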

misc (Miscellaneous)

Supporting files and information that are common to all language implementations. Language-specific information should go in the directory for that language implementation.

input

Generated once, the first time the benchmark is prepared.

A 'small'-sized input should take on the order of 0.1 seconds or less to run with the reference C implementation; it is mainly used to quickly check the correctness of code transformations across the entire benchmark suite.

A 'medium'-sized input should take on the order of 1-10 seconds to run with the reference C implementation and is suitable for performance evaluation of numerical programs (since the other language implementations may be several times slower).

A 'large'-sized input should take more than 10 seconds to run with the reference C implementation; it is used to stress test the implementation and to check whether the results obtained with the 'medium'-sized input still hold on larger inputs.

benchmark.json

This file provides meta-information used by tools to pull the required dependencies, set up and run the benchmark, automatically create tables and figures for reports and papers, etc.

It needs to be a valid JSON file (as checked by JSON linters) and must have the following properties:

| Name | Type | Value | Example |
| --- | --- | --- | --- |
| name | String | Long name for the benchmark; should spell out acronyms | "Lower-Upper Decomposition" |
| short-name | String | Short, memorable, unique within the suite; should be the same as the directory name of the benchmark | "lud" |
| description | String | Long description of what the benchmark computes | "In numerical analysis, Lower-Upper Decomposition [...]" |
| version | String | Version number for the benchmark ("X.Y.Z") | "1.0.2" |
| languages | Array of Strings | List of language implementations, given by their canonical language names (see below) | ["c", "js"] |
| input-size | Object | Values for the various input sizes | {"small": 1, "medium": 10, "large": 100} |
| input-ext | String | Input file extension | "csv" |
| random-seed | Number | Random seed for data generation and randomized algorithms | 1337 |
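
For illustration, the benchmark.json of the 'lud' example used above could look like the following, reusing the values from the Example column (the truncated description is kept as-is):

```json
{
  "name": "Lower-Upper Decomposition",
  "short-name": "lud",
  "description": "In numerical analysis, Lower-Upper Decomposition [...]",
  "version": "1.0.2",
  "languages": ["c", "js"],
  "input-size": { "small": 1, "medium": 10, "large": 100 },
  "input-ext": "csv",
  "random-seed": 1337
}
```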

Version number convention

"X.Y.Z" where:

  • X is incremented after a source code change that modifies the output of the benchmark for the same input
  • Y is incremented after a change that does not modify the output but might have an effect on performance
  • Z is incremented if the change is cosmetic and changes neither the correctness nor the performance characteristics of the program (ex: benchmark directory structure or comments in the source code)

List of language implementations

All names should be lowercase and should be the short mnemonic names people use to refer to these languages. Equivalence table:

| Canonical Language Name | Covered Cases |
| --- | --- |
| c | C, c |
| js | JavaScript, javascript |
| matlab | MATLAB, matlab |
| opencl | OpenCL, opencl |
| webcl | WebCL, webcl |

Language implementations