Metrics

Some metrics can be difficult to gather, or it can be tricky to understand exactly what they measure. This wiki page documents those metrics.

Extremely Large Logs

Sometimes the log files can be exceedingly large (in excess of 2 GB per file). Your computer may not have enough memory to analyze such log files, as a 2 GB log file can take closer to 10 GB of memory once parsed. If your metrics can still be aggregated after analyzing each benchmark separately, you can split a large log file by benchmark using csplit:

extract-benchs.sh:

#!/usr/bin/env bash
set -e
# Place the split logs in a directory named after the original log file.
mkdir -p "$1.logs/"
cd "$1.logs/"
# Split the log into pieces xx00, xx01, ... at every "  Building <benchmark>" line.
csplit ../"$1"*.log '/^  Building/' '{*}'
# Rename each piece after the benchmark it contains.
grep '  Building' * | sed -E 's/^(\S*):  Building ([^ ]*).*$/\1 \2.log/' | xargs -n2 mv
# Remove any leftover pieces (e.g. the preamble before the first benchmark).
rm xx*

Usage: extract-benchs.sh thelogs. This creates a directory thelogs.logs/ containing the separated CPU2017 benchmarks from thelogs.log.

From these split logs, you can analyze each benchmark individually and aggregate the results.
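
For instance, here is a minimal sketch of that workflow in Python. It scans each split log one at a time so memory use stays bounded; the marker pattern and per-benchmark counting are placeholders (not actual OptSched output), so substitute whatever analysis your metric actually requires.

import re
from pathlib import Path

# Hypothetical/illustrative marker; replace with the log line your metric needs.
_MARKER = re.compile(r'Best schedule for DAG')


def per_benchmark_counts(logs_dir):
    counts = {}
    for log_file in sorted(Path(logs_dir).glob('*.log')):
        # Each file holds a single benchmark, so it can be scanned line by line
        # without loading the entire original log into memory.
        with open(log_file) as f:
            counts[log_file.stem] = sum(1 for line in f if _MARKER.search(line))
    return counts

# Aggregate across benchmarks, e.g.:
# total = sum(per_benchmark_counts('thelogs.logs').values())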

Total Compilation Time

Although we only have control over scheduling time, the number that actually matters at the end of the day is the total compilation time. How do we measure this metric across the various benchmark suites?

CPU2006/CPU2017

The SPEC benchmark runners (runcpu/runspec) print a short message after finishing the compilation of a benchmark:

64 total seconds elapsed

CPU2017 also reports a more precise number in the form:

Elapsed compile for '519.lbm_r': 00:00:04 (4)

To get the total compilation time, this number is simply extracted from the raw string log of the last block in the benchmark.
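
As a rough sketch, the extraction for CPU2017 might look like the following Python, assuming the same abstracted log types used in the SHOC example below (each benchmark is a sequence of blocks exposing raw_log, and here assumed to be indexable so the last block can be taken directly):

import re

# Matches the CPU2017 message, e.g. "Elapsed compile for '519.lbm_r': 00:00:04 (4)";
# the number in parentheses is the elapsed compile time in seconds.
_CPU2017_COMPILE_TIME = re.compile(
    r"Elapsed compile for '(?P<bench>[^']+)': \d+:\d+:\d+ \((?P<elapsed>\d+)\)")


def cpu2017_total_compile_time_seconds(logs):
    # The runcpu summary appears in the last block of each benchmark,
    # so searching that block's raw text is sufficient.
    return sum(int(m['elapsed'])
               for bench in logs.benchmarks
               for m in _CPU2017_COMPILE_TIME.finditer(bench[-1].raw_log))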

PlaidML

plaidbench includes a message on the compilation time for each benchmark:

Example finished, elapsed: 123.45s (compile), 42.35s (execution)

Similarly to the SPEC benchmarks, this number is extracted from the raw string log of the last block in the benchmark.
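
A similar sketch for plaidbench, again assuming the abstracted log types used in the SHOC example below, with the regex written from the message format shown above:

import re

# Matches plaidbench's message, e.g.
# "Example finished, elapsed: 123.45s (compile), 42.35s (execution)".
_PLAIDML_COMPILE_TIME = re.compile(
    r'Example finished, elapsed: (?P<compile>\d+(?:\.\d+)?)s \(compile\), '
    r'(?P<execution>\d+(?:\.\d+)?)s \(execution\)')


def plaidml_total_compile_time_seconds(logs):
    # The message appears in the last block of each benchmark's log.
    return sum(float(m['compile'])
               for bench in logs.benchmarks
               for m in _PLAIDML_COMPILE_TIME.finditer(bench[-1].raw_log))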

SHOC

SHOC doesn't natively output the compilation time, so to get it you have to modify the SHOC source code and rebuild. The key is to replace calls to clBuildProgram(...) with a wrapper that also records the elapsed time and prints it to stderr (using stderr places the messages in the same file as the OptSched logs). You can then sum all occurrences of that message for the entire benchmark.

As an example, you could create a wrapper that looks something like this:

#include <chrono>
#include <iostream>

#include <CL/cl.h>

namespace ctmetric
{
    // Drop-in replacement for clBuildProgram() that also measures how long the
    // build took and reports it on stderr (std::clog), so the message ends up
    // in the same file as the OptSched logs.
    inline cl_int clBuildProgram(cl_program program,
                                 cl_uint num_devices,
                                 const cl_device_id *device_list,
                                 const char *options,
                                 void (CL_CALLBACK *pfn_notify)(cl_program, void *user_data),
                                 void *user_data)
    {
        auto start = std::chrono::high_resolution_clock::now();
        auto result = ::clBuildProgram(program, num_devices, device_list, options, pfn_notify, user_data);
        auto end = std::chrono::high_resolution_clock::now();

        auto elapsed = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
        std::clog << "Finished compiling; total ns = " << elapsed.count() << '\n';
        return result;
    }
}

Then, on the Python side (assuming the abstracted log types):

import re

_SHOC_TIME_ELAPSED = re.compile(r'Finished compiling; total ns = (?P<elapsed>\d+)')


def shoc_total_compile_time_seconds(logs):
    # Sum the elapsed time over every "Finished compiling" message emitted by
    # the wrapper above, then convert from nanoseconds to seconds.
    elapsed = [int(m['elapsed'])
               for bench in logs.benchmarks
               for blk in bench
               for m in _SHOC_TIME_ELAPSED.finditer(blk.raw_log)]
    if not elapsed:
        raise KeyError('Logs must contain "Finished compiling; total ns = " '
                       'output by the modified SHOC benchmark suite')
    return sum(elapsed) * 1e-9