Benchmark Results: Wholesystem Qemu - ruijiefang/llvm-hcs GitHub Wiki

QEMU x86_64 whole-system benchmark

We used the scripts from UnixBench (https://github.com/kdlucas/byte-unixbench) to run the whole-system benchmark. The following UnixBench programs were used:

pipe
spawn
context1
syscall 
dhry2

A script registered in the crontab of the QEMU guest OS runs these benchmarks at startup and then shuts the system down immediately.
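A minimal sketch of what such a startup script could look like is given below. It is not the script used for these measurements. The install path `PGMS_DIR`, the per-benchmark duration argument, and the helper names `build_commands` and `run_all` are all assumptions made for illustration.

```python
import subprocess
from pathlib import PurePosixPath

# Benchmark programs listed above.
BENCHMARKS = ["pipe", "spawn", "context1", "syscall", "dhry2"]

# ASSUMPTION: hypothetical location of the built UnixBench programs in the guest.
PGMS_DIR = PurePosixPath("/root/byte-unixbench/UnixBench/pgms")

# ASSUMPTION: hypothetical duration argument (seconds) passed to each program.
DURATION_SECONDS = 10


def build_commands(pgms_dir=PGMS_DIR, names=BENCHMARKS, seconds=DURATION_SECONDS):
    """Return one argv list per benchmark, followed by the shutdown command."""
    commands = [[str(PurePosixPath(pgms_dir) / name), str(seconds)] for name in names]
    commands.append(["shutdown", "-h", "now"])
    return commands


def run_all(commands, dry_run=True):
    """Run the commands in order; with dry_run=True, only return what would run."""
    if dry_run:
        return [" ".join(cmd) for cmd in commands]
    for cmd in commands:
        # check=False so a failing benchmark does not prevent the final shutdown.
        subprocess.run(cmd, check=False)
    return []
```

Calling `run_all(build_commands())` returns the command lines without executing anything; passing `dry_run=False` inside the guest would run the benchmarks and then power off. On cron implementations that support it, an `@reboot` crontab entry pointing at such a script is one way to start it at boot.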

Results

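We averaged the perf data across 6 runs. The standard deviation (as a percentage) is shown in the right column. The runtime did not change measurably (about 38.3 to 38.7 seconds in all three configurations). The absolute number of icache misses dropped from about 34.5M in the baseline to 33.7M with cold-section splitting and 33.0M without it, although the icache miss percentage rose slightly (1.912% to 1.936% and 1.952%):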

Baseline (PGO)

     6,340,961,198      instructions              #    1.24  insn per cycle           ( +-  0.94% )
     1,072,825,058      branches                  #   27.828 M/sec                    ( +-  0.83% )
        18,154,956      branch-misses             #    1.69% of all branches          ( +-  0.49% )
        34,467,405      icache.misses             #    0.894 M/sec                    ( +-  0.93% )
     1,803,675,781      icache.hit                #   46.785 M/sec                    ( +-  0.71% )
     38.66 seconds (stddev: +- 0.12%)

icache miss percentage: 1.912%

With splitting into cold section

      38212.371184      task-clock (msec)         #    0.994 CPUs utilized            ( +-  0.14% )
            11,010      context-switches          #    0.288 K/sec                    ( +-  8.85% )
               249      cpu-migrations            #    0.007 K/sec                    ( +- 42.13% )
             9,854      page-faults               #    0.258 K/sec                    ( +-  1.33% )
     3,756,725,090      cycles                    #    0.098 GHz                      ( +-  1.61% )
     2,428,286,682      instructions              #    0.65  insn per cycle           ( +-  2.18% )
       489,387,637      branches                  #   12.807 M/sec                    ( +-  1.80% )
        15,260,406      branch-misses             #    3.12% of all branches          ( +-  1.26% )
        33,658,280      L1-icache-load-misses                                         ( +-  1.94% )
        33,658,280      icache.misses             #    0.881 M/sec                    ( +-  1.94% )
     1,738,128,851      icache.hit                #   45.486 M/sec                    ( +-  1.84% )

      38.433942833 seconds time elapsed                                          ( +-  0.18% )

icache miss percentage: 1.936%

Without splitting into cold section


      38146.482805      task-clock (msec)         #    0.995 CPUs utilized            ( +-  0.11% )
            10,248      context-switches          #    0.269 K/sec                    ( +-  3.06% )
               125      cpu-migrations            #    0.003 K/sec                    ( +-  3.55% )
             9,741      page-faults               #    0.255 K/sec                    ( +-  1.36% )
     3,682,423,565      cycles                    #    0.097 GHz                      ( +-  0.43% )
     2,361,489,614      instructions              #    0.64  insn per cycle           ( +-  0.43% )
       477,946,914      branches                  #   12.529 M/sec                    ( +-  0.42% )
        15,056,133      branch-misses             #    3.15% of all branches          ( +-  0.37% )
        33,003,254      L1-icache-load-misses                                         ( +-  0.48% )
        33,003,254      icache.misses             #    0.865 M/sec                    ( +-  0.48% )
     1,691,043,021      icache.hit                #   44.330 M/sec                    ( +-  0.38% )

      38.337958217 seconds time elapsed                                          ( +-  0.13% )

icache miss percentage: 1.952%
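
The reported icache miss percentages appear to be `icache.misses` divided by `icache.hit`, not misses divided by total accesses. The sketch below recomputes them from the averaged counters above and also gives the relative change in absolute misses against the baseline. The function names are illustrative. The baseline comes out at about 1.911% rather than the reported 1.912%, a last-digit difference that may come from averaging per-run percentages.

```python
# Averaged counter values copied from the perf output above.
RESULTS = {
    "baseline": {"misses": 34_467_405, "hits": 1_803_675_781},
    "split": {"misses": 33_658_280, "hits": 1_738_128_851},
    "no_split": {"misses": 33_003_254, "hits": 1_691_043_021},
}


def miss_percentage(misses, hits):
    """icache.misses expressed as a percentage of icache.hit."""
    return 100.0 * misses / hits


def relative_change(new, old):
    """Percentage change of `new` relative to `old` (negative means fewer)."""
    return 100.0 * (new - old) / old


def summarize(results, baseline="baseline"):
    """Return (name, miss percentage, change in misses vs. baseline) per config."""
    base_misses = results[baseline]["misses"]
    return [
        (
            name,
            miss_percentage(counters["misses"], counters["hits"]),
            relative_change(counters["misses"], base_misses),
        )
        for name, counters in results.items()
    ]


for name, pct, delta in summarize(RESULTS):
    print(f"{name:10s} miss%={pct:.3f} misses vs baseline={delta:+.2f}%")
```

On these numbers, absolute misses fall by roughly 2.3% with splitting and 4.2% without, while the miss percentage rises slightly because the hit counts fall by a larger fraction.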