Benchmarks - marinus-lab/z88dk GitHub Wiki
Benchmark Strategy
In all cases a simple compile is done to verify the programs generate correct results. Then the programs are compiled for a minimal target `+test` that eliminates as much unnecessary code as possible; this includes elimination of stdio and as many device drivers as possible. Total program size is recorded (this includes the `CODE`, `DATA` and `BSS` sections but does not include the stack) and the execution time is measured by ticks. `z88dk-ticks` is a command line z80 emulator that comes with z88dk and can measure execution time of program fragments exactly.
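The wall clock figures in the tables on this page follow directly from the cycle counts and the 4 MHz clock. A minimal sketch of that conversion is shown here; the function names are hypothetical and are not part of z88dk or `z88dk-ticks`.

```c
/* Assumed clock rate: the tables on this page quote wall clock time at 4 MHz. */
#define CLOCK_HZ 4000000UL

/* Convert an emulator cycle count to seconds.
   unsigned long long is used because the larger counts exceed 32 bits. */
double cycles_to_seconds(unsigned long long cycles)
{
    return (double)cycles / (double)CLOCK_HZ;
}

/* Split a cycle count into whole minutes and whole seconds (truncated),
   the style used in the tables for long-running benchmarks. */
void split_wall_clock(unsigned long long cycles,
                      unsigned long *minutes, unsigned long *seconds)
{
    unsigned long long total = cycles / CLOCK_HZ;
    *minutes = (unsigned long)(total / 60);
    *seconds = (unsigned long)(total % 60);
}
```

For example, 298,416,076 cycles corresponds to roughly 74.60 seconds at 4 MHz.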
Compilers
To put the current compilers and libraries for z88dk into context, the benchmarks are conducted against common alternatives.
HITECH-C CPM v3.09
Hitech-C (CP/M-80) v3.09 is Hitech's last CP/M C compiler. One of the most capable native C compilers for CP/M. Runs under CP/M 2.2 and implements a large subset of C89. This compiler represents the best z80 native code generator. Hitech made this available for free many years ago.
Efforts to maintain the v3.09 compiler are being made. Some benchmarks were run using Release 15, which contains many patches, including fixes backported from v7.80pl2.
HITECH-C MSDOS v7.80 Patch Level 2
The last z80 compiler from Hi-Tech, cross compiles z80 and z180 code from MSDOS (and Linux). Supports banking into the full z180 address space. Seems to be near complete compliance with C89. Has both an integrated development environment `HPDZ` and command line `ZC` options. Compiler kindly provided by @artrag.
IAR Z80 V4.06A
IAR's last z80 compiler, running under Windows. Although it is not currently listed on their webpage, they are willing to sell it to anyone who has the cash.
SDCC
sdcc 4.2.0 r13131 (Linux)
sdcc is a current open source C cross compiler targeting several small CPUs including the z80. Its primary feature is that it supports a large subset of modern C standards (C89, C99, C11, C23). In these tests the new register based calling convention `__sdcccall(1)` is enabled.
Z88DK/SCCZ80_CLASSIC
(Nightly build 28 April 2021) z88dk's native C compiler sccz80 using the classic C library in z88dk. sccz80 is a derivative of small C with most small C limitations eliminated. Its primary feature is a comprehensive (classic) C library written in assembly language.
Z88DK/SDCC_CLASSIC
(Nightly build 10 March 2022) sdcc 4.2.0 r13131 is used to translate C code with z88dk supplying its (classic) C library and startup code for targets. In these tests the traditional calling conventions `__sdcccall(0)`, with `__z88dk_callee` and `__z88dk_fastcall`, are enabled.
Z88DK/SCCZ80_NEW
(Nightly build 28 April 2021) z88dk's native C compiler sccz80 using the new C library in z88dk. sccz80 is a derivative of small C with most small C limitations eliminated. Its primary feature is a comprehensive (new) C library written in assembly language.
Z88DK/SDCC_NEW
(Nightly build 10 March 2022) sdcc 4.2.0 r13131 is used to translate C code with z88dk supplying its (new) C library and startup code for targets. In these tests the traditional calling conventions `__sdcccall(0)`, with `__z88dk_callee` and `__z88dk_fastcall`, are enabled.
Binary-Trees
The purpose of this benchmark is to verify that malloc/free work without problems and to measure their speed when allocations are made in the context of constructing binary trees.
The work is to create binary trees, composed only of tree nodes all the way down to depth 0, before any of those nodes are GC'd, using at minimum the number of allocations of Jeremy Zerfas's C program. Don't optimize away the work.
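The allocation pattern being measured can be sketched as follows. This is a simplified illustration, not the benchmark program itself; the type and function names are hypothetical. Each node costs one malloc() and one free(), so a tree of depth d exercises the heap with 2^(d+1) - 1 live blocks.

```c
#include <stdlib.h>

/* A node holds only its two child pointers, as in the task description. */
typedef struct tree_node {
    struct tree_node *left;
    struct tree_node *right;
} tree_node;

/* Build a complete binary tree of the given depth, one malloc() per node. */
tree_node *tree_create(unsigned depth)
{
    tree_node *node = malloc(sizeof *node);
    if (node == NULL)
        abort();                      /* out of heap: give up */
    if (depth > 0) {
        node->left = tree_create(depth - 1);
        node->right = tree_create(depth - 1);
    } else {
        node->left = NULL;
        node->right = NULL;
    }
    return node;
}

/* Walk the whole tree and count its nodes, so the work cannot be skipped. */
unsigned long tree_count(const tree_node *node)
{
    if (node->left == NULL)
        return 1;
    return 1 + tree_count(node->left) + tree_count(node->right);
}

/* Release every node with free(), children before parent. */
void tree_destroy(tree_node *node)
{
    if (node->left != NULL) {
        tree_destroy(node->left);
        tree_destroy(node->right);
    }
    free(node);
}
```

Because all nodes of a tree stay allocated until the tree is destroyed, the heap holds many live blocks at once, which is the situation the notes below describe for the new library.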
Compiler | Size (bytes) | Z80 Cycles | Wall Clock @4MHz |
---|---|---|---|
Hitech-C CPM v3.09 | 5821 | 298,416,076 | 74.60 sec |
Hitech-C Z80 v7.80 | 4247 | 240,336,355 | 60.09 sec |
IAR Z80 V4.06A | 4525 | 7,358,336,547 | 30 min 40 sec |
SDCC | 7158 | 188,126,191 | 47.03 sec |
Z88DK/SCCZ80_CLASSIC | 3800 | 145,563,150 | 36.39 sec |
Z88DK/SCCZ80_NEW | 2711 | 6,582,763,903 | 27 min 25 sec |
Z88DK/SDCC_CLASSIC | 3536 | 150,118,736 | 37.52 sec |
Z88DK/SDCC_NEW | 2689 | 6,576,349,618 | 27 min 24 sec |
Notes:
- New library (Issue #113): the library's optimization for fast realloc causes a slow free block search when a thousand blocks are allocated, as happens in this benchmark.
- IAR is likely implementing a heap similar to z88dk's new c library where an emphasis is placed on the speed of realloc().
Dhrystone 2.1
Dhrystone was a common synthetic benchmark for measuring the integer performance of compilers in the 1980s until more modern benchmarks replaced it. It attempts to simulate typical programs by executing a set of statements statistically determined from common programs in the wild.
The benchmark package is available for download.
Compiler | Size (bytes) | Z80 Cycles | Wall Clock @4MHz | Dhrystones/s | DMIPS |
---|---|---|---|---|---|
Hitech-C CPM v3.09 | 8988 | 356,235,065 | 89.06 sec | 224.57 | 0.1278 |
Hitech-C Z80 v7.80 | 7002 | 280,100,135 | 70.02 sec | 285.61 | 0.1625 |
IAR Z80 V4.06A | 7371 | 306,860,580 | 76.72 sec | 260.70 | 0.1484 |
SDCC | 6825 | 225,522,684 | 56.38 sec | 354.73 | 0.2019 |
Z88DK/SDCC_CLASSIC | 7882 | 251,880,052 | 62.97 sec | 317.61 | 0.1808 |
Z88DK/SDCC_NEW | 7072 | 254,720,052 | 63.68 sec | 314.07 | 0.1787 |
Notes:
- Hitech-C Z80 v7.80 must be compiled with global optimizer set to two; higher causes the program to hang.
- Dhrystone 2.1 is deprecated because optimizing compilers can eliminate redundant statements that were intended to add to execution time. However many z80-era compilers ran this benchmark so it is also available in the z88dk repository.
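The Dhrystones/s and DMIPS columns in the table above can be reproduced from the cycle counts. A minimal sketch, under two assumptions that are not stated on this page: the run count of 20,000 is inferred from the table values, and 1757 Dhrystones/s is the conventional VAX 11/780 reference score used as the DMIPS divisor. The function names are hypothetical.

```c
#define CLOCK_HZ               4000000.0  /* 4 MHz, as in the table header */
#define ASSUMED_RUNS           20000.0    /* assumption: inferred from the table */
#define VAX_DHRYSTONES_PER_SEC 1757.0     /* conventional VAX 11/780 reference */

/* Dhrystone iterations completed per second of simulated time. */
double dhrystones_per_second(double cycles)
{
    return ASSUMED_RUNS / (cycles / CLOCK_HZ);
}

/* Score relative to the VAX 11/780 reference machine. */
double dmips(double cycles)
{
    return dhrystones_per_second(cycles) / VAX_DHRYSTONES_PER_SEC;
}
```

With the 356,235,065 cycles listed for Hitech-C CPM v3.09, this gives about 224.57 Dhrystones/s and 0.1278 DMIPS, matching the first row.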
Fannkuch
The fannkuch benchmark is defined by programs in Performing Lisp Analysis of the FANNKUCH Benchmark, Kenneth R. Anderson and Duane Rettig. FANNKUCH is an abbreviation for the German word Pfannkuchen, or pancakes, in analogy to flipping pancakes. The conjecture is that the maximum count is approximated by n*log(n) when n goes to infinity.
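The "flipping" works like this: take a permutation of 1..n, reverse its first k elements where k is the first element, and repeat until the first element is 1. The benchmark looks for the maximum number of flips over all permutations. A simplified sketch follows; it is a plain brute force search, not the benchmark source, and the function names are hypothetical.

```c
#define MAX_N 10   /* sketch limit; larger n takes n! steps */

/* Count flips for one permutation: reverse the first perm[0] elements
   until the first element becomes 1. Works on a private copy. */
static int count_flips(const int *perm, int n)
{
    int work[MAX_N];
    int flips = 0, i, j, t;

    for (i = 0; i < n; i++)
        work[i] = perm[i];
    while (work[0] != 1) {
        for (i = 0, j = work[0] - 1; i < j; i++, j--) {
            t = work[i]; work[i] = work[j]; work[j] = t;
        }
        flips++;
    }
    return flips;
}

/* Step to the next permutation in lexicographic order.
   Returns 0 when the last permutation has been reached. */
static int next_permutation(int *p, int n)
{
    int i = n - 2, j = n - 1, t;

    while (i >= 0 && p[i] >= p[i + 1])
        i--;
    if (i < 0)
        return 0;
    while (p[j] <= p[i])
        j--;
    t = p[i]; p[i] = p[j]; p[j] = t;
    for (i = i + 1, j = n - 1; i < j; i++, j--) {
        t = p[i]; p[i] = p[j]; p[j] = t;
    }
    return 1;
}

/* Maximum flip count over all permutations of 1..n (n <= MAX_N). */
int fannkuch_max_flips(int n)
{
    int perm[MAX_N], i, f, best = 0;

    for (i = 0; i < n; i++)
        perm[i] = i + 1;
    do {
        f = count_flips(perm, n);
        if (f > best)
            best = f;
    } while (next_permutation(perm, n));
    return best;
}
```

For n = 7 the maximum is 16 flips.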
Compiler | Size (bytes) | Z80 Cycles | Wall Clock @4MHz |
---|---|---|---|
Hitech-C CPM v3.09 | 2771 | 56,614,856 | 14.15 sec |
Hitech-C Z80 v7.80 | 868 | 51,982,515 | 12.99 sec |
IAR Z80 V4.06A | 1347 | 56,708,022 | 14.18 sec |
SDCC | 962 | 57,325,388 | 14.33 sec |
Z88DK/SCCZ80_CLASSIC | 1763 | 75,381,296 | 18.84 sec |
Z88DK/SCCZ80_NEW | 957 | 77,386,481 | 19.35 sec |
Z88DK/SDCC_CLASSIC | 1304 | 59,756,269 | 14.94 sec |
Z88DK/SDCC_NEW | 1070 | 56,090,095 | 14.02 sec |
Fasta
The program should:
- generate DNA sequences, by copying from a given sequence.
- generate DNA sequences, by weighted random selection from 2 alphabets.
- convert the expected probability of selecting each nucleotide into cumulative probabilities.
- match a random number against those cumulative probabilities to select each nucleotide (use linear search or binary search).
- use this naïve linear congruential generator to calculate a random number each time a nucleotide needs to be selected (don't cache the random number sequence).
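The last three requirements can be sketched together. The generator constants below (IM = 139968, IA = 3877, IC = 29573, seed 42) are the ones published with the Computer Language Benchmarks Game fasta task; that the programs measured here use the same constants is an assumption. The struct and function names are hypothetical.

```c
/* Assumed LCG constants, from the benchmarks game fasta task. */
#define IM 139968L
#define IA 3877L
#define IC 29573L

static long lcg_last = 42;   /* seed */

/* Return the next pseudo-random value, scaled into [0, max). */
double lcg_random(double max)
{
    lcg_last = (lcg_last * IA + IC) % IM;
    return max * (double)lcg_last / (double)IM;
}

typedef struct {
    char symbol;
    double p;     /* probability, then cumulative probability */
} nucleotide;

/* Replace each probability with the running total up to that entry. */
void make_cumulative(nucleotide *table, int count)
{
    double running = 0.0;
    int i;

    for (i = 0; i < count; i++) {
        running += table[i].p;
        table[i].p = running;
    }
}

/* Linear search: pick the first entry whose cumulative probability
   exceeds r; the last entry catches any rounding shortfall. */
char select_nucleotide(const nucleotide *table, int count, double r)
{
    int i;

    for (i = 0; i < count - 1; i++)
        if (r < table[i].p)
            return table[i].symbol;
    return table[count - 1].symbol;
}
```

Every selected nucleotide costs one floating point multiply, one divide and several compares, which is why the choice of floating point library shows up so clearly in the results for this benchmark.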
Compiler | Size (bytes) | Z80 Cycles | Wall Clock @4MHz |
---|---|---|---|
Hitech-C CPM v3.09 | 5638 | 189,901,647 | 47.47 sec |
Hitech-C Z80 v7.80 | 4121 | DISQ | |
IAR Z80 V4.06A | 6041 | 223,805,149 | 55.95 sec |
SDCC | 5835 | 373,202,979 | 93.30 sec |
Z88DK/SCCZ80_CLASSIC | 3291 | 243,021,012 | 60.76 sec |
Z88DK/SCCZ80_CLASSIC/MATH32 | 3978 | 136,057,474 | 34.01 sec |
Z88DK/SCCZ80_NEW | 2998 | 204,281,085 | 51.07 sec |
Z88DK/SCCZ80_NEW/MATH32 | 3729 | 136,057,141 | 34.01 sec |
Z88DK/SDCC_CLASSIC | 3583 | 248,331,410 | 62.08 sec |
Z88DK/SDCC_NEW | 3171 | 245,055,005 | 61.26 sec |
Notes:
- Hitech-C Z80 v7.80pl2 produces incorrect results at all optimization levels.
- SDCC's performance is hurt by a floating point package implemented in C.
- Z88DK/SCCZ80_CLASSIC uses the `genmath` float library while the other Z88DK compiles use `math48`.
- Z88DK/SDCC uses a 48-bit float internally but this is converted to 32-bit at the compiler-library interface since sdcc only understands a 32-bit float type.
- Z88DK/SCCZ80/MATH32 uses the `math32` 32-bit IEEE-754 floating point package.
n-Body
Model the orbits of Jovian planets, using the same simple symplectic-integrator. Thanks to Mark C. Lewis for suggesting this task.
Useful symplectic integrators are freely available, for example the HNBody Symplectic Integration Package.
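The defining property of a simple symplectic integrator is the update order: velocities are advanced from the current forces first, and positions are then advanced with the new velocities. The sketch below shows that order on a one-dimensional toy problem (a unit mass on a unit spring) rather than on the planetary system, so that it needs no square roots; the names are hypothetical and this is not the benchmark source.

```c
typedef struct {
    double x;   /* position */
    double v;   /* velocity */
} state;

/* One kick-then-drift step for the toy system x'' = -x. */
void symplectic_euler_step(state *s, double dt)
{
    s->v -= s->x * dt;   /* kick: velocity from the force at the current position */
    s->x += s->v * dt;   /* drift: position from the already updated velocity */
}

/* Total energy (kinetic plus potential) of the toy system. */
double energy(const state *s)
{
    return 0.5 * (s->v * s->v + s->x * s->x);
}
```

With this ordering the energy error stays bounded over long runs instead of growing steadily, which is why the task can compare the system's energy before and after many steps as a correctness check.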
Compiler | Size (bytes) | Z80 Cycles | Wall Clock @4MHz |
---|---|---|---|
Hitech-C CPM v3.09 | 5633 | 1,594,771,948 | 6 min 38 sec |
Hitech-C Z80 v7.80 | 3736 | 1,600,543,903 | 6 min 40 sec |
IAR Z80 V4.06A | 4084 | 2,331,516,019 | 9 min 43 sec |
SDCC | 7141 | 3,163,137,393 | 13 min 11 sec |
Z88DK/SCCZ80_CLASSIC | 4493 | 3,658,052,111 | 15 min 14 sec |
Z88DK/SCCZ80_NEW | 3363 | 2,376,486,525 | 9 min 53 sec |
Z88DK/SCCZ80_NEW/MATH32 | 5149 | 754,266,702 | 3 min 8 sec |
Z88DK/SCCZ80_NEW/MATH16 | 3227 | 384,230,543 | 1 min 36 sec |
Z88DK/SDCC_CLASSIC | 5246 | 2,253,709,929 | 9 min 23 sec |
Z88DK/SDCC_NEW | 4332 | 2,247,889,896 | 9 min 22 sec |
Notes:
- SDCC's performance is hurt by a floating point package implemented in C.
- Z88DK/SCCZ80_CLASSIC uses the `genmath` float library while the other Z88DK compiles use `math48`.
- Z88DK/SDCC uses a 48-bit float internally but this is converted to 32-bit at the compiler-library interface since sdcc only understands a 32-bit float type.
- Z88DK/SCCZ80_NEW/MATH32 uses the `math32` 32-bit IEEE-754 floating point package.
- Z88DK/SCCZ80_NEW/MATH16 uses the `math16` 16-bit IEEE-754 floating point package.
Pi
Pi.c computes pi to 800 decimal places. It is based on an implementation found at crypto.stanford.edu.
Pi.c measures 32-bit integer math performance. The computation can make good use of ldiv() but not all compilers supply this function so the program is run with and without ldiv() for comparison purposes.
Z88DK's new C library has a fast integer math option so the table below shows results for it as well as the normal build using the small integer math option.
In the table below, the first set of columns (size, cycles, wall clock) is measured without ldiv() and the second set with ldiv().
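The difference between the two variants comes down to how quotient and remainder are obtained in the inner loop. A minimal sketch of both forms, using the standard `ldiv()` from `<stdlib.h>`; the wrapper function names are hypothetical and the code is not taken from Pi.c.

```c
#include <stdlib.h>

/* Without ldiv(): two operators, which a simple compiler may turn
   into two separate 32-bit division calls. */
void split_with_operators(long d, long b, long *quot, long *rem)
{
    *quot = d / b;
    *rem = d % b;
}

/* With ldiv(): one library call returns quotient and remainder together,
   so a good library implementation needs only one division. */
void split_with_ldiv(long d, long b, long *quot, long *rem)
{
    ldiv_t r = ldiv(d, b);

    *quot = r.quot;
    *rem = r.rem;
}
```

Whether the second form is actually faster depends on the library: it only helps when ldiv() itself performs a single division.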
Compiler | Size (no ldiv) | Z80 Cycles (no ldiv) | Wall Clock @4MHz (no ldiv) | Size (ldiv) | Z80 Cycles (ldiv) | Wall Clock @4MHz (ldiv) |
---|---|---|---|---|---|---|
Hitech-C CPM v3.09 | 8342 | 5,532,347,800 | 23 min 03 sec | |||
Hitech-C Z80 v7.80 | 6593 | 5,528,979,464 | 23 min 02 sec | 6728 | 5,892,567,264 | 24 min 33 sec |
IAR Z80 V4.06A | 6789 | 8,762,223,085 | 36 min 31 sec | 7006 | 8,799,503,282 | 36 min 40 sec |
SDCC | 6591 | 6,649,404,381 | 27 min 42 sec | |||
Z88DK/SCCZ80_CLASSIC | 6508 | 4,012,440,830 | 16 min 43 sec | |||
Z88DK/SCCZ80_NEW | 6269 | 4,012,440,735 | 16 min 43 sec | 6182 | 2,576,381,983 | 10 min 44 sec |
Z88DK/SCCZ80_NEW_FAST | 8999 | 1,696,878,309 | 7 min 04 sec | 9131 | 1,301,832,933 | 5 min 25 sec |
Z88DK/SDCC_CLASSIC | 6600 | 4,169,137,078 | 17 min 22 sec | |||
Z88DK/SDCC_NEW | 6246 | 4,067,517,071 | 16 min 57 sec | 6388 | 2,609,489,119 | 10 min 52 sec |
Z88DK/SDCC_NEW_FAST | 8997 | 1,756,864,232 | 7 min 19 sec | 9097 | 1,339,849,656 | 5 min 35 sec |
Notes:
- Although HITECH-C Z80 v7.80 supplies ldiv(), it still performs two divisions to get quotient and remainder.
- SDCC's performance is hurt by having its 32-bit math routines implemented in C.
- Z88DK's small integer math library demotes long multiplies to integer where possible.
- Z88DK's fast integer math library is able to reduce most 32-bit divides to 16-bit divides. The loop unrolling option is not enabled.
Sieve of Eratosthenes (Prime Numbers)
Sieve.c finds all the prime numbers in [2,7999]. The algorithm is known as the Sieve of Eratosthenes.
This is a popular benchmark for small machine compilers because just about every compiler is able to compile it. As a benchmarking tool it's mainly measuring loop overhead.
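A minimal sketch of the algorithm over the same range follows. It is an illustration rather than the Sieve.c used for the measurements, and the names are hypothetical. The two nested loops do almost nothing besides counting and a byte store, which is why the benchmark mostly measures loop overhead.

```c
#define LIMIT 8000   /* the flags cover 0..7999 */

static unsigned char composite[LIMIT];

/* Mark every composite in [2, LIMIT) and return the number of primes found. */
unsigned sieve_count_primes(void)
{
    unsigned i, j, count = 0;

    for (i = 0; i < LIMIT; i++)
        composite[i] = 0;

    for (i = 2; i < LIMIT; i++) {
        if (composite[i])
            continue;                 /* already crossed out by a smaller prime */
        count++;                      /* i is prime */
        for (j = i + i; j < LIMIT; j += i)
            composite[j] = 1;         /* cross out every multiple of i */
    }
    return count;
}
```

Over [2,7999] this finds 1007 primes.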
Compiler | Size (bytes) | Z80 Cycles | Wall Clock @4MHz |
---|---|---|---|
Hitech-C CPM v3.09 | 10297 | 7,916,099 | 1.9790 sec |
Hitech-C Z80 v7.80 | 8472 | 3,885,436 | 0.9713 sec |
IAR Z80 V4.06A | 8772 | 3,714,152 | 0.9285 sec |
SDCC | 8278 | 4,219,481 | 1.0548 sec |
Z88DK/SCCZ80_CLASSIC | 8589 | 4,957,733 | 1.2394 sec |
Z88DK/SCCZ80_NEW | 8362 | 4,957,733 | 1.2394 sec |
Z88DK/SDCC_CLASSIC | 8558 | 4,510,806 | 1.1277 sec |
Z88DK/SDCC_NEW | 8315 | 3,665,494 | 0.9163 sec |
Notes:
- Z88DK/SCCZ80 tries to generate small code by turning primitive compiler operations into subroutine calls. The additional call/ret overhead of these subroutine calls is significant in the small loop code and this is what hurts its performance in comparison to other compilers.
Whetstone 1.2
Whetstone is a synthetic floating point benchmark. The benchmark package is available for download.
Floating point performance depends strongly on the number of mantissa bits in the float type.
Compiler | Float Size (bits) | Mantissa (bits) | Size (bytes) | Z80 Cycles | Wall Clock @4MHz | KWIPS |
---|---|---|---|---|---|---|
Hitech-C CPM v3.09 | 32 | 24 | 9076 | 646,520,995 | 161.6302 sec | 6.1870 |
Hitech-C Z80 v7.80 | 32 | 24 | 6919 | 614,748,605 | 153.6871 sec | 6.5067 |
IAR Z80 V4.06A | 32 | 24 | 6524 | 732,360,277 | 183.0901 sec | 5.4618 |
SDCC | 32 | 24 | 10935 | 1,491,668,242 | 372.9170 sec | 2.6816 |
Z88DK/SCCZ80_CLASSIC | 48 | 40 | 6359 | 1,283,271,893 | 320.8179 sec | 3.1170 |
Z88DK/SCCZ80_NEW | 48 | 40 | 5362 | 972,899,568 | 243.2248 sec | 4.1114 |
Z88DK/SCCZ80/MATH32 | 32 | 24 | 8921 | 567,396,426 | 141.8491 sec | 7.0497 |
Z88DK/SDCC_CLASSIC | 32(48) | 24(40) | 7588 | 920,781,972 | 230.1954 sec | 4.3441 |
Z88DK/SDCC_NEW | 32(48) | 24(40) | 6221 | 914,412,771 | 228.6031 sec | 4.3743 |
Z88DK/SDCC/MATH32 | 32 | 24 | 10113 | 576,187,434 | 144.0468 sec | 6.9421 |
Notes:
- Hitech-C CPM v3.09 produces some results with errors in the third decimal place.
- SDCC's performance is hurt by a floating point package implemented in C.
- Z88DK/SCCZ80_CLASSIC uses the `genmath` float library while the other Z88DK compiles use `math48`.
- Z88DK/SCCZ80/MATH32 uses the `math32` 32-bit IEEE-754 floating point package.
- Z88DK/SDCC uses a 48-bit float internally but this is converted to 32-bit at the compiler-library interface since sdcc only understands a 32-bit float type.
- Z88DK/SDCC/MATH32 uses the `math32` 32-bit IEEE-754 floating point package.