Sieve of Eratosthenes ‐ a prime number generator - retrotruestory/M1DEV GitHub Wiki
5/8/2004 by Bill Buzbee
Lost the use of the kitchen table again. Monica finished the costume for Maia's kindergarten production, but accidentally ruined it with a too-hot iron - so she's having to do it all over again. So, I've been poking around with the tool chain. Back in the early 80's, Byte Magazine published a set of small CPU benchmarks. The benchmarks themselves are pretty worthless using today's standards, but I thought it would be fun to compare Magic-1 to some of the early 80's machines. It's not a fair comparison - I'm using vastly faster memory than was typically available then - but what the hell....
Anyway, the most famous of the Byte bencharks was the Sieve of Eratosthenes - a prime number generator. I found a copy and stripped out all of the timing, error-checking and target dependent code. Here's what's left:
#define SIZE 8191 #define M (SIZE+1) #define N 10 #define TRUE 1 #define FALSE 0 int main() { int i,prime,count,k; int iter; static char flags[M]; for (iter=1; iter <= N; iter++) { count = 0; for (i=0; i <= SIZE; i++) { flags[i] = TRUE; } for (i=0; i <= SIZE; i++) { if (flags[i]) { count++; prime = i + i + 3; // remove all multiples of prime for (k=i+prime; k <= SIZE; k+=prime) { flags[k] = FALSE; } } } } //printf("Count: %d, 0x%x\n", count,count); return(count); }
I'm having main "return(count)" to ensure the result winds up in register A - my integer return register, and the benchmark is set up to do 10 iterations - each of which should find 1899 primes.
Given that I already have a C compiler, I thought this would be easy - just compile and go. However, the exercise led me to find and fix several microcode bugs, two C compiler problems and do some enhancements to my hack assembler. However, it works now (at least in simulation - have to wait for kitchen table space to try it on real hardware).
Following is the assembler listing, with embedded source code. I've highlighed the original C code in red. Note that loop code is embedded at the beginning and end of loops, so you'll see some duplication. Also, this is a one-pass assembler which generates the listing as it goes, so forward references show up as "----" in the code listing. You can find the final values in the fixup section, which is generated at the end when forward references are backpatched.
Overall, the code quality isn't too bad considering that I haven't done much tuning and lcc really isn't an optimizing compiler. Until I get a linker up and running, I'm also having lcc inject some startup code, including a placeholder for the interrupt vector. Based on short runs on my simulator, I expect to complete 10 iterations in about 4.7 seconds on M-1 running at 3.68 MHz. Also, for this workload I'm averaging one M-1 instruction every 5 clocks.
In real terms, that means I'll be running somewhere between .5 and .75 million instructions per second - though that metric has little to do with the [in]famous Vax MIPs rating. Eventually, I'll run dhrystone on this beast. Just for kicks, I tried compiling dhrystone, but failed about 90% through the main file. Another C compiler retargeting bug to track down. Anyway, here's sieve:
: | ; Magic-1 assembly file, generated by lcc 4.2 : | _start: 0000 : 83 ---- | br over_ivec 0003 : ---- | defw unhandled_exception ; Interrupt vector 1 0005 : ---- | defw unhandled_exception ; Interrupt vector 2 0007 : ---- | defw unhandled_exception ; Interrupt vector 3 0009 : ---- | defw unhandled_exception ; Interrupt vector 4 000b : ---- | defw unhandled_exception ; Interrupt vector 5 000d : ---- | defw unhandled_exception ; Interrupt vector 6 000f : ---- | defw unhandled_exception ; Interrupt vector 7 0011 : ---- | defw unhandled_exception ; Interrupt vector 8 0013 : ---- | defw unhandled_exception ; Interrupt vector 9 0015 : ---- | defw unhandled_exception ; Interrupt vector a 0017 : ---- | defw unhandled_exception ; Interrupt vector b 0019 : ---- | defw unhandled_exception ; Interrupt vector c 001b : ---- | defw unhandled_exception ; Interrupt vector d 001d : ---- | defw unhandled_exception ; Interrupt vector e 001f : ---- | defw unhandled_exception ; Interrupt vector f : | unhandled_exception: 0021 : 00 | halt : | over_ivec: 0022 : 7c 8000 | ld.16 a,0x8000 0025 : cb | copy sp,a 0026 : 7c 4000 | ld.16 a,0x4000 0029 : b6 | copy dp,a 002a : 80 ---- | call _main 002d : 00 | halt : | bss : | L2: : | defs 8192 : | global _main : | cseg : | _main: 002e : e5 f4 | enter 12 : | ;main() { : | ; for (iter=1; iter <= N; iter++) { 0030 : 7a 01 | ld.16 a,1 0032 : d9 04 | st.16 -10+14(sp),a : | L3: : | ; count = 0; 0034 : 7a 00 | ld.16 a,0 0036 : d9 06 | st.16 -8+14(sp),a : | ; for (i=0; i <= SIZE; i++) { 0038 : 7a 00 | ld.16 a,0 003a : d9 0a | st.16 -4+14(sp),a : | L7: : | ; flags[i] = TRUE; 003c : 81 0a | ld.16 a,-4+14(sp) 003e : 74 002e | lea b,L2(dp) 0041 : 3f | add.16 a,b 0042 : 79 01 | ld.8 b,1 0044 : d6 00 | st.8 0(a),b : | ; } : | L8: : | ; for (i=0; i <= SIZE; i++) { 0046 : 81 0a | ld.16 a,-4+14(sp) 0048 : 3d 01 | add.16 a,1 004a : d9 0a | st.16 -4+14(sp),a 004c : 81 0a | ld.16 a,-4+14(sp) 004e : bc 1fff ea | cmpb.le.16 a,8191,L7 : | ; for (i=0; i <= SIZE; i++) { 0052 : 7a 00 | ld.16 a,0 0054 : d9 0a | st.16 -4+14(sp),a : | L11: : | ; if (flags[i]) { 0056 : 81 0a | ld.16 a,-4+14(sp) 0058 : 74 002e | lea b,L2(dp) 005b : 3f | add.16 a,b 005c : 12 00 | ld.8 a,0(a) 005e : 95 -- | cmpb.eq.8 a,0,L15 : | ; count++; 0060 : 81 06 | ld.16 a,-8+14(sp) 0062 : 3d 01 | add.16 a,1 0064 : d9 06 | st.16 -8+14(sp),a : | ; prime = i + i + 3; 0066 : 81 0a | ld.16 a,-4+14(sp) 0068 : d9 02 | st.16 -12+14(sp),a 006a : 85 02 | ld.16 b,-12+14(sp) 006c : 3f | add.16 a,b 006d : 3d 03 | add.16 a,3 006f : d9 08 | st.16 -6+14(sp),a : | ; for (k=i+prime; k <= SIZE; k+=prime) { 0071 : 81 0a | ld.16 a,-4+14(sp) 0073 : 39 08 | add.16 a,-6+14(sp) 0075 : d9 0c | st.16 -2+14(sp),a 0077 : 83 ---- | br L20 : | L17: : | ; flags[k] = FALSE; 007a : 81 0c | ld.16 a,-2+14(sp) 007c : 74 002e | lea b,L2(dp) 007f : 3f | add.16 a,b 0080 : 79 00 | ld.8 b,0 0082 : d6 00 | st.8 0(a),b : | ; } : | L18: : | ; for (k=i+prime; k <= SIZE; k+=prime) { 0084 : 81 0c | ld.16 a,-2+14(sp) 0086 : 39 08 | add.16 a,-6+14(sp) 0088 : d9 0c | st.16 -2+14(sp),a : | L20: 008a : 81 0c | ld.16 a,-2+14(sp) 008c : bc 1fff ea | cmpb.le.16 a,8191,L17 : | ; } : | L15: : | ; } : | L12: : | ; for (i=0; i <= SIZE; i++) { 0090 : 81 0a | ld.16 a,-4+14(sp) 0092 : 3d 01 | add.16 a,1 0094 : d9 0a | st.16 -4+14(sp),a 0096 : 81 0a | ld.16 a,-4+14(sp) 0098 : bc 1fff ba | cmpb.le.16 a,8191,L11 : | ; } : | L4: : | ; for (iter=1; iter <= N; iter++) { 009c : 81 04 | ld.16 a,-10+14(sp) 009e : 3d 01 | add.16 a,1 00a0 : d9 04 | st.16 -10+14(sp),a 00a2 : 81 04 | ld.16 a,-10+14(sp) 00a4 : bd 0a 8d | cmpb.le.16 a,10,L3 : | ; return(count); 00a7 : 81 06 | ld.16 a,-8+14(sp) : | L1: 00a9 : 0d | pop sp 00aa : 0b | pop pc : | end =========================== Fixups ================================= Fixup applied, 2-byte store of 0x0010 to 0x0078 Fixup applied, 1-byte store of 0x0030 to 0x005f Fixup applied, 2-byte store of 0x0001 to 0x002b Fixup applied, 2-byte store of 0x0021 to 0x001f Fixup applied, 2-byte store of 0x0021 to 0x001d Fixup applied, 2-byte store of 0x0021 to 0x001b Fixup applied, 2-byte store of 0x0021 to 0x0019 Fixup applied, 2-byte store of 0x0021 to 0x0017 Fixup applied, 2-byte store of 0x0021 to 0x0015 Fixup applied, 2-byte store of 0x0021 to 0x0013 Fixup applied, 2-byte store of 0x0021 to 0x0011 Fixup applied, 2-byte store of 0x0021 to 0x000f Fixup applied, 2-byte store of 0x0021 to 0x000d Fixup applied, 2-byte store of 0x0021 to 0x000b Fixup applied, 2-byte store of 0x0021 to 0x0009 Fixup applied, 2-byte store of 0x0021 to 0x0007 Fixup applied, 2-byte store of 0x0021 to 0x0005 Fixup applied, 2-byte store of 0x0021 to 0x0003 Fixup applied, 2-byte store of 0x001f to 0x0001 =========================== Symbols ================================= L1 -> 0x00a9 L4 -> 0x009c L12 -> 0x0090 L18 -> 0x0084 L17 -> 0x007a L20 -> 0x008a L15 -> 0x0090 L11 -> 0x0056 L8 -> 0x0046 L7 -> 0x003c L3 -> 0x0034 L2 -> 0x002e _main -> 0x002e unhandled_exception -> 0x0021 over_ivec -> 0x0022 _start -> 0x0000