Table of Contents

Calculator Benchmark: find primes in a very naive way

The idea Design a computation benchmark that involves the basic operations with little use of memory and can be handled by almost all programmable calculators. Moreover it's design should allow a challenge task even for newer calculators simply by increasing the input value. 1)

The pseudocode

input: n
--
for k:=3 to n do {
  for j:=2 to k-1 do {
    if ( k mod j == 0 ) then {
      j:= k-1 //so we exit from the inner for
    }
  }
}

How to report the results

A result is composed by the following list
- the device used plus the language used, eventual overclock, eventual custom firmware and so on.
- time elapsed for a given n in seconds (see below)
- the code used.

if the calculator is too slow, or limited, to compute a given n, then report "for n the computation
takes too much time". Conversely, if the calculator is too fast to compute a given n, then report 
"for n the computation takes too little time, i skipped it"

Physical calculators

n is 10

  1. HP-71B, native BASIC, ROM version 1BBBB, Saturn @ 640 kHz (default)
    • Source: Thanks to J.H
    • Time: 0.6 s
    • 10 T=TIME 
      20 INPUT N 
      30 FOR K=3 TO N 
      40 FOR J=2 TO K-1 
      50 IF MOD(K,J)=0 THEN J=K-1 
      60 NEXT J 
      70 NEXT K 
      80 DISP TIME-T

n is 100

  1. Hp 50g ARM ASM version 2.15 with HPGCC3 patch
    • Time: 0.0085 s @75mhz
    • Note: Just an improvement in the modulo algorithm that makes it much faster for small numbers (which are checked quite often in this test)
    • Code see n4
    • Time: 0.0121 s
    • Note: Division would be another slow instruction, but the ARM doesn't have one at all. Usually it is implemented in software. When I talked about a hardware division earlier, I really meant the emulated Saturn hardware. Speaking about division emulation, I just built one myself for the ultra-naive primes benchmark.
    • Code 2)
  2. TI-Nspire CX CAS, not overclocked, OS 3.2.3
  3. Hp 50g Saturn ASM
    • Time: 0.0302 s OS version 2.15 with HPGCC3 patch, 75mhz 4)
    • Note: To be precise, it was the part about DIV.A being an instruction. I threw it at the hex editing tools and found out that it is just an assembler macro. So the results I wrote are actually using the slow software division routine. I tracked down the real hardware division and modulo instructions and came up with this piece of code
    • Code - see the note 5)
    • Time: 0.0764 sec OS version 2.15 with HPGCC3 patch, 75mhz 6)
    • Code - see the note 7)
  4. HP Prime (Hdw rev 5106 )
    • Time: 0.047 sec @400mhz
    • EXPORT Bench1(n)
      BEGIN
       FOR K:=3 TO n DO
         FOR J:=2 TO K-1 DO
          IF K MOD J == 0 THEN BREAK; END; 
         END; 
       END;
      END;
    • Time: 0.052 sec @400mhz
    • EXPORT Bench1(n)
      BEGIN
       FOR K:=3 TO n DO
         FOR J:=2 TO K-1 DO
          IF NOT(K MOD J) THEN BREAK; END; 
         END; 
       END;
      END;
  5. Casio fx-CG 10 PRIZM, OS version 01.04.3200, LuaZM 8)
    • Time: 0.297 sec overclocked to 94.3MHz (max overclocking without freezing
    • Time: 0.477 sec default 58Mhz
    • Note: uses the RTC to get the times, should be more accurate than a stopwatch, and it doesn't slow down program execution because it is done outside the loop.
    • local start = zmg.ticks()
      n=100
      for k=3,n do
        for j=2,k-1 do
          if k%j==0 then j=k-1 end
        end
      end
      
      print('n=100: ' .. (zmg.ticks()-start)/128 .. 'sec')
  6. Hp 50g, ROM 2.15, SysRPL
  7. Casio fx-CG 10 PRIZM, OS version 01.04.3200, Casio-BASIC 10)
    • Time: 3.9 sec overclocked to 94.3MHz (max overclocking without freezing
    • Time: 6.5 sec clocked at default 58MHz
    • ?->N
      For 3->K To N
        For 2->J To K-1
         K-J*Int (K/J)=0=>Break
        Next
      Next    
  8. HP 50g, 2.09, userRPL, #1
    • 8.274secs @75mhz
    • Code - see the note 11)
  9. TI-89 Titanium, AMS 3.10, HW4, amspatch, TI-BASIC
    • Time:
      • AUTO: 14s
      • EXACT: 14-15s
      • APPROX: 28s
    • Code 12)
    • Time: AMS 2.08, HW2, amspatch, TI-BASIC
      • AUTO: 17-18s
      • EXACT: 18s
      • APPROX: 30s
    • Code same as before
  10. HP-71B, native BASIC, ROM version 1BBBB, Saturn @ 640 kHz (default)
  11. Casio Algebra FX 2.0 rom 1.05 clock around 24mhz 13)
    • Time: 28 sec
    • ?->N
      For 3->K To N
        For 2->J To K-1
         K-J*Int (K/J)=0=>Break
        Next
      Next    
    • Time: 30 sec
    • ?->N
      For 3->K To N
        For 2->J To K-1
         If K-J*Int (K/J)=0
         Then K-1->J //31 sec, instead with Break are 30
         IfEnd
        Next
      Next    
  12. Casio fx-9750G Plus 14)
    • Time: 40 sec
    • ?->N
      For 3->K To N
        For 2->J To K-1
         K-J*Int (K/J)=0=>Break
        Next
      Next    
    • Time: 43 sec
    • ?->N
      For 3->K To N
        For 2->J To K-1
         If K-J*Int (K/J)=0
         Then K-1->J //or Break, no change in performance
         IfEnd
        Next
      Next    

n is 1'000

  1. Hp 50g ARM ASM version 2.15 with HPGCC3 patch
  2. Casio fx 9860GII-2 (power graphic 2) SH4A by C.Basic ver.1.73beta
    • Time: 0.72s overclock 236Mhz Ftune2 F5 preset integer
    • Time: 5.06s normal 29 Mhz integer
    • Time: 2.6s overclock 236Mhz Ftune2 F5 preset double precision (default)
    • Time: 16.5s normal 29.5MHz double precision (default)
    • Code see n3
  3. TI-Nspire CX CAS, not overclocked , OS 3.2.3
  4. Casio fx 50 SH4A by C.BasicCG ver.0.45alpha
    • Time: 0.87s 192 Mhz Ptune3 F5 preset integer
    • Time: 1.48s 118 Mhz integer
    • Time: 2.9s 192 Mhz Ptune3 F5 preset double precision (default)
    • Time: 5.2s 118 Mhz double precision (default)
    • Code see n3
  5. Ti 89T HW4 AMS 3.10 patched with tiosmod+amspatch GCC4TI
  6. Casio fx 20 SH4A by C.BasicCG ver.0.45alpha
    • Time: 1.37s 118 Mhz integer
    • Time: 4.3s 118 Mhz double precision (default)
    • Code see n3
  7. Hp 50g Saturn ASM
    • Time: 1.4668 s OS version 2.15 with HPGCC3 patch, 75mhz 15)
    • code: see the entry for n=100
    • Time: 4.0665 sec OS version 2.15 with HPGCC3 patch, 75mhz 16)
    • the same as the entry for n=100
  8. HP Prime (Hdw rev 5106 )
    • Time: 2.567 sec @400mhz
    • see the n=100 entry
    • Time: 2.762 sec @400mhz
    • see the n=100 entry
  9. Casio fx 9860GII by C.Basic ver.1.73beta
    • Time: 2.9s SH3 normal overclock 118MHz Ftune F5 preset integer
    • Time: 8s SH3 normal 29.5MHz integer
    • Time: 8.1s SH3 overclock 118MHz Ftune F5 preset double precision (default)
    • Time: 21.7s SH3 normal 29.5MHz double precision (default)
    • Code see n3
  10. Casio fx-CG 10 PRIZM, OS version 01.04.3200, LuaZM 17)
    • Time: 29.773 sec sec overclocked to 94.3MHz (max overclocking without freezing
    • Time: 48.382 sec sec default 58Mhz
    • same as the version for n=100
  11. Hp 50g, ROM 2.15, SysRPL
  12. Casio fx-CG 10 PRIZM, OS version 01.04.3200, Casio-BASIC 18)
    • Time: 237 sec overclocked to 94.3MHz (max overclocking without freezing
    • Time: 386 sec clocked at default 58MHz
    • Same as the entry for n:=100
  13. HP 50g, 2.09, userRPL, #1
    • Time: 545.7 secs @75mhz
    • same of HP 50g, 2.09, userRPL, 75mhz #1 for n:=100
  14. TI-89 Titanium, AMS 3.10, HW4, amspatch, TI-BASIC
    • Time:
      • AUTO: 976-977s
      • EXACT: 975-976s
      • APPROX: 1894-1895s
    • Code: see entry for n=100
    • Time: AMS 2.08, HW2, amspatch, TI-BASIC
      • AUTO: 1063-1064s
      • EXACT: 1062-1063s
      • APPROX: 1814-1815s
    • Code same as before
  15. HP-71B, native BASIC, ROM version 1BBBB, Saturn @ 640 kHz (default)
  16. Casio Fx 5800P

n is 10'000

  1. Hp 50g ARM ASM version 2.15 with HPGCC3 patch
  2. Casio fx-CG 10 PRIZM, OS version 01.04.3200, C PrizmSDK 19)
    • Time: 6.1 sec @94.3 mhz
    • Time: 9.7 sec @58mhz
    • Code - see the note n2
  3. TI-Nspire CX CAS, not overclocked , OS 3.2.3
  4. Ti 89T HW4 AMS 3.10 patched with tiosmod+amspatch GCC4TI
  5. Hp 50g Saturn ASM
    • Time: 107.6708 s OS version 2.15 with HPGCC3 patch, 75mhz 20)
    • code: see the entry for n=100
    • Time: 261.1537 sec OS version 2.15 with HPGCC3 patch, 75mhz 21)
  6. HP Prime (Hdw rev 5106 )
    • Time: 187.12 sec @400mhz
    • see the n=100 entry
    • Time: 202.112 sec 22)
    • see the n=100 entry
  7. HP 50g, 2.09, userRPL, #1
    • Time: 40'292 sec @75mhz
    • same of HP 50g, 2.09, userRPL, 75mhz #1 for n:=100

n is 100'000

  1. Hp 50g ARM ASM version 2.15 with HPGCC3 patch
  2. Casio fx-CG 10 PRIZM, OS version 01.04.3200, C PrizmSDK 23)
    • Time: 455 sec @94.3 mhz
    • Time: 720 sec @58mhz
    • see previous entry for n=10'000

Emulators on mobile/handheld device OR mobile/handheld devices itself smaller than 7'' (7'' included)

n is 100

n is 1'000

  1. Nokia e5-00, Symbian OS v9.3, 600 MHz, ram 256 MB, pys60 2.0 Pys60
    • Time: around 1 sec
    • def naiveprimes(n):
        import time;
        print time.localtime();
        for k in range(3,n+1):
          for j in range(2,k):
            if k % j == 0:
              break;
       
        print time.localtime();
  2. Palm treo pro GSM (400mhz, 128Mb ram), pythonCE 2.5-20061219 #1
    • Time: around 1 second (doing also other things, it is multitasking)
    • def naiveprimes(n):
        import time;
        print time.localtime();
        for k in range(3,n+1):
          for j in range(2,k):
            if k % j == 0:
              break;
       
        print time.localtime();

n is 10'000

  1. Nokia e5-00, Symbian OS v9.3, 600 MHz, ram 256 MB, pys60 2.0 Pys60
    • Time: around 30 sec
    • See the same entry for n:=1000
  2. Palm treo pro GSM (400mhz, 128Mb ram), pythonCE 2.5-20061219 #1
    • Time: around 79 seconds (doing also other things, it is multitasking)
    • See the same entry for n:=1000

n is 100'000

  1. Nokia e5-00, Symbian OS v9.3, 600 MHz, ram 256 MB, pys60 2.0 Pys60
    • Time: 3'539 sec
    • See the same entry for n:=1000
  2. Palm treo pro GSM (400mhz, 128Mb ram), pythonCE 2.5-20061219 #1
    • Time: around 4'421 seconds (doing also other things, it is multitasking)
    • See the same entry for n:=1000

Code complexity comparison between inputs

The code has a temporal complexity of $ O( \frac{n \cdot (n+1)}{2} ) $ , so times between different inputs should be loosely proportional.
Then, with $ 10n $ we have the following proportion $$ \left [ \frac {\left ( \frac {10n\cdot(10n+1)} {2} \right )} {\left ( \frac {n\cdot(n+1)} {2} \right )} \sim \frac {t_{10n}} {t_n} \right ] \rightarrow \left [ \left ( \frac {10n\cdot(10n+1)} {2} \right ) \cdot \left ( \frac {2} {n\cdot(n+1)} \right ) \sim \frac {t_{10n}} {t_n} \right ] \rightarrow \left [ t_{10n} \sim t_n \cdot \left ( \frac {10\cdot(10n+1)} {(n+1))} \right ) \right ] $$ In our case n=100 so we have 99 as a “coefficent”.
Thus for Hp50g:
8 sec * 99 = 792 sec (result of n:=100 times the coefficent) and the real result is 545
For Primz
6.5*99 = 643 sec while the real is 386
3.9*99 = 386 sec while the real is 237

Note: the same order of magnitude!

Notes

Because putting code in references makes the dokuwiki source code hard to read.

  1. n1
// ultranaiveprimes.c: silly algorithm for finding primes
 
#define MIN_AMS 101
#define USE_TI89
#define USE_TI92P
#define USE_V200
#define USE_TI89T
#define NO_CALC_DETECT
#define OPTIMIZE_ROM_CALLS
#define RETURN_VALUE
 
#include <stdint.h>
#include <system.h>
#include <args.h>
#include <estack.h>
#include <intr.h>
#include <timath.h>
 
#define TIMER_START_VAL (100000UL)
#define N (10000)
 
void _main(void) {
    uint16_t j, k;
    short orig_rate = PRG_getRate();
    unsigned short orig_start = PRG_getStart();
    unsigned long val = 0;
 
    // Make the system timer an order of magnitude more precise;
    // NOTE: this code assumes a HW2+ TI-68k, i.e. anything since 1999.
    PRG_setRate(1); // Increment counter at a rate of 2^19/2^9 Hz
    PRG_setStart(0xCE); // Trigger the interrupt every 257 - 0xCE = 51 increments ~ 20.07 Hz.
 
    // The PRG_getStart() above effectively waited for the interrupt to trigger, so we don't need another wait.
    /*OSRegisterTimer(USER_TIMER, 1);
    while (!OSTimerExpired(USER_TIMER));
    OSFreeTimer(USER_TIMER);*/
    OSRegisterTimer(USER_TIMER, TIMER_START_VAL);
 
    // Main loop :)
    for (k = 3; k < N; k++) {
        for (j = 2; j < k - 1; j++) {
            if (k % j == 0) {
                j = k - 1; // so we exit from the inner for
            }
        }
    }
 
    // Retrieve timer value.
    val = TIMER_START_VAL - OSTimerCurVal(USER_TIMER);
    OSFreeTimer(USER_TIMER);
 
    // Push arguments onto the RPN stack: clean arguments up, then create a list.
    while (GetArgType (top_estack) != END_TAG) {
        top_estack = next_expression_index (top_estack);
    }
    top_estack--;
    push_longint(val);
 
    // Restore old system state.
    PRG_setRate(orig_rate);
    PRG_setStart(orig_start);
}
 
Compiler options:
tigcc -v -O3 -Wall -W -mpcrel --optimize-code --cut-ranges --reorder-sections --remove-unused --merge-constants -fmerge-all-constants -Wa,--all-relocs -Wa,-l -fverbose-asm -save-temps -o unprimes ultranaiveprimes.c
  1. n2
#include <display_syscalls.h>
#include <keyboard_syscalls.h>
#include <keyboard.hpp>
#include <color.h>
 
// Getkey routine
const unsigned short* keyboard_register = (unsigned short*)0xA44B0000;
unsigned short lastkey[8];
unsigned short holdkey[8];
 
void keyupdate(void) {
  memcpy(holdkey, lastkey, sizeof(unsigned short)*8);
  memcpy(lastkey, keyboard_register, sizeof(unsigned short)*8);
}
int keydownlast(int basic_keycode) {
  int row, col, word, bit;
  row = basic_keycode%10;
  col = basic_keycode/10-1;
  word = row>>1;
  bit = col + 8*(row&1);
  return (0 != (lastkey[word] & 1<<bit));
}
int keydownhold(int basic_keycode) {
  int row, col, word, bit;
  row = basic_keycode%10;
  col = basic_keycode/10-1;
  word = row>>1;
  bit = col + 8*(row&1);
  return (0 != (holdkey[word] & 1<<bit));
}
 
int main() {
  int n=100; //change n as input
  int key;
  // clear screen
  Bdisp_AllClr_VRAM();
  PrintXY(1,1,"Wait...",0,COLOR_BLACK);
  Bdisp_PutDisp_DD();
 
  for (int k = 3; k < n; ++k) {
    for (int j = 2; j < k-1; ++j) {
      if(k%j==0) {
        j = k-1;
      }
    }
  }
 
  PrintXY(1,1," done",0,COLOR_BLACK);
  Bdisp_PutDisp_DD();
  GetKey(&key);
 
  while(1) {
    GetKey(&key);
  }
 
  return 1;
}
1000->N
For 3->K To N
 For 2->J To K-1
  MOD(K,J)=0=>Break
 Next
Next
  1. n4:
CODE
A=0.W
GOSBVL POP#
GOSBVL SAVPTR
SKUB {
*start
!ARM
STMDB sp! {R4 R5 R6 LP}
LDR R2,[R1,#2316]
MOV R3,3
*outer
MOV R4,2
*inner
MOV R5,R4
*modloop1
MOV R5,R5 LSL #1
CMP R5,R3
BLO modloop1
BEQ outer_end
MOV R6,R3
*modloop2
CMP R6,R5
BEQ outer_end
SUBHS R6,R6,R5
MOV R5,R5 LSR #1
CMP R5,R4
BHS modloop2
ADD R4,R4,1
CMP R4,R3
BLO inner
*outer_end
ADD R3,R3,1
CMP R3,R2
BLS outer
LDMIA sp! {R4 R5 R6 PC}
!ASM
*end
}
C=RSTK
D0=C
D1=80100
LC(5)end-start
MOVEDN
LC 80100
ARMSAT
GOVLNG GETPTRLOOP
ENDCODE
1)
For a nother tough benchmark for calculators released after 1990 see Middle square random number seed test
2)
CODE
A=0.W
GOSBVL POP#
GOSBVL SAVPTR
SKUB {
*start
!ARM
STMDB sp! {R4 R5 R6 LP}
LDR R2,[R1,#2316] ;Saturn A register
MOV R3,3
*outer
MOV R4,2
*inner
MOVS R5,R4 LSL #16
MOVCS R5.R4
MOVS R6,R5 LSL #8
MOVCS R6,R5
MOVS R5,R6 LSL #4
MOVCS R5,R6
MOVS R6,R5 LSL #2
MOVCSS R6,R5
MOVPL R6,R6 LSL #1
MOV R5,R3
*modloop
CMP R5,R6
BEQ outer_end
SUBHS R5,R5,R6
MOV R6,R6 LSR #1
CMP R6,R4
BHS modloop
ADD R4,R4,1
CMP R4,R3
BLO inner
*outer_end
ADD R3,R3,1
CMP R3,R2
BLS outer
LDMIA sp! {R4 R5 R6 PC}
!ASM
*end
}
C=RSTK
D0=C
D1=80100
LC(5)end-start
MOVEDN
LC 80100
ARMSAT
GOVLNG GETPTRLOOP
ENDCODE
3)
local timerStart = timer.getMilliSecCounter()
local n = 100
for k = 3, n do
  for j = 2, k-1 do
    if k%j==0 then break end
  end
end
 
print("n = " .. n, (timer.getMilliSecCounter()-timerStart)/1000 .. ' s.')
5)
CODE
GOSBVL POP#
GOSBVL SAVPTR
A+1.A
B=0.A
B+3.A
*outer
C=0.A
C+2.A
*inner
D=B.A
D%C.A
?D=0.A
GOYES outer_end
C+1.A
?B>C.A
GOYES inner
*outer_end
B+1.A
?A>B.A
GOYES outer
GOVLNG GETPTRLOOP
ENDCODE
7)
CODE
GOSBVL POP#
A+1.A
R0=A.A
A=0.A
A+3.A
*outer
R1=A.A
C=0.A
C+2.A
*inner
R2=C.A
GOSBVL IntDiv
?A=0.A
GOYES outer_end
C=R2.A
C+1.A
A=R1.A
?A>C.A
GOYES inner
*outer_end
A=R1.A
A+1.A
C=R0.A
?C>A
GOYES outer
GOVLNG Loop
ENDCODE
9)
:: 
CK1NOLASTWD 
CK&DISPATCH1 
real 
  :: 
    CLKTICKS 
    SWAP COERCE #1+ THREE DO 
      INDEX@ TWO DO 
        JINDEX@ INDEX@ #/ DROP 
        #0= IT ExitAtLOOP 
      LOOP 
      ATTN? IT ExitAtLOOP 
    LOOP 
    CLKTICKS SWAP bit- HXS>% %8192 %/ 
  ; 
;
11)
 «
   @n as input

   PUSH @saves system flags
   -105 SF @approx mode

   3 SWAP @from 3 to n
   FOR k
     2 k 1 - @fro 2 to k-1
     FOR j
       IF
         k j MOD
         0 == @k mod j == 0
       THEN
         k 'j' STO @to exit from the FOR
       END
     NEXT
   NEXT

   POP
 »
 executed with TEVAL
12)
prime(n)
Prgm
Local k,j
For k,3,n
 For j,2,k-1
  If mod(k,j)=0:Exit
 EndFor
EndFor
EndPrgm
22)
Interesting. Even if Prime has a 400mhz ARM CPU, and the test uses almost no memory (maybe it doesn't fill even the CPU cache), the Palm treo pro @400mhz, with a scripiting language like the one on the Prime (but didn't done by the Palm) is able to do way better. So the match Handheld calculators vs smartphones is not only a matter of pure CPU Mhz. See also the reply on Hpmuseum forum of Paul Dale - 13 Sept 2013, 3:39 a.m about the accuracy of results. Moreover all calculators with arm processors don't have an FPU (they use BCD arithmetic), so it's hard for them doing floating point operations like square roots.