User Tools

Site Tools


benchmarks:middlesquare

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
benchmarks:middlesquare [2017/05/07 01:20]
pier4r
benchmarks:middlesquare [2017/05/09 09:03] (current)
pier4r
Line 1: Line 1:
 +<wrap info>~~TALKPAGE~~</wrap>
  
 +====== Calculator benchmark: middle square method seed test ======
 +
 +**The idea**
 +> In September 2013 I started to contribute on wiki4hp, and my first self-chosen task was: mine information from discussion places of HP calculators's community to report links to best/useful discussions and organize these by topics. One of those topic is: a general calculator benchmark for programmable calculators. As September 2013, I think that a lot of calculator benchmarks were discussed by the community, but only two has earned enough popularity: the //[[benchmarks:addloop|calculator add loop]]// and the //[[benchmarks:nqueens|nqueens (where n is 8) problem]]//
 +> Benchmarks are not the ultimate test to measure the absolute computational power of a calculator (or a computer), but they give a good approximation of relative speed between calculators; that is: if a calculator A is faster than a calculator B in a benchmark, for computations that involves similar operations as in the benchmark, we can expect that the calculator A will take less time to finish them.
 +>
 +> Now, if we look at code of the two popular benchmark, we can see that the //Calculator add loop// is really simple: just add one to a previous number, starting from 0, in the smartest way that you can code. The second is more complex but it involves: memory, subtractions and additions, and the "n" is not variable, is fixed to 8. This is a great limitation for new calculators that are really fast. Unfortunately, with a greater n maybe some old calculators can't do anything (since they don't have enough memory), or maybe the community likes the number 8 since the problem is related to chess (that has a chessboard of 8x8). Anyway the benchmark can be extended (increasing the n value by fixed steps), but this can be tried later, hoping that the community will contribute (again).
 +>
 +> So thinking about a benchmark that use at least the four (+,-,*,/) operations, I stumbled upon the middle square method to produce, easily, random numbers((Even if the method has a lot of flaws from the point of view of randomness.)). Since we don't care about a particular benchmark algorithm, but we use the benchmark only to compare devices, it should be ok to test almost all programmable calculators, since it doesn't require much memory and it involves the four operations.
 +>
 +>So, if you agree (else, feel free to fork the idea on this wiki!) let's look the pseudocode.
 +
 +See also [[http://www.hpmuseum.org/forum/thread-8287-post-72772.html#pid72772|here(MoHPC)]]
 +
 +**Criticism**\\
 +[[http://www.hpmuseum.org/cgi-sys/cgiwrap/hpmuseum/archv021.cgi?read=249996|MoHPC]] . Message #6. In short, a benchmark should be helpful to compare calculators performance on the same set of the instructions. While the middle square benchmark is more a programming challenge, because some calculators have more instructions than others or have other data types that allows faster/slower/more precise/etc.. operations.
 +
 +So actually this should be seen as challenge, and only after that as an approximative benchmark.
 +
 +**The Pseudocode** \\
 +As the two popular benchmarks shows, we don't test only the computational power of the devices, 
 +but also the programming ability of the user to "trick" the game.
 +<code>
 +k:= a positive integer
 +n:= 100^k
 +For i:=(n/10) to (n-1) do {
 +  old := i
 +  sequenceIsConvergent := false
 +  limitCyclicSequences := 0 //because we can end in cyclic sequences and we don't want to use a lot of memory!
 +  while ( (NOT sequenceIsConvergent) AND (limitCyclicSequences < n) ) {
 +    new = extract_middle_number from old^2 (see below)
 +    if new == old { 
 +      sequenceIsConvergent = true
 +    }
 +    else { 
 +      old := new
 +      limitCyclicSequences = limitCyclicSequences+1;
 +    }    
 +  }//end while
 +}//end for
 +
 +->Result: the time spent to finish the computation given the starting k value.
 +
 +How does extract_middle_digits work?
 +Given n, that is a power of 10 of the type 10^d, 
 +its square is equal to 10^(d*2) and it has (d*2+1) digits.
 +
 +We consider only the last (d*2) digits. 
 +That is: if n=10000, 10000*10000 = 100000000 we consider only 00000000 without the first 1.
 +
 +Then we 'consider' all the numbers lower than 10^(d*2) with (d*2) digits. 
 +For example if d=4 and the number under process is 1542, 
 +we consider 1542^2 as 02377764 instead of 2377764.
 +
 +Why d=4 ? Because we use as seed, with n=10000, numbers from 1000 to 9999, so numbers having 4 digits.
 +
 +After that we pick the d=4 middle digits, from 02377764 we pick 02[3777]64.
 +So the middle number extracted is 3777.
 +</code>
 +
 +The method of extraction for the digits can be anything you want, a standard method involving modulo and divisions or some other method. You can use an iterative subtraction to get the modulo instead of using the calculator's built-in function to compute it. Here we will see the user ability to exploit all the possibilities of the calculator, hoping that even algorithms with "classic" operations will be reported for a comparison.
 +
 +**More details!**
 +<code>
 +Let k=1 , so we have as reference n:= 100^k = 100
 +Do sequences for all the seeds from a:=n/10 to n-1.
 +So the seeds are {a:=10,11,12,...,99}
 +
 +How do we compute a sequence?
 +a:=10
 +seed:=a
 +1. we first square the number
 +10*10 = 100
 +2. 100 squared is 10000, with 4 zeroes, then we consider the previous number with four digits (with a leading zero)
 +0100
 +3. Then we pick the middle digits of the number, as many as the zeroes of n (that is 100), the new seed is
 +0[10]0
 +4. The new seed is the same as the initial seed, then if we reapply the process we end up always with 10. Therefore the sequence is convergent.
 +
 +Next step
 +a:=11
 +seed:=a
 +seed^2 -> 121
 +See it with 4 digits -> 0121
 +Pick middle digits -> 0[12]1 , seed:= 12
 +seed^2 -> 144
 +See it with 4 digits -> 0144
 +Pick middle digits -> 0[14]4, seed:=14
 +seed^2 -> 196
 +See it with 4 digits -> 0196
 +Pick middle digits -> 0[19]6, seed:=19
 +seed^2 -> 361
 +See it with 4 digits -> 0361
 +Pick middle digits -> 0[36]1, seed:=36
 +seed^2 -> 1296
 +See it with 4 digits -> 1296
 +Pick middle digits -> 1[29]6, seed:=29
 +seed^2 -> 841
 +See it with 4 digits -> 0841
 +Pick middle digits -> 0[84]1, seed:=84
 +seed^2 -> 7056
 +See it with 4 digits -> 7056
 +Pick middle digits -> 7[05]6, seed:=5
 +seed^2 -> 25
 +See it with 4 digits -> 0025
 +Pick middle digits -> 0[02]5, seed:=2
 +seed^2 -> 4
 +See it with 4 digits -> 0004
 +Pick middle digits -> 0[00]4, seed:=0
 +seed^2 -> 0
 +See it with 4 digits -> 0000
 +Pick middle digits -> 0[00]0, seed:=0
 +Here the new seed is the same of the previous, so the process stops
 +
 +let's skip some iterations to show another problem
 +a:=24
 +seed:=a
 +seed^2 -> 576
 +See it with 4 digits -> 0576
 +Pick middle digits -> 0[57]6, seed:=57
 +seed^2 -> 3249
 +See it with 4 digits -> 3249
 +Pick middle digits -> 3[24]9, seed:=24
 +
 +At this time you can do an infinite loop that is cyclic,
 +so a way is to limit any sequence to max 100^k iterations.
 +
 +In this case one could consider the sentence "For a generator of n-digit numbers the period can be no longer than 8^n" on wikipedia, but to make it standard, let's limit it to 100^k iterations.
 +</code>
 +
 +**How to report a result** \\
 +<code>
 +A result is composed by the following list
 +- the device used plus the language used, overclock if any, custom firmware if any and so on.
 +- time elapsed for a given k in seconds (see below)
 +- the code used.
 +
 +if the calculator is too slow, or limited, to compute a given n, then report "for n=100^k the computation
 +takes too much time". Conversely, if the calculator is too fast to compute a given n, then report 
 +"for n=100^k the computation takes too little time, I skipped it"
 +</code>
 +
 +===== Physical calculators =====
 +
 +==== Results for n:=100^k, k=1 ====
 +  - HP 50g , 2.15 with HPGCC3 patch, ARM ASM
 +    * Source: [[http://community.casiocalc.org/topic/7183-help-needed-calculator-benchmark/page__st__40#entry58809|3298 on Casio forum]]
 +    * Time: 0.0093 s @75mhz
 +    * Notes: For the code, I obviously needed division. This time, I searched for proper implementations of division on the web and found [[http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0040d/ch05s08s02.html|this]] one in the official infocenter. It is pretty scary how close I got with my own modulo routine, so it was quite optimized after all. Not many processors allow the programmer to write such a compact version (''CMP R7,R8; SUBNES R6,R6,1; BNE WHILEloop'') of ''if(R7!=R8 && (--R6)!=0) goto WHILEloop;''
 +    * Code: ((<code>
 +CODE
 +A=0.W
 +GOSBVL POP#
 +GOSBVL SAVPTR
 +SKUB {
 +*start
 +!ARM
 +STMDB sp! {R4 R5 R6 R7 R8 R9 R10 LP }
 +LDR R2,[R1,#2316]
 +MOV R3,1
 +MOV R10,10
 +*ALOGloop
 +MUL R3,R3,R10
 +SUBS R2,R2,1
 +BNE ALOGloop
 +MUL R2,R3,R3
 +MOV R7,R2
 +BL divmod
 +MOV R4,R9
 +*FORloop
 +MOV R5,R4
 +MOV R6,R4
 +*WHILEloop
 +MUL R,R5,R5
 +MOV R10,R3
 +BL divmod
 +MOV R7,R9
 +MOV R10,R2
 +BL divmod
 +MOV R8,R5
 +MOV R5,R7
 +CMP R7,R8
 +SUBNES R6,R6,1
 +BNE WHILEloop
 +ADD R4,R4,1
 +CMP R4,R2
 +BNE FORloop
 +LDMIA sp! {R4 R5 R6 R7 R8 R9 R10 PC}
 +*divmod
 +MOV R8,R10
 +MOV R9,0
 +CMP R7,R10
 +MOVLO PC,LR
 +*.LSloop
 +MOV R8,R8 LSL #1
 +CMP R8,R7
 +BLO .LSloop
 +*.RSloop
 +MOVHI R8,R8 LSR #1
 +CMP R7,R8
 +SUBHS R7,R7,R8
 +ADC R9,R9,R9
 +CMP R8,R10
 +BHI .RSloop
 +MOV PC,LR
 +!ASM
 +*end
 +}
 +C=RSTK
 +D0=C
 +D1=80100
 +LC(5)end-start
 +MOVEDN
 +LC 80100
 +ARMSAT
 +GOVLNG GETPTRLOOP
 +ENDCODE
 +</code>))
 +  - HP 50g , 2.15 with HPGCC3 patch, SaturnASM
 +    * Source: [[http://community.casiocalc.org/topic/7183-help-needed-calculator-benchmark/page__st__40#entry58800|3298 on Casio forum]]
 +    * Time: 0.0359 s @75mhz
 +    * Notes: just with Saturn ASM, though I have to admit that I used some instructions the real Saturn didn't have - only the emulated Saturn can multiply and divide with a single instruction. So what will ARM ASM bring? \\ Another note on optimization in this code: I let the limitCyclicSequences counter run backwards from n to 0, this saves an instruction or two in the WHILE loop compared to the pseudocode version.
 +    * Code: ((<code>
 +CODE
 +GOSBVL POP#
 +GOSBVL SAVPTR
 +B=A.A
 +C=0.W
 +A=0.W
 +C+1.W
 +A+10.W
 +*ALOGloop
 +C*A.W
 +B-1.W
 +?B#0.A
 +GOYES ALOGloop
 +R0=C.W
 +C*C.W
 +R1=C.W
 +C/A.W
 +R2=C.W
 +*FORloop
 +B=C.W
 +D=C.W
 +*WHILEloop
 +D-1.W
 +?D=0.W
 +GOYES endFORloop
 +C*C.W
 +A=R0.W
 +C/A.WA=R1.W
 +C%A.W
 +A=B.W
 +B=C.W
 +?A#C.W
 +GOYES WHILEloop
 +*endFORloop
 +C=R2.W
 +A=R1.W
 +C+1.W
 +R2=C.W
 +?C#A.W
 +GOYES FORloop
 +GOVLNG GETPTRLOOP
 +ENDCODE
 +</code>))
 +  - HP 50g , 2.15 with HPGCC3 patch, SysRPL
 +    * Source: [[http://community.casiocalc.org/topic/7183-help-needed-calculator-benchmark/page__st__40#entry58798|3298 on Casio forum]]
 +    * Time: 1.1421 s @75mhz
 +    * Notes: Due to the size of the numbers this test produces I can't use normal bints if I want to run it for k=2. As a consequence, my first try was to translate my UserRPL version to SysRPL using HXS numbers. That one got k=1 solved in about 4 seconds with hardcoded constants n and n² for k=1. When I replaced them by a piece of code that was supposed to calculate them from user input, it was for some reason slowed down to about 12 seconds (yes, slower than my previous UserRPL version). I could have done a version that is slightly faster than the UserRPL version by using real numbers as well, but as the highest number calculated for k=1 is still below the limit of 1048576, I can safely use bints for that one.
 +    * Code: ((<code>
 +::
 +  BINT10 BINT100
 +  BINT100 BINT10 DO
 +        DUPINDEX@ BEGIN
 +          DUPDUP #* 5PICK #/ SWAPDROP 4PICK #/ DROP
 +          SWAPOVER #=
 +          ROT #1- DUP4UNROLL #0=
 +          OR
 +        UNTIL 2DROP
 +  LOOP 2DROP
 +;
 +</code>))
 +  - HP 50g , 2.15 HRAST BASIC
 +    * Source: [[http://www.hpmuseum.org/forum/thread-8287-post-72903.html#pid72903|On MoHPC forum]]
 +    * Time: 1.94s 
 +    * Code: ((<code>
 +WATCH INTEGER K=1,N=100^K,Q=SQR N
 +FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/Q,N) IF O<>R@ O=R NEXT ELSE LEAVE
 +NEXT ? TICKS
 +</code>))
 +  - HP 49g, HRAST BASIC
 +    * Source: [[http://www.hpmuseum.org/forum/thread-8287-post-72903.html#pid72903|On MoHPC forum]]
 +    * Time: 4.81s 
 +    * Code: ((<code>
 +WATCH INTEGER K=1,N=100^K,Q=SQR N
 +FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/Q,N) IF O=R LEAVE ELSE O=R NEXT
 +NEXT ? TICKS
 +</code>))
 +  - HP 50g , 2.15 with HPGCC3 patch, UserRPL
 +    * Source: [[http://community.casiocalc.org/topic/7183-help-needed-calculator-benchmark/page__st__40#entry58798|3298 on Casio forum]]
 +    * Time: 10.6349 s @75mhz
 +    * Code: ((<code>
 +<<
 +ALOG DUP SQ
 +DUP 10. / OVER 1. - FOR a
 +  DUP a DO
 +        DUP SQ 5. PICK / IP 4. PICK MOD
 +  UNTIL
 +        SWAP OVER ==
 +        ROT 1. - UNROT PICK3 NOT
 +        OR
 +  END DROP2
 +NEXT DROP2
 +>>
 +</code>))
 +  - HP 50g , 2.08, UserRPL 75mhz , #1
 +    * Time: 24.4003 sec
 +    * Code: ((<code>
 +\<<
 +  @k
 +  100 SWAP ^ @n:= 100^k
 +  DUP \v/ @\v/n
 +  0 @old
 +  0 @new
 +  0 @limit
 +  \->
 +  n
 +  nsqr
 +  old
 +  new
 +  limit
 +  \<<
 +    PUSH @saves system flags
 +    -105 SF @approx mode
 +
 +    n 10 /
 +    n 1 -
 +    FOR a
 +      a 'old' STO
 +      1 CF @the sequence is not convergent
 +      0 'limit' STO @to limit cyclic sequences
 +      WHILE
 +        1 FC?
 +        limit n <
 +        AND
 +      REPEAT
 +        @extracts middle digits
 +        old old * @square
 +        nsqr / IP @integer part of old^2/\v/n
 +        n / FP @the previous result / n^2, picks the decimals
 +        n * @restores the decimals as integers
 +        'new' STO
 +        IF
 +          new old ==
 +        THEN
 +          1 SF
 +        ELSE
 +          new 'old' STO
 +          'limit' 1 STO+
 +        END
 +      END
 +    NEXT
 +
 +    POP @restores system flags
 +  \>>
 +\>>
 +
 +executed with TEVAL
 +</code>))
 +
 +==== Results for n:=100^k, k=2 ====
 +  - HP 50g , 2.15 with HPGCC3 patch, ARM ASM
 +    * Source: [[http://community.casiocalc.org/topic/7183-help-needed-calculator-benchmark/page__st__40#entry58809|3298 on Casio forum]]
 +    * Time: 169.9734 s @75mhz
 +    * refer to the same entry for k:=1
 +  - HP 50g , 2.15 with HPGCC3 patch, SaturnASM
 +    * Source: [[http://community.casiocalc.org/topic/7183-help-needed-calculator-benchmark/page__st__40#entry58800|3298 on Casio forum]]
 +    * Time: 1445.8055 s @75mhz
 +    * Code: see the entry for k=1.
 +  - HP 50g , 2.08, UserRPL 75mhz , #1
 +    * Time: estimated 250'000 secs ((Because the complexity of the task for the input k=1 is around $ O(100 \cdot 100=100^2) $ and it takes 24.4 secs. The complexity of the task with the input k=2 is around $ O(100^2 \cdot 100^2 = 100^4) $ that is $ 100^2 $ times the complexity with k=1. So, while the Hp50g is still crunching for a confirmation, the task will take around 250'000 seconds. \\ 
 +\\ 
 +Today i interrupted the computation after almost 48 hours and it is still unfinished as expected. \\
 +**To avoid battery consumption you must use the an usb charger!**))
 +    * Code: The same of the entry "HP 50g , 2.08, UserRPL 75mhz , #1" for the 100^k, k=1 input.
 +
 +==== Results for n:=100^k, k=3 ====
 +
 +==== Results for n:=100^k, k=4 ====
 +
 +===== Emulators on mobile/handheld device OR mobile/handheld devices itself =====
 +Why? See the note ((Because, a part from school, one can use an handheld device to do math, maybe simple math using built-in functions of the framework, as well. Sure, it is not comfortable as a physical calculator unless it is large enough to simulate a calculator keyboard.)) . \\
 +**>>Devices smaller than a tablet of 7 inches, 7 inches included<<**
 +
 +==== Results for n:=100^k, k=1 ====
 +  - Palm treo pro GSM (400mhz, 128Mb ram), [[http://pythonce.sourceforge.net/|pythonCE 2.5-20061219]] #1
 +    * Time: around 3 seconds (doing also other things, it is multitasking)
 +    * <code python>
 +def middleSquareTest(k):
 +  import time;
 +  print time.localtime();
 +  n = 100**k;
 +  nsqrt = n**(1/2.0);
 +  temp = None;
 +  new = None;
 +  for a in range(n/10, n):
 +    old = a;
 +    for j in range(n):
 +      #limit to max 'n' iterations because we don't want unlimited cycles
 +      temp = int(old*old / nsqrt)/float(n) ;
 +      new = (temp-int(temp))*n;
 +      if new == old:
 +        #exit
 +        j = n;
 +      else:
 +        #print old;
 +        old = new;
 +        
 +  print time.localtime();
 +      </code>
 +
 +==== Results for n:=100^k, k=2 ====
 +  - Palm treo pro GSM (400mhz, 128Mb ram), [[http://pythonce.sourceforge.net/|pythonCE 2.5-20061219]] #1
 +    * Time: 27'706 seconds (doing also other things, it is multitasking)
 +    * See entry for k=1
 +
 +==== Results for n:=100^k, k=3 ====
 +
 +==== Results for n:=100^k, k=4 ====
 +
 +===== Target devices of this test =====
 +As Marcel Samek on 11 Sept 2013 wrote on MoHPC's forum, this test is hard for small programmable calculators that can't play with integer of fractional part of a number. So the taget devices of this test are high end calculators with good mathematical framework.
benchmarks/middlesquare.txt · Last modified: 2017/05/09 09:03 by pier4r