User Tools

Site Tools


benchmarks:middlesquare

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

Both sides previous revision Previous revision
benchmarks:middlesquare [2017/05/07 01:20]
pier4r
benchmarks:middlesquare [2017/05/09 09:03] (current)
pier4r
Line 1: Line 1:
 +<wrap info>​~~TALKPAGE~~</​wrap>​
  
 +====== Calculator benchmark: middle square method seed test ======
 +
 +**The idea**
 +> In September 2013 I started to contribute on wiki4hp, and my first self-chosen task was: mine information from discussion places of HP calculators'​s community to report links to best/useful discussions and organize these by topics. One of those topic is: a general calculator benchmark for programmable calculators. As September 2013, I think that a lot of calculator benchmarks were discussed by the community, but only two has earned enough popularity: the //​[[benchmarks:​addloop|calculator add loop]]// and the //​[[benchmarks:​nqueens|nqueens (where n is 8) problem]]//​. ​
 +> Benchmarks are not the ultimate test to measure the absolute computational power of a calculator (or a computer), but they give a good approximation of relative speed between calculators;​ that is: if a calculator A is faster than a calculator B in a benchmark, for computations that involves similar operations as in the benchmark, we can expect that the calculator A will take less time to finish them.
 +>
 +> Now, if we look at code of the two popular benchmark, we can see that the //​Calculator add loop// is really simple: just add one to a previous number, starting from 0, in the smartest way that you can code. The second is more complex but it involves: memory, subtractions and additions, and the "​n"​ is not variable, is fixed to 8. This is a great limitation for new calculators that are really fast. Unfortunately,​ with a greater n maybe some old calculators can't do anything (since they don't have enough memory), or maybe the community likes the number 8 since the problem is related to chess (that has a chessboard of 8x8). Anyway the benchmark can be extended (increasing the n value by fixed steps), but this can be tried later, hoping that the community will contribute (again).
 +>
 +> So thinking about a benchmark that use at least the four (+,-,*,/) operations, I stumbled upon the middle square method to produce, easily, random numbers((Even if the method has a lot of flaws from the point of view of randomness.)). Since we don't care about a particular benchmark algorithm, but we use the benchmark only to compare devices, it should be ok to test almost all programmable calculators,​ since it doesn'​t require much memory and it involves the four operations.
 +>
 +>So, if you agree (else, feel free to fork the idea on this wiki!) let's look the pseudocode.
 +
 +See also [[http://​www.hpmuseum.org/​forum/​thread-8287-post-72772.html#​pid72772|here(MoHPC)]]
 +
 +**Criticism**\\
 +[[http://​www.hpmuseum.org/​cgi-sys/​cgiwrap/​hpmuseum/​archv021.cgi?​read=249996|MoHPC]] . Message #6. In short, a benchmark should be helpful to compare calculators performance on the same set of the instructions. While the middle square benchmark is more a programming challenge, because some calculators have more instructions than others or have other data types that allows faster/​slower/​more precise/​etc.. operations.
 +
 +So actually this should be seen as challenge, and only after that as an approximative benchmark.
 +
 +**The Pseudocode** \\
 +As the two popular benchmarks shows, we don't test only the computational power of the devices, ​
 +but also the programming ability of the user to "​trick"​ the game.
 +<​code>​
 +k:= a positive integer
 +n:= 100^k
 +For i:=(n/10) to (n-1) do {
 +  old := i
 +  sequenceIsConvergent := false
 +  limitCyclicSequences := 0 //because we can end in cyclic sequences and we don't want to use a lot of memory!
 +  while ( (NOT sequenceIsConvergent) AND (limitCyclicSequences < n) ) {
 +    new = extract_middle_number from old^2 (see below)
 +    if new == old { 
 +      sequenceIsConvergent = true
 +    }
 +    else { 
 +      old := new
 +      limitCyclicSequences = limitCyclicSequences+1;​
 +    }    ​
 +  }//end while
 +}//end for
 +
 +->​Result:​ the time spent to finish the computation given the starting k value.
 +
 +How does extract_middle_digits work?
 +Given n, that is a power of 10 of the type 10^d, 
 +its square is equal to 10^(d*2) and it has (d*2+1) digits.
 +
 +We consider only the last (d*2) digits. ​
 +That is: if n=10000, 10000*10000 = 100000000 we consider only 00000000 without the first 1.
 +
 +Then we '​consider'​ all the numbers lower than 10^(d*2) with (d*2) digits. ​
 +For example if d=4 and the number under process is 1542, 
 +we consider 1542^2 as 02377764 instead of 2377764.
 +
 +Why d=4 ? Because we use as seed, with n=10000, numbers from 1000 to 9999, so numbers having 4 digits.
 +
 +After that we pick the d=4 middle digits, from 02377764 we pick 02[3777]64.
 +So the middle number extracted is 3777.
 +</​code>​
 +
 +The method of extraction for the digits can be anything you want, a standard method involving modulo and divisions or some other method. You can use an iterative subtraction to get the modulo instead of using the calculator'​s built-in function to compute it. Here we will see the user ability to exploit all the possibilities of the calculator, hoping that even algorithms with "​classic"​ operations will be reported for a comparison.
 +
 +**More details!**
 +<​code>​
 +Let k=1 , so we have as reference n:= 100^k = 100
 +Do sequences for all the seeds from a:=n/10 to n-1.
 +So the seeds are {a:​=10,​11,​12,​...,​99}
 +
 +How do we compute a sequence?
 +a:=10
 +seed:=a
 +1. we first square the number
 +10*10 = 100
 +2. 100 squared is 10000, with 4 zeroes, then we consider the previous number with four digits (with a leading zero)
 +0100
 +3. Then we pick the middle digits of the number, as many as the zeroes of n (that is 100), the new seed is
 +0[10]0
 +4. The new seed is the same as the initial seed, then if we reapply the process we end up always with 10. Therefore the sequence is convergent.
 +
 +Next step
 +a:=11
 +seed:=a
 +seed^2 -> 121
 +See it with 4 digits -> 0121
 +Pick middle digits -> 0[12]1 , seed:= 12
 +seed^2 -> 144
 +See it with 4 digits -> 0144
 +Pick middle digits -> 0[14]4, seed:=14
 +seed^2 -> 196
 +See it with 4 digits -> 0196
 +Pick middle digits -> 0[19]6, seed:=19
 +seed^2 -> 361
 +See it with 4 digits -> 0361
 +Pick middle digits -> 0[36]1, seed:=36
 +seed^2 -> 1296
 +See it with 4 digits -> 1296
 +Pick middle digits -> 1[29]6, seed:=29
 +seed^2 -> 841
 +See it with 4 digits -> 0841
 +Pick middle digits -> 0[84]1, seed:=84
 +seed^2 -> 7056
 +See it with 4 digits -> 7056
 +Pick middle digits -> 7[05]6, seed:=5
 +seed^2 -> 25
 +See it with 4 digits -> 0025
 +Pick middle digits -> 0[02]5, seed:=2
 +seed^2 -> 4
 +See it with 4 digits -> 0004
 +Pick middle digits -> 0[00]4, seed:=0
 +seed^2 -> 0
 +See it with 4 digits -> 0000
 +Pick middle digits -> 0[00]0, seed:=0
 +Here the new seed is the same of the previous, so the process stops
 +
 +let's skip some iterations to show another problem
 +a:=24
 +seed:=a
 +seed^2 -> 576
 +See it with 4 digits -> 0576
 +Pick middle digits -> 0[57]6, seed:=57
 +seed^2 -> 3249
 +See it with 4 digits -> 3249
 +Pick middle digits -> 3[24]9, seed:=24
 +
 +At this time you can do an infinite loop that is cyclic,
 +so a way is to limit any sequence to max 100^k iterations.
 +
 +In this case one could consider the sentence "For a generator of n-digit numbers the period can be no longer than 8^n" on wikipedia, but to make it standard, let's limit it to 100^k iterations.
 +</​code>​
 +
 +**How to report a result** \\
 +<​code>​
 +A result is composed by the following list
 +- the device used plus the language used, overclock if any, custom firmware if any and so on.
 +- time elapsed for a given k in seconds (see below)
 +- the code used.
 +
 +if the calculator is too slow, or limited, to compute a given n, then report "for n=100^k the computation
 +takes too much time". Conversely, if the calculator is too fast to compute a given n, then report ​
 +"for n=100^k the computation takes too little time, I skipped it"
 +</​code>​
 +
 +===== Physical calculators =====
 +
 +==== Results for n:=100^k, k=1 ====
 +  - HP 50g , 2.15 with HPGCC3 patch, ARM ASM
 +    * Source: [[http://​community.casiocalc.org/​topic/​7183-help-needed-calculator-benchmark/​page__st__40#​entry58809|3298 on Casio forum]]
 +    * Time: 0.0093 s @75mhz
 +    * Notes: For the code, I obviously needed division. This time, I searched for proper implementations of division on the web and found [[http://​infocenter.arm.com/​help/​index.jsp?​topic=/​com.arm.doc.dui0040d/​ch05s08s02.html|this]] one in the official infocenter. It is pretty scary how close I got with my own modulo routine, so it was quite optimized after all. Not many processors allow the programmer to write such a compact version (''​CMP R7,R8; SUBNES R6,R6,1; BNE WHILEloop''​) of ''​if(R7!=R8 && (--R6)!=0) goto WHILEloop;''​
 +    * Code: ((<​code>​
 +CODE
 +A=0.W
 +GOSBVL POP#
 +GOSBVL SAVPTR
 +SKUB {
 +*start
 +!ARM
 +STMDB sp! {R4 R5 R6 R7 R8 R9 R10 LP }
 +LDR R2,​[R1,#​2316]
 +MOV R3,1
 +MOV R10,10
 +*ALOGloop
 +MUL R3,R3,R10
 +SUBS R2,R2,1
 +BNE ALOGloop
 +MUL R2,R3,R3
 +MOV R7,R2
 +BL divmod
 +MOV R4,R9
 +*FORloop
 +MOV R5,R4
 +MOV R6,R4
 +*WHILEloop
 +MUL R,R5,R5
 +MOV R10,R3
 +BL divmod
 +MOV R7,R9
 +MOV R10,R2
 +BL divmod
 +MOV R8,R5
 +MOV R5,R7
 +CMP R7,R8
 +SUBNES R6,R6,1
 +BNE WHILEloop
 +ADD R4,R4,1
 +CMP R4,R2
 +BNE FORloop
 +LDMIA sp! {R4 R5 R6 R7 R8 R9 R10 PC}
 +*divmod
 +MOV R8,R10
 +MOV R9,0
 +CMP R7,R10
 +MOVLO PC,LR
 +*.LSloop
 +MOV R8,R8 LSL #1
 +CMP R8,R7
 +BLO .LSloop
 +*.RSloop
 +MOVHI R8,R8 LSR #1
 +CMP R7,R8
 +SUBHS R7,R7,R8
 +ADC R9,R9,R9
 +CMP R8,R10
 +BHI .RSloop
 +MOV PC,LR
 +!ASM
 +*end
 +}
 +C=RSTK
 +D0=C
 +D1=80100
 +LC(5)end-start
 +MOVEDN
 +LC 80100
 +ARMSAT
 +GOVLNG GETPTRLOOP
 +ENDCODE
 +</​code>​))
 +  - HP 50g , 2.15 with HPGCC3 patch, SaturnASM
 +    * Source: [[http://​community.casiocalc.org/​topic/​7183-help-needed-calculator-benchmark/​page__st__40#​entry58800|3298 on Casio forum]]
 +    * Time: 0.0359 s @75mhz
 +    * Notes: just with Saturn ASM, though I have to admit that I used some instructions the real Saturn didn't have - only the emulated Saturn can multiply and divide with a single instruction. So what will ARM ASM bring? \\ Another note on optimization in this code: I let the limitCyclicSequences counter run backwards from n to 0, this saves an instruction or two in the WHILE loop compared to the pseudocode version.
 +    * Code: ((<​code>​
 +CODE
 +GOSBVL POP#
 +GOSBVL SAVPTR
 +B=A.A
 +C=0.W
 +A=0.W
 +C+1.W
 +A+10.W
 +*ALOGloop
 +C*A.W
 +B-1.W
 +?B#0.A
 +GOYES ALOGloop
 +R0=C.W
 +C*C.W
 +R1=C.W
 +C/A.W
 +R2=C.W
 +*FORloop
 +B=C.W
 +D=C.W
 +*WHILEloop
 +D-1.W
 +?D=0.W
 +GOYES endFORloop
 +C*C.W
 +A=R0.W
 +C/A.WA=R1.W
 +C%A.W
 +A=B.W
 +B=C.W
 +?A#C.W
 +GOYES WHILEloop
 +*endFORloop
 +C=R2.W
 +A=R1.W
 +C+1.W
 +R2=C.W
 +?C#A.W
 +GOYES FORloop
 +GOVLNG GETPTRLOOP
 +ENDCODE
 +</​code>​))
 +  - HP 50g , 2.15 with HPGCC3 patch, SysRPL
 +    * Source: [[http://​community.casiocalc.org/​topic/​7183-help-needed-calculator-benchmark/​page__st__40#​entry58798|3298 on Casio forum]]
 +    * Time: 1.1421 s @75mhz
 +    * Notes: Due to the size of the numbers this test produces I can't use normal bints if I want to run it for k=2. As a consequence,​ my first try was to translate my UserRPL version to SysRPL using HXS numbers. That one got k=1 solved in about 4 seconds with hardcoded constants n and n² for k=1. When I replaced them by a piece of code that was supposed to calculate them from user input, it was for some reason slowed down to about 12 seconds (yes, slower than my previous UserRPL version). I could have done a version that is slightly faster than the UserRPL version by using real numbers as well, but as the highest number calculated for k=1 is still below the limit of 1048576, I can safely use bints for that one.
 +    * Code: ((<​code>​
 +::
 +  BINT10 BINT100
 +  BINT100 BINT10 DO
 +        DUPINDEX@ BEGIN
 +          DUPDUP #* 5PICK #/ SWAPDROP 4PICK #/ DROP
 +          SWAPOVER #=
 +          ROT #1- DUP4UNROLL #0=
 +          OR
 +        UNTIL 2DROP
 +  LOOP 2DROP
 +;
 +</​code>​))
 +  - HP 50g , 2.15 HRAST BASIC
 +    * Source: [[http://​www.hpmuseum.org/​forum/​thread-8287-post-72903.html#​pid72903|On MoHPC forum]]
 +    * Time: 1.94s 
 +    * Code: ((<​code>​
 +WATCH INTEGER K=1,​N=100^K,​Q=SQR N
 +FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/​Q,​N) IF O<>R@ O=R NEXT ELSE LEAVE
 +NEXT ? TICKS
 +</​code>​))
 +  - HP 49g, HRAST BASIC
 +    * Source: [[http://​www.hpmuseum.org/​forum/​thread-8287-post-72903.html#​pid72903|On MoHPC forum]]
 +    * Time: 4.81s 
 +    * Code: ((<​code>​
 +WATCH INTEGER K=1,​N=100^K,​Q=SQR N
 +FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/​Q,​N) IF O=R LEAVE ELSE O=R NEXT
 +NEXT ? TICKS
 +</​code>​))
 +  - HP 50g , 2.15 with HPGCC3 patch, UserRPL
 +    * Source: [[http://​community.casiocalc.org/​topic/​7183-help-needed-calculator-benchmark/​page__st__40#​entry58798|3298 on Casio forum]]
 +    * Time: 10.6349 s @75mhz
 +    * Code: ((<​code>​
 +<<
 +ALOG DUP SQ
 +DUP 10. / OVER 1. - FOR a
 +  DUP a DO
 +        DUP SQ 5. PICK / IP 4. PICK MOD
 +  UNTIL
 +        SWAP OVER ==
 +        ROT 1. - UNROT PICK3 NOT
 +        OR
 +  END DROP2
 +NEXT DROP2
 +>>
 +</​code>​))
 +  - HP 50g , 2.08, UserRPL 75mhz , #1
 +    * Time: 24.4003 sec
 +    * Code: ((<​code>​
 +\<<
 +  @k
 +  100 SWAP ^ @n:= 100^k
 +  DUP \v/ @\v/n
 +  0 @old
 +  0 @new
 +  0 @limit
 +  \->
 +  n
 +  nsqr
 +  old
 +  new
 +  limit
 +  \<<
 +    PUSH @saves system flags
 +    -105 SF @approx mode
 +
 +    n 10 /
 +    n 1 -
 +    FOR a
 +      a '​old'​ STO
 +      1 CF @the sequence is not convergent
 +      0 '​limit'​ STO @to limit cyclic sequences
 +      WHILE
 +        1 FC?
 +        limit n <
 +        AND
 +      REPEAT
 +        @extracts middle digits
 +        old old * @square
 +        nsqr / IP @integer part of old^2/\v/n
 +        n / FP @the previous result / n^2, picks the decimals
 +        n * @restores the decimals as integers
 +        '​new'​ STO
 +        IF
 +          new old ==
 +        THEN
 +          1 SF
 +        ELSE
 +          new '​old'​ STO
 +          '​limit'​ 1 STO+
 +        END
 +      END
 +    NEXT
 +
 +    POP @restores system flags
 +  \>>
 +\>>
 +
 +executed with TEVAL
 +</​code>​))
 +
 +==== Results for n:=100^k, k=2 ====
 +  - HP 50g , 2.15 with HPGCC3 patch, ARM ASM
 +    * Source: [[http://​community.casiocalc.org/​topic/​7183-help-needed-calculator-benchmark/​page__st__40#​entry58809|3298 on Casio forum]]
 +    * Time: 169.9734 s @75mhz
 +    * refer to the same entry for k:=1
 +  - HP 50g , 2.15 with HPGCC3 patch, SaturnASM
 +    * Source: [[http://​community.casiocalc.org/​topic/​7183-help-needed-calculator-benchmark/​page__st__40#​entry58800|3298 on Casio forum]]
 +    * Time: 1445.8055 s @75mhz
 +    * Code: see the entry for k=1.
 +  - HP 50g , 2.08, UserRPL 75mhz , #1
 +    * Time: estimated 250'​000 secs ((Because the complexity of the task for the input k=1 is around $ O(100 \cdot 100=100^2) $ and it takes 24.4 secs. The complexity of the task with the input k=2 is around $ O(100^2 \cdot 100^2 = 100^4) $ that is $ 100^2 $ times the complexity with k=1. So, while the Hp50g is still crunching for a confirmation,​ the task will take around 250'​000 seconds. \\ 
 +\\ 
 +Today i interrupted the computation after almost 48 hours and it is still unfinished as expected. \\
 +**To avoid battery consumption you must use the an usb charger!**))
 +    * Code: The same of the entry "HP 50g , 2.08, UserRPL 75mhz , #1" for the 100^k, k=1 input.
 +
 +==== Results for n:=100^k, k=3 ====
 +
 +==== Results for n:=100^k, k=4 ====
 +
 +===== Emulators on mobile/​handheld device OR mobile/​handheld devices itself =====
 +Why? See the note ((Because, a part from school, one can use an handheld device to do math, maybe simple math using built-in functions of the framework, as well. Sure, it is not comfortable as a physical calculator unless it is large enough to simulate a calculator keyboard.)) . \\
 +**>>​Devices smaller than a tablet of 7 inches, 7 inches included<<​**
 +
 +==== Results for n:=100^k, k=1 ====
 +  - Palm treo pro GSM (400mhz, 128Mb ram), [[http://​pythonce.sourceforge.net/​|pythonCE 2.5-20061219]] #1
 +    * Time: around 3 seconds (doing also other things, it is multitasking)
 +    * <code python>
 +def middleSquareTest(k):​
 +  import time;
 +  print time.localtime();​
 +  n = 100**k;
 +  nsqrt = n**(1/2.0);
 +  temp = None;
 +  new = None;
 +  for a in range(n/10, n):
 +    old = a;
 +    for j in range(n):
 +      #limit to max '​n'​ iterations because we don't want unlimited cycles
 +      temp = int(old*old / nsqrt)/​float(n) ;
 +      new = (temp-int(temp))*n;​
 +      if new == old:
 +        #exit
 +        j = n;
 +      else:
 +        #print old;
 +        old = new;
 +        ​
 +  print time.localtime();​
 +      </​code>​
 +
 +==== Results for n:=100^k, k=2 ====
 +  - Palm treo pro GSM (400mhz, 128Mb ram), [[http://​pythonce.sourceforge.net/​|pythonCE 2.5-20061219]] #1
 +    * Time: 27'706 seconds (doing also other things, it is multitasking)
 +    * See entry for k=1
 +
 +==== Results for n:=100^k, k=3 ====
 +
 +==== Results for n:=100^k, k=4 ====
 +
 +===== Target devices of this test =====
 +As Marcel Samek on 11 Sept 2013 wrote on MoHPC'​s forum, this test is hard for small programmable calculators that can't play with integer of fractional part of a number. So the taget devices of this test are high end calculators with good mathematical framework.
benchmarks/middlesquare.txt · Last modified: 2017/05/09 09:03 by pier4r