benchmarks:middlesquare

**The idea**

In September 2013 I started to contribute on wiki4hp, and my first self-chosen task was: mine information from discussion places of HP calculators's community to report links to best/useful discussions and organize these by topics. One of those topic is: a general calculator benchmark for programmable calculators. As September 2013, I think that a lot of calculator benchmarks were discussed by the community, but only two has earned enough popularity: thecalculator add loopand thenqueens (where n is 8) problem.

Benchmarks are not the ultimate test to measure the absolute computational power of a calculator (or a computer), but they give a good approximation of relative speed between calculators; that is: if a calculator A is faster than a calculator B in a benchmark, for computations that involves similar operations as in the benchmark, we can expect that the calculator A will take less time to finish them.

Now, if we look at code of the two popular benchmark, we can see that theCalculator add loopis really simple: just add one to a previous number, starting from 0, in the smartest way that you can code. The second is more complex but it involves: memory, subtractions and additions, and the “n” is not variable, is fixed to 8. This is a great limitation for new calculators that are really fast. Unfortunately, with a greater n maybe some old calculators can't do anything (since they don't have enough memory), or maybe the community likes the number 8 since the problem is related to chess (that has a chessboard of 8×8). Anyway the benchmark can be extended (increasing the n value by fixed steps), but this can be tried later, hoping that the community will contribute (again).

So thinking about a benchmark that use at least the four (+,-,*,/) operations, I stumbled upon the middle square method to produce, easily, random numbers^{1)}. Since we don't care about a particular benchmark algorithm, but we use the benchmark only to compare devices, it should be ok to test almost all programmable calculators, since it doesn't require much memory and it involves the four operations.

So, if you agree (else, feel free to fork the idea on this wiki!) let's look the pseudocode.

See also here(MoHPC)

**Criticism**

MoHPC . Message #6. In short, a benchmark should be helpful to compare calculators performance on the same set of the instructions. While the middle square benchmark is more a programming challenge, because some calculators have more instructions than others or have other data types that allows faster/slower/more precise/etc.. operations.

So actually this should be seen as challenge, and only after that as an approximative benchmark.

**The Pseudocode**

As the two popular benchmarks shows, we don't test only the computational power of the devices,
but also the programming ability of the user to “trick” the game.

k:= a positive integer n:= 100^k For i:=(n/10) to (n-1) do { old := i sequenceIsConvergent := false limitCyclicSequences := 0 //because we can end in cyclic sequences and we don't want to use a lot of memory! while ( (NOT sequenceIsConvergent) AND (limitCyclicSequences < n) ) { new = extract_middle_number from old^2 (see below) if new == old { sequenceIsConvergent = true } else { old := new limitCyclicSequences = limitCyclicSequences+1; } }//end while }//end for ->Result: the time spent to finish the computation given the starting k value. How does extract_middle_digits work? Given n, that is a power of 10 of the type 10^d, its square is equal to 10^(d*2) and it has (d*2+1) digits. We consider only the last (d*2) digits. That is: if n=10000, 10000*10000 = 100000000 we consider only 00000000 without the first 1. Then we 'consider' all the numbers lower than 10^(d*2) with (d*2) digits. For example if d=4 and the number under process is 1542, we consider 1542^2 as 02377764 instead of 2377764. Why d=4 ? Because we use as seed, with n=10000, numbers from 1000 to 9999, so numbers having 4 digits. After that we pick the d=4 middle digits, from 02377764 we pick 02[3777]64. So the middle number extracted is 3777.

The method of extraction for the digits can be anything you want, a standard method involving modulo and divisions or some other method. You can use an iterative subtraction to get the modulo instead of using the calculator's built-in function to compute it. Here we will see the user ability to exploit all the possibilities of the calculator, hoping that even algorithms with “classic” operations will be reported for a comparison.

**More details!**

Let k=1 , so we have as reference n:= 100^k = 100 Do sequences for all the seeds from a:=n/10 to n-1. So the seeds are {a:=10,11,12,...,99} How do we compute a sequence? a:=10 seed:=a 1. we first square the number 10*10 = 100 2. 100 squared is 10000, with 4 zeroes, then we consider the previous number with four digits (with a leading zero) 0100 3. Then we pick the middle digits of the number, as many as the zeroes of n (that is 100), the new seed is 0[10]0 4. The new seed is the same as the initial seed, then if we reapply the process we end up always with 10. Therefore the sequence is convergent. Next step a:=11 seed:=a seed^2 -> 121 See it with 4 digits -> 0121 Pick middle digits -> 0[12]1 , seed:= 12 seed^2 -> 144 See it with 4 digits -> 0144 Pick middle digits -> 0[14]4, seed:=14 seed^2 -> 196 See it with 4 digits -> 0196 Pick middle digits -> 0[19]6, seed:=19 seed^2 -> 361 See it with 4 digits -> 0361 Pick middle digits -> 0[36]1, seed:=36 seed^2 -> 1296 See it with 4 digits -> 1296 Pick middle digits -> 1[29]6, seed:=29 seed^2 -> 841 See it with 4 digits -> 0841 Pick middle digits -> 0[84]1, seed:=84 seed^2 -> 7056 See it with 4 digits -> 7056 Pick middle digits -> 7[05]6, seed:=5 seed^2 -> 25 See it with 4 digits -> 0025 Pick middle digits -> 0[02]5, seed:=2 seed^2 -> 4 See it with 4 digits -> 0004 Pick middle digits -> 0[00]4, seed:=0 seed^2 -> 0 See it with 4 digits -> 0000 Pick middle digits -> 0[00]0, seed:=0 Here the new seed is the same of the previous, so the process stops let's skip some iterations to show another problem a:=24 seed:=a seed^2 -> 576 See it with 4 digits -> 0576 Pick middle digits -> 0[57]6, seed:=57 seed^2 -> 3249 See it with 4 digits -> 3249 Pick middle digits -> 3[24]9, seed:=24 At this time you can do an infinite loop that is cyclic, so a way is to limit any sequence to max 100^k iterations. In this case one could consider the sentence "For a generator of n-digit numbers the period can be no longer than 8^n" on wikipedia, but to make it standard, let's limit it to 100^k iterations.

**How to report a result**

A result is composed by the following list - the device used plus the language used, overclock if any, custom firmware if any and so on. - time elapsed for a given k in seconds (see below) - the code used. if the calculator is too slow, or limited, to compute a given n, then report "for n=100^k the computation takes too much time". Conversely, if the calculator is too fast to compute a given n, then report "for n=100^k the computation takes too little time, I skipped it"

- HP 50g , 2.15 with HPGCC3 patch, ARM ASM
- Source: 3298 on Casio forum
- Time: 0.0093 s @75mhz
- Notes: For the code, I obviously needed division. This time, I searched for proper implementations of division on the web and found this one in the official infocenter. It is pretty scary how close I got with my own modulo routine, so it was quite optimized after all. Not many processors allow the programmer to write such a compact version (
`CMP R7,R8; SUBNES R6,R6,1; BNE WHILEloop`

) of`if(R7!=R8 && (–R6)!=0) goto WHILEloop;`

- Code:
^{2)}

- HP 50g , 2.15 with HPGCC3 patch, SaturnASM
- Source: 3298 on Casio forum
- Time: 0.0359 s @75mhz
- Notes: just with Saturn ASM, though I have to admit that I used some instructions the real Saturn didn't have - only the emulated Saturn can multiply and divide with a single instruction. So what will ARM ASM bring?

Another note on optimization in this code: I let the limitCyclicSequences counter run backwards from n to 0, this saves an instruction or two in the WHILE loop compared to the pseudocode version. - Code:
^{3)}

- HP 50g , 2.15 with HPGCC3 patch, SysRPL
- Source: 3298 on Casio forum
- Time: 1.1421 s @75mhz
- Notes: Due to the size of the numbers this test produces I can't use normal bints if I want to run it for k=2. As a consequence, my first try was to translate my UserRPL version to SysRPL using HXS numbers. That one got k=1 solved in about 4 seconds with hardcoded constants n and n² for k=1. When I replaced them by a piece of code that was supposed to calculate them from user input, it was for some reason slowed down to about 12 seconds (yes, slower than my previous UserRPL version). I could have done a version that is slightly faster than the UserRPL version by using real numbers as well, but as the highest number calculated for k=1 is still below the limit of 1048576, I can safely use bints for that one.
- Code:
^{4)}

- HP 50g , 2.15 HRAST BASIC
- Source: On MoHPC forum
- Time: 1.94s
- Code:
^{5)}

- HP 49g, HRAST BASIC
- Source: On MoHPC forum
- Time: 4.81s
- Code:
^{6)}

- HP 50g , 2.15 with HPGCC3 patch, UserRPL
- Source: 3298 on Casio forum
- Time: 10.6349 s @75mhz
- Code:
^{7)}

- HP 50g , 2.08, UserRPL 75mhz , #1
- Time: 24.4003 sec
- Code:
^{8)}

- HP 50g , 2.15 with HPGCC3 patch, ARM ASM
- Source: 3298 on Casio forum
- Time: 169.9734 s @75mhz
- refer to the same entry for k:=1

- HP 50g , 2.15 with HPGCC3 patch, SaturnASM
- Source: 3298 on Casio forum
- Time: 1445.8055 s @75mhz
- Code: see the entry for k=1.

- HP 50g , 2.08, UserRPL 75mhz , #1
- Time: estimated 250'000 secs
^{9)} - Code: The same of the entry “HP 50g , 2.08, UserRPL 75mhz , #1” for the 100^k, k=1 input.

Why? See the note ^{10)} .

**»Devices smaller than a tablet of 7 inches, 7 inches included«**

- Palm treo pro GSM (400mhz, 128Mb ram), pythonCE 2.5-20061219 #1
- Time: around 3 seconds (doing also other things, it is multitasking)
def middleSquareTest(k): import time; print time.localtime(); n = 100**k; nsqrt = n**(1/2.0); temp = None; new = None; for a in range(n/10, n): old = a; for j in range(n): #limit to max 'n' iterations because we don't want unlimited cycles temp = int(old*old / nsqrt)/float(n) ; new = (temp-int(temp))*n; if new == old: #exit j = n; else: #print old; old = new; print time.localtime();

- Palm treo pro GSM (400mhz, 128Mb ram), pythonCE 2.5-20061219 #1
- Time: 27'706 seconds (doing also other things, it is multitasking)
- See entry for k=1

As Marcel Samek on 11 Sept 2013 wrote on MoHPC's forum, this test is hard for small programmable calculators that can't play with integer of fractional part of a number. So the taget devices of this test are high end calculators with good mathematical framework.

Even if the method has a lot of flaws from the point of view of randomness.

CODE A=0.W GOSBVL POP# GOSBVL SAVPTR SKUB { *start !ARM STMDB sp! {R4 R5 R6 R7 R8 R9 R10 LP } LDR R2,[R1,#2316] MOV R3,1 MOV R10,10 *ALOGloop MUL R3,R3,R10 SUBS R2,R2,1 BNE ALOGloop MUL R2,R3,R3 MOV R7,R2 BL divmod MOV R4,R9 *FORloop MOV R5,R4 MOV R6,R4 *WHILEloop MUL R,R5,R5 MOV R10,R3 BL divmod MOV R7,R9 MOV R10,R2 BL divmod MOV R8,R5 MOV R5,R7 CMP R7,R8 SUBNES R6,R6,1 BNE WHILEloop ADD R4,R4,1 CMP R4,R2 BNE FORloop LDMIA sp! {R4 R5 R6 R7 R8 R9 R10 PC} *divmod MOV R8,R10 MOV R9,0 CMP R7,R10 MOVLO PC,LR *.LSloop MOV R8,R8 LSL #1 CMP R8,R7 BLO .LSloop *.RSloop MOVHI R8,R8 LSR #1 CMP R7,R8 SUBHS R7,R7,R8 ADC R9,R9,R9 CMP R8,R10 BHI .RSloop MOV PC,LR !ASM *end } C=RSTK D0=C D1=80100 LC(5)end-start MOVEDN LC 80100 ARMSAT GOVLNG GETPTRLOOP ENDCODE

CODE GOSBVL POP# GOSBVL SAVPTR B=A.A C=0.W A=0.W C+1.W A+10.W *ALOGloop C*A.W B-1.W ?B#0.A GOYES ALOGloop R0=C.W C*C.W R1=C.W C/A.W R2=C.W *FORloop B=C.W D=C.W *WHILEloop D-1.W ?D=0.W GOYES endFORloop C*C.W A=R0.W C/A.WA=R1.W C%A.W A=B.W B=C.W ?A#C.W GOYES WHILEloop *endFORloop C=R2.W A=R1.W C+1.W R2=C.W ?C#A.W GOYES FORloop GOVLNG GETPTRLOOP ENDCODE

:: BINT10 BINT100 BINT100 BINT10 DO DUPINDEX@ BEGIN DUPDUP #* 5PICK #/ SWAPDROP 4PICK #/ DROP SWAPOVER #= ROT #1- DUP4UNROLL #0= OR UNTIL 2DROP LOOP 2DROP ;

WATCH INTEGER K=1,N=100^K,Q=SQR N FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/Q,N) IF O<>R@ O=R NEXT ELSE LEAVE NEXT ? TICKS

WATCH INTEGER K=1,N=100^K,Q=SQR N FOR I=N/10 TO N-1 O=I FOR L=N R=MOD(^O/Q,N) IF O=R LEAVE ELSE O=R NEXT NEXT ? TICKS

<< ALOG DUP SQ DUP 10. / OVER 1. - FOR a DUP a DO DUP SQ 5. PICK / IP 4. PICK MOD UNTIL SWAP OVER == ROT 1. - UNROT PICK3 NOT OR END DROP2 NEXT DROP2 >>

\<< @k 100 SWAP ^ @n:= 100^k DUP \v/ @\v/n 0 @old 0 @new 0 @limit \-> n nsqr old new limit \<< PUSH @saves system flags -105 SF @approx mode n 10 / n 1 - FOR a a 'old' STO 1 CF @the sequence is not convergent 0 'limit' STO @to limit cyclic sequences WHILE 1 FC? limit n < AND REPEAT @extracts middle digits old old * @square nsqr / IP @integer part of old^2/\v/n n / FP @the previous result / n^2, picks the decimals n * @restores the decimals as integers 'new' STO IF new old == THEN 1 SF ELSE new 'old' STO 'limit' 1 STO+ END END NEXT POP @restores system flags \>> \>> executed with TEVAL

Because the complexity of the task for the input k=1 is around $ O(100 \cdot 100=100^2) $ and it takes 24.4 secs. The complexity of the task with the input k=2 is around $ O(100^2 \cdot 100^2 = 100^4) $ that is $ 100^2 $ times the complexity with k=1. So, while the Hp50g is still crunching for a confirmation, the task will take around 250'000 seconds.

Today i interrupted the computation after almost 48 hours and it is still unfinished as expected.

**To avoid battery consumption you must use the an usb charger!**

Today i interrupted the computation after almost 48 hours and it is still unfinished as expected.

Because, a part from school, one can use an handheld device to do math, maybe simple math using built-in functions of the framework, as well. Sure, it is not comfortable as a physical calculator unless it is large enough to simulate a calculator keyboard.

benchmarks/middlesquare.txt · Last modified: 2017/05/09 09:03 by pier4r