[Harbour] 2008-09-15 13:38 UTC+0200 Przemyslaw Czerpak (druzus/at/priv.onet.pl)

Mindaugas Kavaliauskas dbtopas at dbtopas.lt
Wed Oct 1 03:34:51 EDT 2008


Przemyslaw Czerpak wrote:
>> The only reason I see for binding stack preload to "no TLS" is that stack
>> preload also uses an inlined Windows-like function to access TLS. But I see
>> them as two separate features: stack preload and the TLS access method
>> (compiler native or system API)?
> 
> When compiler native TLS is disabled and a file defines HB_STACK_PRELOAD
> before including the Harbour header files, each function which has
> to access hb_stack buffers its address via HB_STACK_TLS_PRELOAD.
> If possible, an inline assembler function is used to retrieve the stack
> address, which is a little bit faster than a call to the OS TLS function
> and even than native TLS support in some compilers (e.g. BCC).
> Compile the current SVN code without any additional switches and compare
> the tstspeed.prg results to the previous ones.

Hi,


Thanks for the explanation. I just wanted to run all tests one after
another to make the results comparable, because numbers obtained a few
days ago may have been measured under a different OS memory/CPU load, so
results can differ by a few seconds.
I've used -DHB_USE_TLS to obtain the "previous" results (compiler native
TLS, no stack preloading).
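
To illustrate the two features being discussed, here is a minimal sketch in C. It is not Harbour's code: the type demo_stack and the functions api_stack(), bump_no_preload(), bump_preload() and bump_native() are hypothetical names, and POSIX thread keys stand in for the OS TLS API (on Windows the equivalent call would be TlsGetValue()). It only shows the difference between looking up the per-thread pointer on every access and "preloading" it once into a local variable at function entry, next to a compiler native TLS variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a per-thread VM stack; not Harbour's real type. */
typedef struct { long counter; } demo_stack;

/* Access method 1: compiler native TLS (C11 _Thread_local). */
static _Thread_local demo_stack native_stack;

/* Access method 2: TLS through the OS API (POSIX keys in this sketch). */
static pthread_key_t  stack_key;
static pthread_once_t stack_key_once = PTHREAD_ONCE_INIT;

static void make_key(void)
{
    int rc = pthread_key_create(&stack_key, free);
    assert(rc == 0);
    (void) rc;
}

/* Returns the calling thread's stack, allocating it on first use. */
static demo_stack *api_stack(void)
{
    demo_stack *s;

    pthread_once(&stack_key_once, make_key);
    s = pthread_getspecific(stack_key);
    if (s == NULL) {
        s = calloc(1, sizeof *s);
        if (s == NULL || pthread_setspecific(stack_key, s) != 0)
            abort();
    }
    return s;
}

/* No preload: every access pays for a TLS lookup. */
long bump_no_preload(int n)
{
    int i;

    for (i = 0; i < n; i++)
        api_stack()->counter++;
    return api_stack()->counter;
}

/* Preload: one TLS lookup at function entry, then a plain local pointer. */
long bump_preload(int n)
{
    demo_stack *stack = api_stack();
    int i;

    for (i = 0; i < n; i++)
        stack->counter++;
    return stack->counter;
}

/* Compiler native TLS: the compiler generates the per-thread access. */
long bump_native(int n)
{
    int i;

    for (i = 0; i < n; i++)
        native_stack.counter++;
    return native_stack.counter;
}
```

In this sketch preloading and the access method are independent choices: bump_preload() could just as well take its pointer from &native_stack. Which combination is fastest depends on the compiler and platform, which is what the timings below measure for Harbour itself.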

The results are:

10/01/08 10:03:30 Harbour 1.1.0dev (Rev. 9523), Windows XP 5.1.2600 Service Pack 2

ARR_LEN =         16                      ST      MT      MT
N_LOOPS =    1000000                            USE_TLS
empty loops overhead =                   0.19    0.30    0.28
CPU usage -> secondsCPU()

c:=L_C ->                                0.19    0.39    0.31
n:=L_N ->                                0.19    0.27    0.19
d:=L_D ->                                0.22    0.27    0.19
c:=M_C ->                                0.23    0.45    0.38
n:=M_N ->                                0.20    0.30    0.23
d:=M_D ->                                0.22    0.30    0.23
(sh) c:=F_C ->                           0.38    0.81    0.84
(sh) n:=F_N ->                           0.58    0.61    0.64
(sh) d:=F_D ->                           0.30    0.34    0.36
(ex) c:=F_C ->                           0.38    0.81    0.83
(ex) n:=F_N ->                           0.56    0.64    0.66
(ex) d:=F_D ->                           0.30    0.34    0.33
n:=o:GenCode ->                          0.45    0.81    0.78
n:=o[8] ->                               0.42    0.63    0.52
round(i/1000,2) ->                       0.63    0.92    0.81
str(i/1000) ->                           1.50    2.27    2.03
val(a3[i%ARR_LEN+1]) ->                  1.36    1.84    1.64
dtos(j+i%10000-5000) ->                  1.39    2.06    2.03
eval({||i%ARR_LEN}) ->                   0.69    1.03    0.89
eval({|x|x%ARR_LEN},i) ->                0.78    1.20    1.02
eval({|x|f1(x)},i) ->                    1.28    1.81    1.42
&('f1('+str(i)+')') ->                   7.66   15.13   13.14
eval([&('{|x|f1(x)}')]) ->               1.25    1.81    1.39
j := valtype(a)+valtype(i) ->            1.08    2.00    1.86
j := str(i%100,2) $ a2[i%ARR_LEN+1] ->   2.27    3.45    3.02
j := val(a2[i%ARR_LEN+1]) ->             1.55    2.11    1.91
j := a2[i%ARR_LEN+1] == s ->             1.06    1.70    1.50
j := a2[i%ARR_LEN+1] = s ->              1.11    1.69    1.58
j := a2[i%ARR_LEN+1] >= s ->             1.17    1.67    1.52
j := a2[i%ARR_LEN+1] < s ->              1.13    1.67    1.53
aadd(aa,{i,j,s,a,a2,t,bc}) ->            4.38    5.92    5.81
f0() ->                                  0.33    0.55    0.42
f1(i) ->                                 0.55    0.89    0.64
f2(c[8]) ->                              0.45    0.81    0.63
f2(c[40000]) ->                          0.47    0.80    0.64
f2(@c[40000]) ->                         0.36    0.64    0.47
f2(c[40000]); c2:=c ->                   0.69    1.19    1.00
f2(@c[40000]); c2:=c ->                  0.56    1.03    0.88
f3(a,a2,c,i,j,t,bc) ->                   1.16    2.05    1.70
f2(a2) ->                                0.44    0.81    0.66
s:=f4() ->                               2.13    2.56    2.47
s:=f5() ->                               0.84    1.38    1.27
ascan(a,i%ARR_LEN) ->                    0.73    1.25    1.06
ascan(a2,c+chr(i%64+64)) ->              2.47    3.67    3.33
ascan(a,{|x|x==i%ARR_LEN}) ->           10.95   13.48   11.66
=============================================================
total application time:                 64.95  100.00   89.30
total real time:                        65.89  100.97   90.36

Previous MT overhead was 54%, current 37%.
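
For reference, these percentages follow from the "total application time" row: each MT total is divided by the ST total (64.95). The 100.00 total gives 54% and the 89.30 total gives 37%. A small C sketch of that arithmetic, with the hypothetical helper names mt_overhead_percent() and print_overheads():

```c
#include <assert.h>
#include <stdio.h>

/* Relative slowdown of an MT build versus the ST build, in percent. */
double mt_overhead_percent(double st_seconds, double mt_seconds)
{
    assert(st_seconds > 0.0);
    return (mt_seconds / st_seconds - 1.0) * 100.0;
}

/* Prints the overhead for both MT totals taken from the table above. */
void print_overheads(void)
{
    const double st = 64.95;   /* ST total application time */

    printf("%.0f%%\n", mt_overhead_percent(st, 100.00));
    printf("%.0f%%\n", mt_overhead_percent(st, 89.30));
}
```

Calling print_overheads() from a main() prints 54% and 37%.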

One thing is not clear to me. You've committed exactly the same inlined
TLS access as I used in my test, but your code does not GPF. Mine was
GPFing because of wrongly generated CPU code.


Best regards,
Mindaugas


