Board index » delphi » Fastcode Int64Div B&V 3.4.1

Fastcode Int64Div B&V 3.4.1


2006-09-21 03:21:14 AM
delphi140
Hi All
Let us have a look at the benchmark and find a way to make it better.
We have 2 subbenchmarks and these to codesnippets are the loops we time
Sub1
X1 := 1;
while X1 <= MAXINT64-BENCHSTEPSIZE do
begin
Y1 := X1;
while Y1 <= MAXINT64-BENCHSTEPSIZE do
begin
Z1 := Int64DivFunction(Y1, X1);
Z2 := Int64DivFunction(Y1, X1);
Inc(Y1, BENCHSTEPSIZE);
end;
Inc(X1, BENCHSTEPSIZE);
end;
X2 := -MAXINT64;
while X2 <= -1-BENCHSTEPSIZE do
begin
Y2 := X2;
while Y2 <= -1-BENCHSTEPSIZE do
begin
Z3 := Int64DivFunction(X2, Y2);
Z4 := Int64DivFunction(X2, Y2);
Inc(Y2, BENCHSTEPSIZE);
end;
Inc(X2, BENCHSTEPSIZE);
end;
Sub2
X3 := -10;
while X3 <= 10 do
begin
Y3 := -MaxInt;
Y3 := Y3 * 4;
while Y3 <= MaxInt do
begin
Z5 := Int64DivFunction(X3, Y3);
Z6 := Int64DivFunction(X3, Y3);
Inc(Y3, 234567);
end;
Inc(X3);
end;
Best regards
Dennis Kjaer Christensen
 
 

Re:Fastcode Int64Div B&V 3.4.1

Hi
The innerloop of both subbenchmarks are making to identical calls to the
function
Quote
Z1 := Int64DivFunction(Y1, X1);
Z2 := Int64DivFunction(Y1, X1);
Z5 := Int64DivFunction(X3, Y3);
Z6 := Int64DivFunction(X3, Y3);
to amortize the overhead of the loop code.
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode Int64Div B&V 3.4.1

Hi
First we can investigate how good/bad the original benchmark is and then we
can make a new one.
The Sub1 has two sets of nested loops
X1 := 1;
while X1 <= MAXINT64-BENCHSTEPSIZE do
begin
Y1 := X1;
while Y1 <= MAXINT64-BENCHSTEPSIZE do
begin
Inc(Y1, BENCHSTEPSIZE);
end;
Inc(X1, BENCHSTEPSIZE);
end;
The first set runs in the parameter area X1,Y1 = [1,MAXINT64-BENCHSTEPSIZE],
[X1, MAXINT64-BENCHSTEPSIZE] with linear steps = BENCHSTEPSIZE
This loop covers the first quadrant, because X and Y are both positive
always.
X2 := -MAXINT64;
while X2 <= -1-BENCHSTEPSIZE do
begin
Y2 := X2;
while Y2 <= -1-BENCHSTEPSIZE do
begin
Inc(Y2, BENCHSTEPSIZE);
end;
Inc(X2, BENCHSTEPSIZE);
end;
The second set runs in the parameter area X2,Y2 =
[-MAXINT64,-1-BENCHSTEPSIZE], [X2, -1-BENCHSTEPSIZE] with linear steps =
BENCHSTEPSIZE
We observe that Y>= X always for both sets. This means that Y/X>= 1
always. This is realistic.
This loop covers the third quadrant, because X and Y are both negative
always.
No loops cover quadrant 2 and 4. This is a problem.
Because the stepping is linear we will get very few cases where one or both
parameters are small in value. Perhaps stepping should be logarithmic?
Best regards
Dennis Kjaer Christensen
"Dennis" <XXXX@XXXXX.COM>writes
Quote
Hi

The innerloop of both subbenchmarks are making to identical calls to the
function

>Z1 := Int64DivFunction(Y1, X1);
>Z2 := Int64DivFunction(Y1, X1);

>Z5 := Int64DivFunction(X3, Y3);
>Z6 := Int64DivFunction(X3, Y3);

to amortize the overhead of the loop code.

Best regards
Dennis Kjaer Christensen


 

Re:Fastcode Int64Div B&V 3.4.1

Hi
Sub 2
Has these nested loops
X3 := -10;
while X3 <= 10 do
begin
Y3 := -MaxInt;
Y3 := Y3 * 4;
while Y3 <= MaxInt do
begin
Inc(Y3, 234567);
end;
Inc(X3);
end;
Observe the hardcode stepsize !!! Inc(Y3, 234567);
All 4 quadrants are covered and mostly small (fit in 32 bit) numbers are
used. X is only run from -10->10, but Y is run from -MaxInt * 4 ->MaxInt
This subbenchmark can add weigth to the small numbers cases.
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode Int64Div B&V 3.4.1

My biggest problem with SubBenchmark1 is that it is far too heavily weighted
in favour of very large divisions. SubBenchmark1 performs over 5,000,000
divisions. Of these, less that 5,000 (0.1%) do NOT involve very large
divisions (very large division = |dividend| and |divisor|>= 2^32). This,
in my experience, is not realistic.
--
regards,
John
The Fastcode Project:
www.fastcodeproject.org/
"Dennis" <XXXX@XXXXX.COM>writes
Quote

First we can investigate how good/bad the original benchmark is and then
we
can make a new one.

 

Re:Fastcode Int64Div B&V 3.4.1

Hi
I hope you can come up with a new benchmark.
I can think of equal coverage of all 4 quadrant and a division of each in
large and small numbers. This gives 8 subbenchmarks that could have the same
weigth, but should be skewed towards more weigth to small numbers according
to Johns real world data.
Have fun ;-) I have to sleep now ;-)
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode Int64Div B&V 3.4.1

My biggest problem with subBenchmark2 is with the lines:-
Z5 := Int64DivFunction(X3, Y3);
Z6 := Int64DivFunction(X3, Y3);
In every case |X3| is less than |Y3|, so the result of the divisions is
always 0.
Changing the parameter order in the second call to
Z6 := Int64DivFunction(Y3, X3);
would rectify this.
Also, changing the constant 234567 to a smaller value (say 9999) would give
SubBenchmark2 a higher overall weighting.
--
regards,
John
The Fastcode Project:
www.fastcodeproject.org/
"Dennis" <XXXX@XXXXX.COM>writes
Quote
Hi

Sub 2

Has these nested loops

X3 := -10;
while X3 <= 10 do
begin
Y3 := -MaxInt;
Y3 := Y3 * 4;
while Y3 <= MaxInt do
begin
Inc(Y3, 234567);
end;
Inc(X3);
end;

Observe the hardcode stepsize !!! Inc(Y3, 234567);

All 4 quadrants are covered and mostly small (fit in 32 bit) numbers are
used. X is only run from -10->10, but Y is run from -MaxInt * 4 ->MaxInt

This subbenchmark can add weigth to the small numbers cases.

Best regards
Dennis Kjaer Christensen


 

Re:Fastcode Int64Div B&V 3.4.1

John O'Harrow writes:
Quote
Also, changing the constant 234567 to a smaller value (say 9999) would give
SubBenchmark2 a higher overall weighting.
StepX = 9
for( x=0; x<Max; x+= StepX )
{ StepY = 7
for( y=0; y<Max; y+= StepY
{
div( Y, X )
StepY += 97
}
StepX += 99
}
The Steps get bigger, giving fewer tries on larger numbers.
 

Re:Fastcode Int64Div B&V 3.4.1

I would like to propose the following loop within SubBenchmark2. I believe
this give a more representative version of normal use.
X3 := -9999999;
while X3 <= 9999999 do
begin
if X3 <>0 then
begin
Y3 := -MaxInt;
while (Y3 <= MaxInt) and (Y3<>0) do
begin
Z5 := Int64DivFunction(X3, Y3);
Z6 := Int64DivFunction(Y3, X3);
Inc(Y3, 9999);
end;
end;
Inc(X3, 99999);
end;
--
regards,
John
The Fastcode Project:
www.fastcodeproject.org/
 

Re:Fastcode Int64Div B&V 3.4.1

Hi John
A possible fix.
X3 := -MaxInt;
while X3 <= MaxInt do
begin
Y3 := -10;
while Y3 <= 10 do
begin
Z5 := Int64DivFunction(X3, Y3);
Z6 := Int64DivFunction(X3, Y3);
Inc(Y3);
end;
Inc(X3, 234567);
end;
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode Int64Div B&V 3.4.1

Hi John
I like it. Can you give a few explanations and arguments?
Do you agree with this implementation?
const
SUBBENCH2XMIN : Integer = -9999999;
SUBBENCH2XMAX : Integer = 9999999;
SUBBENCH2YMIN : Integer = -MaxInt;
SUBBENCH2YMAX : Integer = MaxInt;
SUBBENCH2XSTEPSIZE : Integer = 99999;
SUBBENCH2YSTEPSIZE : Integer = 9999;
X3 := SUBBENCH2XMIN;
while X3 <= SUBBENCH2XMAX do
begin
if X3 <>0 then
begin
Y3 := -MaxInt;
while (Y3 <= MaxInt) and (Y3<>0) do
begin
Z5 := Int64DivFunction(X3, Y3);
Z6 := Int64DivFunction(Y3, X3);
Inc(Y3, SUBBENCH2YSTEPSIZE);
end;
end;
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode Int64Div B&V 3.4.1

Hi John
The distribution of subbench1 should not be realistic !
The distribution for the benchmark should be realistic.
Subbench2 adds division of small numbers and the weigth of big versus small
numbers is adjusted via the subbench weigths. I normally aim at equal
weigths for the individual subbenchmarks on the fastest function on a blend
of processors = our CPU target set.
I do not know why the weigths are so much out of balance
Int64Div_DKC_IA32ext_2 67 3 70
Int64Div_DKC_IA32_2 68 3 71
Int64Div_JOH_IA32_3 68 3 71
Something was changed at some point, but I cannot remember the history.
Let us simply agree on a weight.
Big numbers = Subbench1
Small numbers = subbench2
My original aim : Subbench1 / subbench2 = 50/50
Johns real world aim : Subbench1 / subbench2 = 5/95
My new proposal: Subbench1 / subbench2 = 30/70
Opinions wanted - perhaps a poll.
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode Int64Div B&V 3.4.1

Hi
These two constant adjust the subbenchweigths
const
SUBBENCH1SCALE : Double = 300;
SUBBENCH2SCALE : Double = 150
The new subbench2 changes the weigths and a first temporary adjustment is
const
SUBBENCH1SCALE : Double = 300;
SUBBENCH2SCALE : Double = 15;
This is the result from Opteron
Int64Div_DKC_IA32_2 103 76 179
Int64Div_DKC_IA32ext_2 106 73 179
Int64Div_JOH_IA32_3 106 74 180
Int64Div_JOH_IA32_2 97 118 215
Int64Div_JOH_IA32_4 133 112 245
Int64Div_AMD_IA32_1 135 115 250
Int64Div_JOH_IA32_1 136 116 252
Int64Div_JOH_PAS_1 162 95 257
Int64Div_DKC_Pas_1 168 100 268
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode Int64Div B&V 3.4.1

Hi
Observe that we have special subbenchmarks for the RTL function
They are 100% identical to the other subbenches except for
Z1 := Y1 div X1;
Z2 := Y1 div X1;
and not
Z3 := Int64DivFunction(X2, Y2);
Z4 := Int64DivFunction(X2, Y2);
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode Int64Div B&V 3.4.1

Your 6 constants need to be Int64's, not Integers.
You are not incrementing X3 within the outer Loop
You are not using SUBBENCH2YMIN, SUBBENCH2YMAX or SUBBENCH2XSTEPSIZE
Apart from that, it would be Ok :)
--
regards,
John
The Fastcode Project:
www.fastcodeproject.org/
"Dennis" <XXXX@XXXXX.COM>writes
Quote
Hi John

I like it. Can you give a few explanations and arguments?

Do you agree with this implementation?

const
SUBBENCH2XMIN : Integer = -9999999;
SUBBENCH2XMAX : Integer = 9999999;
SUBBENCH2YMIN : Integer = -MaxInt;
SUBBENCH2YMAX : Integer = MaxInt;
SUBBENCH2XSTEPSIZE : Integer = 99999;
SUBBENCH2YSTEPSIZE : Integer = 9999;

X3 := SUBBENCH2XMIN;
while X3 <= SUBBENCH2XMAX do
begin
if X3 <>0 then
begin
Y3 := -MaxInt;
while (Y3 <= MaxInt) and (Y3<>0) do
begin
Z5 := Int64DivFunction(X3, Y3);
Z6 := Int64DivFunction(Y3, X3);
Inc(Y3, SUBBENCH2YSTEPSIZE);
end;
end;

Best regards
Dennis Kjaer Christensen