Board index » delphi » Fastcode CharPosRev B&V 0.4.0

Fastcode CharPosRev B&V 0.4.0


2006-11-25 05:08:17 PM
delphi201
Hi
Lets play a little with simple loop unrolling.
First I pick the fastest of my non-unrolled functions for each CPU.
On Opteron it is CharPosRev_DKC_Pas_8
CharPosRev_DKC_Pas_8_c 0 1447 2036 3483
CharPosRev_DKC_Pas_7_a 8 1469 2065 3534
CharPosRev_DKC_Pas_4_b 0 1697 2602 4299
CharPosRev_DKC_Pas_1_c 4 1706 2622 4328
CharPosRev_DKC_Pas_5_d 4 1712 2618 4330
CharPosRev_DKC_Pas_3_d 0 1723 2608 4331
CharPosRev_DKC_Pas_2_d 0 1735 2619 4354
CharPosRev_DKC_Pas_6_b 0 1735 2623 4358
function CharPosRev_DKC_Pas_8_a(SearchChar : Char; const S: string) :
Integer;
var
Str, StrStart : PChar;
LengthS : Integer;
begin
Result := 0;
LengthS := Length(S);
if LengthS>0 then
begin
StrStart := Pointer(S);
Str := StrStart + LengthS - 1;//Points to last char in S
repeat
if Str^ = SearchChar then
Break;
Dec(Str);
until (Str < StrStart);
Result := Str-StrStart+1;
end;
end;
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.
 
 

Re:Fastcode CharPosRev B&V 0.4.0

Hi
Simple loop unrolling goes like this. Take the loop
repeat
if Str^ = SearchChar then
Break;
Dec(Str);
until (Str < StrStart);
duplicate it
repeat
if Str^ = SearchChar then
Break;
Dec(Str);
until (Str < StrStart);
if Str^ = SearchChar then
Break;
Dec(Str);
until (Str < StrStart);
The loop break at this condition (Str < StrStart); and the first until is
converted into
if (Str < StrStart) then
Break
The unrolled loop becomes
repeat
if Str^ = SearchChar then
Break;
Dec(Str);
if (Str < StrStart) then
Break
if Str^ = SearchChar then
Break;
Dec(Str);
until (Str < StrStart);
and the entire function becomes
function CharPosRev_DKC_Pas_101_a(SearchChar : Char; const S: string) :
Integer;
var
Str, StrStart : PChar;
LengthS : Integer;
begin
Result := 0;
LengthS := Length(S);
if LengthS>0 then
begin
StrStart := Pointer(S);
Str := StrStart + LengthS - 1;//Points to last char in S
repeat
if Str^ = SearchChar then
Break;
Dec(Str);
if (Str < StrStart) then
Break;
if Str^ = SearchChar then
Break;
Dec(Str);
until (Str < StrStart);
Result := Str-StrStart+1;
end;
end;
I made a copy of _8 and called it _100
The logic now goes that _100 is zero times unrolled, _101 is one time
unrolled etc.
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.
 

Re:Fastcode CharPosRev B&V 0.4.0

Hi
The next step in unrolling is simply to copy paste the block
if Str^ = SearchChar then
Break;
Dec(Str);
if (Str < StrStart) then
Break;
function CharPosRev_DKC_Pas_102_a(SearchChar : Char; const S: string) :
Integer;
var
Str, StrStart : PChar;
LengthS : Integer;
begin
Result := 0;
LengthS := Length(S);
if LengthS>0 then
begin
StrStart := Pointer(S);
Str := StrStart + LengthS - 1;//Points to last char in S
repeat
if Str^ = SearchChar then
Break;
Dec(Str);
if (Str < StrStart) then
Break;
if Str^ = SearchChar then
Break;
Dec(Str);
if (Str < StrStart) then
Break;
if Str^ = SearchChar then
Break;
Dec(Str);
until (Str < StrStart);
Result := Str-StrStart+1;
end;
end;
What are the benefits of this optimization? All it does is to convert a
backward taken branch on each loop iteration into some forward non taken
branches. This can remove misaligned branch target penalties. Is the cost of
a taken correctly predicted branch the same as a nontaken correctly
predicted branch? On which CPU's - I do not know.
I call this simple unrolling because it does not reduce the number of times
the loop termination condition is evaluated. Real unrolling proccesors
2,4,8,16 bytes for each time the loop condition is checked. This is of
course faster, but not always possible. For AnsiString where we know the
string size up front it is possible, but for zero terminated strings it is
only partly possible. It is not possible if we do not allow reading past the
end of the string. In short - the functions I present here will not be
winners, they will just allow us to learn something. I expect none of them
to be faster than John's _2 function on any CPU's. Also they are much bigger
and will not be winners on any size penalty targets.
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.
 

Re:Fastcode CharPosRev B&V 0.4.0

Hi
Here are some numbers
Unrollcount 0 1 2 3 4 5 6 7 8 9
Opteron 3509 3385 3257 3278 3294 3338 3459 3019 3303 3112
Yonah 3018 3068 2975 2935 2951 2912 2870 2845 2857 2815
Dothan 2900 2890 2907 2835 2861 2813 2757 2693 2708 2713
Northwood 2177 2133 2032 2096 2036 2016 2009 2000 1979 2014
Prescott 2744 2654 2604 2577 2560 2523 2535 2525 2514 2495
Spreadsheet is here
tech.groups.yahoo.com/group/fastcodeproject/files/CharPosRev/Version%
200.4.0/
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.
 

Re:Fastcode CharPosRev B&V 0.4.0

Hi
Released at
tech.groups.yahoo.com/group/fastcodeproject/files/CharPosRev/Version%
200.4.0/
This release is dedicated to the unrolling experiment and has the relevant
functions enabled.
A real release with all functions coming very soon
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.
 

Re:Fastcode CharPosRev B&V 0.4.0

Hi Dennis,
I would like to see my _6 version in the release too, because I am not sure
if my _7 is faster on all cpu's...
( _6 uses more movzx edx,[esi+ecx+$01..$03] and _7 more movzx
edx,[esi+ecx-$01..$03])
(so just out of curiosity .. please include that one too? :))
Thanks,
Davy
"Dennis" <XXXX@XXXXX.COM>writes
Quote
Hi

Released at

tech.groups.yahoo.com/group/fastcodeproject/files/CharPosRev/Version%
200.4.0/

This release is dedicated to the unrolling experiment and has the relevant
functions enabled.

A real release with all functions coming very soon

Best regards
Dennis Kjaer Christensen

----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.


 

Re:Fastcode CharPosRev B&V 0.4.0

Hi Davy
Sure - I just missed it - sorry.
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.
 

Re:Fastcode CharPosRev B&V 0.4.0

Hi Davy
I did not include it because it borrows to much from Johns function.
I prefer that we make our own functions and also fix any issues in our own
functions.
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.
 

Re:Fastcode CharPosRev B&V 0.4.0

Hi John and Davy
Quote
I did not include it because it borrows to much from Johns function.
Please correct me if I am wrong.
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.
 

Re:Fastcode CharPosRev B&V 0.4.0

Quote
Sorry
No worries, I see you doing a lot of good work
Quote

I did not include it because it borrows to much from Johns function.

Okay, I totaly agree on that one.
that's why I included the comment with that version...
Regards,
Davy Landman
"Dennis" <XXXX@XXXXX.COM>writes
Quote
Hi Davy

I did not include it because it borrows to much from Johns function.

I prefer that we make our own functions and also fix any issues in our own
functions.

Best regards
Dennis Kjaer Christensen

----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.


 

Re:Fastcode CharPosRev B&V 0.4.0

Hi
Interesting results ;-)
All processors benefit from unrolling.
Opteron benefits most, but the curve has some weird bends. This is probably
due to misalignment of the branch target when execution breaks out of the
loop. Opteron could probably benefit further from unrolling, but the other
processors will benefit only little from going about 9 times unrolling.
Both Prescott and Northwood are running at 2800 MHz, but Northwood is much
faster.
Dothan is sligthly faster than Yonah at the same clock = 1733 MHz.
Opteron is slowest but it runs at 1400 MHz only.
Best regards
Dennis Kjaer Christensen
----------------------------------------------------------------------------
----
Jeg beskyttes af den gratis SPAMfighter til privatbrugere.
Den har indtil videre sparet mig for at f?5740 spam-mails
Betalende brugere får ikke denne besked i deres e-mails.
Hent en gratis SPAMfighter her.