
Fastcode StrLen B&V 0.4.0


2005-12-03 06:49:26 PM
delphi238
Hi
If SubBench1 is accepted then we could consider modifying SubBench2 in the
same way too. I need your opinion.
Precision is not good enough yet.
Inclusion of new functions.
Validation should be improved.
SubBench weights will be adjusted and will slowly converge to the optimum from
release to release. I need your acceptance of the method.
My Northwood machine has a problem with precision. I am trying to work it
out.
I am now also going to run benchmarks and validations on Banias and Athlon XP.
Best regards
Dennis Kjaer Christensen
 
 

Re:Fastcode StrLen B&V 0.4.0

Hi
I have been using the wrong compiler for all releases so far.
We also have a new record for the number of functions in a B&V: 210 functions in
0.3.0. ;-)
I will clean up as soon as the benchmark is stable and I have results for
all targets.
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode StrLen B&V 0.4.0

Hi Dennis
New functions for the next release.
//Author: Lars Bloch Gravengaard
//Date: 3/12 2005
//Instructionset(s): IA32
function StrLen_LBG_IA32_6(const Str: PChar): Cardinal;
asm
MOV EDX,7
ADD EDX,EAX { pointer+7 used in the end }
PUSH EBX { is necessary; even in your version}
MOV EBX,[EAX] { read first 4 bytes}
ADD EAX,4 { increment pointer}
@L1: LEA ECX,[EBX-$01010101] { subtract 1 from each byte}
XOR EBX,-1 { invert all bytes}
AND ECX,EBX { and these two}
MOV EBX,[EAX] { read next 4 bytes}
ADD EAX,4 { increment pointer}
TEST ECX,80808080H { test all sign bits}
JZ @L1 { no zero bytes, continue loop}
TEST ECX,00008080H { test first two bytes}
JNZ @L2 { *was JNZ SHORT @L2*}
SHR ECX,16 { not in the first 2 bytes}
ADD EAX,2
@L2: SHL CL,1 { use carry flag to avoid a branch}
POP EBX { Likewise; see above}
SBB EAX,EDX { compute length}
end;
//Uploader Lars Bloch Gravengaard
//Author: Robert Lee
//Date: 2/12 2005
//Instructionset(s): IA32
//Comment: I found this very nice function on web.archive.org.
// It is from 1999 and still very fast!
function StrLen_LBG_IA32_5(const Str: PChar): Cardinal;
asm
MOV EDX,7
ADD EDX,EAX { pointer+7 used in the end }
PUSH EBX { is necessary; even in your version}
MOV EBX,[EAX] { read first 4 bytes}
ADD EAX,4 { increment pointer}
@L1: LEA ECX,[EBX-$01010101] { subtract 1 from each byte}
XOR EBX,-1 { invert all bytes}
AND ECX,EBX { and these two}
MOV EBX,[EAX] { read next 4 bytes}
ADD EAX,4 { increment pointer}
AND ECX,80808080H { test all sign bits}
JZ @L1 { no zero bytes, continue loop}
TEST ECX,00008080H { test first two bytes}
JNZ @L2 { *was JNZ SHORT @L2*}
SHR ECX,16 { not in the first 2 bytes}
ADD EAX,2
@L2: SHL CL,1 { use carry flag to avoid a branch}
POP EBX { Likewise; see above}
SBB EAX,EDX { compute length}
end;
Regards,
Lars G
--
The Fastcode Project:
www.fastcodeproject.org/
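
Both functions above rely on the same word-at-a-time test: subtract 1 from each byte, AND the result with the inverted word, and check the sign bits. As a rough sketch only, here is that test in portable C rather than Delphi assembler. The names zero_byte_mask and strlen_words are made up for illustration and are not part of any Fastcode release, and the sketch assumes the buffer is padded so that reading up to 3 bytes past the terminator is safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Nonzero if and only if at least one of the four bytes in w is zero.
   Mirrors the LEA [x-$01010101] / invert / AND / sign-bit test sequence. */
static uint32_t zero_byte_mask(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/* Hypothetical helper: length of s, scanning four bytes per step.
   ASSUMPTION: up to 3 bytes after the terminator are readable. */
static size_t strlen_words(const char *s)
{
    const char *p = s;
    for (;;) {
        uint32_t w;
        memcpy(&w, p, sizeof w);      /* 4-byte load that tolerates misalignment */
        if (zero_byte_mask(w) != 0)
            break;                    /* this word contains the terminator */
        p += 4;
    }
    while (*p != '\0')                /* find the terminator inside that word */
        ++p;
    return (size_t)(p - s);
}
```

With a padded buffer such as char a[16] = "hello", strlen_words(a) returns 5. The assembler versions replace the final byte loop with branch-free arithmetic, which this sketch does not attempt.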
 

Re:Fastcode StrLen B&V 0.4.0

"Dennis" <XXXX@XXXXX.COM> wrote in a message
Quote
Hi

If SubBench1 is accepted then we could consider modifying SubBench2 in the
same way too. I need your opinion.
The new SubBench1 looks better. You got my vote.
Regards,
Lars G
 

Re:Fastcode StrLen B&V 0.4.0

www.azillionmonkeys.com/qed/asmexample.html
Item 5
--
Robert AH Prins
prino at prino dot plus dot com
 

Re:Fastcode StrLen B&V 0.4.0

Here's another new function:-
function StrLen_JOH_IA32_3(const Str: PChar): Cardinal;
asm
push eax
mov edx, eax
@@Loop:
mov eax, [edx] {4 Chars per Loop}
add edx, 4
lea ecx, [eax-$01010101]
not eax
and eax, ecx
and eax, $80808080 {Set Byte to $80 at each #0 Position}
jz @@Loop {Loop until any #0 Found}
bsf eax, eax {Find First #0 Position}
shr eax, 3 {Byte Offset of First #0}
pop ecx
lea eax, [edx+eax-4] {Address of First #0}
sub eax, ecx
end;
--
regards,
John
The Fastcode Project:
www.fastcodeproject.org/
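
The BSF / SHR 3 pair in the function above turns the sign-bit mask into a byte offset. A minimal C sketch of that step follows, assuming the word was loaded little-endian as on IA32. The helper names are hypothetical, and the bit scan is written as a plain loop instead of a BSF instruction.

```c
#include <stdint.h>

/* Sign-bit mask: nonzero if and only if some byte of w is zero. */
static uint32_t zero_byte_mask(uint32_t w)
{
    return (w - 0x01010101u) & ~w & 0x80808080u;
}

/* Byte index (0..3) of the lowest set bit in mask, i.e. BSF then SHR 3.
   ASSUMPTION: mask is nonzero, as it is after the loop exits. */
static unsigned first_zero_byte(uint32_t mask)
{
    unsigned bit = 0;
    while ((mask & 1u) == 0) {   /* scan upward from bit 0 */
        mask >>= 1;
        ++bit;
    }
    return bit >> 3;             /* bit position -> byte position */
}
```

Only the lowest set bit matters. A byte equal to $01 that sits above a zero byte can also get its sign bit set in the mask because of the borrow, but scanning from bit 0 still finds the first zero byte.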
 

Re:Fastcode StrLen B&V 0.4.0

and another:-
function StrLen_JOH_IA32_4(const Str: PChar): Cardinal;
asm
push eax
lea edx, [eax+4]
mov eax, [eax]
lea ecx, [eax-$01010101]
not eax
and eax, ecx
and eax, $80808080 {Set Byte to $80 at each #0 Position}
jnz @@SetResult
and edx, -4 {DWORD Align Reads}
@@Loop:
mov eax, [edx] {4 Chars per Loop}
add edx, 4
lea ecx, [eax-$01010101]
not eax
and eax, ecx
and eax, $80808080 {Set Byte to $80 at each #0 Position}
jz @@Loop {Loop until any #0 Found}
@@SetResult:
bsf eax, eax {Find First #0 Position}
shr eax, 3 {Byte Offset of First #0}
pop ecx
lea eax, [eax+edx-4] {Address of First #0}
sub eax, ecx
end;
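
The alignment step in this function can be illustrated in C as well. This is only a sketch with a made-up helper name: rounding an address down to a 4-byte boundary means clearing its two low bits, which is the same as ANDing with -4.

```c
#include <stdint.h>

/* Hypothetical helper: round addr down to a multiple of 4.
   ~3 has the same bit pattern as -4 (all ones except the two low bits). */
static uintptr_t align_down4(uintptr_t addr)
{
    return addr & ~(uintptr_t)3;
}
```

Because the first four bytes are checked with an unaligned read, rounding pointer+4 down gives an aligned address that is at most pointer+4, so the loop never skips a byte. For example, a string starting at address 0x1001 continues at align_down4(0x1005), which is 0x1004.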
 

Re:Fastcode StrLen B&V 0.4.0

Hi
Released in attachments.
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode StrLen B&V 0.4.0

Hi
OK. I have seen it earlier.
I will try to make a function based on this code and include it in the next
release.
Best regards
Dennis Kjaer Christensen
 

Re:Fastcode StrLen B&V 0.4.0

Hi
Some kind soul etc....... ;-)
Best regards
Dennis Kjaer Christensen