Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?


2006-08-04 05:46:09 AM
delphi170
I constructed two sample applications (VB6 and Delphi 2006 Win32) that do a
simple line-by-line search through a text file (see source code at the
bottom of this post). Either I am off in my approach, or is there a 3rd-party
solution to make this faster? The file has 2.4 million lines (rows) and
is 806MB in size.
VB6
----------------------
8/3/2006 4:35:54 PM
Lines: 2401548
Found: 37
8/3/2006 4:37:25 PM
90.28125
DELPHI 2006 Win32
----------------------
8/3/2006 4:32:04 PM
Lines: 2401548
Found: 37
8/3/2006 4:34:41 PM
156.69099970255
SOURCE
==========
VB
---------------
Private Sub cmdSimpleSearch_Click()
    Dim sBuffer As String
    Dim nCount As Long
    Dim nLine As Long
    Dim sSearch As String
    Dim sFileIn As String
    Dim fIn As Long
    Dim dStart As Single
    Dim dEnd As Single
    sFileIn = File1.Path + "\" + File1.filename
    txtText1.Text = Now & vbCrLf
    dStart = Timer
    fIn = FreeFile
    Open sFileIn For Input As #fIn
    sSearch = Me.txtSearch.Text
    Do While Not EOF(fIn)
        Line Input #fIn, sBuffer
        If InStr(sBuffer, sSearch) > 0 Then
            nCount = nCount + 1
        End If
        nLine = nLine + 1
    Loop
    Close #fIn
    dEnd = Timer
    txtText1.Text = txtText1.Text + "Lines: " & nLine & vbCrLf
    txtText1.Text = txtText1.Text + "Found: " & nCount & vbCrLf
    txtText1.Text = txtText1.Text & Now & vbCrLf
    txtText1.Text = txtText1.Text & dEnd - dStart
    lblStatus.Caption = "Searched " + Str(nLine) + " lines, found: " + _
        Str(nCount) + " of " + sSearch
End Sub
------------------
DELPHI
------------------
procedure TForm1.cmdSimpleSearchClick(Sender: TObject);
var
  hHandle : TextFile;
  sBuffer : String;
  sSearch : String;
  nCount : Integer;
  nFound : Integer;
  dStart : TDateTime;
  dEnd : TDateTime;
begin
  Memo1.Lines.Clear;
  nCount := 0;
  nFound := 0;
  if Dialog.Execute then
  begin
    sSearch := Edit1.Text;
    Memo1.Lines.Add((DateTimeToStr(Now)));
    dStart := Now;
    AssignFile(hHandle, Dialog.FileName);
    Reset(hHandle);
    while Not EOF(hHandle) do
    begin
      Readln(hHandle, sBuffer);
      if Pos(sSearch, sBuffer)>0 then
        Inc(nFound);
      Inc(nCount);
    end;
    CloseFile(hHandle);
    dEnd := Now;
    Memo1.Lines.Add('Lines: ' + IntToStr(nCount));
    Memo1.Lines.Add('Found: ' + IntToStr(nFound));
    Memo1.Lines.Add((DateTimeToStr(Now)));
    Memo1.Lines.Add(FloatToStr(SecondSpan(dEnd, dStart)));
  end;
end;
 
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Same Delphi project compiled in Delphi 7
8/3/2006 5:01:14 PM
Lines: 2401548
Found: 37
8/3/2006 5:02:56 PM
101.561999414116
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

"Doug Olson" <XXXX@XXXXX.COM>wrote
Quote
> I constructed two sample applications (VB6 and
> Delphi 2006 Win32) that do a simple search through
> a text file line by line. ... I must be off on my approach
> this file has 2.4 million lines (rows) and is 806MB in size.
Doug,
If you really meant that the lines were all the same length,
then look at or try the following code.
(1) Will the user be scanning the lines in forward or reverse order?
(2) Will you be parsing the CSV lines for display to the user?
(How many columns?)
(3) Will the lines be edited into an output file?
Regards, JohnH
procedure TForm1.Button5Click(Sender: TObject);
var
  MyFile: file;
  Line, SearchStr: String;
  dStart, dEnd: TDateTime;
  SizeOfFileBytes, SizeOfRecord,
  NbrOfRecords, NbrBytesRemainder, i, nFound: Integer;
begin
  Memo1.Lines.Clear;
  { Get path to file: }
  If OpenDialog1.Execute then begin
    { Start the clock: }
    dStart := Now;
    Memo1.Lines.Add((DateTimeToStr(dStart)));
    { Open the file and specify record size of one byte: }
    AssignFile(MyFile,OpenDialog1.FileName);
    Reset(MyFile,1);
    Try{Finally}
      { Check the file and record sizes: }
      SizeOfFileBytes := FileSize(MyFile);
      SizeOfRecord := 8;
      Repeat
        If (SizeOfRecord>SizeOfFileBytes)
          then SizeOfRecord := SizeOfFileBytes;
        SetLength(Line,SizeOfRecord);
        Seek(MyFile,0);
        BlockRead(MyFile,Line[1],SizeOfRecord);
        i := Pos(^M^J,Line); {<== look for CRLF}
        If (i <> 0)
          then begin
            SizeOfRecord := i + 1;
            BREAK;
          end
          else begin
            If (SizeOfRecord = SizeOfFileBytes)
              then begin
                SizeOfRecord := 0;
                BREAK;
              end
              else SizeOfRecord := SizeOfRecord*2;
          end;
      Until false;
      If SizeOfRecord = 0
        then NbrOfRecords := 0
        else NbrOfRecords := SizeOfFileBytes div SizeOfRecord;
      NbrBytesRemainder
        := SizeOfFileBytes - SizeOfRecord*NbrOfRecords;
      Memo1.Lines.Add(Format(
        'SizeOfFileBytes=%D, SizeOfRecord=%D',
        [SizeOfFileBytes, SizeOfRecord]));
      Memo1.Lines.Add(Format(
        'NbrOfRecords=%D, NbrBytesRemainder=%D',
        [NbrOfRecords, NbrBytesRemainder]));
      { Get search string: }
      SearchStr := Edit1.Text;
      { Now do the scan, line by line: }
      nFound := 0;
      Reset(MyFile,SizeOfRecord);
      SetLength(Line,SizeOfRecord);
      For i := 0 to NbrOfRecords-1 do begin
        { The Seek is only needed if recs are needed out of order: }
        Seek(MyFile,i);
        { Read one row record: }
        BlockRead(MyFile,Line[1],1);
        { Check for and count presence of search string: }
        if (SearchStr <> '') and (Pos(SearchStr, Line)>0)
          then Inc(nFound);
      end{i-loop};
    Finally
      CloseFile(MyFile);
    End;
    { Stop the clock, and report: }
    dEnd := Now;
    Memo1.Lines.Add('Found: ' + IntToStr(nFound));
    Memo1.Lines.Add((DateTimeToStr(Now)));
    Memo1.Lines.Add(FormatFloat
      ('0.00',(dEnd-dStart)*(24*60*60)));
  end{if Execute};
end;
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Quote
> while Not EOF(hHandle) do
> begin
>   Readln(hHandle, sBuffer);
>   if Pos(sSearch, sBuffer)>0 then
>     Inc(nFound);
>   Inc(nCount);
> end;
The SysTools package on SourceForge has a buffered
stream class named TstAnsiTextStream that would
probably do it quite well. It has also got some text
export, merge and data classes.
Part of the problem you're going to have is that
Pos is not all that snappy. You might want to look
through the HyperStr library or maybe FastStrings
for something a little faster.
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Hi John,
Thanks for the feedback. I will be working with both fixed-length and CSV
files... actually we get data in a TON of different formats... I will review
your code, and thank you in advance for providing it!
Doug
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Hi John,
Thanks - I am amazed that, with the approach I took, comparing nearly the
same style of code, VB was faster... I did some further testing on
different sizes and types of files with the same basic results :(
I am going to remove POS and INSTR functions from both to see if it is the
basic file i/o handling... or maybe I could find a different way to allocate
the buffer... but at first glance, apples to apples, I am disappointed.
Here I was raving about the string speed of Delphi... I know this is a small
case, but still...
Doug
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Doug Olson wrote:
Quote
> I constructed two sample applications (VB6 and Delphi 2006 Win32) that do a
> simple search through a text file line by line.
>
> while Not EOF(hHandle) do
> begin
>   Readln(hHandle, sBuffer);
>   if Pos(sSearch, sBuffer)>0 then
>     Inc(nFound);
>   Inc(nCount);
> end;
Just a shot in the dark. If you run this program against a text file
whose lines vary in size, the memory pointed to by the sBuffer
variable needs to be reallocated pretty frequently. Isn't that the
case? You might try using a reasonably big fixed-size buffer,
something like array [0..1023] of Char instead of a string, in order to
avoid those frequent reallocations. You would also need to replace the
call to Pos with a call to StrPos. Do you see any difference?
--
Ivo Bauer [OZM Research]
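
A minimal sketch of the fixed-buffer idea above, not a measured fix. It assumes the ANSI-era compilers discussed in this thread (Delphi 7 / 2006, where Char is a single byte) and that no line is longer than 1023 characters; Readln drops the rest of any longer line. The program name and the MaxLineLen constant are made up for illustration.

```pascal
program FixedBufSearch;
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  MaxLineLen = 1023; // assumed upper bound on line length

var
  F: TextFile;
  // Zero-based Char array: Readln fills it as a null-terminated string,
  // so no heap allocation happens per line.
  Buf: array[0..MaxLineLen] of Char;
  Search: string;
  nCount, nFound: Integer;
begin
  if ParamCount < 2 then
  begin
    Writeln('Usage: FixedBufSearch <file> <search text>');
    Halt(1);
  end;
  Search := ParamStr(2);
  nCount := 0;
  nFound := 0;
  AssignFile(F, ParamStr(1));
  Reset(F);
  try
    while not Eof(F) do
    begin
      // Reads at most MaxLineLen characters, then skips to the next line.
      Readln(F, Buf);
      // StrPos works on null-terminated strings and returns nil if not found.
      if StrPos(Buf, PChar(Search)) <> nil then
        Inc(nFound);
      Inc(nCount);
    end;
  finally
    CloseFile(F);
  end;
  Writeln('Lines: ', nCount);
  Writeln('Found: ', nFound);
end.
```

Whether this actually beats the string version would have to be timed on the real file; the sketch only shows the shape of the change.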
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Hi Doug,
Doing many small reads on a file is S-L-O-W :)
What you should do is use a buffer: each read pulls one chunk into the
buffer, and you then read your lines from the buffer.
I have implemented a special "buffered read stream" for my component
TNativeXml, see code below. VB probably uses something like that under the
hood. I am sure if you implement this you will see that Delphi is faster.
Kind regards,
Nils Haeck
www.simdesign.nl
code extracted from NativeXml.pas:
// TsdBufferedReadStream is a buffered stream that takes another TStream
// and reads only buffer-wise from it, and reads to the stream are first
// done from the buffer. This stream type can only support reading.
TsdBufferedReadStream = class(TStream)
private
  FStream: TStream;
  FBuffer: PBigByteArray;
  FPage: integer;
  FBufPos: integer;
  FBufSize: integer;
  FPosition: longint;
  FOwned: boolean;
  FMustCheck: boolean;
protected
  procedure CheckPosition;
public
  // Create the buffered reader stream by passing the source stream in AStream,
  // this source stream must already be initialized. If Owned is set to True,
  // the source stream will be freed by TsdBufferedReadStream.
  constructor Create(AStream: TStream;
    Owned: boolean{$IFDEF D4UP} = False{$ENDIF});
  destructor Destroy; override;
  function Read(var Buffer; Count: Longint): Longint; override;
  function Write(const Buffer; Count: Longint): Longint; override;
  function Seek(Offset: Longint; Origin: Word): Longint; override;
end;
---
{ TsdBufferedReadStream }
const
  cMaxBufferSize = $10000; // 65536 bytes in the buffer

procedure TsdBufferedReadStream.CheckPosition;
var
  NewPage: integer;
  FStartPos: longint;
begin
  // Page and buffer position
  NewPage := FPosition div cMaxBufferSize;
  FBufPos := FPosition mod cMaxBufferSize;
  // Read new page if required
  if (NewPage <> FPage) then begin
    // New page and buffer
    FPage := NewPage;
    // Start position in stream
    FStartPos := FPage * cMaxBufferSize;
    FBufSize := Min(cMaxBufferSize, FStream.Size - FStartPos);
    FStream.Seek(FStartPos, soBeginning);
    if FBufSize > 0 then
      FStream.Read(FBuffer^, FBufSize);
  end;
  FMustCheck := False;
end;

constructor TsdBufferedReadStream.Create(AStream: TStream; Owned: boolean);
begin
  inherited Create;
  FStream := AStream;
  FOwned := Owned;
  FMustCheck := True;
  FPage := -1; // Set to invalid number to force an update on first read
  ReallocMem(FBuffer, cMaxBufferSize);
end;

destructor TsdBufferedReadStream.Destroy;
begin
  if FOwned then FreeAndNil(FStream);
  ReallocMem(FBuffer, 0);
  inherited;
end;

function TsdBufferedReadStream.Read(var Buffer; Count: longint): Longint;
var
  Packet: PByte;
  PacketCount: integer;
begin
  // Set the right page
  if FMustCheck then CheckPosition;
  // Special case - read one byte, most often
  if (Count = 1) and (FBufPos < FBufSize - 1) then begin
    byte(Buffer) := FBuffer^[FBufPos];
    inc(FBufPos);
    inc(FPosition);
    Result := 1;
    exit;
  end;
  // general case
  Packet := @Buffer;
  Result := 0;
  while Count > 0 do begin
    PacketCount := min(FBufSize - FBufPos, Count);
    if PacketCount <= 0 then exit;
    Move(FBuffer^[FBufPos], Packet^, PacketCount);
    dec(Count, PacketCount);
    inc(Packet, PacketCount);
    inc(Result, PacketCount);
    inc(FPosition, PacketCount);
    inc(FBufPos, PacketCount);
    if FBufPos >= FBufSize then CheckPosition;
  end;
end;

function TsdBufferedReadStream.Seek(Offset: longint; Origin: Word): Longint;
begin
  case Origin of
    soFromBeginning:
      FPosition := Offset;
    soFromCurrent:
      begin
        // no need to check in this case - it is the GetPosition command
        if Offset = 0 then begin
          Result := FPosition;
          exit;
        end;
        FPosition := FPosition + Offset;
      end;
    soFromEnd:
      FPosition := FStream.Size + Offset;
  end;//case
  Result := FPosition;
  FMustCheck := True;
end;

function TsdBufferedReadStream.Write(const Buffer; Count: longint): Longint;
begin
  raise EStreamError.Create(sxeCannotWriteCodecForReading);
end;
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Quote
> Thanks - I am amazed that, with the approach I took, comparing nearly the
> same style of code, VB was faster... I did some further testing on
> different sizes and types of files with the same basic results :(
I was a VB guy (and before that a QB and PowerBasic guy)
and even though they do fundamentally the same thing, they
are different beasts.
Quote
> I am going to remove POS and INSTR functions from both to see if it is the
> basic file i/o handling... or maybe I could find a different way to
> allocate the buffer...
It's definitely related to too small a buffer and Pos. Every
time you fetch a line, you're taking a speed hit. Also, when
doing buffered I/O, try to make the buffer size the same as
the cluster size on the hard drive.
Another thought - what about file mapping?
Quote
> but at first glance, apples to apples, I am disappointed.
> Here I was raving about the string speed of Delphi...
Trust me, done properly there's no comparison.
I wrote a tag-based parsing engine that parses HTML, XHTML,
XML, ASP etcetera and I have been able to parse a 100MB file
in about 3 seconds on an older Athlon 750.
There's no way I could have written it in VB.
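
For the file mapping idea, here is a rough sketch of what it could look like with the plain Win32 calls (CreateFile, CreateFileMapping, MapViewOfFile). It is Windows-only and only counts LF bytes to keep the example short. It assumes the file is smaller than 4 GB and that the whole view fits into the process address space, which is not guaranteed for an 806MB file in a 32-bit process. The program and function names are made up for illustration.

```pascal
program MapCount;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils;

// Maps the whole file read-only and counts the LF bytes in the view.
// Assumption: file size < 4 GB and the view fits in the address space.
function CountLineFeedsMapped(const FileName: string): Int64;
var
  hFile, hMap: THandle;
  View: PAnsiChar;
  Size, i: Cardinal;
begin
  Result := 0;
  hFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if hFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    Size := GetFileSize(hFile, nil);
    if Size = 0 then
      Exit; // an empty file cannot be mapped
    hMap := CreateFileMapping(hFile, nil, PAGE_READONLY, 0, 0, nil);
    if hMap = 0 then
      RaiseLastOSError;
    try
      View := MapViewOfFile(hMap, FILE_MAP_READ, 0, 0, 0);
      if View = nil then
        RaiseLastOSError;
      try
        // The OS pages the file in on demand while we walk the view.
        i := 0;
        while i < Size do
        begin
          if View[i] = #10 then
            Inc(Result);
          Inc(i);
        end;
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(hMap);
    end;
  finally
    CloseHandle(hFile);
  end;
end;

begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: MapCount <file>');
    Halt(1);
  end;
  Writeln('Line feeds: ', CountLineFeedsMapped(ParamStr(1)));
end.
```

For files too big to map in one go, the usual variation is to map a window at a time by passing an offset and a length to MapViewOfFile; that is left out here.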
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Hi John,
Looks like we followed the same path: I am an old QB guy myself, got into VB
(purchased it myself since work didn't want to pay the 99 bucks for VB 1) and
kept going through to VB6... There was nothing I couldn't do in VB (I
thought) -- I did realize I was doing more API calls and workarounds to
get things done rather than regular coding, then a buddy of mine introduced
me to Delphi (4 at that time). I have been hooked ever since! So much, in
fact, PocketStudio was based on pascal/Delphi IDE :)
I am not giving up on this, some nice folks responded with some great ideas
(thanks everyone), so I am going to continue my quest -- Doug
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

"Doug Olson" <XXXX@XXXXX.COM>writes:
Doug,
I created a file of 832 MB with a million lines; the timings I get tell
me that your effort should be on reading the file faster. I have used
memory-mapped files for this, combined with Boyer-Moore searching.
Create: 25.6s, Read: 14.4s, Read with Pos: 16.1s, Memory-mapped
with Boyer-Moore: 6s. If you know that the lines are always shorter
than 255 characters, use short strings to avoid reallocations of the
Ansi strings.
--Paul
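
To give an idea of the searching half, here is a small sketch of the Horspool simplification of Boyer-Moore (bad-character table only, not the full algorithm, and not the code used for the timings above). It works on any byte buffer, so it could be pointed at a memory-mapped view or at a chunk read from a stream. CountMatches and the sample text are made up for illustration.

```pascal
program HorspoolDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Counts occurrences (overlapping ones included) of Pattern in
// Buf[0..BufLen-1] using the Boyer-Moore-Horspool shift table.
function CountMatches(Buf: PAnsiChar; BufLen: Integer;
  const Pattern: AnsiString): Integer;
var
  Shift: array[AnsiChar] of Integer;
  c: AnsiChar;
  m, i, j: Integer;
begin
  Result := 0;
  m := Length(Pattern);
  if (m = 0) or (BufLen < m) then
    Exit;
  // Default: a character not in the pattern lets us skip m positions.
  for c := Low(AnsiChar) to High(AnsiChar) do
    Shift[c] := m;
  // Characters in the pattern (all but the last) allow shorter skips.
  for j := 1 to m - 1 do
    Shift[Pattern[j]] := m - j;
  i := 0; // start of the current window in Buf
  while i <= BufLen - m do
  begin
    // Compare the window against the pattern from right to left.
    j := m;
    while (j > 0) and (Buf[i + j - 1] = Pattern[j]) do
      Dec(j);
    if j = 0 then
      Inc(Result);
    // Skip according to the last character of the current window.
    Inc(i, Shift[Buf[i + m - 1]]);
  end;
end;

const
  Sample: AnsiString = 'abracadabra abracadabra';
begin
  Writeln(CountMatches(PAnsiChar(Sample), Length(Sample), 'abra')); // prints 4
end.
```

The longer the search text, the bigger the skips, which is where the gain over a plain Pos-style scan comes from.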
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Quote
> your effort should be on reading the file faster.
IIRC, TFileStream is slow. There are buffered versions around which make
it rocket.
/Matthew Jones/
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Matthew Jones writes:
Quote
> IIRC, TFileStream is slow. There are buffered versions around which make
> it rocket.
AFAIK, TFileStream is not slow, as it is merely a wrapper for the Win32 API
functions. However, if you don't read/write in a proper manner, it will seem
slow.
The best idea is to read in 64KB or 256KB chunks when reading from NTFS.
HTH
Jonathan
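
A minimal sketch of reading in chunks with a plain TFileStream, assuming the 64KB size mentioned above. To stay short it only counts LF bytes; a real search would also have to handle matches that straddle two chunks. The program and function names are made up for illustration.

```pascal
program ChunkedRead;
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  ChunkSize = 64 * 1024; // 64KB per read, the smaller size suggested above

// Counts LF bytes by pulling the file through a ChunkSize buffer.
function CountLineFeeds(const FileName: string): Int64;
var
  Stream: TFileStream;
  Buffer: array[0..ChunkSize - 1] of Byte;
  BytesRead, i: Integer;
begin
  Result := 0;
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    repeat
      // One call to the OS per chunk instead of one per line.
      BytesRead := Stream.Read(Buffer, SizeOf(Buffer));
      for i := 0 to BytesRead - 1 do
        if Buffer[i] = 10 then
          Inc(Result);
    until BytesRead = 0;
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: ChunkedRead <file>');
    Halt(1);
  end;
  Writeln('Line feeds: ', CountLineFeeds(ParamStr(1)));
end.
```

Note that a file whose last line has no trailing line break will report one less than a Readln loop would.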
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Quote
> However, if you don't read/write in a proper manner,
Well, yes, but define proper. The VCL streams out integers and suchlike
one at a time, so you might perhaps define that as proper, and you agree
that it is slow for that. The buffered streams give the best of both
worlds, offering good performance for small data items.
/Matthew Jones/
 

Re: Working with very large text files - VB(Visual Basic) is faster than Delphi?

Doug,
There are LOTS of ways to search text FAR more quickly... but you might
be able to get a very nice speed jump in your simple, straightforward
test with only about 2 extra lines of code. They set Delphi to use a
much larger buffer for its built-in text file processing; the default
text buffer is only 128 bytes, IIRC. (Other techniques require explicit
buffer management... or, of course, getting some component...)
Either get some heap memory, or allocate some stack for a buffer. I'll
go quick and easy here with stack. Declare:
  // Watch your stack limits, if you use stack
  BigBuffer : array[0..256000 {more or less ;-) }] of byte;
Then, IMMEDIATELY after your AssignFile() call:
  System.SetTextBuf(hHandle, BigBuffer);
Of course, searching larger chunks (not line-by-line), if possible,
should yield more speed, and using search techniques like
Boyer-Moore... would help even more...
Good Luck.
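
Put together as a small console program, the SetTextBuf change might look like the sketch below. It uses a global buffer instead of a stack one to sidestep the stack limit, and the 256KB size is only a starting guess, not a tuned value. SearchFile and the program name are made up for illustration.

```pascal
program BigTextBuf;
{$APPTYPE CONSOLE}

uses
  SysUtils;

var
  // Global rather than local, so it does not eat into the stack.
  BigBuffer: array[0..256 * 1024 - 1] of Byte;

// Counts all lines and the lines that contain SearchStr.
procedure SearchFile(const FileName, SearchStr: string;
  out LineCount, FoundCount: Integer);
var
  F: TextFile;
  Line: string;
begin
  LineCount := 0;
  FoundCount := 0;
  AssignFile(F, FileName);
  // Replace the small default text buffer; this goes after AssignFile
  // and before any reading starts.
  SetTextBuf(F, BigBuffer);
  Reset(F);
  try
    while not Eof(F) do
    begin
      Readln(F, Line);
      if Pos(SearchStr, Line) > 0 then
        Inc(FoundCount);
      Inc(LineCount);
    end;
  finally
    CloseFile(F);
  end;
end;

var
  Lines, Found: Integer;
begin
  if ParamCount < 2 then
  begin
    Writeln('Usage: BigTextBuf <file> <search text>');
    Halt(1);
  end;
  SearchFile(ParamStr(1), ParamStr(2), Lines, Found);
  Writeln('Lines: ', Lines);
  Writeln('Found: ', Found);
end.
```

The rest of the loop is the same as in the original test, so any change in timing should come from the buffer alone.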