Reading null-terminated strings from files

Hello
I would like to read null-terminated strings from a file, in binary mode. I
don't know their lengths, I only know that the last byte of each string is
0. How can I do that?
I want to avoid this method:
- read byte by byte until a 0 is found
- calculate the length of the string
- read the string
because it's quite slow.

Thanks
syl...@club-internet.fr

 

Re:Reading null-terminated strings from files

Hi, whoever you are

A possibly usable solution:

1.) Open a handle to the file with CreateFile
2.) Use GetFileSize to obtain the number of bytes in the file
3.) Allocate a PChar buffer of that size
4.) Read the file into the buffer
5.) Scan the buffer for #0; the chars between two #0s are your strings
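
The five steps above could be sketched roughly as follows. This is only a sketch: the procedure name LoadNullStrings and its parameters are made up for illustration, the buffer is an AnsiString rather than a raw PChar allocation, and the file is assumed to be smaller than 2 GB.

```delphi
uses
  Windows, SysUtils, Classes;

// Hypothetical helper (name and signature are not from this thread).
// Loads the whole file with the Win32 API and adds each #0-terminated
// string to List.
procedure LoadNullStrings(const FileName: string; List: TStrings);
var
  H: THandle;
  Size, BytesRead: DWORD;
  Buf: AnsiString;
  P, PEnd: PAnsiChar;
begin
  // 1.) open a handle to the file
  H := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if H = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    // 2.) file size in bytes (low 32 bits only, see the 2 GB assumption)
    Size := GetFileSize(H, nil);
    if Size = $FFFFFFFF then
      RaiseLastOSError;
    // 3.) allocate the buffer; an AnsiString always carries a hidden
    //     trailing #0, so an unterminated last string is still safe to scan
    SetLength(Buf, Size);
    // 4.) read the file into the buffer
    BytesRead := 0;
    if (Size > 0) and not ReadFile(H, Buf[1], Size, BytesRead, nil) then
      RaiseLastOSError;
    SetLength(Buf, BytesRead);
  finally
    CloseHandle(H);
  end;
  // 5.) scan for #0; StrLen stops at the next terminator
  P := PAnsiChar(Buf);
  PEnd := P + Length(Buf);
  while P < PEnd do
  begin
    List.Add(string(AnsiString(P)));
    Inc(P, StrLen(P) + 1);
  end;
end;
```

Calling LoadNullStrings('thefile.dat', SL) with an existing TStringList would fill SL with one entry per terminated string.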

Season greetings
Peter

BTW: Real names are highly appreciated in this group

This message consists of 99.5% recycled bytes.
Replace the numbers in the email address with the corresponding vowels

Re:Reading null-terminated strings from files


On Fri, 27 Dec 2002 16:42:25 +0100, "Peter Schultheis"
<peterschulth...@5t1n2t.1t> wrote:

<snip>

>A maybe usable solution

>1.) Open a Handle to the file with   CreateFile
>2.) Use GetFileSize to obtain the amount chars in the file
>3.) Create a Pchar-pointer with that size.
>4.) Read the file into the Pchar
>5.) Scan the Pchar for #0, the chars between are Your String

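Alternatively:

Var
   FS :TFileStream;
   Start, BufLen, L9 :Integer;
   AStr, Buffer :AnsiString;  // one byte per char in every Delphi version
   SL :TStringList;
Begin
  FS := TFileStream.Create( 'thefile.dat', fmOpenRead );
  Try
    BufLen := FS.Size;
    SetLength( Buffer, BufLen );
    // Buffer[1] is not valid for an empty file
    If BufLen > 0 Then
       FS.ReadBuffer( Buffer[1], BufLen );
  Finally
    FS.Free;
  End;

  // Now the Data is in Buffer
  SL := TStringList.Create;

  Start := 1;
  For L9 := 1 To BufLen Do
        If Buffer[L9] = #0 Then
           Begin
             // copy the chars between the previous #0 and this one
             AStr := Copy( Buffer, Start, L9 - Start );
             SL.Add( AStr ) ;
             Start := L9 + 1;
           End;

  // ... use SL here, then call SL.Free
End;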

The last string still needs to be handled if the file does not end in #0.
That can be done by making Buffer 1 byte longer, holding #0 in the last
byte, and letting the loop run over that extra byte as well.
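
A minimal sketch of that sentinel idea, assuming the data is already in an AnsiString. The helper name SplitAtNulls is invented for this example and is not part of the code above.

```delphi
uses
  Classes;

// Hypothetical helper: splits Buffer at every #0 and adds the pieces to SL.
// Buffer is passed by value, so the caller's string is not modified.
procedure SplitAtNulls(Buffer: AnsiString; SL: TStrings);
var
  Start, I: Integer;
begin
  // Sentinel: if the data does not end in #0, append one so the
  // final string is terminated like all the others.
  if (Buffer <> '') and (Buffer[Length(Buffer)] <> #0) then
    Buffer := Buffer + #0;
  Start := 1;
  for I := 1 to Length(Buffer) do
    if Buffer[I] = #0 then
    begin
      SL.Add(string(Copy(Buffer, Start, I - Start)));
      Start := I + 1;
    end;
end;
```

With this, SplitAtNulls('one'#0'two', SL) adds both 'one' and 'two', even though the second string has no terminator in the input.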

Re:Reading null-terminated strings from files


Thanks for your answers.
It's a pity that TStringList.DelimitedText doesn't work with #0 as
Delimiter...

Re:Reading null-terminated strings from files


If the file text does not include carriage returns or line feeds, and is not
incredibly long, you can do (untested) ...

var
  MS : TMemoryStream;
  FileStr : AnsiString;  {one byte per char in every Delphi version}
  i : integer;
  SL : TStringList;
begin
  MS := TMemoryStream.Create;
  try
    MS.LoadFromFile('D:\MyFile.txt');
    {make string appropriate length}
    SetLength(FileStr, MS.Size);
    {read stream into the string; FileStr[1] is invalid for an empty file}
    if MS.Size > 0 then
      MS.Read(FileStr[1], MS.Size);
  finally
    MS.Free;
  end;
  {change #0 to #13 to act as a separator}
  for i := 1 to Length(FileStr) do
    if FileStr[i] = #0 then
      FileStr[i] := #13;
  {create your stringlist}
  SL := TStringList.Create;
  try
    {put the strings into the stringlist, they will break at the #13}
    SL.Text := FileStr;
    //
    // Now do what you want with the separate strings
    //
  finally
    SL.Free;
  end;
end;

Alan Lloyd
alangll...@aol.com

Re:Reading null-terminated strings from files


I'm assuming that you aren't using a carriage return to define each line -
you are using Chr(0).

A solution might be to just read the entire file into a string - you will
need to make your own routine (BlockRead on an untyped file var is easiest,
since a TextFile var treats line breaks and Ctrl-Z specially). Then create
an array (dynamic if you like). Then iterate through the chars of the string
and store the index of each null char (your custom linebreak) in the next
array element. When using the data, refer to this integer array for the
linebreak positions.

var
  F: file;          // untyped file: raw bytes, no text-mode handling
  s: AnsiString;
  i: Integer;
  IntArray: array of Integer;
begin
  AssignFile(F, 'MyFile.txt');
  Reset(F, 1);      // open the existing file with a record size of 1 byte
  try
    SetLength(s, FileSize(F));
    if Length(s) > 0 then
      BlockRead(F, s[1], Length(s));
  finally
    CloseFile(F);
  end;
  // record the position of every null char
  for i := 1 to Length(s) do
    if s[i] = #0 then
    begin
      SetLength(IntArray, Length(IntArray) + 1);
      IntArray[High(IntArray)] := i;
    end;
end;

The way you solve this problem depends on whether you want to load the file
fast or use the data loaded from the file fast. If the latter, load each line
(using the method that you mentioned you didn't like) into a stringlist or
string array. This will give you faster line-by-line access. If you go the
single-string way, you will be able to load the file marginally faster, but
line-by-line access will be slower.

I would prefer the string array method.

Hope this helps.

Re:Reading null-terminated strings from files


Loading the file into a string/buffer, changing #0 to #13 and then
loading into a stringlist will break the lines at #13, but it will also
break the lines at any #10 and #13#10 already in the data. To get just the
#0 terminated strings, somewhere, someplace, you have to read byte by byte
to find the #0s.

The following takes about 10 sec to get all #0 terminated strings out of the
.exe file. While 10 sec may seem slow, it is returning 56K..60K of
strings (the .exe is chock full of #0s).

..............
function StreamNullStr(Stream: TStream; var bEOF: boolean): AnsiString;
var
  buf : array[0..$F000]of byte;
  i, red, Offset, len, xlen : longint;
begin
  result := '';

  // reads from current stream position
  Offset := Stream.Position;

  xlen := 0;   // bytes scanned in earlier chunks
  len := -1;   // index of the #0 in the current chunk, -1 = not found yet
  red := Stream.Read(buf[0], $F000);
  while (red > 0) and (len < 0) do begin
    i := 0;
    // scan only the bytes actually read (i < red, not i <= red)
    while (i < red) and (len < 0) do begin
      if buf[i] = 0 then
        len := i;

      Inc(i);
    end;

    if len < 0 then begin
      Inc(xlen, red);
      red := Stream.Read(buf[0], $F000);
    end;
  end;

  bEOF := red = 0;

  // total string length from Offset, also when the #0 is the first
  // byte of a later chunk
  if len >= 0 then
    Inc(len, xlen);

  if len > 0 then begin
    SetLength( result, len );

    Stream.Seek(Offset, soFromBeginning);
    Stream.Read( result[1], len);
    // next string position
    Stream.Seek(1, soFromCurrent);
  end else
    Stream.Seek(Offset +1, soFromBeginning);

end;

procedure TForm1.Button8Click(Sender: TObject);
var
  FS : TFileStream;
  bEOF : boolean;
  i : integer;
  Strs : TStringList;
begin
  Memo1.Clear;
  Memo1.Update;
  Memo1.Lines.BeginUpdate;

  Strs := TStringList.Create;
  FS := TFileStream.Create( ParamStr(0), fmOpenRead or fmShareDenyWrite);
  try
    FS.Seek(0, soFromBeginning);
    bEOF := false;
    i := 0;
    while not bEOF do begin
      Strs.Add( StreamNullStr(FS, bEOF) );
      Inc(i);
    end;
    Strs.Add(Format('  %.0d lines read.',[i]));

    Memo1.Lines.Assign( Strs );
  finally
    FS.Free;
    Strs.Free;
    Memo1.Lines.EndUpdate;
    Memo1.SetFocus;
  end;
end;

..............
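
StreamNullStr reads up to $F000 bytes on every call and then seeks back, so most of the file is read many times over. A possible single-pass variant is sketched below. It is untested against the .exe case above, and the name ReadNullStrings and the choice to keep data after the last #0 are assumptions of this sketch, not part of the posted code.

```delphi
uses
  Classes;

// Hypothetical single-pass variant: every byte of the stream is read once.
procedure ReadNullStrings(Stream: TStream; List: TStrings);
var
  Buf: array[0..$EFFF] of AnsiChar;
  Red, I, Start: Integer;
  Pending, Piece: AnsiString;
begin
  Pending := '';
  repeat
    Red := Stream.Read(Buf[0], SizeOf(Buf));
    Start := 0;
    for I := 0 to Red - 1 do
      if Buf[I] = #0 then
      begin
        // string ends here; prepend what was left over from earlier chunks
        SetString(Piece, PAnsiChar(@Buf[Start]), I - Start);
        List.Add(string(Pending + Piece));
        Pending := '';
        Start := I + 1;
      end;
    // keep the unterminated tail of this chunk for the next round
    if Start < Red then
    begin
      SetString(Piece, PAnsiChar(@Buf[Start]), Red - Start);
      Pending := Pending + Piece;
    end;
  until Red = 0;
  // data after the last #0, if any
  if Pending <> '' then
    List.Add(string(Pending));
end;
```

In Button8Click the while loop would then be replaced by a single call, ReadNullStrings(FS, Strs), with Strs.Count giving the number of strings read.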

Re:Reading null-terminated strings from files


If your file is too large to read into a string, and you're using
WinNT or one of its descendants (2000, XP), you can use memory-mapped files
(see CreateFileMapping and MapViewOfFile).

These may work on some other OSes (95, 98, etc.), but I'm not sure, and I
think there are some limitations.  See MSDN for details:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/file...
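
A rough sketch of the mapping approach with CreateFileMapping and MapViewOfFile. The procedure name ScanMappedFile is hypothetical, and the sketch assumes the file is smaller than 2 GB so that a single view of the whole file fits in the address space.

```delphi
uses
  Windows, SysUtils, Classes;

// Hypothetical helper: maps the file read-only and scans the view for #0.
procedure ScanMappedFile(const FileName: string; List: TStrings);
var
  HFile, HMap: THandle;
  Size: DWORD;
  View, P, PEnd, Q: PAnsiChar;
  S: AnsiString;
begin
  HFile := CreateFile(PChar(FileName), GENERIC_READ, FILE_SHARE_READ, nil,
    OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  if HFile = INVALID_HANDLE_VALUE then
    RaiseLastOSError;
  try
    Size := GetFileSize(HFile, nil);
    if Size = $FFFFFFFF then
      RaiseLastOSError;
    if Size = 0 then
      Exit;  // an empty file cannot be mapped
    HMap := CreateFileMapping(HFile, nil, PAGE_READONLY, 0, 0, nil);
    if HMap = 0 then
      RaiseLastOSError;
    try
      View := MapViewOfFile(HMap, FILE_MAP_READ, 0, 0, 0);
      if View = nil then
        RaiseLastOSError;
      try
        P := View;
        PEnd := View + Size;
        while P < PEnd do
        begin
          // find the terminator without running past the end of the view
          Q := P;
          while (Q < PEnd) and (Q^ <> #0) do
            Inc(Q);
          SetString(S, P, Q - P);
          List.Add(string(S));
          P := Q + 1;
        end;
      finally
        UnmapViewOfFile(View);
      end;
    finally
      CloseHandle(HMap);
    end;
  finally
    CloseHandle(HFile);
  end;
end;
```

No buffer is allocated here; the pages of the view are brought in by the OS as the scan touches them.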

Kelly

Re:Reading null-terminated strings from files


Kelly Leahy wrote in message ...
>if your file size is too large to read into a string, ...

...

...

Yeah, right.

If your file is larger than 2 GB, you have other problems.

And if anybody ever designed a video format where the
tags could be anywhere in the file, I suggest we make
him transmit it over an RFC 1149-compliant network
connection. Using pigs.

Greetings,
Maarten Wiltink

Re:Reading null-terminated strings from files


Not everyone wants to allocate a large block of memory just for the purpose
of reading in the contents of a file (I'd rather not have to allocate a
500MB buffer just because I'm looking for a few strings in a file that big).

Mapping a view is an easy way to let the operating system decide whether the
buffer is too big or not (i.e. by swapping out blocks of memory that aren't
actively being used).

Kelly

Re:Reading null-terminated strings from files


In article <JDMQ9.57$yJ6.50052...@newssvr12.news.prodigy.com>, "Kelly Leahy"
<kellyle...@nospam.swbell.net> writes:
>Mapping a view is an easy way to let the operating system decide whether the
>buffer is too big or not (i.e. by swapping out blocks of memory that aren't
>actively being used).

Doesn't it do that anyway, however you allocate memory?

Alan Lloyd
alangll...@aol.com

Re:Reading null-terminated strings from files


It doesn't know as much if you allocate the memory yourself.  For instance,
you can use a flag to tell Windows that you're going to go forward only
through the file, so it will swap out the earlier pages once you've loaded
the next page from the file.  It doesn't have quite as much information when
you use something like GlobalAlloc (or other similar allocation methods that
are based on the heap).

An example of another interesting effect: consider an allocation method that
zeros out the allocated memory (like GlobalAlloc with GPTR as the allocation
flags).  Let's say your file is 100MB in size.  You'll allocate a 100MB
buffer, and Windows will zero the memory (which will mark every page as
"active" for the time being).  Then you will read from the buffer
sequentially.

1) How long does it take Windows to zero the memory?
2) How long before the OS knows that you're not using the first page
anymore?
3) What if you don't need to read the entire file?  You read it all anyway,
since you loaded the entire file into the buffer.
4) How much swap space did you occupy by writing 100MB into memory when you
only needed a small amount?
5) What optimizations could be performed by the OS if it knows you are only
reading (not modifying) the data?  (It can safely discard the pages since
they can be reread from the file, rather than committing them to the
swapfile.)  The cost of writing the page to the swapfile is at least as high
as that of reading the page (again) from the input file, so why
waste the swap space?

I think you can see my point.

Kelly