Checking whether a file is binary or text.

How to check it?

Michal Jaskolski

 

Re:Checking whether a file is binary or text.


On Thu, 03 Jul 1997 14:02:30 GMT, j...@gdansk.sprint.pl (Michal Jaskolski) wrote:
>How to check it?

>Michal Jaskolski

You could open the file as binary, read the first 512 bytes or so (or
1024, depending on how much certainty you want) into an array of char,
then check whether this looks like ASCII text. This could be defined as
something like:
1) generally normal ASCII text characters (between #32 and #126);
   a complication could be the use of characters above #127
2) most characters should come from 'a'..'z', 'A'..'Z', '0'..'9'
3) no 'funny characters' such as characters below #32 (except of
course CR (#13) and LF (#10), maybe TAB (#8) as well)
4) regular CR/LF or LF sequences (to indicate new lines)

This should generally enable you to distinguish between binary and
text files.
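
A rough sketch of points 1) and 3) in Delphi might look like the program below. The names LooksLikeTextBuf and LooksLikeTextFile, the 512-byte sample size and the 90% threshold are my own assumptions for illustration, not anything standard. Points 2) and 4) are left out. TAB is taken to be #9 here. Characters above #127 are tolerated but not counted as text, which is just one possible way to handle that complication.

```pascal
program TextCheck;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

const
  SampleSize   = 512;   // assumed sample length; 1024 works the same way
  MinTextRatio = 0.90;  // assumed threshold, tune to taste

// Returns True when the first Count bytes of Buf look like ASCII text.
// Hypothetical helper: any control character other than TAB, LF or CR
// (and DEL, #127) marks the data as binary straight away.
function LooksLikeTextBuf(const Buf: array of Byte; Count: Integer): Boolean;
var
  I, Printable: Integer;
begin
  Result := True;  // an empty sample is treated as text here
  Printable := 0;
  for I := 0 to Count - 1 do
    case Buf[I] of
      9, 10, 13: Inc(Printable);  // TAB, LF, CR
      32..126:   Inc(Printable);  // normal printable ASCII
      128..255:  ;                // high characters: allowed, not counted
    else
      begin
        Result := False;          // 'funny character' found
        Exit;
      end;
    end;
  if Count > 0 then
    Result := Printable / Count >= MinTextRatio;
end;

// Reads up to SampleSize bytes from the start of the file and checks them.
// TFileStream.Create raises an exception if the file cannot be opened.
function LooksLikeTextFile(const FileName: string): Boolean;
var
  Stream: TFileStream;
  Buf: array[0..SampleSize - 1] of Byte;
  Count: Integer;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyNone);
  try
    Count := Stream.Read(Buf, SizeOf(Buf));
  finally
    Stream.Free;
  end;
  Result := LooksLikeTextBuf(Buf, Count);
end;

begin
  if ParamCount < 1 then
  begin
    Writeln('usage: textcheck <file>');
    Halt(1);
  end;
  if LooksLikeTextFile(ParamStr(1)) then
    Writeln('text')
  else
    Writeln('binary');
end.
```

Being a heuristic, this will misjudge some files, for example text in a character set that leans heavily on codes above #127.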

hth
David

------------------
David A. Schweizer

iec ProGAMMA, The Netherlands
d.a.schwei...@gamma.rug.nl

Re:Checking whether a file is binary or text.


Michal Jaskolski wrote:
> How to check it?

The only way to be *SURE* that a file is binary is if you write it
and include some kind of signature at the beginning. Apart from that,
you're stuck with making (dodgy) assumptions based on file extensions.
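
For files you write yourself, the signature idea could be sketched roughly like this. The four signature bytes and the names TSignature, MySignature, WriteSignature and HasSignature are made up for the example; pick whatever suits your own format. The demo uses a TMemoryStream so it runs without touching the disk, but a TFileStream would be handled the same way.

```pascal
program SigCheck;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  TSignature = array[0..3] of Byte;

const
  // Hypothetical magic bytes: 'M', 'Y', 'B' followed by #26
  MySignature: TSignature = ($4D, $59, $42, $1A);

// Writes the signature at the current stream position
// (call it first, before writing any data).
procedure WriteSignature(Stream: TStream);
begin
  Stream.WriteBuffer(MySignature, SizeOf(MySignature));
end;

// Returns True when the stream starts with the expected signature.
function HasSignature(Stream: TStream): Boolean;
var
  Sig: TSignature;
begin
  Stream.Position := 0;
  Result := (Stream.Read(Sig, SizeOf(Sig)) = SizeOf(Sig)) and
    CompareMem(@Sig, @MySignature, SizeOf(Sig));
end;

var
  M: TMemoryStream;
begin
  M := TMemoryStream.Create;
  try
    WriteSignature(M);
    if HasSignature(M) then
      Writeln('signature found')
    else
      Writeln('no signature');
  finally
    M.Free;
  end;
end.
```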

Chris.

Re:Checking whether a file is binary or text.


On Fri, 04 Jul 1997 09:41:24 GMT, d.a.schwei...@gamma.rug.nl (David A. Schweizer) wrote:
>On Thu, 03 Jul 1997 14:02:30 GMT, j...@gdansk.sprint.pl (Michal
>Jaskolski) wrote:

>>How to check it?

>>Michal Jaskolski
>You could open the file as binary, read the first 512 (or 1024, or
>whatever certainty you want) bytes or so into a array of char, then
>check whether this looks like ASCII text. This could be defined as
>something like:

[deletia]

>3) no 'funny characters' such as characters below #32 (except of
>course CR (#13) and LF (#10), maybe TAB (#8) as well)

The standard Horizontal Tab character is actually #9, #8 is BackSpace.

Regards,

Stephen Posey
slpo...@concentric.net
