need a utility to remove comment-lines

Does anybody know of a utility to remove comments and comment-only lines
from Delphi (D5+) sources?
I've thought about building such a thing myself, but comments in Delphi
sources can be written in so many ways ('{', '(*', '//', ...) that I
can't imagine how to build a utility that is 100% safe (or, better said:
I think I would run a great risk of destroying significant source code
if I built something myself...)

(The program/utility need not be free.)

Alfons

 

Re:need a utility to remove comment-lines


"Alfons" <A...@infobron.nl> wrote in message
news:ea850fa7.0106190432.75244425@posting.google.com...

> Does somebody know about a utility to remove comment and comment-lines
> from Delphi (D5+) sources.
> I've thought about building such a thing myself. But 'adding comments
> in delphi-sources' has so much flexibility (i.e. '{','(*','//'..),
> that I can't imagine how to build a utility that is a 100% secure (or
> better to say: I think I will have a great risc of destroying the
> significant sourcecode, when I build something myself....)

> (The pgm/utility needs not to be 'for free')

There are editors that do this. I use UltraEdit and I love it, but it costs
some $s. There may be free ones that use 'regular expressions'.
Actually, Delphi has its own search-and-replace routine with regular expressions.
I never succeeded in using it, but I'm a novice at regex.
--
Bjoerge Saether
Consultant / Developer
http://www.itte.no
Asker, Norway
bjorgeremovet...@itte.no (remove the obvious)

Re:need a utility to remove comment-lines


JRS:  In article <ea850fa7.0106190432.75244...@posting.google.com>, seen
in news:comp.lang.pascal.delphi.misc, Alfons <A...@infobron.nl> wrote at
Tue, 19 Jun 2001 05:32:32 :-

>Does somebody know about a utility to remove comment and comment-lines
>from Delphi (D5+) sources.
>I've thought about building such a thing myself. But 'adding comments
>in delphi-sources' has so much flexibility (i.e. '{','(*','//'..),
>that I can't imagine how to build a utility that is a 100% secure (or
>better to say: I think I will have a great risc of destroying the
>significant sourcecode, when I build something myself....)

I think that it would be fairly easy; my
<URL: http://www.merlyn.demon.co.uk/programs/clean-tp.pas>
recognises comment in Pascal (no // there).

One needs, I believe, to read the file token by token looking for
"openers" such as { (* ' // and on finding an opener to look only for
the corresponding closer.  While doing so, output only if not in
comment.
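The opener/closer scan described above can be sketched in Delphi as follows. This is a minimal sketch under stated assumptions, not the author's clean-tp.pas: the whole unit is held in one string with LF line ends, the closer currently being sought is stored as a string (empty meaning "in code, look for any opener"), a quote counts as an opener whose closer is another quote so strings are copied rather than stripped, brace-dollar compiler directives are left alone, and a comment that spans lines is replaced by one line break, following the one-newline rule given further on in this post. The names At and StripComments are illustrative.

```pascal
program CloserScan;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

const
  EOL = #10;  // assumption: LF line ends, to keep the sketch short

// True if Pat occurs in Src starting at position I
function At(const Src, Pat: string; I: Integer): Boolean;
begin
  Result := Copy(Src, I, Length(Pat)) = Pat;
end;

function StripComments(const Src: string): string;
var
  I: Integer;
  Closer: string;   // closer being sought; '' means "in code, seek any opener"
  SawEol: Boolean;  // the comment being skipped contained a line break
begin
  Result := '';
  Closer := '';
  SawEol := False;
  I := 1;
  while I <= Length(Src) do
    if Closer = '' then
    begin
      if (Src[I] = '{') and not At(Src, '{$', I) then
      begin
        Closer := '}';  // brace comment (compiler directives are left alone)
        Inc(I);
      end
      else if At(Src, '(*', I) then
      begin
        Closer := '*)';
        Inc(I, 2);
      end
      else if At(Src, '//', I) then
      begin
        Closer := EOL;
        Inc(I, 2);
      end
      else
      begin
        if Src[I] = '''' then
          Closer := '''';  // a string opener: the string is copied, not removed
        Result := Result + Src[I];
        Inc(I);
      end;
    end
    else if Closer = '''' then
    begin
      Result := Result + Src[I];  // inside a string: copy through the closing quote
      if Src[I] = '''' then
        Closer := '';
      Inc(I);
    end
    else if At(Src, Closer, I) then
    begin
      // End of comment: a // comment keeps its line break, and a comment
      // that spanned lines is replaced by one line break.
      if (Closer = EOL) or SawEol then
        Result := Result + EOL;
      Inc(I, Length(Closer));
      Closer := '';
      SawEol := False;
    end
    else
    begin
      if Src[I] = EOL then
        SawEol := True;  // inside a comment: output nothing, note line breaks
      Inc(I);
    end;
end;

begin
  WriteLn(StripComments('s := ''//not a comment''; // a comment' + EOL +
    'x := 1; { gone }'));
end.
```

Because a doubled quote simply closes one string and immediately opens the next, text such as 'don''t' passes through unchanged without special handling.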

Beware : *if* the compiler has a line-length limit, valid Delphi such as

   Write('kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk', (*
*) 'hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh', (*
*) 'gggggggggggggggggggggggggggggggggggggggggggggggggggggggggg') ;

might cease to be valid on comment removal.  If comment contains
newlines, output one newline.

--
? John Stockton, Surrey, UK.  j...@merlyn.demon.co.uk   Turnpike v4.00   MIME. ?
 <URL: http://www.merlyn.demon.co.uk/> TP/BP/Delphi/&c., FAQqy topics & links;
 <URL: http://www.merlyn.demon.co.uk/clpb-faq.txt> Pedt Scragg: c.l.p.b. mFAQ;
 <URL: ftp://garbo.uwasa.fi/pc/link/tsfaqp.zip> Timo Salmi's Turbo Pascal FAQ.

Re:need a utility to remove comment-lines


On Tue, 19 Jun 2001 21:31:16 +0100, Dr John Stockton
<s...@merlyn.demon.co.uk> wrote:
>JRS:  In article <ea850fa7.0106190432.75244...@posting.google.com>, seen
>in news:comp.lang.pascal.delphi.misc, Alfons <A...@infobron.nl> wrote at
>Tue, 19 Jun 2001 05:32:32 :-
>>Does somebody know about a utility to remove comment and comment-lines
>>from Delphi (D5+) sources.
>>I've thought about building such a thing myself. But 'adding comments
>>in delphi-sources' has so much flexibility (i.e. '{','(*','//'..),
>>that I can't imagine how to build a utility that is a 100% secure (or
>>better to say: I think I will have a great risc of destroying the
>>significant sourcecode, when I build something myself....)

>I think that it would be fairly easy; my
><URL: http://www.merlyn.demon.co.uk/programs/clean-tp.pas>
>recognises comment in Pascal (no // there).

>One needs, I believe, to read the file token by token looking for
>"openers" such as { (* ' // and on finding an opener to look only for
>the corresponding closer.  While doing so, output only if not in
>comment.

Except it gets a little trickier than that, because you also
have to keep track of whether you're inside a quoted string:

s:= '//there are no comments on this line';

Seems like we could read the file one character at a time,
keeping track of the "state" inCode, inQuote, inComment.
Or it might be simpler to have three states for the
three sorts of comments.

(Hmm, you said "token by token". _If_ we already have a valid
tokenizer then presumably it returns quoted strings as
single tokens already.)

>Beware : *if* the compiler has a line-length limit, valid Delphi such as

>   Write('kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk', (*
>*) 'hhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh', (*
>*) 'gggggggggggggggggggggggggggggggggggggggggggggggggggggggggg') ;

>might cease to be valid on comment removal.  If comment contains
>newlines, output one newline.


David C. Ullrich
*********************
"Sometimes you can have access violations all the
time and the program still works." (Michael Caracena,
comp.lang.pascal.delphi.misc 5/1/01)

Re:need a utility to remove comment-lines


A comment-removal program could most easily be built using a Finite State
Automaton (FSA) algorithm.

An FSA uses two inputs:  { current_state, current_input_character }

to determine two outputs:  { new_state, action_to_perform }

The FSA is initialized to a known starting state.

As you can see, the algorithm immediately lends itself to use of a
two-dimensional table, or nested CASE statements, to determine the
behavior of the algorithm.  
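To make the two-dimensional table idea concrete, here is a small table-driven sketch in Delphi. It is an illustration under simplifying assumptions, not Sundial's design: it handles only quoted strings, brace comments and // comments (no star-paren comments, no directive handling), and the state, character-class and action names are hypothetical. Each step looks up the new state and the action in two arrays indexed by [current state, character class]; a '/' is held back in stSlash until the next character shows whether it starts a // comment.

```pascal
program TableFSA;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  TState = (stCode, stString, stBrace, stSlash, stLine);
  TCharClass = (ccQuote, ccLBrace, ccRBrace, ccSlash, ccEol, ccOther);
  // acSlashEmit: write the held '/' plus this char; acSlashOnly: write only '/'
  TAction = (acEmit, acSkip, acSlashEmit, acSlashOnly);

const
  // columns:       quote     '{'       '}'       '/'       EOL       other
  NextState: array[TState, TCharClass] of TState = (
    { stCode   } (stString, stBrace,  stCode,   stSlash,  stCode,   stCode),
    { stString } (stCode,   stString, stString, stString, stString, stString),
    { stBrace  } (stBrace,  stBrace,  stCode,   stBrace,  stBrace,  stBrace),
    { stSlash  } (stString, stBrace,  stCode,   stLine,   stCode,   stCode),
    { stLine   } (stLine,   stLine,   stLine,   stLine,   stCode,   stLine));
  Action: array[TState, TCharClass] of TAction = (
    { stCode   } (acEmit, acSkip, acEmit, acSkip, acEmit, acEmit),
    { stString } (acEmit, acEmit, acEmit, acEmit, acEmit, acEmit),
    { stBrace  } (acSkip, acSkip, acSkip, acSkip, acSkip, acSkip),
    { stSlash  } (acSlashEmit, acSlashOnly, acSlashEmit, acSkip, acSlashEmit, acSlashEmit),
    { stLine   } (acSkip, acSkip, acSkip, acSkip, acEmit, acSkip));

function Classify(C: Char): TCharClass;
begin
  case C of
    '''': Result := ccQuote;
    '{': Result := ccLBrace;
    '}': Result := ccRBrace;
    '/': Result := ccSlash;
    #10, #13: Result := ccEol;
  else
    Result := ccOther;
  end;
end;

function StripComments(const Src: string): string;
var
  I: Integer;
  State: TState;
  Cls: TCharClass;
begin
  Result := '';
  State := stCode;
  for I := 1 to Length(Src) do
  begin
    Cls := Classify(Src[I]);
    case Action[State, Cls] of
      acEmit: Result := Result + Src[I];
      acSkip: ;  // nothing is written
      acSlashEmit: Result := Result + '/' + Src[I];
      acSlashOnly: Result := Result + '/';
    end;
    State := NextState[State, Cls];
  end;
  if State = stSlash then
    Result := Result + '/';  // a '/' still held at end of input
end;

begin
  WriteLn(StripComments('x := a/b; // halve it'));
end.
```

All the behaviour lives in the two tables, so adding star-paren comments would mean new rows and columns rather than new control flow.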

The only tricky part of designing an FSA is to very carefully work out all
of the state-table information and to thoroughly debug it.  

Compiler-generators, such as LEX and YACC, make heavy use of
state-machines to drive their processing.  They allow you to create
descriptions of the grammar using a formal notation language, and produce
state-tables as their output.  But for an application as simple as this
one, you could do the job yourself.

In article <3b30ba1a.6053...@nntp.sprynet.com>, ullr...@math.okstate.edu wrote:
> On Tue, 19 Jun 2001 21:31:16 +0100, Dr John Stockton
> <s...@merlyn.demon.co.uk> wrote:

> >JRS:  In article <ea850fa7.0106190432.75244...@posting.google.com>, seen
> >in news:comp.lang.pascal.delphi.misc, Alfons <A...@infobron.nl> wrote at
> >Tue, 19 Jun 2001 05:32:32 :-
> >>Does somebody know about a utility to remove comment and comment-lines
> >>from Delphi (D5+) sources.
> >>I've thought about building such a thing myself. But 'adding comments
> >>in delphi-sources' has so much flexibility (i.e. '{','(*','//'..),
> >>that I can't imagine how to build a utility that is a 100% secure (or
> >>better to say: I think I will have a great risc of destroying the
> >>significant sourcecode, when I build something myself....)

> >I think that it would be fairly easy; my
> ><URL: http://www.merlyn.demon.co.uk/programs/clean-tp.pas>
> >recognises comment in Pascal (no // there).

> >One needs, I believe, to read the file token by token looking for
> >"openers" such as { (* ' // and on finding an opener to look only for
> >the corresponding closer.  While doing so, output only if not in
> >comment.

> Except it gets a little trickier than that, because you also
> have to keep track of whether you're inside a quoted string:

> s:= '//there are no comments on this line';

> Seems like we could read the file one character at a time,
> keeping track of the "state" inCode, inQuote, inComment.
> Or it might be simpler to have three states for the
> three sorts of comments.

> (Hmm, you said "token by token". _If_ we already have a valid
> tokenizer then presumably it returns quoted strings as
> single tokens already.)

Re:need a utility to remove comment-lines


"David C. Ullrich" <ullr...@math.okstate.edu> wrote in message

> Seems like we could read the file one character at a time,
> keeping track of the "state" inCode, inQuote, inComment.
> Or it might be simpler to have three states for the
> three sorts of comments.

One needs 1 character look-ahead, but essentially I think a character scan
would be the best way. Actually tokenizing the input is more trouble than
it's worth and would just slow things down. The look-ahead is required to
cope with (*, {$, etc. One problem, of course, is that the source needs to be
free of syntax errors that might mislead the scanner, e.g. unterminated
strings.

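     { Written for Delphi 5. The character variables are declared AnsiChar
       so that one-byte reads also work in Delphi 2009 and later; the unit
       needs Classes (for tStream) in its uses clause. The source is assumed
       to compile: an unterminated string or comment simply runs to the end
       of the file. Directives written in the star-paren form are not
       recognised and will be removed. }
     Procedure RemoveComments (src, dst : tStream);

     type CharSet        = set of AnsiChar;

     const lnTerminators = [#10, #13];

     var  nxtCh          : AnsiChar;
          lnStart,
          scanLth        : integer;
          commentLine,              { a comment was removed from this line }
          allBlank,                 { the output line holds only blanks }
          dropLine,                 { the line just ended is being dropped }
          afterCR        : boolean; { the last character handled was #13 }

          function More : boolean;
          { True while unread source characters remain }
          begin
          result := src.Position < scanLth;
          end;

          procedure UnRead;
          { Step back one character so that the main loop processes it }
          begin
          src.Position := src.Position - 1;
          end;

          function ReadCheck (var aCh : AnsiChar; chk : AnsiChar) : boolean;

          begin
          src.Read (aCh, 1);
          result := aCh = chk;
          end;

          procedure WriteAChar (ch : AnsiChar);

          begin
          if ch in lnTerminators
          then begin
               if not (afterCR and (ch = #10))
               then begin
                    { first character of a line break: drop the line if
                      it held nothing but blanks and removed comment }
                    dropLine := allBlank and commentLine;
                    commentLine := False;
                    allBlank := True;
                    end;
               if dropLine
               then dst.Position := lnStart
               else begin
                    dst.Write (ch, 1);
                    lnStart := dst.Position;
                    end;
               afterCR := ch = #13;
               end
          else begin
               afterCR := False;
               allBlank := allBlank and ((ch = ' ') or (ch = #9));
               dst.Write (ch, 1);
               end;
          end;

          procedure WriteTwo (one, two : AnsiChar);

          begin
          WriteAChar (one);
          WriteAChar (two);
          end;

          procedure ScanToChar (wantChar : CharSet; var aCh : AnsiChar);
          { Skip to just past the next character in wantChar; aCh returns
            that character, or #0 if the end of the source came first }
          begin
          while More do
               begin
               src.Read (aCh, 1);
               if aCh in wantChar then Exit;
               end;
          aCh := #0;
          end;

          procedure OpenBrace;

          var  aCh            : AnsiChar;

          begin
          if not More
          then WriteAChar (nxtCh)
          else if ReadCheck (aCh, '$')
          then WriteTwo (nxtCh, aCh)          { compiler directive: keep it }
          else begin
               if aCh <> '}'                  { an empty comment is already closed }
               then ScanToChar (['}'], aCh);
               commentLine := True;
               end;
          end;

          procedure OpenBracket;

          var  aCh, prev      : AnsiChar;

          begin
          if not More
          then WriteAChar (nxtCh)
          else if not ReadCheck (aCh, '*')
          then begin
               WriteAChar (nxtCh);
               UnRead;                        { let the main loop see aCh }
               end
          else begin
               { skip to the closing star-paren; remembering the previous
                 character means a run of stars before it is handled too }
               prev := #0;
               while More do
                    begin
                    src.Read (aCh, 1);
                    if (prev = '*') and (aCh = ')') then Break;
                    prev := aCh;
                    end;
               commentLine := True;
               end;
          end;

          procedure Slash;

          var  aCh            : AnsiChar;

          begin
          if not More
          then WriteAChar (nxtCh)
          else if not ReadCheck (aCh, '/')
          then begin
               WriteAChar (nxtCh);
               UnRead;                        { let the main loop see aCh }
               end
          else begin
               ScanToChar (lnTerminators, aCh);
               commentLine := True;
               if aCh <> #0
               then WriteAChar (aCh);         { keep the line break }
               end;
          end;

          procedure Quote;

          var  aCh            : AnsiChar;

          begin
          WriteAChar (nxtCh);
          { copy through the closing quote; a doubled quote just ends this
            string, and the main loop starts the next one }
          while More do
               begin
               src.Read (aCh, 1);
               WriteAChar (aCh);
               if aCh = '''' then Break;
               end;
          end;

     begin
     src.Position := 0;
     dst.Position := 0;
     scanLth := src.Size;
     lnStart := 0;
     commentLine := False;
     allBlank := True;
     dropLine := False;
     afterCR := False;
     while More do
          begin
          src.Read (nxtCh, 1);
          case nxtCh of
               '{' : OpenBrace;
               '(' : OpenBracket;
               '/' : Slash;
               '''' : Quote;
               else WriteAChar (nxtCh);
               end;
          end;
     dst.Size := dst.Position;                { discard stale bytes past the end }
     end;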

Re:need a utility to remove comment-lines


In article <info-2006011315580...@slip-32-102-36-7.ca.us.prserv.net>,
i...@sundialservices.com (Sundial Services International, Inc.) wrote:

> A comment-removal program could most easily be built using a Finite State
> Automaton (FSA) algorithm.

> An FSA uses two inputs:  { current_state, current_input_character }

> to determine two outputs:  { new_state, action_to_perform }

To elaborate on my FSA suggestion, here are some possible "states" that
the machine could have:

ST_INITIAL:
The initial state; the scanner is not in any known comment.  The current
position in the file is the beginning of a contiguous block of text that
is "not within a comment."

ST_INITIAL_2:
Same as above, but we're somewhere in the middle of a contiguous block of
text that isn't in a comment, and have not yet reached the end of it.

ST_IN_QUOTED_STRING:
The scanner has seen a single-quote character (while not in a comment) and
is waiting for an ending quote (or an #0 character indicating end-of-file,
see below).

ST_IN_BRACES_COMMENT:
The scanner has seen a '{' character and is looping for the '}'.  The
computer has therefore now reached the end of any contiguous block of
un-commented text (if it was in one previously).

ST_MAYBE_SLASHSLASH_COMMENT:
The scanner has seen the first '/' that could plausibly indicate the start
of a "//"-style comment but doesn't know yet if the next character is also
a "/".  This would mark the end of a block of un-commented text IF the
next character turns out to be another '/'.

ST_IN_SLASHSLASH_COMMENT:
The scanner has indeed seen the second '/' and is now looping for end-of-line.
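The states listed above map naturally onto nested CASE statements. The following Delphi sketch is one possible reading, with hypothetical Pascal names for the ST_ states: ST_INITIAL and ST_INITIAL_2 are merged because the sketch writes one character at a time, and star-paren comments are left out, as in the list. When the character after a '/' turns out not to be another '/', the sketch writes the held '/' and re-examines that character in the initial state, so a quote right after a division sign still opens a string.

```pascal
program NestedCaseFSA;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  TScanState = (stInitial, stInQuotedString, stInBracesComment,
                stMaybeSlashSlashComment, stInSlashSlashComment);

function StripComments(const Src: string): string;
var
  I: Integer;
  C: Char;
  State: TScanState;
begin
  Result := '';
  State := stInitial;
  I := 1;
  while I <= Length(Src) do
  begin
    C := Src[I];
    case State of
      stInitial:
        case C of
          '''':
            begin
              Result := Result + C;
              State := stInQuotedString;
            end;
          '{': State := stInBracesComment;
          '/': State := stMaybeSlashSlashComment;  // hold the '/' for now
        else
          Result := Result + C;
        end;
      stInQuotedString:
        begin
          Result := Result + C;
          if C = '''' then
            State := stInitial;
        end;
      stInBracesComment:
        if C = '}' then
          State := stInitial;
      stMaybeSlashSlashComment:
        if C = '/' then
          State := stInSlashSlashComment
        else
        begin
          // the held '/' was not a comment: write it, then rescan C
          Result := Result + '/';
          State := stInitial;
          Continue;  // I is not advanced, so C is examined again
        end;
      stInSlashSlashComment:
        if (C = #13) or (C = #10) then
        begin
          Result := Result + C;  // keep the line break
          State := stInitial;
        end;
    end;
    Inc(I);
  end;
  if State = stMaybeSlashSlashComment then
    Result := Result + '/';  // a '/' still held at end of input
end;

begin
  WriteLn(StripComments('s := ''a/b''; { note } x := 1 / 2; // done'));
end.
```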

It is often helpful if the FSA's "get next character" routine can
gloss over such details as the fact that end-of-line in MS-DOS is
indicated by two bytes (CR/LF); and it's helpful if the scanner can
indicate end-of-file by returning a null character ($00).  The scanner
should abort, assuming a bug has been found, if it is asked to return too
many $00 characters after reaching end-of-file.
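A "get next character" routine along those lines might look like this. It is a sketch under stated assumptions: the class name TCharSource, the method GetNext and the limit MaxEofReads are hypothetical; CR/LF, a lone CR and a lone LF are all returned as a single #10; and a genuine #0 inside the source would be indistinguishable from end-of-file.

```pascal
program CharSourceSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

uses
  SysUtils;

type
  // Hypothetical "get next character" helper for an FSA scanner.
  TCharSource = class
  private
    FText: string;
    FPos: Integer;
    FEofReads: Integer;  // how many #0 results have been handed out
  public
    constructor Create(const AText: string);
    function GetNext: Char;
  end;

const
  MaxEofReads = 8;  // assumption: more reads than this past EOF means a bug

constructor TCharSource.Create(const AText: string);
begin
  inherited Create;
  FText := AText;
  FPos := 1;
  FEofReads := 0;
end;

// Returns the next character. CR/LF, a lone CR and a lone LF all come back
// as a single #10; end-of-file comes back as #0.
function TCharSource.GetNext: Char;
begin
  if FPos > Length(FText) then
  begin
    Inc(FEofReads);
    if FEofReads > MaxEofReads then
      raise Exception.Create('Scanner kept reading past end of file: probable bug');
    Result := #0;
    Exit;
  end;
  Result := FText[FPos];
  Inc(FPos);
  if Result = #13 then
  begin
    if (FPos <= Length(FText)) and (FText[FPos] = #10) then
      Inc(FPos);  // swallow the LF of a CR/LF pair
    Result := #10;
  end;
end;

// Collects everything up to end-of-file, to show the folding at work.
function ReadAll(const AText: string): string;
var
  Src: TCharSource;
  C: Char;
begin
  Result := '';
  Src := TCharSource.Create(AText);
  try
    C := Src.GetNext;
    while C <> #0 do
    begin
      Result := Result + C;
      C := Src.GetNext;
    end;
  finally
    Src.Free;
  end;
end;

begin
  WriteLn(Length(ReadAll('a'#13#10'b'#13'c')));  // prints 5
end.
```

With this in place the states only ever test for #10 and #0, whatever line-end convention the file uses.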

My cryptic comments about "a contiguous block of un-commented text"
reflect my idea that it would be more efficient if the computer could
write out an entire range of bytes to the output all at once.  But then
again, maybe this is not such a good idea; maybe it doesn't matter; maybe
it adds more complexity and debugging-time than it could ever pay back in
speed.  

An FSA algorithm contributes a lot of speed just by being simple.  After
writing the last paragraph I briefly considered going back and ripping out all
references to the "contiguous block" idea, but then decided to leave it
in because it does reflect the extemporaneous thought-process that went
through my head while writing this.  

> The FSA is initialized to a known starting state.

> As you can see, the algorithm immediately lends itself to use of a
> two-dimensional table, or nested CASE statements, to determine the
> behavior of the algorithm.  

> The only tricky part of designing an FSA is to very carefully work out all
> of the state-table information and to thoroughly debug it.  

> Compiler-generators, such as LEX and YACC, make heavy use of
> state-machines to drive their processing.  They allow you to create
> descriptions of the grammar using a formal notation language, and produce
> state-tables as their output.  But for an application as simple as this
> one, you could do the job yourself.

Re:need a utility to remove comment-lines


JRS:  In article <3b30ba1a.6053...@nntp.sprynet.com>, seen in news:comp.
lang.pascal.delphi.misc, David C. Ullrich <ullr...@math.okstate.edu>
wrote at Wed, 20 Jun 2001 15:04:07 :-

>On Tue, 19 Jun 2001 21:31:16 +0100, Dr John Stockton
><s...@merlyn.demon.co.uk> wrote:

>>JRS:  In article <ea850fa7.0106190432.75244...@posting.google.com>, seen
>>in news:comp.lang.pascal.delphi.misc, Alfons <A...@infobron.nl> wrote at
>>Tue, 19 Jun 2001 05:32:32 :-
>>>Does somebody know about a utility to remove comment and comment-lines
>>>from Delphi (D5+) sources.
>>>I've thought about building such a thing myself. But 'adding comments
>>>in delphi-sources' has so much flexibility (i.e. '{','(*','//'..),
>>>that I can't imagine how to build a utility that is a 100% secure (or
>>>better to say: I think I will have a great risc of destroying the
>>>significant sourcecode, when I build something myself....)

>>I think that it would be fairly easy; my
>><URL: http://www.merlyn.demon.co.uk/programs/clean-tp.pas>
>>recognises comment in Pascal (no // there).

>>One needs, I believe, to read the file token by token looking for
>>"openers" such as { (* ' // and on finding an opener to look only for
>>the corresponding closer.  While doing so, output only if not in
>>comment.

>Except it gets a little trickier than that, because you also
>have to keep track of whether you're inside a quoted string:

>s:= '//there are no comments on this line';

No, my words allow for that.  On finding the opener, "'", one looks only
for the corresponding closer, which in this case is "'".  For the
present purpose, the special case of "''" within a string can be
disregarded.

>Seems like we could read the file one character at a time,
>keeping track of the "state" inCode, inQuote, inComment.
>Or it might be simpler to have three states for the
>three sorts of comments.

One IMHO *must* distinguish the three types of comments (fortunately,
this isn't Algol 60), and one can do so by storing the closer currently
being sought - and if no closer is being sought, one is seeking any
opener.

>(Hmm, you said "token by token". _If_ we already have a valid
>tokenizer then presumably it returns quoted strings as
>single tokens already.)

Token is perhaps the wrong word, then - one considers one item at a
time, where an item is one of the set [ "'", "{", "}", "(*", "*)", "//",
newline, anything-else ], so to speak.  I don't recall whether Delphi
allows any relevant multi-character substitutions, as Pascal did.

I have used the aforementioned clean-tp.pas, which finds comment and
strings that way, for several years - it's a tool in my BP7 Tools menu,
and I apply it frequently when writing any Pascal program to do the line
indentation (for which it must ignore begin end etc., when in
comment/string).  It certainly accommodates all that I write (and, in
development, it was very clear that errors become conspicuous by their
effects).

But I've not upgraded it to Delphi.


Re:need a utility to remove comment-lines


I must ask; Why do you wish to remove comment lines?

It won't make your EXEs smaller, as I've heard misguided people claim in
the past, because comments are discarded during compilation anyway.

Surely it would be better to leave your source code commented?

Re:need a utility to remove comment-lines


Quite - I've been thinking the same thing

On Thu, 21 Jun 2001 14:36:45 +1200, Gurble
<gurbleREM...@THISclear.net.nz> wrote:
>I must ask; Why do you wish to remove comment lines?

>It wont make your EXEs smaller, as I've heard misguided people say in
>the past, as they are removed during compilation, anyway.

>Surely it would be better to leave your source code commented?

Re:need a utility to remove comment-lines


On Wed, 20 Jun 2001 13:15:58 -0400, i...@sundialservices.com (Sundial
Services International, Inc.) wrote:
>A comment-removal program could most easily be built using a Finite State
>Automaton (FSA) algorithm.

Right. That's a _much_ better idea than my suggestion

"Seems like we could read the file one character at a time,
keeping track of the "state" inCode, inQuote, inComment."

(heh-heh.)

>An FSA uses two inputs:  { current_state, current_input_character }

>to determine two outputs:  { new_state, action_to_perform }

>The FSA is initialized to a known starting state.

>As you can see, the algorithm immediately lends itself to use of a
>two-dimensional table, or nested CASE statements, to determine the
>behavior of the algorithm.  

>The only tricky part of designing an FSA is to very carefully work out all
>of the state-table information and to thoroughly debug it.  

I've actually written plenty of finite-state machines to do text
processing. It seemed like a pain the first time, but I was
astonished at how well the thing worked. Nested case statements
are exactly what I've tended to use for those.

The other day I made a much more OOP and extremely general
Delphi finite-state machine. There's a TState class, which
you subclass, one subclass for each state in the machine.
The machine has an Execute method. You pass some text and
an InitialState to Execute. The machine calls the state's
Transition method in a loop; the Transition method
reads a character, does whatever needs to be done on the
basis of that character, then returns another state. The
machine then calls that state's Transition, and continues until
one of the Transition calls returns HaltState.

No doubt far from the best possible performance, but things
are nicely encapsulated; that nested case statement
gets broken into one case statement inside each state's
Transition method. I haven't had a chance to give it a
non-trivial test, but the single-state test, converting
"text" to "tteexxtt", worked just fine.

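A minimal version of the class layout described above might look like this. The TState, Transition and Execute names come from the post; everything else (TMachine, the GetChar and UnGetChar signatures, the OutText field, and using nil in place of a HaltState object) is an assumption made for the sketch. The single test state, TDoubler, writes each character twice and returns itself until the input runs out.

```pascal
program StateMachineSketch;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}
{$ASSERTIONS ON}

type
  TMachine = class;

  // One subclass per state in the machine.
  TState = class
  public
    // Reads from M, writes to M.OutText, and returns the next state;
    // nil plays the part of HaltState in this sketch.
    function Transition(M: TMachine): TState; virtual; abstract;
  end;

  TMachine = class
  private
    FInput: string;
    FPos: Integer;
  public
    OutText: string;
    function GetChar(out C: Char): Boolean;  // False at end of input
    procedure UnGetChar;                     // one step back, for lookahead
    procedure Execute(const AText: string; InitialState: TState);
  end;

  // The single-state test: every character is written twice.
  TDoubler = class(TState)
  public
    function Transition(M: TMachine): TState; override;
  end;

function TMachine.GetChar(out C: Char): Boolean;
begin
  Result := FPos <= Length(FInput);
  if Result then
  begin
    C := FInput[FPos];
    Inc(FPos);
  end
  else
    C := #0;
end;

procedure TMachine.UnGetChar;
begin
  if FPos > 1 then
    Dec(FPos);
end;

procedure TMachine.Execute(const AText: string; InitialState: TState);
var
  State: TState;
begin
  FInput := AText;
  FPos := 1;
  OutText := '';
  State := InitialState;
  while State <> nil do
    State := State.Transition(Self);
end;

function TDoubler.Transition(M: TMachine): TState;
var
  C: Char;
begin
  if M.GetChar(C) then
  begin
    M.OutText := M.OutText + C + C;
    Result := Self;  // still in this state, so the machine calls us again
  end
  else
    Result := nil;   // input exhausted: halt
end;

function DoubleText(const S: string): string;
var
  M: TMachine;
  D: TDoubler;
begin
  M := TMachine.Create;
  D := TDoubler.Create;
  try
    M.Execute(S, D);
    Result := M.OutText;
  finally
    D.Free;
    M.Free;
  end;
end;

begin
  WriteLn(DoubleText('text'));  // prints tteexxtt
end.
```

Keeping the state objects outside the machine means one instance per state can be reused across calls to Execute; a comment stripper would add one TState subclass per kind of comment.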
>Compiler-generators, such as LEX and YACC, make heavy use of
>state-machines to drive their processing.  They allow you to create
>descriptions of the grammar using a formal notation language, and produce
>state-tables as their output.  But for an application as simple as this
>one, you could do the job yourself.

David C. Ullrich

Re:need a utility to remove comment-lines


On Wed, 20 Jun 2001 15:39:12 -0400, i...@sundialservices.com (Sundial
Services International, Inc.) wrote:
>In article <info-2006011315580...@slip-32-102-36-7.ca.us.prserv.net>,
>i...@sundialservices.com (Sundial Services International, Inc.) wrote:

>> A comment-removal program could most easily be built using a Finite State
>> Automaton (FSA) algorithm.

>> An FSA uses two inputs:  { current_state, current_input_character }

>> to determine two outputs:  { new_state, action_to_perform }

>To elaborate on my FSA suggestion, here are some possible "states" that
>the machine could have:

>ST_INITIAL:
>The initial state; the scanner is not in any known comment.  The current
>position in the file is the beginning of a contiguous block of text that
>is "not within a comment."

>ST_INITIAL_2:
>Same as above, but we're somewhere in the middle of a contiguous block of
>text that isn't in a comment, and have not yet reached the end of it.

>ST_IN_QUOTED_STRING:
>The scanner has seen a single-quote character (while not in a comment) and
>is waiting for an ending quote (or an #0 character indicating end-of-file,
>see below).

>ST_IN_BRACES_COMMENT:
>The scanner has seen a '{' character and is looping for the '}'.  The
>computer has therefore now reached the end of any contiguous block of
>un-commented text (if it was in one previously).

>ST_MAYBE_SLASHSLASH_COMMENT:
>The scanner has seen the first '/' that could plausibly indicate the start
>of a "//"-style comment but doesn't know yet if the next character is also
>a "/".  This would mark the end of a block of un-commented text IF the
>next character turns out to be another '/'.

>ST_IN_SLASHSLASH_COMMENT:
>The scanner has indeed seen the second '/' and is now looping for end-of-line.

Again - that's a _much_ better idea than the collection of states
I suggested yesterday..


>It is often helpful if the FSA's "get next character" routine can
>gloss-over such details as the fact that end-of-line in MS-DOS is
>indicated by two bytes (CR/LF); and it's helpful if the scanner can
>indicate end-of-file by returning a null character ($00).  The scanner
>should abort, assuming a bug has been found, if it is asked to return too
>many $00 characters after reaching end-of-file.

>My cryptic comments about "a contiguous block of un-commented text"
>reflect my idea that it would be more efficient if the computer could
>write out an entire range of bytes to the output all at once.  But then
>again, maybe this is not such a good idea; maybe it doesn't matter; maybe
>it adds more complexity and debugging-time than it could ever pay back in
>speed.  

>An FSA algorithm contributes a lot of speed just by being simple.  After
>writing the last paragraph I briefly mused going back and ripping out all
>references to the "contiguous block" idea .. but then decided to leave it
>in because it does reflect the extemporaneous thought-process that went
>through my head while writing this.  

Yup. In the thing I did the rules are much less rigid: a state
_can_ read as much as it wants (it can also UnGetChar if it
wants, for lookahead), and it _can_ do whatever it wants to
the output stream.

So for example the states I have in mind for the comment
stripper _could_ be made more efficient by having a single
call to Transition loop and look for the end of the current
comment. But they're not going to: that seems more complicated,
and the speedup should not be that great. Instead they're
going to determine that they're still inside a comment
and return themselves to the machine, so they get called again.

David C. Ullrich

Re:need a utility to remove comment-lines


On Wed, 20 Jun 2001 15:02:35 -0400, "Bruce Roberts"
<b...@bounceitattcanada.xnet> wrote:

>"David C. Ullrich" <ullr...@math.okstate.edu> wrote in message

>> Seems like we could read the file one character at a time,
>> keeping track of the "state" inCode, inQuote, inComment.
>> Or it might be simpler to have three states for the
>> three sorts of comments.

>One needs 1 character look-ahead, but essentially I think a character scan
>would be the best way.

You need one-character lookahead to do it the way I have in mind.
You _can_ do it with _no_ lookahead, with a larger number of states:
If the state is InCode and we read a '/' character then instead
of looking ahead to see whether it's the start of a comment we
could just make a transition to the MaybeSlashComment state.

>Actually tokenizing the input is more trouble than
>its worth and would just slow things down.

Unless we already have a tokenizer...

> The look-ahead is required to
>cope with (*, {$, etc. One problem of course is that the source needs to be
>free of syntactic errors that might mislead the scanner, e.g. un-terminated
>strings.

>     Procedure RemoveComments (src, dst : tStream);

[example snipped]

Now I have to work out my version...

David C. Ullrich

Re:need a utility to remove comment-lines


On Wed, 20 Jun 2001 23:19:59 +0100, Dr John Stockton
<s...@merlyn.demon.co.uk> wrote:
>JRS:  In article <3b30ba1a.6053...@nntp.sprynet.com>, seen in news:comp.
>lang.pascal.delphi.misc, David C. Ullrich <ullr...@math.okstate.edu>
>wrote at Wed, 20 Jun 2001 15:04:07 :-
>>On Tue, 19 Jun 2001 21:31:16 +0100, Dr John Stockton
>><s...@merlyn.demon.co.uk> wrote:

>>>JRS:  In article <ea850fa7.0106190432.75244...@posting.google.com>, seen
>>>in news:comp.lang.pascal.delphi.misc, Alfons <A...@infobron.nl> wrote at
>>>Tue, 19 Jun 2001 05:32:32 :-
>>>>Does somebody know about a utility to remove comment and comment-lines
>>>>from Delphi (D5+) sources.
>>>>I've thought about building such a thing myself. But 'adding comments
>>>>in delphi-sources' has so much flexibility (i.e. '{','(*','//'..),
>>>>that I can't imagine how to build a utility that is a 100% secure (or
>>>>better to say: I think I will have a great risc of destroying the
>>>>significant sourcecode, when I build something myself....)

>>>I think that it would be fairly easy; my
>>><URL: http://www.merlyn.demon.co.uk/programs/clean-tp.pas>
>>>recognises comment in Pascal (no // there).

>>>One needs, I believe, to read the file token by token looking for
>>>"openers" such as { (* ' // and on finding an opener to look only for
>>>the corresponding closer.  While doing so, output only if not in
>>>comment.

>>Except it gets a little trickier than that, because you also
>>have to keep track of whether you're inside a quoted string:

>>s:= '//there are no comments on this line';

>No, my words allow for that.  On finding the opener, "'", one looks only
>for the corresponding closer, which in this case is "'".

right - didn't realize you were using "opener" to refer to anything
but comments.

>  For the
>present purpose, the special case of "''" within a string can be
>disregarded.

s:= 'I don''t see why you say that - this is valid code...'

>>Seems like we could read the file one character at a time,
>>keeping track of the "state" inCode, inQuote, inComment.
>>Or it might be simpler to have three states for the
>>three sorts of comments.

>One IMHO *must* distinguish the three types of comments (fortunately,
>this isn't Algol 60), and one can do so by storing the closer currently
>being sought - and if no closer is being sought, one is seeking any
>opener.

Oh, of course we need to distinguish the three sorts of comments.
I was just musing on whether each should have its own state
or the InComment state should "remember" which sort of comment
it is. (For the thing I've thought of, each will have a separate
state; that seems simpler when I think about it, because some comments
require lookahead to find when we've reached the end and some
do not.)


>>(Hmm, you said "token by token". _If_ we already have a valid
>>tokenizer then presumably it returns quoted strings as
>>single tokens already.)

>Token is perhaps the wrong word, then - one considers one item at a
>time, where an item is one of the set [ "'", "{", "}", "(*", "*)", "//",
>newline, anything-else ], so to speak.  I don't recall whether Delphi
>allows any relevant multi-character substitutions, as Pascal did.

>I have used the aforementioned clean-tp.pas, which finds comment and
>strings that way, for several years - it's a tool in my BP7 Tools menu,
>and I apply it frequently when writing any Pascal program to do the line
>indentation (for which it must ignore begin end etc., when in
>comment/string).  It certainly accommodates all that I write (and, in
>development, it was very clear that errors become conspicuous by their
>effects).

I'm certain it works just fine - was just commenting on the
fact that you seemed to be ignoring quoted strings (and the
only reason I thought that was that I wasn't paying attention.)

>But I've not upgraded it to Delphi.


David C. Ullrich

Re:need a utility to remove comment-lines


On Thu, 21 Jun 2001 14:36:45 +1200, Gurble
<gurbleREM...@THISclear.net.nz> wrote:
>I must ask; Why do you wish to remove comment lines?

>It wont make your EXEs smaller, as I've heard misguided people say in
>the past, as they are removed during compilation, anyway.

>Surely it would be better to leave your source code commented?

What would be really nice would be an editor that hides
comments on request. The point being that when I
comment things carefully I find the code looks more
like a manual than like code, with maybe five lines
of code spread through 20 lines of comments. You
need the comments to explain what the code does,
but they can reduce "readability" in _one_ sense.

David C. Ullrich
