Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.compilers > #2395

Re: Reachability of DFA part

From Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Newsgroups comp.compilers
Subject Re: Reachability of DFA part
Date 2019-12-21 19:58 +0100
Organization Compilers Central
Message-ID <19-12-014@comp.compilers> (permalink)
References <19-12-008@comp.compilers> <19-12-009@comp.compilers> <19-12-012@comp.compilers>

Show all headers | View raw


Am 21.12.2019 um 10:15 schrieb Andy:
> W dniu piątek, 20 grudnia 2019 17:41:04 UTC+1 użytkownik Kaz Kylheku
> napisał:

> Machine finally accept all strings begin from "ab" but "ba" will unused.
> This is similar to definition of comment: in Pascal. comment begin at { and
> end of }, careless definition is {*} which mark as comment to rest of file.

Pascal is a too general term, with no special implementation implied.

A Pascal lexer typically is handcrafted or generated by CoCo/r, not with
lex/yacc or regular expressions.


> Good definition would be {[^}]*}
> Complexity of problem increases when comment ends with string len >1, for
> example C: */ or Pascal *)

Digraphs are a problem with many old languages, including C. They are
intended as direct replacements for other characters, conversion
performed before or in tokenization.


> if we renaming : /->a *->b other->c
> then bad definition will ab(a|b|b)*ba and good definition is complicated:
> ab(b|(a|c)*b*)*a (if I not make mistake)
>
> Commments should maybe be defined in other way, especially comments can be
> nested in Object Pascal. Comment nesting can using stack or simply counter.

> I see, in Pascal is using counter. Difference: Pascal has two types of multiline
> comments { } and (* *)

For digraphs see above. Nested comments are problematic only with single
pass lexer generators. With multiple stages for character substitution,
whitespace etc. no problems are known with Pascal tokens.


> If we use stack, closing comment type must be equal last open comment type,
> for counter - only count comments of type first opening, example
> { { (* } *) }

This again is implementation specific.


If you mean scannerless parsers with embedded regular expressions, they
have several problems with traditional languages. Traditional
tokenization has minor known problems with whitespace (including
comments), numbers and identifiers, which have been discussed and solved
since long. All other tokens are literals which deserve no sophisticated
lexer. In Pascal/Delphi even the dot tokens '.', '..', '...' don't cause
problems, unlike C where '..' is missing.


IMO a language should be constructed for easy compilation, with simple
terminal definitions for handcrafted lexers, or with a fully specified
conflict free formal token grammar, the latter type either with a
separate or embedded lexer. The first type also is human readable, while
the second type tends to result in write-only languages or describes
non-verbal grammars like for DNA. What's a formal token definition worth
if it cannot be proofed error free?

DoDi

Back to comp.compilers | Previous | NextPrevious in thread | Find similar


Thread

Reachability of DFA part Andy <borucki.andrzej@gmail.com> - 2019-12-20 04:53 -0800
  Re: Reachability of DFA part Kaz Kylheku <493-878-3164@kylheku.com> - 2019-12-20 16:06 +0000
    Re: Reachability of DFA part Andy <borucki.andrzej@gmail.com> - 2019-12-21 01:15 -0800
      Re: Reachability of DFA part Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-12-21 19:58 +0100

csiph-web