Groups > comp.compilers > #2389 > unrolled thread

Reachability of DFA part

Started by	Andy <borucki.andrzej@gmail.com>
First post	2019-12-20 04:53 -0800
Last post	2019-12-21 19:58 +0100
Articles	4 — 3 participants

Back to article view | Back to comp.compilers

  Reachability of DFA part Andy <borucki.andrzej@gmail.com> - 2019-12-20 04:53 -0800
    Re: Reachability of DFA part Kaz Kylheku <493-878-3164@kylheku.com> - 2019-12-20 16:06 +0000
      Re: Reachability of DFA part Andy <borucki.andrzej@gmail.com> - 2019-12-21 01:15 -0800
        Re: Reachability of DFA part Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-12-21 19:58 +0100

#2389 — Reachability of DFA part

From	Andy <borucki.andrzej@gmail.com>
Date	2019-12-20 04:53 -0800
Subject	Reachability of DFA part
Message-ID	<19-12-008@comp.compilers>

Sometmimes part of DFA can be unreached: ab(a|b)*ba - ba is unreachable.
Solution is check back transitions - must be >0 back transitions from each
state (and >0 transitions from each non-finished state)? Or are better methods
- if can exists cases, if all backtransitions count > 0 and still
unreachable?

[toc] | [next] | [standalone]

#2390

From	Kaz Kylheku <493-878-3164@kylheku.com>
Date	2019-12-20 16:06 +0000
Message-ID	<19-12-009@comp.compilers>
In reply to	#2389

On 2019-12-20, Andy <borucki.andrzej@gmail.com> wrote:
> Sometmimes part of DFA can be unreached: ab(a|b)*ba - ba is unreachable.

That doesn't appear to be so; of course, the regular expression demands
a match on the trailing "ba".

In the NFA automaton, every time the prefix of the input given so far
ends in ba, the machine will be in an acceptance state corresponding
to that trailing ba.

Possible NFA graph:

                      .b -. .- b -> (3) -> a -> ((4))
                      |    |
                      v    ,
   (0) -> a (1) -> b (2) --.
                      ^    |
                      |    |
                      ` a -'

In the DFA, there will be a reachable state (subset of NFA states) which
contains ((4)).

--
TXR Programming Lanuage: http://nongnu.org/txr
Music DIY Mailing List:  http://www.kylheku.com/diy
ADA MP-1 Mailing List:   http://www.kylheku.com/mp1

[toc] | [prev] | [next] | [standalone]

#2393

From	Andy <borucki.andrzej@gmail.com>
Date	2019-12-21 01:15 -0800
Message-ID	<19-12-012@comp.compilers>
In reply to	#2390

W dniu piątek, 20 grudnia 2019 17:41:04 UTC+1 użytkownik Kaz Kylheku
napisał:
> In the NFA automaton, every time the prefix of the input given so far
> ends in ba, the machine will be in an acceptance state corresponding
> to that trailing ba.

Machine finally accept all strings begin from "ab" but "ba" will unused.
This is similar to definition of comment: in Pascal. comment begin at { and
end of }, careless definition is {*} which mark as comment to rest of file.

Good definition would be {[^}]*}
Complexity of problem increases when comment ends with string len >1, for
example C: */ or Pascal *)
if we renaming : /->a *->b other->c
then bad definition will ab(a|b|b)*ba and good definition is complicated:
ab(b|(a|c)*b*)*a (if I not make mistake)

Commments should maybe be defined in other way, especially comments can be
nested in Object Pascal. Comment nesting can using stack or simply counter. I
see, in Pascal is using counter. Difference: Pascal has two types of multiline
comments { } and (* *)

If we use stack, closing comment type must be equal last open comment type,
for counter - only count comments of type first opening, example
{ { (* } *) }

[toc] | [prev] | [next] | [standalone]

#2395

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2019-12-21 19:58 +0100
Message-ID	<19-12-014@comp.compilers>
In reply to	#2393

Am 21.12.2019 um 10:15 schrieb Andy:
> W dniu piątek, 20 grudnia 2019 17:41:04 UTC+1 użytkownik Kaz Kylheku
> napisał:

> Machine finally accept all strings begin from "ab" but "ba" will unused.
> This is similar to definition of comment: in Pascal. comment begin at { and
> end of }, careless definition is {*} which mark as comment to rest of file.

Pascal is a too general term, with no special implementation implied.

A Pascal lexer typically is handcrafted or generated by CoCo/r, not with
lex/yacc or regular expressions.


> Good definition would be {[^}]*}
> Complexity of problem increases when comment ends with string len >1, for
> example C: */ or Pascal *)

Digraphs are a problem with many old languages, including C. They are
intended as direct replacements for other characters, conversion
performed before or in tokenization.


> if we renaming : /->a *->b other->c
> then bad definition will ab(a|b|b)*ba and good definition is complicated:
> ab(b|(a|c)*b*)*a (if I not make mistake)
>
> Commments should maybe be defined in other way, especially comments can be
> nested in Object Pascal. Comment nesting can using stack or simply counter.

> I see, in Pascal is using counter. Difference: Pascal has two types of multiline
> comments { } and (* *)

For digraphs see above. Nested comments are problematic only with single
pass lexer generators. With multiple stages for character substitution,
whitespace etc. no problems are known with Pascal tokens.


> If we use stack, closing comment type must be equal last open comment type,
> for counter - only count comments of type first opening, example
> { { (* } *) }

This again is implementation specific.


If you mean scannerless parsers with embedded regular expressions, they
have several problems with traditional languages. Traditional
tokenization has minor known problems with whitespace (including
comments), numbers and identifiers, which have been discussed and solved
since long. All other tokens are literals which deserve no sophisticated
lexer. In Pascal/Delphi even the dot tokens '.', '..', '...' don't cause
problems, unlike C where '..' is missing.


IMO a language should be constructed for easy compilation, with simple
terminal definitions for handcrafted lexers, or with a fully specified
conflict free formal token grammar, the latter type either with a
separate or embedded lexer. The first type also is human readable, while
the second type tends to result in write-only languages or describes
non-verbal grammars like for DNA. What's a formal token definition worth
if it cannot be proofed error free?

DoDi

[toc] | [prev] | [standalone]

csiph-web

Reachability of DFA part

Contents

#2389 — Reachability of DFA part

#2390

#2393

#2395