Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2389 > unrolled thread
| Started by | Andy <borucki.andrzej@gmail.com> |
|---|---|
| First post | 2019-12-20 04:53 -0800 |
| Last post | 2019-12-21 19:58 +0100 |
| Articles | 4 — 3 participants |
Back to article view | Back to comp.compilers
Reachability of DFA part Andy <borucki.andrzej@gmail.com> - 2019-12-20 04:53 -0800
Re: Reachability of DFA part Kaz Kylheku <493-878-3164@kylheku.com> - 2019-12-20 16:06 +0000
Re: Reachability of DFA part Andy <borucki.andrzej@gmail.com> - 2019-12-21 01:15 -0800
Re: Reachability of DFA part Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2019-12-21 19:58 +0100
| From | Andy <borucki.andrzej@gmail.com> |
|---|---|
| Date | 2019-12-20 04:53 -0800 |
| Subject | Reachability of DFA part |
| Message-ID | <19-12-008@comp.compilers> |
Sometmimes part of DFA can be unreached: ab(a|b)*ba - ba is unreachable. Solution is check back transitions - must be >0 back transitions from each state (and >0 transitions from each non-finished state)? Or are better methods - if can exists cases, if all backtransitions count > 0 and still unreachable?
[toc] | [next] | [standalone]
| From | Kaz Kylheku <493-878-3164@kylheku.com> |
|---|---|
| Date | 2019-12-20 16:06 +0000 |
| Message-ID | <19-12-009@comp.compilers> |
| In reply to | #2389 |
On 2019-12-20, Andy <borucki.andrzej@gmail.com> wrote:
> Sometmimes part of DFA can be unreached: ab(a|b)*ba - ba is unreachable.
That doesn't appear to be so; of course, the regular expression demands
a match on the trailing "ba".
In the NFA automaton, every time the prefix of the input given so far
ends in ba, the machine will be in an acceptance state corresponding
to that trailing ba.
Possible NFA graph:
.b -. .- b -> (3) -> a -> ((4))
| |
v ,
(0) -> a (1) -> b (2) --.
^ |
| |
` a -'
In the DFA, there will be a reachable state (subset of NFA states) which
contains ((4)).
--
TXR Programming Lanuage: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
[toc] | [prev] | [next] | [standalone]
| From | Andy <borucki.andrzej@gmail.com> |
|---|---|
| Date | 2019-12-21 01:15 -0800 |
| Message-ID | <19-12-012@comp.compilers> |
| In reply to | #2390 |
W dniu piątek, 20 grudnia 2019 17:41:04 UTC+1 użytkownik Kaz Kylheku
napisał:
> In the NFA automaton, every time the prefix of the input given so far
> ends in ba, the machine will be in an acceptance state corresponding
> to that trailing ba.
Machine finally accept all strings begin from "ab" but "ba" will unused.
This is similar to definition of comment: in Pascal. comment begin at { and
end of }, careless definition is {*} which mark as comment to rest of file.
Good definition would be {[^}]*}
Complexity of problem increases when comment ends with string len >1, for
example C: */ or Pascal *)
if we renaming : /->a *->b other->c
then bad definition will ab(a|b|b)*ba and good definition is complicated:
ab(b|(a|c)*b*)*a (if I not make mistake)
Commments should maybe be defined in other way, especially comments can be
nested in Object Pascal. Comment nesting can using stack or simply counter. I
see, in Pascal is using counter. Difference: Pascal has two types of multiline
comments { } and (* *)
If we use stack, closing comment type must be equal last open comment type,
for counter - only count comments of type first opening, example
{ { (* } *) }
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2019-12-21 19:58 +0100 |
| Message-ID | <19-12-014@comp.compilers> |
| In reply to | #2393 |
Am 21.12.2019 um 10:15 schrieb Andy:
> W dniu piątek, 20 grudnia 2019 17:41:04 UTC+1 użytkownik Kaz Kylheku
> napisał:
> Machine finally accept all strings begin from "ab" but "ba" will unused.
> This is similar to definition of comment: in Pascal. comment begin at { and
> end of }, careless definition is {*} which mark as comment to rest of file.
Pascal is a too general term, with no special implementation implied.
A Pascal lexer typically is handcrafted or generated by CoCo/r, not with
lex/yacc or regular expressions.
> Good definition would be {[^}]*}
> Complexity of problem increases when comment ends with string len >1, for
> example C: */ or Pascal *)
Digraphs are a problem with many old languages, including C. They are
intended as direct replacements for other characters, conversion
performed before or in tokenization.
> if we renaming : /->a *->b other->c
> then bad definition will ab(a|b|b)*ba and good definition is complicated:
> ab(b|(a|c)*b*)*a (if I not make mistake)
>
> Commments should maybe be defined in other way, especially comments can be
> nested in Object Pascal. Comment nesting can using stack or simply counter.
> I see, in Pascal is using counter. Difference: Pascal has two types of multiline
> comments { } and (* *)
For digraphs see above. Nested comments are problematic only with single
pass lexer generators. With multiple stages for character substitution,
whitespace etc. no problems are known with Pascal tokens.
> If we use stack, closing comment type must be equal last open comment type,
> for counter - only count comments of type first opening, example
> { { (* } *) }
This again is implementation specific.
If you mean scannerless parsers with embedded regular expressions, they
have several problems with traditional languages. Traditional
tokenization has minor known problems with whitespace (including
comments), numbers and identifiers, which have been discussed and solved
since long. All other tokens are literals which deserve no sophisticated
lexer. In Pascal/Delphi even the dot tokens '.', '..', '...' don't cause
problems, unlike C where '..' is missing.
IMO a language should be constructed for easy compilation, with simple
terminal definitions for handcrafted lexers, or with a fully specified
conflict free formal token grammar, the latter type either with a
separate or embedded lexer. The first type also is human readable, while
the second type tends to result in write-only languages or describes
non-verbal grammars like for DNA. What's a formal token definition worth
if it cannot be proofed error free?
DoDi
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web