Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3018 > unrolled thread
| Started by | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| First post | 2022-05-22 12:32 +0300 |
| Last post | 2022-05-28 04:51 +0000 |
| Articles | 6 — 4 participants |
Back to article view | Back to comp.compilers
Who first invented dotted-item notation? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-05-22 12:32 +0300
Re: Who first invented dotted-item notation? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-05-22 18:29 +0000
RE: Who first invented dotted-item notation? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-05-23 13:23 +0300
Re: Who first invented dotted-item notation? gah4 <gah4@u.washington.edu> - 2022-05-23 18:31 -0700
RE: Who first invented dotted-item notation? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-05-24 11:27 +0300
Re: Who first invented dotted-item notation? antispam@math.uni.wroc.pl - 2022-05-28 04:51 +0000
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2022-05-22 12:32 +0300 |
| Subject | Who first invented dotted-item notation? |
| Message-ID | <22-05-046@comp.compilers> |
I know the notation from LR(0) machine construction, but also know that Gluskhov used it in his solution to NFA construction. Earley also used the notation to describe his method if I understand right. I'm presuming that there is some first use of the notation. Do we know who invented it? Thanks in advance, Chris -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
[toc] | [next] | [standalone]
| From | Kaz Kylheku <480-992-1380@kylheku.com> |
|---|---|
| Date | 2022-05-22 18:29 +0000 |
| Message-ID | <22-05-047@comp.compilers> |
| In reply to | #3018 |
On 2022-05-22, Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
> I know the notation from LR(0) machine construction, but also know
> that Gluskhov used it in his solution to NFA construction. Earley
> also used the notation to describe his method if I understand right.
> I'm presuming that there is some first use of the notation. Do we
> know who invented it?
In Lisp, we can write (1 2 3 4 5 6) in any of these ways:
(1 . (2 3 4 5 6))
(1 2 . (3 4.5 6))
(1 2 3 . (4 5 6))
(1 2 3 4 . (5 6))
(1 2 3 4 5 . (6))
(1 2 3 4 5 6 . NIL)
As well as:
(1 . (2 . (3 4 5 6)))
The dot doesn't actually exist; it isn't represented anywhere in the
data structure, but in a hand-written expression it could be used to
convey (purely as a comment) some semanticaly interesting division in
the list.
E.g. suppose we have some data structure which holds a list of symbols,
and an integer indicating a position among those symbols.
A custom printing routine for that structure could take advantage
of this, rendering it as, say:
#S(sym-structure
:dot-position 2
:syms (a b . (c d)))
where the custom printer for sym-structure looks at dot-position
and renders the syms slot accordingly, to provide a visual aid.
The following is not possible in any mainstream Lisp I know of: (. (a b c))
as a way of denoting (a b c). The custom printer could omit the
dot notation.
(I know of a dialect in which it that leading dot notation is allowed,
with that semantics, but starting in a git commit made in 2014, making
it irrelevant here.)
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2022-05-23 13:23 +0300 |
| Message-ID | <22-05-048@comp.compilers> |
| In reply to | #3019 |
Let me explain a little further. Dotted-items are useful in translating (and visualizing) state machines as translations of regular expressions or grammar rules. In Glushkov's construction one takes a regular expression /ab*(cb)+a/ and makes copies of it with a dot before each "symbol" (character you can recognize) and one at the end. Thus, 6 dotted-items: /.ab*(cb)+a/ /a.b*(cb)+a/ /ab*(.cb)+a/ /ab*(c.b)+a/ /ab*(cb)+.a/ /ab*(cb)+a./ Each state in a Glushkov NFA is a subset of the powerset of these items: (see below) state 0: /.ab*(cb)+a/ state 1: /a.b*(cb)+a/ /ab*(.cb)+a/ state 2: /ab*(c.b)+a/ state 3: /ab*(.cb)+a/ /ab*(cb)+.a/ state 4: /ab*(cb)+a./ The algorithm explains how to calculate what the states and their transitions are. Yielding something like this: state 0: /.ab*(cb)+a/ -> state 1 state 1: /a.b*(cb)+a/ -> state 1 /ab*(.cb)+a/ -> state 2 state 2: /ab*(c.b)+a/ -> state 3 state 3: /ab*(.cb)+a/ -> state 2 /ab*(cb)+.a/ -> state 4 state 4: /ab*(cb)+a./ -> accept Below: Actually, in a Gluskov NFA the 6 dotted-items might be the states, and those subsets might be how you change it to a DFA (the subset construction algorithm). In my mind, those things have gotten merged, but that might be an error on my part. Anyway, when building the LR(0) machine, you build similar-dotted items (and subsets) to define the states of the LR(0) DFA. And, the Earley construction adds some context information to the doffed-items. It's how I think about state machines. It's how they are documented in Yacc++ with the debug option. Yacc (and Bison) does something similar using _ instead of dot if I recall correctly. But, someone must have introduced the idea first. My question is who? Did Backus use dotted-items? Or did Knuth or Glushkov invent them? Or maybe Naur. Or someone else, maybe one unfamiliar to me? Maybe it goes all the way back to Kleene. Is the question lost to time? That's my question.... Again, thanks, Chris -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------ [BNF was descriptive, don't think Backus ever did anything with formal parsers. I looked at Knuth's 1965 LR parsing paper which has a lot of notation but no dots. -John]
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2022-05-23 18:31 -0700 |
| Message-ID | <22-05-049@comp.compilers> |
| In reply to | #3018 |
On Sunday, May 22, 2022 at 10:15:07 AM UTC-7, Christopher F Clark wrote: > I know the notation from LR(0) machine construction, but also know > that Gluskhov used it in his solution to NFA construction. Earley > also used the notation to describe his method if I understand right. > I'm presuming that there is some first use of the notation. Do we > know who invented it? This could be asked in general, for the origin of different notations used in programming languages and their documentation. The first use of dot that I know of, similar to an operator, is for structure reference qualifiers in PL/I, and later adopted by C and others from there. I always thought that PL/I adopted structures from COBOL, but never looked up to see how COBOL did it. It seems that COBOL uses AS for structure references. As well as I know it, it was Fortran that first introduced variable names with more than one character, though mathematics still hasn't done that. (Fortran also uses dot for operators like .AND. and .LT., as in the early days the character set was restricted.) In algebra, it is traditional that two variables next to each other are multiplied, convenient for readers. There needs to be some separator to indicate when one name ends and the next begins. In some languages, that can just be space or some non-operator. [I believe Chris is asking about the specific use of a dot to show the position in a partially parsed BNF rule. PL/I copied its structure declarations close to verbatim from Cobol, with level numbers to show the nesting. It invented . and -> for structures, while Cobol used "OF" to refer to elements and doesn't have pointers. -John]
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2022-05-24 11:27 +0300 |
| Message-ID | <22-05-050@comp.compilers> |
| In reply to | #3021 |
Our wise moderator wrote: > I believe Chris is asking about the specific use of a dot to show the > position in a partially parsed BNF rule. Yes, exactly. It gives a kind of "operational semantics" to BNF/Regexes. "I am here. This is what I expect to see next. Or this, or this." The way many kids learn to write stories with "This happened. And then this. And then this. And then this." It's considered bad style. However, it gives a very simple semantics. Someone must have invented the idea of specifically marking the rule/regular expression with an indication on where you are in it. It may be an obvious thing, but someone had to realize that it should be made explicit and one can reason from it. Glushkov NFAs have fewer states and epsilon transitions than Thompson NFAs although both represent the same regular expressions. It's something I think anyone interested in automata or parsing should learn. I actually started writing about it in a blog/book I was thinking of writing on the topic to explain some things like what we did at Intel making hardware accelerators or how we did ELR parsing (LR + regular expressions) in Yacc++. Or even how one should look at PEG grammars or GLR or Okhotin's work. It's a basic foundation and too many people don't seem to know it. But someone must have first thought of expressing it that way. Kind regards, Chris -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
[toc] | [prev] | [next] | [standalone]
| From | antispam@math.uni.wroc.pl |
|---|---|
| Date | 2022-05-28 04:51 +0000 |
| Message-ID | <22-05-052@comp.compilers> |
| In reply to | #3022 |
Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
> Our wise moderator wrote:
> > I believe Chris is asking about the specific use of a dot to show the
> > position in a partially parsed BNF rule.
>
> Yes, exactly.
>
> It gives a kind of "operational semantics" to BNF/Regexes. "I am
> here. This is what I expect to see next. Or this, or this."
>
> The way many kids learn to write stories with "This happened. And
> then this. And then this. And then this." It's considered bad style.
> However, it gives a very simple semantics.
>
> Someone must have invented the idea of specifically marking the
> rule/regular expression with an indication on where you are in it. It
> may be an obvious thing, but someone had to realize that it should be
> made explicit and one can reason from it.
This idea is in paper by Floyd "A Descriptive Language for Symbol
Manipulation" where he descibes system of rules for parsing.
Instead of dot he uses a triangle as a marker, and his rules
are different than context-free rules.
I think that to have definite answer you need to be very precise
what you ask. Marking positions is very old technique. Implicitely
it appears in math works on Tue systems and word problems.
In a sense position of head in Turing machine is doing the
same work.
Dot is used in Earley paper from 1967. Knuth 1965 paper seem to
be hidden by paywalls, so I can not see what is in it, but
reasonable guess is that Knuth used dot. Knuth ideas are
innovative enough that if you specify your question to parsing
using set of items of context-free grammars, then I believe
he is the first. OTOH in more general setting marker tokens
for "current position" were crucial for proof that systems
of substitutions (rewrites) are Turing complete and this
seem to be done in early fifties.
--
Waldek Hebisch
[Knuth didn't use dots. You can find a copy of the paper at
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.8867
Click the "cached" link at the upper right. -John]
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web