Groups > comp.compilers > #3018 > unrolled thread

Who first invented dotted-item notation?

Started by	Christopher F Clark <christopher.f.clark@compiler-resources.com>
First post	2022-05-22 12:32 +0300
Last post	2022-05-28 04:51 +0000
Articles	6 — 4 participants

Back to article view | Back to comp.compilers

  Who first invented dotted-item notation? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-05-22 12:32 +0300
    Re: Who first invented dotted-item notation? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-05-22 18:29 +0000
      RE: Who first invented dotted-item notation? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-05-23 13:23 +0300
    Re: Who first invented dotted-item notation? gah4 <gah4@u.washington.edu> - 2022-05-23 18:31 -0700
      RE: Who first invented dotted-item notation? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-05-24 11:27 +0300
        Re: Who first invented dotted-item notation? antispam@math.uni.wroc.pl - 2022-05-28 04:51 +0000

#3018 — Who first invented dotted-item notation?

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Date	2022-05-22 12:32 +0300
Subject	Who first invented dotted-item notation?
Message-ID	<22-05-046@comp.compilers>

I know the notation from LR(0) machine construction, but also know
that Gluskhov used it in his solution to NFA construction.  Earley
also used the notation to describe his method if I understand right.
I'm presuming that there is some first use of the notation.  Do we
know who invented it?

Thanks in advance,
Chris

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------

[toc] | [next] | [standalone]

#3019

From	Kaz Kylheku <480-992-1380@kylheku.com>
Date	2022-05-22 18:29 +0000
Message-ID	<22-05-047@comp.compilers>
In reply to	#3018

On 2022-05-22, Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
> I know the notation from LR(0) machine construction, but also know
> that Gluskhov used it in his solution to NFA construction.  Earley
> also used the notation to describe his method if I understand right.
> I'm presuming that there is some first use of the notation.  Do we
> know who invented it?

In Lisp, we can write (1 2 3 4 5 6) in any of these ways:

(1 . (2 3 4 5 6))
(1 2 . (3 4.5 6))
(1 2 3 . (4 5 6))
(1 2 3 4 . (5 6))
(1 2 3 4 5 . (6))
(1 2 3 4 5 6 . NIL)

As well as:

(1 . (2 . (3 4 5 6)))

The dot doesn't actually exist; it isn't represented anywhere in the
data structure, but in a hand-written expression it could be used to
convey (purely as a comment) some semanticaly interesting division in
the list.

E.g. suppose we have some data structure which holds a list of symbols,
and an integer indicating a position among those symbols.

A custom printing routine for that structure could take advantage
of this, rendering it as, say:

  #S(sym-structure
    :dot-position 2
    :syms (a b . (c d)))

where the custom printer for sym-structure looks at dot-position
and renders the syms slot accordingly, to provide a visual aid.

The following is not possible in any mainstream Lisp I know of: (. (a b c))
as a way of denoting (a b c).  The custom printer could omit the
dot notation.

(I know of a dialect in which it that leading dot notation is allowed,
with that semantics, but starting in a git commit made in 2014, making
it irrelevant here.)

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal

[toc] | [prev] | [next] | [standalone]

#3020

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Date	2022-05-23 13:23 +0300
Message-ID	<22-05-048@comp.compilers>
In reply to	#3019

Let me explain a little further.

Dotted-items are useful in translating (and visualizing) state
machines as translations of regular expressions or grammar rules.

In Glushkov's construction one takes a regular expression /ab*(cb)+a/
and makes copies of it with a dot before each "symbol" (character you
can recognize) and one at the end.  Thus, 6 dotted-items:
/.ab*(cb)+a/
/a.b*(cb)+a/
/ab*(.cb)+a/
/ab*(c.b)+a/
/ab*(cb)+.a/
/ab*(cb)+a./

Each state in a Glushkov NFA is a subset of the powerset of these
items: (see below)

state 0:
/.ab*(cb)+a/

state 1:
/a.b*(cb)+a/
/ab*(.cb)+a/

state 2:
/ab*(c.b)+a/

state 3:
/ab*(.cb)+a/
/ab*(cb)+.a/

state 4:
/ab*(cb)+a./

The algorithm explains how to calculate what the states and their
transitions are.

Yielding something like this:

state 0:
/.ab*(cb)+a/ -> state 1

state 1:
/a.b*(cb)+a/ -> state 1
/ab*(.cb)+a/ -> state 2

state 2:
/ab*(c.b)+a/ -> state 3

state 3:
/ab*(.cb)+a/ -> state 2
/ab*(cb)+.a/ -> state 4

state 4:
/ab*(cb)+a./ -> accept

Below: Actually, in a Gluskov NFA the 6 dotted-items might be the
states, and those subsets might be how you change it to a DFA (the
subset construction algorithm).  In my mind, those things have gotten
merged, but that might be an error on my part.

Anyway, when building the LR(0) machine, you build similar-dotted
items (and subsets) to define the states of the LR(0) DFA. And, the
Earley construction adds some context information to the doffed-items.

It's how I think about state machines.  It's how they are documented
in Yacc++ with the debug option.  Yacc (and Bison) does something
similar using _ instead of dot if I recall correctly.

But, someone must have introduced the idea first.  My question is who?
 Did Backus use dotted-items?  Or did Knuth or Glushkov invent them?
Or maybe Naur.  Or someone else, maybe one unfamiliar to me? Maybe it
goes all the way back to Kleene.  Is the question lost to time?

That's my question....

Again, thanks,
Chris

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------
[BNF was descriptive, don't think Backus ever did anything with formal parsers.
I looked at Knuth's 1965 LR parsing paper which has a lot of notation but no
dots. -John]

[toc] | [prev] | [next] | [standalone]

#3021

From	gah4 <gah4@u.washington.edu>
Date	2022-05-23 18:31 -0700
Message-ID	<22-05-049@comp.compilers>
In reply to	#3018

On Sunday, May 22, 2022 at 10:15:07 AM UTC-7, Christopher F Clark wrote:
> I know the notation from LR(0) machine construction, but also know
> that Gluskhov used it in his solution to NFA construction. Earley
> also used the notation to describe his method if I understand right.
> I'm presuming that there is some first use of the notation. Do we
> know who invented it?

This could be asked in general, for the origin of different notations
used in programming languages and their documentation.

The first use of dot that I know of, similar to an operator, is for structure
reference qualifiers in PL/I, and later adopted by C and others from there.

I always thought that PL/I adopted structures from COBOL, but never
looked up to see how COBOL did it.  It seems that COBOL uses AS
for structure references.

As well as I know it, it was Fortran that first introduced variable names
with more than one character, though mathematics still hasn't done that.
(Fortran also uses dot for operators like .AND. and .LT., as in the early
days the character set was restricted.)

In algebra, it is traditional that two variables next to each other are
multiplied, convenient for readers. There needs to be some separator
to indicate when one name ends and the next begins.  In some languages,
that can just be space or some non-operator.
[I believe Chris is asking about the specific use of a dot to show the
position in a partially parsed BNF rule.
PL/I copied its structure declarations close to verbatim from Cobol, with level
numbers to show the nesting.  It invented . and -> for structures, while
Cobol used "OF" to refer to elements and doesn't have pointers. -John]

[toc] | [prev] | [next] | [standalone]

#3022

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Date	2022-05-24 11:27 +0300
Message-ID	<22-05-050@comp.compilers>
In reply to	#3021

Our wise moderator wrote:
> I believe Chris is asking about the specific use of a dot to show the
> position in a partially parsed BNF rule.

Yes, exactly.

It gives a kind of "operational semantics" to BNF/Regexes.  "I am
here.  This is what I expect to see next.  Or this, or this."

The way many kids learn to write stories with "This happened.  And
then this.  And then this. And then this."  It's considered bad style.
However, it gives a very simple semantics.

Someone must have invented the idea of specifically marking the
rule/regular expression with an indication on where you are in it.  It
may be an obvious thing, but someone had to realize that it should be
made explicit and one can reason from it.

Glushkov NFAs have fewer states and epsilon transitions than Thompson
NFAs although both represent the same regular expressions.

It's something I think anyone interested in automata or parsing should
learn.  I actually started writing about it in a blog/book I was
thinking of writing on the topic to explain some things like what we
did at Intel making hardware accelerators or how we did ELR parsing
(LR + regular expressions) in Yacc++.  Or even how one should look at
PEG grammars or GLR or Okhotin's work.  It's a basic foundation and
too many people don't seem to know it.  But someone must have first
thought of expressing it that way.

Kind regards,
Chris

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------

[toc] | [prev] | [next] | [standalone]

#3024

From	antispam@math.uni.wroc.pl
Date	2022-05-28 04:51 +0000
Message-ID	<22-05-052@comp.compilers>
In reply to	#3022

Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
> Our wise moderator wrote:
> > I believe Chris is asking about the specific use of a dot to show the
> > position in a partially parsed BNF rule.
>
> Yes, exactly.
>
> It gives a kind of "operational semantics" to BNF/Regexes.  "I am
> here.  This is what I expect to see next.  Or this, or this."
>
> The way many kids learn to write stories with "This happened.  And
> then this.  And then this. And then this."  It's considered bad style.
> However, it gives a very simple semantics.
>
> Someone must have invented the idea of specifically marking the
> rule/regular expression with an indication on where you are in it.  It
> may be an obvious thing, but someone had to realize that it should be
> made explicit and one can reason from it.

This idea is in paper by Floyd "A Descriptive Language for Symbol
Manipulation" where he descibes system of rules for parsing.
Instead of dot he uses a triangle as a marker, and his rules
are different than context-free rules.

I think that to have definite answer you need to be very precise
what you ask.  Marking positions is very old technique.  Implicitely
it appears in math works on Tue systems and word problems.
In a sense position of head in Turing machine is doing the
same work.

Dot is used in Earley paper from 1967.  Knuth 1965 paper seem to
be hidden by paywalls, so I can not see what is in it, but
reasonable guess is that Knuth used dot.  Knuth ideas are
innovative enough that if you specify your question to parsing
using set of items of context-free grammars, then I believe
he is the first.  OTOH in more general setting marker tokens
for "current position" were crucial for proof that systems
of substitutions (rewrites) are Turing complete and this
seem to be done in early fifties.

--
                              Waldek Hebisch
[Knuth didn't use dots.  You can find a copy of the paper at
https://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.361.8867
Click the "cached" link at the upper right. -John]

[toc] | [prev] | [standalone]

csiph-web

Who first invented dotted-item notation?

Contents

#3018 — Who first invented dotted-item notation?

#3019

#3020

#3021

#3022

#3024