Supporting multiple input syntaxes

Started by	luser droog <mijoryx@yahoo.com.dmarc.email>
First post	2020-08-12 15:20 -0700
Last post	2021-03-14 17:36 -0700
Articles	20 on this page of 23 — 12 participants

  Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-12 15:20 -0700
    Re: Supporting multiple input syntaxes Kaz Kylheku <793-849-0957@kylheku.com> - 2020-08-13 00:43 +0000
    Re: Supporting multiple input syntaxes Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-08-13 10:27 +0200
      Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-13 21:36 -0700
      Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 22:16 -0700
    Re: Supporting multiple input syntaxes minforth@arcor.de - 2020-08-13 14:15 -0700
      Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-13 21:37 -0700
      RE: Supporting mulitple input syntaxes Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-08-14 12:44 +0300
    Supporting multiple input syntaxes David Lovemore <davidlovemore@gmail.com> - 2020-08-15 06:42 -0700
      Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-15 15:20 -0700
        Re: Supporting multiple input syntaxes David Lovemore <davidlovemore@gmail.com> - 2020-08-16 02:21 -0700
          Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-20 14:45 -0700
            Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 19:35 -0700
              Re: implementation languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 20:39 -0700
                Re: implementation languages, was Supporting multiple input syntaxes Thomas Koenig <tkoenig@netcologne.de> - 2020-08-24 17:01 +0000
                  Re: implementation languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-28 10:56 -0700
                Re: stack languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-24 12:14 -0700
    Re: Supporting multiple input syntaxes gah4 <gah4@u.washington.edu> - 2020-08-23 14:26 -0700
    Re: Supporting multiple input syntaxes antispam@math.uni.wroc.pl - 2021-02-11 23:27 +0000
      Re: Supporting multiple input syntaxes Elijah Stone <elronnd@elronnd.net> - 2021-02-17 01:44 -0800
        Re: Supporting multiple input syntaxes antispam@math.uni.wroc.pl - 2021-02-23 23:28 +0000
          Re: Supporting multiple input syntaxes "ltc...@gmail.com" <ltcmelo@gmail.com> - 2021-03-14 21:08 -0400
    Re: Supporting multiple input syntaxes Rock Brentwood <rockbrentwood@gmail.com> - 2021-03-14 17:36 -0700

#2554 — Supporting multiple input syntaxes

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-12 15:20 -0700
Subject	Supporting multiple input syntaxes
Message-ID	<20-08-002@comp.compilers>

I've got my project successfully parsing the circa-1975 C syntax
from that old manual. I'd like to add parsers for K&R1 and c90
syntaxes.

How separate should these be? Should they be completely
separate grammars, or more piecewise selection?

My feeling is that separating them will be less headache, but maybe
there's some advantage to changing out smaller pieces of the grammar
in that it might be easier to make sure that they produce the same
structure compatible with the backend.

Any guidance in this area?

https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

[Really, it's up to you.  My inclination would be to make them
separate but use some sort of macro setup so you can insert
common pieces into each of the grammars. -John]

#2555

From	Kaz Kylheku <793-849-0957@kylheku.com>
Date	2020-08-13 00:43 +0000
Message-ID	<20-08-003@comp.compilers>
In reply to	#2554

On 2020-08-12, luser droog <mijoryx@yahoo.com.dmarc.email> wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?
>
> My feeling is that separating them will be less headache, but maybe
> there's some advantage to changing out smaller pieces of the grammar
> in that it might be easier to make sure that they produce the same
> structure compatible with the backend.
>
> Any guidance in this area?
>
> https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

I'd say that since you're not using a parser generator, but using code
statements to construct the grammar objects at initialization time, you
have the flexibility to merge the implementation, because you can check
the value of some dialect-selecting variable, and construct the parser
accordingly, and elsewhere check that same variable to do whatever else
needs to be done conditionally.

The trick is to find a way to embed the *semantics* of the older dialects
into the new so then everything after the parsing can be shared.

Similar remarks would apply to recursive descent.
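
A minimal sketch of the dialect-variable idea, written in Python rather than the project's C. Everything named here (`Dialect`, `lit`, `seq`, `build_init_declarator`) is hypothetical and not taken from pcomb. The one dialect difference it uses is the one mentioned elsewhere in this thread: the 1975 dialect initializes variables without `=`. The grammar object is built once from the dialect value, and every dialect yields the same tree shape so that later stages can be shared.

```python
from enum import Enum

class Dialect(Enum):
    C75 = 1   # circa-1975 manual
    KR1 = 2   # K&R first edition
    C90 = 3   # ANSI/ISO

# A parser takes (tokens, index) and returns (value, new_index) or None.
def lit(text):
    def parse(toks, i):
        if i < len(toks) and toks[i] == text:
            return text, i + 1
        return None
    return parse

def number(toks, i):
    if i < len(toks) and toks[i].isdigit():
        return int(toks[i]), i + 1
    return None

def name(toks, i):
    if i < len(toks) and toks[i].isidentifier():
        return toks[i], i + 1
    return None

def seq(*parsers):
    def parse(toks, i):
        values = []
        for p in parsers:
            r = p(toks, i)
            if r is None:
                return None
            v, i = r
            values.append(v)
        return values, i
    return parse

def build_init_declarator(dialect):
    """Construct the grammar once, at initialization time."""
    if dialect is Dialect.C75:
        parts = [name, number, lit(";")]             # x 1;   (no '=')
    else:
        parts = [name, lit("="), number, lit(";")]   # x = 1;
    grammar = seq(*parts)
    def parse(toks, i=0):
        r = grammar(toks, i)
        if r is None:
            return None
        values, j = r
        # Same tree shape for every dialect, so later stages are shared.
        return ("init", values[0], values[-2]), j
    return parse
```

With this, `build_init_declarator(Dialect.C75)` accepts the tokens `x 1 ;` and `build_init_declarator(Dialect.C90)` accepts `x = 1 ;`, and both return the same `("init", "x", 1)` node.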

If you were using something clunky like a Yacc, there are still ways
to combine everything into a single grammar. The input stream can be
primed with one of several "secret internal token" objects that has no
lexeme. (Primed, meaning that the first call to the lexer yields this
secret token instead of processing actual input.) Each token indicates
a dialect to parse. The top-level grammar production can then pick
one of several subordinate production rules corresponding to the entry
points for the respective dialects.  Those can then share common rules
as much as possible.

  translation_unit : C75_TOKEN c75_translation_unit /* orig flavor */
                   | C79_TOKEN c79_translation_unit /* "K&R" */
                   | C90_TOKEN c90_translation_unit /* ANSI/ISO */
                   ;
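
The priming trick can be sketched outside Yacc as well. The Python below is only an illustration under assumptions: the token names mirror the grammar fragment above, while `primed_lexer`, `translation_unit` and the `parse_c75`/`parse_c79`/`parse_c90` stand-ins are hypothetical. The lexer yields one synthetic token with no lexeme before any real input, and the top-level rule dispatches on it.

```python
import re

# Synthetic tokens that no input text can produce.
C75_TOKEN, C79_TOKEN, C90_TOKEN = "C75_TOKEN", "C79_TOKEN", "C90_TOKEN"

def primed_lexer(source, dialect_token):
    """Yield the dialect token first, then the real tokens."""
    yield (dialect_token, None)               # no lexeme attached
    for m in re.finditer(r"[A-Za-z_]\w*|\d+|\S", source):
        yield ("TEXT", m.group())

def translation_unit(tokens):
    """Top-level rule: pick an entry point from the first (secret) token."""
    kind, _ = next(tokens)
    entry = {C75_TOKEN: parse_c75, C79_TOKEN: parse_c79, C90_TOKEN: parse_c90}
    return entry[kind](tokens)

# Stand-ins for the per-dialect entry points; a real parser would share
# most subordinate rules between them.
def parse_c75(tokens):
    return ("c75", [lexeme for _, lexeme in tokens])

def parse_c79(tokens):
    return ("c79", [lexeme for _, lexeme in tokens])

def parse_c90(tokens):
    return ("c90", [lexeme for _, lexeme in tokens])
```

The parser proper never needs a global dialect flag for the top-level choice: the choice arrives through the token stream like any other input.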

--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List:  http://www.kylheku.com/diy
ADA MP-1 Mailing List:   http://www.kylheku.com/mp1

#2556

From	Hans-Peter Diettrich <DrDiettrich1@netscape.net>
Date	2020-08-13 10:27 +0200
Message-ID	<20-08-004@comp.compilers>
In reply to	#2554

On 13.08.2020 at 00:20, luser droog wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?

IMO this depends largely on how the parser output is used (diagnostics,
backend...). C90 is much stricter than K&R and requires more checks. Do you
need extensive error diagnostics, or do you assume that all source code
is free of errors?


> https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c

You seem to implement an LL(1) parser? My C98 Parser is LL(2), i.e. an
LL(1) parser with one or two locations where more lookahead is required.
Also identifiers are classified as typenames and others prior to their
usage.

For real-world testing (recommended!) a preprocessor is required and a
copy of the standard libraries of existing compiler(s).

Your test_syntax() source misses "=" from the variable declarations
(initializers). What about pointer syntax/semantics? If you add these
(and other) syntax differences conditionally (version specific) to your
code, which way would look better to you? Which way will be safer to
maintain?


Nice code BTW :-)

DoDi

#2558

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-13 21:36 -0700
Message-ID	<20-08-006@comp.compilers>
In reply to	#2556

On Thursday, August 13, 2020 at 5:22:51 PM UTC-5, Hans-Peter Diettrich wrote:
> On 13.08.2020 at 00:20, luser droog wrote:
> > I've got my project successfully parsing the circa-1975 C syntax
> > from that old manual. I'd like to add parsers for K&R1 and c90
> > syntaxes.
> >
> > How separate should these be? Should they be complete
> > separate grammars, or more piecewise selection?
>
> IMO this depends widely on the usage of the parser output (diagnostics,
> backend...). C90 is much stricter than K&R, requires more checks. Do you
> need extensive error diagnostics, or do you assume that all source code
> is free of errors?
>
>
> > https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c
>
> You seem to implement an LL(1) parser? My C98 Parser is LL(2), i.e. an
> LL(1) parser with one or two locations where more lookahead is required.
> Also identifiers are classified as typenames and others prior to their
> usage.
>

Yes, it's basically LL(1) with backtracking. There's one part of the
grammar I'm using that's left-recursive and I still need to work that
out.
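
One common way around a left-recursive rule in a backtracking combinator parser is to rewrite `expr ::= expr op term | term` as "a term followed by zero or more (op term) pairs" and fold the results to the left; the combinator is usually called `chainl1`. A rough Python sketch, with hypothetical helper parsers and no claim about how pcomb will solve it:

```python
def number(s, i):
    j = i
    while j < len(s) and s[j].isdigit():
        j += 1
    return (int(s[i:j]), j) if j > i else None

def op(s, i):
    if i < len(s) and s[i] in "+-":
        return s[i], i + 1
    return None

def chainl1(term, operator):
    """Parse term (operator term)* and fold to the left, which gives the
    same trees as the left-recursive rule without recursing forever."""
    def parse(s, i):
        r = term(s, i)
        if r is None:
            return None
        left, i = r
        while True:
            r_op = operator(s, i)
            if r_op is None:
                return left, i
            o, j = r_op
            r_rhs = term(s, j)
            if r_rhs is None:
                return left, i      # back up: leave the operator unread
            right, i = r_rhs
            left = (o, left, right)
    return parse

expr = chainl1(number, op)
```

Here `expr("1-2-3", 0)` groups as `(1-2)-3`, which is the associativity the left-recursive grammar was expressing.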

> For real-world testing (recommended!) a preprocessor is required and a
> copy of the standard libraries of existing compiler(s).
>
> Your test_syntax() source misses "=" from the variable declarations
> (initializers). What about pointer syntax/semantics? If you add these
> (and other) syntax differences conditionally (version specific) to your
> code, which way would look better to you? Which way will be safer to
> maintain?
>

That's actually correct for the 1975 dialect: no '=' to initialize
variables. I think it's pretty ugly without it, but it could be
removed anyway for the AST.

>
> Nice code BTW :-)
>

Thanks! I think I need to sidetrack a bit and work up some primitives
for pattern matching and decomposition to make the backend easier.
I'll report back if/when it can do more tricks.

#2568

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-23 22:16 -0700
Message-ID	<20-08-016@comp.compilers>
In reply to	#2556

On Thursday, August 13, 2020 at 5:22:51 PM UTC-5, Hans-Peter Diettrich wrote:
> On 13.08.2020 at 00:20, luser droog wrote:
> > I've got my project successfully parsing the circa-1975 C syntax
> > from that old manual. I'd like to add parsers for K&R1 and c90
> > syntaxes.
> >
> > How separate should these be? Should they be complete
> > separate grammars, or more piecewise selection?
>
> IMO this depends widely on the usage of the parser output (diagnostics,
> backend...). C90 is much stricter than K&R, requires more checks. Do you
> need extensive error diagnostics, or do you assume that all source code
> is free of errors?
>
>
> > https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c
>
> You seem to implement an LL(1) parser? My C98 Parser is LL(2), i.e. an
> LL(1) parser with one or two locations where more lookahead is required.

In which places do you need more lookahead? Btw, some of my reading
describes my parsers as LL(infinity) because of the backtracking.

#2557

From	minforth@arcor.de
Date	2020-08-13 14:15 -0700
Message-ID	<20-08-005@comp.compilers>
In reply to	#2554

On Thursday, 13 August 2020 00:32:56 UTC+2, luser droog wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection? ...

Why not settle for one master dialect and use awk to translate between dialects?

[Probably because there is a great deal of C code written to comply with
the various versions of the standard, users want error messages that match
the code they wrote rather than some intermediate code, and it would be quite
an awk program that could reconcile all the differences among C flavors. -John]

#2559

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-13 21:37 -0700
Message-ID	<20-08-007@comp.compilers>
In reply to	#2557

On Thursday, August 13, 2020 at 5:24:36 PM UTC-5, minf...@arcor.de wrote:
> On Thursday, 13 August 2020 00:32:56 UTC+2, luser droog wrote:
> > I've got my project successfully parsing the circa-1975 C syntax
> > from that old manual. I'd like to add parsers for K&R1 and c90
> > syntaxes.
> >
> > How separate should these be? Should they be complete
> > separate grammars, or more piecewise selection? ...
>
> Why not settle for one master dialect and use awk to translate between dialects?
>
> [Probably because there is a great deal of C code written to comply with
> the various versions of the standard, users want error messages that match
> the code they wrote rather than some intermediate code, and it would be quite
> an awk program that could reconcile all the differences among C flavors. -John]

One of the possible goals for this project is exactly such a translator
that can downgrade or upgrade code from one standard version to another.

Another possible application is a source code formatter. Currently the
CST produced by the parser keeps all the original whitespace attached
to each token.
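
To illustrate why keeping whitespace on the tokens helps both uses, here is a small Python sketch. The `Token` layout and the function names are assumptions made for the example, not the CST format pcomb actually produces. Concatenating the tokens reproduces the source exactly, and a formatter only has to rewrite the whitespace field.

```python
import re
from typing import NamedTuple

class Token(NamedTuple):
    space: str   # whitespace that preceded the token in the source
    text: str    # the token itself

TOKEN_RE = re.compile(r"(\s*)([A-Za-z_]\w*|\d+|[^\sA-Za-z_0-9])")

def tokenize(source):
    tokens, pos = [], 0
    for m in TOKEN_RE.finditer(source):
        tokens.append(Token(m.group(1), m.group(2)))
        pos = m.end()
    return tokens, source[pos:]      # trailing whitespace, if any

def unparse(tokens, trailing=""):
    return "".join(t.space + t.text for t in tokens) + trailing

def reformat(tokens):
    """Toy formatter: one space before every token except the first
    and except ';'."""
    out = []
    for n, t in enumerate(tokens):
        space = "" if n == 0 or t.text == ";" else " "
        out.append(Token(space, t.text))
    return out
```

A translator between dialects could work the same way: change or insert tokens and leave the `space` fields of untouched tokens alone, so the output stays close to what the user wrote.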

#2560 — RE: Supporting mulitple input syntaxes

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Date	2020-08-14 12:44 +0300
Subject	RE: Supporting mulitple input syntaxes
Message-ID	<20-08-008@comp.compilers>
In reply to	#2557

We did something similar with Yacc++.  We used inheritance of grammars
(a feature of our product) to do so.  In fact, the point of the
exercise was to demonstrate that feature.  I would presume something
similar would work in a hand-written recursive descent parser.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------

#2561

From	David Lovemore <davidlovemore@gmail.com>
Date	2020-08-15 06:42 -0700
Message-ID	<20-08-009@comp.compilers>
In reply to	#2554

It may be useful to consider what you would like to happen when you encounter
syntax that is ambiguous, that works differently, or that belongs to a dialect
other than the one you expect to be parsing: produce a warning, report an error,
handle it quietly, fall over, or not care.

#2562

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-15 15:20 -0700
Message-ID	<20-08-010@comp.compilers>
In reply to	#2561

On Saturday, August 15, 2020 at 9:46:47 AM UTC-5, David Lovemore wrote:
> It may be useful to consider what you would like to happen if you encounter a
> syntax that is ambiguous or works differently or is for another expected
> syntax from what you are parsing: produce a warning, error or handle quietly,
> or fall over, or don’t care.

Very good points. The parser is backtracking, returning a list of
results. That could conceivably be useful for dealing with ambiguity by
looking at more than just the first result.

Warnings and error messages are going to be trickier I fear. The parser
is built around the idea of Monadic combinators. So my research suggests
that I'll need Monad Transformers to add the state needed for good
messages. There are a bunch of lectures I've found about using these
in Haskell, but not much about how to build them from scratch.

As it is, any error in parsing will just produce no results.

I started a prototype where the input list of characters was instead
a list of (character, line-number, line-position). But it got really
confusing at the time so I stopped. And the few times I've tried to
look at it, I can't figure out what I was thinking.
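
For what it's worth, the decorated-input idea can be kept fairly small if only the base parser knows about positions. A Python sketch under assumptions: zero-based (line, column) pairs, and hypothetical names `with_positions`, `item` and `position`.

```python
def with_positions(text):
    """Turn a string into a list of (char, (line, column)) pairs,
    both zero-based; a newline belongs to the line it ends."""
    out, line, col = [], 0, 0
    for ch in text:
        out.append((ch, (line, col)))
        if ch == "\n":
            line, col = line + 1, 0
        else:
            col += 1
    return out

def item(inp):
    """Base parser: consume one decorated character and hand back the
    bare character, so higher-level parsers never see positions."""
    if not inp:
        return None
    (ch, _pos), rest = inp[0], inp[1:]
    return ch, rest

def position(inp):
    """Where the remaining input starts, or None at end of input."""
    return inp[0][1] if inp else None
```

Because the remaining input still carries its positions, an error message can call `position(rest)` at the point of failure without any extra state being threaded through the combinators.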

#2563

From	David Lovemore <davidlovemore@gmail.com>
Date	2020-08-16 02:21 -0700
Message-ID	<20-08-011@comp.compilers>
In reply to	#2562

My friend, I have found reporting the furthest position examined by the parser
useful in error cases, as a simple stopgap when using a combinator approach.

Thinking about it, you really want to see the furthest failed position and the
stack of rules above it. Getting that requires meta-information when the code is
written in the most natural way. For this reason and others, I believe it is
good to represent your grammar in data structures, which moves further in the
direction of a compiler-compiler tool (or compiler-interpreter tool).
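
A rough Python sketch of that stopgap, with hypothetical names throughout (`Tracker`, `char`, `seq`, `alt`, `rule`). Each primitive reports its failures to a shared tracker, which keeps only the furthest index and a copy of the rule stack that was active there.

```python
class Tracker:
    """Remembers the furthest input index at which any parser failed,
    together with the rule names that were active at that point."""
    def __init__(self):
        self.furthest = -1
        self.rules_at_furthest = []
        self.stack = []

    def fail(self, i):
        if i > self.furthest:
            self.furthest = i
            self.rules_at_furthest = list(self.stack)
        return None

def char(c, t):
    def parse(s, i):
        if i < len(s) and s[i] == c:
            return c, i + 1
        return t.fail(i)
    return parse

def seq(*parsers):
    def parse(s, i):
        out = []
        for p in parsers:
            r = p(s, i)
            if r is None:
                return None
            v, i = r
            out.append(v)
        return out, i
    return parse

def alt(*parsers):
    def parse(s, i):
        for p in parsers:
            r = p(s, i)           # backtrack: every branch restarts at i
            if r is not None:
                return r
        return None
    return parse

def rule(name, p, t):
    """Name a parser so it shows up in the rule stack."""
    def parse(s, i):
        t.stack.append(name)
        try:
            return p(s, i)
        finally:
            t.stack.pop()
    return parse

t = Tracker()
abc = rule("abc", seq(char("a", t), char("b", t), char("c", t)), t)
abd = rule("abd", seq(char("a", t), char("b", t), char("d", t)), t)
top = rule("top", alt(abc, abd), t)
```

After a failed `top("abx", 0)`, `t.furthest` and `t.rules_at_furthest` say where the parse got stuck and inside which rules. Note that `rule` is exactly the kind of meta-information wrapper that a grammar held in data structures would give you for free.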

#2564

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-20 14:45 -0700
Message-ID	<20-08-012@comp.compilers>
In reply to	#2563

On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote:
> My friend, reporting the furthest position examined by the parser I have [found]
> useful in error cases as a simple stop gap when using a combinator approach.
>
> Thinking about it you kind of want to see the furthest failed position and the
> stack of rules above it. Such requires meta information when the code is
> written in the most natural way. For this reason and others I believe it is
> good to represent your grammar in data structures which is further in the
> direction of a compiler compiler tool (or compiler interpreter tool).

Thanks. I've done some further investigating. I built my parsers following
two papers. Hutton and Meijer, Monadic Parser Combinators
  https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf
and Hutton, Higher-Order Functions for Parsing
  https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf

The first adds error reporting using Monad Transformers. I'm thinking about
how to move in this direction, but first I'd need to reformulate the code to
make the Monad more explicit. It should be something like an interface or
a mixin, like a base class with all virtual member functions. That could be
done by modelling my objects more like OO objects and have 'bind' and
'result' in a vtable in the Parser object.

But the second paper does it differently, and maybe something I can do
more easily. It redefines the parsers to no longer produce a list of results,
so there's no longer support for ambiguity. Then it defines them to
return a Maybe,

  maybe * ::= Fail [char] | Error [char] | OK *

where the OK branch has the parse tree, and Fail or Error both contain an error
message. It describes how a Fail can be transformed into an Error. But it isn't
entirely clear where the messages get injected.

Still need to do some thinking on it, but I think I can rewrite the parsers
to follow this model, and then decorate my grammar with possible errors
at each node.
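
My current reading of that three-way result, as a Python sketch. The class and function names are my own for illustration, except `nofail`, which is the name used in the PostScript prototype later in this thread. The assumption here is that `Fail` is recoverable (an alternative may still match) while `Error` is committed, and `nofail` is what turns one into the other.

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class OK:
    value: Any
    rest: str

@dataclass
class Fail:        # recoverable: an alternative may still succeed
    message: str

@dataclass
class Error:       # committed: stop trying alternatives
    message: str

def lit(c):
    def parse(s):
        if s.startswith(c):
            return OK(c, s[len(c):])
        return Fail(f"expected {c!r}")
    return parse

def then(p, q):
    def parse(s):
        r1 = p(s)
        if not isinstance(r1, OK):
            return r1
        r2 = q(r1.rest)
        if not isinstance(r2, OK):
            return r2
        return OK((r1.value, r2.value), r2.rest)
    return parse

def alt(p, q):
    def parse(s):
        r = p(s)
        return q(s) if isinstance(r, Fail) else r   # Error is not retried
    return parse

def nofail(p):
    """Turn a Fail from p into an Error: once we are here, p must match."""
    def parse(s):
        r = p(s)
        return Error(r.message) if isinstance(r, Fail) else r
    return parse
```

Under this reading, the messages get injected by the primitive that fails, and decorating the grammar amounts to deciding where to wrap a sub-parser in `nofail`.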

Thanks for the encouragement. My classes start on Monday so I'm hoping
to accomplish something on this before then.

#2566

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-23 19:35 -0700
Message-ID	<20-08-014@comp.compilers>
In reply to	#2564

On Sunday, August 23, 2020 at 1:39:30 PM UTC-5, luser droog wrote:
> On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote:
> > My friend, reporting the furthest position examined by the parser I have [found]
> > useful in error cases as a simple stop gap when using a combinator approach.
> >
> > Thinking about it you kind of want to see the furthest failed position and the
> > stack of rules above it. Such requires meta information when the code is
> > written in the most natural way. For this reason and others I believe it is
> > good to represent your grammar in data structures which is further in the
> > direction of a compiler compiler tool (or compiler interpreter tool).
>
> Thanks. I've done some further investigating. I built my parsers following
> two papers. Hutton and Meijer, Monadic Parser Combinators
>   https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf
> and Hutton, Higher-Order Functions for Parsing
>   https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf
>
> The first adds error reporting using Monad Transformers. [...]
>
> But the second paper does it differently, and maybe something I can do
> more easily. It redefines the parsers to no longer produce a list of results,
> so there's no longer support for ambiguity. Then it defines them to
> return a Maybe,
>
>   maybe * ::= Fail [char] | Error [char] | OK *
>
> where the OK branch has the parse tree, and Fail or Error both contain an error
> message. It describes how a Fail can be transformed into an Error. But it isn't
> entirely clear where the messages get injected.
>
> Still need to do some thinking on it, but I think I can rewrite the parsers
> to follow this model, and then decorate my grammar with possible errors
> at each node.

I've made some progress. I wrote a new prototype following Hutton and
modified it to add position information to the character stream.
Then I rewrote the parsers to produce the maybe structure and
to collect rudimentary error messages.

For these, a positive and negative case in PostScript,

0 0 (abcd\ne) input-pos
(abc) str exec
pc
0 0 (abcd\ne) input-pos
(abd) str nofail exec
pq

I get this output:

$ gsnd -q -dNOSAFER pc11.ps
stack:
[/OK [[(a) [(b) [(c) []]]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]
stack:
[/Error [[(after) (a)] [[(after) (b)] [[{(d) eq} (not satisfied)] [[(c) [0 2]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]]]]

So, this indicates that I *can* modify my C parsers to produce error
messages. The remaining input list has the position information for
where the error occurred, [(c) [0 2]].

Following the prototype, I modified the input functions to add positions
for each character and modified the base parser item() to detect and
remove the position stuff before it passes into the rest of the machinery.

The next hurdle was making the extra position information work with
the Unicode filters ucs4_from_utf8() and utf8_from_ucs4(). And that
all appears to be working now.

But that's probably the end of the story for now. Got to gear up for
Operating Systems and Advanced Web Dev with Java.

Thanks to everyone for the help, esp. Kaz with the brilliant suggestion
to pass a language id token between tokenizer and parser.

Ps. the prototype is written in PostScript extended with function syntax.
https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps
https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript

--
l droog
[Why PostScript?  I realize it's Turing complete, but it seems odd to run one's parser on a printer. -John]

#2567 — Re: implementation languages, was Supporting multiple input syntaxes

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-23 20:39 -0700
Subject	Re: implementation languages, was Supporting multiple input syntaxes
Message-ID	<20-08-015@comp.compilers>
In reply to	#2566

On Sunday, August 23, 2020 at 10:15:06 PM UTC-5, luser droog wrote:

> Thanks to everyone for the help, esp. Kaz with the brilliant suggestion
> to pass a language id token between tokenizer and parser.
>
>
> Ps. the prototype is written in PostScript extended with function syntax.
> https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps
> https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript
>
> --
> l droog
> [Why Postscript?  I realize it's Turing complete, but it seems odd to run ones parser on a printer. -John]

I discovered PostScript around '97 or '98. I was taking Computer Graphics
and it was in an Appendix to the textbook (Salman). At the same time
I was editor of the Honors College student magazine so it really piqued
my interest as a graphics and typography language.

But the language itself I just really enjoy. It's my "Lego blocks"
language. The RPN syntax removes all ambiguity about precedence and
sequencing. It has the same code=data properties as Lisp. Application
code can read from the program stream. It has strings, arrays and
dictionaries. It has first class procedures which can be constructed
on the fly. I've found it a nice playpen for syntax extension.

I was also on a many-decades long crusade to never use MS Word after that
/first/ time they screwed everyone by changing the interface. And
PostScript has slowly become my tool for that as my programming skill grew.
https://github.com/luser-dr00g/ibis.ps

On another front, I wanted to have parsers in PostScript so I could
evaluate infix math expressions. And I wanted regular expression
matching in PS thinking it would help to implement algorithmic
hyphenation of text.
[Take a look at Forth.  Many of the same advantages, runs a lot more places. -John]

#2569 — Re: implementation languages, was Supporting multiple input syntaxes

From	Thomas Koenig <tkoenig@netcologne.de>
Date	2020-08-24 17:01 +0000
Subject	Re: implementation languages, was Supporting multiple input syntaxes
Message-ID	<20-08-017@comp.compilers>
In reply to	#2567

luser droog <mijoryx@yahoo.com.dmarc.email> wrote:

[PostScript]

> But the language itself I just really enjoy. It's my "Lego blocks"
> language. The RPN syntax removes all ambiguity about precedence and
> sequencing.

I recently had the doubtful pleasure of evaluating the formula

x = ((a-b)*c^2+(-d^2+e^2-a^2+b^2)*c+a^2*b+(f^2-e^2-b^2)*a
    +(-f^2+d^2)*b)/((-2*d+2*e)*c+(2*f-2*e)*a-2*b*(f-d))

in Postscript.  (Yes, really.  Don't ask.)

It was the first time in more than a decade that I wrote a
flex/bison grammar (mostly copied from the bison manual).  It was
faster and less error-prone than trying to do it directly.

The grammar actually generated fairly unidiomatic PostScript
because I made it give names to all the variables, so

a-b

became

a b sub

I'm sure a real PostScript aficionado would have done it all
on the stack :-)
[Turning infix into RPN is a pretty basic intro compiler course exercise.  Conceptually,
you make the parse tree and then do a postorder tree walk.  Or if you're using yacc or
bison, in the expression grammar you just print the operator or token in the code for each
rule because they are recognized in the right order, e.g.:

expr:   VARIABLE { printf(" %s ", $1); }
  | expr '+' expr { printf(" add "); }
  | expr '-' expr { printf(" sub "); }
  | '(' expr ')' { /* print nothing */ }
  ;

-John]
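
The tree-walk variant described in the note above can be sketched in a few lines of Python. This is an illustration only: the tokenizer, the precedence table and all function names are hypothetical, and besides the `add` and `sub` seen in this thread it assumes the standard PostScript operators `mul` and `div`. A small precedence-climbing parser builds the tree, and a postorder walk emits operands first and the operator last.

```python
import re

OPS = {"+": (1, "add"), "-": (1, "sub"), "*": (2, "mul"), "/": (2, "div")}

def tokenize(text):
    return re.findall(r"[A-Za-z_]\w*|\d+|[-+*/()]", text)

def parse_expr(toks, min_prec=1):
    """Precedence climbing; returns a nested tuple tree, consuming toks."""
    left = parse_atom(toks)
    while toks and toks[0] in OPS and OPS[toks[0]][0] >= min_prec:
        op = toks.pop(0)
        right = parse_expr(toks, OPS[op][0] + 1)   # left-associative
        left = (op, left, right)
    return left

def parse_atom(toks):
    tok = toks.pop(0)
    if tok == "(":
        inner = parse_expr(toks)
        toks.pop(0)                                 # discard ")"
        return inner
    return tok

def postorder(tree):
    """Operands first, operator last: exactly RPN order."""
    if isinstance(tree, str):
        return [tree]
    op, left, right = tree
    return postorder(left) + postorder(right) + [OPS[op][1]]

def to_postscript(text):
    return " ".join(postorder(parse_expr(tokenize(text))))
```

For the example in the post, `to_postscript("a-b")` gives `a b sub`. Parentheses leave no trace in the output because the tree already encodes the grouping.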

#2571 — Re: implementation languages, was Supporting multiple input syntaxes

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-28 10:56 -0700
Subject	Re: implementation languages, was Supporting multiple input syntaxes
Message-ID	<20-08-019@comp.compilers>
In reply to	#2569

On Monday, August 24, 2020 at 2:12:13 PM UTC-5, Thomas Koenig wrote:
> luser droog <mijoryx@yahoo.com.dmarc.email> wrote:
>
> [PostScript]
>
> > But the language itself I just really enjoy. It's my "Lego blocks"
> > language. The RPN syntax removes all ambiguity about precedence and
> > sequencing.
>
> I recently had the doubtful pleasure of evaluating the formula
>
> x = ((a-b)*c^2+(-d^2+e^2-a^2+b^2)*c+a^2*b+(f^2-e^2-b^2)*a
>     +(-f^2+d^2)*b)/((-2*d+2*e)*c+(2*f-2*e)*a-2*b*(f-d))
>
> in Postscript.  (Yes, really.  Don't ask.)
>

In case you need it, I've got a PostScript debugger that can single
step into loops and procedures.

https://github.com/luser-dr00g/debug.ps

$ gsnd db5.ps
GPL Ghostscript 9.52 (2020-03-19)
Copyright (C) 2020 Artifex Software, Inc.  All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
GS>{ 0 0 1 5 { add } for } stepon traceon debug
 %|-
0  %|- 0
0  %|- 0 0
1  %|- 0 0 1
5  %|- 0 0 1 5
{add}  %|- 0 0 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?s
 %|- 0 0
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
 %|- 0
1  %|- 0 1
1  %|- 0 1 1
5  %|- 0 1 1 5
{add}  %|- 0 1 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
 %|- 0 1
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
 %|- 1
2  %|- 1 2
1  %|- 1 2 1
5  %|- 1 2 1 5
{add}  %|- 1 2 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
 %|- 1 2
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
 %|- 3
3  %|- 3 3
1  %|- 3 3 1
5  %|- 3 3 1 5
{add}  %|- 3 3 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
 %|- 3 3
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
 %|- 6
4  %|- 6 4
1  %|- 6 4 1
5  %|- 6 4 1 5
{add}  %|- 6 4 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
 %|- 6 4
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
 %|- 10
5  %|- 10 5
1  %|- 10 5 1
5  %|- 10 5 1 5
{add}  %|- 10 5 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
 %|- 10 5
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
 %|- 15
6  %|- 15 6
1  %|- 15 6 1
5  %|- 15 6 1 5
{add}  %|- 15 6 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
GS<1>==
15
GS>quit

[This is drifting rather far from compilers now. -John]

#2570 — Re: stack languages, was Supporting multiple input syntaxes

From	luser droog <mijoryx@yahoo.com.dmarc.email>
Date	2020-08-24 12:14 -0700
Subject	Re: stack languages, was Supporting multiple input syntaxes
Message-ID	<20-08-018@comp.compilers>
In reply to	#2567

On Monday, August 24, 2020 at 10:42:33 AM UTC-5, luser droog wrote:
> > [Why Postscript?  I realize it's Turing complete, but it seems odd to run ones parser on a printer. -John]
>
> I discovered PostScript around '97 or '98. I was taking Computer Graphics
> and it was in an Appendix to the textbook (Salman). At the same time
> I was editor of the Honors College student magazine so it really piqued
> my interest as a graphics and typography language. ...

> [Take a look at Forth.  Many of the same advantages, runs a lot more places. -John]

Good suggestion. I have looked at Forth quite a bit. I lurked in
comp.lang.forth for a number of years. I've got a half-written
interpreter that stalled because my vm doesn't have any I/O.

https://groups.google.com/d/topic/comp.lang.forth/Y1XlX8wD3RQ/discussion
https://retrocomputing.stackexchange.com/questions/6610/how-to-do-i-o-with-emulation-code

I went down a wild rabbit hole after discovering the document "X86 is an
octal machine" and tried to recode the assembler macros using more octal.
But I kind of stalled on that whole area since the first thing I would
want in Forth is tagged objects like PostScript has. There's Oforth and
8th which both supply that sort of thing, but then I'd probably miss
the PS graphics functions... :)

I've also played with APL and tried writing a few interpreters for it.
But the common thread among all these interpreters was coding them all
in C. So I turned my attention to compiling and analyzing C code.
A friend of mine was wanting a really customizable C formatter so I
thought I might be able to make a tool to accommodate lots of different
backends for doing something with the parse tree or syntax tree.
I want to be able to write C99 code and transpile it automatically to
something that will work with the MS compiler without having to maintain
any MS business in the "master" source.

#2565

From	gah4 <gah4@u.washington.edu>
Date	2020-08-23 14:26 -0700
Message-ID	<20-08-013@comp.compilers>
In reply to	#2554

On Wednesday, August 12, 2020 at 3:32:56 PM UTC-7, luser droog wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.

(snip)

I remember using a later C compiler that accepted
the older form of assignment operators, such as =+, =-, etc.
Presumably for those with older code.

I then had to put an extra space when assigning negative values:

   i= -4;

I don't remember now how I found out about that one.

That is the only old C syntax I remember.
(And much later than 1975.)
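
The reason the space matters can be sketched in a few lines of C. This is only an illustration, not code from any real compiler: the token names and the function scan_op are made up here, and the old_ops flag stands in for whatever compatibility switch such a compiler had. With the old operators enabled, a longest-match scanner reads "=-" as one token (the old spelling of "-="), so i=-4 would subtract 4 from i instead of assigning -4.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token kinds, for this sketch only. */
enum tok { T_ASSIGN, T_ASSIGN_SUB, T_EQ, T_MINUS, T_OTHER };

/* Scan one operator token starting at s and store its length in *len.
   When old_ops is nonzero, "=-" is taken as the old compound
   assignment (what is now written "-="). */
static enum tok scan_op(const char *s, int old_ops, int *len)
{
    assert(s != NULL && len != NULL);
    if (s[0] == '=' && s[1] == '=') { *len = 2; return T_EQ; }
    if (s[0] == '=' && s[1] == '-' && old_ops) { *len = 2; return T_ASSIGN_SUB; }
    if (s[0] == '=') { *len = 1; return T_ASSIGN; }
    if (s[0] == '-') { *len = 1; return T_MINUS; }
    *len = 1;
    return T_OTHER;   /* everything else is out of scope here */
}
```

Calling scan_op("=-4", 1, &len) yields T_ASSIGN_SUB with a length of 2, while scan_op("= -4", 1, &len) yields a plain T_ASSIGN because the space ends the token after the "=".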

#2629

From	antispam@math.uni.wroc.pl
Date	2021-02-11 23:27 +0000
Message-ID	<21-02-004@comp.compilers>
In reply to	#2554

luser droog <mijoryx@yahoo.com.dmarc.email> wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?
>
> My feeling is that separating them will be less headache, but maybe
> there's some advantage to changing out smaller pieces of the grammar
> in that it might be easier to make sure that they produce the same
> structure compatible with the backend.
>
> Any guidance in this area?
>
> https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c
>
> [Really, it's up to you.  My inclination would be to make them
> separate but use some sort of macro setup so you can insert
> common pieces into each of the grammars. -John]

GNU Pascal supports several Pascal dialects.  GNU Pascal uses a
unified parser for all dialects.  Some ideas used:
- flags in the scanner decide whether dialect-specific tokens are
  recognized
- superset parsing: several constructs are generalized so
  that a single construct represents things that otherwise
  would lead to conflicts.  A later semantic stage looks at
  the dialect flags and prunes things not allowed in the given
  dialect.  An example of a superset construction is the rule
  'call_or_cast', which handles several syntactically similar
  constructs that are usually given by separate syntax
  rules.  Semantic rules use types, besides the dialect flags,
  to decide the meaning.
- even after using the two tricks above the grammar still has
  LALR conflicts; they are resolved using the GLR option
  of Bison.  All conflicts are resolvable using lookahead,
  and AFAICS some are only resolvable with lookahead.
  Parser lookahead means that the traditional trick of
  passing semantic info back to the scanner does not work
  (parser actions are delayed, so the scanner may be forced
  to produce a token before semantic info is available).
  Still, it seems that GLR leads to a cleaner parser.
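
The first idea in the list (scanner flags) might look roughly like this when transposed to the C dialects discussed in this thread. It is a minimal sketch, not GNU Pascal's code: the dialect bits, the table layout and the function classify_word are all invented for illustration, and the keyword table is deliberately tiny. Each keyword carries a mask of the dialects that reserve it; in any other dialect the same spelling comes back as an ordinary identifier.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical dialect bits and token classes for this sketch. */
enum { D_KR1 = 1, D_C90 = 2, D_C99 = 4 };
enum { TOK_IDENT = 0, TOK_KEYWORD = 1 };

struct kw { const char *text; unsigned dialects; };

/* A few sample entries only; a real table would list every keyword. */
static const struct kw keywords[] = {
    { "int",      D_KR1 | D_C90 | D_C99 },
    { "entry",    D_KR1 },            /* reserved in K&R1, later dropped */
    { "const",    D_C90 | D_C99 },
    { "volatile", D_C90 | D_C99 },
    { "restrict", D_C99 },
    { "inline",   D_C99 },
};

/* Classify a scanned word under the dialect(s) selected in 'dialect'. */
static int classify_word(const char *word, unsigned dialect)
{
    size_t i;
    assert(word != NULL);
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if ((keywords[i].dialects & dialect) &&
            strcmp(keywords[i].text, word) == 0)
            return TOK_KEYWORD;
    return TOK_IDENT;
}
```

With this table, classify_word("restrict", D_C99) reports a keyword, while classify_word("restrict", D_C90) reports an identifier, so old code that uses restrict as a variable name still scans.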

My impression is that variation in Pascal dialects is larger
than in C dialects, so the case for a unified parser in C IMHO
is much stronger.  OTOH GNU Pascal is a full compiler with
semantic actions invoked from grammar rules.  Semantic code
embedded in the parser changed much more than the grammar rules,
so maintaining separate parsers probably would be a
nightmare.
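
For a unified C parser, the superset-then-prune idea could be sketched as follows. This is an assumption-laden illustration, not anything taken from GNU Pascal: the dialect flags, the item kinds and check_block are hypothetical names. The grammar would accept declarations and statements in any order inside a block (the C99 rule), and a later pass rejects the mixture when the selected dialect is C90, where declarations must come first.

```c
#include <assert.h>
#include <stddef.h>

enum { D_C90 = 1, D_C99 = 2 };            /* hypothetical dialect flags */
enum item_kind { ITEM_DECL, ITEM_STMT };  /* hypothetical block items   */

/* Return the index of the first item the dialect does not allow,
   or -1 if the block is acceptable. */
static int check_block(const enum item_kind *items, size_t n,
                       unsigned dialect)
{
    size_t i;
    int seen_stmt = 0;
    assert(items != NULL || n == 0);
    if (dialect & D_C99)
        return -1;          /* mixed declarations and code are fine */
    for (i = 0; i < n; i++) {
        if (items[i] == ITEM_STMT)
            seen_stmt = 1;
        else if (seen_stmt)
            return (int)i;  /* declaration after a statement */
    }
    return -1;
}
```

The parser itself stays the same for both dialects; only this check consults the flag, which keeps the dialect differences out of the grammar rules.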

--
                              Waldek Hebisch

#2630

From	Elijah Stone <elronnd@elronnd.net>
Date	2021-02-17 01:44 -0800
Message-ID	<21-02-005@comp.compilers>
In reply to	#2629

On Thu, 11 Feb 2021, antispam@math.uni.wroc.pl wrote:

> My impression is that variation in Pascal dialects is larger than in C
> dialects, so case for unified parser in C IMHO

Pascal is more fragmented, but it's also much easier to parse than C.  I
think it's a wash.

(I also think the whole idea is horrifying and ought not to be pursued;
but.)

  -E
