Supporting multiple input syntaxes

Started by	luser droog <mijoryx@yahoo.com.dmarc.email>
First post	2020-08-12 15:20 -0700
Last post	2021-03-14 17:36 -0700
Articles	3 on this page of 23 — 12 participants

  Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-12 15:20 -0700
    Re: Supporting multiple input syntaxes Kaz Kylheku <793-849-0957@kylheku.com> - 2020-08-13 00:43 +0000
    Re: Supporting multiple input syntaxes Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-08-13 10:27 +0200
      Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-13 21:36 -0700
      Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 22:16 -0700
    Re: Supporting multiple input syntaxes minforth@arcor.de - 2020-08-13 14:15 -0700
      Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-13 21:37 -0700
      RE: Supporting mulitple input syntaxes Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-08-14 12:44 +0300
    Supporting multiple input syntaxes David Lovemore <davidlovemore@gmail.com> - 2020-08-15 06:42 -0700
      Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-15 15:20 -0700
        Re: Supporting multiple input syntaxes David Lovemore <davidlovemore@gmail.com> - 2020-08-16 02:21 -0700
          Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-20 14:45 -0700
            Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 19:35 -0700
              Re: implementation languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 20:39 -0700
                Re: implementation languages, was Supporting multiple input syntaxes Thomas Koenig <tkoenig@netcologne.de> - 2020-08-24 17:01 +0000
                  Re: implementation languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-28 10:56 -0700
                Re: stack languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-24 12:14 -0700
    Re: Supporting multiple input syntaxes gah4 <gah4@u.washington.edu> - 2020-08-23 14:26 -0700
    Re: Supporting multiple input syntaxes antispam@math.uni.wroc.pl - 2021-02-11 23:27 +0000
      Re: Supporting multiple input syntaxes Elijah Stone <elronnd@elronnd.net> - 2021-02-17 01:44 -0800
        Re: Supporting multiple input syntaxes antispam@math.uni.wroc.pl - 2021-02-23 23:28 +0000
          Re: Supporting multiple input syntaxes "ltc...@gmail.com" <ltcmelo@gmail.com> - 2021-03-14 21:08 -0400
    Re: Supporting multiple input syntaxes Rock Brentwood <rockbrentwood@gmail.com> - 2021-03-14 17:36 -0700

#2633

From	antispam@math.uni.wroc.pl
Date	2021-02-23 23:28 +0000
Message-ID	<21-02-008@comp.compilers>
In reply to	#2630

Elijah Stone <elronnd@elronnd.net> wrote:
> On Thu, 11 Feb 2021, antispam@math.uni.wroc.pl wrote:
>
> > My impression is that variation in Pascal dialects is larger than in C
> > dialects, so case for unified parser in C IMHO
>
> Pascal is more fragmented, but it's also much easier to parse than C.  I
> think it's a wash.

I did a C parser, it was not hard at all.  In C (as in standard
Pascal) there are conflicts, but those conflicts can be resolved
easily using semantic info.  Alternatively, for C one can use 2
tokens of lookahead.  Turbo Pascal folks introduced an "interesting"
difficulty with caret constants.  Frank Heckenbach worked out
how to handle them, and his analysis indicates that correct
handling of Turbo Pascal needs IIRC 6 tokens of lookahead.

Note that for both Pascal and C, with 1 token of lookahead
semantic info is available when needed to disambiguate
parsing; once you have more than 1 token of lookahead,
semantic info sometimes arrives too late, and in effect the parser
must work purely syntactically.
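
A minimal sketch of the semantic-feedback approach described above, in Python. It is not taken from any real C parser: the function names and the "last token of a typedef is the new type name" rule are simplifying assumptions made only for illustration. The point is that with 1 token of lookahead, the typedef has already been recorded by the time the parser looks at the first token of the next statement.

```python
import re

# Hypothetical tokenizer: identifiers, or any single non-space character.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\S)")

BUILTIN_TYPES = {"int", "char", "long", "short", "float", "double", "void"}


def tokenize(src):
    return TOKEN.findall(src)


def classify_statements(src):
    """Classify each ';'-terminated statement using a typedef table.

    Assumption: every statement is non-empty and ends with ';', and a
    typedef's new name is its last token (true only for simple cases).
    """
    tokens = tokenize(src)
    typedef_names = set()
    kinds = []
    i = 0
    while i < len(tokens):
        end = tokens.index(";", i)
        stmt = tokens[i:end]
        if stmt[0] == "typedef":
            # Semantic action runs before the next statement is examined.
            typedef_names.add(stmt[-1])
            kinds.append("typedef")
        elif stmt[0] in BUILTIN_TYPES or stmt[0] in typedef_names:
            kinds.append("declaration")   # `x * y;` declares y as pointer to x
        else:
            kinds.append("expression")    # `a * b;` is a multiplication
        i = end + 1
    return kinds


print(classify_statements("typedef int x; x * y; a * b;"))
# ['typedef', 'declaration', 'expression']
```

If the parser instead had to read several tokens ahead before running the typedef action, the table lookup for `x` could happen before `x` was registered, which is the "too late" case mentioned above.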

> (I also think the whole idea is horrifying and ought not to be pursued;
> but.)

What do you mean by "whole idea"?  Do you think that creating
a compiler that can correctly handle multiple dialects (of Pascal
or another language) is wrong?

--
                              Waldek Hebisch

#2634

From	"ltc...@gmail.com" <ltcmelo@gmail.com>
Date	2021-03-14 21:08 -0400
Message-ID	<21-03-001@comp.compilers>
In reply to	#2633

> Elijah Stone <elr...@elronnd.net> wrote:
> I did a C parser, it was not hard at all. In C (as in standard
> Pascal) there are conflicts, but those conflicts can be resolved
> easily using semantic info. Alternatively, for C one can use 2
> tokens of lookahead.

I'm not sure whether I captured the full context of your statement, but, if I
did, I don't think it's 100% correct:

- With regard to lookahead of 2:
This isn't enough to disambiguate, e.g., between a cast-expression in 6.5.4,
`( type-name ) cast-expression`, and a compound literal in 6.5.2,
`( type-name ) { initializer-list }`.

- With regard to using semantic info:
Yes, with semantic info you can disambiguate things like `x * y;`, so I'd say
that, from a pragmatic/practical standpoint, this claim is right.
However, from a more theoretical perspective, a parser (thinking of it as a
program that "simply" validates a sentence against a grammar) arguably isn't
expected to rely on anything other than syntax. Whether the theoretical
aspect is relevant depends on the application of the parser, I guess. For
instance, for the implementation of a static analysis tool, depending as
little as possible on semantic information to guide parsing is an advantage.
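
To illustrate the first point, here is a rough Python sketch of one purely syntactic way to tell the two forms apart. It assumes a hand-written predictive parser that wants to pick the production at the opening parenthesis, and that the caller already knows the parentheses enclose a type-name; the function name and token representation are made up for this example. Because a type-name can be arbitrarily long, the number of tokens scanned is not bounded by any fixed k.

```python
def classify_parenthesized(tokens, start=0):
    """Return 'compound-literal' or 'cast' for a '(' at tokens[start].

    Scans to the matching ')' and inspects the token that follows it.
    Assumption: the parenthesized tokens form a type-name.
    """
    if tokens[start] != "(":
        raise ValueError("expected '('")
    depth = 0
    for i in range(start, len(tokens)):
        if tokens[i] == "(":
            depth += 1
        elif tokens[i] == ")":
            depth -= 1
            if depth == 0:
                following = tokens[i + 1] if i + 1 < len(tokens) else None
                return "compound-literal" if following == "{" else "cast"
    raise ValueError("unbalanced parentheses")


print(classify_parenthesized("( struct point ) { 1 , 2 }".split()))
# compound-literal
print(classify_parenthesized("( int ( * ) ( void ) ) f".split()))
# cast
```

A parser that defers the decision until after the closing parenthesis (for example by left-factoring the two productions) would not need this scan; the sketch only shows what deciding up front costs.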

This is a table (only for expressions) that I recently put together when
rewriting the C parser of my project:
https://docs.google.com/spreadsheets/d/1oGjtFaqLzSoBEp2aGNgHrbEHxSi4Ijv57mXMPymZEcQ/edit?usp=sharing

--
Leandro T. C. Melo

#2636

From	Rock Brentwood <rockbrentwood@gmail.com>
Date	2021-03-14 17:36 -0700
Message-ID	<21-03-003@comp.compilers>
In reply to	#2554

On Wednesday, August 12, 2020 at 5:32:56 PM UTC-5, luser droog wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?

I'm in a similar situation with a utility whose old syntax I want to
grandfather in, while writing new input in a new and better syntax. My
recommendation is this: stick to C99, since that's already in POSIX. Write a
separate utility to convert legacy syntax to C99 (and to call out any
irregularities/inconsistencies in the program being converted). That's, like,
"lint" on steroids.

The other syntaxes would be used only in the other utilities, one per
utility. These can also be hybridized with "indent", and a driver routine can
control the conversion, so that all the conversion utilities can be combined
into one. So, on input, the source syntax is selected, and on output the format
is driven in much the same way that it is with indent. It's an excellent
exercise in Text-To-AST-To-Text programming.
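
A toy sketch of that driver shape in Python: one front end per source syntax, a shared AST, and a single C99 printer. Everything here is hypothetical (the names `FRONTENDS`, `parse_c1975`, `emit_c99`, and the restriction to one-line assignments); it only uses the fact that the circa-1975 syntax wrote compound assignment as `=+` where later C writes `+=`.

```python
import re
from typing import NamedTuple


class Assign(NamedTuple):
    """Shared AST node: `target op= value`, with op == '' for plain '='."""
    target: str
    op: str
    value: str


# Old form puts the operator after '=', the newer form puts it before.
OLD_ASSIGN = re.compile(r"\s*(\w+)\s*=([-+*/%]?)\s*(\w+)\s*;\s*$")
NEW_ASSIGN = re.compile(r"\s*(\w+)\s*([-+*/%]?)=\s*(\w+)\s*;\s*$")


def _parse(pattern, line):
    m = pattern.match(line)
    if m is None:
        raise SyntaxError(f"cannot parse: {line!r}")
    return Assign(m.group(1), m.group(2), m.group(3))


def parse_c1975(line):
    return _parse(OLD_ASSIGN, line)


def parse_c99(line):
    return _parse(NEW_ASSIGN, line)


# The driver selects the source syntax on input...
FRONTENDS = {"c1975": parse_c1975, "c99": parse_c99}


def emit_c99(node):
    # ...and always prints the same target syntax on output.
    return f"{node.target} {node.op}= {node.value};"


def convert(line, dialect):
    return emit_c99(FRONTENDS[dialect](line))


print(convert("x =+ 1;", "c1975"))  # x += 1;
print(convert("y = 2;", "c1975"))   # y = 2;
```

A real converter would also have to flag inputs such as `x=-1;`, which the old syntax reads as a compound assignment; that is the kind of irregularity the consistency checks mentioned above would report.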

Each program, upon upward conversion to C99, would replace the original, once
it passes the consistency checks provided by the utility; so there isn't a
question of cueing error messages to the format of the older program, because
the older program would be replaced. Doing all of this is an example of
"refactoring" used to pay off "code debt". And there's a lot of code debt out
there that needs to be paid up.

Technical Debt (Wikipedia): https://en.wikipedia.org/wiki/Technical_debt
Code Refactoring (Wikipedia): https://en.wikipedia.org/wiki/Code_refactoring
