| Started by | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| First post | 2020-08-12 15:20 -0700 |
| Last post | 2021-03-14 17:36 -0700 |
| Articles | 3 on this page of 23 — 12 participants |
Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-12 15:20 -0700
Re: Supporting multiple input syntaxes Kaz Kylheku <793-849-0957@kylheku.com> - 2020-08-13 00:43 +0000
Re: Supporting multiple input syntaxes Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-08-13 10:27 +0200
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-13 21:36 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 22:16 -0700
Re: Supporting multiple input syntaxes minforth@arcor.de - 2020-08-13 14:15 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-13 21:37 -0700
RE: Supporting mulitple input syntaxes Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-08-14 12:44 +0300
Supporting multiple input syntaxes David Lovemore <davidlovemore@gmail.com> - 2020-08-15 06:42 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-15 15:20 -0700
Re: Supporting multiple input syntaxes David Lovemore <davidlovemore@gmail.com> - 2020-08-16 02:21 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-20 14:45 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 19:35 -0700
Re: implementation languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 20:39 -0700
Re: implementation languages, was Supporting multiple input syntaxes Thomas Koenig <tkoenig@netcologne.de> - 2020-08-24 17:01 +0000
Re: implementation languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-28 10:56 -0700
Re: stack languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-24 12:14 -0700
Re: Supporting multiple input syntaxes gah4 <gah4@u.washington.edu> - 2020-08-23 14:26 -0700
Re: Supporting multiple input syntaxes antispam@math.uni.wroc.pl - 2021-02-11 23:27 +0000
Re: Supporting multiple input syntaxes Elijah Stone <elronnd@elronnd.net> - 2021-02-17 01:44 -0800
Re: Supporting multiple input syntaxes antispam@math.uni.wroc.pl - 2021-02-23 23:28 +0000
Re: Supporting multiple input syntaxes "ltc...@gmail.com" <ltcmelo@gmail.com> - 2021-03-14 21:08 -0400
Re: Supporting multiple input syntaxes Rock Brentwood <rockbrentwood@gmail.com> - 2021-03-14 17:36 -0700
| From | antispam@math.uni.wroc.pl |
|---|---|
| Date | 2021-02-23 23:28 +0000 |
| Message-ID | <21-02-008@comp.compilers> |
| In reply to | #2630 |
Elijah Stone <elronnd@elronnd.net> wrote:
> On Thu, 11 Feb 2021, antispam@math.uni.wroc.pl wrote:
>
> > My impression is that variation in Pascal dialects is larger than in C
> > dialects, so case for unified parser in C IMHO
>
> Pascal is more fragmented, but it's also much easier to parse than C. I
> think it's a wash.
I wrote a C parser, and it was not hard at all. In C (as in standard
Pascal) there are conflicts, but those conflicts can be resolved
easily using semantic info. Alternatively, for C one can use 2
tokens of lookahead. The Turbo Pascal folks introduced an "interesting"
difficulty with caret constants. Frank Heckenbach worked out
how to handle them, and his analysis indicates that correct
handling of Turbo Pascal needs, IIRC, 6 tokens of lookahead.
Note that for both Pascal and C, with 1 token of lookahead,
semantic info is available when it is needed to disambiguate
parsing; once you have more than 1 token of lookahead,
semantic info sometimes arrives too late, and in effect the parser
must work purely syntactically.
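The "semantic info" route for C is commonly implemented as the so-called lexer hack: the lexer consults a table of typedef names in scope and hands the parser a distinct token kind for them, so a single token of lookahead picks the production. The classic case is a statement like `x * y;`, which declares `y` as a pointer if `x` names a type and is a multiplication otherwise. The sketch below is a minimal illustration under assumptions: the flat, fixed-size table and the names `typedef_table`, `add_typedef`, `classify_identifier` and `classify_star_statement` are hypothetical, and a real compiler would track scopes. It also hints at why deeper lookahead hurts: if tokens past a `typedef` declaration were already classified before the parser processed that declaration, the table lookup comes too late.

```c
#include <stdbool.h>
#include <string.h>

#define MAX_TYPEDEFS 64

/* Hypothetical, flat symbol table of typedef names; a real compiler
   would track scopes. Stored name pointers must outlive the table. */
struct typedef_table {
    const char *names[MAX_TYPEDEFS];
    int count;
};

/* Record a name introduced by a typedef declaration. */
bool add_typedef(struct typedef_table *t, const char *name)
{
    if (t->count >= MAX_TYPEDEFS)
        return false;
    t->names[t->count++] = name;
    return true;
}

static bool is_typedef_name(const struct typedef_table *t, const char *name)
{
    for (int i = 0; i < t->count; i++)
        if (strcmp(t->names[i], name) == 0)
            return true;
    return false;
}

enum token_kind { TOK_IDENTIFIER, TOK_TYPEDEF_NAME };

/* The lexer hack: the lexer asks the symbol table, so the parser
   receives two different token kinds for identifiers. */
enum token_kind classify_identifier(const struct typedef_table *t, const char *name)
{
    return is_typedef_name(t, name) ? TOK_TYPEDEF_NAME : TOK_IDENTIFIER;
}

enum stmt_kind { STMT_DECLARATION, STMT_EXPRESSION };

/* For a statement of the form `first * second ;`, the token kind of
   `first` (one token of lookahead) selects the production. */
enum stmt_kind classify_star_statement(const struct typedef_table *t, const char *first)
{
    return classify_identifier(t, first) == TOK_TYPEDEF_NAME
               ? STMT_DECLARATION   /* after `typedef int x;`, `x * y;` declares y */
               : STMT_EXPRESSION;   /* otherwise `x * y;` is a multiplication */
}
```

For example, after `add_typedef(&t, "x")`, the statement `x * y;` is classified as a declaration, while `a * b;` stays an expression.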
> (I also think the whole idea is horrifying and ought not to be pursued;
> but.)
What do you mean by "whole idea"? Do you think that creating a
compiler that can correctly handle multiple dialects (of Pascal
or another language) is wrong?
--
Waldek Hebisch
| From | "ltc...@gmail.com" <ltcmelo@gmail.com> |
|---|---|
| Date | 2021-03-14 21:08 -0400 |
| Message-ID | <21-03-001@comp.compilers> |
| In reply to | #2633 |
> Elijah Stone <elr...@elronnd.net> wrote:
> I wrote a C parser, and it was not hard at all. In C (as in standard
> Pascal) there are conflicts, but those conflicts can be resolved
> easily using semantic info. Alternatively, for C one can use 2
> tokens of lookahead.
I'm not sure whether I captured the full context of your statement, but if I
did, I don't think it's 100% correct:
- Regarding a lookahead of 2:
This isn't enough to disambiguate, e.g., between a cast-expression (C standard
6.5.4), `( type-name ) cast-expression`, and a compound literal (6.5.2),
`( type-name ) { initializer-list }`.
- Regarding semantic info:
Yes, with semantic info you can disambiguate things like `x * y;`, so I'd say
that, from a practical standpoint, this claim is right.
However, from a more theoretical perspective, a parser (thought of as a
program that "simply" validates a sentence against a grammar) arguably isn't
expected to rely on anything other than syntax. Whether or not the theoretical
aspect is relevant depends on the application of the parser, I guess. For
instance, when implementing a static analysis tool, depending as little as
possible on semantic information to guide parsing is an advantage.
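The cast versus compound-literal case can be made concrete. Both productions begin with `( type-name )`, and a type-name can span arbitrarily many tokens, so a parser that must choose the production while looking at the `(` needs unbounded lookahead. One common workaround, shown here as a hedged sketch rather than as how either poster's parser works, is to left-factor the shared prefix: consume `( type-name )` once, then let the single token after `)` decide. Assumptions: tokens arrive as a NULL-terminated array of spellings, `starts_type_name` is a hypothetical stand-in that knows only a few keywords (recognizing typedef names would bring semantic info back in), and the type-name is skipped by matching parentheses rather than actually parsed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum paren_kind { PAREN_CAST, PAREN_COMPOUND_LITERAL, PAREN_EXPRESSION, PAREN_ERROR };

/* Hypothetical stand-in: a real parser would also accept typedef names
   here, which needs the symbol table (semantic info). */
static bool starts_type_name(const char *tok)
{
    static const char *const type_words[] = {
        "void", "char", "int", "long", "unsigned", "struct", "const"
    };
    for (size_t i = 0; i < sizeof type_words / sizeof type_words[0]; i++)
        if (strcmp(tok, type_words[i]) == 0)
            return true;
    return false;
}

/* Classify the construct starting at toks[0], which should be "(".
   toks is a NULL-terminated array of token spellings. */
enum paren_kind classify_paren(const char *const *toks)
{
    if (toks[0] == NULL || strcmp(toks[0], "(") != 0)
        return PAREN_ERROR;
    if (toks[1] == NULL || !starts_type_name(toks[1]))
        return PAREN_EXPRESSION;

    /* Shared prefix "( type-name )": skip to the matching ")", however
       many tokens the type-name spans (simplified: not validated). */
    size_t i = 1;
    int depth = 1;
    while (toks[i] != NULL) {
        if (strcmp(toks[i], "(") == 0)
            depth++;
        else if (strcmp(toks[i], ")") == 0 && --depth == 0)
            break;
        i++;
    }
    if (toks[i] == NULL || toks[i + 1] == NULL)
        return PAREN_ERROR;

    /* Only here, one token past ")", do the two productions differ. */
    return strcmp(toks[i + 1], "{") == 0 ? PAREN_COMPOUND_LITERAL : PAREN_CAST;
}
```

With this factoring, `(unsigned long *) p` and `(struct point) { 1, 2 }` are told apart by a single token after the closing parenthesis, regardless of how long the type-name is. A table-driven LR parser gets the same effect without explicit factoring, since it only reduces once it has seen that token.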
This is a table (only for expressions) that I recently put up when rewriting
the C parser of my project:
https://docs.google.com/spreadsheets/d/1oGjtFaqLzSoBEp2aGNgHrbEHxSi4Ijv57mXMPymZEcQ/edit?usp=sharing
--
Leandro T. C. Melo
| From | Rock Brentwood <rockbrentwood@gmail.com> |
|---|---|
| Date | 2021-03-14 17:36 -0700 |
| Message-ID | <21-03-003@comp.compilers> |
| In reply to | #2554 |
On Wednesday, August 12, 2020 at 5:32:56 PM UTC-5, luser droog wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?

I'm in a similar situation with a utility for which I want to grandfather in the old syntax, while writing it in a new and better syntax.

My recommendation is this: stick to C99, since that's already in POSIX. Write a separate utility to convert legacy syntax to C99 (and to call out any irregularities or inconsistencies in the program being converted). That's, like, "lint" on steroids. The other syntaxes would be handled only by the conversion utilities, one syntax per utility.

It can also be hybridized with "indent", and a driver routine can control the conversion, so that all the conversion utilities can be combined into one. On input, the source syntax is selected, and on output the format is driven in much the same way as it is with indent. It's an excellent exercise in text-to-AST-to-text programming.

Each program, once converted up to C99 and once it passes the consistency checks provided by the utility, would replace the original; so there is no question of tailoring error messages to the format of the older program, because the older program would be gone.

Doing all of this is an example of "refactoring" used to pay off "code debt". And there's a lot of code debt out there that needs to be paid off.

Technical Debt (Wikipedia): https://en.wikipedia.org/wiki/Technical_debt
Code Refactoring (Wikipedia): https://en.wikipedia.org/wiki/Code_refactoring
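As a deliberately tiny instance of the legacy-to-C99 conversion recommended above: the circa-1975 C reference manual spelled compound assignment operators `=+`, `=-`, `=<<` and so on, where later C writes `+=`, `-=`, `<<=`. The hypothetical `modernize_assign_ops` below rewrites them in place (each new spelling has the same length as the old one), skipping string and character literals, comments, and the comparison tokens `==`, `!=`, `<=`, `>=`. It is a token-level sketch, not a full parser, and it assumes the old lexer's greedy reading, so `y=-1` becomes `y-=1`. That is exactly the kind of irregularity such a tool should call out, since a modern reader would take it as assigning -1; the return value counts rewrites so a caller can report them.

```c
#include <stddef.h>
#include <string.h>

/* Length of an old-style "=op" operator starting at s, or 0 if none.
   Assumes the greedy lexing of early C: "=-" is one token. */
static size_t old_assign_op_len(const char *s)
{
    if (s[0] != '=')
        return 0;
    if (s[1] == '/' && s[2] == '*')      /* "=" followed by a comment */
        return 0;
    if ((s[1] == '<' && s[2] == '<') || (s[1] == '>' && s[2] == '>'))
        return 3;                        /* =<< and =>> */
    if (s[1] != '\0' && strchr("+-*/%&|^", s[1]) != NULL)
        return 2;                        /* =+ =- =* =/ =% =& =| =^ */
    return 0;
}

/* Skip a string or character literal starting at its opening quote. */
static char *skip_literal(char *p)
{
    char quote = *p++;
    while (*p != '\0' && *p != quote) {
        if (*p == '\\' && p[1] != '\0')
            p++;                         /* skip the escaped character */
        p++;
    }
    return *p != '\0' ? p + 1 : p;
}

/* Hypothetical converter: rewrites "=op" to "op=" in place and returns
   the number of rewrites. Each rewrite keeps the same length, so no
   reallocation is needed. */
int modernize_assign_ops(char *src)
{
    int rewrites = 0;
    char *p = src;

    while (*p != '\0') {
        if (*p == '"' || *p == '\'') {
            p = skip_literal(p);
        } else if (p[0] == '/' && p[1] == '*') {
            char *end = strstr(p + 2, "*/");
            p = end != NULL ? end + 2 : p + strlen(p);
        } else if (strchr("=!<>", p[0]) != NULL && p[1] == '=') {
            p += 2;                      /* ==, !=, <=, >= */
        } else if ((p[0] == '<' || p[0] == '>') && p[1] == p[0]) {
            p += 2;                      /* << and >> */
        } else {
            size_t n = old_assign_op_len(p);
            if (n > 0) {
                memmove(p, p + 1, n - 1);    /* shift the operator left */
                p[n - 1] = '=';              /* and move '=' to the end */
                rewrites++;
                p += n;
            } else {
                p++;
            }
        }
    }
    return rewrites;
}
```

For example, a buffer holding `x =+ 1;` becomes `x += 1;` and the call returns 1. A real converter would work on the AST, as suggested above, so that it could also reformat the output and flag every rewritten `=-`, `=*` and `=&` for review.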