Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #2554 > unrolled thread
| Started by | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| First post | 2020-08-12 15:20 -0700 |
| Last post | 2021-03-14 17:36 -0700 |
| Articles | 20 on this page of 23 — 12 participants |
Back to article view | Back to comp.compilers
Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-12 15:20 -0700
Re: Supporting multiple input syntaxes Kaz Kylheku <793-849-0957@kylheku.com> - 2020-08-13 00:43 +0000
Re: Supporting multiple input syntaxes Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2020-08-13 10:27 +0200
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-13 21:36 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 22:16 -0700
Re: Supporting multiple input syntaxes minforth@arcor.de - 2020-08-13 14:15 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-13 21:37 -0700
RE: Supporting mulitple input syntaxes Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2020-08-14 12:44 +0300
Supporting multiple input syntaxes David Lovemore <davidlovemore@gmail.com> - 2020-08-15 06:42 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-15 15:20 -0700
Re: Supporting multiple input syntaxes David Lovemore <davidlovemore@gmail.com> - 2020-08-16 02:21 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-20 14:45 -0700
Re: Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 19:35 -0700
Re: implementation languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-23 20:39 -0700
Re: implementation languages, was Supporting multiple input syntaxes Thomas Koenig <tkoenig@netcologne.de> - 2020-08-24 17:01 +0000
Re: implementation languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-28 10:56 -0700
Re: stack languages, was Supporting multiple input syntaxes luser droog <mijoryx@yahoo.com.dmarc.email> - 2020-08-24 12:14 -0700
Re: Supporting multiple input syntaxes gah4 <gah4@u.washington.edu> - 2020-08-23 14:26 -0700
Re: Supporting multiple input syntaxes antispam@math.uni.wroc.pl - 2021-02-11 23:27 +0000
Re: Supporting multiple input syntaxes Elijah Stone <elronnd@elronnd.net> - 2021-02-17 01:44 -0800
Re: Supporting multiple input syntaxes antispam@math.uni.wroc.pl - 2021-02-23 23:28 +0000
Re: Supporting multiple input syntaxes "ltc...@gmail.com" <ltcmelo@gmail.com> - 2021-03-14 21:08 -0400
Re: Supporting multiple input syntaxes Rock Brentwood <rockbrentwood@gmail.com> - 2021-03-14 17:36 -0700
Page 1 of 2 [1] 2 Next page →
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-12 15:20 -0700 |
| Subject | Supporting multiple input syntaxes |
| Message-ID | <20-08-002@comp.compilers> |
I've got my project successfully parsing the circa-1975 C syntax from that old manual. I'd like to add parsers for K&R1 and c90 syntaxes. How separate should these be? Should they be complete separate grammars, or more piecewise selection? My feeling is that separating them will be less headache, but maybe there's some advantage to changing out smaller pieces of the grammar in that it might be easier to make sure that they produce the same structure compatible with the backend. Any guidance in this area? https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c [Really, it's up to you. My inclination would be to make them separate but use some sort of macro setup so you can insert common pieces into each of the grammars. -John]
[toc] | [next] | [standalone]
| From | Kaz Kylheku <793-849-0957@kylheku.com> |
|---|---|
| Date | 2020-08-13 00:43 +0000 |
| Message-ID | <20-08-003@comp.compilers> |
| In reply to | #2554 |
On 2020-08-12, luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email> wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?
>
> My feeling is that separating them will be less headache, but maybe
> there's some advantage to changing out smaller pieces of the grammar
> in that it might be easier to make sure that they produce the same
> structure compatible with the backend.
>
> Any guidance in this area?
>
> https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c
I'd say that since you're not using a parser generator, but using code
statements to construct the grammar objects at initialization time, you
have the flexibility to merge the implementation, because you can check
the value of some dialect-selecting variable, and construct the parser
accordingly, and elsewhere check that same variable to do whatever else
needs to be done conditionally.
The trick is to find a way to embed the *semantics* of the older dialects
into the new so then everything after the parsing can be shared.
Similar remarks would apply to recursive descent.
If you were using something clunky like a Yacc, there are still ways
to combine everything into a single grammar. The input stream can be
primed with one of several "secret internal token" objects that has no
lexeme. (Primed, meaming that the first call to the lexer yields this
secret token instead of processing actual input.) Each token indicates
a dialect to parse. The top-level grammar production can then pick
one of several subordinate production rules corresponding to the entry
points for the respective dialects. Those can then share common rules
as much as possible.
translation_unit : C75_TOKEN c75_translation_unit /* orig flavor */
| C79_TOKEN c79_translation_unit /* "K&R" */
| C90_TOKEN c90_translation_unit /* ANSI/ISO */
;
--
TXR Programming Lanuage: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1
[toc] | [prev] | [next] | [standalone]
| From | Hans-Peter Diettrich <DrDiettrich1@netscape.net> |
|---|---|
| Date | 2020-08-13 10:27 +0200 |
| Message-ID | <20-08-004@comp.compilers> |
| In reply to | #2554 |
Am 13.08.2020 um 00:20 schrieb luser droog: > I've got my project successfully parsing the circa-1975 C syntax > from that old manual. I'd like to add parsers for K&R1 and c90 > syntaxes. > > How separate should these be? Should they be complete > separate grammars, or more piecewise selection? IMO this depends widely on the usage of the parser output (diagnostics, backend...). C90 is much stricter than K&R, requires more checks. Do you need extensive error diagnostics, or do you assume that all source code is free of errors? > https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c You seem to implement an LL(1) parser? My C98 Parser is LL(2), i.e. an LL(1) parser with one or two locations where more lookahead is required. Also identifiers are classified as typenames and others prior to their usage. For real-world testing (recommended!) a preprocessor is required and a copy of the standard libraries of existing compiler(s). Your test_syntax() source misses "=" from the variable declarations (initializers). What about pointer syntax/semantics? If you add these (and other) syntax differences conditionally (version specific) to your code, which way would look better to you? Which way will be safer to maintain? Nice code BTW :-) DoDi
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-13 21:36 -0700 |
| Message-ID | <20-08-006@comp.compilers> |
| In reply to | #2556 |
On Thursday, August 13, 2020 at 5:22:51 PM UTC-5, Hans-Peter Diettrich wrote: > Am 13.08.2020 um 00:20 schrieb luser droog: > > I've got my project successfully parsing the circa-1975 C syntax > > from that old manual. I'd like to add parsers for K&R1 and c90 > > syntaxes. > > > > How separate should these be? Should they be complete > > separate grammars, or more piecewise selection? > > IMO this depends widely on the usage of the parser output (diagnostics, > backend...). C90 is much stricter than K&R, requires more checks. Do you > need extensive error diagnostics, or do you assume that all source code > is free of errors? > > > > https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c > > You seem to implement an LL(1) parser? My C98 Parser is LL(2), i.e. an > LL(1) parser with one or two locations where more lookahead is required. > Also identifiers are classified as typenames and others prior to their > usage. > Yes, it's basically LL(1) with backtracking. There's one part of the grammar I'm using that's left-recursive and I still need to work that out. > For real-world testing (recommended!) a preprocessor is required and a > copy of the standard libraries of existing compiler(s). > > Your test_syntax() source misses "=" from the variable declarations > (initializers). What about pointer syntax/semantics? If you add these > (and other) syntax differences conditionally (version specific) to your > code, which way would look better to you? Which way will be safer to > maintain? > That's actually correct for the 1975 dialect: no '=' to initialize variables. I think it's pretty ugly without it, but it could be removed anyway for the AST. > > Nice code BTW :-) > Thanks! I think I need to sidetrack a bit and work up some primitives for pattern matching and decomposition to make the backend easier. I'll report back if/when it can do more tricks.
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-23 22:16 -0700 |
| Message-ID | <20-08-016@comp.compilers> |
| In reply to | #2556 |
On Thursday, August 13, 2020 at 5:22:51 PM UTC-5, Hans-Peter Diettrich wrote: > Am 13.08.2020 um 00:20 schrieb luser droog: > > I've got my project successfully parsing the circa-1975 C syntax > > from that old manual. I'd like to add parsers for K&R1 and c90 > > syntaxes. > > > > How separate should these be? Should they be complete > > separate grammars, or more piecewise selection? > > IMO this depends widely on the usage of the parser output (diagnostics, > backend...). C90 is much stricter than K&R, requires more checks. Do you > need extensive error diagnostics, or do you assume that all source code > is free of errors? > > > > https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c > > You seem to implement an LL(1) parser? My C98 Parser is LL(2), i.e. an > LL(1) parser with one or two locations where more lookahead is required. In which places do you need more lookahead? Btw, some of my reading describes my parsers as LL(infinity) because of the backtracking.
[toc] | [prev] | [next] | [standalone]
| From | minforth@arcor.de |
|---|---|
| Date | 2020-08-13 14:15 -0700 |
| Message-ID | <20-08-005@comp.compilers> |
| In reply to | #2554 |
Am Donnerstag, 13. August 2020 00:32:56 UTC+2 schrieb luser droog: > I've got my project successfully parsing the circa-1975 C syntax > from that old manual. I'd like to add parsers for K&R1 and c90 > syntaxes. > > How separate should these be? Should they be complete > separate grammars, or more piecewise selection? ... Why not settle for one master dialect and use awk to translate between dialects? [Probably because there is a great deal of C code written to comply with the various versions of the standard, users want error messages that match the code they wrote rather than some intermediate code, and it would be quite an awk program that could reconcile all the differences among C flavors. -John]
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-13 21:37 -0700 |
| Message-ID | <20-08-007@comp.compilers> |
| In reply to | #2557 |
On Thursday, August 13, 2020 at 5:24:36 PM UTC-5, minf...@arcor.de wrote: > Am Donnerstag, 13. August 2020 00:32:56 UTC+2 schrieb luser droog: > > I've got my project successfully parsing the circa-1975 C syntax > > from that old manual. I'd like to add parsers for K&R1 and c90 > > syntaxes. > > > > How separate should these be? Should they be complete > > separate grammars, or more piecewise selection? ... > > Why not settle for one master dialect and use awk to translate between dialects? > > [Probably because there is a great deal of C code written to comply with > the various versions of the standard, users want error messages that match > the code they wrote rather than some intermediate code, and it would be quite > an awk program that could reconcile all the differences among C flavors. -John] One of the possible goals for this project is exactly such a translator that can downgrade or upgrade code from one standard version to another. Another possible application is a source code formatter. Currently the CST produced by the parser keeps all the original whitespace attached to each token.
[toc] | [prev] | [next] | [standalone]
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
|---|---|
| Date | 2020-08-14 12:44 +0300 |
| Subject | RE: Supporting mulitple input syntaxes |
| Message-ID | <20-08-008@comp.compilers> |
| In reply to | #2557 |
We did something similar with Yacc++. We used inheritance of grammars (a feature of our product) to do so. In fact, the point of the exercise was to demonstrate that feature. I would presume something similar would work in a hand-written recursive descent parser. -- ****************************************************************************** Chris Clark email: christopher.f.clark@compiler-resources.com Compiler Resources, Inc. Web Site: http://world.std.com/~compres 23 Bailey Rd voice: (508) 435-5016 Berlin, MA 01503 USA twitter: @intel_chris ------------------------------------------------------------------------------
[toc] | [prev] | [next] | [standalone]
| From | David Lovemore <davidlovemore@gmail.com> |
|---|---|
| Date | 2020-08-15 06:42 -0700 |
| Message-ID | <20-08-009@comp.compilers> |
| In reply to | #2554 |
It may be useful to consider what you would like to happen if you encounter a syntax that is ambiguous or works differently or is for another expected syntax from what you are parsing: produce a warning, error or handle quietly, or fall over, or don’t care.
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-15 15:20 -0700 |
| Message-ID | <20-08-010@comp.compilers> |
| In reply to | #2561 |
On Saturday, August 15, 2020 at 9:46:47 AM UTC-5, David Lovemore wrote: > It may be useful to consider what you would like to happen if you encounter a > syntax that is ambiguous or works differently or is for another expected > syntax from what you are parsing: produce a warning, error or handle quietly, > or fall over, or don’t care. Very good points. The parser is backtracking, returning a list of results. That could conceivably be useful for dealing with ambiguity by looking at more than just the first result. Warnings and error messages are going to be trickier I fear. The parser is built around the idea of Monadic combinators. So my research suggests that I'll need Monad Transformers to add the state needed for good messages. There are a bunch of lectures I've found about using these in Haskell, but not much about how to build them from scratch. As it is, any error in parsing will just produce no results. I started a prototype where the input list of characters was instead a list of (character, line-number, line-position). But it got really confusing at the time so I stopped. And the few times I've tried to look at it, I can't figure out what I was thinking.
[toc] | [prev] | [next] | [standalone]
| From | David Lovemore <davidlovemore@gmail.com> |
|---|---|
| Date | 2020-08-16 02:21 -0700 |
| Message-ID | <20-08-011@comp.compilers> |
| In reply to | #2562 |
My friend, reporting the furthest position examined by the parser I have useful in error cases as a simple stop gap when using a combinator approach. Thinking about it you kind of want to see the furthest failed position and the stack of rules above it. Such requires meta information when the code is written in the most natural way. For this reason and others I believe it is good to represent your grammar in data structures which is further in the direction of a compiler compiler tool (or compiler interpreter tool).
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-20 14:45 -0700 |
| Message-ID | <20-08-012@comp.compilers> |
| In reply to | #2563 |
On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote: > My friend, reporting the furthest position examined by the parser I have [found] > useful in error cases as a simple stop gap when using a combinator approach. > > Thinking about it you kind of want to see the furthest failed position and the > stack of rules above it. Such requires meta information when the code is > written in the most natural way. For this reason and others I believe it is > good to represent your grammar in data structures which is further in the > direction of a compiler compiler tool (or compiler interpreter tool). Thanks. I've done some further investigating. I built my parsers following two papers. Hutton and Meijer, Monadic Parser Combinators https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf and Hutton, Higher-Order Functions for Parsing https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf The first adds error reporting using Monad Transformers. I'm thinking about how to move in this direction, but first I'd need to reformulate the code to make the Monad more explicit. It should be something like an interface or a mixin, like a base class with all virtual member functions. That could be done by modelling my objects more like OO objects and have 'bind' and 'result' in a vtable in the Parser object. But the second paper does it differently, and maybe something I can do more easily. It redefines the parsers to no longer produce a list of results, so there's no longer support for ambiguity. Then it defines them to return a Maybe, maybe * ::= Fail [char] | Error [char] | OK * . where the OK branch has the parse tree, and Fail or Error both contain an error message. It describes how a Fail can be transformed into an Error. But it isn't entirely clear where the messages get injected. Still need to do some thinking on it, but I think I can rewrite the parsers to follow this model, and then decorate my grammar with possible errors at each node. Thanks for the encouragement. My classes start on Monday so I'm hoping to accomplish something on this before then.
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-23 19:35 -0700 |
| Message-ID | <20-08-014@comp.compilers> |
| In reply to | #2564 |
On Sunday, August 23, 2020 at 1:39:30 PM UTC-5, luser droog wrote:
> On Sunday, August 16, 2020 at 10:53:24 AM UTC-5, davidl...@gmail.com wrote:
> > My friend, reporting the furthest position examined by the parser I have [found]
> > useful in error cases as a simple stop gap when using a combinator approach.
> >
> > Thinking about it you kind of want to see the furthest failed position and the
> > stack of rules above it. Such requires meta information when the code is
> > written in the most natural way. For this reason and others I believe it is
> > good to represent your grammar in data structures which is further in the
> > direction of a compiler compiler tool (or compiler interpreter tool).
>
> Thanks. I've done some further investigating. I built my parsers following
> two papers. Hutton and Meijer, Monadic Parser Combinators
> https://www.cs.nott.ac.uk/~pszgmh/monparsing.pdf
> and Hutton, Higher-Order Functions for Parsing
> https://pdfs.semanticscholar.org/6669/f223fba59edaeed7fabe02b667809a5744d9.pdf
>
> The first adds error reporting using Monad Transformers. [...]>
> But the second paper does it differently, and maybe something I can do
> more easily. It redefines the parsers to no longer produce a list of results,
> so there's no longer support for ambiguity. Then it defines them to
> return a Maybe,
>
> maybe * ::= Fail [char] | Error [char] | OK *
> .
> where the OK branch has the parse tree, and Fail or Error both contain an error
> message. It describes how a Fail can be transformed into an Error. But it isn't
> entirely clear where the messages get injected.
>
> Still need to do some thinking on it, but I think I can rewrite the parsers
> to follow this model, and then decorate my grammar with possible errors
> at each node.
I've made some progress. I wrote a new prototype following Hutton and
modified it to add position information to the character stream.
And then rewrote the parsers to produce the maybe structure and then
to collect rudimentary error messages.
For these, a positive and negative case in PostScript,
0 0 (abcd\ne) input-pos
(abc) str exec
pc
0 0 (abcd\ne) input-pos
(abd) str nofail exec
pq
I get this output:
$ gsnd -q -dNOSAFER pc11.ps
stack:
[/OK [[(a) [(b) [(c) []]]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]
stack:
[/Error [[(after) (a)] [[(after) (b)] [[{(d) eq} (not satisfied)] [[(c) [0 2]] [[(d) [0 3]] [[(\n) [0 4]] [[(e) [1 0]] null]]]]]]]]
So, this indicates that I *can* modify my C parsers to produce error
messages. The remaining input list has the position information for
where the error occurred, [(c) [0 2]].
Following the prototype, I modified the input functions to add positions
for each character and modified the base parser item() to detect and
remove the position stuff before it passes into the rest of the machinery.
The next hurdle was making the extra position information work with
the Unicode filters ucs4_from_utf8() and utf8_from_ucs4(). And that
all appears to be working now.
But that's probably the end of the story for now. Got to gear up for
Operating Systems and Advanced Web Dev with Jave.
Thanks to everyone for the help, esp. Kaz with the brilliant suggestion
to pass a language id token between tokenizer and parser.
Ps. the prototype is written in PostScript extended with function syntax.
https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps
https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript
--
l droog
[Why Postscript? I realize it's Turing complete, but it seems odd to run ones parser on a printer. -John]
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-23 20:39 -0700 |
| Subject | Re: implementation languages, was Supporting multiple input syntaxes |
| Message-ID | <20-08-015@comp.compilers> |
| In reply to | #2566 |
On Sunday, August 23, 2020 at 10:15:06 PM UTC-5, luser droog wrote: > Thanks to everyone for the help, esp. Kaz with the brilliant suggestion > to pass a language id token between tokenizer and parser. > > > Ps. the prototype is written in PostScript extended with function syntax. > https://github.com/luser-dr00g/pcomb/blob/master/ps/pc11.ps > https://codereview.stackexchange.com/questions/193520/an-enhanced-syntax-for-defining-functions-in-postscript > > -- > l droog > [Why Postscript? I realize it's Turing complete, but it seems odd to run ones parser on a printer. -John] I discovered PostScript around '97 or '98. I was taking Computer Graphics and it was in an Appendix to the textbook (Salman). At the same time I was editor of the Honors College student magazine so it really piqued my interest as a graphics and typography language. But the language itself I just really enjoy. It's my "Lego blocks" language. The RPN syntax removes all ambiguity about precedence and sequencing. It has the same code=data properties as Lisp. Application code can read from the program stream. It has strings, arrays and dictionaries. It has first class procedures which can be constructed on the fly. I've found it a nice playpen for syntax extension. I was also on a many-decades long crusade to never use MS Word after that /first/ time they screwed everyone by changing the interface. And PostScript has slowly become my tool for that as my programming skill grew. https://github.com/luser-dr00g/ibis.ps On another front, I wanted to have parsers in PostScript so I could evaluate infix math expressions. And I wanted regular expression matching in PS thinking it would help to implement algorithmic hyphenation of text. [Take a look at Forth. Many of the same advantages, runs a lot more places. -John]
[toc] | [prev] | [next] | [standalone]
| From | Thomas Koenig <tkoenig@netcologne.de> |
|---|---|
| Date | 2020-08-24 17:01 +0000 |
| Subject | Re: implementation languages, was Supporting multiple input syntaxes |
| Message-ID | <20-08-017@comp.compilers> |
| In reply to | #2567 |
luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email> schrieb:
[PostScript]
> But the language itself I just really enjoy. It's my "Lego blocks"
> language. The RPN syntax removes all ambiguity about precedence and
> sequencing.
I recently had the doubtful pleasure of evaluating the formula
x = ((a-b)*c^2+(-d^2+e^2-a^2+b^2)*c+a^2*b+(f^2-e^2-b^2)*a
+(-f^2+d^2)*b)/((-2*d+2*e)*c+(2*f-2*e)*a-2*b*(f-d))
in Postscript. (Yes, really. Don't ask.)
It was the first time in more than a decade that I wrote a
flex/bison grammar (mostly copied from the bison manual). It was
faster and less error-prone than trying to do it directly.
The grammar actually generated fairly unidiomatic PostScript
because I made it give names to all the variables, so
a-b
became
a b sub
I'm sure a real PostScript aficionado would have done it all
on the stack :-)
[Turning infix into RPN is a pretty basic intro compiler course exercise. Conceptually,
you make the parse tree and then do a postorder tree walk. Or if you'rs using yacc or
bison, in the expression grammar you just print the operator or token in the code for each
rule because they are recognized in the right order, e.g.:
expr: VARIABLE { printf(" %s ", $1): }
| expr '+' expr ( printf(" add "); }
| expr '-' expr ( printf(" sub "); }
| '(' expr ')' { /* print nothing */ }
-John]
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-28 10:56 -0700 |
| Subject | Re: implementation languages, was Supporting multiple input syntaxes |
| Message-ID | <20-08-019@comp.compilers> |
| In reply to | #2569 |
On Monday, August 24, 2020 at 2:12:13 PM UTC-5, Thomas Koenig wrote:
> luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email.dmarc.email> schrieb:
>
> [PostScript]
>
> > But the language itself I just really enjoy. It's my "Lego blocks"
> > language. The RPN syntax removes all ambiguity about precedence and
> > sequencing.
>
> I recently had the doubtful pleasure of evaluating the formula
>
> x = ((a-b)*c^2+(-d^2+e^2-a^2+b^2)*c+a^2*b+(f^2-e^2-b^2)*a
> +(-f^2+d^2)*b)/((-2*d+2*e)*c+(2*f-2*e)*a-2*b*(f-d))
>
> in Postscript. (Yes, really. Don't ask.)
>
In case you need it, I've got a PostScript debugger that can single
step into loops and procedures.
https://github.com/luser-dr00g/debug.ps
$ gsnd db5.ps
GPL Ghostscript 9.52 (2020-03-19)
Copyright (C) 2020 Artifex Software, Inc. All rights reserved.
This software is supplied under the GNU AGPLv3 and comes with NO WARRANTY:
see the file COPYING for details.
GS>{ 0 0 1 5 { add } for } stepon traceon debug
%|-
0 %|- 0
0 %|- 0 0
1 %|- 0 0 1
5 %|- 0 0 1 5
{add} %|- 0 0 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?s
%|- 0 0
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
%|- 0
1 %|- 0 1
1 %|- 0 1 1
5 %|- 0 1 1 5
{add} %|- 0 1 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
%|- 0 1
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
%|- 1
2 %|- 1 2
1 %|- 1 2 1
5 %|- 1 2 1 5
{add} %|- 1 2 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
%|- 1 2
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
%|- 3
3 %|- 3 3
1 %|- 3 3 1
5 %|- 3 3 1 5
{add} %|- 3 3 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
%|- 3 3
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
%|- 6
4 %|- 6 4
1 %|- 6 4 1
5 %|- 6 4 1 5
{add} %|- 6 4 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
%|- 6 4
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
%|- 10
5 %|- 10 5
1 %|- 10 5 1
5 %|- 10 5 1 5
{add} %|- 10 5 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
%|- 10 5
add nametype object add step: (continue|next|bypass|step|prompt|quit)?
%|- 15
6 %|- 15 6
1 %|- 15 6 1
5 %|- 15 6 1 5
{add} %|- 15 6 1 5 {add}
for nametype object for step: (continue|next|bypass|step|prompt|quit)?
GS<1>==
15
GS>quit
[This is drifting rather far from compilers now. -John]
[toc] | [prev] | [next] | [standalone]
| From | luser droog <mijoryx@yahoo.com.dmarc.email> |
|---|---|
| Date | 2020-08-24 12:14 -0700 |
| Subject | Re: stack languages, was Supporting multiple input syntaxes |
| Message-ID | <20-08-018@comp.compilers> |
| In reply to | #2567 |
On Monday, August 24, 2020 at 10:42:33 AM UTC-5, luser droog wrote: > > [Why Postscript? I realize it's Turing complete, but it seems odd to run ones parser on a printer. -John] > > I discovered PostScript around '97 or '98. I was taking Computer Graphics > and it was in an Appendix to the textbook (Salman). At the same time > I was editor of the Honors College student magazine so it really piqued > my interest as a graphics and typography language. ... > [Take a look at Forth. Many of the same advantages, runs a lot more places. -John] Good suggestion. I have looked at Forth quite a bit. I lurked in comp.lang.forth for a number of years. I've got a half-written interpreter that stalled because my vm doesn't have any I/O. https://groups.google.com/d/topic/comp.lang.forth/Y1XlX8wD3RQ/discussion https://retrocomputing.stackexchange.com/questions/6610/how-to-do-i-o-with-emulation-code I went down a wild rabbit hole after discovering the document "X86 is an octal machine" and tried to recode the assembler macros using more octal. But I kind of stalled on that whole area since the first thing I would want in Forth is tagged objects like PostScript has. There's Oforth and 8th which both supply that sort of thing, but then I'd probably miss the PS graphics functions.,,. :) I've also played with APL and tried writing a few interpreters for it. But the common thread among all these interpreters was coding them all in C. So I turned my attention to compiling and analyzing C code. A friend of mine was wanting a really customizable C formatter so I thought I might be able to make a tool to accommodate lots of different backends for doing something with the parse tree or syntax tree. I want to be able to write C99 code and transpile it automatically to something that will work with the MS compiler without having to maintain any MS business in the "master" source.
[toc] | [prev] | [next] | [standalone]
| From | gah4 <gah4@u.washington.edu> |
|---|---|
| Date | 2020-08-23 14:26 -0700 |
| Message-ID | <20-08-013@comp.compilers> |
| In reply to | #2554 |
On Wednesday, August 12, 2020 at 3:32:56 PM UTC-7, luser droog wrote: > I've got my project successfully parsing the circa-1975 C syntax > from that old manual. I'd like to add parsers for K&R1 and c90 > syntaxes. (snip) I remember using a later C compiler that accepted the older form of assignment operators, such as =+, =-, etc. Presumably for those with older code. I then had to put an extra space when assigning negative values: i= -4; I don't remember now how I found out about that one. That is the only old C syntax I remember. (And much later than 1975.)
[toc] | [prev] | [next] | [standalone]
| From | antispam@math.uni.wroc.pl |
|---|---|
| Date | 2021-02-11 23:27 +0000 |
| Message-ID | <21-02-004@comp.compilers> |
| In reply to | #2554 |
luser droog <mijoryx@yahoo.com.dmarc.email.dmarc.email> wrote:
> I've got my project successfully parsing the circa-1975 C syntax
> from that old manual. I'd like to add parsers for K&R1 and c90
> syntaxes.
>
> How separate should these be? Should they be complete
> separate grammars, or more piecewise selection?
>
> My feeling is that separating them will be less headache, but maybe
> there's some advantage to changing out smaller pieces of the grammar
> in that it might be easier to make sure that they produce the same
> structure compatible with the backend.
>
> Any guidance in this area?
>
> https://github.com/luser-dr00g/pcomb/blob/master/pc9syn.c
>
> [Really, it's up to you. My inclination would be to make them
> separate but use some sort of macro setup so you can insert
> common pieces into each of the grammars. -John]
Gnu Pascal supports several Pascal dialects. Gnu Pascal uses
unified parser for all dialects. Some ideas used:
- flags in scanner decide if dialect specific tokens are
recognized
- superset parsing: several constructs are generalized so
that single construct represents things that othewise
would lead to conflits. Later semantic stage looks at
dialects flags, prunes things not allowed in given
dialect. Example of superset contruction is rule
'call_or_cast', it handles several syntactically similar
constructs that are usually given by separate syntax
rules. Semantic rules beside dialect flags use types to
decide of meaning.
- even after usin two tricks above grammar still have
LALR conflicts, they are resolved using GLR option
of Bison. All conflicts are resolvable using lookahead,
and AFAICS some are only resolvable with lookahead.
Parser lookahead means that traditional trick of
passing semantic info back to scanner does not work
(parser actions are delayed, so scanner may be forced
to produce token before semantic info is available).
Still, it seems that GLR leads to cleaner parser.
My impression is that variation in Pascal dialects is larger
than in C dialects, so case for unified parser in C IMHO
is much stronger. OTOH Gnu Pascal is full compiler with
semantic actions invoked from grammar rules. Semantic code
embedded in the parser changed much more than grammar rules,
so maintaining separate parsers probably would be a
nightmare.
--
Waldek Hebisch
[toc] | [prev] | [next] | [standalone]
| From | Elijah Stone <elronnd@elronnd.net> |
|---|---|
| Date | 2021-02-17 01:44 -0800 |
| Message-ID | <21-02-005@comp.compilers> |
| In reply to | #2629 |
On Thu, 11 Feb 2021, antispam@math.uni.wroc.pl wrote: > My impression is that variation in Pascal dialects is larger than in C > dialects, so case for unified parser in C IMHO Pascal is more fragmented, but it's also much easier to parse than C. I think it's a wash. (I also think the whole idea is horrifying and ought not to be pursued; but.) -E
[toc] | [prev] | [next] | [standalone]
Page 1 of 2 [1] 2 Next page →
Back to top | Article view | comp.compilers
csiph-web