Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3641 > unrolled thread
| Started by | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| First post | 2025-04-05 23:22 +0000 |
| Last post | 2025-08-03 22:16 +0000 |
| Articles | 4 — 2 participants |
Back to article view | Back to comp.compilers
Prior art for operator parsing trick Kaz Kylheku <643-408-1753@kylheku.com> - 2025-04-05 23:22 +0000
Re: Prior art for operator parsing trick antispam@fricas.org - 2025-04-06 16:00 +0000
Re: Prior art for operator parsing trick Kaz Kylheku <643-408-1753@kylheku.com> - 2025-04-06 17:18 +0000
Re: Prior art for operator parsing trick Kaz Kylheku <643-408-1753@kylheku.com> - 2025-08-03 22:16 +0000
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-04-05 23:22 +0000 |
| Subject | Prior art for operator parsing trick |
| Message-ID | <25-04-002@comp.compilers> |
Hi all,
I recently came up with neat trick in order to obtain particular
(to me) desirable results out of an infix parser (which more
or less follows Shunting Yard).
The basic motivation is that I have certain monadic functions
like sin, cos, log, sqrt , ... installed as very low precedence
prefix operators. Thus:
sqrt x * x + y * y
means
sqrt (x * x + y * y)
Unfortunately, what it means is also that, say:
sqrt x + x + sqrt y + y
parses as:
sqrt (x + x + sqrt (y + y))
The symmetric-looking expression ends up with an assymetric parse.
(Some may disagree, but my requirement is what it is.)
I came up with a general modification to the algorithm so that we can
obtain the parse
sqrt (x + x) + sqrt (y + y)
Here is the modification to the algorithm.
When the loop is just about to process an infix operator O, we perform
these steps before anything else:
1. Determine whether the infix operator O is immediately followed by a
sequence of one or more consecutive prefix operators P1, P2.
2. If such a sequence is identified, we determine which of the Pi's
has lowest precedence, and we subtract one from that precedence.
3. Then we compare the precedence of infix operator O with the
precedence calculated in (3). If O has higher precedence, we
*demote* it to the calculated precedence. (Only the currently
occurring instance of that operator.)
4. We then continue with the usual algorithm, processing the operator
stack based on comparing precedence and building nodes in the
output tree and so on.
Some live examples:
1> (parse-infix '(sin x + x + cos y + y))
(+ (sin (+ x x))
(cos (+ y y)))
Works between - and *; unary - has higher precedence than
additive operators, but less than multiplicative:
2> (parse-infix '(- x * x * - y * y))
(* (- (* x x))
(- (* y y)))
This example illustrates when the algorithm finds that the precedence
calculated from the prefix operators is higher, so the infix operator
isn't demoted:
3> (parse-infix '(- x + x + - y + y)) ;; ... x + - y ...
(+ (+ (+ (- x) x)
(- y))
y)
This example shows when we have two prefix operators after the infix +,
namely - cos. The first one has higher precedence than +, but by looking
ahead to the second one, we calculate the demoted precedence correctly,
resulting in the parse that I would like in that case:
4> (parse-infix '(sin x + x + - cos y + y))
(+ (sin (+ x x))
(- (cos (+ y y))))
Someone must have come up with exactly the same thing before in all the
decades we have been doing this; but where have they written it up?
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
[toc] | [next] | [standalone]
| From | antispam@fricas.org |
|---|---|
| Date | 2025-04-06 16:00 +0000 |
| Message-ID | <25-04-003@comp.compilers> |
| In reply to | #3641 |
Kaz Kylheku <643-408-1753@kylheku.com> wrote:
w> I recently came up with neat trick in order to obtain particular
> (to me) desirable results out of an infix parser (which more
> or less follows Shunting Yard).
>
> The basic motivation is that I have certain monadic functions
> like sin, cos, log, sqrt , ... installed as very low precedence
> prefix operators. Thus:
>
> sqrt x * x + y * y
>
> means
>
> sqrt (x * x + y * y)
>
> Unfortunately, what it means is also that, say:
>
> sqrt x + x + sqrt y + y
>
> parses as:
>
> sqrt (x + x + sqrt (y + y))
>
> The symmetric-looking expression ends up with an assymetric parse.
> (Some may disagree, but my requirement is what it is.)
>
> I came up with a general modification to the algorithm so that we can
> obtain the parse
>
> sqrt (x + x) + sqrt (y + y)
>
> Here is the modification to the algorithm.
>
> When the loop is just about to process an infix operator O, we perform
> these steps before anything else:
>
> 1. Determine whether the infix operator O is immediately followed by a
> sequence of one or more consecutive prefix operators P1, P2.
>
> 2. If such a sequence is identified, we determine which of the Pi's
> has lowest precedence, and we subtract one from that precedence.
>
> 3. Then we compare the precedence of infix operator O with the
> precedence calculated in (3). If O has higher precedence, we
> *demote* it to the calculated precedence. (Only the currently
> occurring instance of that operator.)
>
> 4. We then continue with the usual algorithm, processing the operator
> stack based on comparing precedence and building nodes in the
> output tree and so on.
>
> Some live examples:
>
> 1> (parse-infix '(sin x + x + cos y + y))
> (+ (sin (+ x x))
> (cos (+ y y)))
>
> Works between - and *; unary - has higher precedence than
> additive operators, but less than multiplicative:
>
> 2> (parse-infix '(- x * x * - y * y))
> (* (- (* x x))
> (- (* y y)))
>
> This example illustrates when the algorithm finds that the precedence
> calculated from the prefix operators is higher, so the infix operator
> isn't demoted:
>
> 3> (parse-infix '(- x + x + - y + y)) ;; ... x + - y ...
> (+ (+ (+ (- x) x)
> (- y))
> y)
>
> This example shows when we have two prefix operators after the infix +,
> namely - cos. The first one has higher precedence than +, but by looking
> ahead to the second one, we calculate the demoted precedence correctly,
> resulting in the parse that I would like in that case:
>
> 4> (parse-infix '(sin x + x + - cos y + y))
> (+ (sin (+ x x))
> (- (cos (+ y y))))
>
> Someone must have come up with exactly the same thing before in all the
> decades we have been doing this; but where have they written it up?
Do you mean that
sin x + y + - - sin z
should have the same meaning as
sin(x + y) + (- (- sin(z)))
but
sin x + y + - - z
should have the same meaning as
sin(x + y + (- (- z)))
and similarly for longer chains of '-' signs? If yes, this needs
unbounded lookahead, which make any method to handle it rather
nonstandard. And frankly, such parsing looks rather obscure,
so is is possible that there are no prior art in this specific case.
There is paper "Noncanonical Extensions of LR Parsing Methods" by
M. Hutton:
https://www.eecg.toronto.edu/~mdhutton/papers/noncan.pdf
and I think that parsing expressions in your way is an example
of "LR-Regular Parsing". However, what Hutton presents is
much more general and described in terms of LR-automata and
not in terms of precedence.
--
Waldek Hebisch
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-04-06 17:18 +0000 |
| Message-ID | <25-04-004@comp.compilers> |
| In reply to | #3642 |
On 2025-04-06, antispam@fricas.org <antispam@fricas.org> wrote: >> Someone must have come up with exactly the same thing before in all the >> decades we have been doing this; but where have they written it up? > > Do you mean that > > sin x + y + - - sin z > > should have the same meaning as > > sin(x + y) + (- (- sin(z))) > > but > > sin x + y + - - z > > should have the same meaning as > > sin(x + y + (- (- z))) Your understanding is correct. I have assigned (as I mentioned) a precedence to the prefix - (a.k.a unary minus) which is higher than that of the binary +. Therefore, no precedence demotion of + takes place in the second example. 1> (parse-infix '(sin x + y + - - sin z)) (+ (sin (+ x y)) (- (- (sin z)))) 2> (parse-infix '(sin x + y + - - z)) (sin (+ (+ x y) (- (- z)))) > and similarly for longer chains of '-' signs? If yes, this needs > unbounded lookahead, That is correct. If we have a chain of 10,000 consecutive prefix operators immediately after an infix operator, they all have to be examined to find the lowest precedence. > which make any method to handle it rather > nonstandard. I think, say, LALR(1) can't handle this even if we just look ahead at one prefix operator, because of the dynamic precedence reassignment. LALR(1) wants to reduce its stack based on the identity of the lookahead token. The unmodified algorithm has no provision for conveying some property of the loakahead token such that it temporarily rearranges some aspect of the grammar. > And frankly, such parsing looks rather obscure, > so is is possible that there are no prior art in this specific case. > > There is paper "Noncanonical Extensions of LR Parsing Methods" by > M. Hutton: > > https://www.eecg.toronto.edu/~mdhutton/papers/noncan.pdf > > and I think that parsing expressions in your way is an example > of "LR-Regular Parsing". However, what Hutton presents is > much more general and described in terms of LR-automata and > not in terms of precedence. Thank you very much; I will have a look. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [next] | [standalone]
| From | Kaz Kylheku <643-408-1753@kylheku.com> |
|---|---|
| Date | 2025-08-03 22:16 +0000 |
| Message-ID | <25-08-001@comp.compilers> |
| In reply to | #3641 |
On 2025-04-05, Kaz Kylheku <643-408-1753@kylheku.com> wrote: > Hi all, > > I recently came up with neat trick in order to obtain particular > (to me) desirable results out of an infix parser (which more > or less follows Shunting Yard). This "precedence demotion" algorithm turns out to have some undesirable properties, leading to unintuitive parses, such as, oh, a + sqrt b * sqrt c parsing as (a + sqrt b) * (sqrt c). This led me to search for another way to capture the good stuff while avoiding the bad. > When the loop is just about to process an infix operator O, we perform > these steps before anything else: > > 1. Determine whether the infix operator O is immediately followed by a > sequence of one or more consecutive prefix operators P1, P2. We keep this step. > 2. If such a sequence is identified, we determine which of the Pi's > has lowest precedence, and we subtract one from that precedence. We also keep this, but don't subtract one from the prededence. > 3. Then we compare the precedence of infix operator O with the > precedence calculated in (3). If O has higher precedence, we > *demote* it to the calculated precedence. (Only the currently > occurring instance of that operator.) We replace this step entirely. No dynamic precedence adjustment takes place. Instead: 3. We search the operator stack to find an prefix operator, which has a precdedence at least as high as the one determined in 2. If such an operator exists, we perform reductions until we remove it. If no such operator exists in the stack, we do nothing. > 4. We then continue with the usual algorithm, processing the operator > stack based on comparing precedence and building nodes in the > output tree and so on. > Some live examples: > > 1> (parse-infix '(sin x + x + cos y + y)) > (+ (sin (+ x x)) > (cos (+ y y))) [...] > 2> (parse-infix '(- x * x * - y * y)) > (* (- (* x x)) > (- (* y y))) [...] > 3> (parse-infix '(- x + x + - y + y)) ;; ... x + - y ... > (+ (+ (+ (- x) x) > (- y)) > y) [...] > 4> (parse-infix '(sin x + x + - cos y + y)) > (+ (sin (+ x x)) > (- (cos (+ y y)))) I get exactly the same output on these, and all my existing test cases pass. Effectively we are still demoting the precedence of the infix operator, but in a surgical way: we are giving it lower precedence than specifically identified unparsed material to the left in the parsing stack, by doing reductions until we clear that material. E.g. in the case of (sin x + x + - cos y + y)) At the second + , we see a sequence of prefix operators to the right, (- cos). The lowest precedence among them is that of cos. We the search the parsing stack until we find a prefix operator at least as low, and so sin is identified. What is going in is a kind of bracketing. We see some prefix operators to the right, and so we scan the left context to find a parallel prefix operator whose parse is not yet complete. If we find it, we then close off that operator, reducing it to a syntax node. -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca
[toc] | [prev] | [standalone]
Back to top | Article view | comp.compilers
csiph-web