Path: csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end From: Kaz Kylheku <864-117-4973@kylheku.com> Newsgroups: comp.compilers Subject: Re: Deep expression chain performance Date: Thu, 27 Apr 2023 00:28:55 -0000 (UTC) Organization: A noiseless patient Spider Sender: johnl@iecc.com Approved: comp.compilers@iecc.com Message-ID: <23-04-015@comp.compilers> References: <23-04-012@comp.compilers> Injection-Info: gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="23892"; mail-complaints-to="abuse@iecc.com" Keywords: parse, performance Posted-Date: 26 Apr 2023 20:42:41 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: csiph.com comp.compilers:3456 On 2023-04-25, Andy wrote: > I am writing C grammar. > Grammar speed may be down caused by deep expression chain. > For example, simple "n=0" has 20 levels: > assignmentExpression > conditionalExpression > logicalOrExpression > logicalAndExpression > inclusiveOrExpression > exclusiveOrExpression > andExpression > eqaulityExpression > relationalExpression > shiftExpression > additiveExpression > multiplicativeExpression > castExpression > unaryExpression > postfixExpression > postfixExpressionLeft > atom > literal > IntegerConstant > DeciamlConstant ISO C specifies its grammar in an obtusely verbose way, similar to the above. The reason is that it makes the grammar unambiguous, meaning that it does not require any annotation or external remarks about ambiguities having to be resolved by a certain associativity or precedence. If you have some grammar processor which literally follows all those productions, doing all of their reductions, then it will likely not perform well. I suspect even a LALR(1) compiled grammar will have an issue there; I'm not aware that Yacc-like parser generators do any collapsing of cascaded reductions which do nothing but { $$ = $1; }. Here is a concrete example. Attached below is over 200 lines of y.output from a Yacc implementation (Bison). In that y.output we can trace the actions of the state machine and see what happens when a the simple token LIT is reduced to an expression. In State 0, when the parser sees a LIT, it will shift the token and go to State 1. There, a reduction takes place via Rule 9, and so we have a parser_expression on the top of the stack. Then, back to State 0, where now have a primary_expression, and so we go to state 8. State 8, similarly to State 1, performs a reduction: primary expression to mult_expression. Back to State 0, where we have to dispatch a similar thing again for multi_expression, then additive_expression, then expression and finally top. Thanks for raising this issue; I suspect it's a real performance problem. Not only is a factored-out grammar hard to read but the performance is bad due to useless back and forth transitions between superfluous states to propagate reductions. Above-referenced material follows: Grammar 0 $accept: top $end 1 top: expression ';' 2 | expression ';' expression 3 expression: additive_expression 4 additive_expression: mult_expression 5 | mult_expression '+' additive_expression 6 mult_expression: primary_expression 7 | mult_expression '*' primary_expression 8 primary_expression: '(' expression ')' 9 | LIT 10 | IDENT Terminals, with rules where they appear $end (0) 0 '(' (40) 8 ')' (41) 8 '*' (42) 7 '+' (43) 5 ';' (59) 1 2 error (256) LIT (258) 9 IDENT (259) 10 Nonterminals, with rules where they appear $accept (10) on left: 0 top (11) on left: 1 2, on right: 0 expression (12) on left: 3, on right: 1 2 8 additive_expression (13) on left: 4 5, on right: 3 5 mult_expression (14) on left: 6 7, on right: 4 5 7 primary_expression (15) on left: 8 9 10, on right: 6 7 state 0 0 $accept: . top $end LIT shift, and go to state 1 IDENT shift, and go to state 2 '(' shift, and go to state 3 top go to state 4 expression go to state 5 additive_expression go to state 6 mult_expression go to state 7 primary_expression go to state 8 state 1 9 primary_expression: LIT . $default reduce using rule 9 (primary_expression) state 2 10 primary_expression: IDENT . $default reduce using rule 10 (primary_expression) state 3 8 primary_expression: '(' . expression ')' LIT shift, and go to state 1 IDENT shift, and go to state 2 '(' shift, and go to state 3 expression go to state 9 additive_expression go to state 6 mult_expression go to state 7 primary_expression go to state 8 state 4 0 $accept: top . $end $end shift, and go to state 10 state 5 1 top: expression . ';' 2 | expression . ';' expression ';' shift, and go to state 11 state 6 3 expression: additive_expression . $default reduce using rule 3 (expression) state 7 4 additive_expression: mult_expression . 5 | mult_expression . '+' additive_expression 7 mult_expression: mult_expression . '*' primary_expression '+' shift, and go to state 12 '*' shift, and go to state 13 $default reduce using rule 4 (additive_expression) state 8 6 mult_expression: primary_expression . $default reduce using rule 6 (mult_expression) state 9 8 primary_expression: '(' expression . ')' ')' shift, and go to state 14 state 10 0 $accept: top $end . $default accept state 11 1 top: expression ';' . 2 | expression ';' . expression LIT shift, and go to state 1 IDENT shift, and go to state 2 '(' shift, and go to state 3 $default reduce using rule 1 (top) expression go to state 15 additive_expression go to state 6 mult_expression go to state 7 primary_expression go to state 8 state 12 5 additive_expression: mult_expression '+' . additive_expression LIT shift, and go to state 1 IDENT shift, and go to state 2 '(' shift, and go to state 3 additive_expression go to state 16 mult_expression go to state 7 primary_expression go to state 8 state 13 7 mult_expression: mult_expression '*' . primary_expression LIT shift, and go to state 1 IDENT shift, and go to state 2 '(' shift, and go to state 3 primary_expression go to state 17 state 14 8 primary_expression: '(' expression ')' . $default reduce using rule 8 (primary_expression) state 15 2 top: expression ';' expression . $default reduce using rule 2 (top) state 16 5 additive_expression: mult_expression '+' additive_expression . $default reduce using rule 5 (additive_expression) state 17 7 mult_expression: mult_expression '*' primary_expression . $default reduce using rule 7 (mult_expression) -- TXR Programming Language: http://nongnu.org/txr Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal Mastodon: @Kazinator@mstdn.ca