Path: csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!news.snarked.org!news.linkpendium.com!news.linkpendium.com!news.iecc.com!nerds-end From: "Ev. Drikos" Newsgroups: comp.compilers Subject: a java parser generator Date: Fri, 29 Apr 2011 13:37:45 +0300 Organization: An OTEnet S.A. customer Lines: 110 Sender: news@iecc.com Approved: comp.compilers@iecc.com Message-ID: <11-05-005@comp.compilers> NNTP-Posting-Host: news.iecc.com X-Trace: gal.iecc.com 1304311838 44649 64.57.183.58 (2 May 2011 04:50:38 GMT) X-Complaints-To: abuse@iecc.com NNTP-Posting-Date: Mon, 2 May 2011 04:50:38 +0000 (UTC) Keywords: Java, parse, tools Posted-Date: 02 May 2011 00:50:38 EDT X-submission-address: compilers@iecc.com X-moderator-address: compilers-request@iecc.com X-FAQ-and-archives: http://compilers.iecc.com Xref: x330-a1.tempe.blueboxinc.net comp.compilers:114 Hi, The "Syntaxis.jar" parser/scanner generator suite has a new LR generator that creates java packed parsers. The basic features of such parsers/scanners are listed at the end of this message along with two small examples that can help you figure out how the LR Builders save some table space. The LR generator uses ranges to pack the shift/reduce table. This generator supports also "bison" compatible references to semantic values; thanks to a small partial java parser created by this generator. All features described are optional. I don't claim to be an expert in the topic and constructive feedback is welcome. The demo version of the program, "SyntaxisDemo.jar", is available for evaluation, on request. Best Regards, Ev. Drikos -------------------------------------------------------------------------------- Parser Features: -Target Language: Java. -Suitable for parsing small strings. -Algorithms: Optimized (or standard) SLR, LALR, and Canonical LR. -Pure BNF. -Scannerless Mode (Note: the scanner generator can export DFA's as Left Recursive LALR(1) grammars; backtracking isn't supported). -Ranges, Difference, Intersection, Follows Restrictions for terminals. -Spacing/Comments: In scannerless grammars as preprocessor definitions. -Panic Mode Recovery. -Elimination of Single Reductions without Semantic Actions. -Modifiable; it can be subclassed. -Conflicts resolution for expressions; optionally, the "yacc" approach. Features not supported: -Mid Action Rules. -Union Data Types. -Precedence and Associatively Declarations. Scanner Features: -Embedded; it supports Intersection/Difference of regular grammars. -Tokens Numbering; Manual/Automatic Synchronization with the parser. -Return Statements; Optional. (skip comments/spaces declarations) -The Lookahead operator. -Optionally, the scanner can return multiple/alternative tokens for the same portion of the input. The parser examines such alternative tokens on failure. Below are two grammar examples from the book: "Compilers, Principles Techniques, and Tools". 2) Table Sizes for an LR(1) grammar; example 4.44. 2.1) The Syntax Rules. ::= S S ::= a A d | b B d | a B e | b A e A ::= c B ::= c 2.2) The Size of Parsing Tables. 2.2.1) Standard algorithm; States/Conflicts: SLR: 14/1, LALR: 14/1, and (cross checked with bison). CLR: 15/0. 2.2.2) Eliminate Single Reductions; States/Conflicts: SLR: 10/0, LALR: 10/0, and CLR: 10/0. 2.2.3) Optimize (Resolves Conflicts among others); States/Conflicts: SLR: 14/1, LALR: 15/0, and CLR: 15/0. 2.2.4) Eliminate Single + Override Equivalent Last Items: SLR: 7/0, LALR: 7/0, and CLR: 7/0. Note: The first two alternatives of "S" have equal number of symbols, equal last items, and don't have different semantic actions, thus: The dotted item S-> b .B d on "B" goes to S->a A .d Otherwise, the dotted item S-> b B .d on "d" could go to S->a A d. If you know that this technique has a different name, please tell me. 2.2.5) Eliminate Single + Override + Optimize: SLR: 7/0, LALR: 7/0, and CLR: 7/0. 3) The grammar of a basic Calculator. It is the equivalent of the example 4.52 (Fig. 4.59). It has 17 states. 3.1) The Syntax Rules. grm ::= lines lines ::= lines expr \n /* printf("%g\n" , $2 ); */ | lines \n | NONE | ERROR \n /* printf( "reenter last line:\n") ; error=false; */ expr ::= expr <-|+> expr /* if (*str(2)=='+') $0 += $3 ; else $0 -= $3;*/ |expr <*|/> expr /* if (*str(2)=='*') $0 *= $3 ; else $0 /= $3;*/ | factor <-|+> ::= - | + <*|/> ::= *| / factor ::= ( expr ) /* $$ = $2 ; */ | NUMBER | - factor /* $$ = - $2 ; */ 3.2) The Lexical Rules. (The declaration "#ignore space" instructs the scanner to skip spaces.) #ignore space token ::= NUMBER /* yylval = atof( str(1) ) ; */ | space NUMBER ::= { [ { 0 .. 9 }... ] [ { . } { 0 .. 9 }... ] } -= { NONE } space ::= \t | \s| \r