From: "Ev. Drikos"
Newsgroups: comp.compilers
Subject: Syntaxis.jar; LALR parsing for languages with unreserved keywords
Date: Wed, 2 Nov 2011 19:40:12 +0200
Organization: An OTEnet S.A. customer
Message-ID: <11-11-015@comp.compilers>
Keywords: available, parse, Java

Hello,

This message describes a new feature of the parser/scanner generator
suite "Syntaxis.jar". The feature helps you parse languages with
unreserved keywords using an LALR parser and a tokenizer. Carry out the
following steps:

1) Use the difference operator "but not" ("-=") to exclude any reserved
   words from identifiers, and activate the scanner generator option
   that reports all conflicting tokens.
2) In the parser generator, activate the option "Shift Simultaneously
   Conflicting Tokens".
3) Give the full path name of the document with the lexical rules.

At the end of this message there is a small, elegant grammar that I found
in a paper titled "LALR parsing for languages without reserved words".
In this example, adding a new production for identifiers that lists all
unreserved keywords as alternatives of the token "identifier" would
introduce a shift/reduce conflict.
With option (2) above activated, and without restating the grammar, the
LALR builder of "Syntaxis.jar" builds a parsing table without conflicts.
As a result, the generated LALR parser accepts keywords as identifiers.

To give you another example: if you build an ISO SQL 2003 parser with
the technique described in this message, instead of adding a new
production for identifiers, the LALR table can be 5.84 times smaller.

Constructive feedback is welcome.

Best Regards,
Ev. Drikos

A) Syntax Rules
-----------------------------------------------------------------------
grm        ::= program
program    ::= BEGIN statements END
statements ::= statement ; statements
             | statement ;
statement  ::= reference = expression
             | ASSERT expression
reference  ::= IDENTIFIER
             | IDENTIFIER ( expression )
expression ::= ( expression )
             | reference

B) Lexical Conventions
-----------------------------------------------------------------------
#ignore spaces

token      ::= BEGIN | END | ASSERT | spaces | IDENTIFIER
BEGIN      ::= B E G I N
END        ::= E N D
ASSERT     ::= A S S E R T
spaces     ::= { \t | \n | \r | \s }...
IDENTIFIER ::= { A .. Z }...
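
For readers who want to experiment with the idea outside "Syntaxis.jar", here is a minimal Python sketch of the example grammar. It is not how "Syntaxis.jar" works internally and it is not an LALR parser: it swaps in a small backtracking recursive-descent recognizer, purely to show the two ingredients described in this message. The scanner reports every token kind a lexeme matches (so BEGIN is both BEGIN and IDENTIFIER), and the recognizer accepts whichever kind fits the context. The names scan, expect and accepts are hypothetical helpers introduced for this illustration only.

```python
import re

KEYWORDS = {"BEGIN", "END", "ASSERT"}  # unreserved keywords of the example

def scan(text):
    """Return one set per lexeme, holding every token kind it matches."""
    if not re.fullmatch(r"(?:[A-Z]+|[;=()]|\s)*", text):
        raise ValueError("input contains characters outside the grammar")
    tokens = []
    for lexeme in re.findall(r"[A-Z]+|[;=()]", text):
        if lexeme.isalpha():
            kinds = {"IDENTIFIER"}
            if lexeme in KEYWORDS:
                kinds.add(lexeme)  # conflicting token: keyword and identifier
        else:
            kinds = {lexeme}       # punctuation stands for itself
        tokens.append(kinds)
    return tokens

# Each rule is a generator that yields every position it can end at,
# which gives backtracking for free.
def expect(toks, i, kind):
    if i < len(toks) and kind in toks[i]:
        yield i + 1

def reference(toks, i):
    for j in expect(toks, i, "IDENTIFIER"):
        yield j                                   # IDENTIFIER
        for k in expect(toks, j, "("):
            for m in expression(toks, k):
                yield from expect(toks, m, ")")   # IDENTIFIER ( expression )

def expression(toks, i):
    for j in expect(toks, i, "("):
        for k in expression(toks, j):
            yield from expect(toks, k, ")")       # ( expression )
    yield from reference(toks, i)                 # reference

def statement(toks, i):
    for j in reference(toks, i):
        for k in expect(toks, j, "="):
            yield from expression(toks, k)        # reference = expression
    for j in expect(toks, i, "ASSERT"):
        yield from expression(toks, j)            # ASSERT expression

def statements(toks, i):
    for j in statement(toks, i):
        for k in expect(toks, j, ";"):
            yield k                               # statement ;
            yield from statements(toks, k)        # statement ; statements

def accepts(text):
    """True if text is a sentence of: program ::= BEGIN statements END."""
    toks = scan(text)
    for j in expect(toks, 0, "BEGIN"):
        for k in statements(toks, j):
            for m in expect(toks, k, "END"):
                if m == len(toks):
                    return True
    return False
```

With this sketch, accepts("BEGIN END = BEGIN ; ASSERT ( END ) = ASSERT ; END") returns True, because END, BEGIN and ASSERT are each taken as identifiers where the grammar calls for one, while accepts("BEGIN ASSERT ; END") returns False. Backtracking keeps the sketch short; a generated LALR table, as described in this message, resolves the same choices without it.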