From: "Ev. Drikos" <drikosev@otenet.gr>
Newsgroups: comp.compilers
Subject: Syntaxis.jar; LALR parsing for languages with unreserved keywords
Date: Wed, 2 Nov 2011 19:40:12 +0200
Organization: An OTEnet S.A. customer
Message-ID: <11-11-015@comp.compilers>
Keywords: available, parse, Java

Hello,

This message describes a new feature of the parser/scanner generator suite
"Syntaxis.jar".

This feature helps you parse languages with unreserved keywords using an
LALR parser and a tokenizer. Carry out the following steps:

1) In the lexical rules, use the difference operator "but not" ("-=") to
exclude any reserved words from identifiers, and activate the scanner
generator option that reports all conflicting tokens.

2) In the parser generator, activate the option
"Shift Simultaneously Conflicting Tokens".

3) Give the full path name of the document with the lexical rules.
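
To illustrate what step (1) asks of the scanner, here is a minimal Python
sketch. It is not Syntaxis code: the names TOKEN_PATTERNS, candidates, scan
and the "reserved" parameter are hypothetical, and the token patterns are
taken from the lexical rules at the end of this message. For each lexeme it
reports every token kind that matches, and it drops IDENTIFIER for any word
listed as reserved, which is my reading of the "but not" exclusion.

```python
import re

# Hypothetical token table, modelled on the lexical rules of this message.
TOKEN_PATTERNS = {
    "BEGIN": r"BEGIN",
    "END": r"END",
    "ASSERT": r"ASSERT",
    "IDENTIFIER": r"[A-Z]+",
}


def candidates(lexeme, reserved=frozenset()):
    """Return every token kind whose pattern matches the whole lexeme."""
    kinds = {kind for kind, pattern in TOKEN_PATTERNS.items()
             if re.fullmatch(pattern, lexeme)}
    if lexeme in reserved:
        # Assumed effect of "but not": a reserved word is never an identifier.
        kinds.discard("IDENTIFIER")
    return kinds


def scan(text, reserved=frozenset()):
    """Split text into (lexeme, candidate kinds) pairs.

    Punctuation is its own kind. Characters outside the sketch's alphabet
    (including white space) are silently skipped.
    """
    result = []
    for match in re.finditer(r"[A-Z]+|[();=]", text):
        lexeme = match.group()
        result.append((lexeme, candidates(lexeme, reserved) or {lexeme}))
    return result
```

With no reserved words, scan("END = A ;") reports END as both END and
IDENTIFIER, so the parser is the one that decides between them. Passing
reserved={"END"} leaves only the keyword reading.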

At the end of this message there is a small, elegant grammar that I found
in a paper titled "LALR parsing for languages without reserved words".

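In this example, a new production for identifiers, in which all unreserved
keywords are listed as alternatives of the token "identifier", would
introduce a shift/reduce conflict. For instance, after ASSERT with "(" as
lookahead, the parser cannot tell whether ASSERT starts an assertion, as in
"ASSERT ( A ) ;", or is an identifier, as in "ASSERT ( A ) = B ;".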

With option (2) above activated, and without restating the grammar, the
LALR builder of "Syntaxis.jar" builds a parsing table without conflicts.
As a result, the generated LALR parser accepts keywords as identifiers.

As another example, if you build an ISO SQL 2003 parser with the technique
described in this message, rather than adding a new production for
identifiers, the LALR table can be 5.84 times smaller.

Constructive feedback is welcome.

Best Regards,
Ev. Drikos



A) Syntax Rules
-----------------------------------------------------------------------
grm ::=
           program

program ::=
           BEGIN statements END

statements ::=
           statement ; statements
     |     statement ;

statement ::=
           reference = expression
     |     ASSERT expression

reference ::=
           IDENTIFIER
     |     IDENTIFIER ( expression )

expression ::=
           ( expression )
     |     reference



B) Lexical Conventions
-----------------------------------------------------------------------
#ignore spaces

token ::=
           BEGIN
     |     END
     |     ASSERT
     |     spaces
     |     IDENTIFIER

BEGIN ::=
           B E G I N

END ::=
           E N D

ASSERT ::=
           A S S E R T

spaces ::=
           { \t | \n | \r | \s }...

IDENTIFIER ::=
           { A .. Z }...
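


To experiment with which inputs the grammar above should accept when
keywords may also serve as identifiers, here is a minimal Python sketch.
It is not an LALR parser and not what Syntaxis generates: it is a small
set-based recursive-descent recognizer that follows every reading of a
token at once. The function names (scan, accepts) are hypothetical.

```python
import re

KEYWORDS = {"BEGIN", "END", "ASSERT"}


def scan(text):
    """Return one set of candidate token kinds per lexeme."""
    tokens = []
    for lexeme in re.findall(r"[A-Z]+|\S", text):
        if lexeme.isalpha():
            kinds = {"IDENTIFIER"}
            if lexeme in KEYWORDS:
                kinds.add(lexeme)      # keyword and identifier at once
        else:
            kinds = {lexeme}           # punctuation is its own kind
        tokens.append(kinds)
    return tokens


def accepts(text):
    """True if text is a program of the grammar in section A."""
    toks = scan(text)
    n = len(toks)

    def term(kind, starts):
        # Positions reached by consuming one token of the given kind.
        return {i + 1 for i in starts if i < n and kind in toks[i]}

    def reference(starts):
        # reference ::= IDENTIFIER | IDENTIFIER ( expression )
        ends = term("IDENTIFIER", starts)
        return ends | term(")", expression(term("(", ends)))

    def expression(starts):
        # expression ::= ( expression ) | reference
        if not starts:
            return set()
        return term(")", expression(term("(", starts))) | reference(starts)

    def statement(starts):
        # statement ::= reference = expression | ASSERT expression
        assign = expression(term("=", reference(starts)))
        return assign | expression(term("ASSERT", starts))

    def statements(starts):
        # One or more occurrences of "statement ;".
        ends = set()
        while starts:
            starts = term(";", statement(starts))
            ends |= starts
        return ends

    # program ::= BEGIN statements END, consuming the whole input.
    return n in term("END", statements(term("BEGIN", {0})))
```

For example, accepts("BEGIN ASSERT ( A ) ; END") and
accepts("BEGIN ASSERT ( A ) = B ; END") both hold: ASSERT is read as a
keyword in the first program and as an identifier in the second.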