Syntaxis.jar; LALR parsing for languages with unreserved keywords

Path csiph.com!x330-a1.tempe.blueboxinc.net!newsfeed.hal-mli.net!feeder3.hal-mli.net!newsfeed.hal-mli.net!feeder1.hal-mli.net!nx02.iad01.newshosting.com!newshosting.com!news-out.readnews.com!news-xxxfer.readnews.com!news.misty.com!news.iecc.com!lnews.iecc.com!nerds-end
From "Ev. Drikos" <drikosev@otenet.gr>
Newsgroups comp.compilers
Subject Syntaxis.jar; LALR parsing for languages with unreserved keywords
Date Wed, 2 Nov 2011 19:40:12 +0200
Organization An OTEnet S.A. customer
Lines 91
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <11-11-015@comp.compilers> (permalink)
NNTP-Posting-Host lnews.iecc.com
X-Trace gal.iecc.com 1320288254 6065 64.57.183.34 (3 Nov 2011 02:44:14 GMT)
X-Complaints-To abuse@iecc.com
NNTP-Posting-Date Thu, 3 Nov 2011 02:44:14 +0000 (UTC)
Keywords available, parse, Java
Posted-Date 02 Nov 2011 22:44:14 EDT
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
Xref x330-a1.tempe.blueboxinc.net comp.compilers:317


Hello,

This message describes a new feature of the parser/scanner generator suite
"Syntaxis.jar".

This feature can help you parse languages with unreserved keywords using an
LALR parser and a tokenizer. You need to carry out the following steps:

1) Use the difference operator "but not" ("-=") to exclude any reserved
words from identifiers, and activate the scanner generator option that
reports all conflicting tokens.

2) In the parser generator activate the option
"Shift Simultaneously Conflicting Tokens".

3) Give the full path name of the document with the lexical rules.
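
To make "conflicting tokens" in step (1) concrete, here is a minimal Python
sketch. It is not how "Syntaxis.jar" works internally; it only illustrates a
scanner that reports every token whose rule matches a lexeme instead of
picking one. The names TOKEN_PATTERNS, scan and conflicting are hypothetical,
and the punctuation set is an assumption taken from the literals used in the
syntax rules at the end of this message.

```python
import re

# Hypothetical token table mirroring the lexical rules of the example grammar.
TOKEN_PATTERNS = {
    "BEGIN": re.compile(r"BEGIN"),
    "END": re.compile(r"END"),
    "ASSERT": re.compile(r"ASSERT"),
    "IDENTIFIER": re.compile(r"[A-Z]+"),
}
# Assumption: the literals ; = ( ) of the syntax rules are single-character tokens.
PUNCTUATION = {";", "=", "(", ")"}
LEXEME = re.compile(r"\s+|[A-Z]+|[;=()]")


def scan(text):
    """Return (lexeme, kinds) pairs; kinds holds every token name that matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = LEXEME.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        lexeme = m.group()
        if lexeme.isspace():
            continue  # corresponds to "#ignore spaces"
        if lexeme in PUNCTUATION:
            kinds = {lexeme}
        else:
            # Keep all matching token names rather than resolving the conflict.
            kinds = {name for name, pat in TOKEN_PATTERNS.items()
                     if pat.fullmatch(lexeme)}
        tokens.append((lexeme, frozenset(kinds)))
    return tokens


def conflicting(tokens):
    """Lexemes that match more than one token rule."""
    return [lexeme for lexeme, kinds in tokens if len(kinds) > 1]


print(conflicting(scan("BEGIN X = END ; END")))  # ['BEGIN', 'END', 'END']
```

Each keyword lexeme comes back as both its keyword token and IDENTIFIER, which
is the information a parser needs in order to defer the choice.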

At the end of this message there is a small, elegant grammar I found in a
paper titled "LALR parsing for languages without reserved words".

In this example, adding a new production for identifiers that lists every
unreserved keyword as an alternative to the token IDENTIFIER would
introduce a shift/reduce conflict.

With option (2) above activated and without any grammar restatements,
the LALR builder of "Syntaxis.jar" builds a parsing table without conflicts.
As a result, the generated LALR parser accepts keywords as identifiers.
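
To show which inputs such a parser should accept, here is a small Python
recognizer for the example grammar at the end of this message. It is a sketch
only: it uses backtracking recursive descent, not LALR tables, and says
nothing about how "Syntaxis.jar" resolves the tokens internally. The names
kinds_of, tokenize and accepts are hypothetical. Each lexeme carries a set of
possible token kinds, and the recognizer tries whichever kind the grammar
expects at that point.

```python
import re

KEYWORDS = {"BEGIN", "END", "ASSERT"}


def kinds_of(lexeme):
    """All token kinds a lexeme may stand for; keywords double as identifiers."""
    if lexeme.isalpha():
        return {"IDENTIFIER", lexeme} if lexeme in KEYWORDS else {"IDENTIFIER"}
    return {lexeme}


def tokenize(text):
    # Simplification: anything other than A..Z words and ; = ( ) is skipped.
    return [kinds_of(lx) for lx in re.findall(r"[A-Z]+|[;=()]", text)]


def accepts(text):
    """True if text is a sentence of the example grammar."""
    toks = tokenize(text)

    # Every rule is a generator yielding each position where it can end.
    def term(kind, i):
        if i < len(toks) and kind in toks[i]:
            yield i + 1

    def reference(i):  # IDENTIFIER | IDENTIFIER ( expression )
        for j in term("IDENTIFIER", i):
            yield j
            for k in term("(", j):
                for m in expression(k):
                    yield from term(")", m)

    def expression(i):  # ( expression ) | reference
        for j in term("(", i):
            for k in expression(j):
                yield from term(")", k)
        yield from reference(i)

    def statement(i):  # reference = expression | ASSERT expression
        for j in reference(i):
            for k in term("=", j):
                yield from expression(k)
        for j in term("ASSERT", i):
            yield from expression(j)

    def statements(i):  # statement ; statements | statement ;
        for j in statement(i):
            for k in term(";", j):
                yield k
                yield from statements(k)

    def program(i):  # BEGIN statements END
        for j in term("BEGIN", i):
            for k in statements(j):
                yield from term("END", k)

    return any(end == len(toks) for end in program(0))


print(accepts("BEGIN ASSERT = END ; END = BEGIN ( ASSERT ) ; END"))  # True
print(accepts("BEGIN ASSERT ( X ) ; END"))                           # True
print(accepts("BEGIN X = ; END"))                                    # False
```

In the first input, ASSERT, END and BEGIN all appear as identifiers inside
statements, while the outer BEGIN and the final END keep their keyword role.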

To give you another example, an ISO SQL 2003 parser built with the
technique described in this message can have an LALR table 5.84 times
smaller than one built by adding a new production for identifiers.

Constructive feedback is welcome.

Best Regards,
Ev. Drikos



A) Syntax Rules
-----------------------------------------------------------------------
grm ::=
           program

program ::=
           BEGIN statements END

statements ::=
           statement ; statements
     |     statement ;

statement ::=
           reference = expression
     |     ASSERT expression

reference ::=
           IDENTIFIER
     |     IDENTIFIER ( expression )

expression ::=
           ( expression )
     |     reference



B) Lexical Conventions
-----------------------------------------------------------------------
#ignore spaces

token ::=
           BEGIN
     |     END
     |     ASSERT
     |     spaces
     |     IDENTIFIER

BEGIN ::=
           B E G I N

END ::=
           E N D

ASSERT ::=
           A S S E R T

spaces ::=
           { \t | \n | \r | \s }...

IDENTIFIER ::=
           { A .. Z }...
