Groups > comp.compilers > #189 > unrolled thread

coupling LALR with a scanner?

Started by	"Armel" <armelasselin@hotmail.com>
First post	2011-07-05 01:02 +0200
Last post	2011-09-17 10:38 -0700
Articles	16 — 6 participants

Back to article view | Back to comp.compilers

  coupling LALR with a scanner? "Armel" <armelasselin@hotmail.com> - 2011-07-05 01:02 +0200
    Re: coupling LALR with a scanner? "Armel" <armelasselin@hotmail.com> - 2011-07-07 10:28 +0200
      coupling LALR with a scanner? "Karsten Nyblad" <uu3kw29sb7@snkmail.com> - 2011-07-07 10:46 +0200
      Re: coupling LALR with a scanner? "Karsten Nyblad" <uu3kw29sb7@snkmail.com> - 2011-07-08 14:39 +0200
        Re: coupling LALR with a scanner? "Armel" <armelasselin@hotmail.com> - 2011-08-04 11:17 +0200
          Re: coupling LALR with a scanner? Paul B Mann <paul@paulbmann.com> - 2011-09-13 13:38 -0700
            Re: coupling LALR with a scanner? "Armel" <armelasselin@hotmail.com> - 2011-09-16 10:47 +0200
              Re: coupling LALR with a scanner? "Armel" <armelasselin@hotmail.com> - 2011-09-19 13:52 +0200
              Re: coupling LALR with a scanner? Paul B Mann <paul@paulbmann.com> - 2011-09-19 12:12 -0700
                Re: coupling LALR with a scanner? "Armel" <armelasselin@hotmail.com> - 2011-09-20 09:40 +0200
                  Re: coupling LALR with a scanner? Chris Dodd <cdodd@acm.org> - 2011-09-23 23:59 +0100
                  Re: coupling LALR with a scanner? Chris F Clark <cfc@shell01.TheWorld.com> - 2011-09-29 00:00 -0400
                    Re: coupling LALR with a scanner? "Armel" <armelasselin@hotmail.com> - 2011-10-02 16:41 +0200
                      Re: coupling LALR with a scanner? Chris F Clark <cfc@shell01.TheWorld.com> - 2011-10-03 11:59 -0400
                    Re: coupling LALR with a scanner? glen herrmannsfeldt <gah@ugcs.caltech.edu> - 2011-10-03 21:08 +0000
            Re: coupling LALR with a scanner? Paul B Mann <paul@paulbmann.com> - 2011-09-17 10:38 -0700

#189 — coupling LALR with a scanner?

From	"Armel" <armelasselin@hotmail.com>
Date	2011-07-05 01:02 +0200
Subject	coupling LALR with a scanner?
Message-ID	<11-07-013@comp.compilers>

Hi,

I am exploring the possibility of choosing between several lexers depending
on the current state of an LALR parser (i.e. if a state can accept
productions such as A => A . 'a' b, B => . 'c', then the selected lexer will
accept (at least) 'a' and 'c', for rightmost positions the lookahead symbols
would be acceptable as well)

Is there any literature about this? or example? I searched around for
scannerless/lexerless parsing but could not make my mind.

Regards
Armel Asselin
[The usual approach is to set flex start states in your yacc or bison parser.
-John]

[toc] | [next] | [standalone]

#191

From	"Armel" <armelasselin@hotmail.com>
Date	2011-07-07 10:28 +0200
Message-ID	<11-07-015@comp.compilers>
In reply to	#189

>[The usual approach is to set flex start states in your yacc or bison
>parser. >-John]

the LALR generator is one of mine and the idea here would be to select
automatically the right lexer from the currently accepted tokens. I have the
feeling that it must be doable. The target is to be able to write grammars
with such dependencies naturally without any (user level) grammar actions
and very minimal lexer meta-information (such as which lexer produces which
tokens).

Armel
[I think you will find that users hate a compiler where in each state the
lexer only recognizes the tokens valid in that state, since the only error
message it could produce is "invalid token."  Also remember that the LA in
LALR stands for Look Ahead, and the lexer is often one token ahead of the
parser. -John]

[toc] | [prev] | [next] | [standalone]

#192

From	"Karsten Nyblad" <uu3kw29sb7@snkmail.com>
Date	2011-07-07 10:46 +0200
Message-ID	<11-07-016@comp.compilers>
In reply to	#191

>I am exploring the possibility of choosing between several lexers
>depending on the current state of an LALR parser (i.e. if a state can
>accept productions such as A => A . 'a' b, B => . 'c', then the
>selected lexer will accept (at least) 'a' and 'c', for rightmost
>positions the lookahead symbols would be acceptable as well)

The problem with using an LALR parser for that is that the state is not
enough information to know which token are accepted.  An example:

A -> a B a
A -> b B b
B -> a

( A and B being non terminals and a and b being terminals.)
In an LALR parser you will get a state with just one item [ B -> a . ]
with a lookahead set of { a, b }.  However, just one of a and b is
acceptable.

There are three ways to get around that problem:

First you can use an LR parser instead.  Some parser generators
generate an LALR parser and use state splitting for solving conflicts
that come from LALR parsers being less powerful than LR parser.  Such
generators can easily be modified to split states in this case too, in
practise generating an LR parser.

Secondly you can change the parser driver program, i.e., the program
that is generated by the parser generator.  If it looks up information
in tables, it is easy to let the driver program, such that it finds out
which tokens are acceptable before asking the scanner for a token.  You
test whether or not a token is acceptable by copying the parse stack,
place the token being tested in the window, and then parse until the
token is pushed on the stack or an error occurs.  (Of course the first
indicates that the token is acceptable, and the second that it is not.)

Thirdly you can use a parser generator like Bison, that implements
general LR parsing.  General LR parser generators work on multiple
stacks.  You change the lexer such that it returns a set of tokens, a
token for each way the next tokens may be lexed.  Say the lexed
returns three tokens.  Then the driver program makes three copies of
the stacks, and parses on with each token being applied to its own
stack.

At last: You might get better answers if you are more detailed with
which problem you are trying to solve.  I have been interested in such
techniques for parsing languages like PL/I where keywords may be used
as identifiers, Ada where a single quote may be a token itself or the
start of a character constant, and BCPL where line ends may end
statements.  The best solution is not necessarily the same in all the
cases.

[toc] | [prev] | [next] | [standalone]

#194

From	"Karsten Nyblad" <uu3kw29sb7@snkmail.com>
Date	2011-07-08 14:39 +0200
Message-ID	<11-07-018@comp.compilers>
In reply to	#191

>the LALR generator is one of mine and the idea here would be to select
>automatically the right lexer from the currently accepted tokens. I
>have the feeling that it must be doable. The target is to be able to
>write grammars with such dependencies naturally without any (user
>level) grammar actions and very minimal lexer meta-information (such as
>which lexer produces which tokens).

You have not written if your generator supports LALR(k) or just LALR(1).
In the later case my suggestion is that you change your generator to an
LR(1) generator.  Not only will you get an easy implementation of what
you want.  You will also catch syntax errors right after the offending
token is put in the window, such that you can get better error reporting
and recovery.  Do not be afraid the parsers will be too big.  That would
have been the case 25 years ago, but not for the last 10 years.

If you don't to go that way, then do as I suggested in my previous
posting:  Say you have a parse stack W and a terminal been pushed on the
stack.  Now you want to know if a terminal t is acceptable.  Then you
copy W onto a new stack W' and use W' to parse with t in the window.  If
t is stacked, then t is acceptable.  My guess is that you will find this
approach will make the parsers a bit slow.

The later approach is so simple to implement, that I think somebody
would have put it into some parser generator if it was a facility, that
would scanning and parsing significantly easier.  To me it looks like
you have an idea for a solution with no problem.

[toc] | [prev] | [next] | [standalone]

#221

From	"Armel" <armelasselin@hotmail.com>
Date	2011-08-04 11:17 +0200
Message-ID	<11-08-004@comp.compilers>
In reply to	#194

"Karsten Nyblad"  a icrit dans le message de groupe de discussion :
> You have not written if your generator supports LALR(k) or just LALR(1).
just LALR(1) or GLALR(1)

>In the later case my suggestion is that you change your generator to an
>LR(1) generator.
I will soon do LR(1) or something in-between (e.g. IELR(1))

From what I could read in the work a Joel Denny about PSLR (pseudo
scannerless LR) the approach is valid. It does not seem really hard to make
something similar.

Regards
Armel

[toc] | [prev] | [next] | [standalone]

#266

From	Paul B Mann <paul@paulbmann.com>
Date	2011-09-13 13:38 -0700
Message-ID	<11-09-016@comp.compilers>
In reply to	#221

/* Test1 grammar (like the one above). */

   G -> A <eof>
   A -> a B c
      -> a B d
   B -> b

Test1 grammar is LR(0) and requires NO lookahead symbols.
By default, b always reduces to B.

/* Test2 grammar. */

   G -> A <eof>
   A -> a B1 c
      -> a B2 d
   B1 -> b
   B2 -> b

Test2 grammar is LALR(1) and probably SLR(1) also.  The parser
requires one symbol of lookahead to know which reduction to make (B1
or B2).

/* Test3 grammar. */

   G -> A <eof>
   A -> a B1 c
      -> a B2 d
      -> a B1 d
      -> a B2 c
   B1 -> b
   B2 -> b

Test3 grammar is NOT LALR(1), but it is LR(1).  The parser requires
one symbol of lookahead to know which reduction to make (B1 or B2).

Test3 is a highly contrived situation, not usually seen in real-world
computer languages (as far as I know).  Test3 could be written as
Test2.

It seems that LALR(1) parsers are more powerful than some people
realize.  LR(1) parser tables can be huge.  A COBOL-85 grammar created
an LR(1) parser with over 2,000,000 states, whereas a IELR(1) parser
for that grammar created about 1668 states.  Bison can create IELR(1)
parsers. HYACC can also create IELR(1) parsers.

I don't understand why you want to have a different scanner for each
state.  The parser can easily make the decision, whether a token is
valid.  In fact, an LALR parser already has this information in the
parser tables, so why make a simple situation complicated?

Also in case of a syntax error, the parser can tell the user what
tokens were expected at the error point.  So there is no need for
different lexers, unless the syntax of the tokens changes from one
state to the next and this could be confusing to the the user of your
language.

Paul B Mann

[toc] | [prev] | [next] | [standalone]

#267

From	"Armel" <armelasselin@hotmail.com>
Date	2011-09-16 10:47 +0200
Message-ID	<11-09-017@comp.compilers>
In reply to	#266

>I don't understand why you want to have a different scanner for each
>state.  The parser can easily make the decision, whether a token is
>valid.  In fact, an LALR parser already has this information in the
>parser tables, so why make a simple situation complicated?

IELR was exactly made for that reason, as a first step to PSLR: some
grammars have no 'tokens' and 'grammar rules', they just have a 'grammar'
where mutually exclusive tokens are present, e.g. you cannot make a
Javascript single lexer as there are state where / (slash) means 'start of
regular expression' (of course the content of the regular expression follows
totally different lexing rules than the rest of the text) whereas in other
states it means 'division' operator. If your parser cannot tell which of the
two lexers to use, you are off.

> this could be confusing to the the user of your language.
people like Perl and Javascript, so we have to make parsers for those
languages :)

by the way I'd be open to use IELR, but I have to read the paper again
because it's far from being easy to understand and implement...

Armel
[This seems like an awfully complicated way to reinvent scanner start
states. -John]

[toc] | [prev] | [next] | [standalone]

#271

From	"Armel" <armelasselin@hotmail.com>
Date	2011-09-19 13:52 +0200
Message-ID	<11-09-021@comp.compilers>
In reply to	#267

>[This seems like an awfully complicated way to reinvent scanner start
>states. -John]

In Fact, This Is The Same. there is a big difference in usage with start
states guided through Bison/Yacc explicit actions: the scanner knows by
itself which start state to use AND can tell you up front that what you
expect to do is impossible. when two mutually exclusive tokens happen in the
same parser state e.g.

A => expr "/" expr
       | expr REGEXPR expr        // imagining its possible in your language

B => expr "/" expr
expr => number | REGEXPR

REGEXPR = /[^/]*/

In case A the parser generator would issue by itself an error, in case B, it
would set automatically the lexer start states appropriately.
In both case it has a tremendous advantage: any developer can enter the
grammar, compile and have something working (or know why it won't work),
s/he does not need any more to be a scanner/parser guru.

By the way, nothing prevents start states with exclusive tokens from
being stuffed with all the other non-problematic tokens, so as to give
meaningful messages when an unexpected token happens.

Regards
Armel

[toc] | [prev] | [next] | [standalone]

#272

From	Paul B Mann <paul@paulbmann.com>
Date	2011-09-19 12:12 -0700
Message-ID	<11-09-022@comp.compilers>
In reply to	#267

> IELR was exactly made for that reason, as a first step to PSLR: some
> grammars have no 'tokens' and 'grammar rules', they just have a
> 'grammar' where mutually exclusive tokens are present, e.g. you
> cannot make a Javascript single lexer as there are state where /
> (slash) means 'start of regular expression' (of course the content
> of the regular expression follows totally different lexing rules
> than the rest of the text) whereas in other states it means
> 'division' operator. If your parser cannot tell which of the two
> lexers to use, you are off.

IELR(1) does not solve this problem.  It solves the problem of reduce-
reduce conflicts in an LR(0) state machine where a state has multiple
"lookback nonterminal transitions" which cause conflicts.  IELR(1)
does state splitting (or duplicating) in order to remove the conflicts
(if possible).  It has nothing to do with how many scanners you may
need. It only pertains to the language defined by the grammar.  It
looks like you have two languages or two lexical languages (and need
two scanners).

To show that IELR(1) solves this problem you would have to have an
LALR(1) grammar that has reduce-reduce conflicts caused by the need of
two different lexers (scanners) and show that IELR(1) removes the
conflicts.

What you described with Javascript is a little language inside of
another language.  Why not switch parsers when a slash ('/') is
encountered if necessary?

Parsing expression grammars are more appropriate for this Javascript
problem, if you don't want to hand-code a "fix" within an LALR parser.

A similar situation occurs when parsing C.  You encounter a "#if ..."
while parsing C.  At the '#' you must leave the C parser and enter the
C- preprocessor parser.  After the conditional statement has been
parsed, you must switch back to the C parser.

/ Paul Mann

[toc] | [prev] | [next] | [standalone]

#273

From	"Armel" <armelasselin@hotmail.com>
Date	2011-09-20 09:40 +0200
Message-ID	<11-09-023@comp.compilers>
In reply to	#272

"Paul B Mann"  a icrit dans le message de groupe de discussion :
>> IELR was exactly made for that reason, as a first step to PSLR: some
>IELR(1) does not solve this problem.  It solves the problem of reduce-
>reduce conflicts in an LR(0) [...]

OK I did not sentenced that right, this is PSLR which does all the
lexer/parser integration (hence Pseudo Scannerless LR).

>What you described with Javascript is a little language inside of
>another language.  Why not switch parsers when a slash ('/') is
>encountered if necessary?

The problem is that / is used as a normal token AND as the start for regular
expression.
For example these are two valid sequences:
var regexpr = / b + a /
                    2;
var numeric_var = 1 / b + a /
                    2;

The 2; on the following lines will have two different meanings, in case
'regexpr', it is an instruction setting the expression value (as used in
eval() ), in the case second case, 2 is the second operand to a / operator.

I insist a bit on this to show there is no 'right side' heuristic to
determine / b + a / is a regular expression. On the other hand the '1'
expression could be as complex as you can imagine.

Of course in the presence of something like expr / expr the parser knows the
/ is for a division, and if the / is found where expr is expected it knows
there is a regular expression coming.
I don't see how I could hack this into a scanner only. maybe some specific
preceding tokens. I have pass again some time on that question.

Armel

[toc] | [prev] | [next] | [standalone]

#279

From	Chris Dodd <cdodd@acm.org>
Date	2011-09-23 23:59 +0100
Message-ID	<11-09-029@comp.compilers>
In reply to	#273

"Armel" <armelasselin@hotmail.com> wrote in news:11-09-023@comp.compilers:
> Of course in the presence of something like expr / expr the parser knows
> the / is for a division, and if the / is found where expr is expected it
> knows there is a regular expression coming.
> I don't see how I could hack this into a scanner only. maybe some
> specific preceding tokens. I have pass again some time on that question.

The usual way to do this in a flex+bison parser is with a production like:

expr: { beginREGEXP(); } '/' regexp { endREGEXP(); } '/'

where beginREGEXP and endREGEXP are functions in the flex file that set
the start-state into and out of regexp mode.  Note that the actions need
to be BEFORE the parser shifts the '/' token, as an LALR(1) parser may look
ahead one token.  Which means that a '/' must be a '/' regardless of the
scanner state.

Note that if you allow additional lookahead in the parser (LR(k) or LR(*) or
any kind of backtracking) this will break badly.

Chris Dodd
cdodd@acm.org

[toc] | [prev] | [next] | [standalone]

#285

From	Chris F Clark <cfc@shell01.TheWorld.com>
Date	2011-09-29 00:00 -0400
Message-ID	<11-10-003@comp.compilers>
In reply to	#273

I have come to this discussion a little late and may have missed some
of the conversation, so excuse me if I say some things which are
obvious and have been discussed before.

There are classes of languages where you can use the "parser" context
to sufficiently disambiguate which characters should be grouped into
tokens.  Thus, the class of scannerless LR parsers make sense and
there are numerous papers discussing them.  It does in general make
the front end writer's life simpler in that the programmer doesn't
have to manually distinguish the lexical grammar from the parsing
grammar.  Moreover, several conventions like automatically dealing
with whitespace and comments can make this set much larger and much
more useful.

It might even be possible to argue that the scannerless technology
might result in better error messages because a lexical error is
promoted to a parsing error and more context can be used to display
the correct possibilites (if one assumes that the parser can generate
intelligible error messages to begin with, which is often a dubious
claim for most automated parsing systems).

However, and here I think we come to the crux of the discussion, can a
scannerless system disambiguate the potentially ambiguous use of / as
a division operator and the start of a regular expression?  I would
argue that if the system is LR(k) for some finite k and the use of
regular expressions is not restricted to some narrow (left, preceding)
context (e.g. as in Perl where regular expressions are part of only
specific operations like m/regex/ and ~=/regex/ (forgive me if my Perl
is a bit rusty)), then the answer is no.

The reason why not is simple, at some point one has to decide whether
to tokenize a / as a division operator or to use it as introducing a
regular expression.  With an LR(k) parser, one must make the decision
after at most k tokens, but a regular expression can be arbitrarily
long, thus one can "easily" formulate a regular expression which is
ambiguous with only k tokens of lookahead.  Note, I'm positing that
tokenization is equivalent to a reduction.  However, I think you will
find that that is generally true.

Note a scannerless system based upon a more general language class
(e.g. Earley parsing) could use an arbitrary amount of right,
succeeding context to disambiguate the tokenization and avoid this
problem.  However, the runtime cost of doing so could easily become
prohibitive.

Alternately, one could design the language to restrict regular
expressions to certain contexts, as I believe Perl does, and thus make
the cases where one is looking for them unambiguous or atleast much
simpler.  (To me this would be the hallmark of a good language design,
as ambiguity often confuses the programmer as well as the compilers,
but that is a digression.)  If the language design has the
characteristics that regular expressions are restricted to certain
well-defined contexts, the ones where lexer start states in a grammar
would work, then there is no specific reason why a scannerless system
could not find and insert those start states automatically.  In
theory, then a non-compiler writing guru could use such a tool to work
on the grammar without being aware of what is going on under the
covers.

However, my experience suggests the result will be far less sanguine.
Not only would such a tool that further abstracts the grammar writer
from the details of the process not really enable non-gurus to
successfully write grammars for such subtle cases, but it will lull
them into falsely believing that they can write grammars without
learning the underlying theory and then when the tool produces an
error message they will be so much more confused as to what works and
what doesn't that the result will become even more of a black-art with
magic incantations and spells that are recited without understanding
on the belief that such superstitions enable one to write correct
grammars.  Despite having written compilers for ~40 years now, I still
find myself falling back on overly simplified interpretations on why
certain things work and others don't.  To wit the hand-waving
explanation I offered for context sensitivity above.  Although my
intuition about context sensitivity is slightly more subtle than what
I expressed, it is hard to explain rigorously or precisely.

Of course, I don't expect my annecdotal evidence to be convincing so I
submit how well C++ template errors are handled.  How many people do
you know who can write and really understand the creation of
non-trivial C++ templates?  I would argue that grammar writing is of
roughly the same complexity as template writing.

Note, I think this also explains the popularity of LL parsing, as the
theory of what works in LL is much simpler and manageable because it
is so much more restrictive than LR parsing while still being adequate
(once one accepts the if-then-else trick) to handle most if not all
programming languages.  It does take more effort to write such
grammars, but novices can quickly write simple grammars in them.  In
contrast, many users simply give up when trying to resolve reduce-
reduce conflicts (or even shift-reduce ones).

That is not to say that scannerless parser generators are a bad idea.
It is simply to disabuse the notion that it will instantly enable less
skilled writers to more successfully develop subtle and complicated
grammars, such as those needed to handle ambiguous tokenizations.
Over time as we get experienced using them, possibly yes, but
out-of-the-box no.

Moreover, I also think that understanding the value of a separate
lexer and parser is a good thing.  The implementation gap from
divorcing those two aspects actually has utility.  In fact, as I have
been studying "high speed regular expression processing", I have
determined that even within a regular expression processor (e.g. a
lexer) there is an argument for introducing such a gap to "segment"
tokens into smaller chunks for both performance and expressibility
reasons.  At the same time, I think that the segmentation process can
be automated, at least in many cases.

Hope this helps,
-Chris

******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris

[toc] | [prev] | [next] | [standalone]

#287

From	"Armel" <armelasselin@hotmail.com>
Date	2011-10-02 16:41 +0200
Message-ID	<11-10-005@comp.compilers>
In reply to	#285

> there is an argument for introducing such a gap to "segment"
> tokens into smaller chunks for both performance and expressibility
> reasons.

could you elaborate on this segmentation mechanism?

In my lexer generator, the developer can introduce start states by himself
and 'cut' complex expressions into smaller expressions which still respect
AFD capabilities and introduce dynamic regular expressions where absolutely
necessary, for languages allowing dynamic string delimiters for example
(e.g. doc-strings like << END_OF_STR_MARKER, some lines then a line with
END_OF_STR_MARKER only on the line), languages such as ruby are very funny
from that point of view if I remember well.

Regards
Armel

[toc] | [prev] | [next] | [standalone]

#288

From	Chris F Clark <cfc@shell01.TheWorld.com>
Date	2011-10-03 11:59 -0400
Message-ID	<11-10-006@comp.compilers>
In reply to	#287

"Armel" <armelasselin@hotmail.com> writes:

>> there is an argument for introducing such a gap to "segment"
>> tokens into smaller chunks for both performance and expressibility
>> reasons.
>
> could you elaborate on this segmentation mechanism?

I would like to, but the idea is going through the patent process, as
it has hardware design implications.  Once I know that the patent has
been filed, or get other ok from Intel legal, I will write the idea
up.

> In my lexer generator, the developer can introduce start states by himself
> and 'cut' complex expressions into smaller expressions which still respect
> AFD capabilities and introduce dynamic regular expressions where absolutely
> necessary, for languages allowing dynamic string delimiters for example
> (e.g. doc-strings like << END_OF_STR_MARKER, some lines then a line with
> END_OF_STR_MARKER only on the line), languages such as ruby are very funny
> from that point of view if I remember well.

Those sound like useful extensions.  In the hands of an experienced
parser writer, that could make the process much easier.  Let the
system figure out what it can, then adjust where you must.

[toc] | [prev] | [next] | [standalone]

#290

From	glen herrmannsfeldt <gah@ugcs.caltech.edu>
Date	2011-10-03 21:08 +0000
Message-ID	<11-10-008@comp.compilers>
In reply to	#285

Chris F Clark <cfc@shell01.theworld.com> wrote:
> I have come to this discussion a little late and may have missed some
> of the conversation, so excuse me if I say some things which are
> obvious and have been discussed before.

(snip)
> However, and here I think we come to the crux of the discussion, can a
> scannerless system disambiguate the potentially ambiguous use of / as
> a division operator and the start of a regular expression?  I would
> argue that if the system is LR(k) for some finite k and the use of
> regular expressions is not restricted to some narrow (left, preceding)
> context (e.g. as in Perl where regular expressions are part of only
> specific operations like m/regex/ and ~=/regex/ (forgive me if my Perl
> is a bit rusty)), then the answer is no.

I could be completely missing the point, but it seems to me that you
don't have to restrict such RE to a narrow context as long as the
context isn't one where a division operator is allowed.

Division usually requires a preceding operand. In the contexts I would
expect a regular expression it would be either preceded by an operator
or be the beginning of an expression.

In writing AWK programs, where there are functions that take RE as
operands, I normally put then in quotes (") instead of slashes, as it
seems more obvious in the context.  (A variable with the appropriate
value also works.)

-- glen

[toc] | [prev] | [next] | [standalone]

#269

From	Paul B Mann <paul@paulbmann.com>
Date	2011-09-17 10:38 -0700
Message-ID	<11-09-019@comp.compilers>
In reply to	#266

> /* Test3 grammar. */
>
>    G -> A <eof>
>    A -> a B1 c
>       -> a B2 d
>       -> a B1 d
>       -> a B2 c
>    B1 -> b
>    B2 -> b
>

I made a mistake, sorry.  This grammar should be:

/* Test3 grammar. */

   G -> A <eof>
   A -> a B1 c
      -> a B2 d
      -> x B1 d
      -> x B2 c
   B1 -> b
   B2 -> b

Paul B Mann

[toc] | [prev] | [standalone]

csiph-web

coupling LALR with a scanner?

Contents

#189 — coupling LALR with a scanner?

#191

#192

#194

#221

#266

#267

#271

#272

#273

#279

#285

#287

#288

#290

#269