| From | Kaz Kylheku <493-878-3164@kylheku.com> |
|---|---|
| Newsgroups | comp.compilers |
| Subject | Re: Languages with optional spaces |
| Date | 2020-02-26 08:06 +0000 |
| Organization | Aioe.org NNTP Server |
| Message-ID | <20-02-021@comp.compilers> |
| References | <20-02-015@comp.compilers> |
On 2020-02-19, Maury Markowitz <maury.markowitz@gmail.com> wrote:
> I'm trying to write a lex/yacc (flex/bison) interpreter for classic BASICs
> like the original DEC/MS, HP/DG etc. I have it mostly working for a good chunk
> of 101 BASIC Games (DEF FN is the last feature to add).
>
> Then I got to Super Star Trek. To save memory, SST removes most spaces, so
> lines look like this:
>
> 100FORI=1TO10
>
> Here are my current patterns that match bits of this line:
>
> FOR { return FOR; }
>
> [:,;()\^=+\-*/\<\>] { return yytext[0]; }
>
> [0-9]*[0-9.][0-9]*([Ee][-+]?[0-9]+)? {
>     yylval.d = atof(yytext);
>     return NUMBER;
> }
>
> "FN"?[A-Za-z@][A-Za-z0-9_]*[\$%\!#]? {
>     yylval.s = g_string_new(yytext);
>     return IDENTIFIER;
> }
>
> These correctly pick out some parts, numbers and = for instance, so it sees:
>
> 100 FORI = 1 TO 10
>
> The problem is that FORI part. Some BASICs allow variable names with more than
> two characters, so in theory, FORI could be a variable. These BASICs outlaw
> that in their parsers; any string that starts with a keyword exits then, so
> this would always parse as FOR. In lex, FORI is longer than FOR, so it returns
> a variable token called FORI.
>
> Is there a way to represent this in lex? Over on Stack Overflow the only
> suggestion seemed to be to use trailing syntax on the keywords, but that
> appears to require modifying every one of the simple patterns for keywords with
> some extra (and ugly) syntax. Likewise, one might modify the variable name
> pattern, but I'm not sure how one says "everything that doesn't start with one
> of these other 110 patterns".
Two ideas:

1. Forget about recognizing variable names in the lexer. Instead,
   recognize only the constituent letters of a variable name there.
   Then, in the parser, have a grammar production which assembles
   the letters into a variable:

       variable : VARCHAR
                | variable VARCHAR
                ;

2. Use regex patterns in the lexer to recognize just the keywords,
   as above. Then recognition of variable names is handled by a rule
   matching just one letter A-Z, whose lex action performs ad-hoc
   lexical analysis using C logic. At that point you know that you do not
   have a keyword, because no keyword rule matched. You can read further
   characters with input() (spelled yyinput() in a C++ scanner), push
   back the one that ends the name with unput(), and accumulate a
   variable name.
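
To make idea (1) concrete, here is a rough sketch in plain C rather than lex/yacc, so it can be compiled on its own. Everything named here (the token codes, the two-entry keyword table, next_token, parse_variable) is made up for illustration; a real scanner would come from flex and the token codes from bison. The loop in parse_variable only imitates what the left-recursive production does: each VARCHAR is appended to the name built so far.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical token codes; a real parser would get these from yacc/bison. */
enum token { T_EOF, T_FOR, T_TO, T_VARCHAR, T_OTHER };

/* Hypothetical keyword table; a real BASIC has many more entries. */
static const struct { const char *text; enum token tok; } keywords[] = {
    { "FOR", T_FOR }, { "TO", T_TO },
};

/* Lexer side: return a keyword if one starts at *p, else a single character.
   For T_VARCHAR and T_OTHER the character is stored in *ch. */
static enum token next_token(const char **p, char *ch)
{
    size_t i;
    if (**p == '\0')
        return T_EOF;
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++) {
        size_t n = strlen(keywords[i].text);
        if (strncmp(*p, keywords[i].text, n) == 0) {
            *p += n;
            return keywords[i].tok;
        }
    }
    *ch = *(*p)++;
    return (*ch >= 'A' && *ch <= 'Z') ? T_VARCHAR : T_OTHER;
}

/* Parser side: imitate  variable : VARCHAR | variable VARCHAR ;
   by appending each VARCHAR to the name built so far.
   Returns the token that ended the variable (that token is consumed). */
static enum token parse_variable(const char **p, char *name, size_t cap)
{
    size_t len = 0;
    char ch;
    enum token t;
    assert(cap > 0);
    while ((t = next_token(p, &ch)) == T_VARCHAR) {
        if (len + 1 < cap)          /* silently truncate over-long names */
            name[len++] = ch;
    }
    name[len] = '\0';
    return t;
}
```

Fed the text FORI=1TO10, next_token first returns T_FOR, and parse_variable then collects the name "I" and stops at the "=". Note that with this approach a name that contains a keyword gets split at the keyword, which matches the behaviour of the BASICs described in the question.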
A variant of technique (2) is used for scanning C comments,
as an alternative to an ugly regular expression:
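"/*" {
    int c;
    /* yyinput() is the name in a C++ scanner; a C scanner
       generated by flex calls it input(). */
    while ((c = yyinput()) != 0)
    {
        if (c == '\n') {
            /* increment line number or something */
        }
        else if (c == '*')
        {
            if ((c = yyinput()) == '/')
                break;      /* found the closing delimiter */
            else
                unput(c);   /* re-examine c; it may be another '*' */
        }
    }
}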
The above is an adaptation of something from an old Flex manual.
IIRC the Dragon Book has a similar example of ad-hoc logic
in a lex rule for handling C comments.
You can see that it's a similar idea. We use a regex to partially match
the comment, just the /* opening. Then we take over from there.
I have a hunch this would work for fetching variables like FORI, when
there is no match on a keyword like FOR.
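
As a rough illustration of that hunch, the following C sketch shows what the body of such a single-letter rule might do. It is not flex code: struct stream, stream_get and stream_unget are hypothetical stand-ins for the scanner's input and for input()/unput(), and scan_variable is a made-up name. The set of type suffixes is borrowed from the IDENTIFIER pattern in the question.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the scanner's input stream. */
struct stream { const char *text; size_t pos; };

/* Plays the role of input(): returns 0 at end of input. */
static int stream_get(struct stream *s)
{
    return s->text[s->pos] ? (unsigned char)s->text[s->pos++] : 0;
}

/* Plays the role of unput() for the character just read. */
static void stream_unget(struct stream *s)
{
    assert(s->pos > 0);
    s->pos--;
}

/* Called once a single letter `first` has matched and no keyword rule did.
   Accumulates further letters and digits plus one optional type suffix,
   and pushes back the character that ends the name. */
static void scan_variable(struct stream *s, int first, char *name, size_t cap)
{
    size_t len = 0;
    int c;
    assert(cap >= 2);
    name[len++] = (char)first;
    while ((c = stream_get(s)) != 0) {
        if (isalnum(c)) {
            if (len + 1 < cap)      /* silently truncate over-long names */
                name[len++] = (char)c;
        } else if (strchr("$%!#", c) != NULL) {
            if (len + 1 < cap)
                name[len++] = (char)c;
            break;                  /* a suffix always ends the name */
        } else {
            stream_unget(s);        /* not part of the name: give it back */
            break;
        }
    }
    name[len] = '\0';
}
```

Positioned just after the FOR in FORI=1TO10, this yields the name "I" and leaves the "=" in the stream for the next rule. One caveat: as written, the loop does not stop at a keyword embedded later in the name, so something like ATOB would be swallowed whole; checking for keywords inside the loop would be an extra step that goes beyond what is described above.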
--
TXR Programming Language: http://nongnu.org/txr
Music DIY Mailing List: http://www.kylheku.com/diy
ADA MP-1 Mailing List: http://www.kylheku.com/mp1