Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.compilers > #3549
| Path | csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end |
|---|---|
| From | Kaz Kylheku <433-929-6894@kylheku.com> |
| Newsgroups | comp.compilers |
| Subject | Re: Interesting paper on regex NFA matching |
| Date | Sat, 27 Jan 2024 20:20:29 -0000 |
| Organization | Compilers Central |
| Sender | johnl%iecc.com |
| Approved | comp.compilers@iecc.com |
| Message-ID | <24-01-007@comp.compilers> (permalink) |
| References | <24-01-006@comp.compilers> |
| MIME-Version | 1.0 |
| Content-Type | text/plain; charset="UTF-8" |
| Injection-Info | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="81627"; mail-complaints-to="abuse@iecc.com" |
| Keywords | lex, comment |
| Posted-Date | 27 Jan 2024 22:19:04 EST |
| X-submission-address | compilers@iecc.com |
| X-moderator-address | compilers-request@iecc.com |
| X-FAQ-and-archives | http://compilers.iecc.com |
| Xref | csiph.com comp.compilers:3549 |
Show key headers only | View raw
On 2024-01-27, Christopher F Clark <christopher.f.clark@compiler-resources.com> wrote:
> Kaz Kylheku <433-929-6894@kylheku.com> wrote:
>
>> (I personaly am not so interested in backtracking regexes; and they are
>> not directly applicable to compilers. You'd never want to design a
>> lexical analyzer that requires "lookaround" or whatnot in order to
>> recognize tokens. It's nice they are generating Arxiv papers for
>> someone; no genuine intellectual inquiry is invalid; good luck! It's
>> not applicable to the "Programming Languages" category it has been filed
>> under. I will skim through it, though.)
>
> You are mistaken that they are not applicable to programming languages.
>
> Even Pascal [originally] had cases where lookahead and backtracking
> are required to properly tokenize the input.
>
> The well-known case is "1." if followed by another "." it is two tokens,
> and if followed by a digit it is a floating point number.
> I don't recall what it should be if followed by any other character.
I think that case can be handled by technology like Lex trailing
contexts, which doesn't require backtracking.
{digits}/. {
// ... set up semantic value ..
return INTEGER;
}
{digits}.{digits} {
// ... set up semantic value ..
return FLOAT;
}
Another possibility is to recongize {digits}. as one lexeme and call
unput('.'). That's a minor form of backtracking, that doesn't require us
to do anything funny to the regex implementation.
--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
[Patterns with alternatives and trailing contexts do have to back up
and the flex manual warns it's quite expensive. See the
PERFORMANCE CONSIDERATIONS and DEFICIENCIES parts of the man page. -John]
Back to comp.compilers | Previous | Next — Previous in thread | Find similar
Re: Interesting paper on regex NFA matching Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2024-01-27 12:47 +0200 Re: Interesting paper on regex NFA matching Kaz Kylheku <433-929-6894@kylheku.com> - 2024-01-27 20:20 +0000
csiph-web