Re: State-of-the-art algorithms for lexical analysis?

Path csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From gah4 <gah4@u.washington.edu>
Newsgroups comp.compilers
Subject Re: State-of-the-art algorithms for lexical analysis?
Date Mon, 6 Jun 2022 10:03:55 -0700 (PDT)
Organization Compilers Central
Lines 29
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <22-06-011@comp.compilers> (permalink)
References <Adh5kg76Z0xZslIuRRyzgUhteE2M6A==> <22-06-009@comp.compilers>
Mime-Version 1.0
Content-Type text/plain; charset="UTF-8"
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="77191"; mail-complaints-to="abuse@iecc.com"
Keywords lex
Posted-Date 06 Jun 2022 15:57:06 EDT
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
In-Reply-To <22-06-009@comp.compilers>
Xref csiph.com comp.compilers:3049


On Monday, June 6, 2022 at 8:06:28 AM UTC-7, Roger L Costello wrote:

(snip)

> I will look into PSL. There are algorithms for converting regexes to DFA
> and then using the DFA to tokenize the input. Are there algorithms for
> converting PSL to (what?) and then using the (what?) to tokenize the input?

The approximate searches are done using dynamic programming.
Insertions, deletions, and substitutions each carry a penalty of 1, and the
score is held in 3 bits, so a match can tolerate up to six spelling errors.
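
To make the scoring concrete, here is a rough software sketch of that kind of dynamic programming: unit penalties and a score that saturates at what fits in 3 bits. It is not the FDF hardware or its query compiler. The name approx_find, the max_errors parameter, and the use of the value 7 as a "too many errors" marker are my assumptions for illustration; the last is only one plausible reading of why 3 bits gives six errors.

```python
SCORE_BITS = 3
SATURATED = (1 << SCORE_BITS) - 1  # 7: assumed "too many errors" marker


def approx_find(pattern, text, max_errors=1):
    """Return (end_index, errors) for every text position where some
    substring ending there matches pattern within max_errors edits."""
    if not 0 <= max_errors < SATURATED:
        raise ValueError("max_errors must fit below the saturated score")

    # col[i] = fewest edits needed to match pattern[:i] against some
    # substring of the text ending at the current position.
    col = [min(i, SATURATED) for i in range(len(pattern) + 1)]
    hits = []
    for end, ch in enumerate(text):
        prev_diag = col[0]  # col[0] stays 0: a match may start anywhere
        for i in range(1, len(pattern) + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            best = min(prev_diag + cost,  # match or substitution
                       old + 1,           # extra character in the text
                       col[i - 1] + 1)    # pattern character missing
            col[i] = min(best, SATURATED)  # keep the score in 3 bits
            prev_diag = old
        if col[-1] <= max_errors:
            hits.append((end, col[-1]))
    return hits
```

For example, approx_find("lexer", "a lexr here", 1) reports a single hit ending at index 5 with one error. Only one column of scores is kept and the text is consumed one character at a time, which is the property that lets this kind of search run on streaming data.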

The whole query is then compiled into code for a systolic array,
which runs as fast as the data comes off the disk.

FDF2 is a 9U VME board that runs in a VME-based Sun system.

FDF3 connects directly to a SCSI disk, and also to a Sun workstation.
When searching, it transfers data directly from the disk. To load data
onto the disk, the disk is accessed indirectly through the FDF3.
It is a desktop box, about the size of a large external SCSI disk.

Some of it is described here:

https://aclanthology.org/X93-1011.pdf

along with its use for searching Japanese text, and:

https://trec.nist.gov/pubs/trec3/papers/paper.ps.gz
