From: gah4
Newsgroups: comp.compilers
Subject: Re: State-of-the-art algorithms for lexical analysis?
Date: Mon, 6 Jun 2022 10:03:55 -0700 (PDT)
Message-ID: <22-06-011@comp.compilers>
References: <22-06-009@comp.compilers>

On Monday, June 6, 2022 at 8:06:28 AM UTC-7, Roger L Costello wrote:

(snip)

> I will look into PSL. There are algorithms for converting regexes to DFA
> and then using the DFA to tokenize the input. Are there algorithms for
> converting PSL to (what?) and then using the (what?) to tokenize the input?

The approximate searches are done using dynamic programming. The penalty is 1 for each insertion, deletion, or substitution, and the score is held in 3 bits, so up to six spelling errors can be tolerated. The whole query is then compiled into code for a systolic array, which runs as fast as the data comes off the disk.

FDF2 is a 9U VME board that runs in a VME-based Sun system. FDF3 connects directly to a SCSI disk, and also to a Sun workstation. When searching, it transfers data directly from the disk. To load data onto the disk, the disk is accessed indirectly through the FDF3. The FDF3 is a desktop box, about the size of a large external SCSI disk.
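As a rough software analogue of the approximate-search scoring described above, here is a minimal Python sketch of approximate substring matching by dynamic programming (Sellers' variant of edit distance), with a penalty of 1 for each insertion, deletion, or substitution. The names `approx_find` and `SCORE_MAX` are hypothetical, and treating the 3-bit score as saturating at 7 (leaving 0 to 6 as usable error counts) is an assumption about why six is the limit. This is not the FDF's actual systolic-array code.

```python
# Minimal sketch of approximate matching by dynamic programming (Sellers'
# algorithm). It models the unit-penalty scoring only; it is NOT the FDF's
# actual query code.

SCORE_MAX = 7  # assumption: a 3-bit score saturates at 7, meaning "too many errors"


def approx_find(pattern: str, text: str, max_errors: int) -> list[int]:
    """Return end positions in `text` where `pattern` matches with at most
    `max_errors` insertions, deletions, or substitutions (each costs 1)."""
    if not 0 <= max_errors < SCORE_MAX:
        raise ValueError("max_errors must be below the saturation value")
    m = len(pattern)
    # col[i] = best score for pattern[:i] ending at the current text position.
    # col[0] stays 0, so a match may start anywhere in the text.
    col = [min(i, SCORE_MAX) for i in range(m + 1)]
    hits = []
    for j, ch in enumerate(text):
        prev_diag = col[0]  # col[i - 1] as it was before this text character
        for i in range(1, m + 1):
            old = col[i]
            cost = 0 if pattern[i - 1] == ch else 1
            col[i] = min(
                prev_diag + cost,  # match or substitution
                old + 1,           # extra character in the text (insertion)
                col[i - 1] + 1,    # pattern character missing (deletion)
                SCORE_MAX,         # saturate like a 3-bit counter
            )
            prev_diag = old
        if col[m] <= max_errors:
            hits.append(j)
    return hits
```

For example, `approx_find("color", "the colour of", 1)` returns `[7, 8, 9]`: the text ending at "colo", "colou", and "colour" is each within one edit of "color". Each pass of the inner loop computes a new column of the table from the previous column only, which is the kind of recurrence a linear systolic array can pipeline, with one cell per pattern character while the data streams past; how the FDF actually maps a query onto its cells is not shown here.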
Some of the FDF system is described here, along with its use for searching Japanese text:

https://aclanthology.org/X93-1011.pdf

and here:

https://trec.nist.gov/pubs/trec3/papers/paper.ps.gz