From: Christopher F Clark
Newsgroups: comp.compilers
Subject: Re: State-of-the-art algorithms for lexical analysis?
Date: Tue, 7 Jun 2022 19:40:11 +0300
Organization: Compilers Central
Message-ID: <22-06-019@comp.compilers>
References: <22-06-006@comp.compilers> <22-06-007@comp.compilers> <22-06-008@comp.compilers> <22-06-013@comp.compilers> <22-06-015@comp.compilers>
Keywords: lex, comment

Yes, as our moderator explained. I was talking about things like FORTRAN Hollerith strings, but more importantly network packets, where the packet gives the size of a "field" and you then simply take that many characters (or bytes, or bits, or some other quantum) as the "token". This is quite important for parsing "binary" data.

Sometimes the numbers are text, like I showed, but in many protocols the numbers are "binary", e.g. something like \xAHabcdefghij, where \xA is a single 8-bit character (octet) whose bits are "0000 1010" (or maybe four 8-bit characters, i.e. 4 octets, that represent a 32-bit integer).

And, as our moderator pointed out, this makes a terrible regular expression, NFA, or DFA, but it is actually quite easy in nearly any programming language. You read the length in, convert it to an integer, and then loop reading that many characters from the input and call that a "token".
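To make the "read the length, then take that many characters" step concrete, here is a minimal sketch in Python. It is only an illustration, not code from any particular lexer or protocol stack. The function names (read_exact, read_hollerith, read_binary_prefixed) are made up for this example, and the big-endian byte order for the 4-octet length is an assumption; a real protocol specifies its own.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes from stream or fail (hypothetical helper)."""
    data = stream.read(n)
    if len(data) != n:
        raise ValueError(f"wanted {n} bytes, got {len(data)}")
    return data


def read_hollerith(stream):
    """Text length: decimal digits, then 'H', then that many characters."""
    digits = b""
    ch = stream.read(1)
    while ch.isdigit():          # b"".isdigit() is False, so EOF ends the loop
        digits += ch
        ch = stream.read(1)
    if not digits or ch not in (b"H", b"h"):
        raise ValueError("expected <digits>H")
    return read_exact(stream, int(digits))


def read_binary_prefixed(stream, prefix_octets=1, marker=b""):
    """Binary length: 1 or 4 octets, an optional marker, then the payload.

    Assumption: a 4-octet length is an unsigned big-endian integer.
    """
    fmt = {1: ">B", 4: ">I"}[prefix_octets]
    (length,) = struct.unpack(fmt, read_exact(stream, prefix_octets))
    if marker and read_exact(stream, len(marker)) != marker:
        raise ValueError("missing marker after length")
    return read_exact(stream, length)


# Three tokens back to back: text length, 1-octet length with an 'H'
# marker (the \xAHabcdefghij case), and a 4-octet length.
stream = io.BytesIO(b"10Habcdefghij" + b"\x0aHabcdefghij" + b"\x00\x00\x00\x03xyz")
tokens = [
    read_hollerith(stream),
    read_binary_prefixed(stream, 1, marker=b"H"),
    read_binary_prefixed(stream, 4),
]
print(tokens)
```

Note that no state machine is involved: the length is ordinary data, so the "lexer" is just an integer conversion followed by a bounded read.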
Kind regards,
Chris

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris
------------------------------------------------------------------------------
[Right. When I was writing Fortran lexers, Hollerith strings were among the simplest of the kludges I had to use. -John]