Re: State-of-the-art algorithms for lexical analysis?

From	Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups	comp.compilers
Subject	Re: State-of-the-art algorithms for lexical analysis?
Date	2022-06-07 19:40 +0300
Organization	Compilers Central
Message-ID	<22-06-019@comp.compilers>
References	<22-06-006@comp.compilers> <22-06-007@comp.compilers> <22-06-008@comp.compilers> <22-06-013@comp.compilers> <22-06-015@comp.compilers>

Yes, as our moderator explained, I was talking about things like
FORTRAN Hollerith strings, but more importantly network packets, where
the packet gives the size of a "field" and you simply take that many
characters (or bytes or bits or some other quanta) as the "token".
This is quite important for parsing "binary" data.  Sometimes the
numbers are text, like I showed, but in many protocols the numbers
are "binary", e.g. something like:

\xAHabcdefghij, where \xA is a single 8-bit character (octet) whose
bits are "0000 1010" (or maybe four 8-bit characters, i.e. 4 octets,
that represent a 32-bit integer).
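
To make the binary case concrete, here is a minimal Python sketch, not
taken from any particular protocol.  The name read_counted_field and
its parameters are made up for illustration.  It assumes the count is
unsigned, that a 4-octet count is big-endian, and that the count
covers only the payload.  It also leaves out the 'H' marker from the
example above and treats the field as a bare count followed by data.

```python
import struct

def read_counted_field(data: bytes, offset: int = 0, length_octets: int = 1):
    """Return (payload, next_offset) for a field with a binary length prefix.

    Assumptions (not from the post): the count is unsigned, big-endian
    when it spans 4 octets, and counts payload octets only.
    """
    if length_octets == 1:
        fmt = "B"       # one unsigned octet
    elif length_octets == 4:
        fmt = ">I"      # four octets, big-endian unsigned 32-bit integer
    else:
        raise ValueError("this sketch only handles 1- or 4-octet counts")
    if offset + length_octets > len(data):
        raise ValueError("truncated length prefix")
    (count,) = struct.unpack_from(fmt, data, offset)
    start = offset + length_octets
    end = start + count
    if end > len(data):
        raise ValueError("field runs past end of input")
    # The "token" is simply the next `count` octets, whatever they contain.
    return data[start:end], end
```

Calling read_counted_field(b"\x0aabcdefghij") would take the ten
octets after the count as the token; passing length_octets=4 reads a
32-bit count instead.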

And, as our moderator pointed out, this makes a terrible regular
expression, NFA, or DFA, but it is actually quite easy in nearly any
programming language.  You read the length in, convert it to an
integer, and then loop reading that many characters from the input and
call that a "token".
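
The read/convert/loop recipe above can be sketched in a few lines of
Python for the text form, e.g. 10Habcdefghij.  This is only an
illustration under stated assumptions: the function name
scan_hollerith is hypothetical, the count is taken to be plain ASCII
decimal digits, and either 'H' or 'h' is accepted as the marker.

```python
def scan_hollerith(text: str, pos: int = 0):
    """Scan one nHxxxx token starting at pos; return (chars, next_pos)."""
    start = pos
    # Read the length in: collect the decimal digits of the count.
    while pos < len(text) and text[pos] in "0123456789":
        pos += 1
    if pos == start:
        raise ValueError("expected a decimal count")
    count = int(text[start:pos])  # convert it to an integer
    if pos >= len(text) or text[pos] not in "Hh":
        raise ValueError("expected 'H' after the count")
    pos += 1
    # Loop reading that many characters, whatever they are.
    chars = []
    for _ in range(count):
        if pos >= len(text):
            raise ValueError("input ended inside the counted string")
        chars.append(text[pos])
        pos += 1
    return "".join(chars), pos
```

Because the loop is driven by the count and not by a closing
delimiter, the token may contain blanks, commas, or quotes without
any escaping, which is exactly what a regular expression cannot
express for an arbitrary count.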

Kind regards,
Chris

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------
[Right.  When I was writing Fortran lexers, Hollerith strings were among the
simplest of the kludges I had to use. -John]

