Re: State-of-the-art algorithms for lexical analysis?

Path csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end
From Christopher F Clark <christopher.f.clark@compiler-resources.com>
Newsgroups comp.compilers
Subject Re: State-of-the-art algorithms for lexical analysis?
Date Tue, 7 Jun 2022 19:40:11 +0300
Organization Compilers Central
Lines 30
Sender news@iecc.com
Approved comp.compilers@iecc.com
Message-ID <22-06-019@comp.compilers> (permalink)
References <22-06-006@comp.compilers> <22-06-007@comp.compilers> <22-06-008@comp.compilers> <22-06-013@comp.compilers> <22-06-015@comp.compilers>
Mime-Version 1.0
Content-Type text/plain; charset="UTF-8"
Injection-Info gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="50232"; mail-complaints-to="abuse@iecc.com"
Keywords lex, comment
Posted-Date 07 Jun 2022 13:05:09 EDT
X-submission-address compilers@iecc.com
X-moderator-address compilers-request@iecc.com
X-FAQ-and-archives http://compilers.iecc.com
Xref csiph.com comp.compilers:3056


Yes, as our moderator explained.  I was talking about things like
FORTRAN Hollerith strings, but more importantly network packets, where
they give the size of the "field" within a packet and then you simply
take that many characters (or bytes or bits or some other quantum) as
the "token".  This is quite important for parsing "binary" data.  And
sometimes the numbers are text like I showed, but in many protocols the
numbers are "binary", e.g. something like

\xAHabcdefghij where \xA is a single 8-bit character (octet) whose
bits are "0000 1010" (or maybe four 8-bit characters, i.e. 4 octets,
that represent a 32-bit integer).
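
A minimal sketch of the binary-count case in Python, under some
assumptions that are not in the post: the function name
read_counted_field is hypothetical, a multi-octet count is taken to be
big-endian (network byte order), and the count is followed directly by
the field's octets with no marker character in between.

```python
def read_counted_field(data: bytes, pos: int = 0, length_octets: int = 1):
    """Return (token, next_pos) for a length-prefixed field starting at pos.

    length_octets is the size of the count itself: 1 for a single octet,
    4 for a 32-bit integer.  Big-endian order is an assumption.
    """
    end_of_length = pos + length_octets
    if end_of_length > len(data):
        raise ValueError("truncated length prefix")
    # Convert the raw count octets to an integer.
    count = int.from_bytes(data[pos:end_of_length], "big")
    end = end_of_length + count
    if end > len(data):
        raise ValueError("field runs past the end of the input")
    # The next `count` octets are the "token", whatever they contain.
    return data[end_of_length:end], end


# One-octet count: 0000 1010 (decimal 10), then ten octets of payload.
one_octet = bytes([0x0A]) + b"abcdefghij"
# Four-octet count holding the same value as a 32-bit integer.
four_octet = bytes([0, 0, 0, 0x0A]) + b"abcdefghij"

token1, next1 = read_counted_field(one_octet)
token4, next4 = read_counted_field(four_octet, length_octets=4)
```

Both calls yield the same ten-octet token; only the width of the count
differs.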

And, as our moderator pointed out, this makes a terrible regular
expression, NFA, or DFA, but it is actually quite easy in nearly any
programming language.  You read the length in, convert it to an integer,
and then loop reading that many characters from the input and call
that a "token".
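
The steps above might be sketched like this for the text-count
(Hollerith-style) form.  This is only an illustration: the name
read_hollerith is hypothetical, and it assumes the count is written in
ASCII decimal digits followed by the letter H.

```python
def read_hollerith(text: str, pos: int = 0):
    """Return (token, next_pos) for a counted string such as 10Habcdefghij."""
    start = pos
    # Read the length in: a run of ASCII decimal digits.
    while pos < len(text) and text[pos] in "0123456789":
        pos += 1
    if pos == start:
        raise ValueError("expected a decimal count")
    # Convert it to an integer.
    count = int(text[start:pos])
    if pos >= len(text) or text[pos] not in "Hh":
        raise ValueError("expected 'H' after the count")
    pos += 1
    end = pos + count
    if end > len(text):
        raise ValueError("string runs past the end of the input")
    # Take that many characters, whatever they are, and call that a token.
    return text[pos:end], end


token_a, next_a = read_hollerith("10Habcdefghij+1")
# The counted characters may include ones that would otherwise end a token.
token_b, next_b = read_hollerith("5HA) B,")
```

The count alone decides where the token ends, so parentheses, blanks,
and commas inside the string need no special handling.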

Kind regards,
Chris

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.  Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA  01503 USA      twitter: @intel_chris
------------------------------------------------------------------------------
[Right.  When I was writing Fortran lexers, Hollerith strings were among the
simplest of the kludges I had to use. -John]


Thread

State-of-the-art algorithms for lexical analysis? Roger L Costello <costello@mitre.org> - 2022-06-05 20:53 +0000
  Re: State-of-the-art algorithms for lexical analysis? gah4 <gah4@u.washington.edu> - 2022-06-05 16:05 -0700
    Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-06 08:59 +0200
      State-of-the-art algorithms for lexical analysis? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-06 21:16 +0300
        Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-07 06:52 +0200
          Re: State-of-the-art algorithms for lexical analysis? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-07 19:40 +0300
            Re: State-of-the-art algorithms for lexical analysis? Hans-Peter Diettrich <DrDiettrich1@netscape.net> - 2022-06-08 05:32 +0200
              Re: counted strings, was State-of-the-art algorithms for lexical analysis? gah4 <gah4@u.washington.edu> - 2022-06-09 11:54 -0700
                Re: counted characters in strings "Robin Vowels" <robin51@dodo.com.au> - 2022-06-10 12:21 +1000
                Re: counted characters in strings Martin Ward <martin@gkc.org.uk> - 2022-06-11 10:52 +0100
                Re: counted characters in strings drb@msu.edu (Dennis Boone) - 2022-06-11 11:09 -0500
    Re: State-of-the-art algorithms for lexical analysis? Kaz Kylheku <480-992-1380@kylheku.com> - 2022-06-06 16:00 +0000
    References for PSL ? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-06-06 20:11 +0300
