| Path | csiph.com!weretis.net!feeder6.news.weretis.net!news.misty.com!news.iecc.com!.POSTED.news.iecc.com!nerds-end |
|---|---|
| From | Christopher F Clark <christopher.f.clark@compiler-resources.com> |
| Newsgroups | comp.compilers |
| Subject | Why is flex pattern-matching of NULs slow? |
| Date | Sat, 9 Apr 2022 21:40:45 +0300 |
| Organization | Compilers Central |
| Lines | 31 |
| Sender | news@iecc.com |
| Approved | comp.compilers@iecc.com |
| Message-ID | <22-04-010@comp.compilers> (permalink) |
| References | <22-04-001@comp.compilers> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset="UTF-8" |
| Injection-Info | gal.iecc.com; posting-host="news.iecc.com:2001:470:1f07:1126:0:676f:7373:6970"; logging-data="26217"; mail-complaints-to="abuse@iecc.com" |
| Keywords | lex, i18n, comment |
| Posted-Date | 09 Apr 2022 16:19:41 EDT |
| X-submission-address | compilers@iecc.com |
| X-moderator-address | compilers-request@iecc.com |
| X-FAQ-and-archives | http://compilers.iecc.com |
| Xref | csiph.com comp.compilers:2976 |
I haven't looked at Flex in a while either, but what I remember is that 0 is used as the end-of-buffer and EOF indication and that you had to validate against that. I don't recall whether that required an attempt at reading or not. It wouldn't surprise me if it were also used as a flag and as a "null pointer". Depending upon how you look at it, C either hates 0 or loves it, but it is very often "special".

But if you are parsing human-readable ASCII text, having 0 (NUL) be an EOF mark is actually not a bad solution. If I recall correctly, that isn't even a bad choice for human-readable UTF-8 (including non-Latin-1 texts, because 2 and 3 byte sequences don't have NULs in them). It only becomes a pain if you want to parse binary data.

By the way, in our lexer we used -1, i.e. what getc used to return for EOF, for the same condition, and I don't recall how we put it in the buffer (or whether we even did). Being ex-PL/I and Pascal programmers, we used strings with lengths in many places instead of C strings. I don't remember whether we used Paul Abrahams's clever hack of putting the length at the end of the string, which, if done right, also serves as a null byte for use as a C string.

--
******************************************************************************
Chris Clark                  email: christopher.f.clark@compiler-resources.com
Compiler Resources, Inc.     Web Site: http://world.std.com/~compres
23 Bailey Rd                 voice: (508) 435-5016
Berlin, MA 01503 USA         twitter: @intel_chris
------------------------------------------------------------------------------

[You're right about UTF-8, where NUL is also a reasonable string terminator. UTF-8 is self-synchronizing -- the bytes of no UTF-8 code point are a prefix or suffix of any other code point. -John]
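
To make the two points above concrete, here is a minimal sketch in C. It is not Flex's code, and `count_words` and `utf8_encode` are hypothetical names invented for the illustration. The first function scans a buffer whose only end-of-input test is a comparison of the current byte against NUL. The second encodes a code point as UTF-8, where every byte of a multi-byte sequence has its high bit set, so none of them can be mistaken for the sentinel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: counts whitespace-separated words in a buffer that
 * ends with a NUL sentinel. There is no length or index check; the scan
 * stops only when the current byte is the sentinel. */
size_t count_words(const char *buf)
{
    assert(buf != NULL);
    const unsigned char *p = (const unsigned char *)buf;
    size_t words = 0;

    for (;;) {
        while (*p == ' ' || *p == '\t' || *p == '\n')
            p++;                      /* skip separators */
        if (*p == '\0')
            return words;             /* sentinel reached: end of input */
        words++;
        while (*p != '\0' && *p != ' ' && *p != '\t' && *p != '\n')
            p++;                      /* consume one word */
    }
}

/* Hypothetical helper: encodes one code point as UTF-8 and returns the
 * number of bytes written. It does not reject surrogates; it only asserts
 * the 0x10FFFF upper bound. In every multi-byte case the lead byte is at
 * least 0xC0 and each continuation byte is in 0x80..0xBF, so a zero byte
 * can only come from U+0000 itself. */
size_t utf8_encode(unsigned long cp, unsigned char out[4])
{
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    assert(cp <= 0x10FFFF);
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}
```

The same scan breaks down on binary input: an embedded zero byte is indistinguishable from the end of the buffer, so a scanner that must accept NUL as data needs some other way to tell the two apart, such as an explicit length.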
- Why is flex pattern-matching of NULs slow? Roger L Costello <costello@mitre.org> - 2022-04-08 11:06 +0000
- Why is flex pattern-matching of NULs slow? Christopher F Clark <christopher.f.clark@compiler-resources.com> - 2022-04-09 21:40 +0300