From: George Neuner
Newsgroups: comp.compilers
Subject: Re: Why does the lexer convert text integer lexemes to binary integers? I thought that lexers should be simple?
Date: Sun, 17 Jul 2022 16:52:20 -0400
Organization: A noiseless patient Spider
Message-ID: <22-07-032@comp.compilers>
References: <22-07-011@comp.compilers> <22-07-015@comp.compilers> <22-07-020@comp.compilers>
Keywords: lex, performance, comment
Posted-Date: 17 Jul 2022 17:23:14 EDT

On Fri, 15 Jul 2022 03:02:07 -0700 (PDT), Jan Ziak
<0xe2.0x9a.0x9b@gmail.com> wrote:

>On Friday, July 15, 2022 at 4:13:42 AM UTC+2, George Neuner wrote:
>> In many (actually most) cases, the binary representation of an integer
>> can be stored in less space than the text representation.
>
>The output of a command such as (cd /usr/src/linux; grep --only-matching
>--recursive "\b[0-9][0-9]*\b") proves the falsity of the above claim.
>
>Binary, fixed-width, representation of integers is statistically more
>space-efficient compared to implicit-width textual representation only if the
>text representation of the integers includes (for example): a plain 32/64-bit
>pointer to the start of the text, a plain 16/32/64-bit relative/absolute
>offset to the start of the number in a character array, the [length of the
>textual form of the number] in an explicit form.

???

Decimal 100 is 3 characters in text, but 1 byte in binary.
65535 is 5 characters, but 2 bytes.
2147483648 is 10 characters, but 4 bytes.
0b101010 is 8 characters, 1 byte.
052 is 3 characters, 1 byte.

This relationship holds for almost all numbers: regardless of what
number base is used for the text form, the binary form will be shorter.

>Binary, fixed-width, representation of integers is more likely to be more
>space-efficient than [their textual representation with an implicit length] if
>the source of the integers doesn't originate from a human hand typing digits
>on a keyboard. For example, when the source of the integers is a
>high-precision measurement device, is something with a physical counterpart
>(such as: the GPS coordinates of objects spread across Earth's surface), etc.

The source of the value is not relevant - only the form.

>-atom

You completely misunderstood/misinterpreted what I wrote. I said the
binary form CAN BE stored in less space than the text ... I said
NOTHING about being restricted to fixed-size binary integers.

It should be obvious to anyone that storing 42 in a 64-bit integer
does not save space.

YMMV,
George
[I think the point was that variable length strings are more common than
variable length binary integers, but as a message earlier today pointed out,
it's not likely to make any difference in practical compilers. -John]
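
As an illustration of the character-versus-byte counts discussed in this
post, here is a minimal Python sketch. It is not from the original thread:
the helper names parse_c_literal and min_bytes are hypothetical, and it
assumes non-negative values, C-style prefixes (0x, 0b, leading 0 for octal),
and a variable-length binary encoding that uses the fewest whole bytes.

```python
def parse_c_literal(lexeme: str) -> int:
    """Convert a C-style integer lexeme to its value (illustrative helper)."""
    s = lexeme.lower()
    if s.startswith("0x"):
        return int(s[2:], 16)   # hexadecimal
    if s.startswith("0b"):
        return int(s[2:], 2)    # binary (a common compiler extension)
    if len(s) > 1 and s.startswith("0"):
        return int(s[1:], 8)    # leading zero means octal
    return int(s, 10)           # plain decimal


def min_bytes(value: int) -> int:
    """Fewest whole bytes that can hold a non-negative value (at least 1)."""
    return max(1, (value.bit_length() + 7) // 8)


# Sample lexemes chosen for illustration; several spell the value 42.
for lexeme in ["100", "65535", "2147483648", "0x2A", "0b101010", "052", "42", "7"]:
    value = parse_c_literal(lexeme)
    print(f"{lexeme!r}: {len(lexeme)} characters, {min_bytes(value)} bytes")
```

Single-digit values such as 7 tie at one character and one byte, which is
why the claim holds for almost all numbers rather than all of them.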