
Implicit String-Literal Concatenation

Started by	Lawrence D'Oliveiro <ldo@nz.invalid>
First post	2024-02-24 23:05 +0000
Last post	2024-02-29 19:08 +0100
Articles	11 on this page of 111 — 15 participants


  Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-24 23:05 +0000
    Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-25 17:38 +0100
      Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-25 20:43 +0000
        Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-25 21:20 +0000
    Re: Implicit String-Literal Concatenation Blue-Maned_Hawk <bluemanedhawk@invalid.invalid> - 2024-02-25 16:45 +0000
    Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-25 20:25 +0000
    Re: Implicit String-Literal Concatenation Łukasz 'Maly' Ostrowski <l3vi4than@gmail.com> - 2024-02-26 21:12 +0100
      Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-26 20:31 +0000
        Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-27 13:18 +0100
          Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-27 23:10 +0000
            Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-28 00:50 +0100
    Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-02-26 20:42 +0000
    Re: Implicit String-Literal Concatenation porkchop@invalid.foo (Mike Sanders) - 2024-02-26 22:03 +0000
      Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-26 23:17 +0000
        Re: Implicit String-Literal Concatenation porkchop@invalid.foo (Mike Sanders) - 2024-02-27 17:27 +0000
      Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-27 09:36 +0100
        Re: Implicit String-Literal Concatenation porkchop@invalid.foo (Mike Sanders) - 2024-02-27 17:31 +0000
        Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-27 18:56 +0000
          Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-27 23:21 +0100
            Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-27 22:52 +0000
              Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-02-28 01:09 +0000
                Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-28 12:50 +0100
                  Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-28 20:56 +0000
                    Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-28 21:34 +0000
                      Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-28 23:52 +0000
                        Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 00:15 +0000
                          Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-29 02:53 +0000
                            Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 09:20 +0000
                          Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 15:48 +0000
                            Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-29 17:03 +0100
                              Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 16:17 +0000
                                Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-29 18:12 +0100
                                  Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 17:30 +0000
                                    Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 13:20 -0800
                                      Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 21:44 +0000
                                        Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 14:06 -0800
                                          Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-03-01 18:09 +0100
                                            Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-01 10:49 -0800
                                              Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-03-01 22:06 +0100
                                Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 09:20 -0800
                    Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 08:58 +0100
                      Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-29 21:05 +0000
                        Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-03-01 09:16 +0100
                          Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-01 16:55 +0000
                            Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-03-01 18:28 +0100
        Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-27 20:25 +0000
          Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-27 12:35 -0800
            Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-27 23:03 +0000
          Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-27 22:12 +0000
            Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-28 12:54 +0100
              Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-28 13:13 +0000
                Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-28 15:08 +0100
                Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 13:36 -0800
                  Re: Implicit String-Literal Concatenation Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-02-29 11:56 +0000
                    Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 16:19 +0100
                      Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-29 21:36 +0000
                        Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 13:53 -0800
                        Re: Implicit String-Literal Concatenation Richard Harnden <richard.nospam@gmail.invalid> - 2024-03-01 12:59 +0000
                          Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-01 20:59 +0000
                    Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 08:08 -0800
                  Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 14:31 +0000
                    Re: Implicit String-Literal Concatenation Richard Harnden <richard.nospam@gmail.invalid> - 2024-02-29 15:22 +0000
                      Re: Implicit String-Literal Concatenation "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-02-29 13:10 -0800
                        Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 13:45 -0800
                          Re: Implicit String-Literal Concatenation "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-02-29 14:03 -0800
                            Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 14:14 -0800
                              Re: Implicit String-Literal Concatenation "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-02 13:48 -0800
                              Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-05 04:48 +0000
                                Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-04 20:55 -0800
                                  Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-07 21:08 +0000
                                    Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-07 21:44 +0000
                                      Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-07 14:25 -0800
                                        Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-07 23:00 +0000
                                          Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-07 15:46 -0800
                                        Re: Implicit String-Literal Concatenation "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-07 16:17 -0800
                                        Re: Implicit String-Literal Concatenation Richard Harnden <richard.nospam@gmail.invalid> - 2024-03-08 00:26 +0000
                                    Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-07 14:16 -0800
                    Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 16:30 +0100
                      Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 08:25 -0800
                    Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 08:18 -0800
                      Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-29 18:17 +0100
                        Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 09:22 -0800
                          Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-02-29 19:26 +0000
                            Re: Implicit String-Literal Concatenation James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-02-29 14:45 -0500
                Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 13:41 -0800
              Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 13:57 -0800
                Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-28 23:01 +0000
                  Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 15:31 -0800
                    Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 00:47 +0000
                      Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 17:12 -0800
                      Re: Implicit String-Literal Concatenation tTh <tth@none.invalid> - 2024-02-29 16:29 +0100
                        Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 16:15 +0000
                      Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 15:53 +0000
                        Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 09:06 -0800
                          Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 17:28 +0000
                            Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 18:58 +0100
                              Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 18:05 +0000
                                Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 18:09 +0000
                                  Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-29 21:27 +0000
                                    Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-03-01 11:52 +0000
                                      Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-05 04:47 +0000
                                        Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-03-05 15:09 +0000
                                          Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-06 01:49 +0000
                                Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 20:51 +0100
                Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 10:10 +0100
                  Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 10:18 +0000
                    Re: Implicit String-Literal Concatenation tTh <tth@none.invalid> - 2024-02-29 16:34 +0100
                      Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-03-01 11:58 +0000
                        Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-03-01 13:17 +0100
                  Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 09:03 -0800
                    Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 19:08 +0100


#383364

From	Lawrence D'Oliveiro <ldo@nz.invalid>
Date	2024-03-05 04:47 +0000
Message-ID	<us684m$3jpc3$4@dont-email.me>
In reply to	#383210

On Fri, 1 Mar 2024 11:52:16 +0000, bart wrote:

> On 29/02/2024 21:27, Lawrence D'Oliveiro wrote:
>
>> On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
>> 
>>> I have an actual use case today where #embed of a (C++) std::map
>>> binary object created by separate tool would be very useful.  I'm
>>> planning on using mmap to load it at runtime at the moment.
>> 
>> Why not convert it to a .o file and statically link it into your
>> program as part of the build process?
> 
> That's exactly what #embed will enable.

You can call it a toy version of objcopy
<https://manpages.debian.org/1/objcopy.1.html>.


#383392

From	scott@slp53.sl.home (Scott Lurndal)
Date	2024-03-05 15:09 +0000
Message-ID	<mAGFN.115189$m4d.8674@fx43.iad>
In reply to	#383364

Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>On Fri, 1 Mar 2024 11:52:16 +0000, bart wrote:
>
>> On 29/02/2024 21:27, Lawrence D'Oliveiro wrote:
>>
>>> On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
>>> 
>>>> I have an actual use case today where #embed of a (C++) std::map
>>>> binary object created by separate tool would be very useful.  I'm
>>>> planning on using mmap to load it at runtime at the moment.
>>> 
>>> Why not convert it to a .o file and statically link it into your
>>> program as part of the build process?
>> 
>> That's exactly what #embed will enable.
>
>You can call it a toy version of objcopy
><https://manpages.debian.org/1/objcopy.1.html>.

While objcopy supports a number of ways to
manipulate an ELF file, I wouldn't equate it
with #embed at all.


#383406

From	Lawrence D'Oliveiro <ldo@nz.invalid>
Date	2024-03-06 01:49 +0000
Message-ID	<us8i3g$24j7$2@dont-email.me>
In reply to	#383392

On Tue, 05 Mar 2024 15:09:06 GMT, Scott Lurndal wrote:

> Lawrence D'Oliveiro <ldo@nz.invalid> writes:
>
>>On Fri, 1 Mar 2024 11:52:16 +0000, bart wrote:
>>
>>> On 29/02/2024 21:27, Lawrence D'Oliveiro wrote:
>>>
>>>> On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
>>>> 
>>>>> I have an actual use case today where #embed of a (C++) std::map
>>>>> binary object created by separate tool would be very useful.  I'm
>>>>> planning on using mmap to load it at runtime at the moment.
>>>> 
>>>> Why not convert it to a .o file and statically link it into your
>>>> program as part of the build process?
>>> 
>>> That's exactly what #embed will enable.
>>
>>You can call it a toy version of objcopy
>><https://manpages.debian.org/1/objcopy.1.html>.
> 
> While objcopy supports a number of ways to manipulate an ELF file, I
> wouldn't equate it with #embed at all.

It does a whole lot more.


#383194

From	David Brown <david.brown@hesbynett.no>
Date	2024-02-29 20:51 +0100
Message-ID	<urqn7f$p9ki$1@dont-email.me>
In reply to	#383188

On 29/02/2024 19:05, Scott Lurndal wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 29/02/2024 18:28, Scott Lurndal wrote:
>>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>>> scott@slp53.sl.home (Scott Lurndal) writes:
>>>>> bart <bc@freeuk.com> writes:
>>>>>> On 28/02/2024 23:31, Keith Thompson wrote:
>>>>>>> bart <bc@freeuk.com> writes:
>>>>>>
>>>>>>>> It would be unfortunate if your example was allowed. Clearly a binary
>>>>>>>> representation of an instance of your struct would probably require 16
>>>>>>>> bytes rather than 4, of which one may be padding.
>>>>>>>
>>>>>>> Depending on the sizes and alignments of the various types, sure.
>>>>>>> So what?
>>>>>>>
>>>>>>
>>>>>>> If you have suggestions for alternate ways to define #embed, they might
>>>>>>> be interesting, but it's too late to change the existing specification.
>>>>>>>
>>>>>>
>>>>>> My early comments on this were about compiler performance. I suggested
>>>>>> there might be a way to turn 100,000 byte values in a file, directly
>>>>>> into a 100KB string or data block, without needing to first convert
>>>>>> 100,000 values into 100,000 integer expressions represented as tokens,
>>>>>> and to then parse those 100,000 expressions into AST nodes etc.
>>>>>>
>>>>>> DB suggested something like that was actually done. But you can't do
>>>>>> that if those 100,000 numbers represent from 100KB to 800KB of memory
>>>>>> depending on the data type of the structure they're initialising.
>>>>>
>>>>> An implementation is free to simply pass a variant (or the directive
>>>>> itself) of #embed from the pre-processor to the compiler if the programmer
>>>>> isn't using -E, and the compiler could simply copy the embedded file
>>>>> into the object file directly, without processing it as a series of
>>>>> integer values.  Much like the #file and #line directives passed by
>>>>> the pre-processor to the compiler.
>>>>
>>>> Sure, an implementation has to operate *as if* it implemented the 8
>>>> translation phases separately.  But given a structure initialized with
>>>> #embed, it would have to generate additional code to initialize the
>>>> structure members from the bytes of the binary blob.
>>>
>>> Would it?  Or could it simply assume that the binary blob
>>> is already in the same binary format that writing an instance
>>> of the structure from a C application on the same host would have created?
>>
>> That would depend on the sizes of the fields in the struct, and the size
>> of the integer constants in the #embed.
> 
> I'm embedding a binary file.  I want the representation in memory
> to be _exactly_ the same as in the file, regardless of how it is
> defined in the C code (array of char, array of int, array of long, struct whatever).
> 

Then you would want a union of the struct type and an appropriately 
sized unsigned char array, and initialise the unsigned char array with
the bytes of the file using #embed.
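
A minimal sketch of that union approach, under stated assumptions: the
record layout, the names record_image and embedded_record, and the file
name "record.bin" are all hypothetical, and eight literal bytes stand in
for the #embed expansion so that the code also builds with a pre-C23
compiler. The members are plain byte arrays here; with wider members the
struct view would depend on the host's padding and byte order.

```c
#include <assert.h>

/* Hypothetical record layout; stands in for "struct whatever". */
struct record {
    unsigned char tag[4];
    unsigned char payload[4];
};

/* Only byte arrays inside, so no padding is expected on usual hosts. */
static_assert(sizeof(struct record) == 8, "unexpected padding in record");

/* The byte array is the FIRST member: an initialiser without a
   designator initialises the first member of a union. */
union record_image {
    unsigned char bytes[sizeof(struct record)];
    struct record rec;
};

/* With a C23 compiler the inner list could be replaced by a line
       #embed "record.bin"
   (hypothetical file name).  Literal bytes stand in for it here. */
static const union record_image image = {
    { 'R', 'E', 'C', '1', 0x10, 0x20, 0x30, 0x40 }
};

/* Returns the struct view of the embedded bytes. */
const struct record *embedded_record(void)
{
    return &image.rec;
}
```

A caller would read embedded_record()->tag and ->payload directly; the
bytes in memory are exactly the bytes listed, whatever the struct says.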


#383157

From	David Brown <david.brown@hesbynett.no>
Date	2024-02-29 10:10 +0100
Message-ID	<urphlj$ejuu$1@dont-email.me>
In reply to	#383145

On 28/02/2024 22:57, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
> [...]
>> They won't use strings, they will use data blobs - binary data.  Then
>> there is no issue with null bytes.  And yes, implementations will skip
>> the token generation (unless you are doing something weird, such as
>> using #embed to read the parameters to a function call).
>>
>> Tests with prototype implementations gave extremely fast results.
> 
> I'm not sure how that would work.  #embed is a preprocessor directive,
> and at least in the abstract model it has to expand to valid C code.
> 
> I would have expected that it would simply generate the list of
> comma-separated integer constants described in the standard; later
> phases would simply parse that list and generate code as if that
> sequence had been written in the original source file.  Do you know of
> an implementation that does something else?
> 

The key thing, as I understand it, is that the compiler gets to know 
that the integers in the list are all "nice".  And since the 
preprocessor and the compiler are part of the same implementation (even 
if they are separate programs communicating with pipes or temporary 
files), the preprocessor could pass on the binary blob in a pre-parsed form.

Think about what a preprocessor and compiler do with the initialisers
in an array, written in normal text (such as by using "xxd -i" or 
another external script).  For each integer, it has to divide up the 
tokens, identify the comma, parse the integer, check that it is a valid 
integer, figure out its type based on the size (and suffix, if any).  It 
needs to record the line number and column number for possible later 
reference in error or warning messages.  It has to check the value of 
the integer against the type for the array elements, and possibly change 
the value to suit, or issue warnings for out-of-range values.  It has to 
allocate all the space to store this information as it goes along, 
without knowing the size of the array - so it will be lots of small 
mallocs and/or wasted space.  It's a /lot/.  (Simpler compilers can get 
away with a bit less effort, especially if they have more limited warnings.)

With #embed, the preprocessor can generate a compiler-specific "start of 
embed" informational directive (much like "#line" directives and such 
things generated by preprocessors today), then the data in a very 
specific format, then an "end of embed" directive.  It could, for 
example, generate all the integers in the format "0xAB, " with 16 
elements per line.  The compiler wouldn't need to parse the data 
normally - it knows exactly how many elements there are (from the "start 
of embed" directive), it knows exactly where to find each entry (as each 
is 6 characters long), it only needs to look at two of these characters, 
there's never any errors, the source line number is fixed (at the #embed 
line), and so on.
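
To make the fixed-format idea concrete, here is a rough sketch of such a
decoder. It is not taken from any real compiler: the function name
decode_fixed_width is made up, and it assumes that the last entry on each
line is written as "0xAB," followed by a newline, so that every entry
stays exactly 6 characters wide.

```c
#include <assert.h>
#include <stddef.h>

#define ENTRY_WIDTH 6   /* length of one "0xAB, " entry */

/* Value of one hex digit.  The input is trusted to be well-formed,
   and the letters a-f are assumed contiguous (true for ASCII). */
static unsigned hex_digit(char c)
{
    if (c >= '0' && c <= '9') return (unsigned)(c - '0');
    if (c >= 'a' && c <= 'f') return (unsigned)(c - 'a' + 10);
    return (unsigned)(c - 'A' + 10);
}

/* Decodes `count` fixed-width entries from `text` into `out`.
   No tokenising: entry i starts at offset i * ENTRY_WIDTH, and only
   the two digit characters at offsets 2 and 3 are examined. */
void decode_fixed_width(const char *text, size_t count, unsigned char *out)
{
    assert(text != NULL && out != NULL);
    for (size_t i = 0; i < count; i++) {
        const char *entry = text + i * ENTRY_WIDTH;
        out[i] = (unsigned char)((hex_digit(entry[2]) << 4)
                                 | hex_digit(entry[3]));
    }
}
```

The count would come from the "start of embed" directive, so the output
buffer can be allocated once at its final size.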

A more tightly coupled preprocessor and compiler can do even better - 
for array initialisation, the binary blob could be used directly without 
ever generating integer constants or parsing them.

The results of testing are that #embed is /massively/ faster and lower 
memory compared to external generators, especially for larger files. 
And it gives you the data on-hand for optimisation purposes, unlike 
external direct linking of binary blobs.  (So you can get the size of 
the array, or use values from it as compile-time known values.)

> For example, say you have a file "foo.dat" containing 4 bytes with
> values 0, 1, 2, and 3.  This would be perfectly valid:
> 
>      struct foo {
>          unsigned char a;
>          unsigned short b;
>          unsigned int c;
>          double d;
>      };
> 
>      struct foo obj = {
> #embed "foo.dat"
>      };
> 
> #embed isn't defined to translate an input file to a sequence of bytes.
> It's defined to translate an input file to a sequence of integer
> constant expressions.
> 

Yes.  But the main speed (and memory usage) gains are for large files,
and that means array initialisers.  That does not conflict with using it
for cases like yours.
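
For illustration, this is what the quoted struct example amounts to once
the directive has been expanded. The integer list is written out by hand
here (so no C23 compiler or "foo.dat" file is needed), and the accessor
name embedded_foo is made up for the sketch.

```c
#include <assert.h>

struct foo {
    unsigned char a;
    unsigned short b;
    unsigned int c;
    double d;
};

/* The object is larger than the four-byte file that initialises it. */
static_assert(sizeof(struct foo) > 4, "struct foo should exceed 4 bytes");

/* For a file holding the bytes 0, 1, 2, 3, the #embed line would
   expand to the integer constants 0, 1, 2, 3.  Each constant
   initialises one member, with the usual conversions; the bytes are
   not copied into the object's storage. */
static const struct foo obj = { 0, 1, 2, 3 };

/* Returns the initialised object so its members can be inspected. */
const struct foo *embedded_foo(void)
{
    return &obj;
}
```

So obj.d ends up as the double value 3.0, not as one byte of a raw image.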

> *Maybe* a compiler could optimize for the case where it knows that it's
> being used to initialize an array of unsigned char, but (a) that would
> require the preprocessor to have information that normally doesn't exist
> until later phases, and (b) I'm not convinced it would be worth the
> effort.
> 

Look at 
<https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1040r6.html#design-practice-speed>.

In those tests, for a 40 MB file gcc #embed is 200 times faster than 
"xxd -i" generated files, and takes about 2.5% of the memory.  It scales 
to 1 GB files.  And that's just a proof-of-concept implementation.


#383159

From	bart <bc@freeuk.com>
Date	2024-02-29 10:18 +0000
Message-ID	<urpllc$ff02$1@dont-email.me>
In reply to	#383157

On 29/02/2024 09:10, David Brown wrote:
> On 28/02/2024 22:57, Keith Thompson wrote:

>> *Maybe* a compiler could optimize for the case where it knows that it's
>> being used to initialize an array of unsigned char, but (a) that would
>> require the preprocessor to have information that normally doesn't exist
>> until later phases, and (b) I'm not convinced it would be worth the
>> effort.
>>
> 
> Look at 
> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1040r6.html#design-practice-speed>.
> 
> In those tests, for a 40 MB file gcc #embed is 200 times faster than 
> "xxd -i" generated files, and takes about 2.5% of the memory.  It scales 
> to 1 GB files.  And that's just a proof-of-concept implementation.

I've just done my own tests, with a 40MB data file containing random
A..Z letters (so can be processed as a text file).

This was also converted to a 120MB text file containing a list of numbers
("65,66,73,...", 3 characters for each data byte).

Using 'strinclude' in my old C compiler, it took about 1 second to build 
this program:

   #include <stdio.h>
   #include <string.h>

   char* s=strinclude("data");

   int main(void) {
      printf("%zu\n", strlen(s));
   }

(Running it shows '40000000'.) The same test in my language (which has 
no intermediate ASM stage) took 0.3 seconds.

Next I tried instead that 120MB text file containing the same data but 
as text, initialising a char[] array using #include.

Tcc took 12 seconds. Bcc took 56 seconds (via ASM etc).

gcc got up to about 3GB memory usage then 'cc1' failed trying to 
allocate 0.5GB, after about a minute.

Processing a long list of numbers DOES use considerable resources. Bear in
mind that #embed also needs to take binary data and generate tokens, 
possibly converting each binary number to text.
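
For reference, a small sketch of how a number list like the one used in
this test can be produced. This is not the tool actually used above; the
function name bytes_to_decimal_list is invented, and it writes into a
caller-supplied buffer rather than a file to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Writes data[0..n) into `out` as decimal values, each followed by a
   comma ("65,66,73,").  Returns the number of characters written, or
   0 if `out_size` is too small.  For letters A..Z this is 3 characters
   per data byte, which is how 40MB of data becomes 120MB of text. */
size_t bytes_to_decimal_list(const unsigned char *data, size_t n,
                             char *out, size_t out_size)
{
    assert(data != NULL || n == 0);
    if (out == NULL || out_size == 0)
        return 0;
    out[0] = '\0';

    size_t used = 0;
    for (size_t i = 0; i < n; i++) {
        int len = snprintf(out + used, out_size - used, "%u,",
                           (unsigned)data[i]);
        if (len < 0 || (size_t)len >= out_size - used)
            return 0;   /* formatting error, or output would not fit */
        used += (size_t)len;
    }
    return used;
}
```

The trailing comma is harmless: C allows one at the end of an
initialiser list, so the output can be pulled into a char[] with #include.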


#383169

From	tTh <tth@none.invalid>
Date	2024-02-29 16:34 +0100
Message-ID	<urq860$2ntv$2@news.gegeweb.eu>
In reply to	#383159

On 2/29/24 11:18, bart wrote:
> Using 'strinclude' in my old C compiler, it took about 1 second to build 
> this program:
> 
>    #include <stdio.h>
>    #include <string.h>
> 
>    char* s=strinclude("data");
> 
>    int main(void) {
>       printf("%zu\n", strlen(s));
>   }

tth@redlady:~/Desktop$ man strinclude
No manual entry for strinclude
tth@redlady:~/Desktop$

-- 
+---------------------------------------------------------------------+
|          https://tube.interhacker.space/a/tth/video-channels        |
+---------------------------------------------------------------------+


#383211

From	bart <bc@freeuk.com>
Date	2024-03-01 11:58 +0000
Message-ID	<ursftp$17dak$2@dont-email.me>
In reply to	#383169

On 29/02/2024 15:34, tTh wrote:
> On 2/29/24 11:18, bart wrote:
>> Using 'strinclude' in my old C compiler, it took about 1 second to 
>> build this program:
>>
>>    #include <stdio.h>
>>    #include <string.h>
>>
>>    char* s=strinclude("data");
>>
>>    int main(void) {
>>       printf("%zu\n", strlen(s));
>>   }
> 
> tth@redlady:~/Desktop$ man strinclude
> No manual entry for strinclude
> tth@redlady:~/Desktop$
> 

'strinclude' is an extension I made for that compiler.

#embed is the new feature of C23. Although I'm not sure how it would be 
used to initialise a char* pointer. Perhaps like this:

     char dummy[] = {
     #embed "data"
     ,0};
     char* s = dummy;

(I've added a 0-terminator here; I don't know if #embed will take care 
of that.)

My 'strinclude' produces a zero-terminated string, but it is done within 
the parser rather than lexer.


#383212

From	David Brown <david.brown@hesbynett.no>
Date	2024-03-01 13:17 +0100
Message-ID	<ursgvt$17n29$1@dont-email.me>
In reply to	#383211

On 01/03/2024 12:58, bart wrote:
> On 29/02/2024 15:34, tTh wrote:
>> On 2/29/24 11:18, bart wrote:
>>> Using 'strinclude' in my old C compiler, it took about 1 second to 
>>> build this program:
>>>
>>>    #include <stdio.h>
>>>    #include <string.h>
>>>
>>>    char* s=strinclude("data");
>>>
>>>    int main(void) {
>>>       printf("%zu\n", strlen(s));
>>>   }
>>
>> tth@redlady:~/Desktop$ man strinclude
>> No manual entry for strinclude
>> tth@redlady:~/Desktop$
>>
> 
> 'strinclude' is an extension I made for that compiler.
> 
> #embed is the new feature of C23. Although I'm not sure how it would be 
> used to initialise a char* pointer. Perhaps like this:
> 
>      char dummy[] = {
>      #embed "data"
>      ,0};
>      char* s = dummy;
> 
> (I've added a 0-terminator here; I don't know if #embed will take care 
> of that.)

#embed very specifically does not add anything.  So you would do :

	const char s[] = {
		#embed "data" suffix(,)
		0
	};

The "suffix" parameter adds a comma if "data" is not empty, and does 
nothing if "data" is empty.  Writing it as you did would work fine for 
non-empty "data" but give the nonsensical result {,0} if "data" is
empty.  (You might not care about such cases and prefer to write the 
simpler version, but now you also know about "suffix".)

There is no need to have a separate character pointer variable - the 
const char array can be used directly in most circumstances.
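
The effect of "suffix" on the two cases can be imitated without a C23
compiler. In this sketch the two macros are hand-written stand-ins for
what the preprocessor would produce from the #embed line with suffix(,);
the macro and array names are invented, and the two-byte file content
(104, 105) is just an example.

```c
#include <assert.h>

/* Stand-in for the expansion when "data" holds two bytes: the byte
   values, followed by the suffix tokens (a single comma). */
#define EMBED_NONEMPTY_WITH_SUFFIX 104, 105,

/* Stand-in for the expansion when "data" is empty: nothing at all.
   The suffix is dropped together with the (absent) data. */
#define EMBED_EMPTY_WITH_SUFFIX

/* Both initialisers stay well-formed; only the terminator is left in
   the empty case, instead of the invalid {,0}. */
const char from_nonempty[] = { EMBED_NONEMPTY_WITH_SUFFIX 0 };
const char from_empty[]    = { EMBED_EMPTY_WITH_SUFFIX 0 };

static_assert(sizeof from_nonempty == 3, "two data bytes plus terminator");
static_assert(sizeof from_empty == 1, "terminator only");
```

Either array can then be passed wherever a const char* string is wanted.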

> 
> My 'strinclude' produces a zero-terminated string, but it is done within 
> the parser rather than lexer.


#383179

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2024-02-29 09:03 -0800
Message-ID	<87a5njtd72.fsf@nosuchdomain.example.com>
In reply to	#383157

David Brown <david.brown@hesbynett.no> writes:
> On 28/02/2024 22:57, Keith Thompson wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>> [...]
>>> They won't use strings, they will use data blobs - binary data.  Then
>>> there is no issue with null bytes.  And yes, implementations will skip
>>> the token generation (unless you are doing something weird, such as
>>> using #embed to read the parameters to a function call).
>>>
>>> Tests with prototype implementations gave extremely fast results.
>> I'm not sure how that would work.  #embed is a preprocessor
>> directive,
>> and at least in the abstract model it has to expand to valid C code.
>> I would have expected that it would simply generate the list of
>> comma-separated integer constants described in the standard; later
>> phases would simply parse that list and generate code as if that
>> sequence had been written in the original source file.  Do you know of
>> an implementation that does something else?
>
> The key thing, as I understand it, is that the compiler gets to know
> that the integers in the list are all "nice".  And since the 
> preprocessor and the compiler are part of the same implementation
> (even if they are separate programs communicating with pipes or
> temporary files), the preprocessor could pass on the binary blob in a
> pre-parsed form.
[...]

Sure, an implementation *could* optimize #embed so it expands to some
implementation-defined nonstandard form that later phases can treat as
raw data.  But since it's defined as a preprocessor directive, it's
difficult to see how it could do so while covering all cases.

[...]

> The results of testing are that #embed is /massively/ faster and lower
> memory compared to external generators, especially for larger files. 
> And it gives you the data on-hand for optimisation purposes, unlike
> external direct linking of binary blobs.  (So you can get the size of 
> the array, or use values from it as compile-time known values.)

What testing?  The very latest versions of gcc and clang (I checked both
their git repos yesterday) do not yet implement #embed.

>> For example, say you have a file "foo.dat" containing 4 bytes with
>> values 0, 1, 2, and 3.  This would be perfectly valid:
>>      struct foo {
>>          unsigned char a;
>>          unsigned short b;
>>          unsigned int c;
>>          double d;
>>      };
>>      struct foo obj = {
>> #embed "foo.dat"
>>      };
>> #embed isn't defined to translate an input file to a sequence of bytes.
>> It's defined to translate an input file to a sequence of integer
>> constant expressions.
>
> Yes.  But the main speed (and memory usage) gains are for large
> files, and that means array initialisers.  That does not conflict
> with using it for cases like yours.

So a compiler that does this would have to be able to handle

    struct foo obj = {
#blob
<binary data>
#endblob
    };

and initialize a, b, c, and d to 0, 1, 2, and 3.0, respectively from
successive bytes of the binary data.  Either that, or the preprocessor
would have to use information it doesn't have to determine how to expand
#embed.

>> *Maybe* a compiler could optimize for the case where it knows that it's
>> being used to initialize an array of unsigned char, but (a) that would
>> require the preprocessor to have information that normally doesn't exist
>> until later phases, and (b) I'm not convinced it would be worth the
>> effort.
>
> Look at
> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1040r6.html#design-practice-speed>.
>
> In those tests, for a 40 MB file gcc #embed is 200 times faster than
> "xxd -i" generated files, and takes about 2.5% of the memory.  It
> scales to 1 GB files.  And that's just a proof-of-concept
> implementation.

That's for std::embed, a proposed C++ feature that's *not* defined as a
preprocessor directive.  Sample usage from the paper:

    constexpr std::span<const std::byte> fxaa_binary = 
        std::embed( "fxaa.spirv" );

So the compiler knows the type of the object being initialized.

(Note that the author of that C++ paper is also the editor for the C
standard.)

I'm still skeptical that C's #embed will actually be implemented other
than as expanding to a sequence of integer constants.

On the other hand, C23 allows for additional implementation-defined
parameters to #embed (as well as the standard embed parameters limit,
prefix, suffix, and if_empty).  Such a parameter could specify how it's
expanded, perhaps to some implementation-defined blob format.  *If*
compilers optimize #embed to something other than a sequence of integer
constant expressions, that's probably how it would be done.  But since
neither gcc nor clang implements #embed at all, it may be too early to
speculate.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */


#383189

From	David Brown <david.brown@hesbynett.no>
Date	2024-02-29 19:08 +0100
Message-ID	<urqh6m$nt9f$2@dont-email.me>
In reply to	#383179

On 29/02/2024 18:03, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
>> On 28/02/2024 22:57, Keith Thompson wrote:
>>> David Brown <david.brown@hesbynett.no> writes:
>>> [...]
>>>> They won't use strings, they will use data blobs - binary data.  Then
>>>> there is no issue with null bytes.  And yes, implementations will skip
>>>> the token generation (unless you are doing something weird, such as
>>>> using #embed to read the parameters to a function call).
>>>>
>>>> Tests with prototype implementations gave extremely fast results.
>>> I'm not sure how that would work.  #embed is a preprocessor directive,
>>> and at least in the abstract model it has to expand to valid C code.
>>> I would have expected that it would simply generate the list of
>>> comma-separated integer constants described in the standard; later
>>> phases would simply parse that list and generate code as if that
>>> sequence had been written in the original source file.  Do you know of
>>> an implementation that does something else?
>>
>> The key thing, as I understand it, is that the compiler gets to know
>> that the integers in the list are all "nice".  And since the
>> preprocessor and the compiler are part of the same implementation
>> (even if they are separate programs communicating with pipes or
>> temporary files), the preprocessor could pass on the binary blob in a
>> pre-parsed form.
> [...]
> 
> Sure, an implementation *could* optimize #embed so it expands to some
> implementation-defined nonstandard form that later phases can treat as
> raw data.  But since it's defined as a preprocessor directive, it's
> difficult to see how it could do so while covering all cases.
> 

It would require a strong link between the compiler and the preprocessor 
- as you know, these don't have to be separate programs.  In a more 
weakly coupled system, there could still be a method for passing a 
binary blob to the compiler in addition to the integer data, and let the 
compiler use whichever form it preferred (based on what your code does 
with the data).

> [...]
> 
>> The results of testing are that #embed is /massively/ faster and lower
>> memory compared to external generators, especially for larger files.
>> And it gives you the data on-hand for optimisation purposes, unlike
>> external direct linking of binary blobs.  (So you can get the size of
>> the array, or use values from it as compile-time known values.)
> 
> What testing?  The very latest versions of gcc and clang (I checked both
> their git repos yesterday) do not yet implement #embed.
> 

I believe prototypes, tests, or proofs of concept have been made for 
gcc, clang and perhaps other tools.  I posted a link to some results - 
more are floating around the internet if you want to look for them.

>>> For example, say you have a file "foo.dat" containing 4 bytes with
>>> values 0, 1, 2, and 3.  This would be perfectly valid:
>>>       struct foo {
>>>           unsigned char a;
>>>           unsigned short b;
>>>           unsigned int c;
>>>           double d;
>>>       };
>>>       struct foo obj = {
>>> #embed "foo.dat"
>>>       };
>>> #embed isn't defined to translate an input file to a sequence of bytes.
>>> It's defined to translate an input file to a sequence of integer
>>> constant expressions.
>>
>> Yes.  But the main speed (and memory usage) gains are for large
>> files, and that means array initialisers.  That does not conflict
>> with using it for cases like yours.
> 
> So a compiler that does this would have to be able to handle
> 
>      struct foo obj = {
> #blob
> <binary data>
> #endblob
>      };
> 
> and initialize a, b, c, and d to 0, 1, 2, and 3.0, respectively from
> successive bytes of the binary data.  Either that, or the preprocessor
> would have to use information it doesn't have to determine how to expand
> #embed.
> 

I think I've covered how that could be handled.  (And I don't know how 
it /will/ be handled.  But I am sure compiler implementers will figure a 
way to make it work correctly for any use of the integer constant list, 
while also making it as efficient as they reasonably can for the common 
case of initialising an unsigned char array.)

>>> *Maybe* a compiler could optimize for the case where it knows that it's
>>> being used to initialize an array of unsigned char, but (a) that would
>>> require the preprocessor to have information that normally doesn't exist
>>> until later phases, and (b) I'm not convinced it would be worth the
>>> effort.
>>
>> Look at
>> <https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p1040r6.html#design-practice-speed>.
>>
>> In those tests, for a 40 MB file gcc #embed is 200 times faster than
>> "xxd -i" generated files, and takes about 2.5% of the memory.  It
>> scales to 1 GB files.  And that's just a proof-of-concept
>> implementation.
> 
> That's for std::embed, a proposed C++ feature that's *not* defined as a
> preprocessor directive.  Sample usage from the paper:
> 
>      constexpr std::span<const std::byte> fxaa_binary =
>          std::embed( "fxaa.spirv" );
> 
> So the compiler knows the type of the object being initialized.
> 
> (Note that the author of that C++ paper is also the editor for the C
> standard.)

The work on #embed is being done simultaneously for C and C++.
std::embed() gives you a slightly different way to write it, but the
implementation is the same.  (Not unlike _Pragma and #pragma in C.)

Other pages I have seen with speed tests show the same pattern while 
referring explicitly to #embed.

> 
> I'm still skeptical that C's #embed will actually be implemented other
> than as expanding to a sequence of integer constants.
> 

We'll see when it all hits the mainline compilers!

> On the other hand, C23 allows for additional implementation-defined
> parameters to #embed (as well as the standard embed parameters limit,
> prefix, suffix, and if_empty).  Such a parameter could specify how it's
> expanded, perhaps to some implementation-defined blob format.  *If*
> compilers optimize #embed to something other than a sequence of integer
> constant expressions, that's probably how it would be done.  But since
> neither gcc nor clang implements #embed at all, it may be too early to
> speculate.
>
