Groups > comp.lang.c > #382985 > unrolled thread
| Started by | Lawrence D'Oliveiro <ldo@nz.invalid> |
|---|---|
| First post | 2024-02-24 23:05 +0000 |
| Last post | 2024-02-29 19:08 +0100 |
| Articles | 20 on this page of 111 — 15 participants |
Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-24 23:05 +0000
Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-25 17:38 +0100
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-25 20:43 +0000
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-25 21:20 +0000
Re: Implicit String-Literal Concatenation Blue-Maned_Hawk <bluemanedhawk@invalid.invalid> - 2024-02-25 16:45 +0000
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-25 20:25 +0000
Re: Implicit String-Literal Concatenation Łukasz 'Maly' Ostrowski <l3vi4than@gmail.com> - 2024-02-26 21:12 +0100
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-26 20:31 +0000
Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-27 13:18 +0100
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-27 23:10 +0000
Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-28 00:50 +0100
Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-02-26 20:42 +0000
Re: Implicit String-Literal Concatenation porkchop@invalid.foo (Mike Sanders) - 2024-02-26 22:03 +0000
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-26 23:17 +0000
Re: Implicit String-Literal Concatenation porkchop@invalid.foo (Mike Sanders) - 2024-02-27 17:27 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-27 09:36 +0100
Re: Implicit String-Literal Concatenation porkchop@invalid.foo (Mike Sanders) - 2024-02-27 17:31 +0000
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-27 18:56 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-27 23:21 +0100
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-27 22:52 +0000
Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-02-28 01:09 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-28 12:50 +0100
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-28 20:56 +0000
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-28 21:34 +0000
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-28 23:52 +0000
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 00:15 +0000
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-29 02:53 +0000
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 09:20 +0000
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 15:48 +0000
Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-29 17:03 +0100
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 16:17 +0000
Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-29 18:12 +0100
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 17:30 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 13:20 -0800
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 21:44 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 14:06 -0800
Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-03-01 18:09 +0100
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-01 10:49 -0800
Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-03-01 22:06 +0100
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 09:20 -0800
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 08:58 +0100
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-29 21:05 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-03-01 09:16 +0100
Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-01 16:55 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-03-01 18:28 +0100
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-27 20:25 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-27 12:35 -0800
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-27 23:03 +0000
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-27 22:12 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-28 12:54 +0100
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-28 13:13 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-28 15:08 +0100
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 13:36 -0800
Re: Implicit String-Literal Concatenation Malcolm McLean <malcolm.arthur.mclean@gmail.com> - 2024-02-29 11:56 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 16:19 +0100
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-29 21:36 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 13:53 -0800
Re: Implicit String-Literal Concatenation Richard Harnden <richard.nospam@gmail.invalid> - 2024-03-01 12:59 +0000
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-01 20:59 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 08:08 -0800
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 14:31 +0000
Re: Implicit String-Literal Concatenation Richard Harnden <richard.nospam@gmail.invalid> - 2024-02-29 15:22 +0000
Re: Implicit String-Literal Concatenation "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-02-29 13:10 -0800
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 13:45 -0800
Re: Implicit String-Literal Concatenation "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-02-29 14:03 -0800
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 14:14 -0800
Re: Implicit String-Literal Concatenation "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-02 13:48 -0800
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-05 04:48 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-04 20:55 -0800
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-07 21:08 +0000
Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-07 21:44 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-07 14:25 -0800
Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-03-07 23:00 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-07 15:46 -0800
Re: Implicit String-Literal Concatenation "Chris M. Thomasson" <chris.m.thomasson.1@gmail.com> - 2024-03-07 16:17 -0800
Re: Implicit String-Literal Concatenation Richard Harnden <richard.nospam@gmail.invalid> - 2024-03-08 00:26 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-03-07 14:16 -0800
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 16:30 +0100
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 08:25 -0800
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 08:18 -0800
Re: Implicit String-Literal Concatenation Janis Papanagnou <janis_papanagnou+ng@hotmail.com> - 2024-02-29 18:17 +0100
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 09:22 -0800
Re: Implicit String-Literal Concatenation Kaz Kylheku <433-929-6894@kylheku.com> - 2024-02-29 19:26 +0000
Re: Implicit String-Literal Concatenation James Kuyper <jameskuyper@alumni.caltech.edu> - 2024-02-29 14:45 -0500
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 13:41 -0800
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 13:57 -0800
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-28 23:01 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 15:31 -0800
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 00:47 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-28 17:12 -0800
Re: Implicit String-Literal Concatenation tTh <tth@none.invalid> - 2024-02-29 16:29 +0100
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 16:15 +0000
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 15:53 +0000
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 09:06 -0800
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 17:28 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 18:58 +0100
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 18:05 +0000
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-02-29 18:09 +0000
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-02-29 21:27 +0000
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-03-01 11:52 +0000
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-05 04:47 +0000
Re: Implicit String-Literal Concatenation scott@slp53.sl.home (Scott Lurndal) - 2024-03-05 15:09 +0000
Re: Implicit String-Literal Concatenation Lawrence D'Oliveiro <ldo@nz.invalid> - 2024-03-06 01:49 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 20:51 +0100
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 10:10 +0100
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-02-29 10:18 +0000
Re: Implicit String-Literal Concatenation tTh <tth@none.invalid> - 2024-02-29 16:34 +0100
Re: Implicit String-Literal Concatenation bart <bc@freeuk.com> - 2024-03-01 11:58 +0000
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-03-01 13:17 +0100
Re: Implicit String-Literal Concatenation Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2024-02-29 09:03 -0800
Re: Implicit String-Literal Concatenation David Brown <david.brown@hesbynett.no> - 2024-02-29 19:08 +0100
| From | Janis Papanagnou <janis_papanagnou+ng@hotmail.com> |
|---|---|
| Date | 2024-02-29 18:17 +0100 |
| Message-ID | <urqe71$ndb4$1@dont-email.me> |
| In reply to | #383177 |
On 29.02.2024 17:18, Keith Thompson wrote:
>
> "abc\0def" is a valid string literal, but its value is not a string.
> (No, the standard doesn't say that the value of a string literal is a
> string.)

This sounds somewhat strange in my ears. Usually a literal for a type
will constitute an instance of the type. - I suppose the irregularity
stems from the fact that there's no explicit string object type in C.

Janis
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-02-29 09:22 -0800 |
| Message-ID | <87wmqnrxrf.fsf@nosuchdomain.example.com> |
| In reply to | #383182 |
Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
> On 29.02.2024 17:18, Keith Thompson wrote:
>> "abc\0def" is a valid string literal, but its value is not a string.
>> (No, the standard doesn't say that the value of a string literal is a
>> string.)
>
> This sounds somewhat strange in my ears. Usually a literal for a type
> will constitute an instance of the type. - I suppose the irregularity
> stems from the fact that there's no explicit string object type in C.
Exactly, "string" is not a type.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
| From | Kaz Kylheku <433-929-6894@kylheku.com> |
|---|---|
| Date | 2024-02-29 19:26 +0000 |
| Message-ID | <20240229110522.422@kylheku.com> |
| In reply to | #383184 |
On 2024-02-29, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
> Janis Papanagnou <janis_papanagnou+ng@hotmail.com> writes:
>> On 29.02.2024 17:18, Keith Thompson wrote:
>>> "abc\0def" is a valid string literal, but its value is not a string.
>>> (No, the standard doesn't say that the value of a string literal is a
>>> string.)
>>
>> This sounds somewhat strange in my ears. Usually a literal for a type
>> will constitute an instance of the type. - I suppose the irregularity
>> stems from the fact that there's no explicit string object type in C.
>
> Exactly, "string" is not a type.

It is a type in the broader sense, in that it is a logical proposition
about the attributes of an object that is true or false. It's just
not a type in the C static type system.

What that means is that there does not exist a constraint rule in
standard C requiring some expression or object to conform to the
string type. The concept "string" is not represented in the constraint
system. But it is a type concept.

(There are rules that require a string, but they are not constraint
rules. E.g. if strlen is given an argument which isn't a string, the
behavior is undefined.)

Consider:

    char a[3] = "abc";
    size_t l = strlen(a);

In the unlikely event that this example would capture the attention
of a computer scientist who researches type systems, he or she would
identify that as having a type error. (One that the C type system is
too weak to model.)

"Upper case letter" is also a type; that's why the header is called
<ctype.h>.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2024-02-29 14:45 -0500 |
| Message-ID | <urqmt8$p3s4$1@dont-email.me> |
| In reply to | #383192 |
On 2/29/24 14:26, Kaz Kylheku wrote:
> On 2024-02-29, Keith Thompson <Keith.S.Thompson+u@gmail.com> wrote:
...
>> Exactly, "string" is not a type.
>
> It is a type in the broader sense, in that is a logical proposition
> about the attributes of an object that is true or false.

If I defined something to be a sequence of floating point numbers
terminated by a NaN, would that thing also qualify as a type, according
to the definition you're using?

Could you give a source for the definition of "type" that you're using?

Can you use the word "type" in a statement whose truth relies upon the
difference between that definition and the way that "type" is defined by
the C standard? Preferably it would be a useful statement that applies
to C.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-02-28 13:41 -0800 |
| Message-ID | <87bk80uuzr.fsf@nosuchdomain.example.com> |
| In reply to | #383136 |
bart <bc@freeuk.com> writes:
[...]
> AFAIK strings in C can have embedded zeros when not assumed to be
> zero-terminated. So here:
>
> char s[]={1,2,3,0,4,5,6};
>
> s will have a length of 7.
s will have a *size* of 7. Its length, as a string, is 3. The
distinction between "length" and "size" is particularly important in
this case.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-02-28 13:57 -0800 |
| Message-ID | <877ciouua2.fsf@nosuchdomain.example.com> |
| In reply to | #383133 |
David Brown <david.brown@hesbynett.no> writes:
[...]
> They won't use strings, they will use data blobs - binary data. Then
> there is no issue with null bytes. And yes, implementations will skip
> the token generation (unless you are doing something weird, such as
> using #embed to read the parameters to a function call).
>
> Tests with prototype implementations gave extremely fast results.
I'm not sure how that would work. #embed is a preprocessor directive,
and at least in the abstract model it has to expand to valid C code.
I would have expected that it would simply generate the list of
comma-separated integer constants described in the standard; later
phases would simply parse that list and generate code as if that
sequence had been written in the original source file. Do you know of
an implementation that does something else?
For example, say you have a file "foo.dat" containing 4 bytes with
values 0, 1, 2, and 3. This would be perfectly valid:
    struct foo {
        unsigned char a;
        unsigned short b;
        unsigned int c;
        double d;
    };
    struct foo obj = {
    #embed "foo.dat"
    };
#embed isn't defined to translate an input file to a sequence of bytes.
It's defined to translate an input file to a sequence of integer
constant expressions.
*Maybe* a compiler could optimize for the case where it knows that it's
being used to initialize an array of unsigned char, but (a) that would
require the preprocessor to have information that normally doesn't exist
until later phases, and (b) I'm not convinced it would be worth the
effort.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
| From | bart <bc@freeuk.com> |
|---|---|
| Date | 2024-02-28 23:01 +0000 |
| Message-ID | <uroe02$4eoh$1@dont-email.me> |
| In reply to | #383145 |
On 28/02/2024 21:57, Keith Thompson wrote:
> David Brown <david.brown@hesbynett.no> writes:
> [...]
>> They won't use strings, they will use data blobs - binary data. Then
>> there is no issue with null bytes. And yes, implementations will skip
>> the token generation (unless you are doing something weird, such as
>> using #embed to read the parameters to a function call).
>>
>> Tests with prototype implementations gave extremely fast results.
>
> I'm not sure how that would work. #embed is a preprocessor directive,
> and at least in the abstract model it has to expand to valid C code.
>
> I would have expected that it would simply generate the list of
> comma-separated integer constants described in the standard; later
> phases would simply parse that list and generate code as if that
> sequence had been written in the original source file. Do you know of
> an implementation that does something else?
>
> For example, say you have a file "foo.dat" containing 4 bytes with
> values 0, 1, 2, and 3. This would be perfectly valid:
>
>     struct foo {
>         unsigned char a;
>         unsigned short b;
>         unsigned int c;
>         double d;
>     };
>
>     struct foo obj = {
>     #embed "foo.dat"
>     };
It would be unfortunate if your example was allowed. Clearly a binary
representation of an instance of your struct would probably require 16
bytes rather than 4, of which one may be padding.
Certainly if you were to write it out to disk as binary, it would need
more than 4.
> #embed isn't defined to translate an input file to a sequence of bytes.
> It's defined to translate an input file to a sequence of integer
> constant expressions.
Maybe it should be defined exactly like that, because that is what
people might expect. Your example is better off using a normal text file
which contains an actual comma-delimited list (and which can mix ints
and floats), and a regular #include.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-02-28 15:31 -0800 |
| Message-ID | <87y1b4tbcq.fsf@nosuchdomain.example.com> |
| In reply to | #383148 |
bart <bc@freeuk.com> writes:
> On 28/02/2024 21:57, Keith Thompson wrote:
>> David Brown <david.brown@hesbynett.no> writes:
>> [...]
>>> They won't use strings, they will use data blobs - binary data. Then
>>> there is no issue with null bytes. And yes, implementations will skip
>>> the token generation (unless you are doing something weird, such as
>>> using #embed to read the parameters to a function call).
>>>
>>> Tests with prototype implementations gave extremely fast results.
>> I'm not sure how that would work. #embed is a preprocessor
>> directive,
>> and at least in the abstract model it has to expand to valid C code.
>> I would have expected that it would simply generate the list of
>> comma-separated integer constants described in the standard; later
>> phases would simply parse that list and generate code as if that
>> sequence had been written in the original source file. Do you know of
>> an implementation that does something else?
>> For example, say you have a file "foo.dat" containing 4 bytes with
>> values 0, 1, 2, and 3. This would be perfectly valid:
>>     struct foo {
>>         unsigned char a;
>>         unsigned short b;
>>         unsigned int c;
>>         double d;
>>     };
>>     struct foo obj = {
>>     #embed "foo.dat"
>>     };
>
> It would be unfortunate if your example was allowed. Clearly a binary
> representation of an instance of your struct would probably require 16
> bytes rather than 4, of which one may be padding.
Depending on the sizes and alignments of the various types, sure.
So what?
N3096 is the latest public C23 draft.
https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3096.pdf
#embed is defined in section 6.10.3.
The expansion of a #embed directive is a token sequence
formed from the list of integer constant expressions described
below. The group of tokens for each integer constant expression
in the list is separated in the token sequence from the group
of tokens for the previous integer constant expression in the
list by a comma. The sequence neither begins nor ends in a
comma. If the list of integer constant expressions is empty,
the token sequence is empty. The directive is replaced by its
expansion and, with the presence of certain embed parameters,
additional or replacement token sequences.
It's a preprocessor directive. The preprocessor operates on text and
preprocessing tokens, not on raw data. There is no way to directly
represent raw data in C source code. (I suppose string literals do so
to some extent, but they can't represent generalized raw data.)
The usage I described above is allowed. I see nothing unfortunate about
it. If you only want to use #embed with arrays of unsigned char, then
do that.
Its primary intended use is to read binary file contents at compile time
and allow a program to treat those contents as a raw representation,
particularly as the initialization for an array of unsigned char. There
was no reason to impose arbitrary restrictions to make it impossible to
use for any other purposes.
I suppose it would have been possible for #embed to expand to the raw
data itself, a binary copy of the input file. That would require C
source code, which currently is plain text, to be able to support
delimited chunks of binary data. It would require changes to portions
of the compiler after the preprocessor. Presumably you'd be able to
write the same representation directly in a C source file, which means
that C source files would no longer necessarily be representable as
text. I can see that causing all kinds of problems.
Fortunately, none of that was necessary, since the authors came up with
a way to define #embed in the preprocessor without making any other
changes to how C source code is represented. The fact that it can be
used in other odd ways doesn't bother me. The code I wrote above is
valid; I never said it was acceptable style.
> Certainly if you were to write it out to disk as binary, it would need
> more than 4.
Yes. So what?
>> #embed isn't defined to translate an input file to a sequence of bytes.
>> It's defined to translate an input file to a sequence of integer
>> constant expressions.
>
> Maybe it should be defined exactly like that, because that is what
> people might expect. Your example is better off using a normal text
> file which contains an actual comma-delimited list (and which can mix
> ints and floats), and a regular #include.
I certainly wouldn't advocate writing code like the above. My point is
that, given the definition of #embed in the C23 standard, it's valid and
has well defined semantics.
If you have suggestions for alternate ways to define #embed, they might
be interesting, but it's too late to change the existing specification.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
| From | bart <bc@freeuk.com> |
|---|---|
| Date | 2024-02-29 00:47 +0000 |
| Message-ID | <urok6t$5lv4$1@dont-email.me> |
| In reply to | #383149 |
On 28/02/2024 23:31, Keith Thompson wrote:
> bart <bc@freeuk.com> writes:
>> It would be unfortunate if your example was allowed. Clearly a binary
>> representation of an instance of your struct would probably require 16
>> bytes rather than 4, of which one may be padding.
>
> Depending on the sizes and alignments of the various types, sure.
> So what?
>
> If you have suggestions for alternate ways to define #embed, they might
> be interesting, but it's too late to change the existing specification.
>

My early comments on this were about compiler performance. I suggested
there might be a way to turn 100,000 byte values in a file, directly
into a 100KB string or data block, without needing to first convert
100,000 values into 100,000 integer expressions represented as tokens,
and to then parse those 100,000 expressions into AST nodes etc.

DB suggested something like that was actually done. But you can't do
that if those 100,000 numbers represent from 100KB to 800KB of memory
depending on the data type of the structure they're initialising.

They might even be mixed type. Or it might be an example like this:

A binary file contains 8 bytes representing one IEEE754 float value. It
is desired to use that to initialise a double array of one element.

However #embed will turn that into 8 integer values of 0 to 255 each (I
assume).

It's not clear either what happens when one of the integers has the
value 150, say, but it is used to initialise an element of type (signed)
char. It sounds like it would make it hard to initialise a char[] array,
when char is signed, from a file of UTF8 text.

Basically, #embed is dumb.

For flexibility, I wouldn't use #embed at all. Just have an actual
comma-separated set of values in a text file, and use #include instead.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-02-28 17:12 -0800 |
| Message-ID | <87ttlst6os.fsf@nosuchdomain.example.com> |
| In reply to | #383153 |
bart <bc@freeuk.com> writes:
> On 28/02/2024 23:31, Keith Thompson wrote:
>> bart <bc@freeuk.com> writes:
>>> It would be unfortunate if your example was allowed. Clearly a binary
>>> representation of an instance of your struct would probably require 16
>>> bytes rather than 4, of which one may be padding.
>> Depending on the sizes and alignments of the various types, sure.
>> So what?
>
>> If you have suggestions for alternate ways to define #embed, they might
>> be interesting, but it's too late to change the existing specification.
>
> My early comments on this were about compiler performance. I suggested
> there might be a way to turn 100,000 byte values in a file, directly
> into a 100KB string or data block, without needing to first convert
> 100,000 values into 100,000 integer expressions represented as
> tokens, and to then parse those 100,000 expressions into AST nodes
> etc.
I suggest that (a) parsing those 100,000 byte values isn't likely to be
a huge deal (if you have actual performance figures that contradict
that, feel free to present them), and (b) any solution that doesn't
involve expanding to C source code would require a lot more work to
implement for very little benefit.
> DB suggested something like that was actually done. But you can't do
> that if those 100,000 numbers represent from 100KB to 800KB of memory
> depending on the data type of the structure they're initialising.
Neither gcc nor clang implements #embed yet. DB mentioned prototype
implementations. I've asked him for more information.
> They might even be mixed type. Or it might be an example like this:
>
> A binary file contains 8 bytes representing one IEEE754 float
> value. It is desired to use that to initialise a double array of one
> element.
>
> However #embed will turn that into 8 integer values of 0 to 255 each (I assume).
Assuming CHAR_BIT==8, yes. You can use it to initialize a union, or use
memcpy() to copy from an array of unsigned char into a double object.
(Storing double values in binary files is uncommon, but it's certainly
possible.)
> It's not clear either what happens when one of the integers has the
> value 150, say, but it is used to initialise an element of type
> (signed) char. It sounds like it would make it hard to initialise a
> char[] array, when char is signed, from a file of UTF8 text.
Say you have a binary file containing a single byte with the value 150
(when interpreted as an 8-bit unsigned char). Then
    #embed "file.dat"
will expand to something like
    150
or
    0x96
So if you write:
    char array[] = {
    #embed "file.dat"
    };
then it's treated exactly the same as
    char array[] = { 150 };
If plain char is signed, then the result of the conversion is
implementation-defined, but is very very likely to result in a value of
-106.
I expect that 99% of the uses of #embed will be to initialize arrays of
unsigned char (or uint8_t). For that purpose, it should work just fine.
> Basically, #embed is dumb.
Do you object to the fact that the authors didn't add additional
arbitrary restrictions to forbid uses that you don't like?
> For flexibility, I wouldn't use #embed at all. Just have an actual
> comma-separated set of values in a text file, and use #include
> instead.
And you can still do that.
If you have a png image file and you want to include its contents in
your C program, you can use a separate program to translate the file to
C source and #include the result, or you can use `#embed "foo.png"`.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
| From | tTh <tth@none.invalid> |
|---|---|
| Date | 2024-02-29 16:29 +0100 |
| Message-ID | <urq7sk$2ntv$1@news.gegeweb.eu> |
| In reply to | #383153 |
On 2/29/24 01:47, bart wrote:
> My early comments on this were about compiler performance. I suggested
> there might be a way to turn 100,000 byte values in a file, directly
> into a 100KB string or data block, without needing to first convert
> 100,000 values into 100,000 integer expressions represented as tokens,
> and to then parse those 100,000 expressions into AST nodes etc.
But you HAVE to do that if #embed is in the preprocessor,
because its job is to give compilable text to the real
compiler. No other way is possible.
> Basically, #embed is dumb.
No.
--
+---------------------------------------------------------------------+
| https://tube.interhacker.space/a/tth/video-channels |
+---------------------------------------------------------------------+
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2024-02-29 16:15 +0000 |
| Message-ID | <T42EN.21986$hN14.13629@fx17.iad> |
| In reply to | #383167 |
tTh <tth@none.invalid> writes:
>On 2/29/24 01:47, bart wrote:
>
>> My early comments on this were about compiler performance. I suggested
>> there might be a way to turn 100,000 byte values in a file, directly
>> into a 100KB string or data block, without needing to first convert
>> 100,000 values into 100,000 integer expressions represented as tokens,
>> and to then parse those 100,000 expressions into AST nodes etc.
>
> But you HAVE to do that if #embed is in the preprocessor,
> because its job is to give compilable text to the real
> compiler. No other way is possible.
The standard does not require the preprocessor to be separate
from a 'real' compiler. It's acceptable for an implementation
to implement both in a single executable.
Absent -E, the preprocessor and compiler can cooperate to efficiently
handle #embed without generating parseable C code.
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2024-02-29 15:53 +0000 |
| Message-ID | <AL1EN.511040$xHn7.220429@fx14.iad> |
| In reply to | #383153 |
bart <bc@freeuk.com> writes:
>On 28/02/2024 23:31, Keith Thompson wrote:
>> bart <bc@freeuk.com> writes:
>
>>> It would be unfortunate if your example was allowed. Clearly a binary
>>> representation of an instance of your struct would probably require 16
>>> bytes rather than 4, of which one may be padding.
>>
>> Depending on the sizes and alignments of the various types, sure.
>> So what?
>>
>
>> If you have suggestions for alternate ways to define #embed, they might
>> be interesting, but it's too late to change the existing specification.
>>
>
>My early comments on this were about compiler performance. I suggested
>there might be a way to turn 100,000 byte values in a file, directly
>into a 100KB string or data block, without needing to first convert
>100,000 values into 100,000 integer expressions represented as tokens,
>and to then parse those 100,000 expressions into AST nodes etc.
>
>DB suggested something like that was actually done. But you can't do
>that if those 100,000 numbers represent from 100KB to 800KB of memory
>depending on the data type of the structure they're initialising.
An implementation is free to simply pass a variant (or the directive
itself) of #embed from the pre-processor to the compiler if the programmer
isn't using -E, and the compiler could simply copy the embedded file
into the object file directly, without processing it as a series of
integer values. Much like the #file and #line directives passed by
the pre-processor to the compiler.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2024-02-29 09:06 -0800 |
| Message-ID | <875xy7td2o.fsf@nosuchdomain.example.com> |
| In reply to | #383171 |
scott@slp53.sl.home (Scott Lurndal) writes:
> bart <bc@freeuk.com> writes:
>>On 28/02/2024 23:31, Keith Thompson wrote:
>>> bart <bc@freeuk.com> writes:
>>
>>>> It would be unfortunate if your example was allowed. Clearly a binary
>>>> representation of an instance of your struct would probably require 16
>>>> bytes rather than 4, of which one may be padding.
>>>
>>> Depending on the sizes and alignments of the various types, sure.
>>> So what?
>>>
>>
>>> If you have suggestions for alternate ways to define #embed, they might
>>> be interesting, but it's too late to change the existing specification.
>>>
>>
>>My early comments on this were about compiler performance. I suggested
>>there might be a way to turn 100,000 byte values in a file, directly
>>into a 100KB string or data block, without needing to first convert
>>100,000 values into 100,000 integer expressions represented as tokens,
>>and to then parse those 100,000 expressions into AST nodes etc.
>>
>>DB suggested something like that was actually done. But you can't do
>>that if those 100,000 numbers represent from 100KB to 800KB of memory
>>depending on the data type of the structure they're initialising.
>
> An implementation is free to simply pass a variant (or the directive
> itself) of #embed from the pre-processor to the compiler if the programmer
> isn't using -E, and the compiler could simply copy the embedded file
> into the object file directly, without processing it as a series of
> integer values. Much like the #file and #line directives passed by
> the pre-processor to the compiler.
Sure, an implementation has to operate *as if* it implemented the 8
translation phases separately. But given a structure initialized with
#embed, it would have to generate additional code to initialize the
structure members from the bytes of the binary blob.
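To make the point concrete, here is a rough sketch with a made-up struct (the example discussed earlier in the thread is not reproduced here, so struct record and the file name four.bin are hypothetical). Each integer that #embed produces initialises one member and is converted to that member's type, so the object's bytes are not simply the file's bytes.

```c
#include <assert.h>

/* Hypothetical struct, chosen only to mix member types and sizes. */
struct record {
    int    a;
    double b;
    char   c;
    short  d;
};

/* If a (hypothetical) file four.bin held the four bytes 1, 2, 3, 4, then
       struct record r = {
       #embed "four.bin"
       };
   would be treated like the initialiser below: one value per member. */
static const struct record r = { 1, 2, 3, 4 };

/* Returns 1 if each member received one byte value converted to its type. */
int record_is_memberwise(void)
{
    /* The object is at least as large as its members put together,
       not the 4 bytes that were in the file. */
    assert(sizeof r >= sizeof r.a + sizeof r.b + sizeof r.c + sizeof r.d);
    return r.a == 1 && r.b == 2.0 && r.c == 3 && r.d == 4;
}
```

A compiler that wanted to copy the file straight into the object file could only do so when the target is an array of character type, or otherwise has a layout that happens to match one byte per element.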
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Medtronic
void Void(void) { Void(); } /* The recursive call of the void */
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2024-02-29 17:28 +0000 |
| Message-ID | <593EN.22560$zF_1.19152@fx18.iad> |
| In reply to | #383180 |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>scott@slp53.sl.home (Scott Lurndal) writes:
>> bart <bc@freeuk.com> writes:
>>>On 28/02/2024 23:31, Keith Thompson wrote:
>>>> bart <bc@freeuk.com> writes:
>>>
>>>>> It would be unfortunate if your example was allowed. Clearly a binary
>>>>> representation of an instance of your struct would probably require 16
>>>>> bytes rather than 4, of which one may be padding.
>>>>
>>>> Depending on the sizes and alignments of the various types, sure.
>>>> So what?
>>>>
>>>
>>>> If you have suggestions for alternate ways to define #embed, they might
>>>> be interesting, but it's too late to change the existing specification.
>>>>
>>>
>>>My early comments on this were about compiler performance. I suggested
>>>there might be a way to turn 100,000 byte values in a file, directly
>>>into a 100KB string or data block, without needing to first convert
>>>100,000 values into 100,000 integer expressions represented as tokens,
>>>and to then parse those 100,000 expressions into AST nodes etc.
>>>
>>>DB suggested something like that was actually done. But you can't do
>>>that if those 100,000 numbers represent from 100KB to 800KB of memory
>>>depending on the data type of the structure they're initialising.
>>
>> An implementation is free to simply pass a variant (or the directive
>> itself) of #embed from the pre-processor to the compiler if the programmer
>> isn't using -E, and the compiler could simply copy the embedded file
>> into the object file directly, without processing it as a series of
>> integer values. Much like the #file and #line directives passed by
>> the pre-processor to the compiler.
>
>Sure, an implementation has to operate *as if* it implemented the 8
>translation phases separately. But given a structure initialized with
>#embed, it would have to generate additional code to initialize the
>structure members from the bytes of the binary blob.
Would it? Or could it simply assume that the binary blob
is already in the same binary format that writing an instance
of the structure from a C application on the same host would have created?
| From | David Brown <david.brown@hesbynett.no> |
|---|---|
| Date | 2024-02-29 18:58 +0100 |
| Message-ID | <urqgkq$nt9f$1@dont-email.me> |
| In reply to | #383185 |
On 29/02/2024 18:28, Scott Lurndal wrote:
> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>> scott@slp53.sl.home (Scott Lurndal) writes:
>>> bart <bc@freeuk.com> writes:
>>>> On 28/02/2024 23:31, Keith Thompson wrote:
>>>>> bart <bc@freeuk.com> writes:
>>>>
>>>>>> It would be unfortunate if your example was allowed. Clearly a binary
>>>>>> representation of an instance of your struct would probably require 16
>>>>>> bytes rather than 4, of which one may be padding.
>>>>>
>>>>> Depending on the sizes and alignments of the various types, sure.
>>>>> So what?
>>>>>
>>>>
>>>>> If you have suggestions for alternate ways to define #embed, they might
>>>>> be interesting, but it's too late to change the existing specification.
>>>>>
>>>>
>>>> My early comments on this were about compiler performance. I suggested
>>>> there might be a way to turn 100,000 byte values in a file, directly
>>>> into a 100KB string or data block, without needing to first convert
>>>> 100,000 values into 100,000 integer expressions represented as tokens,
>>>> and to then parse those 100,000 expressions into AST nodes etc.
>>>>
>>>> DB suggested something like that was actually done. But you can't do
>>>> that if those 100,000 numbers represent from 100KB to 800KB of memory
>>>> depending on the data type of the structure they're initialising.
>>>
>>> An implementation is free to simply pass a variant (or the directive
>>> itself) of #embed from the pre-processor to the compiler if the programmer
>>> isn't using -E, and the compiler could simply copy the embedded file
>>> into the object file directly, without processing it as a series of
>>> integer values. Much like the #file and #line directives passed by
>>> the pre-processor to the compiler.
>>
>> Sure, an implementation has to operate *as if* it implemented the 8
>> translation phases separately. But given a structure initialized with
>> #embed, it would have to generate additional code to initialize the
>> structure members from the bytes of the binary blob.
>
> Would it? Or could it simply assume that the binary blob
> is already in the same binary format that writing an instance
> of the structure from a C application on the same host would have created?
That would depend on the sizes of the fields in the struct, and the size
of the integer constants in the #embed.
The norm for #embed will be unsigned char integer constants, so it will
only be a direct fit for the binary representation of the struct if all
the struct fields are compatible with that. But a compiler could have
vendor parameters on the #embed to change those sizes.
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2024-02-29 18:05 +0000 |
| Message-ID | <2I3EN.637365$p%Mb.40952@fx15.iad> |
| In reply to | #383187 |
David Brown <david.brown@hesbynett.no> writes:
>On 29/02/2024 18:28, Scott Lurndal wrote:
>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>> scott@slp53.sl.home (Scott Lurndal) writes:
>>>> bart <bc@freeuk.com> writes:
>>>>> On 28/02/2024 23:31, Keith Thompson wrote:
>>>>>> bart <bc@freeuk.com> writes:
>>>>>
>>>>>>> It would be unfortunate if your example was allowed. Clearly a binary
>>>>>>> representation of an instance of your struct would probably require 16
>>>>>>> bytes rather than 4, of which one may be padding.
>>>>>>
>>>>>> Depending on the sizes and alignments of the various types, sure.
>>>>>> So what?
>>>>>>
>>>>>
>>>>>> If you have suggestions for alternate ways to define #embed, they might
>>>>>> be interesting, but it's too late to change the existing specification.
>>>>>>
>>>>>
>>>>> My early comments on this were about compiler performance. I suggested
>>>>> there might be a way to turn 100,000 byte values in a file, directly
>>>>> into a 100KB string or data block, without needing to first convert
>>>>> 100,000 values into 100,000 integer expressions represented as tokens,
>>>>> and to then parse those 100,000 expressions into AST nodes etc.
>>>>>
>>>>> DB suggested something like that was actually done. But you can't do
>>>>> that if those 100,000 numbers represent from 100KB to 800KB of memory
>>>>> depending on the data type of the structure they're initialising.
>>>>
>>>> An implementation is free to simply pass a variant (or the directive
>>>> itself) of #embed from the pre-processor to the compiler if the programmer
>>>> isn't using -E, and the compiler could simply copy the embedded file
>>>> into the object file directly, without processing it as a series of
>>>> integer values. Much like the #file and #line directives passed by
>>>> the pre-processor to the compiler.
>>>
>>> Sure, an implementation has to operate *as if* it implemented the 8
>>> translation phases separately. But given a structure initialized with
>>> #embed, it would have to generate additional code to initialize the
>>> structure members from the bytes of the binary blob.
>>
>> Would it? Or could it simply assume that the binary blob
>> is already in the same binary format that writing an instance
>> of the structure from a C application on the same host would have created?
>
>That would depend on the sizes of the fields in the struct, and the size
>of the integer constants in the #embed.
I'm embedding a binary file. I want the representation in memory
to be _exactly_ the same as in the file, regardless of how it is
defined in the C code (array of char, array of int, array of long, struct whatever).
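With #embed as specified, one way to get a byte-exact image is to embed into an array of unsigned char and copy it into the object afterwards. The sketch below assumes that approach; struct header and header_from_image are hypothetical names, and it assumes the file was produced with the same layout, padding and byte order the host compiler uses.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical record, assumed to match the layout used when the file
   was written (same compiler, same host). */
struct header {
    unsigned int   magic;
    unsigned short version;
    unsigned short flags;
};

/* Hypothetical helper: copy a raw byte image into a struct header.
   Returns 0 on success, -1 if the image is too small. */
int header_from_image(struct header *out,
                      const unsigned char *image, size_t image_len)
{
    assert(out != NULL && image != NULL);
    if (image_len < sizeof *out)
        return -1;
    /* The object representation is copied byte for byte, so memory ends up
       exactly as it was in the file. */
    memcpy(out, image, sizeof *out);
    return 0;
}
```

The image array would come from something like `static const unsigned char image[] = {` with an #embed of the file inside the braces. The cost is one copy at run time, which a direct initialisation of the struct would avoid.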
| From | scott@slp53.sl.home (Scott Lurndal) |
|---|---|
| Date | 2024-02-29 18:09 +0000 |
| Message-ID | <QL3EN.637366$p%Mb.580728@fx15.iad> |
| In reply to | #383188 |
scott@slp53.sl.home (Scott Lurndal) writes:
>David Brown <david.brown@hesbynett.no> writes:
>>On 29/02/2024 18:28, Scott Lurndal wrote:
>>> Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
>>>> scott@slp53.sl.home (Scott Lurndal) writes:
>>>>> An implementation is free to simply pass a variant (or the directive
>>>>> itself) of #embed from the pre-processor to the compiler if the programmer
>>>>> isn't using -E, and the compiler could simply copy the embedded file
>>>>> into the object file directly, without processing it as a series of
>>>>> integer values. Much like the #file and #line directives passed by
>>>>> the pre-processor to the compiler.
>>>>
>>>> Sure, an implementation has to operate *as if* it implemented the 8
>>>> translation phases separately. But given a structure initialized with
>>>> #embed, it would have to generate additional code to initialize the
>>>> structure members from the bytes of the binary blob.
>>>
>>> Would it? Or could it simply assume that the binary blob
>>> is already in the same binary format that writing an instance
>>> of the structure from a C application on the same host would have created?
>>
>>That would depend on the sizes of the fields in the struct, and the size
>>of the integer constants in the #embed.
>
>I'm embedding a binary file. I want the representation in memory
>to be _exactly_ the same as in the file, regardless of how it is
>defined in the C code (array of char, array of int, array of long, struct whatever).
>
I have an actual use case today where #embed of a (C++) std::map binary
object created by separate tool would be very useful. I'm planning on
using mmap to load it at runtime at the moment.
| From | Lawrence D'Oliveiro <ldo@nz.invalid> |
|---|---|
| Date | 2024-02-29 21:27 +0000 |
| Message-ID | <urqsso$q9h9$3@dont-email.me> |
| In reply to | #383190 |
On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
> I have an actual use case today where #embed of a (C++) std::map binary
> object created by separate tool would be very useful. I'm planning on
> using mmap to load it at runtime at the moment.
Why not convert it to a .o file and statically link it into your program
as part of the build process?
| From | bart <bc@freeuk.com> |
|---|---|
| Date | 2024-03-01 11:52 +0000 |
| Message-ID | <ursfhg$17dak$1@dont-email.me> |
| In reply to | #383198 |
On 29/02/2024 21:27, Lawrence D'Oliveiro wrote:
> On Thu, 29 Feb 2024 18:09:52 GMT, Scott Lurndal wrote:
>
>> I have an actual use case today where #embed of a (C++) std::map binary
>> object created by separate tool would be very useful. I'm planning on
>> using mmap to load it at runtime at the moment.
>
> Why not convert it to a .o file and statically link it into your program
> as part of the build process?
That's exactly what #embed will enable.