| Started by | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| First post | 2021-01-25 10:15 -0500 |
| Last post | 2022-01-17 05:29 -0800 |
| Articles | 20 on this page of 23 — 6 participants |
Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-25 10:15 -0500
Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-26 12:22 +0000
Re: Adjacent string literals Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2021-01-26 13:48 +0100
Re: Adjacent string literals Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-01-26 13:05 -0800
Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-26 21:40 +0000
Re: Adjacent string literals Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2021-01-28 09:53 +0100
Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-28 05:45 -0500
Re: Adjacent string literals Richard Damon <Richard@Damon-Family.org> - 2021-01-26 07:52 -0500
Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-26 09:29 -0500
Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-26 21:46 +0000
Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-26 18:28 -0500
Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-27 01:16 +0000
Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-26 22:48 -0500
Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-27 15:46 +0000
Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-27 11:20 -0500
Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-28 03:05 +0000
Re: Adjacent string literals Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-10 08:49 -0700
Re: Adjacent string literals Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-07-10 14:58 -0700
Re: Adjacent string literals Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-22 10:29 -0700
Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-07-11 11:41 -0700
Re: Adjacent string literals Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-22 15:26 -0700
Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-07-22 17:29 -0700
Re: Adjacent string literals Tim Rentsch <tr.17687@z991.linuxsc.com> - 2022-01-17 05:29 -0800
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2021-01-25 10:15 -0500 |
| Subject | Adjacent string literals |
| Message-ID | <rumnae$4mr$1@dont-email.me> |
I learned a couple of decades ago that adjacent string literals get concatenated into a single longer literal, even if separated by arbitrarily large amounts of white-space.

Yesterday I happened to notice that translation phase 6 says only that "Adjacent string literal tokens are concatenated.", without saying anything about white-space. White-space doesn't lose its significance until translation phase 7. Therefore, string literals that are separated by white-space do not qualify as adjacent. There's also no mention of white-space in the fuller discussion that occurs in 6.4.5p5.

Am I missing something obvious here? I can imagine someone telling me that "adjacent" should be understood as "adjacent, ignoring white-space" - but that doesn't seem obvious to me. It also sounds vaguely familiar, like I've had this discussion with someone before, but I can't locate the discussion.

Every example of adjacent string literals that appears in the standard has at least one white-space character separating them, so the intent is crystal-clear, but the wording doesn't clearly say so.

If the phrase "White-space characters separating tokens are no longer significant." were moved from the beginning of the description of phase 7 to the beginning of the description of phase 6, it would make the insignificance of white space separating string literals perfectly clear, and as far as I can see, would have no other effect.
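
For readers who want to see the behavior under discussion, here is a minimal sketch in C. It shows what implementations do in practice and does not settle the wording question raised above. The array names and the helper function are hypothetical, chosen only for this illustration.

```c
#include <string.h>

/* Three spellings of the same concatenation. Only the first has the two
 * string literal tokens touching; the others separate them with a space
 * or with a new-line and indentation. */
const char no_space[]   = "James""Kuyper";
const char with_space[] = "James" "Kuyper";
const char multi_line[] = "James"
                          "Kuyper";

/* Returns 1 if all three arrays have the same size and contents. */
int all_spellings_match(void)
{
    return sizeof no_space == sizeof with_space
        && sizeof with_space == sizeof multi_line
        && memcmp(no_space, with_space, sizeof no_space) == 0
        && memcmp(with_space, multi_line, sizeof with_space) == 0;
}
```

Each array should end up holding the 11 characters of JamesKuyper plus the terminating null, so `all_spellings_match()` is expected to return 1 on any implementation that follows the intent described in the post.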
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2021-01-26 12:22 +0000 |
| Message-ID | <874kj3x4yr.fsf@bsb.me.uk> |
| In reply to | #6197 |
James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> I learned a couple of decades ago that adjacent string literals get
> concatenated into a single longer literal, even if separated by
> arbitrarily large amounts of white-space.
>
> Yesterday I happened to notice that translation phase 6 says only that
> "Adjacent string literal tokens are concatenated.", without saying
> anything about white-space. White-space doesn't lose its significance
> until translation phase 7. Therefore, string literals that are separated
> by white-space do not qualify as adjacent. There's also no mention of
> white-space in the fuller discussion that occurs in 6.4.5p5.
>
> Am I missing something obvious here? I can imagine someone telling me
> that "adjacent" should be understood as "adjacent, ignoring white-space"
> - but that doesn't seem obvious to me.

Surely it just means "next to", and in the sequence of tokens "a" "b"
the two are next to each other. It happens that string literal tokens
are such that they can be adjacent without having any white-space
between them, but I suspect that's making you over-think the meaning.
Would you say that 'long int x' has no tokens adjacent to any others?

--
Ben.
| From | Jakob Bohm <jb-usenet@wisemo.com.invalid> |
|---|---|
| Date | 2021-01-26 13:48 +0100 |
| Message-ID | <6tCdnX0audEHko39nZ2dnUU78VHNnZ2d@giganews.com> |
| In reply to | #6198 |
On 2021-01-26 13:22, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>
>> I learned a couple of decades ago that adjacent string literals get
>> concatenated into a single longer literal, even if separated by
>> arbitrarily large amounts of white-space.
>>
>> Yesterday I happened to notice that translation phase 6 says only that
>> "Adjacent string literal tokens are concatenated.", without saying
>> anything about white-space. White-space doesn't lose its significance
>> until translation phase 7. Therefore, string literals that are separated
>> by white-space do not qualify as adjacent. There's also no mention of
>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>
>> Am I missing something obvious here? I can imagine someone telling me
>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>> - but that doesn't seem obvious to me.
>
> Surely it just means "next to", and in the sequence of tokens "a" "b"
> the two are next to each other. It happens that string literal tokens
> are such that they can be adjacent without having any white-space
> between them, but I suspect that's making you over-think the meaning.
> Would you say that 'long int x' has no tokens adjacent to any others?
>

The interesting situation is cases like these:

"a" /* Long comment explaining why b is the next byte */ "b"

And

#define LEAD_BYTE "a"
#define TRAIL_BYTE "b"

LEAD_BYTE TRAIL_BYTE

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2021-01-26 13:05 -0800 |
| Message-ID | <87o8hb76jb.fsf@nosuchdomain.example.com> |
| In reply to | #6199 |
Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
> On 2021-01-26 13:22, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>
>>> I learned a couple of decades ago that adjacent string literals get
>>> concatenated into a single longer literal, even if separated by
>>> arbitrarily large amounts of white-space.
>>>
>>> Yesterday I happened to notice that translation phase 6 says only that
>>> "Adjacent string literal tokens are concatenated.", without saying
>>> anything about white-space. White-space doesn't lose its significance
>>> until translation phase 7. Therefore, string literals that are separated
>>> by white-space do not qualify as adjacent. There's also no mention of
>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>
>>> Am I missing something obvious here? I can imagine someone telling me
>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>> - but that doesn't seem obvious to me.
>>
>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>> the two are next to each other. It happens that string literal tokens
>> are such that they can be adjacent without having any white-space
>> between them, but I suspect that's making you over-think the meaning.
>> Would you say that 'long int x' has no tokens adjacent to any others?
>
> The interesting situation is cases like these:
>
> "a" /* Long comment explaining why b is the next byte */ "b"
>
> And
>
> #define LEAD_BYTE "a"
> #define TRAIL_BYTE "b"
>
> LEAD_BYTE TRAIL_BYTE
Sorry, but those cases aren't particularly interesting. Comments are
replaced by spaces in translation phase 3, and macros are expanded in
phase 4. Adjacent string literals are concatenated in phase 6.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */
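
The phase ordering described in this reply can be checked with a small sketch in C. The macro names come from the quoted post; the array names and the helper function are hypothetical and exist only for this illustration.

```c
#include <string.h>

#define LEAD_BYTE  "a"
#define TRAIL_BYTE "b"

/* Phase 3 replaces the comment with one space, so by phase 6 the two
 * literals are separated only by white-space. */
const char via_comment[] = "a" /* why b is the next byte */ "b";

/* Phase 4 expands both macros, leaving two string literal tokens
 * for phase 6 to concatenate. */
const char via_macros[] = LEAD_BYTE TRAIL_BYTE;

/* Returns 1 if both initializers produced the string "ab". */
int both_are_ab(void)
{
    return strcmp(via_comment, "ab") == 0
        && strcmp(via_macros, "ab") == 0;
}
```

On an implementation that treats white-space-separated literals as adjacent, which is the reading every participant agrees was intended, `both_are_ab()` should return 1.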
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2021-01-26 21:40 +0000 |
| Message-ID | <87bldbv0l7.fsf@bsb.me.uk> |
| In reply to | #6199 |
Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

> On 2021-01-26 13:22, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>
>>> I learned a couple of decades ago that adjacent string literals get
>>> concatenated into a single longer literal, even if separated by
>>> arbitrarily large amounts of white-space.
>>>
>>> Yesterday I happened to notice that translation phase 6 says only that
>>> "Adjacent string literal tokens are concatenated.", without saying
>>> anything about white-space. White-space doesn't lose its significance
>>> until translation phase 7. Therefore, string literals that are separated
>>> by white-space do not qualify as adjacent. There's also no mention of
>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>
>>> Am I missing something obvious here? I can imagine someone telling me
>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>> - but that doesn't seem obvious to me.
>>
>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>> the two are next to each other. It happens that string literal tokens
>> are such that they can be adjacent without having any white-space
>> between them, but I suspect that's making you over-think the meaning.
>> Would you say that 'long int x' has no tokens adjacent to any others?
>>
>
> The interesting situation is cases like these:
>
> "a" /* Long comment explaining why b is the next byte */ "b"

By translation phase 6 (when adjacent string literals are concatenated)
this has become

"a" "b"

> And
>
> #define LEAD_BYTE "a"
> #define TRAIL_BYTE "b"
>
> LEAD_BYTE TRAIL_BYTE

And this has become

"a" "b"

Am I missing some ambiguity?

--
Ben.
| From | Jakob Bohm <jb-usenet@wisemo.com.invalid> |
|---|---|
| Date | 2021-01-28 09:53 +0100 |
| Message-ID | <v8KdnRuz9ssT5o_9nZ2dnUU78UOdnZ2d@giganews.com> |
| In reply to | #6203 |
On 2021-01-26 22:40, Ben Bacarisse wrote:
> Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
>
>> On 2021-01-26 13:22, Ben Bacarisse wrote:
>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>>
>>>> I learned a couple of decades ago that adjacent string literals get
>>>> concatenated into a single longer literal, even if separated by
>>>> arbitrarily large amounts of white-space.
>>>>
>>>> Yesterday I happened to notice that translation phase 6 says only that
>>>> "Adjacent string literal tokens are concatenated.", without saying
>>>> anything about white-space. White-space doesn't lose its significance
>>>> until translation phase 7. Therefore, string literals that are separated
>>>> by white-space do not qualify as adjacent. There's also no mention of
>>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>>
>>>> Am I missing something obvious here? I can imagine someone telling me
>>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>>> - but that doesn't seem obvious to me.
>>>
>>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>>> the two are next to each other. It happens that string literal tokens
>>> are such that they can be adjacent without having any white-space
>>> between them, but I suspect that's making you over-think the meaning.
>>> Would you say that 'long int x' has no tokens adjacent to any others?
>>>
>>
>> The interesting situation is cases like these:
>>
>> "a" /* Long comment explaining why b is the next byte */ "b"
>
> By translation phase 6 (when adjacent string literals are concatenated)
> this has become
>
> "a" "b"
>
>> And
>>
>> #define LEAD_BYTE "a"
>> #define TRAIL_BYTE "b"
>>
>> LEAD_BYTE TRAIL_BYTE
>
> And this has become
>
> "a" "b"
>
> Am I missing some ambiguity?
>

Sorry, but I couldn't easily find the definition of the translation
phases, only scattered mentions of "phase 6" and "phase 7", so I had to
guess which practically related language features were buried in that
distinction.

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2021-01-28 05:45 -0500 |
| Message-ID | <ruu4kr$jsa$1@dont-email.me> |
| In reply to | #6211 |
On 1/28/21 3:53 AM, Jakob Bohm wrote:
> On 2021-01-26 22:40, Ben Bacarisse wrote:
>> Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
>>
>>> On 2021-01-26 13:22, Ben Bacarisse wrote:
>>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>>>
>>>>> I learned a couple of decades ago that adjacent string literals get
>>>>> concatenated into a single longer literal, even if separated by
>>>>> arbitrarily large amounts of white-space.
>>>>>
>>>>> Yesterday I happened to notice that translation phase 6 says only that
>>>>> "Adjacent string literal tokens are concatenated.", without saying
>>>>> anything about white-space. White-space doesn't lose its significance
>>>>> until translation phase 7. Therefore, string literals that are separated
>>>>> by white-space do not qualify as adjacent. There's also no mention of
>>>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>>>
>>>>> Am I missing something obvious here? I can imagine someone telling me
>>>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>>>> - but that doesn't seem obvious to me.
>>>>
>>>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>>>> the two are next to each other. It happens that string literal tokens
>>>> are such that they can be adjacent without having any white-space
>>>> between them, but I suspect that's making you over-think the meaning.
>>>> Would you say that 'long int x' has no tokens adjacent to any others?
>>>>
>>>
>>> The interesting situation is cases like these:
>>>
>>> "a" /* Long comment explaining why b is the next byte */ "b"
>>
>> By translation phase 6 (when adjacent string literals are concatenated)
>> this has become
>>
>> "a" "b"
>>
>>> And
>>>
>>> #define LEAD_BYTE "a"
>>> #define TRAIL_BYTE "b"
>>>
>>> LEAD_BYTE TRAIL_BYTE
>>
>> And this has become
>>
>> "a" "b"
>>
>> Am I missing some ambiguity?
>>
>
> Sorry, but I couldn't easily find the definition of the translation
> phases, only scattered mentions of "phase 6" and "phase 7", so I had to
> guess which practically related language features were buried in that
> distinction.

"5.1.1.2 Translation Phases

The precedence among the syntax rules of translation is specified by the
following phases. 6)

1. Physical source file multibyte characters are mapped, in an
   implementation-defined manner, to the source character set
   (introducing new-line characters for end-of-line indicators) if
   necessary. Trigraph sequences are replaced by corresponding
   single-character internal representations.

2. Each instance of a backslash character (\) immediately followed by a
   new-line character is deleted, splicing physical source lines to form
   logical source lines. Only the last backslash on any physical source
   line shall be eligible for being part of such a splice. A source file
   that is not empty shall end in a new-line character, which shall not
   be immediately preceded by a backslash character before any such
   splicing takes place.

3. The source file is decomposed into preprocessing tokens 7) and
   sequences of white-space characters (including comments). A source
   file shall not end in a partial preprocessing token or in a partial
   comment. Each comment is replaced by one space character. New-line
   characters are retained. Whether each nonempty sequence of white-space
   characters other than new-line is retained or replaced by one space
   character is implementation-defined.

4. Preprocessing directives are executed, macro invocations are expanded,
   and _Pragma unary operator expressions are executed. If a character
   sequence that matches the syntax of a universal character name is
   produced by token concatenation (6.10.3.3), the behavior is undefined.
   A #include preprocessing directive causes the named header or source
   file to be processed from phase 1 through phase 4, recursively. All
   preprocessing directives are then deleted.

5. Each source character set member and escape sequence in character
   constants and string literals is converted to the corresponding member
   of the execution character set; if there is no corresponding member,
   it is converted to an implementation-defined member other than the
   null (wide) character. 8)

6. Adjacent string literal tokens are concatenated.

7. White-space characters separating tokens are no longer significant.
   Each preprocessing token is converted into a token. The resulting
   tokens are syntactically and semantically analyzed and translated as a
   translation unit.

8. All external object and function references are resolved. Library
   components are linked to satisfy external references to functions and
   objects not defined in the current translation. All such translator
   output is collected into a program image which contains information
   needed for execution in its execution environment."

The referenced footnotes are:

"6) Implementations shall behave as if these separate phases occur, even
though many are typically folded together in practice. Source files,
translation units, and translated translation units need not necessarily
be stored as files, nor need there be any one-to-one correspondence
between these entities and any external representation. The description
is conceptual only, and does not specify any particular implementation.

7) As described in 6.4, the process of dividing a source file's
characters into preprocessing tokens is context-dependent. For example,
see the handling of < within a #include preprocessing directive.

8) An implementation need not convert all non-corresponding source
characters to the same execution character."
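
One observable consequence of phase 5 running before phase 6 can be sketched in C. The standard's own example in 6.4.5 uses the pair "\x12" "3" for this point: the escape sequence is converted first, so the result is two characters and not the single escape \x123. The array name and helper function below are hypothetical.

```c
/* Phase 5 converts \x12 to a single character before phase 6 joins the
 * two literals, so the digit 3 is not absorbed into the hex escape. */
const char split_literal[] = "\x12" "3";

/* Returns 1 if the array holds exactly the character with value 0x12,
 * then the character '3', then the terminating null. */
int split_is_two_chars(void)
{
    return sizeof split_literal == 3
        && split_literal[0] == 0x12
        && split_literal[1] == '3'
        && split_literal[2] == '\0';
}
```

Writing the same text as one literal, "\x123", would instead ask for a single character with value 0x123, which does not fit in an 8-bit char. Splitting the literal is the usual way to end a hex escape early.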
| From | Richard Damon <Richard@Damon-Family.org> |
|---|---|
| Date | 2021-01-26 07:52 -0500 |
| Message-ID | <vgUPH.2028$NRKd.497@fx17.iad> |
| In reply to | #6198 |
On 1/26/21 7:22 AM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>
>> I learned a couple of decades ago that adjacent string literals get
>> concatenated into a single longer literal, even if separated by
>> arbitrarily large amounts of white-space.
>>
>> Yesterday I happened to notice that translation phase 6 says only that
>> "Adjacent string literal tokens are concatenated.", without saying
>> anything about white-space. White-space doesn't lose its significance
>> until translation phase 7. Therefore, string literals that are separated
>> by white-space do not qualify as adjacent. There's also no mention of
>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>
>> Am I missing something obvious here? I can imagine someone telling me
>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>> - but that doesn't seem obvious to me.
>
> Surely it just means "next to", and in the sequence of tokens "a" "b"
> the two are next to each other. It happens that string literal tokens
> are such that they can be adjacent without having any white-space
> between them, but I suspect that's making you over-think the meaning.
> Would you say that 'long int x' has no tokens adjacent to any others?
>

I'm not sure, but 6.4p3 says

  As described in 6.10, in certain circumstances during translation
  phase 4, white space (or the absence thereof) serves as more than
  preprocessing token separation.

which seems to imply that for most purposes (unless expressly stated)
white-space between tokens is generally insignificant. There are cases
where it matters, like the difference between

#define macro(x) (x)

and

#define macro (x) (x)

but these cases explicitly talk about the white-space affecting the
meaning. This would seem to at least imply that it is to be ignored
elsewhere, and thus the white-space between literals doesn't mean they
aren't adjacent.

It would seem that the removal of the possible significance could have
been moved up earlier (but it has to be after phase 4, since that has an
explicit use of white-space). As far as I can see, phases 5 and 6 don't
need the white-space significance, but maybe the fact that phase 7 also
converts preprocessing tokens into tokens says that we want to handle
all the string literal stuff before doing that.
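
The phase 4 white-space sensitivity mentioned in this post can be made visible with a short C sketch. It stringifies the expansion of each macro so the difference shows up as text. The macro names `FN_LIKE`, `OBJ_LIKE`, `STR` and `XSTR`, and the two accessor functions, are hypothetical and not taken from the post.

```c
/* No space before the parenthesis: a function-like macro with one
 * parameter x and the replacement list (x). */
#define FN_LIKE(x) (x)

/* A space before the parenthesis: an object-like macro whose
 * replacement list is the token sequence (x) (x). */
#define OBJ_LIKE (x) (x)

/* Two-step stringification so the argument is macro-expanded first. */
#define STR(a)  #a
#define XSTR(a) STR(a)

/* Text of the expansion of FN_LIKE(7). */
const char *fn_like_text(void)
{
    return XSTR(FN_LIKE(7));
}

/* Text of the expansion of OBJ_LIKE. */
const char *obj_like_text(void)
{
    return XSTR(OBJ_LIKE);
}
```

With common compilers the first function should yield the text (7) and the second the text (x) (x), since the one space in the definition changes which kind of macro is being defined.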
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2021-01-26 09:29 -0500 |
| Message-ID | <rup90c$m3d$1@dont-email.me> |
| In reply to | #6198 |
On 1/26/21 7:22 AM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>
>> I learned a couple of decades ago that adjacent string literals get
>> concatenated into a single longer literal, even if separated by
>> arbitrarily large amounts of white-space.
>>
>> Yesterday I happened to notice that translation phase 6 says only that
>> "Adjacent string literal tokens are concatenated.", without saying
>> anything about white-space. White-space doesn't lose its significance
>> until translation phase 7. Therefore, string literals that are separated
>> by white-space do not qualify as adjacent. There's also no mention of
>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>
>> Am I missing something obvious here? I can imagine someone telling me
>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>> - but that doesn't seem obvious to me.
>
> Surely it just means "next to", and in the sequence of tokens "a" "b"
> the two are next to each other. It happens that string literal tokens
> are such that they can be adjacent without having any white-space
> between them, but I suspect that's making you over-think the meaning.
> Would you say that 'long int x' has no tokens adjacent to any others?

No, I would not - and that's precisely because "long int x" is not
parsed as a declaration until translation phase 7, and the very first
sentence of the description of that phase says "White-space characters
separating tokens are no longer significant.". Phase 6 occurs before
that sentence applies, which is precisely my point.
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2021-01-26 21:46 +0000 |
| Message-ID | <875z3jv0bg.fsf@bsb.me.uk> |
| In reply to | #6201 |
James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> On 1/26/21 7:22 AM, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>
>>> I learned a couple of decades ago that adjacent string literals get
>>> concatenated into a single longer literal, even if separated by
>>> arbitrarily large amounts of white-space.
>>>
>>> Yesterday I happened to notice that translation phase 6 says only that
>>> "Adjacent string literal tokens are concatenated.", without saying
>>> anything about white-space. White-space doesn't lose its significance
>>> until translation phase 7. Therefore, string literals that are separated
>>> by white-space do not qualify as adjacent. There's also no mention of
>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>
>>> Am I missing something obvious here? I can imagine someone telling me
>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>> - but that doesn't seem obvious to me.
>>
>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>> the two are next to each other. It happens that string literal tokens
>> are such that they can be adjacent without having any white-space
>> between them, but I suspect that's making you over-think the meaning.
>> Would you say that 'long int x' has no tokens adjacent to any others?
>
> No, I would not - and that's precisely because "long int x" is not
> parsed as a declaration until translation phase 7, and the very first
> sentence of the description of that phase says "White-space characters
> separating tokens are no longer significant.". Phase 6 occurs before
> that sentence applies, which is precisely my point.

I meant at the stage you were asking about: phase 6. The example was an
attempt to find out if your reluctance to see "a" "b" as being adjacent
was in part due to the fact that they could have been written with no
spaces.

I think your answer makes it clear that, at phase 6, you think that
there are no two tokens adjacent to one another. I find that a rather
artificial reading.

--
Ben.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2021-01-26 18:28 -0500 |
| Message-ID | <ruq8il$g95$1@dont-email.me> |
| In reply to | #6204 |
On 1/26/21 4:46 PM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>
>> On 1/26/21 7:22 AM, Ben Bacarisse wrote:
...
>> No, I would not - and that's precisely because "long int x" is not
>> parsed as a declaration until translation phase 7, and the very first
>> sentence of the description of that phase says "White-space characters
>> separating tokens are no longer significant.". Phase 6 occurs before
>> that sentence applies, which is precisely my point.
>
> I meant at the stage you were asking about: phase 6. The example was an
> attempt to find out if your reluctance to see "a" "b" as being adjacent
> was in part due to the fact that they could have been written with no
> spaces.

Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
not, because both are adjacent to some white-space instead. I'm not
suggesting that the committee intended to prohibit white space between
the tokens, merely that the wording chosen doesn't clearly allow it.

> I think your answer makes it clear that, at phase 6, you think that
> there are no two tokens adjacent to one another. I find that a rather
> artificial reading.

If they had used the term "consecutive", I could have seen that as a
reasonable interpretation. "a" is one token, and "b" is the next token,
even though they are separated by something, because that something
isn't a token.
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2021-01-27 01:16 +0000 |
| Message-ID | <87lfcftc18.fsf@bsb.me.uk> |
| In reply to | #6205 |
James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> On 1/26/21 4:46 PM, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>
>>> On 1/26/21 7:22 AM, Ben Bacarisse wrote:
> ...
>>> No, I would not - and that's precisely because "long int x" is not
>>> parsed as a declaration until translation phase 7, and the very first
>>> sentence of the description of that phase says "White-space characters
>>> separating tokens are no longer significant.". Phase 6 occurs before
>>> that sentence applies, which is precisely my point.
>>
>> I meant at the stage you were asking about: phase 6. The example was an
>> attempt to find out if your reluctance to see "a" "b" as being adjacent
>> was in part due to the fact that they could have been written with no
>> spaces.
>
> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
> not, because both are adjacent to some white-space instead.

Adjacent does not mean with nothing in between (though it can, of
course). What's more, things can be adjacent to each other, and also
adjacent to something in between. I can say that there was a fire in
the house adjacent to mine. The two houses are adjacent. But both are
adjacent to the lane separating them.

<cut>

--
Ben.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2021-01-26 22:48 -0500 |
| Message-ID | <ruqnqh$7ie$1@dont-email.me> |
| In reply to | #6206 |
On 1/26/21 8:16 PM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
...
>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>> not, because both are adjacent to some white-space instead.
>
> Adjacent does not mean with nothing in between (though it can, of
> course). What's more, things can be adjacent to each other, and also
> adjacent to something in between. I can say that there was a fire in
> the house adjacent to mine. The two houses are adjacent. But both are
> adjacent to the lane separating them.
It takes at least two dimensions for the issue you raise to come up. As
far as the C standard is concerned, source code is a one-dimensional
sequence of characters. It's possible to think of the text
two-dimensionally, but the standard doesn't make use of that fact in any
way that I'm aware of. I don't think anyone would suggest that two
string literals that are vertically adjacent to each other:
const char *first = "James";
const char *second = "Kuyper";
should be merged.
Even if you acknowledge only that this is one possible way of
interpreting "adjacent", that would mean the meaning is ambiguous.
Moving the first sentence of translation phase 7 to be the first
sentence of translation phase 6 would remove all ambiguity, and have, as
far as I can see, no other consequence.
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2021-01-27 15:46 +0000 |
| Message-ID | <87ft2mtmae.fsf@bsb.me.uk> |
| In reply to | #6207 |
James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> ...
>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>> not, because both are adjacent to some white-space instead.
>>
>> Adjacent does not mean with nothing in between (though it can, of
>> course). What's more, things can be adjacent to each other, and also
>> adjacent to something in between. I can say that there was a fire in
>> the house adjacent to mine. The two houses are adjacent. But both are
>> adjacent to the lane separating them.
>
> It takes at least two dimensions for the issue you raise to come up.

I don't follow. 1 and 2 are adjacent integers on the real line
(i.e. despite having other kinds of number between them). In addition,
they are both integers adjacent to 1/2.

> As
> far as the C standard is concerned, source code is a one-dimensional
> sequence of characters. It's possible to think of the text
> two-dimensionally, but the standard doesn't make use of that fact in any
> way that I'm aware of. I don't think anyone would suggest that two
> string literals that are vertically adjacent to each other:
>
> const char *first = "James";
> const char *second = "Kuyper";
>
> should be merged.
> Even if you acknowledge only that this is one possible way of
> interpreting "adjacent", that would mean the meaning is ambiguous.

Lots of words in the standard could, at a pinch, be taken to mean
something other than what is obviously intended. But if you think
someone might read about phase 6 and think that "a""b" will be
concatenated but not "a" "b", then you should file a defect report.

> Moving the first sentence of translation phase 7 to be the first
> sentence of translation phase 6 would remove all ambiguity, and have, as
> far as I can see, no other consequence.

I think the strongest case for the possibility of misunderstanding comes
from this sentence being where it is. I don't see any problem with the
word "adjacent", but I can imagine someone wondering why this sentence
is where it is if not to do what you are suggesting.

--
Ben.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2021-01-27 11:20 -0500 |
| Message-ID | <rus3ss$57c$1@dont-email.me> |
| In reply to | #6208 |
On 1/27/21 10:46 AM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>
>> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>> ...
>>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>>> not, because both are adjacent to some white-space instead.
>>>
>>> Adjacent does not mean with nothing in between (though it can, of
>>> course). What's more, things can be adjacent to each other, and also
>>> adjacent to something in between. I can say that there was a fire in
>>> the house adjacent to mine. The two houses are adjacent. But both are
>>> adjacent to the lane separating them.
>>
>> It takes at least two dimensions for the issue you raise to come up.
>
> I don't follow. 1 and 2 are adjacent integers on the real line
> (i.e. despite having other kinds of number between them). In addition,
> they are both integers adjacent to 1/2.
I'm not familiar with any meaning that could reasonably be attached to
"adjacent" which would make either of those statements true. In the
future, I will try to remember that there's at least one person who does
attach such a meaning to that word - but it would make it easier for me
to understand how you could say such a thing if you would specify that
definition.
When using a meaning that allows 1 and 2 to be both adjacent to 1/2,
while also being adjacent to each other, how do you interpret "adjacent
string literal" so that it doesn't apply to
ptrdiff_t d = "Ben"-"Bacarisse";
It seems to me that, despite having no idea how you could possibly mean
what you seem to have said, I can make a direct analogy, matching 1 with
"Ben", 1/2 with '-', and 2 with "Bacarisse". So, how does that analogy
break down? Or are you claiming that they should be concatenated?
...
>> Moving the first sentence of translation phase 7 to be the first
>> sentence of translation phase 6 would remove all ambiguity, and have, as
>> far as I can see, no other consequence.
>
> I think the strongest case for the possibility of misunderstanding comes
> from this sentence being where it is. I don't see any problem with the
> word "adjacent", but I can imagine someone wondering why this sentence
> is where it is if not to do what you are suggesting.
I think you just agreed with me, but you didn't quite say so directly.
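For readers following along, the behaviour that implementations agree on can be checked directly. This is a minimal sketch, not taken from the thread; the identifiers (`touching`, `spaced`, `multiline`, `separate` and the two functions) are made up for illustration. It deliberately does not evaluate `"Ben"-"Bacarisse"`, since subtracting pointers into two different arrays has undefined behaviour.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* All three initializers spell the same 3-byte array {'a', 'b', '\0'}:
   the literals are concatenated whether or not white-space
   (including a new-line) separates them. */
static const char touching[]  = "a""b";
static const char spaced[]    = "a" "b";
static const char multiline[] = "a"
                                "b";

/* A token that is not white-space prevents concatenation:
   the comma makes these two separate array elements. */
static const char *const separate[] = { "a", "b" };

/* Aborts via assert() if any spelling did not produce "ab". */
void check_concatenation(void)
{
    assert(sizeof touching == 3);
    assert(sizeof spaced == 3);
    assert(sizeof multiline == 3);
    assert(strcmp(touching, spaced) == 0);
    assert(strcmp(spaced, multiline) == 0);
}

/* Number of elements in `separate`. */
size_t separate_count(void)
{
    return sizeof separate / sizeof separate[0];
}
```

On a conforming implementation `check_concatenation()` should return without aborting, and `separate_count()` should yield 2 because the comma is a token in its own right.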
| From | Ben Bacarisse <ben.usenet@bsb.me.uk> |
|---|---|
| Date | 2021-01-28 03:05 +0000 |
| Message-ID | <87y2gdsqvf.fsf@bsb.me.uk> |
| In reply to | #6209 |
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> On 1/27/21 10:46 AM, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>
>>> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>> ...
>>>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>>>> not, because both are adjacent to some white-space instead.
>>>>
>>>> Adjacent does not mean with nothing in between (though it can, of
>>>> course). What's more, things can be adjacent to each other, and also
>>>> adjacent to something in between. I can say that there was a fire in
>>>> the house adjacent to mine. The two houses are adjacent. But both are
>>>> adjacent to the lane separating them.
>>>
>>> It takes at least two dimensions for the issue you raise to come up.
>>
>> I don't follow. 1 and 2 are adjacent integers on the real line
>> (i.e. despite having other kinds of number between them). In addition,
>> they are both integers adjacent to 1/2.
>
> I'm not familiar with any meaning that could reasonably be attached to
> "adjacent" which would make either of those statements true.
That's an interesting view, but probably so off-topic that it would not
be reasonable to investigate it here.
> In the future, I will try to remember that there's at least one person
> who does attach such a meaning to that word - but it would make it
> easier for me to understand how you could say such a thing if you
> would specify that definition.
I am not a lexicographer, and not skilled at writing definitions. So I
looked in the two dictionaries on the shelf here. The OED says:
  "Lying near to; adjoining; bordering. (Not necessarily touching.)"
and Collins says
  "being near or close, esp. having a common boundary; adjoining; contiguous."
These are pretty close to what I feel the word means. For comparison,
what is your understanding of the word?
> When using a meaning that allows 1 and 2 to be both adjacent to 1/2,
> while also being adjacent to each other, how do you interpret "adjacent
> string literal" so that it doesn't apply to
>
> ptrdiff_t d = "Ben"-"Bacarisse";
>
> It seems to me that, despite having no idea how you could possibly mean
> what you seem to have said, I can make a direct analogy, matching 1 with
> "Ben", 1/2 with '-', and 2 with "Bacarisse". So, how does that analogy
> break down? Or are you claiming that they should be concatenated?
It depends on what is considered significant and what is merely a
separator or common boundary. On the number line, we can stress what we
want to focus on. "Adjacent /integers/" relegates everything else to
being a mere separating boundary.
So, to push the point to the edge of reason, if I choose to read the key
sentence as "Adjacent /string literal/ tokens are concatenated", I
could, at a pinch, make the case that "Ben" and "Bacarisse" are, in your
example, adjacent. The context would have to be such that considering
another token as a mere boundary or separator would be reasonable. The
C standard is not such a context.
But if I read it as "Adjacent string literal /tokens/ are concatenated",
then the intervening token stops them being adjacent. When tokenising a
character stream, all the tokens matter, so I believe there is only one
reasonable way to read that sentence.
> ...
>>> Moving the first sentence of translation phase 7 to be the first
>>> sentence of translation phase 6 would remove all ambiguity, and have, as
>>> far as I can see, no other consequence.
>>
>> I think the strongest case for the possibility of misunderstanding comes
>> from this sentence being where it is. I don't see any problem with the
>> word "adjacent", but I can imagine someone wondering why this sentence
>> is where it is if not to do what you are suggesting.
>
> I think you just agreed with me, but you didn't quite say so directly.
Agreement is not binary. I don't find your argument based on what
adjacent means to be compelling, but I agree that the presence of that
sentence one phase too late muddies the waters a bit. I've tried to
express the extent and the nature of my agreement (and disagreement) as
directly as I can. I'm sorry if you think I have been oblique.
TL;DR: The fact that adjacent means something in the cluster of ideas
around "being near to" and "having a common boundary, but not
necessarily touching" means that I don't think there is any problem with
"a" "b" being described as adjacent string literal tokens.
--
Ben.
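The "adjacent string literal /tokens/" reading can be modelled on a token list. What follows is only a sketch under stated assumptions: `enum item_kind`, `struct item` and `concat_adjacent_strings` are hypothetical names invented for this illustration, and white-space runs are kept in the list purely to model the stream before phase 7 (the standard does not call them tokens).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical item kinds for this sketch; a real compiler has many more. */
enum item_kind { ITEM_STRING, ITEM_PUNCT, ITEM_WS };

struct item {
    enum item_kind kind;
    char text[32];   /* literal contents without quotes, or the spelling */
};

/* Skips white-space runs and merges a string literal into the previously
   kept item only when that item is also a string literal. Any other token
   in between blocks the merge. Works in place; returns the new count. */
size_t concat_adjacent_strings(struct item *items, size_t n)
{
    size_t out = 0;

    assert(items != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (items[i].kind == ITEM_WS)
            continue;                       /* separators do not count */
        if (items[i].kind == ITEM_STRING && out > 0
            && items[out - 1].kind == ITEM_STRING) {
            /* Bounded append so the fixed-size buffer cannot overflow. */
            size_t used = strlen(items[out - 1].text);
            strncat(items[out - 1].text, items[i].text,
                    sizeof items[out - 1].text - used - 1);
        } else {
            items[out++] = items[i];
        }
    }
    return out;
}
```

Under this model the sequence `"a"`, white-space, `"b"` collapses to a single item holding `ab`, while `"Ben"`, `-`, `"Bacarisse"` stays as three items because the `-` token sits between the literals.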
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2021-07-10 08:49 -0700 |
| Message-ID | <86v95i88zw.fsf@linuxsc.com> |
| In reply to | #6197 |
James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> I learned a couple of decades ago that adjacent string literals get
> concatenated into a single longer literal, even if separated by
> arbitrarily large amounts of white-space.
>
> Yesterday I happened to notice that translation phase 6 says only that
> "Adjacent string literal tokens are concatenated.", without saying
> anything about white-space. White-space doesn't lose its significance
> until translation phase 7. Therefore, string literals that are separated
> by white-space do not qualify as adjacent. There's also no mention of
> white-space in the fuller discussion that occurs in 6.4.5p5.
>
> Am I missing something obvious here? I can imagine someone telling me
> that "adjacent" should be understood as "adjacent, ignoring white-space"
> - but that doesn't seem obvious to me. It also sounds vaguely familiar,
> like I've had this discussion with someone before, but I can't locate
> the discussion. Every example of adjacent string literals that appears
> in the standard has at least one white-space character separating them,
> so the intent is crystal-clear, but the wording doesn't clearly say so.
>
> If the phrase "White-space characters separating tokens are no longer
> significant." were moved from the beginning of the description of phase
> 7 to the beginning of the description of phase 6, it would make the
> insignificance of white space separating string literals perfectly
> clear, and as far as I can see, would have no other effect.
The word "adjacent" doesn't always mean touching. There is another
word for that, the word "adjoining". Booking a hotel reservation
for adjacent rooms is not the same as a reservation for adjoining
rooms.
| From | Keith Thompson <Keith.S.Thompson+u@gmail.com> |
|---|---|
| Date | 2021-07-10 14:58 -0700 |
| Message-ID | <87fswl7rvg.fsf@nosuchdomain.example.com> |
| In reply to | #6264 |
Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]
>> If the phrase "White-space characters separating tokens are no longer
>> significant." were moved from the beginning of the description of phase
>> 7 to the beginning of the description of phase 6, it would make the
>> insignificance of white space separating string literals perfectly
>> clear, and as far as I can see, would have no other effect
>
> The word "adjacent" doesn't always mean touching. There is another
> word for that, the word "adjoining". Booking a hotel reservation
> for adjacent rooms is not the same as a reservation for adjoining
> rooms.
That's not entirely clear. dictionary.com (not a definitive reference
but a convenient one) shows "adjoining" as one of the definitions of
"adjacent".
If I understand you correctly, if rooms 110 and 112 share a common wall,
perhaps with a door going between them, they're both adjacent and
adjoining, but if instead they're on opposite sides of the elevator
they're adjacent but not adjoining. Is that what you meant? I'm not
sure I'd call them "adjacent" in that case.
A footnote on "Adjacent string literals are concatenated" saying that
two string literals are adjacent if they're adjoining or separated only
by white-space characters would clear this up. Moving "White-space
characters separating tokens are no longer significant." from the
beginning of phase 7 to the beginning of phase 6 would also be a good
solution.
But given the clear examples, I wouldn't object to leaving it as it is.
--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */
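Two related cases are worth keeping in mind when weighing "separated only by white-space characters": a comment between two literals, and a literal that comes from a macro. The sketch below is illustrative only; `GREETING`, `with_comment`, `with_macro` and `check_comment_and_macro` are made-up names. It relies on two facts from the translation phases: each comment is replaced by one space character in phase 3, and macros are expanded in phase 4, both before string literals are concatenated in phase 6.

```c
#include <assert.h>
#include <string.h>

#define GREETING "hello, "   /* hypothetical macro, expanded in phase 4 */

/* The comment has already become a single space by phase 6, so only
   white-space separates the two string literal tokens. */
static const char with_comment[] = "a" /* just a comment */ "b";

/* After expansion, the literal produced by GREETING is an ordinary
   string literal token sitting next to "world". */
static const char with_macro[] = GREETING "world";

/* Aborts via assert() if either initializer was not concatenated. */
void check_comment_and_macro(void)
{
    assert(strcmp(with_comment, "ab") == 0);
    assert(strcmp(with_macro, "hello, world") == 0);
}
```

Both assertions should hold on a conforming implementation, which suggests that a footnote worded in terms of white-space would already cover the comment case.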
| From | Tim Rentsch <tr.17687@z991.linuxsc.com> |
|---|---|
| Date | 2021-07-22 10:29 -0700 |
| Message-ID | <86im125kaq.fsf@linuxsc.com> |
| In reply to | #6268 |
Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:
> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>
> [...]
>
>>> If the phrase "White-space characters separating tokens are no longer
>>> significant." were moved from the beginning of the description of phase
>>> 7 to the beginning of the description of phase 6, it would make the
>>> insignificance of white space separating string literals perfectly
>>> clear, and as far as I can see, would have no other effect
>>
>> The word "adjacent" doesn't always mean touching. There is another
>> word for that, the word "adjoining". Booking a hotel reservation
>> for adjacent rooms is not the same as a reservation for adjoining
>> rooms.
>
> That's not entirely clear. dictionary.com (not a definitive reference
> but a convenient one) shows "adjoining" as one of the definitions of
> "adjacent".
That's consistent with what I said: "adjoining" being only one of
the definitions is consistent with saying "adjacent" doesn't
_always_ mean touching. Words in English can be ambiguous in
their meanings.
> If I understand you correctly, if rooms 110 and 112 share a common wall,
> perhaps with a door going between them, they're both adjacent and
> adjoining,
In the case of hotels I think "adjoining" always means connected,
either with or perhaps without a door, but yes.
> but if instead they're on opposite sides of the elevator
> they're adjacent but not adjoining. Is that what you meant? I'm not
> sure I'd call them "adjacent" in that case.
A better example is a small utility closet rather than an
elevator. "Adjacent" usually implies "closeness" even if it
doesn't always mean touching, and two rooms with a bank of four
elevators between them would for most people not be considered
adjacent, I think. In the case of hotel rooms at least it's a
matter of degree.
Another example is two rooms having the same latitude and
longitude, but on different (consecutive) floors. I think most
people wouldn't call those rooms "adjacent". However, if there
is a connecting stairway between them, a hotel might very well
offer them as "adjoining rooms".
> A footnote on "Adjacent string literals are concatenated" saying that
> two string literals are adjacent if they're adjoining or separated only
> by white-space characters would clear this up. Moving "White-space
> characters separating tokens are no longer significant." from the
> beginning of phase 7 to the beginning of phase 6 would also be a good
> solution.
>
> But given the clear examples, I wouldn't object to leaving it as it is.
Given that the wording lasted more than 30 years without anyone
even noticing a problem, I think the case for leaving it alone is
decidedly stronger than the case for making a change.
| From | James Kuyper <jameskuyper@alumni.caltech.edu> |
|---|---|
| Date | 2021-07-11 11:41 -0700 |
| Message-ID | <dab9e114-5156-4951-b464-799f231eaafen@googlegroups.com> |
| In reply to | #6264 |
On Saturday, July 10, 2021 at 11:49:09 AM UTC-4, Tim Rentsch wrote:
> James Kuyper <james...@alumni.caltech.edu> writes:
>
> > I learned a couple of decades ago that adjacent string literals get
> > concatenated into a single longer literal, even if separated by
> > arbitrarily large amounts of white-space.
> >
> > Yesterday I happened to notice that translation phase 6 says only that
> > "Adjacent string literal tokens are concatenated.", without saying
> > anything about white-space. White-space doesn't lose its significance
> > until translation phase 7. Therefore, string literals that are separated
> > by white-space do not qualify as adjacent. There's also no mention of
> > white-space in the fuller discussion that occurs in 6.4.5p5.
> >
> > Am I missing something obvious here? I can imagine someone telling me
> > that "adjacent" should be understood as "adjacent, ignoring white-space"
> > - but that doesn't seem obvious to me. It also sounds vaguely familiar,
> > like I've had this discussion with someone before, but I can't locate
> > the discussion. Every example of adjacent string literals that appears
> > in the standard has at least one white-space character separating them,
> > so the intent is crystal-clear, but the wording doesn't clearly say so.
> >
> > If the phrase "White-space characters separating tokens are no longer
> > significant." were moved from the beginning of the description of phase
> > 7 to the beginning of the description of phase 6, it would make the
> > insignificance of white space separating string literals perfectly
> > clear, and as far as I can see, would have no other effect
> The word "adjacent" doesn't always mean touching. There is another
> word for that, the word "adjoining". Booking a hotel reservation
> for adjacent rooms is not the same as a reservation for adjoining
> rooms.
But, if it doesn't mean "touching", what does it mean? If a blank space
doesn't prevent them from being adjacent, what does? How do you draw the
line between things that do prevent two string literals from being
adjacent, and things that don't? And, most importantly, where in the
actual text of the standard does it clearly make that distinction?
I contend that it doesn't clearly make that distinction anywhere, but
that moving the sentence "White-space characters separating tokens are
no longer significant." from the beginning of phase 7 to the beginning
of phase 6 would remove all ambiguity, making the text match the way all
real world implementations actually handle this issue, and would have no
other effect. Do you disagree? If so, with which part of what I just
said, and for what reason?