Adjacent string literals

Started by	James Kuyper <jameskuyper@alumni.caltech.edu>
First post	2021-01-25 10:15 -0500
Last post	2022-01-17 05:29 -0800
Articles	20 on this page of 23 — 6 participants

  Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-25 10:15 -0500
    Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-26 12:22 +0000
      Re: Adjacent string literals Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2021-01-26 13:48 +0100
        Re: Adjacent string literals Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-01-26 13:05 -0800
        Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-26 21:40 +0000
          Re: Adjacent string literals Jakob Bohm <jb-usenet@wisemo.com.invalid> - 2021-01-28 09:53 +0100
            Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-28 05:45 -0500
      Re: Adjacent string literals Richard Damon <Richard@Damon-Family.org> - 2021-01-26 07:52 -0500
      Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-26 09:29 -0500
        Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-26 21:46 +0000
          Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-26 18:28 -0500
            Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-27 01:16 +0000
              Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-26 22:48 -0500
                Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-27 15:46 +0000
                  Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-01-27 11:20 -0500
                    Re: Adjacent string literals Ben Bacarisse <ben.usenet@bsb.me.uk> - 2021-01-28 03:05 +0000
    Re: Adjacent string literals Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-10 08:49 -0700
      Re: Adjacent string literals Keith Thompson <Keith.S.Thompson+u@gmail.com> - 2021-07-10 14:58 -0700
        Re: Adjacent string literals Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-22 10:29 -0700
      Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-07-11 11:41 -0700
        Re: Adjacent string literals Tim Rentsch <tr.17687@z991.linuxsc.com> - 2021-07-22 15:26 -0700
          Re: Adjacent string literals James Kuyper <jameskuyper@alumni.caltech.edu> - 2021-07-22 17:29 -0700
            Re: Adjacent string literals Tim Rentsch <tr.17687@z991.linuxsc.com> - 2022-01-17 05:29 -0800

#6197 — Adjacent string literals

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-01-25 10:15 -0500
Subject	Adjacent string literals
Message-ID	<rumnae$4mr$1@dont-email.me>

I learned a couple of decades ago that adjacent string literals get
concatenated into a single longer literal, even if separated by
arbitrarily large amounts of white-space.

Yesterday I happened to notice that translation phase 6 says only that
"Adjacent string literal tokens are concatenated.", without saying
anything about white-space. White-space doesn't lose its significance
until translation phase 7. Therefore, string literals that are separated
by white-space do not qualify as adjacent. There's also no mention of
white-space in the fuller discussion that occurs in 6.4.5p5.

Am I missing something obvious here? I can imagine someone telling me
that "adjacent" should be understood as "adjacent, ignoring white-space"
- but that doesn't seem obvious to me. It also sounds vaguely familiar,
like I've had this discussion with someone before, but I can't locate
the discussion. Every example of adjacent string literals that appears
in the standard has at least one white-space character separating them,
so the intent is crystal-clear, but the wording doesn't clearly say so.

If the phrase "White-space characters separating tokens are no longer
significant." were moved from the beginning of the description of phase
7 to the beginning of the description of phase 6, it would make the
insignificance of white space separating string literals perfectly
clear, and as far as I can see, would have no other effect.
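A minimal sketch of the two spellings at issue, assuming a C11 compiler (for _Static_assert); no_space, with_space and both_spellings_match are hypothetical names. Mainstream compilers accept both and produce the same three-byte array, which matches the intent James describes; the question in this thread is only whether the wording of phase 6 actually says so.

```c
#include <string.h>

/* Two string literal tokens with nothing at all between them. */
static const char no_space[] = "a""b";

/* Two string literal tokens separated by white space, including a
   new-line: the case whose phase 6 wording is being questioned. */
static const char with_space[] = "a"
                                 "b";

/* Each array holds 'a', 'b' and a terminating null character, so both
   spellings yield a single three-byte string literal. */
_Static_assert(sizeof no_space == 3, "no_space holds one 3-byte string");
_Static_assert(sizeof with_space == 3, "with_space holds one 3-byte string");

/* Hypothetical helper: returns nonzero if both spellings produced "ab". */
int both_spellings_match(void)
{
    return strcmp(no_space, "ab") == 0 && strcmp(with_space, "ab") == 0;
}
```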

#6198

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2021-01-26 12:22 +0000
Message-ID	<874kj3x4yr.fsf@bsb.me.uk>
In reply to	#6197

James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> I learned a couple of decades ago that adjacent string literals get
> concatenated into a single longer literal, even if separated by
> arbitrarily large amounts of white-space.
>
> Yesterday I happened to notice that translation phase 6 says only that
> "Adjacent string literal tokens are concatenated.", without saying
> anything about white-space. White-space doesn't lose its significance
> until translation phase 7. Therefore, string literals that are separated
> by white-space do not qualify as adjacent. There's also no mention of
> white-space in the fuller discussion that occurs in 6.4.5p5.
>
> Am I missing something obvious here? I can imagine someone telling me
> that "adjacent" should be understood as "adjacent, ignoring white-space"
> - but that doesn't seem obvious to me.

Surely it just means "next to", and in the sequence of tokens "a" "b"
the two are next to each other.  It happens that string literal tokens
are such that they can be adjacent without having any white-space
between them, but I suspect that's making you over-think the meaning.
Would you say that 'long int x' has no tokens adjacent to any others?

-- 
Ben.

#6199

From	Jakob Bohm <jb-usenet@wisemo.com.invalid>
Date	2021-01-26 13:48 +0100
Message-ID	<6tCdnX0audEHko39nZ2dnUU78VHNnZ2d@giganews.com>
In reply to	#6198

On 2021-01-26 13:22, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> 
>> I learned a couple of decades ago that adjacent string literals get
>> concatenated into a single longer literal, even if separated by
>> arbitrarily large amounts of white-space.
>>
>> Yesterday I happened to notice that translation phase 6 says only that
>> "Adjacent string literal tokens are concatenated.", without saying
>> anything about white-space. White-space doesn't lose its significance
>> until translation phase 7. Therefore, string literals that are separated
>> by white-space do not qualify as adjacent. There's also no mention of
>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>
>> Am I missing something obvious here? I can imagine someone telling me
>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>> - but that doesn't seem obvious to me.
> 
> Surely it just means "next to", and in the sequence of tokens "a" "b"
> the two are next to each other.  It happens that string literal tokens
> are such that they can be adjacent without having any white-space
> between them, but I suspect that's making you over-think the meaning.
> Would you say that 'long int x' has no tokens adjacent to any others?
> 

The interesting situation is cases like these:

"a" /* Long comment explaining why b is the next byte */ "b"

And

#define LEAD_BYTE  "a"
#define TRAIL_BYTE "b"

LEAD_BYTE TRAIL_BYTE

Enjoy

Jakob
-- 
Jakob Bohm, CIO, Partner, WiseMo A/S.  https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark.  Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded

#6202

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2021-01-26 13:05 -0800
Message-ID	<87o8hb76jb.fsf@nosuchdomain.example.com>
In reply to	#6199

Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
> On 2021-01-26 13:22, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>
>>> I learned a couple of decades ago that adjacent string literals get
>>> concatenated into a single longer literal, even if separated by
>>> arbitrarily large amounts of white-space.
>>>
>>> Yesterday I happened to notice that translation phase 6 says only that
>>> "Adjacent string literal tokens are concatenated.", without saying
>>> anything about white-space. White-space doesn't lose its significance
>>> until translation phase 7. Therefore, string literals that are separated
>>> by white-space do not qualify as adjacent. There's also no mention of
>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>
>>> Am I missing something obvious here? I can imagine someone telling me
>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>> - but that doesn't seem obvious to me.
>>
>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>> the two are next to each other.  It happens that string literal tokens
>> are such that they can be adjacent without having any white-space
>> between them, but I suspect that's making you over-think the meaning.
>> Would you say that 'long int x' has no tokens adjacent to any others?
>
> The interesting situation is cases like these:
>
> "a" /* Long comment explaining why b is the next byte */ "b"
>
> And
>
> #define LEAD_BYTE  "a"
> #define TRAIL_BYTE "b"
>
> LEAD_BYTE TRAIL_BYTE

Sorry, but those cases aren't particularly interesting.  Comments are
replaced by spaces in translation phase 3, and macros are expanded in
phase 4.  Adjacent string literals are concatenated in phase 6.
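Keith's ordering argument can be checked against Jakob's two cases directly. A sketch assuming a C11 compiler; LEAD_BYTE and TRAIL_BYTE are Jakob's macros, while commented and from_macros are hypothetical names.

```c
/* Jakob's macros, unchanged. */
#define LEAD_BYTE  "a"
#define TRAIL_BYTE "b"

/* Phase 3 replaces the comment with one space character, so by
   phase 6 the tokens are just "a" "b". */
static const char commented[] =
    "a" /* Long comment explaining why b is the next byte */ "b";

/* Phase 4 expands both macros, so phase 6 again sees "a" "b". */
static const char from_macros[] = LEAD_BYTE TRAIL_BYTE;

/* Both end up as the single literal "ab" (three bytes with the null). */
_Static_assert(sizeof commented == 3, "comment case is one literal");
_Static_assert(sizeof from_macros == 3, "macro case is one literal");
```

By the time phase 6 runs, neither the comment nor the macro names exist any more, so both cases reduce to the plain "a" "b" that the rest of the thread argues about.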

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

#6203

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2021-01-26 21:40 +0000
Message-ID	<87bldbv0l7.fsf@bsb.me.uk>
In reply to	#6199

Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:

> On 2021-01-26 13:22, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>
>>> I learned a couple of decades ago that adjacent string literals get
>>> concatenated into a single longer literal, even if separated by
>>> arbitrarily large amounts of white-space.
>>>
>>> Yesterday I happened to notice that translation phase 6 says only that
>>> "Adjacent string literal tokens are concatenated.", without saying
>>> anything about white-space. White-space doesn't lose its significance
>>> until translation phase 7. Therefore, string literals that are separated
>>> by white-space do not qualify as adjacent. There's also no mention of
>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>
>>> Am I missing something obvious here? I can imagine someone telling me
>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>> - but that doesn't seem obvious to me.
>>
>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>> the two are next to each other.  It happens that string literal tokens
>> are such that they can be adjacent without having any white-space
>> between them, but I suspect that's making you over-think the meaning.
>> Would you say that 'long int x' has no tokens adjacent to any others?
>>
>
> The interesting situation is cases like these:
>
> "a" /* Long comment explaining why b is the next byte */ "b"

By translation phase 6 (when adjacent string literals are concatenated)
this has become

"a"   "b"

> And
>
> #define LEAD_BYTE  "a"
> #define TRAIL_BYTE "b"
>
> LEAD_BYTE TRAIL_BYTE

And this has become

"a" "b"

Am I missing some ambiguity?

-- 
Ben.

#6211

From	Jakob Bohm <jb-usenet@wisemo.com.invalid>
Date	2021-01-28 09:53 +0100
Message-ID	<v8KdnRuz9ssT5o_9nZ2dnUU78UOdnZ2d@giganews.com>
In reply to	#6203

On 2021-01-26 22:40, Ben Bacarisse wrote:
> Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
> 
>> On 2021-01-26 13:22, Ben Bacarisse wrote:
>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>>
>>>> I learned a couple of decades ago that adjacent string literals get
>>>> concatenated into a single longer literal, even if separated by
>>>> arbitrarily large amounts of white-space.
>>>>
>>>> Yesterday I happened to notice that translation phase 6 says only that
>>>> "Adjacent string literal tokens are concatenated.", without saying
>>>> anything about white-space. White-space doesn't lose its significance
>>>> until translation phase 7. Therefore, string literals that are separated
>>>> by white-space do not qualify as adjacent. There's also no mention of
>>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>>
>>>> Am I missing something obvious here? I can imagine someone telling me
>>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>>> - but that doesn't seem obvious to me.
>>>
>>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>>> the two are next to each other.  It happens that string literal tokens
>>> are such that they can be adjacent without having any white-space
>>> between them, but I suspect that's making you over-think the meaning.
>>> Would you say that 'long int x' has no tokens adjacent to any others?
>>>
>>
>> The interesting situation is cases like these:
>>
>> "a" /* Long comment explaining why b is the next byte */ "b"
> 
> By translation phase 6 (when adjacent string literals are concatenated)
> this has become
> 
> "a"   "b"
> 
>> And
>>
>> #define LEAD_BYTE  "a"
>> #define TRAIL_BYTE "b"
>>
>> LEAD_BYTE TRAIL_BYTE
> 
> And this has become
> 
> "a" "b"
> 
> Am I missing some ambiguity?
> 

Sorry, but I couldn't easily find the definition of the translation
phases, only scattered mentions of "phase 6" and "phase 7", so I had to
guess which practically related language features were buried in that
distinction.



Enjoy

Jakob

#6212

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-01-28 05:45 -0500
Message-ID	<ruu4kr$jsa$1@dont-email.me>
In reply to	#6211

On 1/28/21 3:53 AM, Jakob Bohm wrote:
> On 2021-01-26 22:40, Ben Bacarisse wrote:
>> Jakob Bohm <jb-usenet@wisemo.com.invalid> writes:
>>
>>> On 2021-01-26 13:22, Ben Bacarisse wrote:
>>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>>>
>>>>> I learned a couple of decades ago that adjacent string literals get
>>>>> concatenated into a single longer literal, even if separated by
>>>>> arbitrarily large amounts of white-space.
>>>>>
>>>>> Yesterday I happened to notice that translation phase 6 says only that
>>>>> "Adjacent string literal tokens are concatenated.", without saying
>>>>> anything about white-space. White-space doesn't lose its significance
>>>>> until translation phase 7. Therefore, string literals that are separated
>>>>> by white-space do not qualify as adjacent. There's also no mention of
>>>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>>>
>>>>> Am I missing something obvious here? I can imagine someone telling me
>>>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>>>> - but that doesn't seem obvious to me.
>>>>
>>>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>>>> the two are next to each other.  It happens that string literal tokens
>>>> are such that they can be adjacent without having any white-space
>>>> between them, but I suspect that's making you over-think the meaning.
>>>> Would you say that 'long int x' has no tokens adjacent to any others?
>>>>
>>>
>>> The interesting situation is cases like these:
>>>
>>> "a" /* Long comment explaining why b is the next byte */ "b"
>>
>> By translation phase 6 (when adjacent string literals are concatenated)
>> this has become
>>
>> "a"   "b"
>>
>>> And
>>>
>>> #define LEAD_BYTE  "a"
>>> #define TRAIL_BYTE "b"
>>>
>>> LEAD_BYTE TRAIL_BYTE
>>
>> And this has become
>>
>> "a" "b"
>>
>> Am I missing some ambiguity?
>>
> 
> Sorry, but I couldn't easily find the definition of the translation
> phases, only scattered mentions of "phase 6" and "phase 7", so I had to
> guess which practically related language features were buried in that
> distinction.

"5.1.1.2 Translation Phases
The precedence among the syntax rules of translation is specified by the
following phases. 6)
1. Physical source file multibyte characters are mapped, in an
implementation-defined manner, to the source character set (introducing
new-line characters for end-of-line indicators) if necessary. Trigraph
sequences are replaced by corresponding single-character internal
representations.
2. Each instance of a backslash character (\) immediately followed by a
new-line character is deleted, splicing physical source lines to form
logical source lines. Only the last backslash on any physical source
line shall be eligible for being part of such a splice. A source file
that is not empty shall end in a new-line character, which shall not be
immediately preceded by a backslash character before any such splicing
takes place.
3. The source file is decomposed into preprocessing tokens 7) and
sequences of white-space characters (including comments). A source file
shall not end in a partial preprocessing token or in a partial comment.
Each comment is replaced by one space character. New-line characters are
retained. Whether each nonempty sequence of white-space characters other
than new-line is retained or replaced by one space character is
implementation-defined.
4. Preprocessing directives are executed, macro invocations are
expanded, and _Pragma unary operator expressions are executed. If a
character sequence that matches the syntax of a universal character name
is produced by token concatenation (6.10.3.3), the behavior is
undefined. A #include preprocessing directive causes the named header or
source file to be processed from phase 1 through phase 4, recursively.
All preprocessing directives are then deleted.
5. Each source character set member and escape sequence in character
constants and string literals is converted to the corresponding member
of the execution character set; if there is no corresponding member, it
is converted to an implementation-defined member other than the null
(wide) character. 8)
6. Adjacent string literal tokens are concatenated.
7. White-space characters separating tokens are no longer significant.
Each preprocessing token is converted into a token. The resulting tokens
are syntactically and semantically analyzed and translated as a
translation unit.
8. All external object and function references are resolved. Library
components are linked to satisfy external references to functions and
objects not defined in the current translation. All such translator
output is collected into a program image which contains information
needed for execution in its execution environment."

The referenced footnotes are:
"6) Implementations shall behave as if these separate phases occur, even
though many are typically folded together in practice. Source files,
translation units, and translated translation units need not necessarily
be stored as files, nor need there be any one-to-one correspondence
between these entities and any external representation. The description
is conceptual only, and does not specify any particular implementation.
7) As described in 6.4, the process of dividing a source file’s
characters into preprocessing tokens is context-dependent. For example,
see the handling of < within a #include preprocessing directive.
8) An implementation need not convert all non-corresponding source
characters to the same execution character."
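The quoted phases also fix an ordering detail that is easy to miss: escape sequences are converted in phase 5, before phase 6 joins literals, and line splicing happens even earlier, in phase 2. A minimal sketch, assuming a C11 compiler; spliced and split_escape are hypothetical names, and "\x12" "3" is the example the standard itself uses in 6.4.5.

```c
/* Phase 2 deletes each backslash-new-line pair, even inside a string
   literal, so the two physical lines below form the one literal "ab". */
static const char spliced[] = "a\
b";

/* Phase 5 converts the escape sequence \x12 into a single character
   before phase 6 joins the two literals, so the hexadecimal escape
   cannot swallow the following 3 (as it would in the single literal
   "\x123"). The result is '\x12', '3' and a terminating null. */
static const char split_escape[] = "\x12" "3";

_Static_assert(sizeof spliced == 3, "spliced is the literal \"ab\"");
_Static_assert(sizeof split_escape == 3, "the escape ends before the join");
```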

#6200

From	Richard Damon <Richard@Damon-Family.org>
Date	2021-01-26 07:52 -0500
Message-ID	<vgUPH.2028$NRKd.497@fx17.iad>
In reply to	#6198

On 1/26/21 7:22 AM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> 
>> I learned a couple of decades ago that adjacent string literals get
>> concatenated into a single longer literal, even if separated by
>> arbitrarily large amounts of white-space.
>>
>> Yesterday I happened to notice that translation phase 6 says only that
>> "Adjacent string literal tokens are concatenated.", without saying
>> anything about white-space. White-space doesn't lose it's significance
>> until translation phase 7. Therefore, string literals that are separated
>> by white-space do not qualify as adjacent. There's also no mention of
>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>
>> Am I missing something obvious here? I can imagine someone telling me
>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>> - but that doesn't seem obvious to me.
> 
> Surely it just means "next to", and in the sequence of tokens "a" "b"
> the two are next to each other.  It happens that string literal tokens
> are such that they can be adjacent without having any white-space
> between them, but I suspect that's making you over-think the meaning.
> Would you say that 'long int x' has no tokens adjacent to any others?
> 

I'm not sure, but 6.4p3 says

As described in 6.10, in certain circumstances during translation phase
4, white space (or the absence thereof) serves as more than
preprocessing token separation.

which seems to imply that for most purposes (unless expressly stated)
white-space between tokens is generally insignificant. There are cases
where it matters, like the difference between

#define macro(x) (x)
and
#define macro (x) (x)

but these cases explicitly talk about the white-space affecting the
meaning. This would seem to at least imply that it is to be ignored
elsewhere, and thus the white-space between literals doesn't mean they
aren't adjacent.
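Richard's two #define lines can be compared by stringizing what each one expands to. A sketch assuming a C99 or later preprocessor; STR, XSTR, FN and OBJ are hypothetical names (his example used macro for both). Exact spacing inside a stringized expansion can vary between implementations, so the comments only describe the overall shape.

```c
/* Turn a fully macro-expanded argument into a string literal. */
#define STR(a)  #a
#define XSTR(a) STR(a)

/* No white space between the name and '(': a function-like macro. */
#define FN(x) (x)

/* White space before '(': an object-like macro whose replacement
   list is the tokens (x) (x). */
#define OBJ (x) (x)

/* FN(3) is a macro invocation and expands to (3). */
static const char fn_expansion[] = XSTR(FN(3));

/* OBJ is replaced on its own and the following (3) is left as is,
   giving (x) (x) followed by (3); spacing may vary by implementation. */
static const char obj_expansion[] = XSTR(OBJ(3));
```

Only the function-like FN consumes (3) as its argument. With OBJ, the single space before the parenthesis in the #define is what makes it object-like, which is the kind of explicitly stated white-space significance Richard is pointing to.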

It would seem that the removal of the possible significance could have
been moved earlier (though it has to come after phase 4, since that phase
makes explicit use of white-space). As far as I can see, phases 5 and 6
don't need white-space to be significant, but perhaps the fact that
phase 7 also converts preprocessing tokens into tokens means we want to
handle all the string literal processing before doing that.

#6201

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-01-26 09:29 -0500
Message-ID	<rup90c$m3d$1@dont-email.me>
In reply to	#6198

On 1/26/21 7:22 AM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> 
>> I learned a couple of decades ago that adjacent string literals get
>> concatenated into a single longer literal, even if separated by
>> arbitrarily large amounts of white-space.
>>
>> Yesterday I happened to notice that translation phase 6 says only that
>> "Adjacent string literal tokens are concatenated.", without saying
>> anything about white-space. White-space doesn't lose its significance
>> until translation phase 7. Therefore, string literals that are separated
>> by white-space do not qualify as adjacent. There's also no mention of
>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>
>> Am I missing something obvious here? I can imagine someone telling me
>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>> - but that doesn't seem obvious to me.
> 
> Surely it just means "next to", and in the sequence of tokens "a" "b"
> the two are next to each other.  It happens that string literal tokens
> are such that they can be adjacent without having any white-space
> between them, but I suspect that's making you over-think the meaning.
> Would you say that 'long int x' has no tokens adjacent to any others?

No, I would not - and that's precisely because "long int x" is not
parsed as a declaration until translation phase 7, and the very first
sentence of the description of that phase says "White-space characters
separating tokens are no longer significant.". Phase 6 occurs before
that sentence applies, which is precisely my point.

#6204

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2021-01-26 21:46 +0000
Message-ID	<875z3jv0bg.fsf@bsb.me.uk>
In reply to	#6201

James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> On 1/26/21 7:22 AM, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>> 
>>> I learned a couple of decades ago that adjacent string literals get
>>> concatenated into a single longer literal, even if separated by
>>> arbitrarily large amounts of white-space.
>>>
>>> Yesterday I happened to notice that translation phase 6 says only that
>>> "Adjacent string literal tokens are concatenated.", without saying
>>> anything about white-space. White-space doesn't lose its significance
>>> until translation phase 7. Therefore, string literals that are separated
>>> by white-space do not qualify as adjacent. There's also no mention of
>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>
>>> Am I missing something obvious here? I can imagine someone telling me
>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>> - but that doesn't seem obvious to me.
>> 
>> Surely it just means "next to", and in the sequence of tokens "a" "b"
>> the two are next to each other.  It happens that string literal tokens
>> are such that they can be adjacent without having any white-space
>> between them, but I suspect that's making you over-think the meaning.
>> Would you say that 'long int x' has no tokens adjacent to any others?
>
> No, I would not - and that's precisely because "long int x" is not
> parsed as a declaration until translation phase 7, and the very first
> sentence of the description of that phase says "White-space characters
> separating tokens are no longer significant.". Phase 6 occurs before
> that sentence applies, which is precisely my point.

I meant at the stage you were asking about: phase 6.  The example was an
attempt to find out if your reluctance to see "a" "b" as being adjacent
was due in part to the fact that they could have been written
with no spaces.

I think your answer makes it clear that, at phase 6, you think that
there are no two tokens adjacent to one another.  I find that a rather
artificial reading.

-- 
Ben.

#6205

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-01-26 18:28 -0500
Message-ID	<ruq8il$g95$1@dont-email.me>
In reply to	#6204

On 1/26/21 4:46 PM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> 
>> On 1/26/21 7:22 AM, Ben Bacarisse wrote:
...
>> No, I would not - and that's precisely because "long int x" is not
>> parsed as a declaration until translation phase 7, and the very first
>> sentence of the description of that phase says "White-space characters
>> separating tokens are no longer significant.". Phase 6 occurs before
>> that sentence applies, which is precisely my point.
> 
> I meant at the stage you were asking about: phase 6.  The example was an
> attempt to find out if your reluctance to see "a" "b" as being adjacent
> was due in part to the fact that they could have been written
> with no spaces.

Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
not, because both are adjacent to some white-space instead. I'm not
suggesting that the committee intended to prohibit white space between
the tokens, merely that the wording chosen doesn't clearly allow it.

> I think your answer makes it clear that, at phase 6, you think that
> there are no two tokens adjacent to one another.  I find that a rather
> artificial reading.

If they had used the term "consecutive", I could have seen that as a
reasonable interpretation. "a" is one token, and "b" is the next token,
even though they are separated by something, because that something
isn't a token.

#6206

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2021-01-27 01:16 +0000
Message-ID	<87lfcftc18.fsf@bsb.me.uk>
In reply to	#6205

James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> On 1/26/21 4:46 PM, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>> 
>>> On 1/26/21 7:22 AM, Ben Bacarisse wrote:
> ...
>>> No, I would not - and that's precisely because "long int x" is not
>>> parsed as a declaration until translation phase 7, and the very first
>>> sentence of the description of that phase says "White-space characters
>>> separating tokens are no longer significant.". Phase 6 occurs before
>>> that sentence applies, which is precisely my point.
>> 
>> I meant at the stage you were asking about: phase 6.  The example was an
>> attempt to find out if your reluctance to see "a" "b" as being adjacent
>> was due in part to the fact that they could have been written
>> with no spaces.
>
> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
> not, because both are adjacent to some white-space instead.

Adjacent does not mean with nothing in between (though it can, of
course).  What's more, things can be adjacent to each other, and also
adjacent to something in between.  I can say that there was a fire in
the house adjacent to mine.  The two houses are adjacent.  But both are
adjacent to the lane separating them.

<cut>
-- 
Ben.

#6207

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-01-26 22:48 -0500
Message-ID	<ruqnqh$7ie$1@dont-email.me>
In reply to	#6206

On 1/26/21 8:16 PM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
...
>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>> not, because both are adjacent to some white-space instead.
> 
> Adjacent does not mean with nothing in between (though it can, of
> course).  What's more, things can be adjacent to each other, and also
> adjacent to something in between.  I can say that there was a fire in
> the house adjacent to mine.  The two houses are adjacent.  But both are
> adjacent to the lane separating them.

It takes at least two dimensions for the issue you raise to come up. As
far as the C standard is concerned, source code is a one-dimensional
sequence of characters. It's possible to think of the text
two-dimensionally, but the standard doesn't make use of that fact in any
way that I'm aware of. I don't think anyone would suggest that two
string literals that are vertically adjacent to each other:

    char first[] = "James";
    char second[] = "Kuyper";

should be merged.
Even if you acknowledge only that this is one possible way of
interpreting "adjacent", that would mean the meaning is ambiguous.
Moving the first sentence of translation phase 7 to be the first
sentence of translation phase 6 would remove all ambiguity, and have, as
far as I can see, no other consequence.
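James's vertical example contrasts with a case that really does put only white space between two literals on separate lines: an initializer with a missing comma. A sketch assuming C11; names and NAMES_COUNT are hypothetical. In his two declarations a ';' and other tokens lie between the literals, so they are neither adjacent nor consecutive string literal tokens, and phase 6 leaves them alone under any reading.

```c
/* A comma is missing after "James", so the only thing between "James"
   and "Kuyper" is white space (a new-line and indentation). Phase 6
   joins them into the single element "JamesKuyper". */
static const char *const names[] = {
    "James"
    "Kuyper",
    "Ben",
};

/* Two elements, not three: "JamesKuyper" and "Ben". */
#define NAMES_COUNT (sizeof names / sizeof names[0])
_Static_assert(NAMES_COUNT == 2, "missing comma merged two literals");
```

This is a well-known source of bugs in string tables, and it shows that a new-line between two literals does not stop them being joined.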

#6208

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2021-01-27 15:46 +0000
Message-ID	<87ft2mtmae.fsf@bsb.me.uk>
In reply to	#6207

James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> ...
>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>> not, because both are adjacent to some white-space instead.
>> 
>> Adjacent does not mean with nothing in between (though it can, of
>> course).  What's more, things can be adjacent to each other, and also
>> adjacent to something in between.  I can say that there was a fire in
>> the house adjacent to mine.  The two houses are adjacent.  But both are
>> adjacent to the lane separating them.
>
> It takes at least two dimensions for the issue you raise to come up.

I don't follow.  1 and 2 are adjacent integers on the real line
(i.e. despite having other kinds of number between them).  In addition,
they are both integers adjacent to 3/2.

> As
> far as the C standard is concerned, source code is a one-dimensional
> sequence of characters. It's possible to think of the text
> two-dimensionally, but the standard doesn't make use of that fact in any
> way that I'm aware of. I don't think anyone would suggest that two
> string literals that are vertically adjacent to each other:
>
>     char *first = "James";
>     char *second = "Kuyper";
>
> should be merged.
> Even if you acknowledge only that this is one possible way of
> interpreting "adjacent", that would mean the meaning is ambiguous.

Lots of words in the standard could, at a pinch, be taken to mean
something other than what is obviously intended.  But if you think
someone might read about phase 6 and think that "a""b" will be
concatenated but not "a" "b", then you should file a defect report.
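A small C11 check of the two spellings under discussion (array names are illustrative): whatever one concludes about the word "adjacent", the standard's own examples intend both to yield the same three-character array, and the static assertions below would fail to compile if a compiler treated them differently.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>

/* Three spellings of the same pair of literals. */
static const char touching[]   = "a""b";       /* nothing in between */
static const char one_blank[]  = "a" "b";      /* a single space     */
static const char many_blank[] = "a"   "b";    /* several spaces     */

/* Each array holds 'a', 'b' and the terminating null: 3 chars. */
static_assert(sizeof touching == 3, "touching literals merged");
static_assert(sizeof one_blank == 3, "blank-separated literals merged");
static_assert(sizeof many_blank == 3, "extra blanks change nothing");
```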

> Moving the first sentence of translation phase 7 to be the first
> sentence of translation phase 6 would remove all ambiguity, and have, as
> far as I can see, no other consequence.

I think the strongest case for the possibility of misunderstanding comes
from this sentence being where it is.  I don't see any problem with the
word "adjacent", but I can imagine someone wondering why this sentence
is where it is if not to do what you are suggesting.

-- 
Ben.

#6209

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-01-27 11:20 -0500
Message-ID	<rus3ss$57c$1@dont-email.me>
In reply to	#6208

On 1/27/21 10:46 AM, Ben Bacarisse wrote:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
> 
>> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>> ...
>>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>>> not, because both are adjacent to some white-space instead.
>>>
>>> Adjacent does not mean with nothing in between (though it can, of
>>> course).  What's more, things can be adjacent to each other, and also
>>> adjacent to something in between.  I can say that there was a fire in
>>> the house adjacent to mine.  The two houses are adjacent.  But both are
>>> adjacent to the lane separating them.
>>
>> It takes at least two dimensions for the issue you raise to come up.
> 
> I don't follow.  1 and 2 are adjacent integers on the real line
> (i.e. despite having other kinds of number between them).  In addition,
> they are both integers adjacent to 1/2.

I'm not familiar with any meaning that could reasonably be attached to
"adjacent" which would make either of those statements true. In the
future, I will try to remember that there's at least one person who does
attach such a meaning to that word - but it would make it easier for me
to understand how you could say such a thing if you would specify that
definition.

When using a meaning that allows 1 and 2 to be both adjacent to 1/2,
while also being adjacent to each other, how do you interpret "adjacent
string literal" so that it doesn't apply to

    ptrdiff_t d = "Ben"-"Bacarisse";

It seems to me that, despite having no idea how you could possibly mean
what you seem to have said, I can make a direct analogy, matching 1 with
"Ben", 1/2 with '-', and 2 with "Bacarisse". So, how does that analogy
break down? Or are you claiming that they should be concatenated?

...
>> Moving the first sentence of translation phase 7 to be the first
>> sentence of translation phase 6 would remove all ambiguity, and have, as
>> far as I can see, no other consequence.
> 
> I think the strongest case for the possibility of misunderstanding comes
> from this sentence being where it is.  I don't see any problem with the
> word "adjacent", but I can imagine someone wondering why this sentence
> is where it is if not to do what you are suggesting.

I think you just agreed with me, but you didn't quite say so directly.

#6210

From	Ben Bacarisse <ben.usenet@bsb.me.uk>
Date	2021-01-28 03:05 +0000
Message-ID	<87y2gdsqvf.fsf@bsb.me.uk>
In reply to	#6209

James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> On 1/27/21 10:46 AM, Ben Bacarisse wrote:
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>> 
>>> On 1/26/21 8:16 PM, Ben Bacarisse wrote:
>>>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>>> ...
>>>>> Yes, it is. In "a""b", the two tokens are adjacent. In "a" "b", they are
>>>>> not, because both are adjacent to some white-space instead.
>>>>
>>>> Adjacent does not mean with nothing in between (though it can, of
>>>> course).  What's more, things can be adjacent to each other, and also
>>>> adjacent to something in between.  I can say that there was a fire in
>>>> the house adjacent to mine.  The two houses are adjacent.  But both are
>>>> adjacent to the lane separating them.
>>>
>>> It takes at least two dimensions for the issue you raise to come up.
>> 
>> I don't follow.  1 and 2 are adjacent integers on the real line
>> (i.e. despite having other kinds of number between them).  In addition,
>> they are both integers adjacent to 1/2.
>
> I'm not familiar with any meaning that could reasonably be attached to
> "adjacent" which would make either of those statements true.

That's an interesting view, but probably so off-topic that it would not be
reasonable to investigate it here.

> In the future, I will try to remember that there's at least one person
> who does attach such a meaning to that word - but it would make it
> easier for me to understand how you could say such a thing if you
> would specify that definition.

I am not a lexicographer, and not skilled at writing definitions.  So I
looked in the two dictionaries on the shelf here.  The OED says:

  "Lying near to; adjoining; bordering. (Not necessarily touching.)"

and Collins says

  "being near or close, esp. having a common boundary; adjoining;
  contiguous."

These are pretty close to what I feel the word means.

For comparison, what is your understanding of the word?

> When using a meaning that allows 1 and 2 to be both adjacent to 1/2,
> while also being adjacent to each other, how do you interpret "adjacent
> string literal" so that it doesn't apply to
>
>     ptrdiff_t d = "Ben"-"Bacarisse";
>
> It seems to me that, despite having no idea how you could possibly mean
> what you seem to have said, I can make a direct analogy, matching 1 with
> "Ben", 1/2 with '-', and 2 with "Bacarisse". So, how does that analogy
> break down? Or are you claiming that they should be concatenated?

It depends on what is considered significant and what is merely a
separator or common boundary.

On the number line, we can stress what we want to focus on.  "Adjacent
/integers/" relegates everything else to being a mere separating
boundary.

So, to push the point to the edge of reason, if I choose to read the key
sentence as "Adjacent /string literal/ tokens are concatenated", I
could, at a pinch, make the case that "Ben" and "Bacarisse" are, in your
example, adjacent.  The context would have to be such that considering
another token as a mere boundary or separator would be reasonable.  The
C standard is not such a context.

But if I read it as "Adjacent string literal /tokens/ are concatenated",
then the intervening token stops them being adjacent.  When tokenising a
character stream, all the tokens matter, so I believe there is only one
reasonable way to read that sentence.
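The token-level reading can be sketched as a toy phase-6 pass. This is a hypothetical model, not how any real compiler is structured: white-space is represented as its own item purely so the sketch can show it being skipped, while any other token, such as the '-' in the "Ben"-"Bacarisse" example, ends a run of literals.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical item kinds for this sketch only. */
enum item_kind { ITEM_STRING, ITEM_SPACE, ITEM_OTHER };

struct item {
    enum item_kind kind;
    const char *spelling;   /* source spelling, for readability only */
};

/* Returns how many string literals remain once runs of adjacent string
   literals are merged. White-space items are skipped, so they neither
   start nor end a run; any other token ends the current run. */
static size_t literals_after_phase6(const struct item *items, size_t n)
{
    assert(items != NULL || n == 0);
    size_t count = 0;
    int in_run = 0;   /* nonzero while inside a run of string literals */
    for (size_t i = 0; i < n; i++) {
        switch (items[i].kind) {
        case ITEM_STRING:
            if (!in_run) {
                count++;      /* first literal of a new run */
                in_run = 1;
            }
            break;
        case ITEM_SPACE:
            break;            /* insignificant: the run continues */
        case ITEM_OTHER:
            in_run = 0;       /* an intervening token breaks adjacency */
            break;
        }
    }
    return count;
}
```

Modeled as string, space, string, `"a" "b"` gives 1; modeled as string, other, string, `"Ben"-"Bacarisse"` gives 2.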

> ...
>>> Moving the first sentence of translation phase 7 to be the first
>>> sentence of translation phase 6 would remove all ambiguity, and have, as
>>> far as I can see, no other consequence.
>> 
>> I think the strongest case for the possibility of misunderstanding comes
>> from this sentence being where it is.  I don't see any problem with the
>> word "adjacent", but I can imagine someone wondering why this sentence
>> is where it is if not to do what you are suggesting.
>
> I think you just agreed with me, but you didn't quite say so directly.

Agreement is not binary.  I don't find your argument based on what
adjacent means to be compelling, but I agree that the presence of that
sentence one phase too late muddies the waters a bit.

I've tried to express the extent and the nature of my agreement (and
disagreement) as directly as I can.  I'm sorry if you think I have been
oblique.

TL;DR: The fact that adjacent means something in the cluster of ideas
around "being near to" and "having a common boundary, but not
necessarily touching" means that I don't think there is any problem with
"a" "b" being described as adjacent string literal tokens.

-- 
Ben.

#6264

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2021-07-10 08:49 -0700
Message-ID	<86v95i88zw.fsf@linuxsc.com>
In reply to	#6197

James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> I learned a couple of decades ago that adjacent string literals get
> concatenated into a single longer literal, even if separated by
> arbitrarily large amounts of white-space.
>
> Yesterday I happened to notice that translation phase 6 says only that
> "Adjacent string literal tokens are concatenated.", without saying
> anything about white-space.  White-space doesn't lose its significance
> until translation phase 7.  Therefore, string literals that are separated
> by white-space do not qualify as adjacent.  There's also no mention of
> white-space in the fuller discussion that occurs in 6.4.5p5.
>
> Am I missing something obvious here?  I can imagine someone telling me
> that "adjacent" should be understood as "adjacent, ignoring white-space"
> - but that doesn't seem obvious to me.  It also sounds vaguely familiar,
> like I've had this discussion with someone before, but I can't locate
> the discussion.  Every example of adjacent string literals that appears
> in the standard has at least one white-space character separating them,
> so the intent is crystal-clear, but the wording doesn't clearly say so.
>
> If the phrase "White-space characters separating tokens are no longer
> significant." were moved from the beginning of the description of phase
> 7 to the beginning of the description of phase 6, it would make the
> insignificance of white space separating string literals perfectly
> clear, and as far as I can see, would have no other effect.

The word "adjacent" doesn't always mean touching.  There is another
word for that, the word "adjoining".  Booking a hotel reservation
for adjacent rooms is not the same as a reservation for adjoining
rooms.

#6268

From	Keith Thompson <Keith.S.Thompson+u@gmail.com>
Date	2021-07-10 14:58 -0700
Message-ID	<87fswl7rvg.fsf@nosuchdomain.example.com>
In reply to	#6264

Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
[...]
>> If the phrase "White-space characters separating tokens are no longer
>> significant." were moved from the beginning of the description of phase
>> 7 to the beginning of the description of phase 6, it would make the
>> insignificance of white space separating string literals perfectly
>> clear, and as far as I can see, would have no other effect.
>
> The word "adjacent" doesn't always mean touching.  There is another
> word for that, the word "adjoining".  Booking a hotel reservation
> for adjacent rooms is not the same as a reservation for adjoining
> rooms.

That's not entirely clear.  dictionary.com (not a definitive reference
but a convenient one) shows "adjoining" as one of the definitions of
"adjacent".

If I understand you correctly, if rooms 110 and 112 share a common wall,
perhaps with a door going between them, they're both adjacent and
adjoining, but if instead they're on opposite sides of the elevator
they're adjacent but not adjoining.  Is that what you meant?  I'm not
sure I'd call them "adjacent" in that case.

A footnote on "Adjacent string literals are concatenated" saying that
two string literals are adjacent if they're adjoining or separated only
by white-space characters would clear this up.  Moving "White-space
characters separating tokens are no longer significant." from the
beginning of phase 7 to the beginning of phase 6 would also be a good
solution.
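The footnote wording proposed above ("adjoining or separated only by white-space characters") amounts to a simple test on the source text between two literals. A minimal sketch, with a hypothetical helper name and byte offsets supplied by the caller:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns nonzero if src[from..to) holds only the
   characters C calls white-space (space, horizontal tab, new-line,
   vertical tab, form-feed). An empty range, i.e. touching literals,
   also qualifies. Comments are deliberately not handled; the sketch
   assumes phase 3 has already replaced each comment by one space. */
static int only_whitespace_between(const char *src, size_t from, size_t to)
{
    assert(src != NULL && from <= to);
    for (size_t i = from; i < to; i++) {
        /* strchr would match the terminating null, so test for it first */
        if (src[i] == '\0' || strchr(" \t\n\v\f", src[i]) == NULL)
            return 0;
    }
    return 1;
}
```

For `"a""b"` the range between the literals is empty and the test passes; for `"a" "b"` it holds one space and passes; for `"Ben"-"Bacarisse"` it holds '-' and fails.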

But given the clear examples, I wouldn't object to leaving it as it is.

-- 
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips
void Void(void) { Void(); } /* The recursive call of the void */

#6270

From	Tim Rentsch <tr.17687@z991.linuxsc.com>
Date	2021-07-22 10:29 -0700
Message-ID	<86im125kaq.fsf@linuxsc.com>
In reply to	#6268

Keith Thompson <Keith.S.Thompson+u@gmail.com> writes:

> Tim Rentsch <tr.17687@z991.linuxsc.com> writes:
>
>> James Kuyper <jameskuyper@alumni.caltech.edu> writes:
>
> [...]
>
>>> If the phrase "White-space characters separating tokens are no longer
>>> significant." were moved from the beginning of the description of phase
>>> 7 to the beginning of the description of phase 6, it would make the
>>> insignificance of white space separating string literals perfectly
>>> clear, and as far as I can see, would have no other effect.
>>
>> The word "adjacent" doesn't always mean touching.  There is another
>> word for that, the word "adjoining".  Booking a hotel reservation
>> for adjacent rooms is not the same as a reservation for adjoining
>> rooms.
>
> That's not entirely clear.  dictionary.com (not a definitive reference
> but a convenient one) shows "adjoining" as one of the definitions of
> "adjacent".

That's consistent with what I said:  "adjoining" being only one
of the definitions is consistent with saying "adjacent" doesn't
_always_ mean touching.  Words in English can be ambiguous in
their meanings.

> If I understand you correctly, if rooms 110 and 112 share a common wall,
> perhaps with a door going between them, they're both adjacent and
> adjoining,

In the case of hotels I think "adjoining" always means connected,
either with or perhaps without a door, but yes.

> but if instead they're on opposite sides of the elevator
> they're adjacent but not adjoining.  Is that what you meant?  I'm not
> sure I'd call them "adjacent" in that case.

A better example is a small utility closet rather than an
elevator.  "Adjacent" usually implies "closeness" even if
it doesn't always mean touching, and two rooms with a bank
of four elevators between them would for most people not
be considered adjacent, I think.  In the case of hotel
rooms at least it's a matter of degree.

Another example is two rooms having the same latitude and
longitude, but on different (consecutive) floors.  I think most
people wouldn't call those rooms "adjacent".  However, if there
is a connecting stairway between them, a hotel might very well
offer them as "adjoining rooms".

> A footnote on "Adjacent string literals are concatenated" saying that
> two string literals are adjacent if they're adjoining or separated only
> by white-space characters would clear this up.  Moving "White-space
> characters separating tokens are no longer significant." from the
> beginning of phase 7 to the beginning of phase 6 would also be a good
> solution.
>
> But given the clear examples, I wouldn't object to leaving it as it is.

Given that the wording lasted more than 30 years without anyone
even noticing a problem, I think the case for leaving it alone
is decidedly stronger than the case for making a change.

#6269

From	James Kuyper <jameskuyper@alumni.caltech.edu>
Date	2021-07-11 11:41 -0700
Message-ID	<dab9e114-5156-4951-b464-799f231eaafen@googlegroups.com>
In reply to	#6264

On Saturday, July 10, 2021 at 11:49:09 AM UTC-4, Tim Rentsch wrote:
> James Kuyper <james...@alumni.caltech.edu> writes: 
> 
> > I learned a couple of decades ago that adjacent string literals get 
> > concatenated into a single longer literal, even if separated by 
> > arbitrarily large amounts of white-space. 
> > 
> > Yesterday I happened to notice that translation phase 6 says only that 
> > "Adjacent string literal tokens are concatenated.", without saying 
> > anything about white-space. White-space doesn't lose its significance
> > until translation phase 7. Therefore, string literals that are separated 
> > by white-space do not qualify as adjacent. There's also no mention of 
> > white-space in the fuller discussion that occurs in 6.4.5p5. 
> > 
> > Am I missing something obvious here? I can imagine someone telling me 
> > that "adjacent" should be understood as "adjacent, ignoring white-space" 
> > - but that doesn't seem obvious to me. It also sounds vaguely familiar, 
> > like I've had this discussion with someone before, but I can't locate 
> > the discussion. Every example of adjacent string literals that appears 
> > in the standard has at least one white-space character separating them, 
> > so the intent is crystal-clear, but the wording doesn't clearly say so. 
> > 
> > If the phrase "White-space characters separating tokens are no longer 
> > significant." were moved from the beginning of the description of phase 
> > 7 to the beginning of the description of phase 6, it would make the
> > insignificance of white space separating string literals perfectly
> > clear, and as far as I can see, would have no other effect.
> The word "adjacent" doesn't always mean touching. There is another
> word for that, the word "adjoining". Booking a hotel reservation 
> for adjacent rooms is not the same as a reservation for adjoining 
> rooms.

But, if it doesn't mean "touching", what does it mean? If a blank space
doesn't prevent them from being adjacent, what does? How do you
draw the line between things that do prevent two string literals from
being adjacent, and things that don't? And - most importantly, where
in the actual text of the standard does it clearly make that distinction?
I contend that it doesn't clearly make that distinction anywhere, but
that moving the sentence "White-space characters separating
tokens are no longer significant." from the beginning of phase 7 to
the beginning of phase 6 would remove all ambiguity, making the text
match the way all real world implementations actually handle this
issue, and would have no other effect. Do you disagree? If so, with
which part of what I just said, and for what reason?
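Two cases bear on where the line falls in practice. A minimal C11 sketch (macro and variable names are hypothetical): a comment between literals does not block concatenation, because phase 3 has already replaced it by one space, and a macro that expands to a literal does take part, because phase 4 runs before phase 6.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>

/* Hypothetical macro for this example. Macros are expanded in phase 4,
   before phase 6, so the literal it produces joins the concatenation. */
#define SURNAME "Kuyper"

/* In phase 3 the comment is replaced by one space character, so by
   phase 6 only white-space separates these three literals. */
static const char name[] = "James" /* a comment */ " " SURNAME;

/* One array: "James Kuyper" is 12 characters plus the null. */
static_assert(sizeof name == 13, "three literals became one");
```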
