Re: Adjacent string literals

From Tim Rentsch <tr.17687@z991.linuxsc.com>
Newsgroups comp.std.c
Subject Re: Adjacent string literals
Date 2021-07-22 15:26 -0700
Organization A noiseless patient Spider
Message-ID <86eebq56k8.fsf@linuxsc.com>
References <rumnae$4mr$1@dont-email.me> <86v95i88zw.fsf@linuxsc.com> <dab9e114-5156-4951-b464-799f231eaafen@googlegroups.com>


James Kuyper <jameskuyper@alumni.caltech.edu> writes:

> On Saturday, July 10, 2021 at 11:49:09 AM UTC-4, Tim Rentsch wrote:
>
>> James Kuyper <james...@alumni.caltech.edu> writes:
>>
>>> I learned a couple of decades ago that adjacent string literals get
>>> concatenated into a single longer literal, even if separated by
>>> arbitrarily large amounts of white-space.
>>>
>>> Yesterday I happened to notice that translation phase 6 says only that
>>> "Adjacent string literal tokens are concatenated.", without saying
>>> anything about white-space.  White-space doesn't lose its significance
>>> until translation phase 7.  Therefore, string literals that are separated
>>> by white-space do not qualify as adjacent.  There's also no mention of
>>> white-space in the fuller discussion that occurs in 6.4.5p5.
>>>
>>> Am I missing something obvious here?  I can imagine someone telling me
>>> that "adjacent" should be understood as "adjacent, ignoring white-space"
>>> - but that doesn't seem obvious to me.  It also sounds vaguely familiar,
>>> like I've had this discussion with someone before, but I can't locate
>>> the discussion.  Every example of adjacent string literals that appears
>>> in the standard has at least one white-space character separating them,
>>> so the intent is crystal-clear, but the wording doesn't clearly say so.
>>>
>>> If the phrase "White-space characters separating tokens are no longer
>>> significant." were moved from the beginning of the description of phase
>>> 7 to the beginning of the description of phase 6, it would make the
>>> insignificance of white space separating string literals perfectly
>>> clear, and as far as I can see, would have no other effect.
>>
>> The word "adjacent" doesn't always mean touching.  There is another
>> word for that, the word "adjoining".  Booking a hotel reservation
>> for adjacent rooms is not the same as a reservation for adjoining
>> rooms.
>
> But, if it doesn't mean "touching", what does it mean?

In hotels, normally it means on the same floor and with no
intervening rooms or other major building structures (but small
things like utility closets don't count).  In a country inn where
there are standalone cottages rather than rooms, two cottages
would normally be called adjacent if there were no other cottages
in between, and the cottages in question were not inordinately far
apart.

In the C standard it means having no intervening tokens.
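
As a minimal sketch of that reading (the names greeting and
literals_were_joined are made up for illustration), the three string
literal tokens below are separated only by spaces, a newline, and a
comment.  None of those is a token, so an implementation following
the usual reading treats the literals as adjacent and joins them:

```c
#include <assert.h>
#include <string.h>

/* Three string literal tokens separated only by white space, a newline,
   and a comment (translation phase 3 replaces each comment with one
   space character).  No other token comes between them. */
static const char greeting[] = "adjacent"   " string "
    /* a comment is not a token */
    "literals";

/* Returns 1 if the separated literals were joined into a single string
   with the same contents and size as the equivalent single literal. */
static int literals_were_joined(void)
{
    return strcmp(greeting, "adjacent string literals") == 0
        && sizeof greeting == sizeof "adjacent string literals";
}
```

Calling literals_were_joined() is expected to return 1 on
implementations that behave the way this thread describes.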

> If a blank space
> doesn't prevent them from being adjacent, what does?

Another token (not a string literal token, presumably, but only
because we might consider a sequence of string literal tokens
to be "adjacent tokens").
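
A small sketch of the contrast, with hypothetical names (NAME,
joined, separate) chosen only for illustration: a comma token between
two string literals keeps them apart, while a macro that expands to a
string literal in phase 4 leaves nothing but string literal tokens
side by side when phase 6 runs:

```c
#include <assert.h>
#include <string.h>

#define NAME "world"   /* replaced by a string literal token in phase 4 */

/* No token between the two literals: one array element after phase 6. */
static const char *const joined[] = { "hello, " NAME };

/* A comma token intervenes: the literals remain two separate elements. */
static const char *const separate[] = { "hello, ", NAME };

/* Element counts, computed from the array sizes. */
static int count_joined(void)
{
    return (int)(sizeof joined / sizeof joined[0]);
}

static int count_separate(void)
{
    return (int)(sizeof separate / sizeof separate[0]);
}
```

Here count_joined() should give 1 and count_separate() should give 2,
which is the token-based line described above.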

> How do you
> draw the line between things that do prevent two string literals from
> being adjacent, and things that don't?

In the text of the C standard, the word "adjacent" is an adjective
modifying the noun "tokens", and hence tokens are what matters.
The line is drawn by normal English usage.

> And - most importantly, where in the actual text of the standard
> does it clearly make that distinction?

That depends in part on one's notion of what it means "to clearly
make" a distinction.  Speaking for myself, the combination of
"adjacent" modifying "tokens" and the examples given in 6.4.5 make
the distinction quite clearly enough.

> I contend that it doesn't clearly make that distinction anywhere,

If I may make a suggestion, how you read the C standard doesn't
match the reading mode expected by its authors.  The C standard
wasn't written for a target audience of lawyers or mathematicians;
it was written by practical software developers who expected it
to be read by other practical software developers.  The issue
suggested here is
way below their radar, and indeed way below the radar of most
people who read the C standard.  If no one else has noticed it in
more than 30 years, what does that say about how clear or unclear
the distinction is?

> but
> that moving the sentence "White-space characters separating
> tokens are no longer significant." from the beginning of phase 7 to
> the beginning of phase 6 would remove all ambiguity, making the text
> match the way all real world implementations actually handle this
> issue, and would have no other effect.  Do you disagree?

I neither agree nor disagree, because I think the extremely
low probability of anyone being confused makes it not worth the
effort of investigating the question.

> If so, with which part of what I just said, and for what reason?

If there is something I disagree with, I think it's the idea that
attempting to "clarify" the language here will necessarily result
in a net benefit.  Consider for example the C++ standard:  its
authors apparently strive for exact and precise (and presumably
ambiguity free) phrasing, but the result is an unreadable mess.
To me it seems obvious that the writing in the C standard is much
closer to a good balance point between being formally exact and
being understandable.  From my point of view, if writing in the C
standard (or other similar standards) isn't understandable, it's
useless, no matter how precise or exact it is.  In this particular
case I would say the current wording is definitely on the right
side of the line.



