Path: csiph.com!3.eu.feeder.erje.net!feeder.erje.net!newsfeed.xs4all.nl!newsfeed8.news.xs4all.nl!goblin1!goblin.stu.neva.ru!usenet.stanford.edu!not-for-mail
From: Dave <INVALID.NOREPLY@gnu.org>
Newsgroups: gnu.groff.bug
Subject: [bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented
Date: Sat, 15 Aug 2020 13:25:32 -0400 (EDT)
Lines: 100
Approved: bug-groff@gnu.org
Message-ID: <mailman.2225.1597512334.2739.bug-groff@gnu.org>
References: <20200815-122530.sv93119.71147@savannah.gnu.org>
NNTP-Posting-Host: lists.gnu.org
Mime-Version: 1.0
Content-Type: text/plain;charset=UTF-8
X-Trace: usenet.stanford.edu 1597512334 7083 209.51.188.17 (15 Aug 2020 17:25:34 GMT)
X-Complaints-To: action@cs.stanford.edu
To: Dave <saint.snit@gmail.com>, bug-groff@gnu.org
Envelope-to: bug-groff@gnu.org
X-PHP-Originating-Script: 1001:sendmail.php
X-Savane-Server: savannah.gnu.org:443 [2001:470:142::72]
X-Savane-Project: groff
X-Savane-Tracker: bugs
X-Savane-Item-ID: 58962
User-Agent: Mozilla/5.0 (X11; Linux i686; rv:45.0) Gecko/20100101 Firefox/45.0
X-Apparently-From: 2605:a601:ab42:5b00:d79a:70a3:b6a4:34bf (Savane authenticated user barx)
In-Reply-To: 
X-BeenThere: bug-groff@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Bug reports for the GNU version of nroff, troff et al" <bug-groff.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-groff>, <mailto:bug-groff-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/bug-groff>
List-Post: <mailto:bug-groff@gnu.org>
List-Help: <mailto:bug-groff-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-groff>, <mailto:bug-groff-request@gnu.org?subject=subscribe>
X-Mailman-Original-Message-ID: <20200815-122530.sv93119.71147@savannah.gnu.org>
Xref: csiph.com gnu.groff.bug:1978

URL:
  <https://savannah.gnu.org/bugs/?58962>

                 Summary: Latin-1 NO-BREAK SPACE does not behave as documented
                 Project: GNU troff
            Submitted by: barx
            Submitted on: Sat 15 Aug 2020 12:25:30 PM CDT
                Category: None
                Severity: 3 - Normal
              Item Group: Incorrect behaviour
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None

    _______________________________________________________

Details:

(Another bug report spawned from the discovery process of bug #58930.)

Quoth groff_char(7): "the ISO latin1 _no-break space_ is mapped to `\~', the
stretchable space character."

An eminently sensible mapping.  Oh, if only it were so.

In fact, the Latin-1 no-break space (character 160 decimal, A0 hex):

* behaves the same as "\ ", the nonstretchable nonbreaking space character
* matches neither "\ " nor "\~" in an output-equivalency conditional

Examining these in detail:

=== Behavior ===

Consider an input file with one instance of the string "<>", representing a
nonbreaking space.  sed can convert this string to each type of nonbreaking
space under consideration (the two escapes and the raw Latin-1 character),
and the typeset results can then be compared by checking which ones produce
identical PostScript output.


$ cat t0
Lorem ipsum dolor sit amet, consectetur<>adipiscing elit, sed
do eiusmod tempor incididunt ut labore et dolore magna aliqua.
$ # Baseline test, for escapes expected to be different:
$ diff <(sed 's/<>/\\ /' t0 | groff) <(sed 's/<>/\\~/' t0 | groff) | wc
      8      68     403
$ # Output expected to be the same based on what the docs say:
$ diff <(sed 's/<>/\\~/' t0 | groff) <(sed 's/<>/\xA0/' t0 | groff) | wc
      8      68     403
$ # Output that turns out to be the same:
$ diff <(sed 's/<>/\\ /' t0 | groff) <(sed 's/<>/\xA0/' t0 | groff) | wc
      0       0       0
$ 
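
The transcript above relies on sed turning \xA0 into a single raw byte
0xA0.  GNU sed does this, but \x escapes in the replacement are an extension,
so it may be worth confirming what actually reaches groff.  A minimal sketch,
assuming GNU sed and a POSIX od; show_bytes is a hypothetical helper name,
not part of groff or sed:

```shell
# Hypothetical helper: print, in hex, the bytes sed emits for a replacement.
# $1 is the replacement half of the s/<>/.../ command used in the transcript.
show_bytes() {
    printf 'a<>b\n' |
        LC_ALL=C sed "s/<>/$1/" |     # same substitution as in the transcript
        od -An -tx1 |                 # dump each byte as two hex digits
        tr -s ' \n' ' ' |             # squeeze od's padding into single spaces
        sed 's/^ //; s/ $//'          # trim leading and trailing space
}

show_bytes '\\ '     # the "\ " escape: backslash, space
show_bytes '\\~'     # the "\~" escape: backslash, tilde
show_bytes '\xA0'    # raw Latin-1 no-break space (GNU sed extension)
```

If the last call reports a lone a0 between 61 and 62, the input is what the
comparison assumes, and any difference in output comes from groff itself.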


I'm filing this as "Incorrect behavio[u]r" rather than "Documentation" because
I believe the documented behavior is more sensible than the actual behavior. 
But that's a judgment call and open to debate.

=== Equivalency conditional ===

Either way, if Latin-1 A0 behaves the same as one of "\ " or "\~", the
output-equivalency conditional operator (rendered as 'XXX'YYY' in the info
manual, though a host of characters besides single quotes can be used) ought
to recognize this.  But this operator claims the output of character A0 is
equivalent to neither one (first observed in comment #2 of the aforementioned
bug <http://savannah.gnu.org/bugs/?58930#comment2>).


$ printf ".if '\xA0'\~' .tm equal\n" | groff
$ printf ".if '\xA0'\ ' .tm equal\n" | groff
$ 


(Granted, the documentation muddies what this operator is actually testing. 
The info manual is clear about 'XXX'YYY', saying this is "True if the output
produced by XXX is equal to the output produced by YYY."  But groff(7) is less
clear, saying that the test 's1's2' is "True if string s1 is identical to
string s2," which implies it's comparing _input_ strings.  Were that the case,
you'd expect both the above tests to be false... but you'd also expect
'\[em]'\[u2014]' to be false, which it isn't.)
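
The two conditional tests above depend on the shell's printf accepting \xA0
and passing "\~" and "\ " through untouched, neither of which POSIX
guarantees.  A more portable sketch of the same tests, using an octal escape
for the byte and %s for the groff escape; nbsp_if is a hypothetical helper
name, and the groff calls are skipped if groff is not installed:

```shell
# Hypothetical helper: emit one .if request comparing byte 0xA0 with the
# groff escape given in $1.
nbsp_if() {
    # \240 is octal for 0xA0; %s inserts $1 without printf interpreting it.
    printf ".if '\\240'%s' .tm equal\n" "$1"
}

# .tm writes to standard error, so discard only the formatted output.
if command -v groff >/dev/null 2>&1; then
    nbsp_if '\~' | groff >/dev/null
    nbsp_if '\ ' | groff >/dev/null
fi
```

Piping nbsp_if through od -c first is a quick way to check that the request
groff receives contains exactly the intended bytes.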




    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58962>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/