From: Dave
Newsgroups: gnu.groff.bug
Subject: [bug #58962] Latin-1 NO-BREAK SPACE does not behave as documented
Date: Sat, 15 Aug 2020 13:25:32 -0400 (EDT)

URL:

                 Summary: Latin-1 NO-BREAK SPACE does not behave as documented
                 Project: GNU troff
            Submitted by: barx
            Submitted on: Sat 15 Aug 2020 12:25:30 PM CDT
                Category: None
                Severity: 3 - Normal
              Item Group: Incorrect behaviour
                  Status: None
                 Privacy: Public
             Assigned to: None
             Open/Closed: Open
         Discussion Lock: Any
         Planned Release: None

    _______________________________________________________

Details:

(Another bug report spawned from the discovery process of bug #58930.)

Quoth groff_char(7): "the ISO latin1 _no-break space_ is mapped to `\~', the stretchable space character."

An eminently sensible mapping.
Oh, if only it were so. In fact, the Latin-1 no-break space (character 160 decimal, A0 hex):

* behaves the same as "\ ", the nonstretchable nonbreaking space character
* matches neither "\ " nor "\~" in an output-equivalency conditional

Examining these in detail:

=== Behavior ===

Consider an input file with one instance of the string "<>", representing a nonbreaking space. sed can convert this string to each type of nonbreaking space under consideration (the two escapes and the raw Latin-1 character), and the typeset results can then be compared by checking which ones produce identical PostScript output.

$ cat t0
Lorem ipsum dolor sit amet, consectetur<>adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.
$ # Baseline test, for escapes expected to be different:
$ diff <(sed 's/<>/\\ /' t0 | groff) <(sed 's/<>/\\~/' t0 | groff) | wc
8 68 403
$ # Output expected to be the same based on what the docs say:
$ diff <(sed 's/<>/\\~/' t0 | groff) <(sed 's/<>/\xA0/' t0 | groff) | wc
8 68 403
$ # Output that turns out to be the same:
$ diff <(sed 's/<>/\\ /' t0 | groff) <(sed 's/<>/\xA0/' t0 | groff) | wc
0 0 0
$

I'm filing this as "Incorrect behavio[u]r" rather than "Documentation" because I believe the documented behavior is more sensible than the actual behavior. But that's a judgment call and open to debate.

=== Equivalency conditional ===

Either way, if Latin-1 A0 behaves the same as one of "\ " or "\~", the output-equivalency conditional operator (rendered as 'XXX'YYY' in the info manual, though a host of characters besides single quotes can be used) ought to recognize this. But this operator claims that the output of character A0 is equivalent to neither one (first observed in comment #2 of the aforementioned bug).

$ printf ".if '\xA0'\~' .tm equal\n" | groff
$ printf ".if '\xA0'\ ' .tm equal\n" | groff
$

(Granted, the documentation muddies what this operator is actually testing.
The info manual is clear about 'XXX'YYY', saying this is "True if the output produced by XXX is equal to the output produced by YYY." But groff(7) is less clear, saying that the test 's1's2' is "True if string s1 is identical to string s2," which implies it's comparing _input_ strings. Were that the case, you'd expect both the above tests to be false... but you'd also expect '\[em]'\[u2014]' to be false, which it isn't.)
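The three pairwise comparisons in the Behavior section can be bundled into one script. The following is a minimal sketch rather than anything shipped with groff: it assumes GNU sed (for the \xA0 escape the transcript already uses), mktemp, and groff on PATH, and the function names repl, render_with, compare_variants and report are hypothetical. It swaps "diff | wc" for "cmp -s", renders each variant to a temporary file, and deletes the %%CreationDate: line first, on the assumption that the PostScript driver stamps its output with the time of the run, which could make two otherwise identical renderings differ.

```sh
#!/bin/sh
# Hypothetical helper, not part of groff. Assumes GNU sed (for the \xA0
# escape the transcript relies on), mktemp, and groff with its default
# PostScript output on PATH.

# Print the sed replacement text for a variant name: '\\ ' and '\\~'
# yield the troff escapes "\ " and "\~", '\xA0' yields the raw byte A0.
repl() {
  case $1 in
    space) printf '%s' '\\ ' ;;
    tilde) printf '%s' '\\~' ;;
    a0)    printf '%s' '\xA0' ;;
  esac
}

# Typeset file $1 with "<>" replaced by variant $2. The %%CreationDate:
# comment is dropped (assumed to hold the run time). LC_ALL=C makes sed
# treat the input as raw bytes.
render_with() {
  LC_ALL=C sed "s/<>/$(repl "$2")/" "$1" | groff |
    LC_ALL=C sed '/^%%CreationDate:/d'
}

# Print whether file $1 typesets identically with variants $2 and $3.
compare_variants() {
  tmp_a=$(mktemp) || return 1
  tmp_b=$(mktemp) || { rm -f "$tmp_a"; return 1; }
  render_with "$1" "$2" > "$tmp_a"
  render_with "$1" "$3" > "$tmp_b"
  if cmp -s "$tmp_a" "$tmp_b"; then
    result=same
  else
    result=different
  fi
  rm -f "$tmp_a" "$tmp_b"
  echo "$2 vs $3: $result"
}

# The three comparisons from the transcript, in the same order.
report() {
  compare_variants "$1" space tilde
  compare_variants "$1" tilde a0
  compare_variants "$1" space a0
}

# Run only when given a file name, so the functions can also be sourced.
if [ $# -gt 0 ]; then
  report "$1"
fi
```

Invoked as "sh nbsp-check.sh t0" (the script name is made up), it prints one same/different line per pair; under the behavior reported above, only the "space vs a0" pair would come out "same".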