From: Dave
Newsgroups: gnu.groff.bug
Subject: [bug #58930] take baby steps toward Unicode
Date: Thu, 20 Aug 2020 01:23:18 -0400 (EDT)

Follow-up Comment #8, bug #58930 (project groff):

[comment #2 comment #2:]
> Unicode considers U+2009 THIN SPACE and U+200A HAIR SPACE breakable...
> Groff... does not offer breaking versions of these spaces, and the only
> reason to add them would be strict compliance with a Unicode property
> that probably no one who uses those code points actually wants

I believe my reasoning here was inaccurate.  Although Unicode _allows_
breaking at a thin space or hair space, it does not _require_ it,* so
groff declining to treat these as break points does not violate Unicode
compliance at all.

Thus I now propose that U+2009 THIN SPACE be mapped to groff's
(nonbreaking) \|, and U+200A HAIR SPACE to groff's (nonbreaking) \^.

* The gory details: Unicode line breaking is covered in "Unicode
Standard Annex #14: Unicode Line Breaking Algorithm"
(http://www.unicode.org/reports/tr14/tr14-45.html), whose introductory
section makes its scope clear: "Given an input text, [this algorithm]
produces a set of positions called 'break opportunities' that are
appropriate points to begin a new line.  The selection of actual line
break positions from the set of break opportunities is not covered by
the Unicode Line Breaking Algorithm, but is in the domain of higher
level software."  Groff declining to break at points that Unicode
specifies as "break opportunities" is perfectly in line with this.
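
For illustration only, the mapping proposed above could be sketched as a
small input filter.  This is a rough sketch in Python, not groff's actual
code and not how groff's own preprocessing is implemented; the names
SPACE_MAP and map_spaces are hypothetical.  It only substitutes the two
code points with the escapes named in the proposal and leaves all other
input untouched.

```python
# Hypothetical sketch of the proposed mapping; not taken from groff.
# Both target escapes are nonbreaking in groff, so no break point is
# introduced where the Unicode input had a thin or hair space.
SPACE_MAP = {
    "\u2009": "\\|",  # U+2009 THIN SPACE -> groff \|
    "\u200A": "\\^",  # U+200A HAIR SPACE -> groff \^
}

# Build the translation table once; str.maketrans accepts a dict whose
# keys are single characters and whose values are replacement strings.
_TABLE = str.maketrans(SPACE_MAP)


def map_spaces(text: str) -> str:
    """Replace thin and hair spaces with the proposed groff escapes."""
    return text.translate(_TABLE)


if __name__ == "__main__":
    # A thin space between a number and its unit, as a sample input.
    print(map_spaces("5\u2009kg"))
```

Because the filter never emits a breakable space, a formatter fed its
output simply never selects those positions as line breaks, which is the
choice that the annex quoted above leaves to higher-level software.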