From: Dave
Newsgroups: gnu.groff.bug
Subject: [bug #58930] take baby steps toward Unicode
Date: Mon, 10 Aug 2020 10:56:08 -0400 (EDT)

Summary: take baby steps toward Unicode
Project: GNU troff
Submitted by: barx
Submitted on: Mon 10 Aug 2020 09:56:06 AM CDT
Category: Core
Severity: 3 - Normal
Item Group: New feature
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None

_______________________________________________________

Details:

One small change that would improve groff's Unicode support would be to recognize Unicode versions of things groff already knows how to do.
Four examples:

==== U+00A0 NO-BREAK SPACE ====

This character is in the Latin-1 character set, which groff recognizes, and when groff's input is in Latin-1 encoding, groff handles this character correctly (though I'm not certain whether it interprets it as "\~" or "\ "). But if the input is in some other encoding, preconv converts the character into the string "\[u00A0]", which groff does _not_ recognize.

In macro space, a simple

.char \[u00A0] \~

is enough to take care of this; presumably the equivalent mechanism to make the code handle it internally is just as simple.

==== U+200B ZERO WIDTH SPACE ====

This is another character that groff already implements as an escape (\:) but does not recognize as "\[u200B]". In this case, the simple, obvious, elegant solution that worked above:

.char \[u200B] \:

stupidly, irritatingly, and undocumentedly doesn't work. (.char being unable to map something to an escape, or at least to this particular escape, is another bug, either in the implementation or in the missing documentation of the restriction, for another day.)

==== U+202F NARROW NO-BREAK SPACE ====

Groff has two nonbreaking thin spaces, \| and \^. It is perhaps unclear which of these groff should map "\[u202F]" to, but either one would be an improvement over the current behavior, which is the warning "can't find special character `u202F'".

==== U+2011 NON-BREAKING HYPHEN ====

I deem this change "extra credit" because it's the least likely to be easily implementable, groff syntax having no direct correlate. Groff can only (via \%) make an entire "word" (a sequence of non-whitespace characters, including hyphens) unbreakable, but has no easy way to support a mix of breaking and nonbreaking hyphens in the same word, such as making the first hyphen of "jack-in-the-box" nonbreaking but the other two breakable. (This can be done with a mix of \% and \: escapes, as "\%jack-in-\:the-\:box", or even, taking advantage of the bug/quirk Branden discovered, as "\%jack-in-\:the-box"; but this is not obvious.)
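Because the \% plus \: workaround is mechanical, it and the three one-to-one mappings above could be prototyped outside troff as a text filter placed after preconv. This is only a sketch under stated assumptions, not anything groff provides: the names SIMPLE_MAP, rewrite_word and rewrite_line are hypothetical, mapping U+202F to \| rather than \^ is an arbitrary choice, and the U+2011 case simply reproduces the "\%jack-in-\:the-\:box" form described above.

```python
import re
import sys

# Hypothetical table: preconv-style escapes -> existing groff escapes.
# U+202F could equally map to \^; choosing \| here is arbitrary.
SIMPLE_MAP = {
    r"\[u00A0]": r"\~",  # NO-BREAK SPACE -> unbreakable space
    r"\[u200B]": r"\:",  # ZERO WIDTH SPACE -> zero-width break point
    r"\[u202F]": r"\|",  # NARROW NO-BREAK SPACE -> thin unbreakable space
}
NB_HYPHEN = r"\[u2011]"  # NON-BREAKING HYPHEN as written by preconv

# An ordinary "-" that is not part of "\-" (groff's minus sign).
PLAIN_HYPHEN = re.compile(r"(?<!\\)-")


def rewrite_word(word: str) -> str:
    # Emulate U+2011 inside one whitespace-delimited word:
    # a leading \% suppresses breaks at every hyphen in the word, and
    # \: after each ordinary hyphen puts a break point back there.
    if NB_HYPHEN not in word:
        return word
    pieces = [PLAIN_HYPHEN.sub(r"-\\:", p) for p in word.split(NB_HYPHEN)]
    return r"\%" + "-".join(pieces)


def rewrite_line(line: str) -> str:
    # Control lines (starting with . or ') are passed through untouched so
    # requests like ".char \[u00A0] \~" are not mangled; as a consequence,
    # escapes inside macro arguments are not rewritten either.
    if line.startswith((".", "'")):
        return line
    line = re.sub(r"\S+", lambda m: rewrite_word(m.group()), line)
    for uni, esc in SIMPLE_MAP.items():
        line = line.replace(uni, esc)
    return line


if __name__ == "__main__":
    # Filter stdin to stdout, e.g. between preconv and troff.
    for text in sys.stdin:
        sys.stdout.write(rewrite_line(text))
```

For example, rewrite_line(r"a jack\[u2011]in-the-box toy") returns a \%jack-in-\:the-\:box toy. Doing the substitution textually sidesteps the .char limitation with \:, at the cost of working only on text lines.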
So it's possible, but convoluted, to represent "jack\[u2011]in-the-box" in groff syntax; whether this means it's equally convoluted in the underlying code, or whether the code actually does have the concept of a nonbreaking hyphen but just doesn't expose a direct representation of it to user space, I cannot guess.

_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/