| Path | csiph.com!goblin2!goblin1!goblin.stu.neva.ru!usenet.stanford.edu!not-for-mail |
|---|---|
| From | Dave <INVALID.NOREPLY@gnu.org> |
| Newsgroups | gnu.groff.bug |
| Subject | [bug #58930] take baby steps toward Unicode |
| Date | Mon, 10 Aug 2020 10:56:08 -0400 (EDT) |
| Lines | 93 |
| Approved | bug-groff@gnu.org |
| Message-ID | <mailman.1407.1597071369.2739.bug-groff@gnu.org> (permalink) |
| References | <20200810-095606.sv93119.42780@savannah.gnu.org> |
URL:
<https://savannah.gnu.org/bugs/?58930>
Summary: take baby steps toward Unicode
Project: GNU troff
Submitted by: barx
Submitted on: Mon 10 Aug 2020 09:56:06 AM CDT
Category: Core
Severity: 3 - Normal
Item Group: New feature
Status: None
Privacy: Public
Assigned to: None
Open/Closed: Open
Discussion Lock: Any
Planned Release: None
_______________________________________________________
Details:
One small change that would improve groff's Unicode support would be to
recognize Unicode versions of things groff already knows how to do.
Four examples:
==== U+00A0 NO-BREAK SPACE ====
This character is in the Latin-1 character set, which groff recognizes, and
when groff's input is in Latin-1 encoding, it correctly handles this character
(though I'm not certain whether it interprets it as "\~" or "\ ").
But if the input is some other encoding, preconv converts the character into
the string "\[u00A0]", which groff does _not_ recognize. In macro space, a
simple
.char \[u00A0] \~
is enough to take care of this; presumably the equivalent mechanism to make
the code handle it internally is just as simple.
==== U+200B ZERO WIDTH SPACE ====
This is another character whose behavior an existing groff escape (\:) already
provides, but which groff does not recognize as "\[u200B]".
In this case, the simple, obvious, elegant solution that worked above:
.char \[u200B] \:
stupidly, irritatingly, and undocumentedly doesn't work. (.char being unable
to map something to an escape, or at least to this particular escape, is
another bug--either in the implementation, or the lack of documentation of the
restriction--for another day.)
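
Until groff handles these characters itself, one workaround that sidesteps the .char limitation is to rewrite preconv's output textually before groff reads it. The following is only a minimal sketch in Python, not anything groff provides: the names UNICODE_TO_ESCAPE and map_entities are hypothetical, the choice of \| for U+202F is arbitrary, and the filter assumes that every "\[uXXXX]" it sees is a genuine preconv entity (it does not notice a preceding escaped backslash).

```python
import re

# Assumed mapping, built from the escapes named in this report.
# The choice of \| for U+202F is arbitrary; \^ would serve as well.
UNICODE_TO_ESCAPE = {
    0x00A0: r"\~",  # NO-BREAK SPACE -> unbreakable space
    0x200B: r"\:",  # ZERO WIDTH SPACE -> break point without a hyphen
    0x202F: r"\|",  # NARROW NO-BREAK SPACE -> thin space
}

# preconv writes non-ASCII input characters as \[uXXXX] with hex digits.
_ENTITY = re.compile(r"\\\[u([0-9A-Fa-f]{4,6})\]")


def map_entities(text):
    """Replace entities listed in UNICODE_TO_ESCAPE; leave the rest alone."""
    def substitute(match):
        codepoint = int(match.group(1), 16)
        # Unknown code points are returned unchanged for groff to handle.
        return UNICODE_TO_ESCAPE.get(codepoint, match.group(0))
    return _ENTITY.sub(substitute, text)
```

For example, map_entities(r"Mr.\[u00A0]Smith") returns the text "Mr.\~Smith", while an entity that is not in the table, such as "\[u00E9]", passes through untouched.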
==== U+202F NARROW NO-BREAK SPACE ====
Groff has two nonbreaking thin spaces, \| (one-sixth em) and \^ (one-twelfth
em). It is unclear which of these groff should map "\[u202F]" to, but either
one would be an improvement over the current behavior, which is to emit the
warning "can't find special character `u202F'".
==== U+2011 NON-BREAKING HYPHEN ====
I deem this change "extra credit" as it's the least likely to be easily
implementable, groff syntax having no direct correlate. Groff can only (via
\%) make an entire "word" (sequence of non-whitespace, including hyphens)
unbreakable, but has no easy way to support a mix of breaking and nonbreaking
hyphens in the same word, such as making the first hyphen of "jack-in-the-box"
nonbreaking but the other two breakable. (This can be done with a mix of \%
and \: escapes, as "\%jack-in-\:the-\:box" -- or even, taking advantage of the
bug/quirk Branden discovered
<http://lists.gnu.org/archive/html/groff/2020-07/msg00047.html>, as
"\%jack-in-\:the-box" -- but this is not obvious.) So it's possible, but
convoluted, to represent "jack\[u2011]in-the-box" in groff syntax; whether
this means it's equally convoluted in the underlying code, or whether the code
actually does have the concept of a nonbreaking hyphen but just doesn't expose
a direct representation of it to user space, I cannot guess.
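
The \% and \: encoding described above is mechanical enough to sketch as a text filter. The following Python is only an illustration of that rewriting, not a description of how groff works internally: the names encode_word and encode_text are hypothetical, and the sketch assumes plain text lines whose words contain no other escapes involving "-" (such as "\-").

```python
import re

NB_HYPHEN = r"\[u2011]"  # how preconv writes U+2011


def encode_word(word):
    """Rewrite one whitespace-free word that may contain U+2011 entities."""
    if NB_HYPHEN not in word:
        return word
    # Re-allow a break after every ordinary hyphen ...
    word = word.replace("-", "-\\:")
    # ... then turn each non-breaking hyphen into a plain hyphen ...
    word = word.replace(NB_HYPHEN, "-")
    # ... and suppress all other break points in the word with \%.
    return "\\%" + word


def encode_text(text):
    """Apply encode_word to every run of non-whitespace in text."""
    return re.sub(r"\S+", lambda m: encode_word(m.group(0)), text)
```

Given the word "jack\[u2011]in-the-box", encode_word produces "\%jack-in-\:the-\:box", the first of the two spellings above. Words without a U+2011 entity are returned unchanged, so their ordinary hyphens keep their default behavior.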