| Path | csiph.com!goblin2!goblin-spool!goblin1!goblin.stu.neva.ru!usenet.stanford.edu!not-for-mail |
|---|---|
| From | "G. Branden Robinson" <INVALID.NOREPLY@gnu.org> |
| Newsgroups | gnu.groff.bug |
| Subject | [bug #58930] take baby steps toward Unicode |
| Date | Fri, 14 Aug 2020 06:00:02 -0400 (EDT) |
| Lines | 104 |
| Approved | bug-groff@gnu.org |
| Message-ID | <mailman.2034.1597399205.2739.bug-groff@gnu.org> (permalink) |
| References | <20200810-095606.sv93119.42780@savannah.gnu.org> <20200814-100002.sv108747.62919@savannah.gnu.org> |
| NNTP-Posting-Host | lists.gnu.org |
| Mime-Version | 1.0 |
| Content-Type | text/plain;charset=UTF-8 |
| X-Trace | usenet.stanford.edu 1597399205 6334 209.51.188.17 (14 Aug 2020 10:00:05 GMT) |
| X-Complaints-To | action@cs.stanford.edu |
| To | "G. Branden Robinson" <g.branden.robinson@gmail.com>, Dave <saint.snit@gmail.com>, bug-groff@gnu.org |
| Envelope-to | bug-groff@gnu.org |
| X-PHP-Originating-Script | 1001:sendmail.php |
| X-Savane-Server | savannah.gnu.org:443 [209.51.188.72] |
| X-Savane-Project | groff |
| X-Savane-Tracker | bugs |
| X-Savane-Item-ID | 58930 |
| User-Agent | Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Firefox/68.0 |
| X-Apparently-From | 1.129.111.45 (Savane authenticated user gbranden) |
| In-Reply-To | <20200810-095606.sv93119.42780@savannah.gnu.org> |
| X-BeenThere | bug-groff@gnu.org |
| X-Mailman-Version | 2.1.23 |
| Precedence | list |
| List-Id | "Bug reports for the GNU version of nroff, troff et al" <bug-groff.gnu.org> |
| List-Unsubscribe | <https://lists.gnu.org/mailman/options/bug-groff>, <mailto:bug-groff-request@gnu.org?subject=unsubscribe> |
| List-Archive | <https://lists.gnu.org/archive/html/bug-groff> |
| List-Post | <mailto:bug-groff@gnu.org> |
| List-Help | <mailto:bug-groff-request@gnu.org?subject=help> |
| List-Subscribe | <https://lists.gnu.org/mailman/listinfo/bug-groff>, <mailto:bug-groff-request@gnu.org?subject=subscribe> |
| X-Mailman-Original-Message-ID | <20200814-100002.sv108747.62919@savannah.gnu.org> |
| X-Mailman-Original-References | <20200810-095606.sv93119.42780@savannah.gnu.org> |
| Xref | csiph.com gnu.groff.bug:1968 |
Update of bug #58930 (project groff):
Status: None => Need Info
Assigned to: None => gbranden
_______________________________________________________
Follow-up Comment #1:
It's a little demoralizing that even these baby steps seem fraught with
complication.
1. "U+00A0 NO-BREAK SPACE
This character is in the Latin-1 character set, which groff recognizes, and
when groff's input is in Latin-1 encoding, it correctly handles this character
(though I'm not certain whether it interprets it as "\~" or "\ ")."
None of the above, it seems:
$ cat EXPERIMENTS/spaces.groff
.pl 1v
.if '\ '\ ' \eSP = \eSP
.if '\ '\~' \eSP = \e\[ti]
.if '\ '\[u00A0]' \eSP = \e[u00A0]
.br
.if '\~'\ ' \e\[ti] = \eSP
.if '\~'\~' \e\[ti] = \e\[ti]
.if '\~'\[u00A0]' \e\[ti] = \e[u00A0]
.br
.if '\[u00A0]'\ ' \e[u00A0] = \eSP
.if '\[u00A0]'\~' \e[u00A0] = \e\[ti]
.if '\[u00A0]'\[u00A0]' \e[u00A0] = \e[u00A0]
$ ./build/test-groff -Tutf8
\SP = \SP
\~ = \~
\[u00A0] = \[u00A0]
None of these are equivalent to the others. :-/
2. The behavior of \: when used as the RHS of a .char request does indeed seem
a bit strange. It looks like the transform is just not happening:
.pl 1v
.char \[u200B] \:
.ds a \[u200B]
.length i \*a
\ni
8
.pl 1v
.ds a \[u200B]
.length i \*a
\ni
8
.pl 1v
.char a b
.ds a a
\*a
b
That unchanged length of 8, the exact character count of "\[u200B]", is highly
suspicious to me.
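As a quick sanity check on that count, here is a small Python sketch. Python is
not part of this report and the variable names are made up for illustration. It
only counts the characters of the escape as typed in the examples above and
says nothing about how groff computes .length internally.

```python
# The escape exactly as typed in the roff source: backslash, opening
# bracket, 'u', four hex digits, closing bracket.
escape_source = r"\[u200B]"

# The single code point that the escape names (U+200B ZERO WIDTH SPACE).
code_point = "\u200b"

source_length = len(escape_source)
code_point_length = len(code_point)

print(source_length)      # 8
print(code_point_length)  # 1
```

So the reported 8 matches the unexpanded spelling of the escape.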
3. Narrow no-break space. Have you named all of the non-breaking spaces in
Unicode in this ticket? I know there are a bunch of other spaces (hair space,
thin space, ideographic space, ...) but I don't know what their breaking
semantics are in Unicode.
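One rough way to probe this, sketched in Python under some assumptions: the
standard unicodedata module exposes each character's decomposition field from
UnicodeData.txt, and the no-break variants of other characters carry a
"<noBreak>" tag there. That tag is only a hint. The normative line-breaking
classes live in UAX #14, which the Python standard library does not expose.
The candidate list and the function name below are hypothetical, chosen for
illustration, and the list is not claimed to be complete.

```python
import unicodedata

# Hypothetical candidate list: the spaces mentioned in this ticket plus
# figure space. Not an exhaustive list of Unicode spaces.
CANDIDATES = [0x00A0, 0x2007, 0x2009, 0x200A, 0x202F, 0x3000]


def no_break_code_points(candidates):
    """Return the candidates whose decomposition is tagged <noBreak>."""
    return [
        cp
        for cp in candidates
        if unicodedata.decomposition(chr(cp)).startswith("<noBreak>")
    ]


# Print each candidate with its name and raw decomposition field.
for cp in CANDIDATES:
    ch = chr(cp)
    tag = unicodedata.decomposition(ch) or "(none)"
    print(f"U+{cp:04X} {unicodedata.name(ch)}: {tag}")

print([f"U+{cp:04X}" for cp in no_break_code_points(CANDIDATES)])
```

Characters the tag does not flag may still have restricted breaking behavior
under UAX #14, so this narrows the question rather than settling it.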
4. A non-breaking hyphen would then be something that looks like \[hy] but
doesn't actually break? I don't know that this is actually the hardest of the
tasks on this list. You can just use the character as-is in input. groff
doesn't know it's a hyphen, and no hyphenation patterns include it, so it
never gets a break after it.
$ cat EXPERIMENTS/non-breaking-hyphen.groff
.pl 1v
.ds a a\[u2011]
.nr b 50 -1
.while \n+b \*a\c
troff: warning [p 1, 0.0i]: can't break line
a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑a‑
Let me know what you think of these findings.
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?58930>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/