| Subject | Re: Printing UTF8 (Unicode) |
|---|---|
| Newsgroups | comp.lang.postscript |
| References | <4fe53d50-e66a-82b8-48fd-d0928e149698@davidnewall.com> <61ecbe1b$1@news.ausics.net> <20220123133511.00003a2f@cvkm.cz> |
| From | David Newall <davidn@davidnewall.com> |
| Date | 2022-01-26 14:59 +1100 |
| Message-ID | <61f0c70f$1@news.ausics.net> |
| Organization | Ausics - https://www.ausics.net |
Hi Carlos,

Thanks for your very useful feedback.

I will say, up front, that using the Adobe Glyph List (glyphlist.txt, found at https://github.com/adobe-type-tools/agl-aglfn) is often sufficient, depending on which Unicode values need to be painted and which font is to be used. But I want to do better than "often".

I'm using https://antofthy.gitlab.io/info/data/utf8-demo.txt to test my code. Its coverage is ... extensive (and my current code seems to work for all of it -- font notwithstanding).

On 23/1/22 11:35 pm, Carlos wrote:
> Adobe's table (or one similar to it) is included in Ghostscript
> (AdobeGlyphList), and maybe other interpreters, too.

I didn't know about AdobeGlyphList. The one in Ghostscript (9.50) has multiple names for some Unicode values. Conversely, the Adobe Glyph List (glyphlist.txt, found at https://github.com/adobe-type-tools/agl-aglfn) contains multiple values for some names. No font is guaranteed to use any of these names, and many fonts that I've examined use different names for Unicode values (and different values for some names).

> If you know you are dealing with modern fonts that include the uni/u
> aliases, you can get rid of the Adobe table lookup altogether... You
> don't need the canonical glyph names for those fonts.

No font that I've examined includes uni/u names for every glyph, or even for most glyphs. One can't rely on any predetermined glyph name, nor on any predetermined lookup table. What a mess.

On 23/1/22 11:56 pm, Carlos wrote:
> I think if a font has a mapping between unicode points and glyphs that
> you can extract (with Fontforge or whatever), then it surely also has
> uni/u aliases. The Adobe table is for older fonts that don't have them,
> so it's the only lookup table you need.

I wish that were true, but it's not. After your comment about older fonts, I examined Courier, a Type 1 font (https://web.archive.org/web/20010617080950/http://www.ctan.org/tex-archive/fonts/psfonts/courier/). Its CharStrings dictionary breaks my assumptions and my code completely fails.

>> I'm not delighted by needing to add a dictionary that's specific to
>> the current font to utfshow and unicodeshow because it feels wrong.
>
> Also, having to pre-process the files to insert the tables is not good.

I completely agree. I don't like it. I want to be able to use any font without preprocessing, but I can't see how.

> a possible optimization could be not to call
> unicodeshow on each codepoint, but identify string intervals where all
> bytes are either <= 127 or > 127. Call show on the former, and utfshow
> on the latter.

Agreed. ps2pdf slows down dramatically with a large number of glyphshow calls. https://antofthy.gitlab.io/info/data/utf8-demo.txt, which is 50K, takes 4 minutes to process using utf8show and ps2pdf. The utf8-decode phase takes 20ms and Ghostscript takes 510ms.

For anyone interested, see https://davidnewall/software/utf8show. It's still a work in progress.

David
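
A minimal sketch of the interval idea quoted above, written in Python purely to illustrate the logic; it is not the utf8show code, which is PostScript and is not reproduced in this post. The function names `split_runs`, `fallback_glyph_name` and `plan` are hypothetical. The sketch assumes the input is valid UTF-8, and it falls back to the uni/u naming convention from the Adobe Glyph List specification, which, as noted above, many real fonts do not honour, so the names it produces are only candidates to look up in a font.

```python
from itertools import groupby


def split_runs(data: bytes):
    """Yield (is_ascii, chunk) for maximal runs of bytes <= 127 or > 127."""
    for is_ascii, group in groupby(data, key=lambda b: b <= 127):
        yield is_ascii, bytes(group)


def fallback_glyph_name(codepoint: int) -> str:
    """Candidate glyph name using the AGL uni/u convention.

    Assumption: the font actually defines this name; many do not.
    """
    if codepoint <= 0xFFFF:
        return "uni%04X" % codepoint  # four uppercase hex digits
    return "u%X" % codepoint          # five or six uppercase hex digits


def plan(data: bytes):
    """Turn a UTF-8 byte string into a list of drawing steps.

    ASCII runs become one ('show', text) step, so they cost a single
    call; non-ASCII runs become ('glyphs', [names]) steps, one name per
    code point.
    """
    steps = []
    for is_ascii, chunk in split_runs(data):
        if is_ascii:
            steps.append(("show", chunk.decode("ascii")))
        else:
            # In valid UTF-8 every byte of a multi-byte sequence is > 127,
            # so a non-ASCII run always holds whole sequences.
            names = [fallback_glyph_name(ord(c)) for c in chunk.decode("utf-8")]
            steps.append(("glyphs", names))
    return steps


if __name__ == "__main__":
    for step in plan("Price: 5€ ok".encode("utf-8")):
        print(step)
```

For mostly-ASCII text this reduces the number of per-glyph calls to the number of non-ASCII code points, which is where the saving would come from; for text that is almost entirely non-ASCII, such as much of utf8-demo.txt, it would help little.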