Re: Printing UTF8 (Unicode)

Subject Re: Printing UTF8 (Unicode)
Newsgroups comp.lang.postscript
References <4fe53d50-e66a-82b8-48fd-d0928e149698@davidnewall.com> <61ecbe1b$1@news.ausics.net> <20220123133511.00003a2f@cvkm.cz>
From David Newall <davidn@davidnewall.com>
Date 2022-01-26 14:59 +1100
Message-ID <61f0c70f$1@news.ausics.net> (permalink)
Organization Ausics - https://www.ausics.net

Hi Carlos,

Thanks for your very useful feedback.

I will say, up-front, that using Adobe Glyph List (glyphlist.txt found
at https://github.com/adobe-type-tools/agl-aglfn) is often sufficient,
depending on what unicode values need to be painted and what font is to
be used.  But I want to do better than "often".

I'm using https://antofthy.gitlab.io/info/data/utf8-demo.txt to test my
code.  Its coverage is ... extensive (and my current code seems to work
for all of it -- font notwithstanding.)


On 23/1/22 11:35 pm, Carlos wrote:
> Adobe's table (or one similar to it) is included in Ghostscript
> (AdobeGlyphList), and maybe other interpreters, too.

I didn't know about AdobeGlyphList.  The one in Ghostscript (9.50) has
multiple names for some unicode values.  Conversely, Adobe Glyph List
(glyphlist.txt found at https://github.com/adobe-type-tools/agl-aglfn)
contains multiple values for some names.
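
To make the "multiple names" and "multiple values" cases concrete, here
is a minimal sketch in Python (not PostScript) that parses text in the
glyphlist.txt layout, where each line is "name;hex values" and "#" starts
a comment.  SAMPLE is a three-line illustration of that layout, not the
real file, and the function names are made up for this example.

```python
from collections import defaultdict

# Illustrative sample in the glyphlist.txt layout (an assumption, not the full file).
SAMPLE = """\
# name;code point(s) in hex
space;0020
spacehackarabic;0020
dalethatafpatah;05D3 05B2
"""

def parse_glyph_list(text):
    """Return {glyph name: tuple of code points} from glyphlist-style text."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, _, hex_values = line.partition(";")
        table[name] = tuple(int(h, 16) for h in hex_values.split())
    return table

def names_by_code_point(table):
    """Invert the table for names that map to exactly one code point."""
    reverse = defaultdict(list)
    for name, values in table.items():
        if len(values) == 1:
            reverse[values[0]].append(name)
    return dict(reverse)

table = parse_glyph_list(SAMPLE)
reverse = names_by_code_point(table)
```

In this sample, reverse[0x20] holds two names for one value, and
table["dalethatafpatah"] holds two values for one name, so neither
direction of the lookup is one-to-one.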

No font is guaranteed to use any of these names and many fonts that I've
examined use different names for unicode values (and different values
for some names.)

> If you know you are dealing with modern fonts that include the uni/u
> aliases, you can get rid of the Adobe table lookup altogether... You
> don't need the canonical glyph names for those fonts.

No font that I've examined includes uni/u names for every glyph, or even
for most glyphs.

One can't rely on any pre-determined glyph name, nor any pre-determined
lookup table.  What a mess.
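
The fallback being discussed can be sketched as below, in Python rather
than PostScript.  It assumes the uni/u naming convention from the AGL
specification ("uni" plus four uppercase hex digits, "u" plus four to six)
and takes the font's glyph names as a plain set; candidate_glyph_names
and find_glyph are hypothetical helpers, not part of utf8show.

```python
def candidate_glyph_names(code_point, agl_names=()):
    """List glyph names worth trying for one code point, uni/u forms first."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    candidates = []
    if code_point <= 0xFFFF:
        # "uniXXXX" can only express code points that fit in four hex digits.
        candidates.append("uni%04X" % code_point)
    candidates.append("u%04X" % code_point)  # yields four to six hex digits
    candidates.extend(agl_names)             # names from a table such as AGL
    return candidates

def find_glyph(code_point, font_glyph_names, agl_names=()):
    """Return the first candidate present in the font, or None if none is."""
    for name in candidate_glyph_names(code_point, agl_names):
        if name in font_glyph_names:
            return name
    return None
```

For U+20AC with the AGL name "Euro", the candidates are uni20AC, u20AC
and Euro.  A font that uses none of them makes find_glyph return None,
which is exactly the case described above.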


On 23/1/22 11:56 pm, Carlos wrote:
 > I think if a font has a mapping between unicode points and glyphs that
 > you can extract (with Fontforge or whatever), then it surely also has
 > uni/u aliases. The Adobe table is for older fonts that don't have them,
 > so it's the only lookup table you need.

I wish that were true, but it's not.

After your comment about older fonts, I examined Courier, a Type 1 font
(https://web.archive.org/web/20010617080950/http://www.ctan.org/tex-archive/fonts/psfonts/courier/).
The CharStrings array breaks my assumptions and my code completely
fails.

 >> I'm not delighted by needing to add a dictionary that's specific to
 >> the current font to utfshow and unicodeshow because it feels wrong.
 >
 > Also, having to pre-process the files to insert the tables is not good.

I completely agree.  I don't like it.  I want to be able to use any font
without preprocessing, but I can't see how.


 > a possible optimization could be not to call
 > unicodeshow on each codepoint, but identify string intervals where all
 > bytes are either <= 127 or > 127. Call show on the former, and utfshow
 > on the latter.

Agreed.  Ps2pdf slows down dramatically with a large number of glyphshow calls.
https://antofthy.gitlab.io/info/data/utf8-demo.txt, which is 50K, takes
4 minutes to process using utf8show and ps2pdf.  The utf8-decode phase
takes 20ms and Ghostscript takes 510ms.
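
The interval idea Carlos describes can be sketched as follows, in Python
rather than PostScript.  It only shows the splitting step: each run of
bytes <= 127 would go to show and each run of bytes > 127 to utfshow.
split_runs is a hypothetical name chosen for this example.

```python
from itertools import groupby

def split_runs(data: bytes):
    """Split bytes into maximal runs that are all <= 127 or all > 127.

    Returns a list of (is_ascii, run) pairs in their original order.
    """
    return [
        (is_ascii, bytes(group))
        for is_ascii, group in groupby(data, key=lambda b: b <= 127)
    ]

runs = split_runs("abc \u00e9 \u20ac x".encode("utf-8"))
```

Because UTF-8 encodes every non-ASCII character entirely in bytes above
127, a run boundary never falls inside a multi-byte sequence.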

For anyone interested, the code is at https://davidnewall/software/utf8show.
It's still a work-in-progress.

David
