Re: Printing UTF8 (Unicode) - opinions please

From	Carlos <carlos@cvkm.cz>
Newsgroups	comp.lang.postscript
Subject	Re: Printing UTF8 (Unicode) - opinions please
Date	2022-01-23 13:56 +0100
Message-ID	<20220123135610.00003ecb@cvkm.cz>
References	<4fe53d50-e66a-82b8-48fd-d0928e149698@davidnewall.com> <61ecc714$1@news.ausics.net>

On Sun, 23 Jan 2022 14:10:12 +1100
David Newall <davidn@davidnewall.com> wrote:

> Hi All,
> 
> I'm soliciting opinions...
> 
> On 21/1/22 9:56 pm, David Newall wrote:
> > I've written some PostScript to allow me to print UTF8-encoded
> > strings ...
> > I also use a table which Adobe published ("UNICODE translation
> > table for non-ASCII characters"), which they say is for going from
> > a glyph name to a Unicode codepoint.  I (ab)use it in the reverse
> > direction.  I turned it into a dictionary keyed on the codepoint.
> 
> Many (most?) fonts have glyphs which aren't in Adobe's table, or which
> are named differently.  Fontforge can write a table of glyphs in a
> font and their corresponding codepoints.  Using that table,
> unicodeshow looks more like this:
> 
>    % lookup a unicode codepoint (int) in a list of known glyphs (dict)
>    % and display the glyph found.
>    % dict int unicodeshow -
>    /unicodeshow {
>      2 copy known { get } { pop pop /.notdef } ifelse glyphshow
>    } bind def
> 
> While this looks much neater, it requires pre-generating a dictionary
> for each font used.
> 
> I can't decide which approach is better.

I think that if a font has a mapping between Unicode codepoints and
glyphs that you can extract (with Fontforge or whatever), then it surely
also has uniXXXX/uXXXX glyph-name aliases. The Adobe table is for older
fonts that don't have them, so it is the only lookup table you need.
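
For illustration only, a minimal sketch of that alias-first lookup (not code from the thread): it builds /uniXXXX from a BMP codepoint (or /uXXXXX for codepoints above U+FFFF), uses that name if the font's CharStrings dictionary has it, and otherwise falls back to a codepoint-keyed table. AdobeTable is a hypothetical two-entry stand-in for the dictionary David built from Adobe's table; concatstr, hexpad, unicodeglyphname and unicodeglyph are likewise made-up names.

```postscript
% Hypothetical stand-in for the codepoint-keyed Adobe table.
/AdobeTable << 16#00E9 /eacute 16#20AC /Euro >> def

% s1 s2 concatstr s1s2   (new string, operands untouched)
/concatstr {
  1 index length 1 index length add string
  dup 0 4 index putinterval
  dup 3 index length 3 index putinterval
  3 1 roll pop pop
} def

% n width hexpad string   (zero-padded uppercase hexadecimal)
/hexpad {
  2 dict begin
    /w exch def  /n exch def
    n 16 20 string cvrs                               % digits, unpadded
    w string 0 1 w 1 sub { 1 index exch 48 put } for  % buffer of (0)s
    dup w 3 index length sub 3 index putinterval      % right-align digits
    exch pop
  end
} def

% int unicodeglyphname name   (BMP -> /uniXXXX, else /uXXXXX[X])
/unicodeglyphname {
  dup 16#FFFF le
    { 4 hexpad (uni) exch concatstr }
    { 16 6 string cvrs (u) exch concatstr }
  ifelse cvn
} def

% charstrings int unicodeglyph name
% Prefer the font's own alias; else try AdobeTable; else /.notdef.
/unicodeglyph {
  dup unicodeglyphname                   % cs cp name
  2 index 1 index known
    { 3 1 roll pop pop }                 % font has the uni/u alias
    { pop                                % cs cp
      AdobeTable 1 index known
        { AdobeTable exch get }          % cs name
        { pop /.notdef }
      ifelse
      2 copy known { exch pop } { pop pop /.notdef } ifelse }
  ifelse
} def
```

With the font already set, something like `currentfont /CharStrings get 16#20AC unicodeglyph glyphshow` then needs no per-font table, assuming the font dictionary has a readable CharStrings entry (Type 1 and Type 42 fonts do; Type 3 and CID-keyed fonts do not).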

> I'm not delighted by needing to add a dictionary that's specific to
> the current font to utfshow and unicodeshow because it feels wrong.

Also, having to pre-process the files to insert the tables is not good.

[...]
> Opinions?  Would adding to a font dictionary break things?
> (I'm looking at you, Acrobat and Distiller.)

I don't know about that; I only use Ghostscript. But if the reason to
add a lookup is speed, a possible optimization would be not to call
unicodeshow on each codepoint, but to split the string into maximal
runs where all bytes are either <= 127 or > 127. Call show on the
former and utfshow on the latter. Since every byte of a multi-byte
UTF-8 sequence is > 127, such a split never cuts a character in half.
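
A minimal sketch of that run splitting, for illustration (not code from the thread): forruns walks the string once and hands each maximal run to one of two procedures, and splitshow wires those to show and utfshow. forruns and splitshow are hypothetical names; utfshow stands for the UTF-8 procedure discussed in the thread and is assumed to take a string and consume it.

```postscript
% string loproc hiproc forruns -
% Calls loproc on each maximal run of bytes <= 127 and hiproc on each
% maximal run of bytes > 127, in order. Each proc receives the run as
% a substring and is expected to consume it (as show does).
/forruns {
  10 dict begin
    /hiproc exch def  /loproc exch def  /s exch def
    /start 0 def
    0 1 s length {                      % i = s length is an end sentinel
      /i exch def
      i s length eq
        { true }
        { s i get 127 gt  s start get 127 gt  ne }
      ifelse
      {                                 % byte class changed, or end reached
        i start gt {
          s start i start sub getinterval
          dup 0 get 127 gt { /hiproc } { /loproc } ifelse load
          currentdict end 3 1 roll exec begin  % run proc outside local dict
        } if
        /start i def
      } if
    } for
  end
} def

% Hypothetical wiring: ASCII runs go straight to show.
/splitshow { { show } { utfshow } forruns } def
```

The procedures are run with the local dictionary temporarily popped, so a utfshow that defines its own variables cannot clobber the loop state. Passing ASCII runs to show assumes the current font's Encoding maps those bytes to the intended ASCII glyphs.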

C.

Thread

Printing UTF8 (Unicode) David Newall <davidn@davidnewall.com> - 2022-01-21 21:56 +1100
  Re: Printing UTF8 (Unicode) Carlos <carlos@cvkm.cz> - 2022-01-21 14:23 +0100
    Re: Printing UTF8 (Unicode) David Newall <davidn@davidnewall.com> - 2022-01-22 12:27 +1100
  Re: Printing UTF8 (Unicode) David Newall <davidn@davidnewall.com> - 2022-01-23 13:31 +1100
    Re: Printing UTF8 (Unicode) Carlos <carlos@cvkm.cz> - 2022-01-23 13:35 +0100
      Re: Printing UTF8 (Unicode) David Newall <davidn@davidnewall.com> - 2022-01-26 14:59 +1100
        Re: Printing UTF8 (Unicode) Carlos <carlos@cvkm.cz> - 2022-02-10 15:05 +0100
          Re: Printing UTF8 (Unicode) David Newall <davidn@davidnewall.com> - 2022-02-16 13:55 +1100
  Printing UTF8 (Unicode) - opinions please David Newall <davidn@davidnewall.com> - 2022-01-23 14:10 +1100
    Re: Printing UTF8 (Unicode) - opinions please Carlos <carlos@cvkm.cz> - 2022-01-23 13:56 +0100
      Re: Printing UTF8 (Unicode) - opinions please luser droog <luser.droog@gmail.com> - 2022-01-24 08:37 -0800
    Re: Printing UTF8 (Unicode) - opinions please luser droog <luser.droog@gmail.com> - 2022-01-24 08:33 -0800
      Re: Printing UTF8 (Unicode) - opinions please David Newall <davidn@davidnewall.com> - 2022-01-26 15:06 +1100