Printing UTF8 (Unicode) - opinions please

Subject Printing UTF8 (Unicode) - opinions please
Newsgroups comp.lang.postscript
References <4fe53d50-e66a-82b8-48fd-d0928e149698@davidnewall.com>
From David Newall <davidn@davidnewall.com>
Date Sun, 23 Jan 2022 14:10:12 +1100
User-Agent Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Thunderbird/78.14.0
MIME-Version 1.0
In-Reply-To <4fe53d50-e66a-82b8-48fd-d0928e149698@davidnewall.com>
Content-Type text/plain; charset=utf-8; format=flowed
Content-Language en-US
Content-Transfer-Encoding 8bit
NNTP-Posting-Host news.ausics.net
Message-ID <61ecc714$1@news.ausics.net>
Organization Ausics - https://www.ausics.net
Lines 52
X-Complaints abuse@ausics.net
Path csiph.com!tncsrv06.tnetconsulting.net!news.bbs.nz!news.ausics.net!not-for-mail
Xref csiph.com comp.lang.postscript:3730


Hi All,

I'm soliciting opinions...

On 21/1/22 9:56 pm, David Newall wrote:
> I've written some PostScript to allow me to print UTF8-encoded strings
> ...
> I also use a table which Adobe published ("UNICODE translation table for 
> non-ASCII characters"), which they say is for going from a glyph name to 
> a Unicode codepoint.  I (ab)use it in the reverse direction.  I turned 
> it into a dictionary keyed on the codepoint.

Many (most?) fonts have glyphs which aren't in Adobe's table, or which
are named differently.  FontForge can write a table of glyphs in a font
and their corresponding codepoints.  Using that table, unicodeshow looks
more like this:

   % lookup a unicode codepoint (int) in a list of known glyphs (dict)
   % and display the glyph found.
   % dict int unicodeshow -
   /unicodeshow {
     2 copy known { get } { pop pop /.notdef } ifelse glyphshow
   } bind def

While this looks much neater, it requires pre-generating a dictionary
for each font used.
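
For illustration, here is a minimal sketch of how that version might be
called.  The table name HelveticaGlyphs and its three entries are made up
for the example; a real table would be generated per font (for instance
from FontForge's output) and would be much larger.  It assumes a Level 2
interpreter, since both << >> and glyphshow need one.

```postscript
% Hypothetical hand-written table: codepoint (int) -> glyph name.
% A real table would be generated for each font.
/HelveticaGlyphs <<
  16#00C0 /Agrave
  16#00E9 /eacute
  16#00F1 /ntilde
>> def

% dict int unicodeshow -   (same definition as above)
/unicodeshow {
  2 copy known { get } { pop pop /.notdef } ifelse glyphshow
} bind def

/Helvetica findfont 24 scalefont setfont
72 720 moveto
HelveticaGlyphs 16#00E9 unicodeshow   % found in the table: shows eacute
HelveticaGlyphs 16#2603 unicodeshow   % not in the table: falls back to .notdef
showpage
```

The caller has to pass the right table for whatever font is current,
which is the coupling that feels awkward.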

I can't decide which approach is better.

I'm not delighted by needing to add a dictionary that's specific to the
current font to utfshow and unicodeshow because it feels wrong.

I suppose whatever fonts are used to print unicode will be embedded in
the PS, so I could add the table to each font's dictionary.  I wonder if
that would cause confusion to anybody reading the code:

   /unicodeshow { % int unicodeshow -
     currentfont /unicode 2 copy known not {
       pop pop /unicodeshow cvx /invalidfont
       /.error where {pop .error} {signalerror} ifelse
     } if
     get exch 2 copy known { get } { pop pop /.notdef } ifelse glyphshow
   } bind def

Maybe that's not so awful.
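
As a rough sketch of what "adding the table to the font's dictionary"
could look like: font dictionaries returned by findfont are read-only, so
the usual re-encoding idiom of copying everything except FID into a new
dictionary and calling definefont seems the natural way to do it.  The
new font name Helvetica-Unicode and the two-entry table are assumptions
for the example only, and it relies on the currentfont-based unicodeshow
defined just above.  scalefont copies the font dictionary, so the extra
/unicode entry should survive scaling, but that is worth checking on the
interpreter in use.

```postscript
% Assumes the currentfont-based unicodeshow above is already defined.
/Helvetica findfont
dup length 1 add dict begin             % new dict with room for /unicode
  { 1 index /FID ne { def } { pop pop } ifelse } forall  % copy all but FID
  /unicode << 16#00C0 /Agrave 16#00E9 /eacute >> def     % hypothetical table
  currentdict
end
/Helvetica-Unicode exch definefont      % register under a made-up name
24 scalefont setfont

72 720 moveto
16#00E9 unicodeshow                     % looks up /unicode in currentfont
showpage
```

For a font embedded in the PS file, the same /unicode entry could instead
be written directly into the font program before its own definefont.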

Opinions?  Is adding to a font dictionary going to break things?
(I'm looking at you, Acrobat and Distiller.)

Regards,

David
