Re: JSON reader/writer in PostScript (second version)

Path	csiph.com!aioe.org!.POSTED.U158MBNsLt97drbZ2zludw.user.gioia.aioe.org!not-for-mail
From	news@zzo38computer.org.invalid
Newsgroups	comp.lang.postscript
Subject	Re: JSON reader/writer in PostScript (second version)
Date	Sun, 15 Sep 2019 18:34:51 +0000
Organization	Aioe.org NNTP Server
Lines	34
Message-ID	<1568569984.bystand@zzo38computer.org> (permalink)
References	<1567559086.bystand@zzo38computer.org> <1567977875.bystand@zzo38computer.org> <1ac2aa42-301c-4343-99de-2943b33fe7b0@googlegroups.com> <a148ab80-3ae6-4bd4-aa86-9c6e8533d8eb@googlegroups.com>
NNTP-Posting-Host	U158MBNsLt97drbZ2zludw.user.gioia.aioe.org
Mime-Version	1.0
X-Complaints-To	abuse@aioe.org
User-Agent	bystand/0.6.2
X-Notice	Filtered by postfilter v. 0.9.2
Xref	csiph.com comp.lang.postscript:3452


luser droog <luser.droog@gmail.com> wrote:
> 
> The next issue is: I don't understand what to do with unicode 
> characters if they are discovered. It appears that OP's code
> reads in the multibyte sequences, constructs the codepoint in
> an int, and then truncates that to 8 bits and stores it in a
> string. That doesn't seem right, but I can't really think of
> anything better. Maybe an option either to leave the utf8 alone,
> or convert to arrays of integers? It's not clear to me what
> a PostScript program could hope to do with unicode data.
> 
> So I haven't written any utf8 handling. If I do add it, I think
> it should be added to the parser library itself as an input
> filter. The C version has these already.

The first version of my program treats \u escapes the way you describe:
only the low 8 bits of each codepoint are kept.
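
A minimal Python sketch of that truncation, assuming the escape's four hex digits have already been isolated. The function name `decode_u_escape_low8` is hypothetical and does not come from the program:

```python
import string

def decode_u_escape_low8(hex4: str) -> int:
    """Return the byte stored for a JSON \\uXXXX escape when only the low 8 bits are kept."""
    # Validate explicitly: int(..., 16) would also accept signs and underscores.
    if len(hex4) != 4 or not all(c in string.hexdigits for c in hex4):
        raise ValueError("a \\u escape needs exactly four hex digits")
    codepoint = int(hex4, 16)   # full 16-bit code unit, 0x0000..0xFFFF
    return codepoint & 0xFF     # discard everything above the low 8 bits

for esc in ("0041", "00e9", "20ac"):
    print(esc, hex(decode_u_escape_low8(esc)))
```

Codepoints up to U+00FF come out as their Latin-1 byte (U+00E9 gives 0xE9), but anything higher is silently changed: U+20AC (the euro sign) becomes 0xAC.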

The second version of my program has an option to convert \u escapes
into UTF-8 instead. (However, it does not combine surrogate pairs into
astral characters, i.e. characters outside the Basic Multilingual Plane.)
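
A Python sketch of what encoding each \u escape separately produces. It assumes one 16-bit code unit is encoded per escape, which is one reading of "does not combine surrogate pairs". The helper names `utf8_from_code_unit` and `paired` are hypothetical:

```python
def utf8_from_code_unit(cu: int) -> bytes:
    """Encode one 16-bit \\u code unit as UTF-8, treating surrogates like any other value."""
    if not 0 <= cu <= 0xFFFF:
        raise ValueError("a \\u escape holds a 16-bit code unit")
    if cu < 0x80:                                   # 1 byte: 0xxxxxxx
        return bytes([cu])
    if cu < 0x800:                                  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cu >> 6), 0x80 | (cu & 0x3F)])
    return bytes([0xE0 | (cu >> 12),                # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
                  0x80 | ((cu >> 6) & 0x3F),
                  0x80 | (cu & 0x3F)])

def paired(high: int, low: int) -> bytes:
    """For comparison: the 4-byte UTF-8 a surrogate-pair-aware decoder would emit."""
    cp = 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)
    return chr(cp).encode("utf-8")

# "\ud83d\ude00" (U+1F600) handled one escape at a time vs. as a pair:
print((utf8_from_code_unit(0xD83D) + utf8_from_code_unit(0xDE00)).hex(" "))
print(paired(0xD83D, 0xDE00).hex(" "))
```

Without pairing, the escapes become ED A0 BD ED B8 80. That is the CESU-8 layout, which strict UTF-8 decoders reject. Pairing would give F0 9F 98 80.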

Regardless of the version and the option, any unescaped non-ASCII
characters it reads are passed through as is; it does not interpret
UTF-8 input at all.
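
A byte-level Python sketch of such a string reader, assuming the body between the quotes is available as raw bytes. Bytes other than backslash, including anything at or above 0x80, are copied verbatim whatever the \u mode is. The names `decode_string_body` and `utf8_escapes` are hypothetical, not the program's:

```python
SIMPLE_ESCAPES = {ord('"'): 0x22, ord("\\"): 0x5C, ord("/"): 0x2F, ord("b"): 0x08,
                  ord("f"): 0x0C, ord("n"): 0x0A, ord("r"): 0x0D, ord("t"): 0x09}
HEX_DIGITS = b"0123456789abcdefABCDEF"

def decode_string_body(raw: bytes, utf8_escapes: bool = False) -> bytes:
    """Decode a JSON string body byte by byte; raw non-ASCII bytes are never interpreted."""
    out = bytearray()
    i = 0
    while i < len(raw):
        b = raw[i]
        if b != 0x5C:               # not a backslash: copy the byte unchanged,
            out.append(b)           # even if it is part of a UTF-8 sequence
            i += 1
            continue
        if i + 1 >= len(raw):
            raise ValueError("dangling backslash")
        esc = raw[i + 1]
        if esc in SIMPLE_ESCAPES:
            out.append(SIMPLE_ESCAPES[esc])
            i += 2
        elif esc == ord("u"):
            digits = raw[i + 2:i + 6]
            if len(digits) != 4 or not all(c in HEX_DIGITS for c in digits):
                raise ValueError("bad \\u escape")
            cu = int(digits, 16)
            if utf8_escapes:        # UTF-8 per escape; surrogates are not paired
                out += chr(cu).encode("utf-8", "surrogatepass")
            else:                   # keep only the low 8 bits
                out.append(cu & 0xFF)
            i += 6
        else:
            raise ValueError(f"unknown escape \\{chr(esc)}")
    return bytes(out)

raw = b'caf\xc3\xa9 \\u00e9'        # literal UTF-8 "é", then an escaped one
print(decode_string_body(raw))
print(decode_string_body(raw, utf8_escapes=True))
```

In both calls the literal C3 A9 bytes come out untouched. Only the escaped `\u00e9` differs: 0xE9 in one mode, C3 A9 in the other.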

You might be able to show UTF-8 text on the page by using a composite
(Type 0) font whose CMap has suitable codespace ranges, so maybe Unicode
data could be used in that way.
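
A Python model of how codespace ranges decide how many bytes of a string form each character code. In a CMap, each range gives a lower and upper bound per byte position, and the shortest matching length wins. The UTF-8 ranges below are my own illustration, not copied from any particular CMap resource. Being per-byte "rectangles", they cannot exclude every malformed sequence. The sketch raises an error where a real interpreter would apply its own fallback:

```python
# Hypothetical UTF-8 codespace ranges: (low, high) byte strings of equal length,
# in the spirit of a CMap's begincodespacerange ... endcodespacerange section.
UTF8_CODESPACE = [
    (b"\x00", b"\x7f"),
    (b"\xc2\x80", b"\xdf\xbf"),
    (b"\xe0\x80\x80", b"\xef\xbf\xbf"),
    (b"\xf0\x80\x80\x80", b"\xf4\xbf\xbf\xbf"),
]

def in_range(code: bytes, low: bytes, high: bytes) -> bool:
    """Per-byte check: each byte of code lies between the corresponding bytes of low and high."""
    return len(code) == len(low) and all(lo <= c <= hi for c, lo, hi in zip(code, low, high))

def split_codes(data: bytes, ranges=UTF8_CODESPACE) -> list:
    """Split a show string into codes, trying the shortest code length first."""
    max_len = max(len(low) for low, _ in ranges)
    codes, i = [], 0
    while i < len(data):
        for n in range(1, max_len + 1):
            candidate = data[i:i + n]
            if len(candidate) == n and any(in_range(candidate, lo, hi) for lo, hi in ranges):
                codes.append(candidate)
                i += n
                break
        else:  # no length matched any range
            raise ValueError(f"no codespace range matches at offset {i}")
    return codes

print(split_codes("\u00e9\u20acA".encode("utf-8")))
```

Each UTF-8 sequence becomes one code (here 2, 3, and 1 bytes long). The CMap would then map each code to a glyph through its other sections.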

If you want to add UTF-8 handling to your own program, though, you can
do it whichever way you think is best.

-- 
Note: I am not always able to read/post messages during Monday-Friday.


Thread

JSON reader/writer in PostScript news@zzo38computer.org.invalid - 2019-09-04 01:19 +0000
  Re: JSON reader/writer in PostScript luser droog <luser.droog@gmail.com> - 2019-09-05 23:05 -0700
    Re: JSON reader/writer in PostScript luser droog <luser.droog@gmail.com> - 2019-09-06 22:06 -0700
  JSON reader/writer in PostScript (second version) news@zzo38computer.org.invalid - 2019-09-08 21:32 +0000
    Re: JSON reader/writer in PostScript (second version) luser droog <luser.droog@gmail.com> - 2019-09-12 19:11 -0700
      Re: JSON reader/writer in PostScript (second version) luser droog <luser.droog@gmail.com> - 2019-09-14 19:50 -0700
        Re: JSON reader/writer in PostScript (second version) news@zzo38computer.org.invalid - 2019-09-15 18:34 +0000
    Re: JSON reader/writer in PostScript (second version) luser droog <luser.droog@gmail.com> - 2019-09-13 19:53 -0700
