Path: csiph.com!aioe.org!.POSTED.U158MBNsLt97drbZ2zludw.user.gioia.aioe.org!not-for-mail
From: news@zzo38computer.org.invalid
Newsgroups: comp.lang.postscript
Subject: Re: JSON reader/writer in PostScript (second version)
Date: Sun, 15 Sep 2019 18:34:51 +0000
Organization: Aioe.org NNTP Server
Lines: 34
Message-ID: <1568569984.bystand@zzo38computer.org>
References: <1567559086.bystand@zzo38computer.org> <1567977875.bystand@zzo38computer.org> <1ac2aa42-301c-4343-99de-2943b33fe7b0@googlegroups.com>
NNTP-Posting-Host: U158MBNsLt97drbZ2zludw.user.gioia.aioe.org
Mime-Version: 1.0
X-Complaints-To: abuse@aioe.org
User-Agent: bystand/0.6.2
X-Notice: Filtered by postfilter v. 0.9.2
Xref: csiph.com comp.lang.postscript:3452

luser droog wrote:
>
> The next issue is: I don't understand what to do with unicode
> characters if they are discovered. It appears that OP's code
> reads in the multibyte sequences, constructs the codepoint in
> an int, and then truncates that to 8 bits and stores it in a
> string. That doesn't seem right, but I can't really think of
> anything better. Maybe an option either to leave the utf8 alone,
> or convert to arrays of integers? It's not clear to me what
> a PostScript program could hope to do with unicode data.
>
> So I haven't written any utf8 handling. If I do add it, I think
> it should be added to the parser library itself as an input
> filter. The C version has these already.

The first version of my program treats \u escapes in the way you
mention: only the low 8 bits of each codepoint are used. The second
version has an option to convert any \u escapes into UTF-8 encoding
instead. (However, it will not convert surrogate pairs into astral
characters.) Regardless of the version and of the option, any
unescaped non-ASCII characters it reads are passed through as is; it
does not interpret UTF-8 input at all.
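To make the difference between those behaviours concrete, here is a
rough sketch in Python (not the PostScript source of either version).
The function names are made up for illustration. It assumes the \u
escapes have already been parsed into a list of 16-bit values, and it
assumes that "not converting surrogate pairs" means each half is
encoded on its own; the actual program may handle that case
differently.

```python
def encode_utf8(cp):
    """Encode one integer as UTF-8 bytes, with no surrogate check."""
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def low8(units):
    """First behaviour: keep only the low 8 bits of each \\u value."""
    return bytes(u & 0xFF for u in units)


def utf8_each(units):
    """Second behaviour (assumed): encode each \\u value separately,
    so a surrogate pair becomes two 3-byte sequences."""
    return b"".join(encode_utf8(u) for u in units)


def utf8_paired(units):
    """For comparison only: combine surrogate pairs into one astral
    code point before encoding. The post says this is NOT done."""
    out = bytearray()
    i = 0
    while i < len(units):
        u = units[i]
        if (0xD800 <= u <= 0xDBFF and i + 1 < len(units)
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            u = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 1  # skip the low surrogate just consumed
        out += encode_utf8(u)
        i += 1
    return bytes(out)
```

For example, "\u20ac" is 0x20AC: low8 keeps only the byte 0xAC, while
utf8_each gives the three bytes E2 82 AC. For "\ud83d\ude00",
utf8_each gives six bytes (two encoded surrogates) where utf8_paired
would give the four bytes of U+1F600.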

You might be able to write UTF-8 text on the page with codespace
ranges, so maybe there is the possibility to use Unicode data in that
way. If you want to add UTF-8 handling to your own program, though, you
can do it whichever way you think is good.

-- 
Note: I am not always able to read/post messages during Monday-Friday.