| Path | csiph.com!aioe.org!.POSTED.U158MBNsLt97drbZ2zludw.user.gioia.aioe.org!not-for-mail |
|---|---|
| From | news@zzo38computer.org.invalid |
| Newsgroups | comp.lang.postscript |
| Subject | Re: JSON reader/writer in PostScript (second version) |
| Date | Sun, 15 Sep 2019 18:34:51 +0000 |
| Organization | Aioe.org NNTP Server |
| Lines | 34 |
| Message-ID | <1568569984.bystand@zzo38computer.org> |
| References | <1567559086.bystand@zzo38computer.org> <1567977875.bystand@zzo38computer.org> <1ac2aa42-301c-4343-99de-2943b33fe7b0@googlegroups.com> <a148ab80-3ae6-4bd4-aa86-9c6e8533d8eb@googlegroups.com> |
| NNTP-Posting-Host | U158MBNsLt97drbZ2zludw.user.gioia.aioe.org |
| Mime-Version | 1.0 |
| X-Complaints-To | abuse@aioe.org |
| User-Agent | bystand/0.6.2 |
| X-Notice | Filtered by postfilter v. 0.9.2 |
| Xref | csiph.com comp.lang.postscript:3452 |
luser droog <luser.droog@gmail.com> wrote:
>
> The next issue is: I don't understand what to do with unicode
> characters if they are discovered. It appears that OP's code
> reads in the multibyte sequences, constructs the codepoint in
> an int, and then truncates that to 8 bits and stores it in a
> string. That doesn't seem right, but I can't really think of
> anything better. Maybe an option either to leave the utf8 alone,
> or convert to arrays of integers? It's not clear to me what
> a PostScript program could hope to do with unicode data.
>
> So I haven't written any utf8 handling. If I do add it, I think
> it should be added to the parser library itself as an input
> filter. The C version has these already.

The first version of my program will treat \u escapes in the way you
mention; only the low 8 bits of the codepoints are used. The second
version of my program has an option to instead convert any \u escapes
into UTF-8 encoding. (However, it will not convert surrogate pairs into
astral characters.)

Regardless of the version and of the option, if it reads any unescaped
non-ASCII characters, they will be passed through as is; it will not
interpret UTF-8 input at all, but just passes it through.

You might be able to write UTF-8 text on the page with codespace ranges,
so maybe there is the possibility to use Unicode data in that way.

If you want to add UTF-8 handling in your own program though, you can do
it whichever way you think is good, I think.

-- 
Note: I am not always able to read/post messages during Monday-Friday.
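
To make the two \u behaviours above concrete, here is a rough sketch in Python rather than PostScript. It is not taken from either program; the function names (low8, utf8_encode, combine_surrogates) are made up for illustration, and the surrogate step is only what a reader would do if it did want astral characters, which the message says the second version does not do.

```python
def low8(cp: int) -> bytes:
    """First behaviour described: keep only the low 8 bits of the codepoint."""
    return bytes([cp & 0xFF])


def utf8_encode(cp: int) -> bytes:
    """Second behaviour described: encode one codepoint as UTF-8 bytes.

    No surrogate check is made here, so a lone surrogate from a \\u escape
    is encoded as an ordinary 3-byte sequence (an assumption about what
    "not converting surrogate pairs" could look like, not a confirmed fact).
    """
    if cp < 0x80:
        return bytes([cp])
    if cp < 0x800:
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


def combine_surrogates(hi: int, lo: int) -> int:
    """Hypothetical extra step: merge a UTF-16 surrogate pair into one codepoint."""
    return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)


if __name__ == "__main__":
    # \u20AC (euro sign) under each behaviour
    print(low8(0x20AC).hex())         # ac
    print(utf8_encode(0x20AC).hex())  # e282ac
    # \uD83D\uDE00 combined first, then encoded
    print(utf8_encode(combine_surrogates(0xD83D, 0xDE00)).hex())  # f09f9880
```

With the low-8-bits behaviour, \u20AC and \u00AC both end up as the single byte AC, which is why that approach loses information for anything above U+00FF.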
JSON reader/writer in PostScript news@zzo38computer.org.invalid - 2019-09-04 01:19 +0000
Re: JSON reader/writer in PostScript luser droog <luser.droog@gmail.com> - 2019-09-05 23:05 -0700
Re: JSON reader/writer in PostScript luser droog <luser.droog@gmail.com> - 2019-09-06 22:06 -0700
JSON reader/writer in PostScript (second version) news@zzo38computer.org.invalid - 2019-09-08 21:32 +0000
Re: JSON reader/writer in PostScript (second version) luser droog <luser.droog@gmail.com> - 2019-09-12 19:11 -0700
Re: JSON reader/writer in PostScript (second version) luser droog <luser.droog@gmail.com> - 2019-09-14 19:50 -0700
Re: JSON reader/writer in PostScript (second version) news@zzo38computer.org.invalid - 2019-09-15 18:34 +0000
Re: JSON reader/writer in PostScript (second version) luser droog <luser.droog@gmail.com> - 2019-09-13 19:53 -0700