| Newsgroups | perl.unicode |
|---|---|
| Subject | Re: Encode UTF-8 optimizations |
| References | <201608121731.32716@pali> <20160822130518.GA9176@pali> <62e8d4d6-b474-b037-5d77-5f67d3e20371@khwilliamson.com> <201608222247.46077@pali> |
| Message-ID | <40d6edc2-8dd7-d869-2387-577d67783ef3@khwilliamson.com> |
| Date | 2016-08-24 22:49 -0600 |
| From | public@khwilliamson.com (Karl Williamson) |
On 08/22/2016 02:47 PM, pali@cpan.org wrote:
[snip]
> I added some tests for overlong sequences. Only for ASCII platforms, tests for EBCDIC
> are missing (sorry, I do not have access to any EBCDIC platform for testing).
It's fine to skip those tests on EBCDIC.
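
For readers unfamiliar with the term, an overlong sequence is a UTF-8-shaped byte sequence that spends more bytes than the shortest form its code point needs. A minimal sketch in Python follows (not the Encode test suite; the helper names `loose_code_point`, `shortest_length`, `is_overlong` and `strict_decoder_accepts` are made up for illustration). It decodes one sequence without the shortest-form check and then compares lengths:

```python
def loose_code_point(seq: bytes) -> int:
    """Decode one UTF-8-shaped sequence with no shortest-form check."""
    lead = seq[0]
    if lead < 0x80:
        expected, cp = 1, lead
    elif lead >> 5 == 0b110:
        expected, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:
        expected, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:
        expected, cp = 4, lead & 0x07
    else:
        raise ValueError("not a UTF-8 lead byte")
    if len(seq) != expected:
        raise ValueError("wrong sequence length")
    for trail in seq[1:]:
        if trail >> 6 != 0b10:
            raise ValueError("not a continuation byte")
        cp = (cp << 6) | (trail & 0x3F)
    return cp


def shortest_length(cp: int) -> int:
    """Number of bytes the shortest UTF-8 form of cp occupies."""
    if cp < 0x80:
        return 1
    if cp < 0x800:
        return 2
    if cp < 0x10000:
        return 3
    return 4


def is_overlong(seq: bytes) -> bool:
    """True if seq uses more bytes than its code point requires."""
    return len(seq) > shortest_length(loose_code_point(seq))


def strict_decoder_accepts(data: bytes) -> bool:
    """True if Python's built-in strict UTF-8 decoder accepts data."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False
```

For example, `b"\xc0\xaf"` and `b"\xe0\x80\xaf"` are both overlong forms of U+002F, and Python's strict decoder rejects them. This sketch applies to ASCII platforms only; an EBCDIC equivalent would need different byte values.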
>
>>> Anyway, how does it behave on EBCDIC platforms? And maybe another
>>> question: what should Encode::encode('UTF-8', $str) do on EBCDIC?
>>> Encode $str to UTF-8 or to UTF-EBCDIC?
>>
>> It works fine on EBCDIC platforms. There are other bugs in Encode on
>> EBCDIC that I plan on investigating as time permits. Doing this has
>> fixed some of these for free. The uvuni() functions should in almost
>> all instances be uvchr(), and my patch does that.
> Now I'm wondering whether the FBCHAR_UTF8 define also works on EBCDIC... I think that it
> should be different for UTF-EBCDIC.
I'll fix that.
>
>> On EBCDIC platforms, UTF-8 is defined to be UTF-EBCDIC (or vice versa if
>> you prefer), so $str will effectively be in the version of UTF-EBCDIC
>> valid for the platform it is running on (there are differences depending
>> on the platform's underlying code page).
> So does that mean that on EBCDIC platforms you cannot process a file which is encoded in
> UTF-8? As I understand it, Encode::decode("UTF-8", $str) expects $str to be in UTF-EBCDIC
> and not in UTF-8.
>
Yes. The two worlds do not meet. If you are on an EBCDIC platform, the
native encoding is UTF-EBCDIC tailored to the code page the platform
runs on.
In searching, I did not find anything that converts between the two, so
I wrote a Perl script to do so. Our OS/390 man, Yaroslav, wrote one in C.
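
To illustrate why the two encodings are not interchangeable, here is a rough Python sketch of the first step of UTF-EBCDIC as I read Unicode Technical Report #16: turning a code point into the intermediate "I8" byte sequence. It is not either of the conversion scripts mentioned above, and the function name `to_i8` is made up. The second step, which permutes the I8 bytes through a table tied to the platform's EBCDIC code page, is deliberately omitted because it varies by code page.

```python
def to_i8(cp: int) -> bytes:
    """Encode one code point as a UTF-EBCDIC I8 sequence (step one only).

    Assumption: layout per UTR #16. Continuation bytes are 101xxxxx and
    carry 5 bits each, unlike UTF-8's 10xxxxxx with 6 bits each.
    """
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of Unicode range")
    if cp < 0xA0:
        # U+0000..U+009F (ASCII plus C1 controls) stay single bytes.
        return bytes([cp])
    # (exclusive upper limit, total bytes, lead-byte marker)
    for limit, length, marker in (
        (0x400, 2, 0xC0),
        (0x4000, 3, 0xE0),
        (0x40000, 4, 0xF0),
        (0x400000, 5, 0xF8),
    ):
        if cp < limit:
            break
    trail = []
    for _ in range(length - 1):
        trail.append(0xA0 | (cp & 0x1F))  # low 5 bits into 101xxxxx
        cp >>= 5
    return bytes([marker | cp] + trail[::-1])
```

Under these assumptions U+00A0 becomes `C5 A0` in I8, whereas UTF-8 encodes it as `C2 A0`, so even before the code-page permutation the byte streams already differ for anything outside ASCII.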