Re: Encode UTF-8 optimizations

Newsgroups perl.unicode
Date 2016-08-25 09:48 +0200
Subject Re: Encode UTF-8 optimizations
Message-ID <20160825074833.GA27810@pali> (permalink)
References <201608121731.32716@pali> <20160822130518.GA9176@pali> <62e8d4d6-b474-b037-5d77-5f67d3e20371@khwilliamson.com> <201608222247.46077@pali> <40d6edc2-8dd7-d869-2387-577d67783ef3@khwilliamson.com>
From pali@cpan.org


On Wednesday 24 August 2016 22:49:21 Karl Williamson wrote:
> On 08/22/2016 02:47 PM, pali@cpan.org wrote:
> 
> snip
> 
> >I added some tests for overlong sequences. Only for ASCII platforms, tests for EBCDIC
> >are missing (sorry, I do not have access to any EBCDIC platform for testing).
> 
> It's fine to skip those tests on EBCDIC.

Ok.
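
For readers unfamiliar with the term: an overlong sequence encodes a code point in more bytes than its shortest form needs, and a strict UTF-8 decoder is expected to reject it. The sketch below only illustrates that idea. It is written in Python rather than Perl, it does not reproduce the tests mentioned above, and the helper name `is_rejected` is hypothetical.

```python
# Illustration only: overlong UTF-8 forms and how a strict decoder reacts.
# is_rejected() is a hypothetical helper, not part of Encode or its tests.

def is_rejected(data: bytes) -> bool:
    """Return True if Python's strict UTF-8 decoder refuses the bytes."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return True
    return False

# Each entry is (description, bytes). Every entry except the last encodes
# a code point in more bytes than the shortest form allows.
SAMPLES = [
    ("U+0000 as 2 bytes", b"\xc0\x80"),
    ("U+002F as 2 bytes", b"\xc0\xaf"),
    ("U+002F as 3 bytes", b"\xe0\x80\xaf"),
    ("U+20AC as 4 bytes", b"\xf0\x82\x82\xac"),
    ("U+20AC shortest form", b"\xe2\x82\xac"),
]

for label, raw in SAMPLES:
    verdict = "rejected" if is_rejected(raw) else "accepted"
    print(f"{label}: {verdict}")
```

Only the last sample, the shortest form of U+20AC, should be accepted by a strict decoder.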

> >>>> > Anyway, how it behave on EBCDIC platforms? And maybe another question
> >>>> > what should  Encode::encode('UTF-8', $str) do on EBCDIC? Encode $str to
> >>>> > UTF-8 or to UTF-EBCDIC?
> >>>
> >>> It works fine on EBCDIC platforms.  There are other bugs in Encode on
> >>> EBCDIC that I plan on investigating as time permits.  Doing this has
> >>> fixed some of these for free.  The uvuni() functions should in almost
> >>> all instances be uvchr(), and my patch does that.
> >Now I'm thinking if FBCHAR_UTF8 define is working also on EBCDIC... I think that it
> >should be different for UTF-EBCDIC.
> 
> I'll fix that
> >
> >>> On EBCDIC platforms, UTF-8 is defined to be UTF-EBCDIC (or vice versa if
> >>> you prefer), so $str will effectively be in the version of UTF-EBCDIC
> >>> valid for the platform it is running on (there are differences depending
> >>> on the platform's underlying code page).
> >So it means that on EBCDIC platforms you cannot process file which is encoded in UTF-8?
> >As Encode::decode("UTF-8", $str) expect $str to be in UTF-EBCDIC and not in UTF-8 (as I
> >understood).
> >
> Yes.  The two worlds do not meet.  If you are on an EBCDIC platform, the
> native encoding is UTF-EBCDIC tailored to the code page the platform runs
> on.
> 
> In searching, I did not find anything that converts between the two, so I
> wrote a Perl script to do so.  Our OS/390 man, Yaroslav, wrote one in C.

Thank you for the information! I thought that the "UTF-8" encoding (with
hyphen) is the strict and correct UTF-8 version on both ASCII & EBCDIC
platforms, as nothing in the Encode documentation says that it is
different on EBCDIC...
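
A rough way to see why the two worlds do not meet is to compare the bytes of a single letter in an ASCII-based and an EBCDIC-based encoding. The sketch below is in Python rather than Perl. Python ships no UTF-EBCDIC codec, so the single-byte EBCDIC code page cp037 stands in for the platform code page; that substitution and the helper name `decodes_as_utf8` are assumptions made only for this illustration.

```python
# Illustration only. cp037 is a single-byte EBCDIC code page used here as a
# stand-in, because Python has no UTF-EBCDIC codec (assumption of this sketch).

def decodes_as_utf8(data: bytes) -> bool:
    """Return True if the bytes are well-formed UTF-8 (hypothetical helper)."""
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True

text = "A"
utf8_bytes = text.encode("utf-8")    # ASCII-based: the letter is byte 0x41
ebcdic_bytes = text.encode("cp037")  # EBCDIC-based: the letter is byte 0xC1

print("UTF-8 bytes: ", utf8_bytes.hex())
print("cp037 bytes: ", ebcdic_bytes.hex())

# The EBCDIC byte for "A" is not even well-formed UTF-8 ...
print("EBCDIC bytes valid as UTF-8:", decodes_as_utf8(ebcdic_bytes))

# ... and the UTF-8 byte for "A" means a different character under cp037.
print("UTF-8 bytes read as cp037 give 'A':", utf8_bytes.decode("cp037") == text)
```

So data written on one kind of platform needs an explicit conversion step before a decoder on the other kind can make sense of it.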

Anyway, if you need some help with the Encode module or anything else,
let me know, as I want to have UTF-8 support in Encode working
correctly...



