Re: Encode UTF-8 optimizations

Newsgroups perl.unicode
Subject Re: Encode UTF-8 optimizations
References <201608121731.32716@pali> <20160822130518.GA9176@pali> <62e8d4d6-b474-b037-5d77-5f67d3e20371@khwilliamson.com> <201608222247.46077@pali>
Message-ID <4e654c06-2b51-31a8-c2ab-f98c4dcf421d@khwilliamson.com> (permalink)
Date 2016-08-22 15:19 -0600
From public@khwilliamson.com (Karl Williamson)


On 08/22/2016 02:47 PM, pali@cpan.org wrote:
>> > And I think you misunderstand when is_utf8_char_slow() is called.  It is
>> > called only when the next byte in the input indicates that the only
>> > legal UTF-8 that might follow would be for a code point that is at least
>> > U+200000, almost twice as high as the highest legal Unicode code point.
>> > It is a Perl extension to handle such code points, unlike other
>> > languages.  But the Perl core is not optimized for them, nor will it be.
>> >   My point is that is_utf8_char_slow() will only be called in very
>> > specialized cases, and we need not make those cases have as good a
>> > performance as normal ones.
> In strict mode there is absolutely no need to call is_utf8_char_slow(), because in
> strict mode such a sequence is always invalid (it is above the last valid Unicode
> character). That is what I was trying to say.
>
> Currently is_strict_utf8_string_loc() first calls isUTF8_CHAR() (which could call
> is_utf8_char_slow()) and only afterwards checks UTF8_IS_SUPER().

I only have time to respond to this portion just now.

The code could be tweaked to call UTF8_IS_SUPER first, but I'm asserting 
that an optimizing compiler will see that any call to 
is_utf8_char_slow() is pointless, and will optimize it out.
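
The two orderings discussed above can be sketched roughly as follows. This is a simplified model, not the Perl core source: every function name here is a hypothetical stand-in (starts_above_unicode() for UTF8_IS_SUPER, extended_char_len_slow() for is_utf8_char_slow(), loose_char_len() for isUTF8_CHAR()), and the sketch ignores surrogates and noncharacters. The slow path only counts how often it is reached, so the difference between the two orderings is observable. It says nothing about what a given optimizing compiler actually does with the original ordering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static unsigned slow_calls;  /* how many times the slow path was reached */

/* Hypothetical stand-in for is_utf8_char_slow(): would parse extended
 * sequences for code points >= U+200000. Here it only records the call
 * and accepts nothing. */
static size_t extended_char_len_slow(const uint8_t *s, size_t len)
{
    (void)s; (void)len;
    slow_calls++;
    return 0;
}

/* Hypothetical stand-in for UTF8_IS_SUPER: nonzero if the sequence at s
 * can only encode a code point above U+10FFFF. */
static int starts_above_unicode(const uint8_t *s, size_t len)
{
    assert(len > 0);
    if (s[0] >= 0xF5) return 1;
    return s[0] == 0xF4 && len > 1 && s[1] >= 0x90 && s[1] <= 0xBF;
}

/* Loose check: 1- to 4-byte sequences are handled inline; start bytes
 * 0xF8 and up go to the slow path. Returns the byte length, or 0 if the
 * sequence is malformed. */
static size_t loose_char_len(const uint8_t *s, size_t len)
{
    assert(len > 0);
    uint8_t b = s[0];
    size_t need;

    if (b < 0x80) return 1;
    if (b >= 0xF8) return extended_char_len_slow(s, len);
    if (b >= 0xF0) need = 4;
    else if (b >= 0xE0) need = 3;
    else if (b >= 0xC2) need = 2;
    else return 0;  /* stray continuation byte or overlong 2-byte start */

    if (len < need) return 0;
    for (size_t i = 1; i < need; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;
    if (b == 0xE0 && s[1] < 0xA0) return 0;  /* overlong 3-byte form */
    if (b == 0xF0 && s[1] < 0x90) return 0;  /* overlong 4-byte form */
    return need;
}

/* Ordering described in the thread: loose check first, then the
 * above-Unicode test. */
static size_t strict_char_len_original(const uint8_t *s, size_t len)
{
    size_t n = loose_char_len(s, len);
    if (n == 0 || starts_above_unicode(s, len)) return 0;
    return n;
}

/* Tweaked ordering: reject above-Unicode input before the loose check,
 * so the slow path is never reached in strict mode. */
static size_t strict_char_len_reordered(const uint8_t *s, size_t len)
{
    if (starts_above_unicode(s, len)) return 0;
    return loose_char_len(s, len);
}
```

Both functions return the same result for every input in this model; they differ only in whether the slow path is visited on the way to rejecting an above-Unicode sequence such as F8 88 80 80 80 (U+200000 in the extended encoding).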
