Re: Encode UTF-8 optimizations

Newsgroups perl.unicode
Subject Re: Encode UTF-8 optimizations
References <201608121731.32716@pali> <20160819084221.GA5236@atrey.karlin.mff.cuni.cz> <bcfddfeb-5045-95b8-b283-7a243542bcab@khwilliamson.com> <201608211034.35281@pali>
Message-ID <5ddbd58f-4d4c-c58e-71d3-188f46a4e052@khwilliamson.com> (permalink)
Date 2016-08-21 08:49 -0600
From public@khwilliamson.com (Karl Williamson)



On 08/21/2016 02:34 AM, pali@cpan.org wrote:
> On Sunday 21 August 2016 03:10:40 Karl Williamson wrote:
>> Top posting.
>>
>> Attached is my alternative patch.  It effectively uses a different
>> algorithm to avoid decoding the input into code points, and to copy
>> all spans of valid input at once, instead of character at a time.
>>
>> And it uses only currently available functions.
>
> And that's the problem. As I already wrote in my previous email, a call
> to a function in a shared library cannot be optimized as heavily as an
> inlined function, and that causes a slowdown. You are calling
> is_utf8_string_loc for non-strict mode, which is not inlined, so
> encode/decode in non-strict mode will be slower...
>
> Also, in is_strict_utf8_string_loc you are calling isUTF8_CHAR, which
> calls _is_utf8_char_slow, which in turn calls utf8n_to_uvchr, and that
> cannot be inlined either...
>
> Therefore I think this is not a good approach...
>

Then you should run your benchmarks to measure the actual performance.

On valid input, is_utf8_string_loc() is called once per string.  The
function call overhead and the lack of inlining should not be noticeable.

On valid input, _is_utf8_char_slow() is never called.  The parts that
are used can be inlined.

On invalid input, performance should be a minor consideration.
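
The three points above can be sketched in C roughly as follows.  This is
only an illustration, not the attached patch: valid_prefix() is a
hypothetical stand-in for a validator such as is_utf8_string_loc() (it
applies strict Unicode rules, which the real non-strict function does
not), copy_valid_spans() and its "calls" counter are invented names, and
replacing each malformed byte with U+FFFD is an assumed policy.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical validator: returns 1 if all of s[0..len) is well-formed
 * UTF-8, else 0.  In both cases *ep points just past the valid prefix. */
static int valid_prefix(const unsigned char *s, size_t len,
                        const unsigned char **ep)
{
    const unsigned char *p = s, *end = s + len;
    while (p < end) {
        unsigned char c = *p;
        size_t n, i;
        if (c < 0x80) { p++; continue; }          /* ASCII fast path */
        else if (c >= 0xC2 && c <= 0xDF) n = 2;
        else if (c >= 0xE0 && c <= 0xEF) n = 3;
        else if (c >= 0xF0 && c <= 0xF4) n = 4;
        else break;                               /* invalid start byte */
        if ((size_t)(end - p) < n) break;         /* truncated sequence */
        for (i = 1; i < n; i++)
            if ((p[i] & 0xC0) != 0x80) break;     /* not a continuation */
        if (i < n) break;
        if (c == 0xE0 && p[1] < 0xA0) break;      /* overlong */
        if (c == 0xED && p[1] > 0x9F) break;      /* surrogate */
        if (c == 0xF0 && p[1] < 0x90) break;      /* overlong */
        if (c == 0xF4 && p[1] > 0x8F) break;      /* above U+10FFFF */
        p += n;
    }
    *ep = p;
    return p == end;
}

/* Copy src to dst one valid span at a time.  Each malformed byte is
 * replaced by U+FFFD (assumed policy).  dst needs room for 3 * len bytes.
 * *calls receives the number of validator calls made. */
static size_t copy_valid_spans(const unsigned char *src, size_t len,
                               unsigned char *dst, unsigned *calls)
{
    const unsigned char *p = src, *end = src + len;
    unsigned char *d = dst;
    assert(src && dst && calls);
    *calls = 0;
    while (p < end) {
        const unsigned char *e;
        int ok = valid_prefix(p, (size_t)(end - p), &e);
        (*calls)++;
        memcpy(d, p, (size_t)(e - p));            /* whole span at once */
        d += e - p;
        p = e;
        if (ok) break;
        memcpy(d, "\xEF\xBF\xBD", 3);             /* U+FFFD */
        d += 3;
        p++;                                      /* skip the bad byte */
    }
    return (size_t)(d - dst);
}
```

With fully valid input the loop body runs once, so there is one
validator call and one memcpy() per string.  Only malformed input causes
additional calls, one per malformation.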

The inner loop is much tighter in both functions; it can likely be held
in the cache.  The algorithm also avoids a bunch of work compared to the
previous one, so I doubt that it will be slower.  The only way to know
in any performance situation is to actually test.  Keep in mind that
results will differ depending on the underlying hardware, so only large
differences are really significant.
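
A minimal timing harness for that kind of test might look like the
sketch below.  Everything in it is an assumption made for illustration:
copy_per_byte() and copy_whole() are trivial placeholders for the two
implementations being compared (they are not the Encode code paths), and
best_seconds() is an invented helper that reports the best of several
trials in order to damp run-to-run noise.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Signature shared by the candidates under test (hypothetical). */
typedef size_t (*copy_fn)(const unsigned char *src, size_t len,
                          unsigned char *dst);

/* Placeholder candidate 1: handle the input one byte at a time. */
static size_t copy_per_byte(const unsigned char *src, size_t len,
                            unsigned char *dst)
{
    for (size_t i = 0; i < len; i++)
        dst[i] = src[i];
    return len;
}

/* Placeholder candidate 2: handle the whole span in one call. */
static size_t copy_whole(const unsigned char *src, size_t len,
                         unsigned char *dst)
{
    memcpy(dst, src, len);
    return len;
}

/* Keeps the compiler from discarding the calls being timed. */
static volatile size_t sink;

/* Run fn reps times per trial and return the smallest CPU time, in
 * seconds, seen over all trials. */
static double best_seconds(copy_fn fn, const unsigned char *src, size_t len,
                           unsigned char *dst, int reps, int trials)
{
    double best = -1.0;
    assert(trials > 0);
    for (int t = 0; t < trials; t++) {
        clock_t t0 = clock();
        for (int r = 0; r < reps; r++)
            sink += fn(src, len, dst);
        double s = (double)(clock() - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}
```

A comparison would call best_seconds() once per candidate on the same
buffer, ideally with several input sizes and on more than one machine,
and treat small differences as noise.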



Thread

Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-12 17:31 +0200
  Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-18 23:06 -0600
    Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-19 10:42 +0200
      Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-20 19:10 -0600
        Re: Encode UTF-8 optimizations pagaltzis@gmx.de (Aristotle Pagaltzis) - 2016-08-21 04:33 +0200
          Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-20 20:55 -0600
        Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-21 10:34 +0200
          Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-21 08:49 -0600
            Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 15:05 +0200
              Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 13:43 -0600
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 22:47 +0200
                Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 15:19 -0600
                Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 15:38 -0600
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 23:45 +0200
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 23:39 +0200
                Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-24 22:49 -0600
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-25 09:48 +0200
                Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-29 09:00 -0600
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-31 23:43 +0200
                Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-31 21:27 -0600
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-09-01 09:30 +0200
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-09-25 12:06 +0200
                Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-09-25 10:49 -0600
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-10-27 10:25 +0200
                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-11-01 10:53 +0100
