| Newsgroups | perl.unicode |
|---|---|
| Subject | Re: Encode UTF-8 optimizations |
| References | <201608121731.32716@pali> <20160819084221.GA5236@atrey.karlin.mff.cuni.cz> <bcfddfeb-5045-95b8-b283-7a243542bcab@khwilliamson.com> <201608211034.35281@pali> |
| Message-ID | <5ddbd58f-4d4c-c58e-71d3-188f46a4e052@khwilliamson.com> (permalink) |
| Date | 2016-08-21 08:49 -0600 |
| From | public@khwilliamson.com (Karl Williamson) |

On 08/21/2016 02:34 AM, pali@cpan.org wrote:
> On Sunday 21 August 2016 03:10:40 Karl Williamson wrote:
>> Top posting.
>>
>> Attached is my alternative patch. It effectively uses a different
>> algorithm to avoid decoding the input into code points, and to copy
>> all spans of valid input at once, instead of character at a time.
>>
>> And it uses only currently available functions.
>
> And that's the problem. As already wrote in previous email, calling
> function from shared library cannot be heavy optimized as inlined
> function and cause slow down. You are calling is_utf8_string_loc for
> non-strict mode which is not inlined and so encode/decode of non-strict
> mode will be slower...
>
> And also in is_strict_utf8_string_loc you are calling isUTF8_CHAR which
> is calling _is_utf8_char_slow and which is calling utf8n_to_uvchr which
> cannot be inlined too...
>
> Therefore I think this is not good approach...
>

Then you should run your benchmarks to find out the performance.

On valid input, is_utf8_string_loc() is called once per string. The function call overhead and non-inlining should be not noticeable.

On valid input, is_utf8_char_slow() is never called. The used-parts can be inlined.

On invalid input, performance should be a minor consideration.

The inner loop is much tighter in both functions; likely it can be held in the cache. The algorithm avoids a bunch of work compared to the previous one. I doubt that it will be slower than that. The only way to know in any performance situation is to actually test. And know that things will be different depending on the underlying hardware, so only large differences are really significant.
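
For readers following the thread, the span-copy idea described in the message can be sketched roughly as follows. This is not the attached patch and does not use the Perl core API: `seq_len`, `valid_prefix_end`, and `copy_valid_spans` are hypothetical names, `valid_prefix_end` merely stands in for the role `is_utf8_string_loc()` plays in the discussion, and replacing each offending byte with U+FFFD is an assumed policy chosen only for illustration. The validator here checks strict Unicode UTF-8 (no overlongs, no surrogates, nothing above U+10FFFF).

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length (1..4) of the well-formed UTF-8 sequence
 * starting at s, or 0 if it is malformed or truncated before end. */
static size_t seq_len(const unsigned char *s, const unsigned char *end)
{
    unsigned char c = s[0];
    size_t need, i;

    if (c < 0x80) return 1;                      /* ASCII */
    if (c >= 0xC2 && c <= 0xDF) need = 2;
    else if (c >= 0xE0 && c <= 0xEF) need = 3;
    else if (c >= 0xF0 && c <= 0xF4) need = 4;
    else return 0;                               /* stray continuation, C0/C1, F5..FF */

    if ((size_t)(end - s) < need) return 0;      /* truncated sequence */
    for (i = 1; i < need; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;     /* not a continuation byte */

    if (c == 0xE0 && s[1] < 0xA0) return 0;      /* overlong 3-byte form */
    if (c == 0xED && s[1] > 0x9F) return 0;      /* UTF-16 surrogates */
    if (c == 0xF0 && s[1] < 0x90) return 0;      /* overlong 4-byte form */
    if (c == 0xF4 && s[1] > 0x8F) return 0;      /* above U+10FFFF */
    return need;
}

/* Hypothetical stand-in for a "validate and report location" call:
 * returns a pointer to the first byte that is not part of a valid
 * sequence, or end if the whole buffer is valid. No code point is
 * ever decoded; only byte ranges are checked. */
static const unsigned char *valid_prefix_end(const unsigned char *s,
                                             const unsigned char *end)
{
    while (s < end) {
        size_t n = seq_len(s, end);
        if (n == 0) break;
        s += n;
    }
    return s;
}

/* Copy src to dst one valid span at a time. On fully valid input this
 * is one validation pass plus one memcpy. Each offending byte becomes
 * U+FFFD (assumed policy). dst needs room for 3 * len bytes. */
static size_t copy_valid_spans(const unsigned char *src, size_t len,
                               unsigned char *dst)
{
    const unsigned char *s = src, *end = src + len;
    unsigned char *d = dst;

    while (s < end) {
        const unsigned char *bad = valid_prefix_end(s, end);
        memcpy(d, s, (size_t)(bad - s));         /* whole valid span at once */
        d += bad - s;
        s = bad;
        if (s < end) {                           /* stopped on a bad byte */
            memcpy(d, "\xEF\xBF\xBD", 3);        /* U+FFFD in UTF-8 */
            d += 3;
            s++;                                 /* skip it and resume */
        }
    }
    return (size_t)(d - dst);
}
```

In this sketch the per-character work is confined to the tight loop in `valid_prefix_end`, and the slower error handling runs only when invalid input is actually present, which matches the shape of the argument in the message. Whether it is faster in practice would still have to be measured.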
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-12 17:31 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-18 23:06 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-19 10:42 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-20 19:10 -0600
Re: Encode UTF-8 optimizations pagaltzis@gmx.de (Aristotle Pagaltzis) - 2016-08-21 04:33 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-20 20:55 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-21 10:34 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-21 08:49 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 15:05 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 13:43 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 22:47 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 15:19 -0600
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 15:38 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 23:45 +0200
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 23:39 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-24 22:49 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-25 09:48 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-29 09:00 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-31 23:43 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-31 21:27 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-09-01 09:30 +0200
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-09-25 12:06 +0200
Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-09-25 10:49 -0600
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-10-27 10:25 +0200
Re: Encode UTF-8 optimizations pali@cpan.org - 2016-11-01 10:53 +0100