| Newsgroups | perl.unicode |
|---|---|
| Subject | Re: Encode UTF-8 optimizations |
| Date | 2016-08-22 23:39 +0200 |
| References | <201608121731.32716@pali> <201608222247.46077@pali> <4e654c06-2b51-31a8-c2ab-f98c4dcf421d@khwilliamson.com> |
| Message-ID | <201608222339.51155@pali> |
| From | pali@cpan.org |
(this only applies for strict UTF-8)
On Monday 22 August 2016 23:19:51 Karl Williamson wrote:
> The code could be tweaked to call UTF8_IS_SUPER first, but I'm
> asserting that an optimizing compiler will see that any call to
> is_utf8_char_slow() is pointless, and will optimize it out.
Such an optimization cannot be done, and the compiler cannot know any such thing...
You have this code:
+    const STRLEN char_len = isUTF8_CHAR(x, send);
+
+    if (   UNLIKELY(! char_len)
+        || (   UNLIKELY(isUTF8_POSSIBLY_PROBLEMATIC(*x))
+            && (   UNLIKELY(UTF8_IS_SURROGATE(x, send))
+                || UNLIKELY(UTF8_IS_SUPER(x, send))
+                || UNLIKELY(UTF8_IS_NONCHAR(x, send)))))
+    {
+        *ep = x;
+        return FALSE;
+    }
Here the isUTF8_CHAR() macro will call the function is_utf8_char_slow()
whenever the condition IS_UTF8_CHAR_FAST(UTF8SKIP(x)) is false. And because
is_utf8_char_slow() is an external library function, the compiler has
absolutely no idea what that function does. In a non-functional language
like C such a function could have side effects, so the compiler really
cannot eliminate that call.
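To illustrate, here is a minimal sketch, not the real Perl code. All names in it (opaque_char_len, always_rejected, strict_ok) are hypothetical stand-ins, and the volatile function pointer only simulates a call into another library whose body the compiler cannot see. The result of the call does not affect the outcome for a 5-byte lead, yet the call still has to happen:

```c
#include <stddef.h>

/* Counts calls so that the side effect of the helper is observable. */
unsigned long opaque_calls = 0;

/* Hypothetical stand-in for an external helper such as is_utf8_char_slow(). */
size_t opaque_char_len_impl(const unsigned char *s, const unsigned char *e)
{
    opaque_calls++;                 /* side effect the compiler must keep */
    return (size_t)(e - s);
}

/* Calling through a volatile pointer hides the callee from the optimizer,
 * much like a function that lives in a separate library. */
size_t (*volatile opaque_char_len)(const unsigned char *,
                                   const unsigned char *) = opaque_char_len_impl;

/* Assumption for this sketch: lead bytes >= 0xF8 are always rejected. */
int always_rejected(const unsigned char *s)
{
    return *s >= 0xF8;
}

/* Same shape as the quoted patch: the opaque call comes first, then the
 * check that would have rejected the input anyway. */
int strict_ok(const unsigned char *s, const unsigned char *e)
{
    const size_t len = opaque_char_len(s, e);

    if (len == 0 || always_rejected(s))
        return 0;
    return 1;
}
```

For an input starting with 0xF8, strict_ok() returns 0 regardless of what the helper returned, but the call counter still goes up by one.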
Moving UTF8_IS_SUPER before isUTF8_CHAR might help, but I'm sceptical
that gcc really can propagate the constants from the PL_utf8skip[] array
back and prove that IS_UTF8_CHAR_FAST must always be true whenever
UTF8_IS_SUPER is false...
Instead, add an IS_UTF8_CHAR_FAST(UTF8SKIP(s)) check (or similar) before
the isUTF8_CHAR() call. That should completely avoid generating code that
calls the is_utf8_char_slow() function.
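A rough sketch of that idea follows, using simplified hypothetical helpers (seq_len, validate_fast, validate_slow, FAST_MAX_LEN) rather than the real Perl macros. It relies on the fact that strict UTF-8 never uses sequences longer than 4 bytes. validate_fast() here only checks continuation bytes and is not a complete validator:

```c
#include <stddef.h>

#define FAST_MAX_LEN 4      /* strict UTF-8 sequences are at most 4 bytes */

unsigned long slow_calls = 0;

/* Simplified, hypothetical length table lookup keyed on the lead byte. */
size_t seq_len(unsigned char lead)
{
    if (lead < 0x80) return 1;
    if (lead < 0xC0) return 0;      /* continuation byte, not a start */
    if (lead < 0xE0) return 2;
    if (lead < 0xF0) return 3;
    if (lead < 0xF8) return 4;
    return 5;                       /* 5 or more: outside the fast range */
}

/* Hypothetical fast path: only verifies the continuation bytes. */
size_t validate_fast(const unsigned char *s, size_t len)
{
    size_t i;
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80)
            return 0;
    return len;
}

/* Hypothetical slow path, standing in for an external helper. */
size_t validate_slow(const unsigned char *s, const unsigned char *e)
{
    (void)s; (void)e;
    slow_calls++;
    return 0;
}

/* Generic shape: falls back to the slow helper for long sequences. */
size_t char_len_generic(const unsigned char *s, const unsigned char *e)
{
    size_t len;
    if (s >= e) return 0;
    len = seq_len(*s);
    if (len == 0 || (size_t)(e - s) < len) return 0;
    return len <= FAST_MAX_LEN ? validate_fast(s, len) : validate_slow(s, e);
}

/* Strict shape: the length guard comes first, so no slow call is reachable. */
size_t char_len_strict(const unsigned char *s, const unsigned char *e)
{
    size_t len;
    if (s >= e) return 0;
    len = seq_len(*s);
    if (len == 0 || len > FAST_MAX_LEN) return 0;   /* explicit guard */
    if ((size_t)(e - s) < len) return 0;
    return validate_fast(s, len);
}
```

With the guard written out explicitly, the strict variant contains no call to the slow helper at all, so nothing depends on what the optimizer manages to prove.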
With UTF8_IS_SUPER alone, the binary code can still contain a branch that
will never be executed.