Re: Encode UTF-8 optimizations

Newsgroups perl.unicode
Subject Re: Encode UTF-8 optimizations
Date 2016-08-22 23:39 +0200
References <201608121731.32716@pali> <201608222247.46077@pali> <4e654c06-2b51-31a8-c2ab-f98c4dcf421d@khwilliamson.com>
Message-ID <201608222339.51155@pali>
From pali@cpan.org


(This applies only to strict UTF-8.)

On Monday 22 August 2016 23:19:51 Karl Williamson wrote:
> The code could be tweaked to call UTF8_IS_SUPER first, but I'm
> asserting that an optimizing compiler will see that any call to
> is_utf8_char_slow() is pointless, and will optimize it out.

Such an optimization cannot be done; the compiler cannot know that...

You have this code:

+        const STRLEN char_len = isUTF8_CHAR(x, send);
+
+        if (    UNLIKELY(! char_len)
+            || (    UNLIKELY(isUTF8_POSSIBLY_PROBLEMATIC(*x))
+                && (   UNLIKELY(UTF8_IS_SURROGATE(x, send))
+                    || UNLIKELY(UTF8_IS_SUPER(x, send))
+                    || UNLIKELY(UTF8_IS_NONCHAR(x, send)))))
+        {
+            *ep = x;
+            return FALSE;
+        }

Here the isUTF8_CHAR() macro calls the function is_utf8_char_slow() when
the condition IS_UTF8_CHAR_FAST(UTF8SKIP(x)) is false. And because
is_utf8_char_slow() is an external library function, the compiler has
absolutely no idea what that function does. Outside a purely functional
world such a function could have side effects, so the compiler really
cannot eliminate that call.
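
To make the dispatch concrete, here is a minimal, simplified model of it. Every
name below (MODEL_CHAR_FAST, model_skip, model_char_slow, ...) is a hypothetical
stand-in, not Perl's real macro or function, and the sketch assumes the slow
function is reached only when the sequence length exceeds what the fast path
handles. The call counter plays the role of the side effect the compiler has to
assume for a function it cannot see into:

```c
#include <stddef.h>

/* Hypothetical model, NOT Perl's real definitions. */
#define MODEL_CHAR_FAST(n) ((n) <= 4)

/* Observable side effect of the slow path. For a function defined in another
 * translation unit the compiler must assume effects like this may exist. */
static unsigned long model_slow_calls = 0;

/* Sequence length implied by the start byte (extended, Perl-style scheme). */
static size_t model_skip(unsigned char b)
{
    if (b < 0xC0) return 1;   /* ASCII, or a stray continuation byte */
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    if (b < 0xFC) return 5;
    if (b < 0xFE) return 6;
    return b == 0xFE ? 7 : 13;
}

/* Shape check only: a plausible start byte followed by len-1 continuation
 * bytes. Full overlong detection is omitted to keep the sketch short. */
static size_t model_check(const unsigned char *s, size_t len)
{
    size_t i;
    if (s[0] < 0xC2) return 0;
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;
    return len;
}

/* Stand-in for the out-of-line slow validator. */
size_t model_char_slow(const unsigned char *s, const unsigned char *e)
{
    size_t len = model_skip(*s);
    model_slow_calls++;               /* the call is observable */
    if ((size_t)(e - s) < len) return 0;
    return model_check(s, len);
}

/* Simplified shape of the dispatch: fast path for short sequences,
 * function call for everything else. */
size_t model_is_char(const unsigned char *s, const unsigned char *e)
{
    size_t len;
    if (e <= s) return 0;
    if (*s < 0x80) return 1;
    len = model_skip(*s);
    if ((size_t)(e - s) < len) return 0;
    return MODEL_CHAR_FAST(len) ? model_check(s, len)
                                : model_char_slow(s, e);
}
```

A 2-byte sequence never touches the counter, while a 5-byte one does. Because
the result feeds the later checks and the callee may have effects, the call
cannot simply be dropped by the optimizer.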

Moving UTF8_IS_SUPER before isUTF8_CHAR might help, but I'm sceptical
that gcc can really propagate the constant from the PL_utf8skip[] array
back and prove that IS_UTF8_CHAR_FAST must always be true once
UTF8_IS_SUPER has been ruled out...

Better to add an IS_UTF8_CHAR_FAST(UTF8SKIP(x)) check (or similar) before
the isUTF8_CHAR() call. That should completely prevent the compiler from
generating code that calls the is_utf8_char_slow() function.
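
A rough sketch of that ordering, assuming strict UTF-8 where any sequence longer
than 4 bytes is rejected anyway. The names (SKETCH_CHAR_FAST, sketch_skip,
sketch_strict_char_len) are hypothetical and only illustrate the order of the
checks; this is not the actual Encode or perl core code, and the surrogate,
above-Unicode and non-character checks would still have to follow:

```c
#include <stddef.h>

/* Hypothetical names; only the ordering of the checks matters here. */
#define SKETCH_CHAR_FAST(n) ((n) <= 4)

/* Sequence length implied by the start byte; anything from 0xF8 upwards is
 * simply reported as "longer than 4" in this sketch. */
static size_t sketch_skip(unsigned char b)
{
    if (b < 0xC0) return 1;
    if (b < 0xE0) return 2;
    if (b < 0xF0) return 3;
    if (b < 0xF8) return 4;
    return 5;
}

/* Returns the sequence length, or 0 if the sequence is rejected. */
size_t sketch_strict_char_len(const unsigned char *s, const unsigned char *e)
{
    size_t len, i;

    if (e <= s) return 0;
    if (*s < 0x80) return 1;

    len = sketch_skip(*s);

    /* Guard first: reject long sequences before any dispatch, so this
     * function contains no path that could reach a slow out-of-line call. */
    if (!SKETCH_CHAR_FAST(len)) return 0;

    if ((size_t)(e - s) < len) return 0;   /* truncated input */
    if (*s < 0xC2) return 0;               /* stray continuation or C0/C1 */
    for (i = 1; i < len; i++)
        if ((s[i] & 0xC0) != 0x80) return 0;

    return len;
}
```

With the guard written out explicitly, nothing has to be inferred by the
optimizer: the slow call is absent from the source, so it cannot appear in the
generated code.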

With UTF8_IS_SUPER alone, the binary may still contain a branch that is
never taken.



