| Newsgroups | perl.unicode |
|---|---|
| Subject | Encode UTF-8 optimizations |
| Date | 2016-07-10 01:12 +0200 |
| Message-ID | <201607100112.45201@pali> (permalink) |
| From | pali@cpan.org |
Hi! As we know, utf8::encode() does not produce strictly correct UTF-8
(it happily encodes surrogates and code points above U+10FFFF), so
Encode::encode("UTF-8", ...) should be used instead. Likewise, files
should be opened with the :encoding(UTF-8) layer instead of :utf8.
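To make the difference concrete, here is a minimal sketch (not part of the patches) that encodes a lone surrogate both ways. The choice of U+D800 as the test character is only an illustration, and the expected bytes assume Encode's default CHECK value of 0, where the strict encoder substitutes U+FFFD:

```perl
use strict;
use warnings;
use Encode ();

# A lone surrogate is not a valid Unicode scalar value, so strict UTF-8
# must not contain it. (U+D800 is just an illustrative choice.)
my $char = chr(0xD800);

# utf8::encode() works in place, so operate on a copy.
my $lax = $char;
utf8::encode($lax);

# Encode::encode() returns the octets; pass a copy so the original
# string stays untouched regardless of the CHECK value.
my $copy   = $char;
my $strict = Encode::encode("UTF-8", $copy);

printf "utf8::encode:          %s\n", unpack("H*", $lax);     # eda080
printf "Encode::encode(UTF-8): %s\n", unpack("H*", $strict);  # efbfbd
```

The lax result is the surrogate written out byte for byte, which a strict UTF-8 decoder on the receiving side should reject.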
But the strict UTF-8 implementation in the Encode module is horribly slow
compared to utf8::encode(). It is implemented in Encode.xs, and for
benchmarking the XS implementation can be called directly:
use Encode;
my $output = Encode::utf8::encode_xs({strict_utf8 => 1}, $input);
(without the overhead of the Encode module...)
Here are my results for a 160-byte input string:
Encode::utf8::encode_xs({strict_utf8 => 1}, ...): 8 wallclock secs ( 8.56 usr + 0.00 sys = 8.56 CPU) @ 467289.72/s (n=4000000)
Encode::utf8::encode_xs({strict_utf8 => 0}, ...): 1 wallclock secs ( 1.66 usr + 0.00 sys = 1.66 CPU) @ 2409638.55/s (n=4000000)
utf8::encode: 1 wallclock secs ( 0.39 usr + 0.00 sys = 0.39 CPU) @ 10256410.26/s (n=4000000)
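For anyone who wants to reproduce numbers like these, a harness along the following lines should do. It is only a sketch: the input string is hypothetical (the actual test string is not given here), the iteration count is reduced, and the direct encode_xs call is guarded because that entry point may not exist in every Encode release.

```perl
use strict;
use warnings;
use Benchmark qw(timethese);
use Encode ();

# Hypothetical input: 80 two-byte characters, i.e. 160 bytes once encoded.
my $input = "\x{10D}" x 80;

my %tests = (
    # Public API, including the overhead of the Encode module.
    'Encode::encode("UTF-8", ...)' => sub {
        my $in  = $input;
        my $out = Encode::encode("UTF-8", $in);
    },
    # utf8::encode() modifies its argument, so copy on every iteration.
    'utf8::encode' => sub {
        my $in = $input;
        utf8::encode($in);
    },
);

# Direct XS call as written in the post; skipped if this Encode
# version does not define it.
if (defined &Encode::utf8::encode_xs) {
    for my $strict (1, 0) {
        $tests{"Encode::utf8::encode_xs({strict_utf8 => $strict}, ...)"} = sub {
            my $in  = $input;
            my $out = Encode::utf8::encode_xs({ strict_utf8 => $strict }, $in);
        };
    }
}

# Iteration count from the command line, defaulting to a quick run.
my $count = shift // 100_000;
timethese($count, \%tests);
```

Every test copies the input first, so the copy cost is the same in all rows and only the encoding step differs.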
I found two bottlenecks (the slow sv_catpv* and utf8n_to_uvuni functions)
and made some optimizations. The final results are:
Encode::utf8::encode_xs({strict_utf8 => 1}, ...): 2 wallclock secs ( 3.27 usr + 0.00 sys = 3.27 CPU) @ 1223241.59/s (n=4000000)
Encode::utf8::encode_xs({strict_utf8 => 0}, ...): 1 wallclock secs ( 1.68 usr + 0.00 sys = 1.68 CPU) @ 2380952.38/s (n=4000000)
utf8::encode: 1 wallclock secs ( 0.40 usr + 0.00 sys = 0.40 CPU) @ 10000000.00/s (n=4000000)
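To put the two sets of numbers side by side, the ratios can be worked out from the CPU times reported above. This is plain arithmetic on those figures, not a new measurement, and the hash keys are just labels chosen for this sketch:

```perl
use strict;
use warnings;

# CPU seconds for n=4000000 iterations, copied from the Benchmark output above.
my %cpu = (
    strict_before => 8.56,
    strict_after  => 3.27,
    utf8_before   => 0.39,
    utf8_after    => 0.40,
);

my $speedup    = $cpu{strict_before} / $cpu{strict_after};
my $gap_before = $cpu{strict_before} / $cpu{utf8_before};
my $gap_after  = $cpu{strict_after}  / $cpu{utf8_after};

printf "strict encoder speedup:      %.1fx\n", $speedup;     # 2.6x
printf "gap to utf8::encode, before: %.1fx\n", $gap_before;  # 21.9x
printf "gap to utf8::encode, after:  %.1fx\n", $gap_after;   # 8.2x
```

So the strict path gets roughly 2.6 times faster but remains about 8 times slower than utf8::encode(), while the non-strict path is essentially unchanged.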
The patches are on GitHub in this pull request:
https://github.com/dankogai/p5-encode/pull/56
I would appreciate it if somebody could review my patches and tell me
whether this is the right way to do these optimizations...