| Newsgroups | perl.unicode |
|---|---|
| Date | 2016-09-01 09:30 +0200 |
| Subject | Re: Encode UTF-8 optimizations |
| Message-ID | <20160901073008.GA11865@pali> |
| References | <201608121731.32716@pali> <20160825074833.GA27810@pali> <e3b394fe-77a8-51f8-7793-318f50a88f69@khwilliamson.com> <201608312343.43936@pali> <d5301b47-c970-54ad-144a-95c31655e828@khwilliamson.com> |
| From | pali@cpan.org |
On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote:
> On 08/31/2016 03:43 PM, pali@cpan.org wrote:
> >On Monday 29 August 2016 17:00:00 Karl Williamson wrote:
> >>If you'd be willing to test this out, especially the performance
> >>parts that would be great!
> >[snip]
> >>There are 2 experimental performance commits. If you want to see if
> >>they actually improve performance by doing a before/after compare
> >>that would be nice.
> >
> >So here are my results:
> >
> >strict = bless({strict_utf8 => 1}, "Encode::utf8")->encode_xs/decode_xs
> >lax = bless({strict_utf8 => 0}, "Encode::utf8")->encode_xs/decode_xs
> >int = utf8::encode/decode
> >
> >all = join "", map { chr } 0 .. 0x10FFFF
> >short = "žluťoučký kůň pěl ďábelské ódy " x 45
> >long = $short x 1000
> >ishort = "\xA0" x 1000
> >ilong = "\xA0" x 1000000
> >
> >your = 9c03449800417dd02cc1af613951a1002490a52a
> >orig = f16e7fa35c1302aa056db5d8d022b7861c1dd2e8
> >my = orig without c8247c27c13d1cf152398e453793a91916d2185d
> >your1 = your without b65e9a52d8b428146ee554d724b9274f8e77286c
> >your2 = your without 9ccc3ecd1119ccdb64e91b1f03376916aa8cc6f7
> >
> >
> >decode
> >                    all       ilong        ishort      long        short
> > my:    - int  285.94/s  14988.61/s  4694109.54/s  704.15/s  599678.93/s
> > orig:  - int  292.41/s  15121.98/s  4782883.50/s  494.33/s  553182.28/s
> > your1: - int  271.21/s  14232.25/s  4706722.93/s  599.68/s  554941.90/s
> > your2: - int  280.85/s  14090.33/s  4210573.40/s  593.93/s  558487.86/s
> > your:  - int  283.23/s  15121.98/s  4500252.51/s  691.95/s  678859.55/s
> >
> >                    all     ilong       ishort      long        short
> > my:    - lax   83.28/s  202.22/s  142049.67/s  181.82/s  163352.41/s
> > orig:  - lax   53.49/s  201.58/s  152422.11/s  147.13/s  133974.37/s
> > your1: - lax  255.13/s   53.75/s   47590.82/s  560.34/s  431447.77/s
> > your2: - lax  281.71/s   48.41/s   43260.19/s  634.16/s  445365.29/s
> > your:  - lax  286.96/s   46.35/s   42848.40/s  632.20/s  442546.52/s
> >
> >                       all     ilong       ishort      long        short
> > my:    - strict   90.48/s  200.00/s  143081.15/s  197.53/s  175800.00/s
> > orig:  - strict   49.21/s  202.22/s  149447.34/s  142.81/s  128290.63/s
> > your1: - strict  154.94/s   48.16/s   44237.93/s  191.36/s  169228.16/s
> > your2: - strict  158.75/s   40.06/s   37244.06/s  195.95/s  173588.68/s
> > your:  - strict  158.26/s   38.54/s   36898.14/s  195.95/s  172504.61/s
> >
> >
> >encode
> >                        all         ilong        ishort          long         short
> > my:    - int  5197722.67/s  5227338.26/s  5210583.97/s  5163520.62/s  5227338.26/s
> > orig:  - int  5449888.54/s  5381336.48/s  5370254.05/s  5449888.54/s  5301624.60/s
> > your1: - int  5244200.62/s  5293830.28/s  5277183.02/s  5361483.07/s  5260640.13/s
> > your2: - int  5435994.67/s  5432587.30/s  5398312.30/s  5487602.22/s  5606457.74/s
> > your:  - int  5261172.17/s  5327441.90/s  5310582.91/s  5310582.91/s  5361483.07/s
> >
> >                     all       ilong        ishort       long         short
> > my:    - lax  2442.24/s  15084.08/s  2882995.00/s  7993.15/s  2716293.65/s
> > orig:  - lax  2438.39/s  15121.98/s  2933419.33/s  7965.22/s  2665521.81/s
> > your1: - lax  2229.94/s  14908.60/s  2117316.51/s  7428.89/s  2011133.75/s
> > your2: - lax  2400.92/s  15121.98/s  3046739.87/s  8065.41/s  2742961.18/s
> > your:  - lax  2368.00/s  15168.94/s  2862328.67/s  8090.85/s  2685694.50/s
> >
> >                       all     ilong       ishort      long        short
> > my:    - strict   92.16/s  204.81/s  157772.05/s  200.00/s  190344.59/s
> > orig:  - strict   49.04/s  202.22/s  160767.72/s  142.81/s  133548.90/s
> > your1: - strict  147.75/s   46.91/s   46095.57/s  194.36/s  176949.84/s
> > your2: - strict  159.25/s   40.19/s   38034.59/s  196.20/s  185166.45/s
> > your:  - strict  158.26/s   38.54/s   37012.73/s  196.20/s  186357.23/s
> >
> >
> >So it looks like the experimental commits did not speed up the encoder or decoder.
> >
> >What is relevant from these tests is that your patches slow down encoding
> >and decoding of illegal sequences like "\xA0" x 1000000 by a factor of about 4-5.
> >
>
> Thanks for your efforts. Also relevant is that this speeds up validation
> under decode by over a factor of 5.
Yes, and that is really great!
> Given that most inputs will be mostly
> valid, this outweighs the slowdown, and so I have pushed the
> non-experimental non-Encode-changes portions to blead.
It looks like those two experimental commits did not change performance,
at least for my test cases. I used gcc-4.6.3 on x86-64 with the default
cflags (so with -O2).
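
The factors quoted in this exchange can be checked against the decode tables with a few lines of Perl. This is only a sketch: the rates are copied from the `orig` and `your` rows above, and `speedup` is a hypothetical helper name, not something from Encode or the patches.

```perl
use strict;
use warnings;

# Rates in iterations per second, copied from the decode tables in this message.
my %rate = (
    lax    => { orig => { all =>  53.49, ilong => 201.58 },
                your => { all => 286.96, ilong =>  46.35 } },
    strict => { orig => { all =>  49.21, ilong => 202.22 },
                your => { all => 158.26, ilong =>  38.54 } },
);

# Hypothetical helper: how many times faster "your" is than "orig".
# A value below 1 means the patched build is slower for that input.
sub speedup {
    my ($mode, $input) = @_;
    return $rate{$mode}{your}{$input} / $rate{$mode}{orig}{$input};
}

for my $mode (qw(lax strict)) {
    for my $input (qw(all ilong)) {
        printf "%-6s %-5s your/orig = %.2f\n", $mode, $input, speedup($mode, $input);
    }
}
```

For lax decoding this gives a ratio a little above 5 on `all` (the "factor of 5" speedup) and roughly 0.2 on `ilong` (the 4-5 times slowdown on invalid input).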
> We may change Encode in blead too, since it already differs from cpan. I'll
> have to get Sawyer's opinion on that. But the next step is for me to fix
> Devel::PPPort to handle the things that Encode needs, issue a pull request
> there, and after that is resolved issue an Encode PR.
In my opinion we should sync the Encode versions in blead and on CPAN.
Currently they differ, which can cause problems...
Anyway, I have some suggestions for changes to the warnings in the
Encode::utf8 package. If you have time, please look at them (I sent an
email) and tell me what you think... In my opinion that should be fixed
too, and I can prepare patches once a decision has been made.
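
A comparison along the lines of the decode tests above could be sketched with the core Benchmark module. The script below is an assumption, not the script used for the numbers in this message: it goes through the public `Encode::decode` interface ("UTF-8" for strict, "utf8" for lax) instead of calling `decode_xs` on a blessed `Encode::utf8` object, so it also measures wrapper overhead, and it only covers the `short` and `ishort` inputs. `bench_decode` and `%decoder` are illustrative names.

```perl
use strict;
use warnings;
use utf8;                      # the string literal below contains non-ASCII text
use Benchmark qw(cmpthese);
use Encode ();

# Inputs modelled on "short" and "ishort" from the message.
my $short  = Encode::encode('UTF-8', "žluťoučký kůň pěl ďábelské ódy " x 45);
my $ishort = "\xA0" x 1000;    # invalid UTF-8: only continuation bytes

# Each decoder works on its own copy of the bytes, because utf8::decode
# modifies its argument in place.
my %decoder = (
    strict => sub { my $s = shift; Encode::decode('UTF-8', $s) },
    lax    => sub { my $s = shift; Encode::decode('utf8',  $s) },
    int    => sub { my $s = shift; utf8::decode($s); $s },
);

# Run every decoder on one input for at least $cpu_seconds of CPU time
# and print a rate comparison table.
sub bench_decode {
    my ($name, $bytes, $cpu_seconds) = @_;
    print "decode $name\n";
    my %tests = map {
        my $d = $decoder{$_};
        ($_ => sub { $d->($bytes) });
    } keys %decoder;
    cmpthese(-$cpu_seconds, \%tests);
}

bench_decode(short  => $short,  1);
bench_decode(ishort => $ishort, 1);
```

To compare two builds, the same script would be run once under each perl and the reported rates compared by hand, as in the tables above.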