Re: Encode UTF-8 optimizations

Started by: pali@cpan.org
First post: 2016-08-12 17:31 +0200
Last post: 2016-11-01 10:53 +0100
Articles 5 on this page of 25 — 3 participants


Contents

  Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-12 17:31 +0200
    Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-18 23:06 -0600
      Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-19 10:42 +0200
        Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-20 19:10 -0600
          Re: Encode UTF-8 optimizations pagaltzis@gmx.de (Aristotle Pagaltzis) - 2016-08-21 04:33 +0200
            Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-20 20:55 -0600
          Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-21 10:34 +0200
            Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-21 08:49 -0600
              Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 15:05 +0200
                Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 13:43 -0600
                  Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 22:47 +0200
                    Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 15:19 -0600
                      Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-22 15:38 -0600
                        Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 23:45 +0200
                      Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-22 23:39 +0200
                    Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-24 22:49 -0600
                      Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-25 09:48 +0200
                        Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-29 09:00 -0600
                          Re: Encode UTF-8 optimizations pali@cpan.org - 2016-08-31 23:43 +0200
                            Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-08-31 21:27 -0600
                              Re: Encode UTF-8 optimizations pali@cpan.org - 2016-09-01 09:30 +0200
                                Re: Encode UTF-8 optimizations pali@cpan.org - 2016-09-25 12:06 +0200
                                  Re: Encode UTF-8 optimizations public@khwilliamson.com (Karl Williamson) - 2016-09-25 10:49 -0600
                                    Re: Encode UTF-8 optimizations pali@cpan.org - 2016-10-27 10:25 +0200
                                      Re: Encode UTF-8 optimizations pali@cpan.org - 2016-11-01 10:53 +0100


#224

From: pali@cpan.org
Date: 2016-09-01 09:30 +0200
Message-ID: <20160901073008.GA11865@pali>
In reply to: #223

On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote:
> On 08/31/2016 03:43 PM, pali@cpan.org wrote:
> >On Monday 29 August 2016 17:00:00 Karl Williamson wrote:
> >>If you'd be willing to test this out, especially the performance
> >>parts that would be great!
> >[snip]
> >>There are 2 experimental performance commits.  If you want to see if
> >>they actually improve performance by doing a before/after compare
> >>that would be nice.
> >
> >So here are my results:
> >
> >strict = bless({strict_utf8 => 1}, "Encode::utf8")->encode_xs/decode_xs
> >lax    = bless({strict_utf8 => 0}, "Encode::utf8")->encode_xs/decode_xs
> >int    = utf8::encode/decode
> >
> >all    = join "", map { chr } 0 .. 0x10FFFF
> >short  = "žluťoučký kůň pěl ďábelské ódy " x 45
> >long   = $short x 1000
> >ishort = "\xA0" x 1000
> >ilong  = "\xA0" x 1000000
> >
> >your   = 9c03449800417dd02cc1af613951a1002490a52a
> >orig   = f16e7fa35c1302aa056db5d8d022b7861c1dd2e8
> >my     = orig without c8247c27c13d1cf152398e453793a91916d2185d
> >your1  = your without b65e9a52d8b428146ee554d724b9274f8e77286c
> >your2  = your without 9ccc3ecd1119ccdb64e91b1f03376916aa8cc6f7
> >
> >
> >decode
> >                          all            ilong          ishort         long           short
> >    my: - int           285.94/s     14988.61/s   4694109.54/s       704.15/s    599678.93/s
> >  orig: - int           292.41/s     15121.98/s   4782883.50/s       494.33/s    553182.28/s
> > your1: - int           271.21/s     14232.25/s   4706722.93/s       599.68/s    554941.90/s
> > your2: - int           280.85/s     14090.33/s   4210573.40/s       593.93/s    558487.86/s
> >  your: - int           283.23/s     15121.98/s   4500252.51/s       691.95/s    678859.55/s
> >
> >                          all            ilong          ishort         long           short
> >    my: - lax            83.28/s       202.22/s    142049.67/s       181.82/s    163352.41/s
> >  orig: - lax            53.49/s       201.58/s    152422.11/s       147.13/s    133974.37/s
> > your1: - lax           255.13/s        53.75/s     47590.82/s       560.34/s    431447.77/s
> > your2: - lax           281.71/s        48.41/s     43260.19/s       634.16/s    445365.29/s
> >  your: - lax           286.96/s        46.35/s     42848.40/s       632.20/s    442546.52/s
> >
> >                          all            ilong          ishort         long           short
> >    my: - strict         90.48/s       200.00/s    143081.15/s       197.53/s    175800.00/s
> >  orig: - strict         49.21/s       202.22/s    149447.34/s       142.81/s    128290.63/s
> > your1: - strict        154.94/s        48.16/s     44237.93/s       191.36/s    169228.16/s
> > your2: - strict        158.75/s        40.06/s     37244.06/s       195.95/s    173588.68/s
> >  your: - strict        158.26/s        38.54/s     36898.14/s       195.95/s    172504.61/s
> >
> >
> >encode
> >                          all            ilong          ishort         long           short
> >    my: - int       5197722.67/s   5227338.26/s   5210583.97/s   5163520.62/s   5227338.26/s
> >  orig: - int       5449888.54/s   5381336.48/s   5370254.05/s   5449888.54/s   5301624.60/s
> > your1: - int       5244200.62/s   5293830.28/s   5277183.02/s   5361483.07/s   5260640.13/s
> > your2: - int       5435994.67/s   5432587.30/s   5398312.30/s   5487602.22/s   5606457.74/s
> >  your: - int       5261172.17/s   5327441.90/s   5310582.91/s   5310582.91/s   5361483.07/s
> >
> >                          all            ilong          ishort         long           short
> >    my: - lax          2442.24/s     15084.08/s   2882995.00/s      7993.15/s   2716293.65/s
> >  orig: - lax          2438.39/s     15121.98/s   2933419.33/s      7965.22/s   2665521.81/s
> > your1: - lax          2229.94/s     14908.60/s   2117316.51/s      7428.89/s   2011133.75/s
> > your2: - lax          2400.92/s     15121.98/s   3046739.87/s      8065.41/s   2742961.18/s
> >  your: - lax          2368.00/s     15168.94/s   2862328.67/s      8090.85/s   2685694.50/s
> >
> >                          all            ilong          ishort         long           short
> >    my: - strict         92.16/s       204.81/s    157772.05/s       200.00/s    190344.59/s
> >  orig: - strict         49.04/s       202.22/s    160767.72/s       142.81/s    133548.90/s
> > your1: - strict        147.75/s        46.91/s     46095.57/s       194.36/s    176949.84/s
> > your2: - strict        159.25/s        40.19/s     38034.59/s       196.20/s    185166.45/s
> >  your: - strict        158.26/s        38.54/s     37012.73/s       196.20/s    186357.23/s
> >
> >
> >So looks like that experimental commits did not speed up encoder or decoder.
> >
> >What is relevant from these tests is that your patches slow down encoding
> >and decoding of illegal sequences like "\xA0" x 1000000 about 4-5 times.
> >
> 
> Thanks for your efforts.  Also relevant is that this speeds up validation
> under decode by over a factor of 5.

Yes, and that is really great!
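
The factors mentioned above can be checked against the quoted decode tables. A minimal sketch, with the rates copied from the "orig" and "your" rows; the helper name speedup() is made up for illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: how many times faster "your" is than "orig".
sub speedup {
    my ($orig, $your) = @_;
    return $your / $orig;
}

# Decode rates (iterations per second) copied from the quoted tables:
# [ label, orig, your ]
my @rows = (
    [ 'lax all',      53.49,  286.96 ],
    [ 'lax long',     147.13, 632.20 ],
    [ 'lax ilong',    201.58, 46.35  ],
    [ 'strict ilong', 202.22, 38.54  ],
);

for my $row (@rows) {
    my ($label, $orig, $your) = @$row;
    printf "%-13s your/orig = %.2f\n", $label, speedup($orig, $your);
}
```

With these figures the "all" and "long" inputs come out roughly 5.4 and 4.3 times faster under lax decoding, while "ilong" drops to about 0.23 (lax) and 0.19 (strict) of the original rate, i.e. the 4-5 times slowdown for illegal sequences.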

> Given that most inputs will be mostly
> valid, this outweighs the slowdown, and so I have pushed the
> non-experimental non-Encode-changes portions to blead.
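
For context, "\xA0" is a lone UTF-8 continuation byte, so the ishort and ilong inputs are malformed from the first byte to the last. A small sketch of what the decoders do with such input, using only the public utf8::decode and Encode::decode interfaces (an assumption made for brevity; the benchmark itself called the XS methods on hand-blessed objects):

```perl
use strict;
use warnings;
use Encode ();

my $ishort = "\xA0" x 1000;    # 1000 lone continuation bytes, as in the benchmark

# utf8::decode returns false and leaves the scalar unchanged on malformed input.
my $copy = $ishort;
my $ok   = utf8::decode($copy);

# With the default CHECK of 0, Encode substitutes U+FFFD for malformed data.
my $decoded = Encode::decode('UTF-8', $ishort);
my $fffd    = () = $decoded =~ /\x{FFFD}/g;

printf "utf8::decode ok: %s, U+FFFD count: %d\n", ($ok ? 'yes' : 'no'), $fffd;
```

Every byte of such input takes the error path, which is why these two columns react so differently from the valid "short" and "long" inputs.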

It looks like those two experimental commits did not change performance,
at least for my test cases. I used gcc-4.6.3 on x86-64 with the default
cflags (so with -O2).
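
A comparison along these lines could be reproduced roughly as follows. This is a sketch, not the script used for the numbers above: it assumes the public encoding objects returned by Encode::find_encoding('UTF-8') and Encode::find_encoding('utf8') are acceptable stand-ins for the strict and lax objects in the legend, and it covers only the "short" decode case.

```perl
use strict;
use warnings;
use utf8;                       # the sample text is literal UTF-8 in the source
use Encode ();
use Benchmark qw(cmpthese);

# Assumption: public encoding objects instead of hand-blessed Encode::utf8 ones.
my $strict = Encode::find_encoding('UTF-8');   # strict UTF-8
my $lax    = Encode::find_encoding('utf8');    # lax utf8

my $short  = "žluťoučký kůň pěl ďábelské ódy " x 45;
my $octets = Encode::encode('UTF-8', $short);  # valid UTF-8 octets to decode

# Each sub decodes a fresh copy of the octets and returns the decoded string.
my %decoders = (
    strict => sub { my $in = $octets; $strict->decode($in) },
    lax    => sub { my $in = $octets; $lax->decode($in) },
    int    => sub { my $in = $octets; utf8::decode($in); $in },
);

# Run each decoder for at least one CPU second and print a comparison table.
cmpthese(-1, \%decoders);
```

Running the same script against two builds (before and after a commit) gives a before/after comparison; the absolute rates depend on the machine and compiler flags.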

> We may change Encode in blead too, since it already differs from cpan. I'll
> have to get Sawyer's opinion on that.  But the next step is for me to fix
> Devel::PPPort to handle the things that Encode needs, issue a pull request
> there, and after that is resolved issue an Encode PR.

In my opinion we should sync the Encode versions in blead and on CPAN.
Currently they differ to some degree, which can cause problems...

Anyway, I have some suggestions for changes to the warnings in the
Encode::utf8 package. If you have time, please look at them (I sent an
email) and tell me what you think... In my opinion that should be fixed
too, and I can prepare patches (after a decision is made).


#225

From: pali@cpan.org
Date: 2016-09-25 12:06 +0200
Message-ID: <201609251206.48719@pali>
In reply to: #224

On Thursday 01 September 2016 09:30:08 pali@cpan.org wrote:
> On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote:
> > We may change Encode in blead too, since it already differs from
> > cpan. I'll have to get Sawyer's opinion on that.  But the next
> > step is for me to fix Devel::PPPort to handle the things that
> > Encode needs, issue a pull request there, and after that is
> > resolved issue an Encode PR.

Hi! One month has passed. Have you made any progress in syncing the blead
and CPAN versions of Encode? Or do you need some help?

> In my opinion we should sync Encode version in blead and on cpan.
> Currently they are more or less different which can cause problems...
> 
> Anyway, I have some suggestions for changes about warnings in
> Encode::utf8 package. If you have time, please look at that (I sent
> email) and tell what do you think about it... In my opinion that
> should be fixed too and I can prepare patches (after decision will
> be made).


#226

From: public@khwilliamson.com (Karl Williamson)
Date: 2016-09-25 10:49 -0600
Message-ID: <27837e6d-8c52-45f8-5152-ee92987151ab@khwilliamson.com>
In reply to: #225

On 09/25/2016 04:06 AM, pali@cpan.org wrote:
> On Thursday 01 September 2016 09:30:08 pali@cpan.org wrote:
>> On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote:
>>> We may change Encode in blead too, since it already differs from
>>> cpan. I'll have to get Sawyer's opinion on that.  But the next
>>> step is for me to fix Devel::PPPort to handle the things that
>>> Encode needs, issue a pull request there, and after that is
>>> resolved issue an Encode PR.
>
> Hi! One month passed, do you have any progress in syncing blead and cpan
> Encode version? Or do you need some help?

I don't see any way to easily split up the tasks.  In the next 48 hours 
I will push to blead the changes for Encode to use.  In working on this, 
I've seen some other things I think should happen to blead to give XS 
writers all they need so they won't be tempted to get to such a low 
level as before, and introduce security bugs.  And I am working on this 
and expect to finish this coming week.  After this soaks in blead for a 
while, I'll issue a pull request so that all the tools are in 
Devel::PPPort.  At that time Encode can be sync'd.
>
>> In my opinion we should sync Encode version in blead and on cpan.
>> Currently they are more or less different which can cause problems...
>>
>> Anyway, I have some suggestions for changes about warnings in
>> Encode::utf8 package. If you have time, please look at that (I sent
>> email) and tell what do you think about it... In my opinion that
>> should be fixed too and I can prepare patches (after decision will
>> be made).
>


#227

From: pali@cpan.org
Date: 2016-10-27 10:25 +0200
Message-ID: <20161027082525.GA22794@pali>
In reply to: #226

On Sunday 25 September 2016 10:49:41 Karl Williamson wrote:
> On 09/25/2016 04:06 AM, pali@cpan.org wrote:
> >On Thursday 01 September 2016 09:30:08 pali@cpan.org wrote:
> >>On Wednesday 31 August 2016 21:27:37 Karl Williamson wrote:
> >>>We may change Encode in blead too, since it already differs from
> >>>cpan. I'll have to get Sawyer's opinion on that.  But the next
> >>>step is for me to fix Devel::PPPort to handle the things that
> >>>Encode needs, issue a pull request there, and after that is
> >>>resolved issue an Encode PR.
> >
> >Hi! One month passed, do you have any progress in syncing blead and cpan
> >Encode version? Or do you need some help?
> 
> I don't see any way to easily split up the tasks.  In the next 48 hours I
> will push to blead the changes for Encode to use.  In working on this, I've
> seen some other things I think should happen to blead to give XS writers all
> they need so they won't be tempted to get to such a low level as before,and
> introduce security bugs.  And I am working on this and expect to finish this
> coming week.  After this soaks in blead for a while, I'll issue a pull
> request so that all the tools are in Devel::PPPort.  At that time Encode can
> be sync'd.

Hi! I have sent my changes and fixes for Encode upstream on GitHub.
They include fixes for more crashes reported on rt.cpan.org.

I think that those crash fixes should also be included in blead, as
processing untrusted data (e.g. prepared by an attacker) leads to a crash
of the whole perl...


#228

From: pali@cpan.org
Date: 2016-11-01 10:53 +0100
Message-ID: <20161101095304.GA8928@pali>
In reply to: #227

Hi! A new Encode 2.87 with lots of fixes for Encode.xs and
Encode::MIME::Header has been released. Can you sync/import it into blead?
