| Started by | Demian Brecht <demianbrecht@gmail.com> |
|---|---|
| First post | 2013-02-06 13:55 -0800 |
| Last post | 2013-02-07 23:44 +1100 |
| Articles | 19 — 8 participants |
Re: Curious to see alternate approach on a search/replace via regex Demian Brecht <demianbrecht@gmail.com> - 2013-02-06 13:55 -0800
Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-07 03:04 +0000
Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-06 19:31 -0800
Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-08 09:45 +1100
Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-07 15:13 -0800
Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-08 10:59 +1100
Re: Curious to see alternate approach on a search/replace via regex Ian Kelly <ian.g.kelly@gmail.com> - 2013-02-07 17:55 -0700
Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-08 14:02 +1100
Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-07 21:35 -0800
Re: Curious to see alternate approach on a search/replace via regex Ian Kelly <ian.g.kelly@gmail.com> - 2013-02-07 18:08 -0700
Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-07 21:57 -0800
Re: Curious to see alternate approach on a search/replace via regex Ian Kelly <ian.g.kelly@gmail.com> - 2013-02-08 01:21 -0700
Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-08 22:43 +1100
Re: Curious to see alternate approach on a search/replace via regex Ian Kelly <ian.g.kelly@gmail.com> - 2013-02-08 09:26 -0700
Re: Curious to see alternate approach on a search/replace via regex Serhiy Storchaka <storchaka@gmail.com> - 2013-02-15 22:58 +0200
Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-26 11:20 -0800
Re: Curious to see alternate approach on a search/replace via regex Dave Angel <davea@davea.name> - 2013-02-08 01:27 -0500
Re: Curious to see alternate approach on a search/replace via regex jmfauth <wxjmfauth@gmail.com> - 2013-02-07 03:08 -0800
Re: Curious to see alternate approach on a search/replace via regex Chris Angelico <rosuav@gmail.com> - 2013-02-07 23:44 +1100
| From | Demian Brecht <demianbrecht@gmail.com> |
|---|---|
| Date | 2013-02-06 13:55 -0800 |
| Subject | Re: Curious to see alternate approach on a search/replace via regex |
| Message-ID | <mailman.1426.1360187770.2939.python-list@python.org> |
Well, an alternative /could/ be:
from urlparse import urlparse
parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
print '%s%s_%s' % (parts.netloc.replace('.', '_'),
                   parts.path.replace('/', '_'),
                   parts.query.replace('&', '_').replace('=', '_')
                   )
Although with the result of:
alongnameofasite1234567_com_q_sports_run_a_1_b_1
1288 function calls in 0.004 seconds
Compared to regex method:
498 function calls (480 primitive calls) in 0.000 seconds
I'd prefer the regex method myself.
Demian Brecht
http://demianbrecht.github.com
On 2013-02-06 1:41 PM, "rh" <richard_hubbe11@lavabit.com> wrote:
>http://alongnameofasite1234567.com/q?sports=run&a=1&b=1
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-02-07 03:04 +0000 |
| Message-ID | <511319c7$0$21812$c3e8da3$76491128@news.astraweb.com> |
| In reply to | #38304 |
On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
> Well, an alternative /could/ be:
>
> from urlparse import urlparse
>
> parts =
> urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> print '%s%s_%s' % (parts.netloc.replace('.', '_'),
> parts.path.replace('/', '_'),
> parts.query.replace('&', '_').replace('=', '_') )
>
>
> Although with the result of:
>
> alongnameofasite1234567_com_q_sports_run_a_1_b_1
> 1288 function calls in 0.004 seconds
>
>
> Compared to regex method:
>
> 498 function calls (480 primitive calls) in 0.000 seconds
>
> I'd prefer the regex method myself.
I dispute those results. I think you are mostly measuring the time to
print the result, and I/O is quite slow. My tests show that using urlparse
is 33% faster than using regexes, and far more understandable and
maintainable.
py> from urlparse import urlparse
py> def mangle(url):
...     parts = urlparse(url)
...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
...         parts.path.replace('/', '_'),
...         parts.query.replace('&', '_').replace('=', '_')
...         )
...
py> import re
py> def u2f(u):
...     nx = re.compile(r'https?://(.+)$')
...     u = nx.search(u).group(1)
...     ux = re.compile(r'([-:./?&=]+)')
...     return ux.sub('_', u)
...
py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
py> assert u2f(s) == mangle(s)
py>
py> from timeit import Timer
py> setup = 'from __main__ import s, u2f, mangle'
py> t1 = Timer('mangle(s)', setup)
py> t2 = Timer('u2f(s)', setup)
py>
py> min(t1.repeat(repeat=7))
7.2962000370025635
py> min(t2.repeat(repeat=7))
10.981598854064941
py>
py> (10.98-7.29)/10.98
0.33606557377049184
(Timings done using Python 2.6 on my laptop -- your speeds may vary.)
--
Steven
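For readers on Python 3, the comparison above can be rerun roughly as follows. This is a sketch, not Steven's original session: it assumes `urllib.parse.urlparse` as the Python 3 replacement for the Python 2 `urlparse` module, and the `number` and `repeat` values are arbitrary choices made to keep the run short. Absolute timings will differ from the Python 2.6 figures quoted above.

```python
import re
from timeit import repeat
from urllib.parse import urlparse  # Python 3 location of Python 2's urlparse


def mangle(url):
    # urlparse-based version, as posted above
    parts = urlparse(url)
    return '%s%s_%s' % (
        parts.netloc.replace('.', '_'),
        parts.path.replace('/', '_'),
        parts.query.replace('&', '_').replace('=', '_'),
    )


def u2f(u):
    # regex-based version, as posted above (compiles inside the function)
    nx = re.compile(r'https?://(.+)$')
    u = nx.search(u).group(1)
    ux = re.compile(r'([-:./?&=]+)')
    return ux.sub('_', u)


s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
assert u2f(s) == mangle(s)

# Assumption: 20000 calls x 3 repeats, far fewer than timeit's default
# of 1,000,000 calls per repeat, so this finishes quickly.
best_mangle = min(repeat(lambda: mangle(s), number=20000, repeat=3))
best_u2f = min(repeat(lambda: u2f(s), number=20000, repeat=3))
print('mangle: %.4f s   u2f: %.4f s' % (best_mangle, best_u2f))
```

Which function wins may well depend on the Python version and machine, so the printed numbers are worth checking locally rather than assuming the 33% figure carries over.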
| From | rh <richard_hubbe11@lavabit.com> |
|---|---|
| Date | 2013-02-06 19:31 -0800 |
| Message-ID | <mailman.1437.1360207874.2939.python-list@python.org> |
| In reply to | #38328 |
On 07 Feb 2013 03:04:39 GMT
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
>
> > Well, an alternative /could/ be:
> >
> > from urlparse import urlparse
> >
> > parts =
> > urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> > print '%s%s_%s' % (parts.netloc.replace('.', '_'),
> > parts.path.replace('/', '_'),
> > parts.query.replace('&', '_').replace('=', '_') )
> >
> >
> > Although with the result of:
> >
> > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> > 1288 function calls in 0.004 seconds
> >
> >
> > Compared to regex method:
> >
> > 498 function calls (480 primitive calls) in 0.000 seconds
> >
> > I'd prefer the regex method myself.
>
> I dispute those results. I think you are mostly measuring the time to
> print the result, and I/O is quite slow. My tests show that using
> urlparse is 33% faster than using regexes, and far more
> understandable and maintainable.
>
>
> py> from urlparse import urlparse
> py> def mangle(url):
> ...     parts = urlparse(url)
> ...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
> ...         parts.path.replace('/', '_'),
> ...         parts.query.replace('&', '_').replace('=', '_')
> ...         )
> ...
> py> import re
> py> def u2f(u):
> ...     nx = re.compile(r'https?://(.+)$')
> ...     u = nx.search(u).group(1)
> ...     ux = re.compile(r'([-:./?&=]+)')
> ...     return ux.sub('_', u)
> ...
> py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
> py> assert u2f(s) == mangle(s)
> py>
> py> from timeit import Timer
> py> setup = 'from __main__ import s, u2f, mangle'
> py> t1 = Timer('mangle(s)', setup)
> py> t2 = Timer('u2f(s)', setup)
> py>
> py> min(t1.repeat(repeat=7))
> 7.2962000370025635
> py> min(t2.repeat(repeat=7))
> 10.981598854064941
> py>
> py> (10.98-7.29)/10.98
> 0.33606557377049184
>
>
> (Timings done using Python 2.6 on my laptop -- your speeds may vary.)
I am using 2.7.3 and I put the re.compile outside the function and it
performed faster than urlparse. I don't print out the data.
Fast
^
| compiled regex
| urlparse
| plain regex
| all-at-once search/replace with alternation
Slow
>
>
>
> --
> Steven
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-02-08 09:45 +1100 |
| Message-ID | <51142e96$0$6512$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #38329 |
rh wrote:
> I am using 2.7.3 and I put the re.compile outside the function and it
> performed faster than urlparse. I don't print out the data.
I find that hard to believe. re.compile caches its results, so except for
the very first time it is called, it is very fast -- basically a function
call and a dict lookup. I find it implausible that a micro-optimization
such as you describe could be responsible for speeding the code up by over
33%.
But since you don't demonstrate any actual working code, you could be
correct, or you could be timing it wrong. Without seeing your timing code,
my guess is that you are doing it wrong. Timing code is tricky, which is
why I always show my work. If I get it wrong, someone will hopefully tell
me. Otherwise, I might as well be making up the numbers.
--
Steven
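The caching described above can be observed directly. The sketch below relies on CPython behaviour (repeated `re.compile` calls with the same pattern and flags return the cached pattern object) rather than on anything the language guarantees; `PATTERN` and the other variable names are illustrative only.

```python
import re

PATTERN = r'([-:./?&=]+)'

re.purge()                      # start from an empty cache
first = re.compile(PATTERN)     # real compilation happens here
second = re.compile(PATTERN)    # CPython: served from the module-level cache
same_object = first is second

re.purge()                      # documented API: clears the regex cache
third = re.compile(PATTERN)     # compiled again from scratch
rebuilt = third is not first

print(same_object, rebuilt)
```

So the second call is cheap compared with a real compile, but it is still a function call plus a cache lookup on every invocation.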
| From | rh <richard_hubbe11@lavabit.com> |
|---|---|
| Date | 2013-02-07 15:13 -0800 |
| Message-ID | <mailman.1460.1360278819.2939.python-list@python.org> |
| In reply to | #38378 |
On Fri, 08 Feb 2013 09:45:41 +1100
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> rh wrote:
>
> > I am using 2.7.3 and I put the re.compile outside the function and
> > it performed faster than urlparse. I don't print out the data.
>
> I find that hard to believe. re.compile caches its results, so except
> for the very first time it is called, it is very fast -- basically a
> function call and a dict lookup. I find it implausible that a
> micro-optimization such as you describe could be responsible for
> speeding the code up by over 33%.
Not sure where you came up with that number. Maybe another post?
I never gave any numbers, just comparisons.
>
> But since you don't demonstrate any actual working code, you could be
> correct, or you could be timing it wrong. Without seeing your timing
> code, my guess is that you are doing it wrong. Timing code is tricky,
> which is why I always show my work. If I get it wrong, someone will
> hopefully tell me. Otherwise, I might as well be making up the
> numbers.
re.compile
starttime = time.time()
for i in range(numloops):
    u2f()
msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
print(msg)
>
>
>
> --
> Steven
>
--
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-02-08 10:59 +1100 |
| Message-ID | <51143feb$0$29974$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #38382 |
rh wrote:
> On Fri, 08 Feb 2013 09:45:41 +1100
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>
>> rh wrote:
>>
>> > I am using 2.7.3 and I put the re.compile outside the function and
>> > it performed faster than urlparse. I don't print out the data.
>>
>> I find that hard to believe. re.compile caches its results, so except
>> for the very first time it is called, it is very fast -- basically a
>> function call and a dict lookup. I find it implausible that a
>> micro-optimization such as you describe could be responsible for
>> speeding the code up by over 33%.
>
> Not sure where you came up with that number. Maybe another post?
That number comes from my post, which you replied to.
http://mail.python.org/pipermail/python-list/2013-February/640056.html
By the way, are you aware that you are setting the X-No-Archive header on
your posts?
> I never gave any numbers, just comparisons.
>
>>
>> But since you don't demonstrate any actual working code, you could be
>> correct, or you could be timing it wrong. Without seeing your timing
>> code, my guess is that you are doing it wrong. Timing code is tricky,
>> which is why I always show my work. If I get it wrong, someone will
>> hopefully tell me. Otherwise, I might as well be making up the
>> numbers.
>
> re.compile
> starttime = time.time()
> for i in range(numloops):
>     u2f()
>
> msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
> print(msg)
I suggest you go back to my earlier post, the one you responded to, and look
at how I use the timeit module to time small code snippets. Then read the
documentation for it, and the comments in the source code. If you can get
hold of the Python Cookbook, read Tim Peters' comments in that.
http://docs.python.org/2/library/timeit.html
http://docs.python.org/3/library/timeit.html
Oh, one last thing... pulling out "re.compile" outside of the function does
absolutely nothing. You don't even compile anything. It basically looks up
that a compile function exists in the re module, and that's all.
--
Steven
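As a rough illustration of the advice above, rh's hand-rolled loop could be expressed with `timeit` along these lines (Python 3 syntax). `NUMLOOPS` is a hypothetical value, since the thread never says what `numloops` was, and the sample URL is passed explicitly because the quoted loop calls `u2f()` with no argument.

```python
import re
import timeit


def u2f(u):
    # regex version from earlier in the thread
    nx = re.compile(r'https?://(.+)$')
    u = nx.search(u).group(1)
    ux = re.compile(r'([-:./?&=]+)')
    return ux.sub('_', u)


s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'

NUMLOOPS = 10000  # hypothetical; stands in for rh's numloops

# timeit.repeat runs the whole NUMLOOPS-call measurement several times.
results = timeit.repeat(lambda: u2f(s), number=NUMLOOPS, repeat=5)

# The minimum is the usual figure to report: higher values mostly reflect
# interference from other processes rather than the code under test.
best = min(results)
print('best of 5: {0:.3f} s for {1} calls'.format(best, NUMLOOPS))
```

Compared with a single `time.time()` delta, this repeats the measurement and uses timeit's default clock, which is chosen for timing short intervals.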
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-02-07 17:55 -0700 |
| Message-ID | <mailman.1463.1360284981.2939.python-list@python.org> |
| In reply to | #38385 |
On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Oh, one last thing... pulling out "re.compile" outside of the function does
> absolutely nothing. You don't even compile anything. It basically looks up
> that a compile function exists in the re module, and that's all.
Using Python 2.7:
>>> t1 = Timer("""
... nx = re.compile(r'https?://(.+)$')
... v = nx.search(u).group(1)
... ux = re.compile(r'([-:./?&=]+)')
... ux.sub('_', v)""", """
... import re
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> t2 = Timer("""
... v = nx.search(u).group(1)
... ux.sub('_', v)""", """
... import re
... nx = re.compile(r'https?://(.+)$')
... ux = re.compile(r'([-:./?&=]+)')
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> min(t1.repeat())
11.625409933385388
>>> min(t2.repeat())
8.825254885746652
Whatever caching is being done by re.compile, that's still a 24%
savings by moving the compile calls into the setup.
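In function form, the two variants being compared look roughly like this in Python 3. The names `u2f_inline`, `u2f_hoisted`, `NX`, `UX` and `URL` are made up for this sketch, and the `number`/`repeat` values are arbitrary. The last lines only redo the arithmetic behind the 24% figure from the two timings quoted above.

```python
import re
from timeit import repeat

URL = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'

# Compiled once at import time, like the setup string of t2.
NX = re.compile(r'https?://(.+)$')
UX = re.compile(r'([-:./?&=]+)')


def u2f_inline(u):
    # Calls re.compile on every invocation, like t1; after the first call
    # each re.compile is a cache lookup rather than a real compilation.
    nx = re.compile(r'https?://(.+)$')
    ux = re.compile(r'([-:./?&=]+)')
    return ux.sub('_', nx.search(u).group(1))


def u2f_hoisted(u):
    # Reuses the module-level pattern objects, like t2.
    return UX.sub('_', NX.search(u).group(1))


assert u2f_inline(URL) == u2f_hoisted(URL)

inline_best = min(repeat(lambda: u2f_inline(URL), number=20000, repeat=3))
hoisted_best = min(repeat(lambda: u2f_hoisted(URL), number=20000, repeat=3))
print('inline: %.4f s   hoisted: %.4f s' % (inline_best, hoisted_best))

# Saving implied by the figures quoted above: (t1 - t2) / t1
quoted_saving = (11.625409933385388 - 8.825254885746652) / 11.625409933385388
print('quoted saving: %.1f%%' % (quoted_saving * 100))
```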
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-02-08 14:02 +1100 |
| Message-ID | <51146ab6$0$30002$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #38389 |
Ian Kelly wrote:
> On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> Oh, one last thing... pulling out "re.compile" outside of the function
>> does absolutely nothing. You don't even compile anything. It basically
>> looks up that a compile function exists in the re module, and that's all.
>
> Using Python 2.7:
[...]
> Whatever caching is being done by re.compile, that's still a 24%
> savings by moving the compile calls into the setup.
That may or may not be the case, but rh didn't compile anything. He moved
"re.compile" literally, with no arguments, out of the timing code. That
clearly does nothing except confirm that re.compile exists.
--
Steven
| From | rh <richard_hubbe11@lavabit.com> |
|---|---|
| Date | 2013-02-07 21:35 -0800 |
| Message-ID | <mailman.1475.1360301738.2939.python-list@python.org> |
| In reply to | #38397 |
On Fri, 08 Feb 2013 14:02:14 +1100
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> Ian Kelly wrote:
>
> > On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
> > <steve+comp.lang.python@pearwood.info> wrote:
> >> Oh, one last thing... pulling out "re.compile" outside of the
> >> function does absolutely nothing. You don't even compile anything.
> >> It basically looks up that a compile function exists in the re
> >> module, and that's all.
> >
> > Using Python 2.7:
> [...]
> > Whatever caching is being done by re.compile, that's still a 24%
> > savings by moving the compile calls into the setup.
>
> That may or may not be the case, but rh didn't compile anything. He
> moved "re.compile" literally, with no arguments, out of the timing
> code. That clearly does nothing except confirm that re.compile exists.
My initial post has the function and in there are two re.compile calls.
I moved those out of the function and see repeatable time efficiency
improvements.
FWIW the fastest so far was posted by Peter Otten and didn't use regex.
As a new learner of python (or any language) I like to know what habits
will serve me well into the future. So the only reason I look at the
time it takes is as a sanity check to make sure I'm not learning bad
habits. In this case someone else pointed out time comparisons and off
the thread went into timings!
I did take note of your previous post using timeit and filed that away
into the gray matter for some other day.
>
>
>
> --
> Steven
>
--
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-02-07 18:08 -0700 |
| Message-ID | <mailman.1464.1360285724.2939.python-list@python.org> |
| In reply to | #38385 |
On Thu, Feb 7, 2013 at 5:55 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> Whatever caching is being done by re.compile, that's still a 24%
> savings by moving the compile calls into the setup.
On the other hand, if you add an re.purge() call to the start of t1 to
clear the cache:
>>> t3 = Timer("""
... re.purge()
... nx = re.compile(r'https?://(.+)$')
... v = nx.search(u).group(1)
... ux = re.compile(r'([-:./?&=]+)')
... ux.sub('_', v)""", """
... import re
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> min(t3.repeat(number=10000))
3.5532990924824617
Which is approximately 30 times slower, so clearly the regular
expression *is* being cached. I think what we're seeing here is that
the time needed to look up the compiled regular expression in the
cache is a significant fraction of the time needed to actually execute
it.
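The "approximately 30 times" figure only appears once the two results are put on a per-call basis, because t1 used timeit's default of 1,000,000 executions per repeat while t3 was run with number=10000. A small worked check, using only the numbers quoted in this thread (the variable names are illustrative):

```python
# Totals reported above, with the number of executions each one covers.
t1_total, t1_calls = 11.625409933385388, 1000000   # timeit default number
t3_total, t3_calls = 3.5532990924824617, 10000     # number=10000, with re.purge()

t1_per_call = t1_total / t1_calls    # seconds per call with a warm cache
t3_per_call = t3_total / t3_calls    # seconds per call when recompiling

ratio = t3_per_call / t1_per_call
print('cached:   %.1f microseconds per call' % (t1_per_call * 1e6))
print('purged:   %.1f microseconds per call' % (t3_per_call * 1e6))
print('slowdown: %.1fx' % ratio)
```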
| From | rh <richard_hubbe11@lavabit.com> |
|---|---|
| Date | 2013-02-07 21:57 -0800 |
| Message-ID | <mailman.1477.1360303085.2939.python-list@python.org> |
| In reply to | #38385 |
On Thu, 7 Feb 2013 18:08:00 -0700
Ian Kelly <ian.g.kelly@gmail.com> wrote:
> On Thu, Feb 7, 2013 at 5:55 PM, Ian Kelly <ian.g.kelly@gmail.com>
> wrote:
> > Whatever caching is being done by re.compile, that's still a 24%
> > savings by moving the compile calls into the setup.
>
> On the other hand, if you add an re.purge() call to the start of t1 to
> clear the cache:
>
> >>> t3 = Timer("""
> ... re.purge()
> ... nx = re.compile(r'https?://(.+)$')
> ... v = nx.search(u).group(1)
> ... ux = re.compile(r'([-:./?&=]+)')
> ... ux.sub('_', v)""", """
> ... import re
> ... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
> >>> min(t3.repeat(number=10000))
> 3.5532990924824617
>
> Which is approximately 30 times slower, so clearly the regular
> expression *is* being cached. I think what we're seeing here is that
> the time needed to look up the compiled regular expression in the
> cache is a significant fraction of the time needed to actually execute
> it.
By "actually execute" you mean to apply the compiled expression
to the search or sub? Or do you mean the time needed to compile
the pattern into a regex obj?
I presumed that compiling the pattern at each iteration was expensive
and that's why I expected moving it out of the function to reduce the
time needed to search/sub.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-02-08 01:21 -0700 |
| Message-ID | <mailman.1492.1360311744.2939.python-list@python.org> |
| In reply to | #38385 |
On Thu, Feb 7, 2013 at 10:57 PM, rh <richard_hubbe11@lavabit.com> wrote:
> On Thu, 7 Feb 2013 18:08:00 -0700
> Ian Kelly <ian.g.kelly@gmail.com> wrote:
>
>> Which is approximately 30 times slower, so clearly the regular
>> expression *is* being cached. I think what we're seeing here is that
>> the time needed to look up the compiled regular expression in the
>> cache is a significant fraction of the time needed to actually execute
>> it.
>
> By "actually execute" you mean to apply the compiled expression
> to the search or sub? Or do you mean the time needed to compile
> the pattern into a regex obj?
The former. Both are dwarfed by the time needed to compile the pattern.
| From | Steven D'Aprano <steve+comp.lang.python@pearwood.info> |
|---|---|
| Date | 2013-02-08 22:43 +1100 |
| Message-ID | <5114e4c8$0$29975$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #38430 |
Ian Kelly wrote:
> On Thu, Feb 7, 2013 at 10:57 PM, rh <richard_hubbe11@lavabit.com> wrote:
>> On Thu, 7 Feb 2013 18:08:00 -0700
>> Ian Kelly <ian.g.kelly@gmail.com> wrote:
>>
>>> Which is approximately 30 times slower, so clearly the regular
>>> expression *is* being cached. I think what we're seeing here is that
>>> the time needed to look up the compiled regular expression in the
>>> cache is a significant fraction of the time needed to actually execute
>>> it.
>>
>> By "actually execute" you mean to apply the compiled expression
>> to the search or sub? Or do you mean the time needed to compile
>> the pattern into a regex obj?
>
> The former. Both are dwarfed by the time needed to compile the pattern.
Surely that depends on the size of the pattern, and the size of the data
being worked on.
Compiling the pattern "s[ai]t" doesn't take that much work, it's only six
characters and very simple. Applying it to:
"sazsid"*1000000 + "sat"
on the other hand may be a tad expensive.
Sweeping generalities about the cost of compiling regexes versus searching
with them are risky.
--
Steven
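The example above is easy to try. The sketch below takes a single, unrepeated measurement of each step, so the printed times are only indicative; the variable names are illustrative and `re.purge()` is used so that the compile step is not served from the cache.

```python
import re
from timeit import default_timer as timer

data = "sazsid" * 1000000 + "sat"   # six million characters, then the only match

re.purge()                          # make sure the pattern is really compiled
start = timer()
pattern = re.compile("s[ai]t")
compile_seconds = timer() - start

start = timer()
match = pattern.search(data)        # has to scan past every "sazsid" first
search_seconds = timer() - start

print('compile: %.6f s' % compile_seconds)
print('search:  %.6f s' % search_seconds)
print('match %r at offset %d' % (match.group(), match.start()))
```

For this pattern and this input the search would be expected to dominate, which is the opposite of the short-URL case measured earlier in the thread.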
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2013-02-08 09:26 -0700 |
| Message-ID | <mailman.1513.1360341326.2939.python-list@python.org> |
| In reply to | #38443 |
On Fri, Feb 8, 2013 at 4:43 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Ian Kelly wrote:
> Surely that depends on the size of the pattern, and the size of the data
> being worked on.
Naturally.
> Compiling the pattern "s[ai]t" doesn't take that much work, it's only six
> characters and very simple. Applying it to:
>
> "sazsid"*1000000 + "sat"
>
> on the other hand may be a tad expensive.
>
> Sweeping generalities about the cost of compiling regexes versus searching
> with them are risky.
I was referring to the specific timing measurements I made earlier in
this thread, not generalizing.
| From | Serhiy Storchaka <storchaka@gmail.com> |
|---|---|
| Date | 2013-02-15 22:58 +0200 |
| Message-ID | <mailman.1847.1360961927.2939.python-list@python.org> |
| In reply to | #38385 |
On 08.02.13 03:08, Ian Kelly wrote:
> I think what we're seeing here is that
> the time needed to look up the compiled regular expression in the
> cache is a significant fraction of the time needed to actually execute
> it.
There is a bug issue for this. See http://bugs.python.org/issue16389 .
| From | rh <richard_hubbe11@lavabit.com> |
|---|---|
| Date | 2013-02-26 11:20 -0800 |
| Message-ID | <mailman.2569.1361906409.2939.python-list@python.org> |
| In reply to | #38385 |
On Fri, 15 Feb 2013 22:58:30 +0200
Serhiy Storchaka <storchaka@gmail.com> wrote:
> On 08.02.13 03:08, Ian Kelly wrote:
> > I think what we're seeing here is that
> > the time needed to look up the compiled regular expression in the
> > cache is a significant fraction of the time needed to actually
> > execute it.
>
> There is a bug issue for this. See http://bugs.python.org/issue16389 .
>
I can't tell what the problem is. Is it fixed or still in progress?
| From | Dave Angel <davea@davea.name> |
|---|---|
| Date | 2013-02-08 01:27 -0500 |
| Message-ID | <mailman.1486.1360304886.2939.python-list@python.org> |
| In reply to | #38378 |
On 02/07/2013 06:13 PM, rh wrote:
> On Fri, 08 Feb 2013 09:45:41 +1100
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>
>> <snip>
>>
>> But since you don't demonstrate any actual working code, you could be
>> correct, or you could be timing it wrong. Without seeing your timing
>> code, my guess is that you are doing it wrong. Timing code is tricky,
>> which is why I always show my work. If I get it wrong, someone will
>> hopefully tell me. Otherwise, I might as well be making up the
>> numbers.
>
> re.compile
That statement does explicitly nothing useful. It certainly doesn't
compile anything, or call any regex code.
> starttime = time.time()
> for i in range(numloops):
>     u2f()
>
> msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
> print(msg)
>
--
DaveA
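To make the point about the bare `re.compile` line concrete, here is a small sketch (Python 3; the variable names are illustrative). Written on a line by itself, the expression is evaluated and the result thrown away; nothing is called and no pattern is built.

```python
import re

re.compile                  # expression statement: attribute lookup, result discarded

looked_up = re.compile      # the same lookup, kept so it can be inspected
is_function = callable(looked_up)              # it is the compile function itself
has_search = hasattr(looked_up, 'search')      # ...not a compiled pattern

pattern = re.compile(r'https?://(.+)$')        # an actual call builds a pattern
pattern_has_search = hasattr(pattern, 'search')

print(is_function, has_search, pattern_has_search)
```

Only the call with a pattern argument produces an object with `search` and `sub` methods, which is what would need to be moved out of the function for hoisting to have any effect.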
| From | jmfauth <wxjmfauth@gmail.com> |
|---|---|
| Date | 2013-02-07 03:08 -0800 |
| Message-ID | <d822860f-1a49-4470-9c1d-4a83f364a4bb@ia3g2000vbb.googlegroups.com> |
| In reply to | #38328 |
On 7 Feb, 04:04, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
> > Well, an alternative /could/ be:
>
> ...
> py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
> py> assert u2f(s) == mangle(s)
> py>
> py> from timeit import Timer
> py> setup = 'from __main__ import s, u2f, mangle'
> py> t1 = Timer('mangle(s)', setup)
> py> t2 = Timer('u2f(s)', setup)
> py>
> py> min(t1.repeat(repeat=7))
> 7.2962000370025635
> py> min(t2.repeat(repeat=7))
> 10.981598854064941
> py>
> py> (10.98-7.29)/10.98
> 0.33606557377049184
>
> (Timings done using Python 2.6 on my laptop -- your speeds may vary.)
>
--------
[OT] Sorry, but I find all these "timeit" I see here and there
more and more ridiculous.
Maybe it's the language itself, which became ridiculous.
code:
from timeit import Timer, repeat
r = repeat("('WHERE IN THE WORLD IS CARMEN?'*10).lower()")
print('1:', r)
r = repeat("('WHERE IN THE WORLD IS HÉLÈNE?'*10).lower()")
print('2:', r)
t = Timer("re.sub('CARMEN', 'CARMEN', 'WHERE IN THE WORLD IS CARMEN?'*10)", "import re")
r = t.repeat()
print('3:', r)
t = Timer("re.sub('HÉLÈNE', 'HÉLÈNE', 'WHERE IN THE WORLD IS HÉLÈNE?'*10)", "import re")
r = t.repeat()
print('4:', r)
result:
>c:\python32\pythonw -u "vitesse3.py"
1: [2.578785478740226, 2.5738459157233833, 2.5739002658825543]
2: [2.57605654937141, 2.5784755252962572, 2.5775366066044896]
3: [11.856728254324088, 11.856321809655501, 11.857456073846905]
4: [12.111787643688231, 12.102743462128771, 12.098514783440208]
>Exit code: 0
>c:\Python33\pythonw -u "vitesse3.py"
1: [0.6063335264470632, 0.6104798922133946, 0.6078580877959869]
2: [4.080205081267272, 4.079303183698418, 4.0786836706522145]
3: [18.093742209318215, 18.079666699618095, 18.07107661757692]
4: [18.852576768615222, 18.841418050790622, 18.840745369110437]
>Exit code: 0
The future is bright for ... ascii users.
jmf
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-02-07 23:44 +1100 |
| Message-ID | <mailman.1446.1360241081.2939.python-list@python.org> |
| In reply to | #38344 |
On Thu, Feb 7, 2013 at 10:08 PM, jmfauth <wxjmfauth@gmail.com> wrote:
> The future is bright for ... ascii users.
>
> jmf
So you're admitting to being not very bright? *ducks*
Seriously jmf, please don't hijack threads just to whine about contrived
issues of Unicode performance yet again. That horse is dead. Go fork
Python and reimplement buggy narrow builds if you want to, the rest of
us are happy with a bug-free Python.
ChrisA