
Re: Curious to see alternate approach on a search/replace via regex

Started by	Demian Brecht <demianbrecht@gmail.com>
First post	2013-02-06 13:55 -0800
Last post	2013-02-07 23:44 +1100
Articles	19 — 8 participants


  Re: Curious to see alternate approach on a search/replace via regex Demian Brecht <demianbrecht@gmail.com> - 2013-02-06 13:55 -0800
    Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-07 03:04 +0000
      Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-06 19:31 -0800
        Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-08 09:45 +1100
          Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-07 15:13 -0800
            Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-08 10:59 +1100
              Re: Curious to see alternate approach on a search/replace via regex Ian Kelly <ian.g.kelly@gmail.com> - 2013-02-07 17:55 -0700
                Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-08 14:02 +1100
                  Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-07 21:35 -0800
              Re: Curious to see alternate approach on a search/replace via regex Ian Kelly <ian.g.kelly@gmail.com> - 2013-02-07 18:08 -0700
              Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-07 21:57 -0800
              Re: Curious to see alternate approach on a search/replace via regex Ian Kelly <ian.g.kelly@gmail.com> - 2013-02-08 01:21 -0700
                Re: Curious to see alternate approach on a search/replace via regex Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-02-08 22:43 +1100
                  Re: Curious to see alternate approach on a search/replace via regex Ian Kelly <ian.g.kelly@gmail.com> - 2013-02-08 09:26 -0700
              Re: Curious to see alternate approach on a search/replace via regex Serhiy Storchaka <storchaka@gmail.com> - 2013-02-15 22:58 +0200
              Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-26 11:20 -0800
          Re: Curious to see alternate approach on a search/replace via regex Dave Angel <davea@davea.name> - 2013-02-08 01:27 -0500
      Re: Curious to see alternate approach on a search/replace via regex jmfauth <wxjmfauth@gmail.com> - 2013-02-07 03:08 -0800
        Re: Curious to see alternate approach on a search/replace via regex Chris Angelico <rosuav@gmail.com> - 2013-02-07 23:44 +1100

#38304 — Re: Curious to see alternate approach on a search/replace via regex

From	Demian Brecht <demianbrecht@gmail.com>
Date	2013-02-06 13:55 -0800
Subject	Re: Curious to see alternate approach on a search/replace via regex
Message-ID	<mailman.1426.1360187770.2939.python-list@python.org>

Well, an alternative /could/ be:

from urlparse import urlparse

parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
print '%s%s_%s' % (parts.netloc.replace('.', '_'),
    parts.path.replace('/', '_'),
    parts.query.replace('&', '_').replace('=', '_')
    ) 


Although with the result of:

alongnameofasite1234567_com_q_sports_run_a_1_b_1
         1288 function calls in 0.004 seconds


Compared to regex method:

498 function calls (480 primitive calls) in 0.000 seconds

I'd prefer the regex method myself.

Demian Brecht
http://demianbrecht.github.com




On 2013-02-06 1:41 PM, "rh" <richard_hubbe11@lavabit.com> wrote:

>http://alongnameofasite1234567.com/q?sports=run&a=1&b=1


#38328

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-02-07 03:04 +0000
Message-ID	<511319c7$0$21812$c3e8da3$76491128@news.astraweb.com>
In reply to	#38304

On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:

> Well, an alternative /could/ be:
> 
> from urlparse import urlparse
> 
> parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> print '%s%s_%s' % (parts.netloc.replace('.', '_'),
>     parts.path.replace('/', '_'),
>     parts.query.replace('&', '_').replace('=', '_') )
> 
> 
> Although with the result of:
> 
> alongnameofasite1234567_com_q_sports_run_a_1_b_1
>          1288 function calls in 0.004 seconds
> 
> 
> Compared to regex method:
> 
> 498 function calls (480 primitive calls) in 0.000 seconds
> 
> I'd prefer the regex method myself.

I dispute those results. I think you are mostly measuring the time to 
print the result, and I/O is quite slow. My tests show that using urlparse 
is 33% faster than using regexes, and far more understandable and 
maintainable.


py> from urlparse import urlparse
py> def mangle(url):
...     parts = urlparse(url)
...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
...             parts.path.replace('/', '_'),
...             parts.query.replace('&', '_').replace('=', '_')
...             )
... 
py> import re
py> def u2f(u):
...     nx = re.compile(r'https?://(.+)$')
...     u = nx.search(u).group(1)
...     ux = re.compile(r'([-:./?&=]+)')
...     return ux.sub('_', u)
... 
py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
py> assert u2f(s) == mangle(s)
py> 
py> from timeit import Timer
py> setup = 'from __main__ import s, u2f, mangle'
py> t1 = Timer('mangle(s)', setup)
py> t2 = Timer('u2f(s)', setup)
py> 
py> min(t1.repeat(repeat=7))
7.2962000370025635
py> min(t2.repeat(repeat=7))
10.981598854064941
py>
py> (10.98-7.29)/10.98
0.33606557377049184


(Timings done using Python 2.6 on my laptop -- your speeds may vary.)
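
For readers on Python 3, here is a minimal sketch of the two functions being compared. It assumes only that the Python 2 urlparse module is replaced by urllib.parse, which is where urlparse lives in Python 3. The function bodies are otherwise the ones shown above, and the expected string is the output quoted earlier in the thread.

```python
import re
from urllib.parse import urlparse  # Python 3 location of Python 2's urlparse


def mangle(url):
    """Build a filename-like string from a URL using urlparse."""
    parts = urlparse(url)
    return '%s%s_%s' % (
        parts.netloc.replace('.', '_'),
        parts.path.replace('/', '_'),
        parts.query.replace('&', '_').replace('=', '_'),
    )


def u2f(u):
    """Same result via two regular expressions, as in the post above."""
    nx = re.compile(r'https?://(.+)$')
    u = nx.search(u).group(1)
    ux = re.compile(r'([-:./?&=]+)')
    return ux.sub('_', u)


s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
print(mangle(s))
print(u2f(s) == mangle(s))
```

Both functions are pure, so they can be handed to timeit without any printing inside the timed code.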



-- 
Steven


#38329

From	rh <richard_hubbe11@lavabit.com>
Date	2013-02-06 19:31 -0800
Message-ID	<mailman.1437.1360207874.2939.python-list@python.org>
In reply to	#38328

On 07 Feb 2013 03:04:39 GMT
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
> 
> > Well, an alternative /could/ be:
> > 
> > from urlparse import urlparse
> > 
> > parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> > print '%s%s_%s' % (parts.netloc.replace('.', '_'),
> >     parts.path.replace('/', '_'),
> >     parts.query.replace('&', '_').replace('=', '_') )
> > 
> > 
> > Although with the result of:
> > 
> > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> >          1288 function calls in 0.004 seconds
> > 
> > 
> > Compared to regex method:
> > 
> > 498 function calls (480 primitive calls) in 0.000 seconds
> > 
> > I'd prefer the regex method myself.
> 
> I dispute those results. I think you are mostly measuring the time to 
> print the result, and I/O is quite slow. My tests show that using
> urlparse is 33% faster than using regexes, and far more
> understandable and maintainable.
> 
> 
> py> from urlparse import urlparse
> py> def mangle(url):
> ...     parts = urlparse(url)
> ...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
> ...             parts.path.replace('/', '_'),
> ...             parts.query.replace('&', '_').replace('=', '_')
> ...             )
> ... 
> py> import re
> py> def u2f(u):
> ...     nx = re.compile(r'https?://(.+)$')
> ...     u = nx.search(u).group(1)
> ...     ux = re.compile(r'([-:./?&=]+)')
> ...     return ux.sub('_', u)
> ... 
> py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
> py> assert u2f(s) == mangle(s)
> py> 
> py> from timeit import Timer
> py> setup = 'from __main__ import s, u2f, mangle'
> py> t1 = Timer('mangle(s)', setup)
> py> t2 = Timer('u2f(s)', setup)
> py> 
> py> min(t1.repeat(repeat=7))
> 7.2962000370025635
> py> min(t2.repeat(repeat=7))
> 10.981598854064941
> py>
> py> (10.98-7.29)/10.98
> 0.33606557377049184
> 
> 
> (Timings done using Python 2.6 on my laptop -- your speeds may vary.)

I am using 2.7.3 and I put the re.compile outside the function and it
performed faster than urlparse. I don't print out the data. 

Fast
^
|  compiled regex
|  urlparse
|  plain regex
|  all-at-once search/replace with alternation
Slow
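
The "compiled regex" variant is not shown in the post. One plausible reading, given here as a sketch, is that the two patterns are compiled once at module level and the function only uses them. The names NX, UX and u2f_precompiled are made up for illustration and are not from the thread.

```python
import re

# Compiled once at import time, outside the function (illustrative names).
NX = re.compile(r'https?://(.+)$')
UX = re.compile(r'[-:./?&=]+')


def u2f_precompiled(u):
    """Strip the scheme, then replace runs of URL punctuation with '_'."""
    rest = NX.search(u).group(1)
    return UX.sub('_', rest)


s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
print(u2f_precompiled(s))
```

Whether this is faster than urlparse depends on the interpreter version and the machine, so it needs to be measured rather than assumed.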

> 
> 
> 
> -- 
> Steven


#38378

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-02-08 09:45 +1100
Message-ID	<51142e96$0$6512$c3e8da3$5496439d@news.astraweb.com>
In reply to	#38329

rh wrote:

> I am using 2.7.3 and I put the re.compile outside the function and it
> performed faster than urlparse. I don't print out the data.

I find that hard to believe. re.compile caches its results, so except for
the very first time it is called, it is very fast -- basically a function
call and a dict lookup. I find it implausible that a micro-optimization
such as you describe could be responsible for speeding the code up by over
33%.
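
The caching can be observed directly. The sketch below relies on a CPython implementation detail, not a documented guarantee: a second re.compile call with the same pattern and flags returns the very same cached object. re.purge() is the documented way to clear that cache.

```python
import re

pattern = r'https?://(.+)$'

first = re.compile(pattern)
second = re.compile(pattern)   # served from re's internal cache (CPython detail)
same_object = first is second

re.purge()                     # clear the regular expression cache
third = re.compile(pattern)    # compiled again from scratch
recompiled = third is not first

print(same_object, recompiled)
```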

But since you don't demonstrate any actual working code, you could be
correct, or you could be timing it wrong. Without seeing your timing code,
my guess is that you are doing it wrong. Timing code is tricky, which is
why I always show my work. If I get it wrong, someone will hopefully tell
me. Otherwise, I might as well be making up the numbers.

-- 
Steven


#38382

From	rh <richard_hubbe11@lavabit.com>
Date	2013-02-07 15:13 -0800
Message-ID	<mailman.1460.1360278819.2939.python-list@python.org>
In reply to	#38378

On Fri, 08 Feb 2013 09:45:41 +1100
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> rh wrote:
> 
> > I am using 2.7.3 and I put the re.compile outside the function and
> > it performed faster than urlparse. I don't print out the data.
> 
> I find that hard to believe. re.compile caches its results, so except
> for the very first time it is called, it is very fast -- basically a
> function call and a dict lookup. I find it implausible that a
> micro-optimization such as you describe could be responsible for
> speeding the code up by over 33%.

Not sure where you came up with that number. Maybe another post?
I never gave any numbers, just comparisons.

> 
> But since you don't demonstrate any actual working code, you could be
> correct, or you could be timing it wrong. Without seeing your timing
> code, my guess is that you are doing it wrong. Timing code is tricky,
> which is why I always show my work. If I get it wrong, someone will
> hopefully tell me. Otherwise, I might as well be making up the
> numbers.

re.compile
starttime = time.time()
for i in range(numloops):
    u2f()

msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
print(msg)

> 
> 
> 
> -- 
> Steven
> 


--


#38385

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-02-08 10:59 +1100
Message-ID	<51143feb$0$29974$c3e8da3$5496439d@news.astraweb.com>
In reply to	#38382

rh wrote:

> On Fri, 08 Feb 2013 09:45:41 +1100
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
> 
>> rh wrote:
>> 
>> > I am using 2.7.3 and I put the re.compile outside the function and
>> > it performed faster than urlparse. I don't print out the data.
>> 
>> I find that hard to believe. re.compile caches its results, so except
>> for the very first time it is called, it is very fast -- basically a
>> function call and a dict lookup. I find it implausible that a
>> micro-optimization such as you describe could be responsible for
>> speeding the code up by over 33%.
> 
> Not sure where you came up with that number. Maybe another post?

That number comes from my post, which you replied to.

http://mail.python.org/pipermail/python-list/2013-February/640056.html

By the way, are you aware that you are setting the X-No-Archive header on
your posts?

> I never gave any numbers, just comparisons.
> 
>> 
>> But since you don't demonstrate any actual working code, you could be
>> correct, or you could be timing it wrong. Without seeing your timing
>> code, my guess is that you are doing it wrong. Timing code is tricky,
>> which is why I always show my work. If I get it wrong, someone will
>> hopefully tell me. Otherwise, I might as well be making up the
>> numbers.
> 
> re.compile
> starttime = time.time()
> for i in range(numloops):
>     u2f()
> 
> msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
> print(msg)

I suggest you go back to my earlier post, the one you responded to, and look
at how I use the timeit module to time small code snippets. Then read the
documentation for it, and the comments in the source code. If you can get
hold of the Python Cookbook, read Tim Peters' comments in that.

http://docs.python.org/2/library/timeit.html
http://docs.python.org/3/library/timeit.html
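
As a rough sketch of what a timeit-based replacement for the hand-rolled loop could look like in script form: the repeat and number values below are arbitrary, and u2f is the function from earlier in the thread, here called with its URL argument.

```python
import re
import timeit

s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'


def u2f(u):
    nx = re.compile(r'https?://(.+)$')
    u = nx.search(u).group(1)
    ux = re.compile(r'([-:./?&=]+)')
    return ux.sub('_', u)


# Five runs of 10000 calls each; timeit.repeat accepts a callable as stmt.
timings = timeit.repeat(lambda: u2f(s), repeat=5, number=10000)

# Report the minimum: higher values mostly reflect interference from
# other processes rather than the code under test.
best = min(timings)
print('best of 5: %.4f s per 10000 calls' % best)
```

The lambda adds a small constant call overhead to every iteration, which matters only when comparing against snippets timed as strings.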

Oh, one last thing... pulling out "re.compile" outside of the function does
absolutely nothing. You don't even compile anything. It basically looks up
that a compile function exists in the re module, and that's all.

-- 
Steven


#38389

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-02-07 17:55 -0700
Message-ID	<mailman.1463.1360284981.2939.python-list@python.org>
In reply to	#38385

On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Oh, one last thing... pulling out "re.compile" outside of the function does
> absolutely nothing. You don't even compile anything. It basically looks up
> that a compile function exists in the re module, and that's all.

Using Python 2.7:

>>> t1 = Timer("""
... nx = re.compile(r'https?://(.+)$')
... v = nx.search(u).group(1)
... ux = re.compile(r'([-:./?&=]+)')
... ux.sub('_', v)""", """
... import re
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> t2 = Timer("""
... v = nx.search(u).group(1)
... ux.sub('_', v)""", """
... import re
... nx = re.compile(r'https?://(.+)$')
... ux = re.compile(r'([-:./?&=]+)')
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> min(t1.repeat())
11.625409933385388
>>> min(t2.repeat())
8.825254885746652

Whatever caching is being done by re.compile, that's still a 24%
savings by moving the compile calls into the setup.


#38397

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-02-08 14:02 +1100
Message-ID	<51146ab6$0$30002$c3e8da3$5496439d@news.astraweb.com>
In reply to	#38389

Ian Kelly wrote:

> On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> Oh, one last thing... pulling out "re.compile" outside of the function
>> does absolutely nothing. You don't even compile anything. It basically
>> looks up that a compile function exists in the re module, and that's all.
> 
> Using Python 2.7:
[...]
> Whatever caching is being done by re.compile, that's still a 24%
> savings by moving the compile calls into the setup.

That may or may not be the case, but rh didn't compile anything. He
moved "re.compile" literally, with no arguments, out of the timing code.
That clearly does nothing except confirm that re.compile exists.
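
The difference between naming the function and calling it can be shown in a few lines. This is only an illustration; the example URL is made up.

```python
import re

# A bare attribute reference: the name is looked up and the result is
# discarded. Nothing is called and nothing is compiled.
re.compile

# An actual call: this is what produces a compiled pattern object.
nx = re.compile(r'https?://(.+)$')

is_function = callable(re.compile)
host_and_rest = nx.search('http://example.com/q?a=1').group(1)
print(is_function, host_and_rest)
```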

-- 
Steven


#38406

From	rh <richard_hubbe11@lavabit.com>
Date	2013-02-07 21:35 -0800
Message-ID	<mailman.1475.1360301738.2939.python-list@python.org>
In reply to	#38397

On Fri, 08 Feb 2013 14:02:14 +1100
Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:

> Ian Kelly wrote:
> 
> > On Thu, Feb 7, 2013 at 4:59 PM, Steven D'Aprano
> > <steve+comp.lang.python@pearwood.info> wrote:
> >> Oh, one last thing... pulling out "re.compile" outside of the
> >> function does absolutely nothing. You don't even compile anything.
> >> It basically looks up that a compile function exists in the re
> >> module, and that's all.
> > 
> > Using Python 2.7:
> [...]
> > Whatever caching is being done by re.compile, that's still a 24%
> > savings by moving the compile calls into the setup.
> 
> That may or may not be the case, but rh didn't compile anything. He
> moved "re.compile" literally, with no arguments, out of the timing
> code. That clearly does nothing except confirm that re.compile exists.

My initial post has the function and in there are two re.compile calls.
I moved those out of the function and see repeatable time efficiency
improvements.

FWIW the fastest so far was posted by Peter Otten and didn't
use regex.

As a new learner of python (or any language) I like to know what
habits will serve me well into the future. So the only reason I look
at the time it takes is as a sanity check to make sure I'm not
learning bad habits.  In this case someone else pointed out time
comparisons and off the thread went into timings!

I did take note of your previous post using timeit and filed
that away into the gray matter for some other day.

> 
> 
> 
> -- 
> Steven
> 

--


#38390

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-02-07 18:08 -0700
Message-ID	<mailman.1464.1360285724.2939.python-list@python.org>
In reply to	#38385

On Thu, Feb 7, 2013 at 5:55 PM, Ian Kelly <ian.g.kelly@gmail.com> wrote:
> Whatever caching is being done by re.compile, that's still a 24%
> savings by moving the compile calls into the setup.

On the other hand, if you add an re.purge() call to the start of t1 to
clear the cache:

>>> t3 = Timer("""
... re.purge()
... nx = re.compile(r'https?://(.+)$')
... v = nx.search(u).group(1)
... ux = re.compile(r'([-:./?&=]+)')
... ux.sub('_', v)""", """
... import re
... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
>>> min(t3.repeat(number=10000))
3.5532990924824617

Which is approximately 30 times slower, so clearly the regular
expression *is* being cached.  I think what we're seeing here is that
the time needed to look up the compiled regular expression in the
cache is a significant fraction of the time needed to actually execute
it.
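
One way to check that interpretation is to time the two pieces separately: a re.compile call that hits the cache, and a search with an already compiled pattern. The sketch below does that; the loop count is arbitrary and no particular ratio is claimed, since it varies between Python versions.

```python
import re
import timeit

u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
pattern = r'https?://(.+)$'
nx = re.compile(pattern)  # also primes the cache for the lookups below

n = 50000  # arbitrary loop count

# Cost of a re.compile call that is answered from the cache.
lookup = min(timeit.repeat(lambda: re.compile(pattern), repeat=3, number=n))

# Cost of applying the already compiled pattern.
search = min(timeit.repeat(lambda: nx.search(u), repeat=3, number=n))

print('cached re.compile call: %.4f s' % lookup)
print('nx.search call:         %.4f s' % search)
```

Both figures include the same lambda call overhead, so they are comparable with each other but overstate the absolute cost of each operation.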


#38409

From	rh <richard_hubbe11@lavabit.com>
Date	2013-02-07 21:57 -0800
Message-ID	<mailman.1477.1360303085.2939.python-list@python.org>
In reply to	#38385

On Thu, 7 Feb 2013 18:08:00 -0700
Ian Kelly <ian.g.kelly@gmail.com> wrote:

> On Thu, Feb 7, 2013 at 5:55 PM, Ian Kelly <ian.g.kelly@gmail.com>
> wrote:
> > Whatever caching is being done by re.compile, that's still a 24%
> > savings by moving the compile calls into the setup.
> 
> On the other hand, if you add an re.purge() call to the start of t1 to
> clear the cache:
> 
> >>> t3 = Timer("""
> ... re.purge()
> ... nx = re.compile(r'https?://(.+)$')
> ... v = nx.search(u).group(1)
> ... ux = re.compile(r'([-:./?&=]+)')
> ... ux.sub('_', v)""", """
> ... import re
> ... u = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'""")
> >>> min(t3.repeat(number=10000))
> 3.5532990924824617
> 
> Which is approximately 30 times slower, so clearly the regular
> expression *is* being cached.  I think what we're seeing here is that
> the time needed to look up the compiled regular expression in the
> cache is a significant fraction of the time needed to actually execute
> it.

By "actually execute" you mean to apply the compiled expression
to the search or sub? Or do you mean the time needed to compile
the pattern into a regex obj?

I presumed that compiling the pattern at each iteration was expensive
and that's why I expected moving it out of the function to reduce the
time needed to search/sub.


#38430

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-02-08 01:21 -0700
Message-ID	<mailman.1492.1360311744.2939.python-list@python.org>
In reply to	#38385

On Thu, Feb 7, 2013 at 10:57 PM, rh <richard_hubbe11@lavabit.com> wrote:
> On Thu, 7 Feb 2013 18:08:00 -0700
> Ian Kelly <ian.g.kelly@gmail.com> wrote:
>
>> Which is approximately 30 times slower, so clearly the regular
>> expression *is* being cached.  I think what we're seeing here is that
>> the time needed to look up the compiled regular expression in the
>> cache is a significant fraction of the time needed to actually execute
>> it.
>
> By "actually execute" you mean to apply the compiled expression
> to the search or sub? Or do you mean the time needed to compile
> the pattern into a regex obj?

The former.  Both are dwarfed by the time needed to compile the pattern.


#38443

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2013-02-08 22:43 +1100
Message-ID	<5114e4c8$0$29975$c3e8da3$5496439d@news.astraweb.com>
In reply to	#38430

Ian Kelly wrote:

> On Thu, Feb 7, 2013 at 10:57 PM, rh <richard_hubbe11@lavabit.com> wrote:
>> On Thu, 7 Feb 2013 18:08:00 -0700
>> Ian Kelly <ian.g.kelly@gmail.com> wrote:
>>
>>> Which is approximately 30 times slower, so clearly the regular
>>> expression *is* being cached.  I think what we're seeing here is that
>>> the time needed to look up the compiled regular expression in the
>>> cache is a significant fraction of the time needed to actually execute
>>> it.
>>
>> By "actually execute" you mean to apply the compiled expression
>> to the search or sub? Or do you mean the time needed to compile
>> the pattern into a regex obj?
> 
> The former.  Both are dwarfed by the time needed to compile the pattern.

Surely that depends on the size of the pattern, and the size of the data
being worked on.

Compiling the pattern "s[ai]t" doesn't take that much work, it's only six
characters and very simple. Applying it to:

"sazsid"*1000000 + "sat"

on the other hand may be a tad expensive.

Sweeping generalities about the cost of compiling regexes versus searching
with them are risky.
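
A sketch of that example, for anyone who wants to try it: the pattern and the data are the ones given above, while the single-shot perf_counter measurement is a simplification chosen because the search is slow enough not to need many repetitions.

```python
import re
import time

data = "sazsid" * 1000000 + "sat"

re.purge()  # make sure the compile below is not served from the cache
t0 = time.perf_counter()
rx = re.compile("s[ai]t")
t1 = time.perf_counter()
m = rx.search(data)   # has to scan the whole string before it finds "sat"
t2 = time.perf_counter()

print('compile: %.6f s' % (t1 - t0))
print('search:  %.6f s, match at index %d' % (t2 - t1, m.start()))
```

The only match is the trailing "sat", so the search has to walk all six million preceding characters first.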

-- 
Steven


#38463

From	Ian Kelly <ian.g.kelly@gmail.com>
Date	2013-02-08 09:26 -0700
Message-ID	<mailman.1513.1360341326.2939.python-list@python.org>
In reply to	#38443

On Fri, Feb 8, 2013 at 4:43 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> Ian Kelly wrote:
> Surely that depends on the size of the pattern, and the size of the data
> being worked on.

Naturally.

> Compiling the pattern "s[ai]t" doesn't take that much work, it's only six
> characters and very simple. Applying it to:
>
> "sazsid"*1000000 + "sat"
>
> on the other hand may be a tad expensive.
>
> Sweeping generalities about the cost of compiling regexes versus searching
> with them are risky.

I was referring to the specific timing measurements I made earlier in
this thread, not generalizing.


#38960

From	Serhiy Storchaka <storchaka@gmail.com>
Date	2013-02-15 22:58 +0200
Message-ID	<mailman.1847.1360961927.2939.python-list@python.org>
In reply to	#38385

On 08.02.13 03:08, Ian Kelly wrote:
> I think what we're seeing here is that
> the time needed to look up the compiled regular expression in the
> cache is a significant fraction of the time needed to actually execute
> it.

There is a bug issue for this. See http://bugs.python.org/issue16389 .


#39995

From	rh <richard_hubbe11@lavabit.com>
Date	2013-02-26 11:20 -0800
Message-ID	<mailman.2569.1361906409.2939.python-list@python.org>
In reply to	#38385

On Fri, 15 Feb 2013 22:58:30 +0200
Serhiy Storchaka <storchaka@gmail.com> wrote:

> On 08.02.13 03:08, Ian Kelly wrote:
> > I think what we're seeing here is that
> > the time needed to look up the compiled regular expression in the
> > cache is a significant fraction of the time needed to actually
> > execute it.
> 
> There is a bug issue for this. See http://bugs.python.org/issue16389 .
> 

I can't tell what the problem is. Is it fixed or still in progress?


#38420

From	Dave Angel <davea@davea.name>
Date	2013-02-08 01:27 -0500
Message-ID	<mailman.1486.1360304886.2939.python-list@python.org>
In reply to	#38378

On 02/07/2013 06:13 PM, rh wrote:
> On Fri, 08 Feb 2013 09:45:41 +1100
> Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
>
>> <snip>
>>
>> But since you don't demonstrate any actual working code, you could be
>> correct, or you could be timing it wrong. Without seeing your timing
>> code, my guess is that you are doing it wrong. Timing code is tricky,
>> which is why I always show my work. If I get it wrong, someone will
>> hopefully tell me. Otherwise, I might as well be making up the
>> numbers.
>
> re.compile

That statement does nothing useful.  It certainly doesn't
compile anything, or call any regex code.

> starttime = time.time()
> for i in range(numloops):
>      u2f()
>
> msg = '\nElapsed {0:.3f}'.format(time.time() - starttime)
> print(msg)
>


-- 
DaveA


#38344

From	jmfauth <wxjmfauth@gmail.com>
Date	2013-02-07 03:08 -0800
Message-ID	<d822860f-1a49-4470-9c1d-4a83f364a4bb@ia3g2000vbb.googlegroups.com>
In reply to	#38328

On 7 Feb, 04:04, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
> > Well, an alternative /could/ be:
>
> ...
> py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
> py> assert u2f(s) == mangle(s)
> py>
> py> from timeit import Timer
> py> setup = 'from __main__ import s, u2f, mangle'
> py> t1 = Timer('mangle(s)', setup)
> py> t2 = Timer('u2f(s)', setup)
> py>
> py> min(t1.repeat(repeat=7))
> 7.2962000370025635
> py> min(t2.repeat(repeat=7))
> 10.981598854064941
> py>
> py> (10.98-7.29)/10.98
> 0.33606557377049184
>
> (Timings done using Python 2.6 on my laptop -- your speeds may vary.)
>

--------


[OT] Sorry, but I find all these "timeit" I see here and there
more and more ridiculous.

Maybe it's the language itself, which became ridiculous.


code:

from timeit import Timer, repeat

r = repeat("('WHERE IN THE WORLD IS CARMEN?'*10).lower()")
print('1:', r)

r = repeat("('WHERE IN THE WORLD IS HÉLÈNE?'*10).lower()")
print('2:', r)

t = Timer("re.sub('CARMEN', 'CARMEN', 'WHERE IN THE WORLD IS CARMEN?'*10)", "import re")
r = t.repeat()
print('3:', r)

t = Timer("re.sub('HÉLÈNE', 'HÉLÈNE', 'WHERE IN THE WORLD IS HÉLÈNE?'*10)", "import re")
r = t.repeat()
print('4:', r)

result:

>c:\python32\pythonw -u "vitesse3.py"
1: [2.578785478740226, 2.5738459157233833, 2.5739002658825543]
2: [2.57605654937141, 2.5784755252962572, 2.5775366066044896]
3: [11.856728254324088, 11.856321809655501, 11.857456073846905]
4: [12.111787643688231, 12.102743462128771, 12.098514783440208]
>Exit code: 0
>c:\Python33\pythonw -u "vitesse3.py"
1: [0.6063335264470632, 0.6104798922133946, 0.6078580877959869]
2: [4.080205081267272, 4.079303183698418, 4.0786836706522145]
3: [18.093742209318215, 18.079666699618095, 18.07107661757692]
4: [18.852576768615222, 18.841418050790622, 18.840745369110437]
>Exit code: 0

The future is bright for ... ascii users.

jmf


#38346

From	Chris Angelico <rosuav@gmail.com>
Date	2013-02-07 23:44 +1100
Message-ID	<mailman.1446.1360241081.2939.python-list@python.org>
In reply to	#38344

On Thu, Feb 7, 2013 at 10:08 PM, jmfauth <wxjmfauth@gmail.com> wrote:
> The future is bright for ... ascii users.
>
> jmf

So you're admitting to being not very bright?

*ducks*

Seriously jmf, please don't hijack threads just to whine about
contrived issues of Unicode performance yet again. That horse is dead.
Go fork Python and reimplement buggy narrow builds if you want to, the
rest of us are happy with a bug-free Python.

ChrisA
