Subject: Re: Why is regex so slow?
Mime-Version: 1.0 (Apple Message framework v1283)
Content-Type: text/plain; charset=us-ascii
From: Roy Smith <roy@panix.com>
In-Reply-To: <CANc-5UyouN5EQw_GDecaAR+inyAB26g=e2pZ=zBuspXdmxJh+Q@mail.gmail.com>
Date: Tue, 18 Jun 2013 13:08:52 -0400
Content-Transfer-Encoding: quoted-printable
References: <kpq2r9$gg6$1@panix2.panix.com> <CANc-5UyouN5EQw_GDecaAR+inyAB26g=e2pZ=zBuspXdmxJh+Q@mail.gmail.com>
To: Skip Montanaro <skip@pobox.com>
Cc: python-list@python.org
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.3544.1371575342.3114.python-list@python.org>
Lines: 30
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:48644


On Jun 18, 2013, at 1:01 PM, Skip Montanaro wrote:

>> I don't understand why the first way is so much slower.
>
> I have no obvious answers, but a couple suggestions:
>
> 1. Can you anchor the pattern at the beginning of the line?  (use
> match() instead of search())

That's one of the things we tried.  Didn't make any difference.

> 2. Does it get faster if you eliminate the "(.*)" part of the pattern?

Just tried that, it also didn't make any difference.
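
Both suggestions can be checked with a small timing harness. What follows is only a sketch: the pattern, the sample line, and the repeat count are invented for illustration and are not the ones from the code under discussion, so the absolute numbers mean little. It only shows how search() versus match(), and with versus without the "(.*)" group, could be compared side by side.

```python
import re
import timeit

# Hypothetical input and patterns, made up for this sketch.
line = "ENQUEUEING: /api/v1/track " + "x" * 200 + "\n"
with_group = re.compile(r"ENQUEUEING: /(.*)")
no_group = re.compile(r"ENQUEUEING: /")

# Each variant is a bound method that takes the line to scan.
variants = {
    "search, with group": with_group.search,
    "match, with group": with_group.match,
    "search, no group": no_group.search,
    "match, no group": no_group.match,
}

def time_variant(fn, number=2000):
    """Return total seconds for `number` calls of fn(line)."""
    return timeit.timeit(lambda: fn(line), number=number)

results = {name: time_variant(fn) for name, fn in variants.items()}
for name in sorted(results):
    print("{:20s} {:.6f}s".format(name, results[name]))
```

If all four timings come out about the same, as reported here, the cost is probably not in the anchoring or the group.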

> It seems that if you find a line matching the first part of the
> pattern, you could just as easily split the line yourself instead of
> creating a group.
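
Splitting the line by hand, as suggested above, might look roughly like this. The marker string and the sample line are hypothetical, and tail_after() is a helper name invented for this sketch.

```python
# Hypothetical marker and log line, made up for this sketch.
marker = "ENQUEUEING: /"
line = "2013-06-18 13:08:52 ENQUEUEING: /api/v1/track?id=42\n"

def tail_after(marker, line):
    """Return the text after the first occurrence of marker, or None."""
    head, sep, tail = line.partition(marker)
    # sep is empty when the marker was not found in the line.
    return tail.rstrip("\n") if sep else None

print(tail_after(marker, line))
```

This does the work of the "(.*)" group without the regex engine, at the cost of only handling a fixed marker string.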


At this point, I'm not so much interested in making this faster as
understanding why it's so slow.  I'm tempted to open this up as a
performance bug against the regex module (which I assume will be
rejected, at least for the 2.x series).

---
Roy Smith
roy@panix.com