Date: Mon, 15 Jul 2013 06:06:06 -0400
From: Devyn Collier Johnson
To: 88888 Dihedral, Python Mailing List
Subject: Re: RE Module Performance
Newsgroups: comp.lang.python

On 07/14/2013 02:17 PM, 88888 Dihedral wrote:
> On Saturday, July 13, 2013 1:37:46 PM UTC+8, Steven D'Aprano wrote:
>> On Fri, 12 Jul 2013 13:58:29 -0400, Devyn Collier Johnson wrote:
>>
>>> I plan to spend some time optimizing the re.py module for Unix systems.
>>> I would love to amp up my programs that use that module.
>>
>> In my experience, often the best way to optimize a regex is to not use it
>> at all.
>>
>> [steve@ando ~]$ python -m timeit -s "import re" \
>> > -s "data = 'a'*100+'b'" \
>> > "if re.search('b', data): pass"
>> 100000 loops, best of 3: 2.77 usec per loop
>>
>> [steve@ando ~]$ python -m timeit -s "data = 'a'*100+'b'" \
>> > "if 'b' in data: pass"
>> 1000000 loops, best of 3: 0.219 usec per loop
>>
>> In Python, we often use plain string operations instead of regex-based
>> solutions for basic tasks. Regexes are a 10lb sledge hammer. Don't use
>> them for cracking peanuts.
>>
>> --
>> Steven
> OK, let's talk about the indexed search algorithms of
> a character stream or string which can be buffered and
> indexed randomly for RW operations but faster in sequential
> block RW operations after some pre-processing.
>
> This was solved a long time ago in the suffix array or
> suffix tree part and summarized in the famous BWT paper in 199X.
>
> Do we want volunteers to speed up
> search operations in the string module in Python?

It would be nice if someone could speed it up.
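
For anyone wondering what the "indexed search after some pre-processing" idea looks like in practice, here is a minimal sketch of a suffix array in pure Python. It is only an illustration: nothing like this is in the standard library as far as I know, the names build_suffix_array and find_all are made up for this example, and the construction is the naive sort-all-suffixes approach, not one of the fast algorithms from the literature.

```python
def build_suffix_array(text):
    """Return the start offsets of every suffix of text, in sorted order.

    Naive construction (an assumption of this sketch): the sort compares
    whole suffixes, so it is far too slow for large inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, suffix_array, pattern):
    """Return the sorted offsets at which pattern occurs in text."""
    m = len(pattern)

    # Lower bound: first suffix whose first m characters are >= pattern.
    lo, hi = 0, len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose first m characters are > pattern.
    hi = len(suffix_array)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[suffix_array[mid]:suffix_array[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in [start, lo) begins with pattern.
    return sorted(suffix_array[start:lo])


data = 'a' * 100 + 'b'          # same test string as in the timings above
index = build_suffix_array(data)
print(find_all(data, index, 'b'))   # prints [100]
```

Building the index costs much more than a single scan, so for a one-off test like the one quoted above, plain "'b' in data" remains the better choice. An index of this kind only starts to pay off when the same text is searched many times.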