From: rh
Newsgroups: comp.lang.python
Subject: Re: Curious to see alternate approach on a search/replace via regex
Date: Wed, 6 Feb 2013 19:31:08 -0800
References: <511319c7$0$21812$c3e8da3$76491128@news.astraweb.com>

On 07 Feb 2013 03:04:39 GMT
Steven D'Aprano wrote:

> On Wed, 06 Feb 2013 13:55:58 -0800, Demian Brecht wrote:
>
> > Well, an alternative /could/ be:
> >
> > from urlparse import urlparse
> >
> > parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> > print '%s%s_%s' % (parts.netloc.replace('.', '_'),
> >     parts.path.replace('/', '_'),
> >     parts.query.replace('&', '_').replace('=', '_')
> >     )
> >
> >
> > Although with the result of:
> >
> > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> >          1288 function calls in 0.004 seconds
> >
> >
> > Compared to regex method:
> >
> > 498 function calls (480 primitive calls) in 0.000 seconds
> >
> > I'd prefer the regex method myself.
>
> I dispute those results. I think you are mostly measuring the time to
> print the result, and I/O is quite slow. My tests show that using
> urlparse is 33% faster than using regexes, and far more
> understandable and maintainable.
>
>
> py> from urlparse import urlparse
> py> def mangle(url):
> ...     parts = urlparse(url)
> ...     return '%s%s_%s' % (parts.netloc.replace('.', '_'),
> ...         parts.path.replace('/', '_'),
> ...         parts.query.replace('&', '_').replace('=', '_')
> ...         )
> ...
> py> import re
> py> def u2f(u):
> ...     nx = re.compile(r'https?://(.+)$')
> ...     u = nx.search(u).group(1)
> ...     ux = re.compile(r'([-:./?&=]+)')
> ...     return ux.sub('_', u)
> ...
> py> s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
> py> assert u2f(s) == mangle(s)
> py>
> py> from timeit import Timer
> py> setup = 'from __main__ import s, u2f, mangle'
> py> t1 = Timer('mangle(s)', setup)
> py> t2 = Timer('u2f(s)', setup)
> py>
> py> min(t1.repeat(repeat=7))
> 7.2962000370025635
> py> min(t2.repeat(repeat=7))
> 10.981598854064941
> py>
> py> (10.98-7.29)/10.98
> 0.33606557377049184
>
>
> (Timings done using Python 2.6 on my laptop -- your speeds may vary.)

I am using 2.7.3 and I put the re.compile outside the function and it
performed faster than urlparse. I don't print out the data.

Fast
 ^
 | compiled regex
 | urlparse
 | plain regex
 | all-at-once search/replace with alternation
Slow

>
> --
> Steven
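
For anyone who wants to reproduce the comparison, here is a minimal sketch of the "compiled regex" variant: the quoted u2f() with its two re.compile calls hoisted to module level. The names NX, UX and u2f_compiled are made up for illustration, and the exact code behind the ranking above was not posted, so treat this as one plausible reading of it rather than the original. The import fallback is only there so the same file runs on Python 2 and Python 3.

```python
import re
from timeit import Timer

try:
    from urlparse import urlparse        # Python 2, as used in the thread
except ImportError:
    from urllib.parse import urlparse    # Python 3 equivalent

# Hypothetical names: patterns compiled once at import time,
# instead of calling re.compile on every call as u2f() does.
NX = re.compile(r'https?://(.+)$')
UX = re.compile(r'([-:./?&=]+)')


def u2f_compiled(u):
    """Strip the scheme, then replace each run of separators with '_'."""
    return UX.sub('_', NX.search(u).group(1))


def mangle(url):
    """The urlparse-based version quoted above, unchanged in logic."""
    parts = urlparse(url)
    return '%s%s_%s' % (parts.netloc.replace('.', '_'),
                        parts.path.replace('/', '_'),
                        parts.query.replace('&', '_').replace('=', '_'))


s = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
assert u2f_compiled(s) == mangle(s)

if __name__ == '__main__':
    # Take the minimum of a few repeats, as in the quoted timings.
    for func in (mangle, u2f_compiled):
        timer = Timer(lambda: func(s))
        best = min(timer.repeat(repeat=3, number=100000))
        print('%s: %.3f s' % (func.__name__, best))
```

Note that the re module keeps an internal cache of compiled patterns, so re.compile inside the function mostly costs a function call and a cache lookup per call rather than a full recompile. Hoisting the patterns removes that per-call overhead, which may be why the ordering flips. Actual numbers will depend on the Python version and machine.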