
Re: Curious to see alternate approach on a search/replace via regex

Newsgroups comp.lang.python
Date Fri, 8 Feb 2013 00:53:12 -0800 (PST)
In-Reply-To <mailman.1476.1360302430.2939.python-list@python.org>
References <mailman.1425.1360186878.2939.python-list@python.org> <14f76032-8753-4e17-897c-447242180e69@googlegroups.com> <mailman.1476.1360302430.2939.python-list@python.org>
Message-ID <d325e0c9-977d-415f-8bde-1e2e38ef1f19@googlegroups.com>
Subject Re: Curious to see alternate approach on a search/replace via regex
From Nick Mellor <thebalancepro@gmail.com>
Cc python-list@python.org



Hi RH,

It's essential to know about regex, of course, but often there's a better, easier-to-read way to do things in Python.

One of Python's aims is clarity and ease of reading.

Regex is complex, potentially inefficient, and hard to read, even though it is sometimes the only reasonable way to do things.
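
As a rough sketch of the non-regex route (assuming Python 3; the function name url_to_filename is made up here for illustration), the translation table can be built once and the '?' put back between path and query so it is flattened too. Unlike a regex with '+', translate maps each character separately, so a run such as '//' would become '__' rather than a single '_'.

```python
from urllib.parse import urlparse

# Same character set as in the quoted post, built once and reused.
flattened_chars = "./&=?"
punct_flatten = str.maketrans(flattened_chars, "_" * len(flattened_chars))


def url_to_filename(url):
    """Drop the scheme and flatten URL punctuation to underscores."""
    parts = urlparse(url)
    unflattened = parts.netloc + parts.path
    if parts.query:
        # urlparse drops the '?' separator, so restore it before flattening.
        unflattened += "?" + parts.query
    return unflattened.translate(punct_flatten)


print(url_to_filename("http://alongnameofasite1234567.com/q?sports=run&a=1&b=1"))
# alongnameofasite1234567_com_q_sports_run_a_1_b_1
```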

Best,

Nick

On Friday, 8 February 2013 16:47:03 UTC+11, rh wrote:
> On Thu, 7 Feb 2013 04:53:22 -0800 (PST)
> Nick Mellor <t...o@gmail.com> wrote:
>
> > Hi RH,
> >
> > translate methods might be faster (and a little easier to read) for
> > your use case. Just precompute and re-use the translation table
> > punct_flatten.
> >
> > Note that the translate method has changed somewhat for Python 3 due
> > to the separation of text from bytes. This is a Python 3 version.
> >
> > from urllib.parse import urlparse
> >
> > flattened_chars = "./&=?"
> > punct_flatten = str.maketrans(flattened_chars, '_' * len(flattened_chars))
> > parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> > unflattened = parts.netloc + parts.path + parts.query
> > flattened = unflattened.translate(punct_flatten)
> > print(flattened)
>
> I like the idea of using a library but since I'm learning python I wanted
> to try out the regex stuff. I haven't looked but I'd think that urllib might
> (should?) have a builtin so that one wouldn't have to specify the
> flattened_chars list. I'm sure there's a name for those chars but I don't know
> it. Maybe just punctuation??
>
> Also my version converts the ? into _ but urllib sees that as the query
> separator and removes it. Just pointing this out for completeness' sake.
>
> This would mimic what I did:
> unflattened = parts.netloc + parts.path + '_' + parts.query
>
> > Cheers,
> >
> > Nick
> >
> > On Thursday, 7 February 2013 08:41:05 UTC+11, rh wrote:
> > > I am curious to know if others would have done this differently.
> > > And if so how so?
> > >
> > > This converts a url to a more easily managed filename, stripping the
> > > http protocol off.
> > >
> > > This:
> > >
> > > http://alongnameofasite1234567.com/q?sports=run&a=1&b=1
> > >
> > > becomes this:
> > >
> > > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> > >
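> > > import re
> > >
> > > def u2f(u):
> > >     # Strip the http:// or https:// prefix.
> > >     nx = re.compile(r'https?://(.+)$')
> > >     u = nx.search(u).group(1)
> > >     # Replace each run of separator characters with one underscore.
> > >     ux = re.compile(r'([-:./?&=]+)')
> > >     return ux.sub('_', u)
> > >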
> > > One alternate is to not do the compile step. There must also be a
> > > way to do it all at once. i.e. remove the protocol and replace the chars.
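
On the "all at once" question quoted above, one possible sketch is a single pattern with two alternatives and a replacement function that decides what each match becomes. The names URL_FLATTEN and u2f_one_pass are invented for this illustration. One behavioural difference from the quoted u2f: a string with no http(s) prefix is simply flattened rather than raising an AttributeError.

```python
import re

# First alternative: the protocol prefix (captured in group 1).
# Second alternative: a run of the separator characters from the quoted u2f.
URL_FLATTEN = re.compile(r"^(https?://)|[-:./?&=]+")


def u2f_one_pass(u):
    """Remove the protocol and flatten separators in a single sub() call."""
    # Group 1 is only set when the protocol alternative matched.
    return URL_FLATTEN.sub(lambda m: "" if m.group(1) else "_", u)


print(u2f_one_pass("http://alongnameofasite1234567.com/q?sports=run&a=1&b=1"))
# alongnameofasite1234567_com_q_sports_run_a_1_b_1
```

Whether this reads better than two separate steps is a matter of taste; it mostly shows that the pattern can be compiled once at module level and reused.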



Thread

Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-06 13:41 -0800
  Re: Curious to see alternate approach on a search/replace via regex Roy Smith <roy@panix.com> - 2013-02-06 16:54 -0500
  Re: Curious to see alternate approach on a search/replace via regex Nick Mellor <thebalancepro@gmail.com> - 2013-02-07 04:53 -0800
    Re: Curious to see alternate approach on a search/replace via regex rh <richard_hubbe11@lavabit.com> - 2013-02-07 21:47 -0800
      Re: Curious to see alternate approach on a search/replace via regex Nick Mellor <thebalancepro@gmail.com> - 2013-02-08 00:53 -0800
