Re: Curious to see alternate approach on a search/replace via regex

Started by: rh <richard_hubbe11@lavabit.com>
First post: 2013-02-07 13:04 -0800
Last post: 2013-02-07 13:04 -0800
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



From: rh <richard_hubbe11@lavabit.com>
Date: 2013-02-07 13:04 -0800
Subject: Re: Curious to see alternate approach on a search/replace via regex
Message-ID: <mailman.1458.1360271112.2939.python-list@python.org>

On Thu, 07 Feb 2013 10:49:06 +0100
Peter Otten <__peter__@web.de> wrote:

> rh wrote:
> 
> > I am curious to know if others would have done this differently.
> > And if so how so?
> > 
> > This converts a url to a more easily managed filename, stripping the
> > http protocol off.
> > 
> > This:
> >  
> > http://alongnameofasite1234567.com/q?sports=run&a=1&b=1
> > 
> > becomes this:
> > 
> > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> > 
> > 
> > import re
> > 
> > def u2f(u):
> >     nx = re.compile(r'https?://(.+)$')
> >     u = nx.search(u).group(1)
> >     ux = re.compile(r'([-:./?&=]+)')
> >     return ux.sub('_', u)
> > 
> > One alternate is to not do the compile step. There must also be a
> > way to do it all at once. i.e. remove the protocol and replace the
> > chars.
> 
> Completely without regular expressions:
> 
> import string
> 
> ILLEGAL = "-:./?&="
> try:
>     TRANS = string.maketrans(ILLEGAL, "_" * len(ILLEGAL))
> except AttributeError:
>     # python 3
>     TRANS = dict.fromkeys(map(ord, ILLEGAL), "_")
> 
> PROTOCOLS = {"http", "https"}
> 
> def url_to_file(url):
>     protocol, sep, rest = url.partition("://")
>     if protocol not in PROTOCOLS:
>         raise ValueError
>     return rest.translate(TRANS)
> 
> if __name__ == "__main__":
>     url = "http://alongnameofasite1234567.com/q?sports=run&a=1&b=1"
>     print(url)
>     print(url_to_file(url))

In my timing test (with the print calls left out), this runs 85% faster
under Python 2.7.3 than under 3.3.0.
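
For anyone who wants to repeat a comparison like this, here is a rough sketch of a timing harness. It is not the script used for the figure above, which was not posted. It assumes Python 3 only (so the string.maketrans fallback is dropped), and u2f_single with its ONE_PASS pattern is a hypothetical one-pass regex variant added purely for illustration of the "do it all at once" idea. Unlike the original u2f, it does not fail when the protocol is missing. The loop count is arbitrary.

```python
import re
import timeit

ILLEGAL = "-:./?&="
# Python 3 only: str.maketrans builds the table that str.translate expects.
TRANS = str.maketrans(ILLEGAL, "_" * len(ILLEGAL))
PROTOCOLS = {"http", "https"}


def url_to_file(url):
    """Translate-based version from the quoted post (Python 3 branch only)."""
    protocol, sep, rest = url.partition("://")
    if protocol not in PROTOCOLS:
        raise ValueError("unsupported protocol: %r" % protocol)
    return rest.translate(TRANS)


# Hypothetical one-pass pattern: group 1 is the leading protocol,
# the other alternative is any run of characters to be replaced.
ONE_PASS = re.compile(r"(^https?://)|[-:./?&=]+")


def u2f_single(url):
    """Drop the protocol and replace the other runs in a single sub() call."""
    return ONE_PASS.sub(lambda m: "" if m.group(1) else "_", url)


URL = "http://alongnameofasite1234567.com/q?sports=run&a=1&b=1"

if __name__ == "__main__":
    # No printing inside the timed calls, only the totals afterwards.
    for func in (url_to_file, u2f_single):
        seconds = timeit.timeit(lambda: func(URL), number=10000)
        print(func.__name__, seconds)
```

Running the same file under each interpreter and comparing the totals gives a like-for-like number. Note that the two functions only agree when no two replaced characters are adjacent after the protocol: translate maps each character separately, while the regex collapses a run into one underscore.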
