
Re: Curious to see alternate approach on a search/replace via regex

From rh <richard_hubbe11@lavabit.com>
Subject Re: Curious to see alternate approach on a search/replace via regex
Date 2013-02-07 13:04 -0800
References <20130206134105.9352d665e1f8d0719021fcf9@lavabit.com> <kevta5$ii3$1@ger.gmane.org>
Newsgroups comp.lang.python
Message-ID <mailman.1458.1360271112.2939.python-list@python.org> (permalink)



On Thu, 07 Feb 2013 10:49:06 +0100
Peter Otten <__peter__@web.de> wrote:

> rh wrote:
> 
> > I am curious to know if others would have done this differently.
> > And if so how so?
> > 
> > This converts a url to a more easily managed filename, stripping the
> > http protocol off.
> > 
> > This:
> >  
> > http://alongnameofasite1234567.com/q?sports=run&a=1&b=1
> > 
> > becomes this:
> > 
> > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> > 
> > 
> > import re
> > 
> > def u2f(u):
> >     nx = re.compile(r'https?://(.+)$')
> >     u = nx.search(u).group(1)
> >     ux = re.compile(r'([-:./?&=]+)')
> >     return ux.sub('_', u)
> > 
> > One alternate is to not do the compile step. There must also be a
> > way to do it all at once. i.e. remove the protocol and replace the
> > chars.
> 
> Completely without regular expressions:
> 
> import string
> 
> ILLEGAL = "-:./?&="
> try:
>     TRANS = string.maketrans(ILLEGAL, "_" * len(ILLEGAL))
> except AttributeError:
>     # python 3
>     TRANS = dict.fromkeys(map(ord, ILLEGAL), "_")
> 
> PROTOCOLS = {"http", "https"}
> 
> def url_to_file(url):
>     protocol, sep, rest = url.partition("://")
>     if protocol not in PROTOCOLS:
>         raise ValueError
>     return rest.translate(TRANS)
> 
> if __name__ == "__main__":
>     url = "http://alongnameofasite1234567.com/q?sports=run&a=1&b=1"
>     print(url)
>     print(url_to_file(url))

In my test (with the print calls removed), the no-regex version above ran
85% faster on Python 2.7.3 than on 3.3.0.
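
For anyone who wants to repeat the comparison, here is a minimal sketch of one way to time it with timeit. It is not the exact harness behind the figure above. The names best_time, ONE_PASS and u2f_onepass are made up for illustration; u2f_onepass is one possible answer to the "do it all at once" question, using a single regex with a replacement function. The same file should run unchanged under both 2.7 and 3.x, so run it once per interpreter and compare the numbers.

```python
import re
import string
import timeit

ILLEGAL = "-:./?&="
try:
    # Python 2: 256-character translation table
    TRANS = string.maketrans(ILLEGAL, "_" * len(ILLEGAL))
except AttributeError:
    # Python 3: mapping of code point -> replacement string
    TRANS = dict.fromkeys(map(ord, ILLEGAL), "_")

PROTOCOLS = {"http", "https"}

def url_to_file(url):
    """Peter's regex-free version, unchanged apart from the error argument."""
    protocol, sep, rest = url.partition("://")
    if protocol not in PROTOCOLS:
        raise ValueError(url)
    return rest.translate(TRANS)

# Hypothetical one-pass variant: group 1 matches only the leading protocol,
# the second alternative matches runs of the characters to replace.
ONE_PASS = re.compile(r'^(https?://)|[-:./?&=]+')

def u2f_onepass(u):
    # Drop the protocol match, turn every other match into one underscore.
    return ONE_PASS.sub(lambda m: '' if m.group(1) else '_', u)

URL = "http://alongnameofasite1234567.com/q?sports=run&a=1&b=1"

def best_time(func, number=100000, repeat=5):
    """Best-of-`repeat` wall time in seconds for `number` calls of func(URL)."""
    timer = timeit.Timer(lambda: func(URL))
    return min(timer.repeat(repeat=repeat, number=number))

if __name__ == "__main__":
    for f in (url_to_file, u2f_onepass):
        print("%-12s %.3f s" % (f.__name__, best_time(f)))
```

One difference to keep in mind: u2f_onepass collapses a run of adjacent special characters into a single underscore (like the original u2f), while translate replaces them one for one. It also does not reject URLs without an http/https prefix. For the sample URL both give the same result.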
