| From | rh <richard_hubbe11@lavabit.com> |
|---|---|
| Subject | Re: Curious to see alternate approach on a search/replace via regex |
| Date | 2013-02-07 13:04 -0800 |
| References | <20130206134105.9352d665e1f8d0719021fcf9@lavabit.com> <kevta5$ii3$1@ger.gmane.org> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1458.1360271112.2939.python-list@python.org> |
On Thu, 07 Feb 2013 10:49:06 +0100
Peter Otten <__peter__@web.de> wrote:
> rh wrote:
>
> > I am curious to know if others would have done this differently.
> > And if so how so?
> >
> > This converts a url to a more easily managed filename, stripping the
> > http protocol off.
> >
> > This:
> >
> > http://alongnameofasite1234567.com/q?sports=run&a=1&b=1
> >
> > becomes this:
> >
> > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> >
> >
> > def u2f(u):
> >     nx = re.compile(r'https?://(.+)$')
> >     u = nx.search(u).group(1)
> >     ux = re.compile(r'([-:./?&=]+)')
> >     return ux.sub('_', u)
> >
> > One alternate is to not do the compile step. There must also be a
> > way to do it all at once. i.e. remove the protocol and replace the
> > chars.
>
> Completely without regular expressions:
>
> import string
>
> ILLEGAL = "-:./?&="
> try:
>     TRANS = string.maketrans(ILLEGAL, "_" * len(ILLEGAL))
> except AttributeError:
>     # python 3
>     TRANS = dict.fromkeys(map(ord, ILLEGAL), "_")
>
> PROTOCOLS = {"http", "https"}
>
> def url_to_file(url):
>     protocol, sep, rest = url.partition("://")
>     if protocol not in PROTOCOLS:
>         raise ValueError
>     return rest.translate(TRANS)
>
> if __name__ == "__main__":
>     url = "http://alongnameofasite1234567.com/q?sports=run&a=1&b=1"
>     print(url)
>     print(url_to_file(url))
In my test (with the printing removed), this runs 85% faster on
Python 2.7.3 than on 3.3.0.
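
For anyone who wants to repeat the comparison, here is a rough sketch of how such a timing could be set up with timeit. It reuses the quoted url_to_file() code unchanged apart from a message on the ValueError; the URL constant, the best_time() helper and the loop counts are my own assumptions, not the exact test that produced the figure above. Run the same file under each interpreter and compare the numbers it prints.

```python
import string
import timeit

ILLEGAL = "-:./?&="
try:
    # Python 2: string.maketrans builds a 256-character translation table
    TRANS = string.maketrans(ILLEGAL, "_" * len(ILLEGAL))
except AttributeError:
    # Python 3: str.translate takes a mapping of code points instead
    TRANS = dict.fromkeys(map(ord, ILLEGAL), "_")

PROTOCOLS = {"http", "https"}

# Sample URL from the original post (the constant name is an assumption)
URL = "http://alongnameofasite1234567.com/q?sports=run&a=1&b=1"


def url_to_file(url):
    """Strip the http(s) protocol and replace each illegal character with '_'."""
    protocol, sep, rest = url.partition("://")
    if protocol not in PROTOCOLS:
        raise ValueError("unsupported protocol: %r" % protocol)
    return rest.translate(TRANS)


def best_time(number=100000, repeat=5):
    """Hypothetical helper: best total time in seconds for `number` calls."""
    # Taking the minimum of several runs reduces noise from other processes.
    return min(timeit.repeat(lambda: url_to_file(URL),
                             number=number, repeat=repeat))


if __name__ == "__main__":
    print("best of 5: %.4f s per 100000 calls" % best_time())
```

Note that translate() replaces every illegal character individually, whereas the regex in the original u2f() collapses a run of them into a single underscore. The two give the same result for the sample URL, but not for every input.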