| From | rh <richard_hubbe11@lavabit.com> |
|---|---|
| Subject | Re: Curious to see alternate approach on a search/replace via regex |
| Date | 2013-02-07 21:47 -0800 |
| References | <mailman.1425.1360186878.2939.python-list@python.org> <14f76032-8753-4e17-897c-447242180e69@googlegroups.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1476.1360302430.2939.python-list@python.org> |
On Thu, 7 Feb 2013 04:53:22 -0800 (PST)
Nick Mellor <thebalancepro@gmail.com> wrote:
> Hi RH,
>
> translate methods might be faster (and a little easier to read) for
> your use case. Just precompute and re-use the translation table
> punct_flatten.
>
> Note that the translate method has changed somewhat for Python 3 due
> to the separation of text from bytes. This is a Python 3 version.
>
> from urllib.parse import urlparse
>
> flattened_chars = "./&=?"
> punct_flatten = str.maketrans(flattened_chars, '_' * len(flattened_chars))
> parts = urlparse('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1')
> unflattened = parts.netloc + parts.path + parts.query
> flattened = unflattened.translate(punct_flatten)
> print(flattened)
I like the idea of using a library, but since I'm learning Python I wanted
to try out the regex stuff. I haven't looked, but I'd think that urllib might
(should?) have a builtin so that one wouldn't have to specify the
flattened_chars list. I'm sure there's a name for those chars but I don't know
it. Maybe just punctuation?
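
On the question of a ready-made character list: one candidate in the standard library is `string.punctuation`, which holds all ASCII punctuation. The sketch below builds a translation table from it. The helper name `flatten_punct` is made up for illustration, and note the assumption that flattening every punctuation character is acceptable, which is broader than the five-character `"./&=?"` list quoted above.

```python
import string

# Assumption: every ASCII punctuation character may be flattened to '_'.
# This covers more characters than the "./&=?" list in the quoted post.
PUNCT_FLATTEN = str.maketrans(string.punctuation, '_' * len(string.punctuation))

def flatten_punct(text):
    """Replace each ASCII punctuation character in text with '_' (hypothetical helper)."""
    return text.translate(PUNCT_FLATTEN)

print(flatten_punct('alongnameofasite1234567.com/q?sports=run'))
# alongnameofasite1234567_com_q_sports_run
```

Unlike a regex with `+`, `translate` maps character by character, so a run such as `://` would become three underscores rather than one.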
Also, my version converts the ? into _, but urlparse treats it as the query
separator and drops it. Just pointing this out for completeness' sake.
This would mimic what I did:
unflattened = parts.netloc + parts.path + '_' + parts.query
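
To check that the adjusted line really does match the regex version, here is a minimal side-by-side sketch. The function names `u2f_regex` and `u2f_translate` are invented for this comparison; the regex patterns and the `"./&=?"` character list are taken from the posts in this thread.

```python
import re
from urllib.parse import urlparse

def u2f_regex(u):
    # Regex approach from the original post: strip the protocol,
    # then replace each run of separator characters with one '_'.
    rest = re.search(r'https?://(.+)$', u).group(1)
    return re.sub(r'[-:./?&=]+', '_', rest)

# Precomputed table, as suggested in the quoted reply.
PUNCT_FLATTEN = str.maketrans("./&=?", "_____")

def u2f_translate(u):
    # urlparse drops the '?' separator, so put a '_' back in its place.
    parts = urlparse(u)
    unflattened = parts.netloc + parts.path + '_' + parts.query
    return unflattened.translate(PUNCT_FLATTEN)

url = 'http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'
print(u2f_regex(url))      # alongnameofasite1234567_com_q_sports_run_a_1_b_1
print(u2f_translate(url))  # alongnameofasite1234567_com_q_sports_run_a_1_b_1
```

The two agree on this URL but not in general: the translate version appends a trailing `_` when there is no query string, leaves `-` and `:` alone, and does not collapse runs of separators.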
>
> Cheers,
>
> Nick
>
> On Thursday, 7 February 2013 08:41:05 UTC+11, rh wrote:
> > I am curious to know if others would have done this differently.
> > And if so, how so?
> >
> > This converts a url to a more easily managed filename, stripping the
> > http protocol off.
> >
> > This:
> >
> > http://alongnameofasite1234567.com/q?sports=run&a=1&b=1
> >
> > becomes this:
> >
> > alongnameofasite1234567_com_q_sports_run_a_1_b_1
> >
> > import re
> >
> > def u2f(u):
> >     nx = re.compile(r'https?://(.+)$')
> >     u = nx.search(u).group(1)
> >     ux = re.compile(r'([-:./?&=]+)')
> >     return ux.sub('_', u)
> >
> > One alternate is to not do the compile step. There must also be a
> > way to do it all at once, i.e. remove the protocol and replace the chars.
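
One possible way to "do it all at once" is a single pattern with two alternatives and a replacement function that decides what each match becomes. This is only a sketch: the names `_FLATTEN` and `u2f_single_pass` are hypothetical, and the pattern is compiled once at module level instead of on every call.

```python
import re

# First alternative: the leading protocol (captured in group 1).
# Second alternative: a run of separator characters.
_FLATTEN = re.compile(r'^(https?://)|[-:./?&=]+')

def u2f_single_pass(u):
    # Drop the protocol match; turn every other match into a single '_'.
    return _FLATTEN.sub(lambda m: '' if m.group(1) else '_', u)

print(u2f_single_pass('http://alongnameofasite1234567.com/q?sports=run&a=1&b=1'))
# alongnameofasite1234567_com_q_sports_run_a_1_b_1
```

One behavioral difference from the two-step version: a string without an `http://` or `https://` prefix is simply flattened here, whereas the original raises an `AttributeError` because `search` returns `None`. As for skipping the compile step, the module-level functions such as `re.sub(pattern, repl, string)` cache recently used compiled patterns, so calling them directly is a reasonable alternative for a couple of patterns.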
--