Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.python > #53174
| From | Neil Cerutti <neilc@norwich.edu> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: String splitting with exceptions |
| Date | 2013-08-28 18:08 +0000 |
| Organization | Norwich University |
| Message-ID | <b86skbFf4flU1@mid.individual.net> (permalink) |
| References | <kvl9e5$19gk$1@leila.iecc.com> |
On 2013-08-28, John Levine <johnl@iecc.com> wrote:
> I have a crufty old DNS provisioning system that I'm rewriting and I
> hope improving in python. (It's based on tinydns if you know what
> that is.)
>
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square
> brackets. I have been messing around with re.split() and
> re.findall() and haven't been able to come up with either a
> working separator pattern for split() or a working field
> pattern for findall(). I came pretty close with findall() but
> can't get it to reliably match the nothing between two adjacent
> colons not inside brackets.
>
> Any suggestions? I realize I could do it in a loop where I pick
> stuff off the front of the string, but yuck.
A little parser, as Skip suggested, is a good way to go.
The brackets make your string context-sensitive, a difficult
concept to cleanly parse with a regex.
I initially hoped a csv module dialect could work, but the quote
character is (currently) hard-coded to be a single, simple
character, i.e., I can't tell it to treat [xxx] as "xxx".
What about Skip's suggestion? A little parser. It might seem
crass or something, but it really is easier than musceling a
regex into a context sensitive grammer.
def dns_split(s):
in_brackets = False
b = 0 # index of beginning of current string
for i, c in enumerate(s):
if not in_brackets:
if c == "[":
in_brackets = True
elif c == ':':
yield s[b:i]
b = i+1
elif c == "]":
in_brackets = False
>>> print(list(dns_split(s)))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']
It'll gag on nested brackets (fixable with a counter) and has no
error handling (requires thought), but it's a start.
--
Neil Cerutti
Back to comp.lang.python | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
String splitting with exceptions John Levine <johnl@iecc.com> - 2013-08-28 16:44 +0000
Re: String splitting with exceptions Skip Montanaro <skip@pobox.com> - 2013-08-28 11:55 -0500
Re: String splitting with exceptions random832@fastmail.us - 2013-08-28 13:14 -0400
Re: String splitting with exceptions John Levine <johnl@iecc.com> - 2013-08-28 21:35 +0000
Re: String splitting with exceptions Tim Chase <python.list@tim.thechases.com> - 2013-08-28 12:32 -0500
Re: String splitting with exceptions Neil Cerutti <neilc@norwich.edu> - 2013-08-28 18:18 +0000
Re: String splitting with exceptions Neil Cerutti <neilc@norwich.edu> - 2013-08-28 18:08 +0000
Re: String splitting with exceptions Peter Otten <__peter__@web.de> - 2013-08-28 20:31 +0200
Re: String splitting with exceptions wxjmfauth@gmail.com - 2013-08-29 00:26 -0700
csiph-web