Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #53176

Re: String splitting with exceptions

From Peter Otten <__peter__@web.de>
Subject Re: String splitting with exceptions
Date 2013-08-28 20:31 +0200
Organization None
References <kvl9e5$19gk$1@leila.iecc.com> <b86skbFf4flU1@mid.individual.net>
Newsgroups comp.lang.python
Message-ID <mailman.321.1377714653.19984.python-list@python.org> (permalink)

Show all headers | View raw


Neil Cerutti wrote:

> On 2013-08-28, John Levine <johnl@iecc.com> wrote:
>> I have a crufty old DNS provisioning system that I'm rewriting and I
>> hope improving in python.  (It's based on tinydns if you know what
>> that is.)
>>
>> The record formats are, in the worst case, like this:
>>
>> foo.[DOM]::[IP6::4361:6368:6574]:600::
>>
>> What I would like to do is to split this string into a list like this:
>>
>> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>>
>> Colons are separators except when they're inside square
>> brackets.  I have been messing around with re.split() and
>> re.findall() and haven't been able to come up with either a
>> working separator pattern for split() or a working field
>> pattern for findall().  I came pretty close with findall() but
>> can't get it to reliably match the nothing between two adjacent
>> colons not inside brackets.
>>
>> Any suggestions? I realize I could do it in a loop where I pick
>> stuff off the front of the string, but yuck.
> 
> A little parser, as Skip suggested, is a good way to go.
> 
> The brackets make your string context-sensitive, a difficult
> concept to cleanly parse with a regex.
> 
> I initially hoped a csv module dialect could work, but the quote
> character is (currently) hard-coded to be a single, simple
> character, i.e., I can't tell it to treat [xxx] as "xxx".
> 
> What about Skip's suggestion? A little parser. It might seem
> crass or something, but it really is easier than musceling a
> regex into a context sensitive grammer.
> 
> def dns_split(s):
>     in_brackets = False
>     b = 0 # index of beginning of current string
>     for i, c in enumerate(s):
>         if not in_brackets:
>             if c == "[":
>                 in_brackets = True
>             elif c == ':':
>                 yield s[b:i]
>                 b = i+1
>         elif c == "]":
>             in_brackets = False

I think you need one more yield outside the loop.

>>>> print(list(dns_split(s)))
> ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']
> 
> It'll gag on nested brackets (fixable with a counter) and has no
> error handling (requires thought), but it's a start.
 
Something similar on top of regex:

>>> def split(s):
...     start = level = 0
...     for m in re.compile(r"[[:\]]").finditer(s):
...             if m.group() == "[": level += 1
...             elif m.group() == "]":
...                     assert level
...                     level -= 1
...             elif level == 0:
...                     yield s[start:m.start()]
...                     start = m.end()
...     yield s[start:]
... 
>>> list(split("a[b:c:]:d"))
['a[b:c:]', 'd']
>>> list(split("a[b:c[:]]:d"))
['a[b:c[:]]', 'd']
>>> list(split(""))
['']
>>> list(split(":"))
['', '']
>>> list(split(":x"))
['', 'x']
>>> list(split("[:x]"))
['[:x]']
>>> list(split(":[:x]"))
['', '[:x]']
>>> list(split(":[:[:]:x]"))
['', '[:[:]:x]']
>>> list(split("[:::]"))
['[:::]']
>>> s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
>>> list(split(s))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

Note that there is one more empty string which I believe the OP forgot.

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

String splitting with exceptions John Levine <johnl@iecc.com> - 2013-08-28 16:44 +0000
  Re: String splitting with exceptions Skip Montanaro <skip@pobox.com> - 2013-08-28 11:55 -0500
  Re: String splitting with exceptions random832@fastmail.us - 2013-08-28 13:14 -0400
    Re: String splitting with exceptions John Levine <johnl@iecc.com> - 2013-08-28 21:35 +0000
  Re: String splitting with exceptions Tim Chase <python.list@tim.thechases.com> - 2013-08-28 12:32 -0500
    Re: String splitting with exceptions Neil Cerutti <neilc@norwich.edu> - 2013-08-28 18:18 +0000
  Re: String splitting with exceptions Neil Cerutti <neilc@norwich.edu> - 2013-08-28 18:08 +0000
    Re: String splitting with exceptions Peter Otten <__peter__@web.de> - 2013-08-28 20:31 +0200
  Re: String splitting with exceptions wxjmfauth@gmail.com - 2013-08-29 00:26 -0700

csiph-web