Re: String splitting with exceptions

Path csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!feeder.erje.net!us.feeder.erje.net!news2.arglkargh.de!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From Neil Cerutti <neilc@norwich.edu>
Newsgroups comp.lang.python
Subject Re: String splitting with exceptions
Date 28 Aug 2013 18:08:11 GMT
Organization Norwich University
Lines 58
Message-ID <b86skbFf4flU1@mid.individual.net> (permalink)
References <kvl9e5$19gk$1@leila.iecc.com>
Mime-Version 1.0
Content-Type text/plain; charset=us-ascii
Content-Transfer-Encoding 7bit
X-Trace individual.net A7qipYftSQkG6RWnzWZbsQ/tWv/HsDFWVpsEYwbD93MXOauz2Y
Cancel-Lock sha1:DsNWnCcvUQUogH8lFVv4QL7eiHg=
User-Agent slrn/0.9.9p1/mm/ao (Win32)
Xref csiph.com comp.lang.python:53174


On 2013-08-28, John Levine <johnl@iecc.com> wrote:
> I have a crufty old DNS provisioning system that I'm rewriting and I
> hope improving in python.  (It's based on tinydns if you know what
> that is.)
>
> The record formats are, in the worst case, like this:
>
> foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square
> brackets.  I have been messing around with re.split() and
> re.findall() and haven't been able to come up with either a
> working separator pattern for split() or a working field
> pattern for findall().  I came pretty close with findall() but
> can't get it to reliably match the nothing between two adjacent
> colons not inside brackets.
>
> Any suggestions? I realize I could do it in a loop where I pick
> stuff off the front of the string, but yuck.

A little parser, as Skip suggested, is a good way to go.

The brackets make the meaning of each colon depend on its
context, which is difficult to parse cleanly with a regex.

I initially hoped a csv module dialect could work, but the quote
character is (currently) hard-coded to be a single, simple
character, i.e., I can't tell it to treat [xxx] as "xxx".
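
To illustrate the csv limitation, here is a minimal sketch. The helper
name accepts_quotechar is made up for this example; it only reports
whether csv.reader is willing to build a dialect with the given quote
character.

```python
import csv


def accepts_quotechar(quotechar):
    """Return True if csv.reader accepts quotechar, False if it rejects it.

    Hypothetical helper for illustration; not part of the csv module.
    """
    try:
        # The dialect is validated when the reader is created,
        # so an empty list of lines is enough to trigger the check.
        csv.reader([], delimiter=":", quotechar=quotechar)
    except (TypeError, csv.Error):
        return False
    return True


print(accepts_quotechar('"'))   # a single character
print(accepts_quotechar("[]"))  # an open/close pair
```

A single character is accepted, while the two-character "[]" is
rejected, so there is no way to declare distinct opening and closing
quotes.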

What about Skip's suggestion? A little parser. It might seem
crass or something, but it really is easier than muscling a
regex into a context-sensitive grammar.

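def dns_split(s):
    """Yield the colon-terminated fields of s, ignoring colons inside [...]."""
    in_brackets = False
    b = 0  # index where the current field begins
    for i, c in enumerate(s):
        if not in_brackets:
            if c == "[":
                in_brackets = True
            elif c == ":":
                # An unbracketed colon ends the current field.
                yield s[b:i]
                b = i + 1
        elif c == "]":
            in_brackets = False
    if b < len(s):
        # Text after the last colon is a final, unterminated field.
        yield s[b:]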

>>> s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
>>> print(list(dns_split(s)))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']

It'll gag on nested brackets (fixable with a counter) and has no
error handling (requires thought), but it's a start.
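
For what it's worth, the counter idea might look something like the
sketch below. The name dns_split_nested is made up here, and the error
policy is an assumption on my part: it raises ValueError on unbalanced
brackets and treats any text after the last colon as a final field. The
real record format may call for something different.

```python
def dns_split_nested(s):
    """Yield colon-terminated fields of s, ignoring colons inside
    (possibly nested) square brackets.

    Sketch only: the ValueError policy and the handling of a trailing
    unterminated field are assumptions, not part of the original post.
    """
    depth = 0  # current bracket nesting depth
    b = 0      # index where the current field begins
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            if depth == 0:
                raise ValueError("unmatched ']' at index %d" % i)
            depth -= 1
        elif c == ":" and depth == 0:
            # A colon outside all brackets ends the current field.
            yield s[b:i]
            b = i + 1
    if depth:
        raise ValueError("unclosed '[' in %r" % s)
    if b < len(s):
        # Text after the last colon is a final, unterminated field.
        yield s[b:]


print(list(dns_split_nested("foo.[DOM]::[IP6::4361:6368:6574]:600::")))
print(list(dns_split_nested("a[b:[c:d]:e]:f:")))
```

Since it's a generator, the ValueError only shows up once you iterate
over it, e.g. by wrapping the call in list().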

-- 
Neil Cerutti
