


Re: String splitting with exceptions

From Peter Otten <__peter__@web.de>
Subject Re: String splitting with exceptions
Date Wed, 28 Aug 2013 20:31:17 +0200
Newsgroups comp.lang.python
Message-ID <mailman.321.1377714653.19984.python-list@python.org>



Neil Cerutti wrote:

> On 2013-08-28, John Levine <johnl@iecc.com> wrote:
>> I have a crufty old DNS provisioning system that I'm rewriting and I
>> hope improving in python.  (It's based on tinydns if you know what
>> that is.)
>>
>> The record formats are, in the worst case, like this:
>>
>> foo.[DOM]::[IP6::4361:6368:6574]:600::
>>
>> What I would like to do is to split this string into a list like this:
>>
>> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>>
>> Colons are separators except when they're inside square
>> brackets.  I have been messing around with re.split() and
>> re.findall() and haven't been able to come up with either a
>> working separator pattern for split() or a working field
>> pattern for findall().  I came pretty close with findall() but
>> can't get it to reliably match the nothing between two adjacent
>> colons not inside brackets.
>>
>> Any suggestions? I realize I could do it in a loop where I pick
>> stuff off the front of the string, but yuck.
> 
> A little parser, as Skip suggested, is a good way to go.
> 
> The brackets make your string context-sensitive, a difficult
> concept to cleanly parse with a regex.
> 
> I initially hoped a csv module dialect could work, but the quote
> character is (currently) hard-coded to be a single, simple
> character, i.e., I can't tell it to treat [xxx] as "xxx".
> 
> What about Skip's suggestion? A little parser. It might seem
> crass or something, but it really is easier than muscling a
> regex into a context-sensitive grammar.
> 
> def dns_split(s):
>     in_brackets = False
>     b = 0 # index of beginning of current string
>     for i, c in enumerate(s):
>         if not in_brackets:
>             if c == "[":
>                 in_brackets = True
>             elif c == ':':
>                 yield s[b:i]
>                 b = i+1
>         elif c == "]":
>             in_brackets = False

I think you need one more yield outside the loop.

> >>> print(list(dns_split(s)))
> ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']
> 
> It'll gag on nested brackets (fixable with a counter) and has no
> error handling (requires thought), but it's a start.
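
For reference, here is a sketch of how the generator quoted above might look with the missing final yield added. The nesting counter and the ValueError checks are my own assumptions about what "fixable with a counter" and "error handling" could mean; they are not part of Neil's code.

```python
def dns_split(s):
    """Yield the colon-separated fields of s, ignoring colons inside [...]."""
    depth = 0  # current bracket nesting depth (assumed replacement for in_brackets)
    b = 0      # index of the beginning of the current field
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            if depth == 0:
                # Assumed policy: treat a stray "]" as an error.
                raise ValueError("unmatched ']' at index %d" % i)
            depth -= 1
        elif c == ":" and depth == 0:
            yield s[b:i]
            b = i + 1
    if depth:
        # Assumed policy: treat an unclosed "[" as an error.
        raise ValueError("unclosed '[' in %r" % s)
    yield s[b:]  # the field after the last separator


s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
print(list(dns_split(s)))
# ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']
```

Because this is a generator, the errors only surface once the result is consumed, for example by list().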
 
Something similar on top of regex:

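>>> import re
>>> def split(s):
...     start = level = 0  # start of the current field, bracket nesting depth
...     for m in re.compile(r"[\[:\]]").finditer(s):
...         if m.group() == "[":
...             level += 1
...         elif m.group() == "]":
...             assert level, "unbalanced ']'"
...             level -= 1
...         elif level == 0:  # a colon outside any brackets ends the field
...             yield s[start:m.start()]
...             start = m.end()
...     yield s[start:]  # the last field, after the final separator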
... 
>>> list(split("a[b:c:]:d"))
['a[b:c:]', 'd']
>>> list(split("a[b:c[:]]:d"))
['a[b:c[:]]', 'd']
>>> list(split(""))
['']
>>> list(split(":"))
['', '']
>>> list(split(":x"))
['', 'x']
>>> list(split("[:x]"))
['[:x]']
>>> list(split(":[:x]"))
['', '[:x]']
>>> list(split(":[:[:]:x]"))
['', '[:[:]:x]']
>>> list(split("[:::]"))
['[:::]']
>>> s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
>>> list(split(s))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

Note that this has one more empty string than the OP's expected list: the record ends in two colons, so two empty fields follow '600'. I believe the OP forgot one.
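
If the brackets never nest, a single separator pattern for re.split() might be enough, which is closer to what the OP originally asked for. The following is only a sketch under that assumption (balanced, non-nested brackets); the names SEPARATOR and split_flat are made up for illustration.

```python
import re

# A colon is a separator unless the next bracket character after it is "]",
# i.e. unless the colon sits inside an open [...] group.
# Assumes balanced, non-nested brackets.
SEPARATOR = re.compile(r":(?![^\[]*\])")


def split_flat(s):
    """Split s on colons that are not inside a [...] group."""
    return SEPARATOR.split(s)


s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
print(split_flat(s))
# ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']
```

With nested brackets such as "a[b:c[:]]:d" the lookahead misjudges the first inner colon, so the generator above remains the safer choice for that case.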



Thread

String splitting with exceptions John Levine <johnl@iecc.com> - 2013-08-28 16:44 +0000
  Re: String splitting with exceptions Skip Montanaro <skip@pobox.com> - 2013-08-28 11:55 -0500
  Re: String splitting with exceptions random832@fastmail.us - 2013-08-28 13:14 -0400
    Re: String splitting with exceptions John Levine <johnl@iecc.com> - 2013-08-28 21:35 +0000
  Re: String splitting with exceptions Tim Chase <python.list@tim.thechases.com> - 2013-08-28 12:32 -0500
    Re: String splitting with exceptions Neil Cerutti <neilc@norwich.edu> - 2013-08-28 18:18 +0000
  Re: String splitting with exceptions Neil Cerutti <neilc@norwich.edu> - 2013-08-28 18:08 +0000
    Re: String splitting with exceptions Peter Otten <__peter__@web.de> - 2013-08-28 20:31 +0200
  Re: String splitting with exceptions wxjmfauth@gmail.com - 2013-08-29 00:26 -0700
