| Path | csiph.com!v102.xanadu-bbs.net!xanadu-bbs.net!goblin3!goblin2!goblin.stu.neva.ru!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <python-python-list@m.gmane.org> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Injected-Via-Gmane | http://gmane.org/ |
| To | python-list@python.org |
| From | Peter Otten <__peter__@web.de> |
| Subject | Re: String splitting with exceptions |
| Date | Wed, 28 Aug 2013 20:31:17 +0200 |
| Organization | None |
| References | <kvl9e5$19gk$1@leila.iecc.com> <b86skbFf4flU1@mid.individual.net> |
| Mime-Version | 1.0 |
| Content-Type | text/plain; charset="ISO-8859-1" |
| Content-Transfer-Encoding | 7Bit |
| X-Gmane-NNTP-Posting-Host | p5084b754.dip0.t-ipconnect.de |
| User-Agent | KNode/4.7.3 |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.321.1377714653.19984.python-list@python.org> (permalink) |
| Lines | 98 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1377714653 news.xs4all.nl 15885 [2001:888:2000:d::a6]:41596 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:53176 |
Neil Cerutti wrote:
> On 2013-08-28, John Levine <johnl@iecc.com> wrote:
>> I have a crufty old DNS provisioning system that I'm rewriting and I
>> hope improving in python. (It's based on tinydns if you know what
>> that is.)
>>
>> The record formats are, in the worst case, like this:
>>
>> foo.[DOM]::[IP6::4361:6368:6574]:600::
>>
>> What I would like to do is to split this string into a list like this:
>>
>> [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>>
>> Colons are separators except when they're inside square
>> brackets. I have been messing around with re.split() and
>> re.findall() and haven't been able to come up with either a
>> working separator pattern for split() or a working field
>> pattern for findall(). I came pretty close with findall() but
>> can't get it to reliably match the nothing between two adjacent
>> colons not inside brackets.
>>
>> Any suggestions? I realize I could do it in a loop where I pick
>> stuff off the front of the string, but yuck.
>
> A little parser, as Skip suggested, is a good way to go.
>
> The brackets make your string context-sensitive, a difficult
> concept to cleanly parse with a regex.
>
> I initially hoped a csv module dialect could work, but the quote
> character is (currently) hard-coded to be a single, simple
> character, i.e., I can't tell it to treat [xxx] as "xxx".
>
> What about Skip's suggestion? A little parser. It might seem
> crass or something, but it really is easier than muscling a
> regex into a context-sensitive grammar.
>
> def dns_split(s):
>     in_brackets = False
>     b = 0 # index of beginning of current string
>     for i, c in enumerate(s):
>         if not in_brackets:
>             if c == "[":
>                 in_brackets = True
>             elif c == ':':
>                 yield s[b:i]
>                 b = i+1
>         elif c == "]":
>             in_brackets = False
I think you need one more yield outside the loop.
> >>> print(list(dns_split(s)))
> ['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']
>
> It'll gag on nested brackets (fixable with a counter) and has no
> error handling (requires thought), but it's a start.
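A minimal sketch of the two fixes mentioned here, the extra yield after the loop and a counter for nested brackets. The name dns_split_nested and the choice to raise ValueError on unbalanced brackets are assumptions made for illustration, not part of the quoted code:

```python
def dns_split_nested(s):
    """Split s on colons that are not inside square brackets."""
    depth = 0   # current bracket nesting level
    start = 0   # index where the current field begins
    for i, c in enumerate(s):
        if c == "[":
            depth += 1
        elif c == "]":
            if depth == 0:
                # Assumed policy: treat a stray "]" as an error.
                raise ValueError("unmatched ']' at index %d" % i)
            depth -= 1
        elif c == ":" and depth == 0:
            yield s[start:i]
            start = i + 1
    if depth:
        # Assumed policy: an unclosed "[" is an error as well.
        raise ValueError("unclosed '[' in %r" % s)
    yield s[start:]  # the field after the last separator


s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
print(list(dns_split_nested(s)))
```

Because this is a generator, the ValueError only surfaces once the result is consumed, for example by list().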
Something similar on top of regex:
>>> import re
>>> def split(s):
...     start = level = 0
...     for m in re.compile(r"[\[\]:]").finditer(s):
...         if m.group() == "[": level += 1
...         elif m.group() == "]":
...             assert level
...             level -= 1
...         elif level == 0:
...             yield s[start:m.start()]
...             start = m.end()
...     yield s[start:]
...
>>> list(split("a[b:c:]:d"))
['a[b:c:]', 'd']
>>> list(split("a[b:c[:]]:d"))
['a[b:c[:]]', 'd']
>>> list(split(""))
['']
>>> list(split(":"))
['', '']
>>> list(split(":x"))
['', 'x']
>>> list(split("[:x]"))
['[:x]']
>>> list(split(":[:x]"))
['', '[:x]']
>>> list(split(":[:[:]:x]"))
['', '[:[:]:x]']
>>> list(split("[:::]"))
['[:::]']
>>> s = "foo.[DOM]::[IP6::4361:6368:6574]:600::"
>>> list(split(s))
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']
Note that there is one more empty string which I believe the OP forgot.
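If the brackets never nest and are always balanced, a single separator pattern for re.split() may also be enough. The following is a sketch under those assumptions; the names SEPARATOR and split_flat are made up for illustration. The negative lookahead rejects any colon that is followed by a "]" with no other bracket in between, which is taken to mean the colon sits inside brackets:

```python
import re

# A colon counts as a separator unless a "]" follows it with no
# "[" or "]" in between (assumes balanced, non-nested brackets).
SEPARATOR = re.compile(r":(?![^\[\]]*\])")


def split_flat(s):
    """Split s on colons outside (non-nested) square brackets."""
    return SEPARATOR.split(s)


print(split_flat("foo.[DOM]::[IP6::4361:6368:6574]:600::"))
```

This pattern splits in the wrong places on nested input such as "a[b:c[:]]:d", so a parser that tracks the nesting level is still needed for the general case.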
String splitting with exceptions John Levine <johnl@iecc.com> - 2013-08-28 16:44 +0000
Re: String splitting with exceptions Skip Montanaro <skip@pobox.com> - 2013-08-28 11:55 -0500
Re: String splitting with exceptions random832@fastmail.us - 2013-08-28 13:14 -0400
Re: String splitting with exceptions John Levine <johnl@iecc.com> - 2013-08-28 21:35 +0000
Re: String splitting with exceptions Tim Chase <python.list@tim.thechases.com> - 2013-08-28 12:32 -0500
Re: String splitting with exceptions Neil Cerutti <neilc@norwich.edu> - 2013-08-28 18:18 +0000
Re: String splitting with exceptions Neil Cerutti <neilc@norwich.edu> - 2013-08-28 18:08 +0000
Re: String splitting with exceptions Peter Otten <__peter__@web.de> - 2013-08-28 20:31 +0200
Re: String splitting with exceptions wxjmfauth@gmail.com - 2013-08-29 00:26 -0700