From: random832@fastmail.us
To: John Levine, python-list@python.org
Subject: Re: String splitting with exceptions
Date: Wed, 28 Aug 2013 13:14:03 -0400
Newsgroups: comp.lang.python

On Wed, Aug 28, 2013, at 12:44, John Levine wrote:
> I have a crufty old DNS provisioning system that I'm rewriting and I
> hope improving in python. (It's based on tinydns if you know what
> that is.)
>
> The record formats are, in the worst case, like this:
>
>    foo.[DOM]::[IP6::4361:6368:6574]:600::
>
> What I would like to do is to split this string into a list like this:
>
>    [ 'foo.[DOM]','','[IP6::4361:6368:6574]','600','' ]
>
> Colons are separators except when they're inside square brackets. I
> have been messing around with re.split() and re.findall() and haven't
> been able to come up with either a working separator pattern for
> split() or a working field pattern for findall(). I came pretty
> close with findall() but can't get it to reliably match the
> nothing between two adjacent colons not inside brackets.
>
> Any suggestions? I realize I could do it in a loop where I pick stuff
> off the front of the string, but yuck.
>
> This is in python 2.7.5.

Can you have brackets within brackets? If so, this is impossible to deal
with within a regex.

Otherwise:

>>> re.findall('((?:[^[:]|\[[^]]*\])*):?',s)
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '', '']

I'm not sure why _your_ list only has one empty string at the end. Is
the record always terminated by a colon that is not meant to imply an
empty field after it? If so, remove the question mark:

>>> re.findall('((?:[^[:]|\[[^]]*\])*):',s)
['foo.[DOM]', '', '[IP6::4361:6368:6574]', '600', '']

I've done this kind of thing (for validation, not capturing) for email
addresses (there are some obscure bits of email address syntax that need
it) before, so it came to mind immediately.
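
For anyone who wants to run this outside the interactive prompt, here is a
minimal sketch. It assumes every record is terminated by a colon, so it uses
the second pattern (the one without the question mark), written as a raw
string with the brackets escaped inside the character classes. The names
FIELD, split_record and split_record_nested are made up for illustration.
The second function is not a regex at all: it is a plain depth-counting
loop, included only as one possible fallback if brackets within brackets
turn out to be allowed.

```python
import re

# Field pattern from the reply above, as a raw string. A field is any run
# of characters that are neither '[' nor ':', or complete [...] groups.
# Assumption: every field, including the last, is followed by a colon.
FIELD = re.compile(r'((?:[^\[:]|\[[^\]]*\])*):')


def split_record(record):
    """Split a colon-terminated record, ignoring colons inside [...]."""
    return FIELD.findall(record)


def split_record_nested(record):
    """Hypothetical loop-based alternative that tolerates nested brackets.

    Tracks bracket depth and splits on colons only at depth 0. Unlike
    split_record(), it treats the final colon as a separator, so a
    colon-terminated record yields one extra empty field at the end.
    """
    fields, current, depth = [], [], 0
    for ch in record:
        if ch == ':' and depth == 0:
            # Top-level colon: close the current field.
            fields.append(''.join(current))
            current = []
            continue
        if ch == '[':
            depth += 1
        elif ch == ']' and depth > 0:
            depth -= 1
        current.append(ch)
    fields.append(''.join(current))
    return fields


s = 'foo.[DOM]::[IP6::4361:6368:6574]:600::'
print(split_record(s))
print(split_record_nested('a[b[c:d]:e]:f'))
```

On the sample record, split_record() gives the five-element list asked for
in the original question. The loop version returns a sixth, empty field for
the same input, matching the first pattern (the one with the question mark),
so drop the last element if the trailing colon is only a terminator.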