Date: Thu, 9 Apr 2015 00:15:42 +0200
Subject: Re: Get nesting of regex groups
From: Mattias Ugelvik
Newsgroups: comp.lang.python

(Sorry if I'm spamming the mailing list; my reply didn't seem to show up in the archive.)

I'm making a 'declarative string manipulation' tool, whose interface should work like this:

>>> rules(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)', {
...     'separate': '.suffix',
...     'inner': 'abc',
...     'outer': lambda string: 'some-{}-manipulation'.format(string)
... }).apply('a')
'some-abc-manipulation.suffix'

Since the 'inner' group is nested, it should be replaced first; the replacement function for 'outer' then continues from that result. When 'inner' matches the empty string and its span is identical to that of 'outer', I need to know whether it is nested inside 'outer' or lies outside it, like 'separate'.

> Pardon me for stating the obvious,

No problem, I can see why my question is weird.
I actually implemented the interface above before I realized that these ambiguities even existed.

On 08/04/2015, Denis McMahon wrote:
> On Wed, 08 Apr 2015 22:54:57 +0200, Mattias Ugelvik wrote:
>
>> Example: re.compile('(?P<outer>(?P<inner>a))')
>>
>> How can I detect that 'inner' is a nested group of 'outer'? I know that
>> 'inner' comes later, because I can use the `regex.groupindex` (thanks to
>> your help earlier:
>> https://mail.python.org/pipermail/python-list/2015-April/701594.html).
>
> Pardon me for stating the obvious, but as the person defining the re, and
> assuming you haven't generated another sub-pattern somewhere in the same
> re with the same name, how can inner ever not be a nested group of outer?
>
> Even in the contrived example below, it is clear that the list of tuples
> generated by findall is of the form:
>
> ()[0] = 'outer', ()[1] = 'inner'
>
> from the order of matches principle.
>
> --------------------------------
>
> #!/usr/bin/python
>
> import re
>
> patt = re.compile('(?P<outer>a+(?P<inner>b+))')
>
> result = patt.findall('abaabbaaabbbaaaabbbb')
>
> print(result)
>
> --------------------------------
>
> however if all you are doing is using .search or .find for the first
> match of the pattern, then there should be no scope for confusion anyway.
>
> --
> Denis McMahon, denismfmcmahon@gmail.com
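Neither `groupindex` nor the match spans record which group encloses which: group numbers simply follow the order of the opening parentheses, and when every group matches the empty string all the spans coincide. Below is a minimal sketch of one workaround, assuming it is acceptable to scan the pattern string directly. The helper `group_parents` is hypothetical (it is not part of the `re` module). It tracks open parentheses and records each named group's innermost enclosing named group. It skips escaped characters and parentheses inside character classes, but it assumes no verbose-mode `(?x)` comments and only the `(?P<name>...)` spelling of named groups.

```python
import re

# Matches the opening of a named group, e.g. "(?P<outer>".
NAMED_GROUP_OPEN = re.compile(r'\(\?P<(\w+)>')


def group_parents(pattern):
    """Map each named group to its innermost enclosing named group, or None.

    Hypothetical helper, not part of the re module: a rough left-to-right
    scan of the pattern text. Assumes no verbose-mode (?x) comments and
    only the (?P<name>...) syntax for named groups.
    """
    parents = {}
    stack = []  # one entry per currently open '(': a group name, or None
    in_class = False
    i = 0
    while i < len(pattern):
        c = pattern[i]
        if c == '\\':
            i += 2  # skip the backslash and the character it escapes
            continue
        if in_class:
            if c == ']':
                in_class = False
        elif c == '[':
            in_class = True
            # A ']' right after '[' or '[^' is a literal, not the end of the class.
            if pattern.startswith('^', i + 1):
                i += 1
            if pattern.startswith(']', i + 1):
                i += 1
        elif c == '(':
            m = NAMED_GROUP_OPEN.match(pattern, i)
            name = m.group(1) if m else None
            if name is not None:
                enclosing = [n for n in stack if n is not None]
                parents[name] = enclosing[-1] if enclosing else None
            stack.append(name)  # unnamed and non-capturing groups push None
        elif c == ')':
            stack.pop()
        i += 1
    return parents


pattern = r'(?P<outer>(?P<inner>a?))(?P<separate>b?)'
compiled = re.compile(pattern)

# Group numbers only follow the order of the opening parentheses...
print(dict(compiled.groupindex))  # {'outer': 1, 'inner': 2, 'separate': 3}

# ...and on empty input all three groups have the same span.
m = compiled.match('')
print([m.span(name) for name in ('outer', 'inner', 'separate')])  # [(0, 0), (0, 0), (0, 0)]

# The pattern text still records which group encloses which.
print(group_parents(pattern))  # {'outer': None, 'inner': 'outer', 'separate': None}
```

Because the helper looks only at the pattern, its answer does not depend on what the groups happen to match. For patterns that use features it does not handle, such as verbose-mode comments or other named-group spellings, its result may be wrong.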