From: Mattias Ugelvik
To: Denis McMahon
Cc: python-list@python.org
Newsgroups: comp.lang.python
Date: Wed, 8 Apr 2015 23:58:59 +0200
Subject: Re: Get nesting of regex groups

I'm making a 'declarative string manipulation' tool, the interface of
which should work like this:

>>> rules(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)', {
...     'separate': '.suffix',
...     'inner': 'abc',
...     'outer': lambda string: 'some-{}-manipulation'.format(string)
... }).apply('a')
'some-abc-manipulation.suffix'

Since the 'inner' group is nested, it should be replaced first, then the
replacement function for 'outer' will continue the replacements. When
'inner' matches the empty string and its span is identical to 'outer',
then I need to know whether it is nested, or if it's outside like
'separate'.

> Pardon me for stating the obvious,

No problem, I can see why my question is weird. I actually implemented
the interface above before I realized that these ambiguities even
existed.

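To make the ambiguity concrete, here is a rough sketch of one possible way
around it: read the nesting off the pattern text instead of the match
spans. `group_parents` is a made-up helper name, not something the `re`
module provides, and the scan is deliberately simplified (it assumes the
groups are named 'outer', 'inner' and 'separate' as in the example, and it
does not cope with re.VERBOSE comments or inline (?#...) comments).

```python
import re


def group_parents(pattern):
    """Map each named group to its nearest enclosing named group.

    Top-level groups map to None. Hypothetical helper, not part of re.
    Simplified scan: understands backslash escapes, character classes
    and (?P<name>...) groups only.
    """
    re.compile(pattern)  # raises re.error early if the pattern is invalid
    parents = {}
    stack = []           # one entry per open parenthesis: a name, or None
    in_class = False     # True while inside [...]
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == '\\':
            i += 2       # skip the escaped character
            continue
        if in_class:
            if ch == ']':
                in_class = False
        elif ch == '[':
            in_class = True
            # a ']' directly after '[' or '[^' is a literal, so step over it
            if pattern[i + 1:i + 2] == '^':
                i += 1
            if pattern[i + 1:i + 2] == ']':
                i += 1
        elif ch == '(':
            named = re.match(r'\(\?P<(\w+)>', pattern[i:])
            name = named.group(1) if named else None
            if name is not None:
                enclosing = [n for n in stack if n is not None]
                parents[name] = enclosing[-1] if enclosing else None
            stack.append(name)
        elif ch == ')':
            stack.pop()
        i += 1
    return parents


pattern = r'(?P<outer>(?P<inner>a?))(?P<separate>b?)'

# On the empty string all three groups match with the same span, so the
# spans alone cannot tell 'inner' apart from 'separate'.
m = re.match(pattern, '')
spans = {name: m.span(name) for name in ('outer', 'inner', 'separate')}
print(spans)

# The pattern text still carries the nesting.
parents = group_parents(pattern)
print(parents)
```

With a parent map like this, the replacement order could be decided before
matching: groups whose parent is not None get replaced before the group
that encloses them, regardless of what their spans look like.
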
On 08/04/2015, Denis McMahon wrote:
> On Wed, 08 Apr 2015 22:54:57 +0200, Mattias Ugelvik wrote:
>
>> Example: re.compile('(?P<outer>(?P<inner>a))')
>>
>> How can I detect that 'inner' is a nested group of 'outer'? I know that
>> 'inner' comes later, because I can use the `regex.groupindex` (thanks to
>> your help earlier:
>> https://mail.python.org/pipermail/python-list/2015-April/701594.html).
>
> Pardon me for stating the obvious, but as the person defining the re, and
> assuming you haven't generated another sub-pattern somewhere in the same
> re with the same name, how can inner ever not be a nested group of outer?
>
> Even in the contrived example below, it is clear that the list of tuples
> generated by findall is of the form:
>
> ()[0] = 'outer', ()[1] = 'inner'
>
> from the order of matches principle.
>
> --------------------------------
>
> #!/usr/bin/python
>
> import re
>
> patt = re.compile('(?P<outer>a+(?P<inner>b+))')
>
> result = patt.findall('abaabbaaabbbaaaabbbb')
>
> print(result)
>
> --------------------------------
>
> however if all you are doing is using .search or .find for the first
> match of the pattern, then there should be no scope for confusion anyway.
>
> --
> Denis McMahon, denismfmcmahon@gmail.com
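
For reference, the quoted findall example can be run as below. The group
names 'outer' and 'inner' are an assumption taken from the surrounding
discussion, and the `sibling` pattern is added purely for comparison. As
far as I can tell, the tuple order (and `groupindex`) only reflects the
order of the opening parentheses, so it comes out the same whether the
second group is nested or not.

```python
import re

# Nested: 'inner' sits inside 'outer' (group names assumed, see above).
nested = re.compile(r'(?P<outer>a+(?P<inner>b+))')
# Sibling: same names, but 'inner' follows 'outer' instead (for comparison).
sibling = re.compile(r'(?P<outer>a+)(?P<inner>b+)')

text = 'abaabbaaabbbaaaabbbb'

# Each tuple is (outer, inner), ordered by opening parenthesis.
nested_result = nested.findall(text)
sibling_result = sibling.findall(text)
print(nested_result)
print(sibling_result)

# groupindex maps names to 1-based group numbers; it is identical for
# both patterns, so it says nothing about nesting.
nested_index = dict(nested.groupindex)
sibling_index = dict(sibling.groupindex)
print(nested_index, sibling_index)
```
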