


Get nesting of regex groups

Started by: Mattias Ugelvik <uglemat@gmail.com>
First post: 2015-04-08 22:54 +0200
Last post: 2015-04-09 00:15 +0200
Articles: 5 (3 participants)



Contents

  Get nesting of regex groups Mattias Ugelvik <uglemat@gmail.com> - 2015-04-08 22:54 +0200
    Re: Get nesting of regex groups Denis McMahon <denismfmcmahon@gmail.com> - 2015-04-08 21:30 +0000
      Re: Get nesting of regex groups Cameron Simpson <cs@zip.com.au> - 2015-04-09 08:00 +1000
      Re: Get nesting of regex groups Mattias Ugelvik <uglemat@gmail.com> - 2015-04-08 23:58 +0200
      Re: Get nesting of regex groups Mattias Ugelvik <uglemat@gmail.com> - 2015-04-09 00:15 +0200

#88685 — Get nesting of regex groups

From: Mattias Ugelvik <uglemat@gmail.com>
Date: 2015-04-08 22:54 +0200
Subject: Get nesting of regex groups
Message-ID: <mailman.151.1428526500.12925.python-list@python.org>

Example: re.compile('(?P<outer>(?P<inner>a))')

How can I detect that 'inner' is a nested group of 'outer'? I know
that 'inner' comes later, because I can use the `regex.groupindex`
(thanks to your help earlier:
https://mail.python.org/pipermail/python-list/2015-April/701594.html).

After looking a bit around, I found this:

>>> import sre_parse
>>> sre_parse.parse('(?P<outer>(?P<inner>a))')
[('subpattern', (1, [('subpattern', (2, [('literal', 97)]))]))]

This is all I need, but sre_parse is an internal module, though it
doesn't seem to have changed between py2 and py3. How inadvisable is
it to use this? Would you blame me?
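
A minimal sketch of how the nesting could be recovered from that parse tree, not a vetted implementation. It assumes the private parser module is importable as `sre_parse` (or as `re._parser` on Python 3.11 and later, where `sre_parse` is deprecated) and that a subpattern node keeps the group number first and the nested pattern last; both are internal details that may change between releases. The helper names `group_parents` and `named_parents` are hypothetical.

```python
import re

try:
    from re import _parser as sre_parse  # private module, Python 3.11+
except ImportError:
    import sre_parse  # private module on older versions


def group_parents(pattern):
    """Map each capturing group number to the number of the group
    that encloses it, or None for a top-level group."""
    parents = {}

    def walk(node, parent):
        if isinstance(node, sre_parse.SubPattern):
            for op, av in node.data:
                if op == sre_parse.SUBPATTERN:
                    # Assumption: group number is first, nested pattern is last.
                    group, body = av[0], av[-1]
                    if group is not None:
                        parents[group] = parent
                        walk(body, group)
                    else:
                        # Non-capturing group: keep the current parent.
                        walk(body, parent)
                else:
                    walk(av, parent)
        elif isinstance(node, (tuple, list)):
            # Arguments of repeats, branches, assertions and so on.
            for item in node:
                walk(item, parent)

    walk(sre_parse.parse(pattern), None)
    return parents


def named_parents(pattern):
    """Same mapping, keyed by group name (unnamed parents show up as None)."""
    index = re.compile(pattern).groupindex
    names = {number: name for name, number in index.items()}
    parents = group_parents(pattern)
    return {name: names.get(parents[number]) for name, number in index.items()}


print(named_parents(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)'))
# {'outer': None, 'inner': 'outer', 'separate': None}
```

Only `re.compile(...).groupindex` is public API here; everything that touches the parser is subject to the caveat above.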



#88687

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2015-04-08 21:30 +0000
Message-ID: <mg46l7$2sr$5@dont-email.me>
In reply to: #88685

On Wed, 08 Apr 2015 22:54:57 +0200, Mattias Ugelvik wrote:

> Example: re.compile('(?P<outer>(?P<inner>a))')
> 
> How can I detect that 'inner' is a nested group of 'outer'? I know that
> 'inner' comes later, because I can use the `regex.groupindex` (thanks to
> your help earlier:
> https://mail.python.org/pipermail/python-list/2015-April/701594.html).

Pardon me for stating the obvious, but as the person defining the re, and 
assuming you haven't generated another sub-pattern somewhere in the same 
re with the same name, how can inner ever not be a nested group of outer?

Even in the contrived example below, it is clear that each tuple in the 
list generated by findall is of the form:

()[0] = 'outer', ()[1] = 'inner'

from the order of matches principle.

--------------------------------

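#!/usr/bin/env python3

import re

# 'inner' (b+) is nested inside 'outer' (a+ followed by b+).
patt = re.compile(r'(?P<outer>a+(?P<inner>b+))')

# findall returns one (outer, inner) tuple per match.
result = patt.findall('abaabbaaabbbaaaabbbb')

print(result)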

--------------------------------

However, if all you are doing is using .search or .match for the first 
match of the pattern, then there should be no scope for confusion anyway.
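
A quick sketch of why group order alone may not settle the question. The pattern named `siblings` is an assumption added purely for contrast: it uses the same group names, but places them side by side instead of nesting them.

```python
import re

nested = re.compile(r'(?P<outer>a+(?P<inner>b+))')
siblings = re.compile(r'(?P<outer>a+)(?P<inner>b+)')

# Groups are numbered by the position of their opening parenthesis,
# so both patterns report the same name-to-number mapping.
print(dict(nested.groupindex))    # {'outer': 1, 'inner': 2}
print(dict(siblings.groupindex))  # {'outer': 1, 'inner': 2}

# Only the matched text differs: in the nested pattern 'outer' contains 'inner'.
print(nested.findall('abaabb'))    # [('ab', 'b'), ('aabb', 'bb')]
print(siblings.findall('abaabb'))  # [('a', 'b'), ('aa', 'bb')]
```

So the tuple order matches the group numbering in both cases, and code that does not already know the pattern cannot read the nesting off `groupindex`.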

-- 
Denis McMahon, denismfmcmahon@gmail.com



#88688

From: Cameron Simpson <cs@zip.com.au>
Date: 2015-04-09 08:00 +1000
Message-ID: <mailman.152.1428530445.12925.python-list@python.org>
In reply to: #88687

On 08Apr2015 21:30, Denis McMahon <denismfmcmahon@gmail.com> wrote:
>On Wed, 08 Apr 2015 22:54:57 +0200, Mattias Ugelvik wrote:
>
>> Example: re.compile('(?P<outer>(?P<inner>a))')
>>
>> How can I detect that 'inner' is a nested group of 'outer'? I know that
>> 'inner' comes later, because I can use the `regex.groupindex` (thanks to
>> your help earlier:
>> https://mail.python.org/pipermail/python-list/2015-April/701594.html).
>
>Pardon me for stating the obvious, but as the person defining the re, and
>assuming you haven't generated another sub-pattern somewhere in the same
>re with the same name, how can inner ever not be a nested group of outer?

If he has to ask this question, one might presume that he is not the source of 
the regexp. Ergo, he may not know the regexp structure ahead of time for 
whatever reason. I could invent scenarios for that, but perhaps Mattias can 
describe his situation some more.

Cheers,
Cameron Simpson <cs@zip.com.au>

In article <CF3rw3.n0x@eskimo.com> pirih@eskimo.com (Chris Pirih) writes:
| Wotsa zerk?

It's a portable hemispherical perforated spooge flange.
        - Chuck Rogers, car377@torreys.att.com



#88695

From: Mattias Ugelvik <uglemat@gmail.com>
Date: 2015-04-08 23:58 +0200
Message-ID: <mailman.156.1428564420.12925.python-list@python.org>
In reply to: #88687

I'm making a 'declarative string manipulation' tool, the interface of
which should work like this:

>>> rules(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)', {
...   'separate': '.suffix',
...   'inner': 'abc',
...   'outer': lambda string: 'some-{}-manipulation'.format(string)
... }).apply('a')
'some-abc-manipulation.suffix'

Since the 'inner' group is nested, it should be replaced first, then
the replacement function for 'outer' will continue the replacements.
When 'inner' matches the empty string and its span is identical to
'outer', then I need to know whether it is nested, or if it's outside
like 'separate'.
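
The ambiguity can be reproduced with the pattern from the example above. This is only an illustrative sketch; the helper name `spans` is hypothetical and not part of the tool being described.

```python
import re

pattern = re.compile(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)')


def spans(text):
    """Return the (start, end) span of every named group for a match at position 0."""
    m = pattern.match(text)
    return {name: m.span(name) for name in pattern.groupindex}


print(spans('a'))  # {'outer': (0, 1), 'inner': (0, 1), 'separate': (1, 1)}
print(spans(''))   # {'outer': (0, 0), 'inner': (0, 0), 'separate': (0, 0)}
```

For the empty input all three groups report the same span, so the spans alone cannot tell that 'inner' sits inside 'outer' while 'separate' does not.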

> Pardon me for stating the obvious,

No problem, I can see why my question is weird. I actually implemented
the interface above before I realized that these ambiguities even
existed.




#88696

From: Mattias Ugelvik <uglemat@gmail.com>
Date: 2015-04-09 00:15 +0200
Message-ID: <mailman.157.1428564421.12925.python-list@python.org>
In reply to: #88687

(sorry if I'm spamming the mailing list, my reply didn't seem to show
up in the archive)

