comp.lang.python > #88685 > unrolled thread
| Started by | Mattias Ugelvik <uglemat@gmail.com> |
|---|---|
| First post | 2015-04-08 22:54 +0200 |
| Last post | 2015-04-09 00:15 +0200 |
| Articles | 5 — 3 participants |
Get nesting of regex groups Mattias Ugelvik <uglemat@gmail.com> - 2015-04-08 22:54 +0200
Re: Get nesting of regex groups Denis McMahon <denismfmcmahon@gmail.com> - 2015-04-08 21:30 +0000
Re: Get nesting of regex groups Cameron Simpson <cs@zip.com.au> - 2015-04-09 08:00 +1000
Re: Get nesting of regex groups Mattias Ugelvik <uglemat@gmail.com> - 2015-04-08 23:58 +0200
Re: Get nesting of regex groups Mattias Ugelvik <uglemat@gmail.com> - 2015-04-09 00:15 +0200
| From | Mattias Ugelvik <uglemat@gmail.com> |
|---|---|
| Date | 2015-04-08 22:54 +0200 |
| Subject | Get nesting of regex groups |
| Message-ID | <mailman.151.1428526500.12925.python-list@python.org> |
Example: re.compile('(?P<outer>(?P<inner>a))')
How can I detect that 'inner' is a nested group of 'outer'? I know
that 'inner' comes later, because I can use the `regex.groupindex`
(thanks to your help earlier:
https://mail.python.org/pipermail/python-list/2015-April/701594.html).
After looking a bit around, I found this:
>>> import sre_parse
>>> sre_parse.parse('(?P<outer>(?P<inner>a))')
[('subpattern', (1, [('subpattern', (2, [('literal', 97)]))]))]
This is all I need, but it is an internal module, though there don't
seem to have been any changes to it from py2 to py3. How inadvisable is
it to use this? Would you blame me?
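A minimal sketch of how that tree could be turned into a group-to-parent map, assuming the undocumented parser module is acceptable. The module is `sre_parse` up to Python 3.10 and `re._parser` from 3.11 (where importing `sre_parse` emits a DeprecationWarning), and its output differs between versions: the dump above uses the older string opcodes, while current versions use named constants and store extra flag fields in each group tuple. The sketch therefore relies only on a group's number being the first field and its body the last, and otherwise searches opcode arguments generically for nested subpatterns. `group_parents` and `named_parents` are hypothetical helper names.

```python
import re

try:
    from re import _parser as sre_parse  # Python 3.11+ (private module)
except ImportError:
    import sre_parse  # Python 3.10 and earlier (undocumented module)


def _subpatterns(av):
    """Yield every SubPattern nested anywhere inside an opcode argument."""
    if isinstance(av, sre_parse.SubPattern):
        yield av
    elif isinstance(av, (tuple, list)):
        for item in av:
            yield from _subpatterns(item)


def group_parents(pattern):
    """Map each capturing group number to the number of the group that
    directly encloses it, or None for groups at the top level."""
    parents = {}

    def walk(subpattern, parent):
        for op, av in subpattern:
            if op == sre_parse.SUBPATTERN and av[0] is not None:
                group = av[0]          # capturing group number
                parents[group] = parent
                walk(av[-1], group)    # the group's body is the last field
            else:
                # Repeats, branches, lookarounds, non-capturing groups, ...:
                # keep the current parent and descend into any nested body.
                for child in _subpatterns(av):
                    walk(child, parent)

    walk(sre_parse.parse(pattern), None)
    return parents


def named_parents(pattern):
    """Like group_parents, but keyed by group name where one exists."""
    names = {num: name for name, num in re.compile(pattern).groupindex.items()}
    return {names.get(g, g): names.get(p, p)
            for g, p in group_parents(pattern).items()}


print(named_parents(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)'))
# {'outer': None, 'inner': 'outer', 'separate': None}
```

Because this leans on private internals, it could break in a future release; running it against each new Python version before relying on it would be prudent.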
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2015-04-08 21:30 +0000 |
| Message-ID | <mg46l7$2sr$5@dont-email.me> |
| In reply to | #88685 |
On Wed, 08 Apr 2015 22:54:57 +0200, Mattias Ugelvik wrote:
> Example: re.compile('(?P<outer>(?P<inner>a))')
>
> How can I detect that 'inner' is a nested group of 'outer'? I know that
> 'inner' comes later, because I can use the `regex.groupindex` (thanks to
> your help earlier:
> https://mail.python.org/pipermail/python-list/2015-April/701594.html).
Pardon me for stating the obvious, but as the person defining the re, and
assuming you haven't generated another sub-pattern somewhere in the same
re with the same name, how can inner ever not be a nested group of outer?
Even in the contrived example below, it is clear that each tuple in the
list generated by findall is of the form:
tuple[0] = 'outer', tuple[1] = 'inner'
from the order of matches principle.
--------------------------------
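#!/usr/bin/python
import re
patt = re.compile('(?P<outer>a+(?P<inner>b+))')
result = patt.findall('abaabbaaabbbaaaabbbb')
# One (outer, inner) tuple per match, in group-number order:
# [('ab', 'b'), ('aabb', 'bb'), ('aaabbb', 'bbb'), ('aaaabbbb', 'bbbb')]
print(result)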
--------------------------------
However, if all you are doing is using .search or .match for the first
match of the pattern, then there should be no scope for confusion anyway.
--
Denis McMahon, denismfmcmahon@gmail.com
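A quick check of the ordering argument, using two illustrative patterns that are not from the thread: group numbers (and so the positions in the tuples `findall` returns) follow the left-to-right order of each group's opening parenthesis. That tells you 'inner' comes after 'outer', but a nested and a sequential pattern produce the same numbering, so the order alone does not reveal nesting.

```python
import re

nested = re.compile(r'(?P<outer>(?P<inner>a))')
sequential = re.compile(r'(?P<outer>a)(?P<inner>b)')

# Groups are numbered by the position of their opening parenthesis,
# so both patterns report the same name-to-number mapping.
print(dict(nested.groupindex))      # {'outer': 1, 'inner': 2}
print(dict(sequential.groupindex))  # {'outer': 1, 'inner': 2}

# findall tuples follow that numbering in both cases: (outer, inner).
print(nested.findall('a'))      # [('a', 'a')]
print(sequential.findall('ab'))  # [('a', 'b')]
```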
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2015-04-09 08:00 +1000 |
| Message-ID | <mailman.152.1428530445.12925.python-list@python.org> |
| In reply to | #88687 |
On 08Apr2015 21:30, Denis McMahon <denismfmcmahon@gmail.com> wrote:
>On Wed, 08 Apr 2015 22:54:57 +0200, Mattias Ugelvik wrote:
>
>> Example: re.compile('(?P<outer>(?P<inner>a))')
>>
>> How can I detect that 'inner' is a nested group of 'outer'? I know that
>> 'inner' comes later, because I can use the `regex.groupindex` (thanks to
>> your help earlier:
>> https://mail.python.org/pipermail/python-list/2015-April/701594.html).
>
>Pardon me for stating the obvious, but as the person defining the re, and
>assuming you haven't generated another sub-pattern somewhere in the same
>re with the same name, how can inner ever not be a nested group of outer?
If he has to ask this question, one might presume that he is not the source of
the regexp. Ergo, he may not know the regexp structure ahead of time for
whatever reason. I could invent scenarios for that, but perhaps Mattias can
describe his situation some more.
Cheers,
Cameron Simpson <cs@zip.com.au>
In article <CF3rw3.n0x@eskimo.com> pirih@eskimo.com (Chris Pirih) writes:
| Wotsa zerk?
It's a portable hemispherical perforated spooge flange.
- Chuck Rogers, car377@torreys.att.com
| From | Mattias Ugelvik <uglemat@gmail.com> |
|---|---|
| Date | 2015-04-08 23:58 +0200 |
| Message-ID | <mailman.156.1428564420.12925.python-list@python.org> |
| In reply to | #88687 |
I'm making a 'declarative string manipulation' tool, the interface of
which should work like this:
>>> rules(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)', {
... 'separate': '.suffix',
... 'inner': 'abc',
... 'outer': lambda string: 'some-{}-manipulation'.format(string)
... }).apply('a')
'some-abc-manipulation.suffix'
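One way the interface above might be implemented, as a rough sketch only: `Rules`, `rules` and their methods are hypothetical, not an existing library. It finds each group's enclosing group by walking the parse tree from the undocumented parser module (`sre_parse`, or `re._parser` on Python 3.11+), rewrites the innermost groups first, and passes the rewritten text of each enclosing group to that group's rule (a string replaces the text, a callable transforms it). It also assumes `re.search` semantics for locating the match, and that capture groups do not appear inside lookarounds, so sibling spans never overlap.

```python
import re

try:
    from re import _parser as sre_parse  # Python 3.11+ (private module)
except ImportError:
    import sre_parse  # Python 3.10 and earlier (undocumented module)


def group_parents(pattern):
    """Map each capturing group number to its enclosing group (0 = whole match)."""
    parents = {}

    def nested(av):  # every SubPattern hidden inside an opcode argument
        if isinstance(av, sre_parse.SubPattern):
            yield av
        elif isinstance(av, (tuple, list)):
            for item in av:
                yield from nested(item)

    def walk(subpattern, parent):
        for op, av in subpattern:
            if op == sre_parse.SUBPATTERN and av[0] is not None:
                parents[av[0]] = parent  # av[0]: group number
                walk(av[-1], av[0])      # av[-1]: group body
            else:
                for child in nested(av):
                    walk(child, parent)

    walk(sre_parse.parse(pattern), 0)
    return parents


class Rules:
    """Hypothetical sketch of the 'declarative string manipulation' tool."""

    def __init__(self, pattern, replacements):
        self.regex = re.compile(pattern)
        self.replacements = replacements
        self.names = {num: name for name, num in self.regex.groupindex.items()}
        # children[g]: groups directly inside g, in pattern (= numbering) order.
        self.children = {g: [] for g in range(self.regex.groups + 1)}
        for group, parent in sorted(group_parents(pattern).items()):
            self.children[parent].append(group)

    def apply(self, text):
        m = self.regex.search(text)
        if m is None:
            return text
        return text[:m.start()] + self._render(m, 0) + text[m.end():]

    def _render(self, m, group):
        # Rewrite nested groups first, then hand the result to this group's rule.
        start, end = m.span(group)
        pieces, pos = [], start
        for child in self.children[group]:
            if m.start(child) == -1:  # child did not take part in the match
                continue
            pieces.append(m.string[pos:m.start(child)])
            pieces.append(self._render(m, child))
            pos = m.end(child)
        pieces.append(m.string[pos:end])
        rewritten = ''.join(pieces)
        rule = self.replacements.get(self.names.get(group))
        if rule is None:
            return rewritten
        return rule(rewritten) if callable(rule) else rule


def rules(pattern, replacements):
    return Rules(pattern, replacements)


print(rules(r'(?P<outer>(?P<inner>a?))(?P<separate>b?)', {
    'separate': '.suffix',
    'inner': 'abc',
    'outer': lambda string: 'some-{}-manipulation'.format(string),
}).apply('a'))
# some-abc-manipulation.suffix
```

Because nesting comes from the pattern's structure rather than from match spans, the result does not depend on whether 'inner' happens to match the empty string.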
Since the 'inner' group is nested, it should be replaced first; then
the replacement function for 'outer' continues the replacements.
When 'inner' matches the empty string and its span is identical to that
of 'outer', I need to know whether it is nested inside 'outer' or sits
outside it, like 'separate'.
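To make the ambiguity concrete, here is a naive span-containment check (`span_within` and `naive_report` are hypothetical helpers) applied to the example pattern. It correctly reports 'inner' as inside 'outer', but empty matches also make unrelated groups look nested, so spans alone cannot tell the two cases apart.

```python
import re

PATTERN = r'(?P<outer>(?P<inner>a?))(?P<separate>b?)'


def span_within(m, parent, child):
    """Naive heuristic: does child's span lie inside parent's span?"""
    (p_start, p_end), (c_start, c_end) = m.span(parent), m.span(child)
    return p_start <= c_start and c_end <= p_end


def naive_report(text):
    m = re.match(PATTERN, text)
    return {
        'inner in outer': span_within(m, 'outer', 'inner'),        # truly nested
        'separate in outer': span_within(m, 'outer', 'separate'),  # not nested
        'inner in separate': span_within(m, 'separate', 'inner'),  # not nested
    }


for text in ['a', 'b', '']:
    print(repr(text), naive_report(text))
# 'a' {'inner in outer': True, 'separate in outer': True, 'inner in separate': False}
# 'b' {'inner in outer': True, 'separate in outer': False, 'inner in separate': True}
# '' {'inner in outer': True, 'separate in outer': True, 'inner in separate': True}
```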
> Pardon me for stating the obvious,
No problem, I can see why my question is weird. I actually implemented
the interface above before I realized that these ambiguities even
existed.
| From | Mattias Ugelvik <uglemat@gmail.com> |
|---|---|
| Date | 2015-04-09 00:15 +0200 |
| Message-ID | <mailman.157.1428564421.12925.python-list@python.org> |
| In reply to | #88687 |
(Sorry if I'm spamming the mailing list; my reply didn't seem to show
up in the archive.)