Re: Regexp : repeated group identification

References <4ee88488$0$27871$426a74cc@news.free.fr>
Date 2011-12-14 12:34 +0100
Subject Re: Regexp : repeated group identification
From Vlastimil Brom <vlastimil.brom@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.3635.1323862458.27778.python-list@python.org> (permalink)


2011/12/14 candide <candide@free.invalid>:
> Consider the following code
>
> # ----------------------------
> import re
>
> z = re.match(r'(Spam\d)+', 'Spam4Spam2Spam7Spam8')
> print(z.group(0))
> print(z.group(1))
> # ----------------------------
>
> outputting :
>
> ----------------------------
> Spam4Spam2Spam7Spam8
> Spam8
> ----------------------------
>
> The '(Spam\d)+' regexp is tested against 'Spam4Spam2Spam7Spam8' and the
> regexp matches the string.
>
> Group numbered one within the regex '(Spam\d)+' refers to Spam\d
>
> The four substrings
>
> Spam4   Spam2   Spam7  and  Spam8
>
> match the group numbered 1.
>
> So I don't understand why z.group(1) gives the last substring (i.e. Spam8, as
> the output shows). Why not another one, Spam4 for example?

Hi,
there is a brief note on this in the re docs:
http://docs.python.org/library/re.html#re.MatchObject.group

"If a group is contained in a part of the pattern that matched
multiple times, the last match is returned."
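A minimal sketch of that rule in action, using only the standard re module (written for Python 3, which is an assumption; the original post used Python 2 print statements). The variable names are illustrative only:

```python
import re

# The group (Spam\d) sits inside a "+" repetition, so it is matched
# four times; each iteration overwrites what the group captured before.
m = re.match(r'(Spam\d)+', 'Spam4Spam2Spam7Spam8')

whole = m.group(0)  # the entire matched text
last = m.group(1)   # only the final iteration of the group is kept
span = m.span(1)    # start/end offsets of that final iteration

print(whole)
print(last)
print(span)
```

The span shows that group 1 points at the tail of the string, i.e. the earlier iterations are simply not stored on the match object.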

If you need to work with everything captured by the repeated group,
you can try the new regex implementation:
http://pypi.python.org/pypi/regex

Its match objects have a special "captures" method for this
(among many other improvements):

>>> import regex
>>> m = regex.match(r'(Spam\d)+', 'Spam4Spam2Spam7Spam8')
>>> m.captures(1)
['Spam4', 'Spam2', 'Spam7', 'Spam8']
>>>
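If installing the third-party module is not an option, something similar can be approximated with the standard re module by matching the whole repetition first and then rescanning the matched text. This is only a sketch: the helper name repeated_captures is hypothetical, and the rescan is assumed to split the text the same way the original repetition did, which holds for simple patterns like Spam\d but not necessarily for ambiguous ones.

```python
import re

def repeated_captures(inner, text):
    """Approximate regex's captures() for a pattern of the form (inner)+.

    `inner` is the sub-pattern that is repeated; `text` is matched from
    its start, as re.match does. Returns a list of the pieces consumed
    by each repetition, or an empty list if nothing matches.
    """
    # Step 1: find how far the repetition as a whole extends.
    m = re.match(r'(?:%s)+' % inner, text)
    if m is None:
        return []
    # Step 2: rescan only the matched text, collecting each occurrence.
    return [piece.group(0) for piece in re.finditer(inner, m.group(0))]

print(repeated_captures(r'Spam\d', 'Spam4Spam2Spam7Spam8'))
```

For the example string this gives the same four substrings as m.captures(1) above.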

hth,
  vbr
