| Date | Wed, 14 Dec 2011 12:34:16 +0100 |
|---|---|
| Subject | Re: Regexp : repeated group identification |
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
| To | python-list@python.org |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.3635.1323862458.27778.python-list@python.org> |
2011/12/14 candide <candide@free.invalid>:
> Consider the following code
>
> # ----------------------------
> import re
>
> z = re.match(r'(Spam\d)+', 'Spam4Spam2Spam7Spam8')
> print(z.group(0))
> print(z.group(1))
> # ----------------------------
>
> outputting :
>
> ----------------------------
> Spam4Spam2Spam7Spam8
> Spam8
> ----------------------------
>
> The '(Spam\d)+' regexp is tested against 'Spam4Spam2Spam7Spam8' and the
> regexp matches the string.
>
> Group numbered one within the regex '(Spam\d)+' refers to Spam\d
>
> The four substrings
>
> Spam4 Spam2 Spam7 and Spam8
>
> match the group numbered 1.
>
> So I don't understand why z.group(1) gives the last substring (i.e. Spam8, as
> the output shows). Why not another one, Spam4 for example?
Hi,
You may find a brief note on this in the re docs:
http://docs.python.org/library/re.html#re.MatchObject.group
"If a group is contained in a part of the pattern that matched
multiple times, the last match is returned."
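To see that documented behaviour with the standard re module alone, here is a minimal sketch using the pattern and string from the question. The variable names (whole, last, last_span) are only illustrative.

```python
import re

m = re.match(r'(Spam\d)+', 'Spam4Spam2Spam7Spam8')

# group(0) is the whole match; group(1) keeps only the final repetition,
# because each new repetition overwrites what the group captured before.
whole = m.group(0)
last = m.group(1)
last_span = m.span(1)  # start/end offsets of that final repetition

print(whole)
print(last, last_span)
```

The span of group 1 points at the end of the string, which shows that the earlier repetitions were matched but are no longer stored in the match object.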
If you need to work with everything captured by the repeated group,
you could try the new regex implementation:
http://pypi.python.org/pypi/regex
Its match object has a special "captures" method for this
(among many other improvements):
>>> import regex
>>> m = regex.match(r'(Spam\d)+', 'Spam4Spam2Spam7Spam8')
>>> m.captures(1)
['Spam4', 'Spam2', 'Spam7', 'Spam8']
>>>
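If installing a third-party module is not an option, something similar can probably be done with the standard re module in two steps. This is only a sketch, not an equivalent of captures(): it assumes each repetition can be found again with the inner pattern on its own, which holds for this simple case but not for every pattern. The names text and captures are illustrative.

```python
import re

text = 'Spam4Spam2Spam7Spam8'

# Step 1: check that the repeated pattern matches at the start of the string.
# A non-capturing group is enough here, since group(1) would only hold
# the last repetition anyway.
m = re.match(r'(?:Spam\d)+', text)

# Step 2: scan the matched part again for each individual repetition.
captures = re.findall(r'Spam\d', m.group(0)) if m else []

print(captures)
```

re.finditer could be used in place of re.findall in the second step if the position of each repetition is needed as well.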
hth,
vbr