| Started by | mg <noOne@nowhere.com> |
|---|---|
| First post | 2016-01-22 15:32 +0000 |
| Last post | 2016-01-23 11:39 +0100 |
| Articles | 6 — 3 participants |
one more question on regex mg <noOne@nowhere.com> - 2016-01-22 15:32 +0000
Re: one more question on regex Peter Otten <__peter__@web.de> - 2016-01-22 16:47 +0100
Re: one more question on regex mg <noOne@nowhere.com> - 2016-01-22 15:50 +0000
Re: one more question on regex Vlastimil Brom <vlastimil.brom@gmail.com> - 2016-01-22 21:10 +0100
Re: one more question on regex mg <noOne@nowhere.com> - 2016-01-22 22:47 +0000
Re: one more question on regex Vlastimil Brom <vlastimil.brom@gmail.com> - 2016-01-23 11:39 +0100
| From | mg <noOne@nowhere.com> |
|---|---|
| Date | 2016-01-22 15:32 +0000 |
| Subject | one more question on regex |
| Message-ID | <n7ti39$7rt$1@gioia.aioe.org> |
python 3.4.3
>>> import re
>>> re.search('(ab){2}','abzzabab')
<_sre.SRE_Match object; span=(4, 8), match='abab'>
>>> re.findall('(ab){2}','abzzabab')
['ab']
Why is the match 'abab' for search() but 'ab' for findall()?
| From | Peter Otten <__peter__@web.de> |
|---|---|
| Date | 2016-01-22 16:47 +0100 |
| Message-ID | <mailman.171.1453477686.15297.python-list@python.org> |
| In reply to | #102020 |
mg wrote:
> python 3.4.3
>
> import re
> re.search('(ab){2}','abzzabab')
> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>
> >>> re.findall('(ab){2}','abzzabab')
> ['ab']
>
> Why for search() the match is 'abab' and for findall the match is 'ab'?
I suppose someone thought it was convenient for findall to return the
explicit groups if there are any. If you want the whole match aka group(0)
you can get that with
>>> re.findall('(?:ab){2}','abzzabab')
['abab']
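The difference can be seen side by side in a minimal sketch (the variable names are only illustrative): search() returns a match object whose group(0) is the whole match, while findall() returns the group contents as soon as the pattern contains a capturing group.

```python
import re

text = 'abzzabab'

# search() returns a match object; group(0) is the whole match,
# group(1) is the last repetition captured by the group.
m = re.search(r'(ab){2}', text)
whole, last_group = m.group(0), m.group(1)

# findall() returns the group contents when the pattern has a capturing group ...
with_capture = re.findall(r'(ab){2}', text)
# ... and the whole matches when the group is non-capturing.
without_capture = re.findall(r'(?:ab){2}', text)

print(whole, last_group, with_capture, without_capture)
# abab ab ['ab'] ['abab']
```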
| From | mg <noOne@nowhere.com> |
|---|---|
| Date | 2016-01-22 15:50 +0000 |
| Message-ID | <n7tj3j$9ra$1@gioia.aioe.org> |
| In reply to | #102020 |
On Fri, 22 Jan 2016 15:32:57 +0000, mg wrote:
> python 3.4.3
>
> import re
> re.search('(ab){2}','abzzabab')
> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>
> >>> re.findall('(ab){2}','abzzabab')
> ['ab']
>
> Why for search() the match is 'abab' and for findall the match is 'ab'?
finditer seems to be consistent with search:
>>> regex = re.compile('(ab){2}')
>>> for match in regex.finditer('abzzababab'):
...     print("%s: %s" % (match.start(), match.span()))
...
4: (4, 8)
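Because finditer() yields match objects, the whole match and the group contents can be read from the same result. A small sketch along those lines (the `results` name is just for illustration):

```python
import re

regex = re.compile(r'(ab){2}')
text = 'abzzababab'

# Each item from finditer() is a match object, so the whole match,
# its span, and the group contents are all available.
results = [(m.group(0), m.span(), m.groups()) for m in regex.finditer(text)]
print(results)
# [('abab', (4, 8), ('ab',))]
```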
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
|---|---|
| Date | 2016-01-22 21:10 +0100 |
| Message-ID | <mailman.173.1453493453.15297.python-list@python.org> |
| In reply to | #102022 |
2016-01-22 16:50 GMT+01:00 mg <noOne@nowhere.com>:
> On Fri, 22 Jan 2016 15:32:57 +0000, mg wrote:
>
>> python 3.4.3
>>
>> import re
>> re.search('(ab){2}','abzzabab')
>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>>
>> >>> re.findall('(ab){2}','abzzabab')
>> ['ab']
>>
>> Why for search() the match is 'abab' and for findall the match is 'ab'?
>
> finditer seems to be consistent with search:
> regex = re.compile('(ab){2}')
>
> for match in regex.finditer('abzzababab'):
>     print ("%s: %s" % (match.start(), match.span() ))
> ...
> 4: (4, 8)
Hi,
as was already pointed out, findall collects the content of the
capturing groups (if present) rather than the whole matching text.
For a repeated group, only the last repetition captured is kept and
the earlier ones are discarded; cf.:
>>> re.findall('(?i)(a)x(b)+','axbB')
[('a', 'B')]
>>>
(for multiple capturing groups in the pattern, a tuple of the captured
parts is collected for each match)
or, with your example, using upper/lower case to differentiate the
parts of the string:
>>> re.findall('(?i)(ab){2}','aBzzAbAB')
['AB']
>>>
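If every repetition is wanted rather than only the last one, one possible approach (a sketch, not something the re module does on its own) is to take the whole match and scan it again with the inner pattern:

```python
import re

text = 'aBzzAbAB'

m = re.search(r'(?i)(ab){2}', text)
# group(1) holds only the final repetition of the repeated group.
last_repetition = m.group(1)

# Scan the whole match again with the inner pattern; it has no
# groups, so findall() returns whole matches here.
all_repetitions = re.findall(r'(?i)ab', m.group(0))
print(last_repetition, all_repetitions)
# AB ['Ab', 'AB']
```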
hth,
vbr
| From | mg <noOne@nowhere.com> |
|---|---|
| Date | 2016-01-22 22:47 +0000 |
| Message-ID | <n7ubhk$k9f$1@gioia.aioe.org> |
| In reply to | #102025 |
On Fri, 22 Jan 2016 21:10:44 +0100, Vlastimil Brom wrote:
> 2016-01-22 16:50 GMT+01:00 mg <noOne@nowhere.com>:
>> On Fri, 22 Jan 2016 15:32:57 +0000, mg wrote:
>>
>>> python 3.4.3
>>>
>>> import re
>>> re.search('(ab){2}','abzzabab')
>>> <_sre.SRE_Match object; span=(4, 8), match='abab'>
>>>
>>> >>> re.findall('(ab){2}','abzzabab')
>>> ['ab']
>>>
>>> Why for search() the match is 'abab' and for findall the match is
>>> 'ab'?
>>
>> finditer seems to be consistent with search:
>> regex = re.compile('(ab){2}')
>>
>> for match in regex.finditer('abzzababab'):
>>     print ("%s: %s" % (match.start(), match.span() ))
>> ...
>> 4: (4, 8)
>
> Hi,
> as was already pointed out, findall "collects" the content of the
> capturing groups (if present), rather than the whole matching text;
>
> for repeated captures the last content of them is taken discarding the
> previous ones; cf.:
>
> >>> re.findall('(?i)(a)x(b)+','axbB')
> [('a', 'B')]
>
> (for multiple capturing groups in the pattern, a tuple of captured parts
> are collected)
>
> or with your example with differentiated parts of the string using
> upper/lower case:
> >>> re.findall('(?i)(ab){2}','aBzzAbAB')
> ['AB']
>
> hth,
> vbr
Your explanation of the re.findall() results is correct. My point is that
the documentation states:
re.findall(pattern, string, flags=0)
Return all non-overlapping matches of pattern in string, as a list of
strings
and this is not what re.findall does. IMHO it would be more reasonable
to get back the whole matches, since that seems to me the most useful
information for the user. In any case I'll go with finditer, which returns
match objects holding all the information anyone could look for.
| From | Vlastimil Brom <vlastimil.brom@gmail.com> |
|---|---|
| Date | 2016-01-23 11:39 +0100 |
| Message-ID | <mailman.174.1453545581.15297.python-list@python.org> |
| In reply to | #102026 |
2016-01-22 23:47 GMT+01:00 mg <noOne@nowhere.com>:
> On Fri, 22 Jan 2016 21:10:44 +0100, Vlastimil Brom wrote:
>
>> [...]
>
> Your explanation of re.findall() results is correct. My point is that the
> documentation states:
>
> re.findall(pattern, string, flags=0)
> Return all non-overlapping matches of pattern in string, as a list of
> strings
>
> and this is not what re.findall does. IMHO it should be more reasonable
> to get back the whole matches, since this seems to me the most useful
> information for the user. In any case I'll go with finditer, that returns
> in match object all the infos that anyone can look for.

Hi,
I don't know the reasoning behind this special behaviour of findall, but
it is documented explicitly:
https://docs.python.org/3/library/re.html#re.findall
"... If one or more groups are present in the pattern, return a list of
groups; this will be a list of tuples if the pattern has more than one
group. ..."
finditer is clearly much more robust for general usage. I only use
findall for quick one-line tests (and there one has to account for this
peculiarity, either by using non-capturing groups or by enclosing the
whole pattern in a "main" group and using the first item of each
resulting tuple).
vbr
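The "main" group workaround mentioned above might look like the following sketch (the names `pairs` and `whole_matches` are illustrative only):

```python
import re

text = 'abzzababab'

# Wrap the whole pattern in an outer "main" group. With two groups,
# findall() returns tuples: (main group, inner group).
pairs = re.findall(r'((ab){2})', text)
whole_matches = [pair[0] for pair in pairs]
print(pairs, whole_matches)
# [('abab', 'ab')] ['abab']
```

The non-capturing form `(?:ab){2}` gives the same whole matches directly, so the outer group is mainly useful when the inner capture is also needed.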