Groups > comp.lang.python > #74045 > unrolled thread
| Started by | rxjwg98@gmail.com |
|---|---|
| First post | 2014-07-06 11:57 -0700 |
| Last post | 2014-07-07 10:18 -0600 |
| Articles | 6 — 4 participants |
How to write this repeat matching? rxjwg98@gmail.com - 2014-07-06 11:57 -0700
Re: How to write this repeat matching? MRAB <python@mrabarnett.plus.com> - 2014-07-06 20:19 +0100
Re: How to write this repeat matching? Ian Kelly <ian.g.kelly@gmail.com> - 2014-07-06 13:26 -0600
Re: How to write this repeat matching? rxjwg98@gmail.com - 2014-07-07 06:30 -0700
Re: How to write this repeat matching? Anssi Saari <as@sci.fi> - 2014-07-07 18:48 +0300
Re: How to write this repeat matching? Ian Kelly <ian.g.kelly@gmail.com> - 2014-07-07 10:18 -0600
| From | rxjwg98@gmail.com |
|---|---|
| Date | 2014-07-06 11:57 -0700 |
| Subject | How to write this repeat matching? |
| Message-ID | <93a40570-00ed-4507-aa16-221d7e500468@googlegroups.com> |
Hi,
On Python website, it says that the following match can reach 'abcb' in 6 steps:

.............
A step-by-step example will make this more obvious. Let's consider the expression
a[bcd]*b. This matches the letter 'a', zero or more letters from the class [bcd],
and finally ends with a 'b'. Now imagine matching this RE against the string
abcbd.

The end of the RE has now been reached, and it has matched abcb. This
demonstrates how the matching engine goes as far as it can at first, and if no
match is found it will then progressively back up and retry the rest of the RE
again and again. It will back up until it has tried zero matches for [bcd]*, and
if that subsequently fails, the engine will conclude that the string doesn't
match the RE at all.
.............

I write the following code:

.......
import re

line = "abcdb"

matchObj = re.match( 'a[bcd]*b', line)

if matchObj:
    print "matchObj.group() : ", matchObj.group()
    print "matchObj.group(0) : ", matchObj.group()
    print "matchObj.group(1) : ", matchObj.group(1)
    print "matchObj.group(2) : ", matchObj.group(2)
else:
    print "No match!!"
.........

In which I have used its match pattern, but the result is not 'abcb'

Only matchObj.group(0): abcdb

displays. All other group(s) have no content.

How to write this greedy search?

Thanks,
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2014-07-06 20:19 +0100 |
| Message-ID | <mailman.11555.1404674388.18130.python-list@python.org> |
| In reply to | #74045 |
On 2014-07-06 19:57, rxjwg98@gmail.com wrote:
> Hi,
> On Python website, it says that the following match can reach 'abcb' in 6 steps:
>
> .............
> A step-by-step example will make this more obvious. Let's consider the expression
> a[bcd]*b. This matches the letter 'a', zero or more letters from the class [bcd],
> and finally ends with a 'b'. Now imagine matching this RE against the string
> abcbd.
>
> The end of the RE has now been reached, and it has matched abcb. This
> demonstrates how the matching engine goes as far as it can at first, and if no
> match is found it will then progressively back up and retry the rest of the RE
> again and again. It will back up until it has tried zero matches for [bcd]*, and
> if that subsequently fails, the engine will conclude that the string doesn't
> match the RE at all.
> .............
>
> I write the following code:
>
> .......
> import re
>
> line = "abcdb"
>
> matchObj = re.match( 'a[bcd]*b', line)
>
> if matchObj:
>     print "matchObj.group() : ", matchObj.group()
>     print "matchObj.group(0) : ", matchObj.group()
>     print "matchObj.group(1) : ", matchObj.group(1)
>     print "matchObj.group(2) : ", matchObj.group(2)
> else:
>     print "No match!!"
> .........
>
> In which I have used its match pattern, but the result is not 'abcb'
>
That's because the example matches against the string 'abcbd', but you have:
line = "abcdb"
(You've swapped the 'b' and the 'd'.)
> Only matchObj.group(0): abcdb
>
> displays. All other group(s) have no content.
>
There are no capture groups in your regex, only group 0 (the entire
matched part).
> How to write this greedy search?
>
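As a rough illustration of the point about groups (a sketch in Python 3 syntax, not code from the original post): a pattern without parentheses has zero capture groups, so only group 0 exists. Asking for group 1 raises an IndexError rather than returning empty content.

```python
import re

pattern = re.compile(r"a[bcd]*b")   # no parentheses, so no capture groups
match = pattern.match("abcdb")

print(pattern.groups)    # number of capture groups in the pattern
print(match.group(0))    # the entire matched part

# Requesting a group that the pattern does not define is an error.
try:
    match.group(1)
    group_one_error = None
except IndexError as exc:
    group_one_error = exc
    print("group(1) failed:", exc)
```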
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-07-06 13:26 -0600 |
| Message-ID | <mailman.11559.1404675307.18130.python-list@python.org> |
| In reply to | #74045 |
On Sun, Jul 6, 2014 at 12:57 PM, <rxjwg98@gmail.com> wrote:
> I write the following code:
>
> .......
> import re
>
> line = "abcdb"
>
> matchObj = re.match( 'a[bcd]*b', line)
>
> if matchObj:
>     print "matchObj.group() : ", matchObj.group()
>     print "matchObj.group(0) : ", matchObj.group()
>     print "matchObj.group(1) : ", matchObj.group(1)
>     print "matchObj.group(2) : ", matchObj.group(2)
> else:
>     print "No match!!"
> .........
>
> In which I have used its match pattern, but the result is not 'abcb'
You're never going to get a match of 'abcb' on that string, because
'abcb' is not found anywhere in that string.
There are two possible matches for the given pattern over that string:
'abcdb' and 'ab'. The first one matches the [bcd]* three times, and
the second one matches it zero times. Because the matching is greedy,
you get the result that matches three times. It cannot match one, two
or four times because then there would be no 'b' following the [bcd]*
portion as required by the pattern.
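One way to check the repeat counts described above is to force each count explicitly. This is only a sketch in Python 3 syntax; the `{n}` quantifier and the non-greedy `*?` form are standard `re` features but were not used in the original post.

```python
import re

line = "abcdb"

# Force [bcd] to repeat exactly n times and record which counts can succeed.
results = {}
for n in range(5):
    m = re.match(r"a[bcd]{%d}b" % n, line)
    results[n] = m.group() if m else None
    print(n, results[n])

greedy = re.match(r"a[bcd]*b", line).group()   # prefers the most repeats
lazy = re.match(r"a[bcd]*?b", line).group()    # prefers the fewest repeats
print(greedy, lazy)
```

Only the counts 0 and 3 produce a match, and the greedy `*` picks the larger of the two.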
>
> Only matchObj.group(0): abcdb
>
> displays. All other group(s) have no content.
Calling match.group(0) is equivalent to calling match.group without
arguments. In that case it returns the matched string of the entire
regular expression. match.group(1) and match.group(2) will return the
value of the first and second matching group respectively, but the
pattern does not have any matching groups. If you want a matching
group, then enclose the part that you want it to match in parentheses.
For example, if you change the pattern to:
matchObj = re.match('a([bcd]*)b', line)
then the value of matchObj.group(1) will be 'bcd'
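A minimal sketch of that change, written in Python 3 syntax (the variable names `whole` and `inner` are just for illustration):

```python
import re

line = "abcdb"
match = re.match(r"a([bcd]*)b", line)   # parentheses create capture group 1

whole = match.group(0)   # the entire match
inner = match.group(1)   # the text matched by ([bcd]*)
print(whole, inner, match.groups())
```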
| From | rxjwg98@gmail.com |
|---|---|
| Date | 2014-07-07 06:30 -0700 |
| Message-ID | <3840e655-b202-4a8d-b432-77c2d3cd58a4@googlegroups.com> |
| In reply to | #74054 |
On Sunday, July 6, 2014 3:26:44 PM UTC-4, Ian wrote:
> On Sun, Jul 6, 2014 at 12:57 PM, <rxjwg98@gmail.com> wrote:
> > I write the following code:
> >
> > .......
> > import re
> >
> > line = "abcdb"
> >
> > matchObj = re.match( 'a[bcd]*b', line)
> >
> > if matchObj:
> >     print "matchObj.group() : ", matchObj.group()
> >     print "matchObj.group(0) : ", matchObj.group()
> >     print "matchObj.group(1) : ", matchObj.group(1)
> >     print "matchObj.group(2) : ", matchObj.group(2)
> > else:
> >     print "No match!!"
> > .........
> >
> > In which I have used its match pattern, but the result is not 'abcb'
>
> You're never going to get a match of 'abcb' on that string, because
> 'abcb' is not found anywhere in that string.
>
> There are two possible matches for the given pattern over that string:
> 'abcdb' and 'ab'. The first one matches the [bcd]* three times, and
> the second one matches it zero times. Because the matching is greedy,
> you get the result that matches three times. It cannot match one, two
> or four times because then there would be no 'b' following the [bcd]*
> portion as required by the pattern.
>
> >
> > Only matchObj.group(0): abcdb
> >
> > displays. All other group(s) have no content.
>
> Calling match.group(0) is equivalent to calling match.group without
> arguments. In that case it returns the matched string of the entire
> regular expression. match.group(1) and match.group(2) will return the
> value of the first and second matching group respectively, but the
> pattern does not have any matching groups. If you want a matching
> group, then enclose the part that you want it to match in parentheses.
> For example, if you change the pattern to:
>
> matchObj = re.match('a([bcd]*)b', line)
>
> then the value of matchObj.group(1) will be 'bcd'
Because I am new to Python, I may not describe the question clearly. Could you
read the original problem on web:
https://docs.python.org/2/howto/regex.html
It says that it gets 'abcb'. Could you explain it to me? Thanks again
A step-by-step example will make this more obvious. Let's consider the
expression a[bcd]*b. This matches the letter 'a', zero or more letters from
the class [bcd], and finally ends with a 'b'. Now imagine matching this RE
against the string abcbd.
| Step | Matched | Explanation |
|---|---|---|
| 1 | a | The a in the RE matches. |
| 2 | abcbd | The engine matches [bcd]*, going as far as it can, which is to the end of the string. |
| 3 | Failure | The engine tries to match b, but the current position is at the end of the string, so it fails. |
| 4 | abcb | Back up, so that [bcd]* matches one less character. |
| 5 | Failure | Try b again, but the current position is at the last character, which is a 'd'. |
| 6 | abc | Back up again, so that [bcd]* is only matching bc. |
| 6 | abcb | Try b again. This time the character at the current position is 'b', so it succeeds. |
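The steps in the table can be imitated with a small hand-written trace. This is only a sketch of the idea for this one pattern, in Python 3 syntax: `trace_match` is a hypothetical helper, not how the `re` module is actually implemented.

```python
import re


def trace_match(text):
    """Imitate a backtracking match of a[bcd]*b, recording each step."""
    steps = []
    if not text.startswith("a"):
        return None, steps
    steps.append("a")

    # Greedy phase: let [bcd]* consume as many characters as it can.
    end = 1
    while end < len(text) and text[end] in "bcd":
        end += 1
    steps.append(text[:end])

    # Backtracking phase: try the final 'b'; on failure give back one character.
    while end >= 1:
        if end < len(text) and text[end] == "b":
            steps.append(text[:end + 1])
            return text[:end + 1], steps
        steps.append("Failure")
        end -= 1
        if end >= 1:
            steps.append(text[:end])
    return None, steps


result, steps = trace_match("abcbd")
for step in steps:
    print(step)

# The real engine reaches the same final answer for this input.
real = re.match(r"a[bcd]*b", "abcbd").group()
print(result, real)
```

Running the same helper on "abcdb" gives 'abcdb' after a single backtrack, which is why the two strings behave differently.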
| From | Anssi Saari <as@sci.fi> |
|---|---|
| Date | 2014-07-07 18:48 +0300 |
| Message-ID | <vg3mwclmaol.fsf@coffee.modeemi.fi> |
| In reply to | #74107 |
rxjwg98@gmail.com writes:

> Because I am new to Python, I may not describe the question clearly. Could you
> read the original problem on web:
>
> https://docs.python.org/2/howto/regex.html
>
> It says that it gets 'abcb'. Could you explain it to me? Thanks again

Actually, it tries to explain how * works in the regular expression
engine. Do you feel that's a crucial thing for a beginner to understand
about Python? Hopefully your answer is no and you can move on.
| From | Ian Kelly <ian.g.kelly@gmail.com> |
|---|---|
| Date | 2014-07-07 10:18 -0600 |
| Message-ID | <mailman.11599.1404749958.18130.python-list@python.org> |
| In reply to | #74107 |
On Mon, Jul 7, 2014 at 7:30 AM, <rxjwg98@gmail.com> wrote:
> Because I am new to Python, I may not describe the question clearly. Could you
> read the original problem on web:
>
> https://docs.python.org/2/howto/regex.html
>
> It says that it gets 'abcb'. Could you explain it to me? Thanks again

The string being matched in the explanation at that link is 'abcbd',
not 'abcdb'. The 'a' in the pattern matches the 'a' in the string, the
'[bcd]*' in the pattern matches the 'bc' in the string (with a repeat
count of 2), and finally the 'b' in the pattern matches the 'b'
following that in the string.
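To compare the two strings side by side, here is a small sketch in Python 3 syntax. The capture group is added only so the part consumed by [bcd]* can be inspected; it is not in the pattern from the HOWTO.

```python
import re

pattern = re.compile(r"a([bcd]*)b")   # group 1 exposes what [bcd]* consumed

report = {}
for text in ("abcbd", "abcdb"):
    m = pattern.match(text)
    # Store the whole match, the class portion, and its repeat count.
    report[text] = (m.group(0), m.group(1), len(m.group(1)))
    print(text, "->", report[text])
```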