


Re: possible bug in re expression?

Started by: MRAB <python@mrabarnett.plus.com>
First post: 2014-04-25 18:57 +0100
Last post: 2014-04-25 18:57 +0100
Articles: 1 (1 participant)





#70608 — Re: possible bug in re expression?

From: MRAB <python@mrabarnett.plus.com>
Date: 2014-04-25 18:57 +0100
Subject: Re: possible bug in re expression?
Message-ID: <mailman.9504.1398448647.18130.python-list@python.org>

On 2014-04-25 17:55, Chris Angelico wrote:
> On Sat, Apr 26, 2014 at 2:30 AM, Robin Becker <robin@reportlab.com> wrote:
>> Whilst translating some javascript code I find that this
>>
>> A=re.compile('.{1,+3}').findall(p)
>>
>> doesn't give any error, but doesn't manage to find the strings in p that I
>> want len(A)==>0, the correct translation should have been
>>
>> A=re.compile('.{1,3}').findall(p)
>>
>> which works fine.
>>
>> should
>>
>> re.compile('.{1,+3}')
>>
>> raise an error? It doesn't on python 2.7 or 3.3.
>
> I would say the surprising part is that your js code doesn't mind an
> extraneous character in the regex. In a brace like that, negative
> numbers have no meaning, so I would expect the definition of the regex
> to look for digits, not "anything that can be parsed as a number". So
> you've uncovered a bug in your code that just happened to work in js.
>
> Should it raise an error? Good question. Quite possibly it should,
> unless that has some other meaning that I'm not familiar with. Do you
> know how it's being interpreted? I'm not entirely sure what you mean
> by "len(A)==>0", as ==> isn't an operator in Python or JS. Best way to
> continue, I think, would be to use regular expression matching (rather
> than findall'ing) and something other than dot, and tabulate input
> strings, expected result (match or no match), what JS does, and what
> Python does. For instance:
>
> Regex: "^a{1,3}$"
>
> "": Not expected, not Python
> "a": Expected, Python
> "aa": Expected, Python
> "aaa": Expected, Python
> "aaaa": Not expected, not Python
>
> Just what we'd expect. Now try the same thing with the plus in there.
> I'm finding that none of the above strings yields a match. Maybe
> there's something else being matched?
>
The DEBUG flag helps to show what's happening:

 >>> r = re.compile('.{1,+3}', flags=re.DEBUG)
any None
literal 123
literal 49
max_repeat 1 4294967295
   literal 44
literal 51
literal 125
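
The numbers after "literal" in that output are character codes. A quick sketch to decode them (the variable names are only for illustration):

```python
# Code points copied from the DEBUG output above, in order of appearance.
codes = [123, 49, 44, 51, 125]

# Turn each code point back into its character.
decoded = "".join(chr(c) for c in codes)
print(decoded)
```

That spells out the brace expression as plain characters, with the repeat applying only to the comma (code 44).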

When it's parsing the pattern it's doing this:

.    OK, match any character
{    Looks like the start of a quantifier
1    OK, the minimum count
,    OK, the maximum count probably follows
+    Error; it looks like the '{' was a literal

Trying again from the brace:

{    Literal
1    Literal
,    Literal
+    Repeat the previous item one or more times
3    Literal
}    Literal
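
So the pattern never acts as a bounded repeat: it should match any one character followed by the literal text "{1", one or more commas, and then "3}". A minimal sketch to check that reading (the sample strings are made up for illustration and are not from the original post):

```python
import re

broken = re.compile(r'.{1,+3}')   # the brace falls back to literal characters
fixed = re.compile(r'.{1,3}')     # a real {min,max} quantifier

# Made-up sample containing the literal text the broken pattern looks for.
sample = 'a{1,3} b{1,,,3} abc'

broken_hits = broken.findall(sample)
print(broken_hits)                 # ['a{1,3}', 'b{1,,,3}']

# Ordinary text contains no literal "{1,3}", hence the empty result.
plain_hits = broken.findall('abcdefg')
print(plain_hits)                  # []

fixed_hits = fixed.findall('abcdefg')
print(fixed_hits)                  # ['abc', 'def', 'g']
```

This would explain why findall returned nothing for ordinary input: there was no literal "{1,3}" in it to find.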
