Re: possible bug in re expression?

Started by	Robin Becker <robin@reportlab.com>
First post	2014-04-28 10:47 +0100
Last post	2014-04-28 14:06 +0100
Articles	3 — 2 participants

This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.

  Re: possible bug in re expression? Robin Becker <robin@reportlab.com> - 2014-04-28 10:47 +0100
    Re: possible bug in re expression? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-04-28 11:49 +0000
      Re: possible bug in re expression? Robin Becker <robin@reportlab.com> - 2014-04-28 14:06 +0100

#70672 — Re: possible bug in re expression?

From	Robin Becker <robin@reportlab.com>
Date	2014-04-28 10:47 +0100
Subject	Re: possible bug in re expression?
Message-ID	<mailman.9545.1398678492.18130.python-list@python.org>

On 25/04/2014 19:32, Terry Reedy wrote:
..........
> I suppose that one could argue that '{' alone should be treated as special
> immediately, and not just when a matching '}' is found, and should disable other
> special meanings. I wonder what JS does if there is no matching '}'?
>
Well, in fact I suspect this is my mistranslation of the original:

new RegExp('.{1,' + (+size) + '}', 'g')

My hacked-up translator doesn't know what that means. I suspect that (+size) is
an attempt to force size to an integer before it is converted to a string. I
used to believe that conversion was always written 0-x, but experimentally
(+"3") ends up as 3, not "3".
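For reference, a minimal Python sketch of what the JavaScript line appears to intend, assuming size may arrive as a string and that int() is an acceptable stand-in for the unary plus. The helper name chunk_regex is hypothetical and not part of the original code.

```python
import re

def chunk_regex(size):
    """Build a pattern matching runs of 1..size characters (hypothetical helper)."""
    n = int(size)  # assumption: int() plays the role of JS unary plus, so "3" -> 3
    return re.compile('.{1,%d}' % n)

pieces = chunk_regex("3").findall("abcdefgh")
print(pieces)  # ['abc', 'def', 'gh']
```

One difference to keep in mind: int() raises ValueError on non-numeric input, whereas the JavaScript unary plus yields NaN.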

Naively, I imagined that re would complain about ambiguous regular expressions, 
but in the regexp world n problems --> n+1 problems almost surely so I should 
have anticipated it :)

Does this in fact mean that almost any broken regexp specification will silently
fail because re will reset and consider any metacharacter as literal?
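A short sketch of the silent fallback being asked about, assuming the mistranslation produced a pattern like '.{1,+3}' (the exact pattern is a guess, not taken from the posts shown here). In CPython's re module, a '{' that does not begin a well-formed {m,n} repeat is treated as a literal brace rather than rejected.

```python
import re

# Assumed mistranslated pattern: the '+' keeps '{1,+3}' from being a valid repeat.
pat = re.compile('.{1,+3}')  # compiles without raising an error

# Parsed as: any char, literal '{', '1', one or more ',', '3', literal '}'
no_match = pat.search('abcdef')      # ordinary text does not match
literal_1 = pat.search('x{1,3}')     # the literal text does match
literal_2 = pat.search('x{1,,,3}')   # ',+' means "one or more commas"

print(no_match, literal_1.group(), literal_2.group())
```

So the pattern is accepted, but it matches something quite different from "one to three characters".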
-- 
Robin Becker

#70675

From	Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date	2014-04-28 11:49 +0000
Message-ID	<535e4037$0$29965$c3e8da3$5496439d@news.astraweb.com>
In reply to	#70672

On Mon, 28 Apr 2014 10:47:54 +0100, Robin Becker wrote:

> Does this in fact mean that almost any broken regexp specification will
> silently fail because re will reset and consider any metacharacter as
> literal?

Well, I don't know about "almost any", but at least some broken regexes 
will explicitly fail:



py> import re
py> re.search('*', "123*4")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/re.py", line 161, in search
    return _compile(pattern, flags).search(string)
  [...]
  File "/usr/local/lib/python3.3/sre_parse.py", line 552, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat

(For brevity I have abbreviated the traceback.)
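A minimal sketch of how the explicit failure can be caught, using re.error (the exception class the re module exposes for bad patterns). The helper name compiles_ok is hypothetical. Note that it only reports whether a pattern compiles, not whether it means what was intended.

```python
import re

def compiles_ok(pattern):
    """Return True if re accepts the pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(compiles_ok('*'))        # rejected: nothing to repeat
print(compiles_ok('.{1,3}'))   # a well-formed bounded repeat
print(compiles_ok('.{1,+3}'))  # accepted, with the brace taken literally
```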

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/

#70680

From	Robin Becker <robin@reportlab.com>
Date	2014-04-28 14:06 +0100
Message-ID	<mailman.9548.1398690389.18130.python-list@python.org>
In reply to	#70675

On 28/04/2014 12:49, Steven D'Aprano wrote:
......
>
> Well, I don't know about "almost any", but at least some broken regexes
> will explicitly fail:
>
>
>
> py> import re
........
> sre_constants.error: nothing to repeat
>
> (For brevity I have abbreviated the traceback.)
>
so there is intent to catch some specification errors.

I've abandoned this translation anyhow as all that was intended was to split the 
string into non-overlapping strings of size at most k. I find this works faster 
than the regexp even if the regexp is pre-compiled.

[p[i:i+k] for i in range(0, len(p), k)]
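A self-contained Python 3 sketch of the slicing approach, wrapped in a function and compared against the bounded-repeat regexp. The function name chunks and the sample values are assumptions for illustration only.

```python
import re

def chunks(p, k):
    """Split p into consecutive pieces of length at most k (hypothetical helper)."""
    return [p[i:i + k] for i in range(0, len(p), k)]

p = "abcdefgh"
k = 3
by_slice = chunks(p, k)
by_regex = re.findall('.{1,%d}' % k, p)  # agrees only when p has no newlines

print(by_slice, by_regex)
```

The two results can differ when p contains newlines, because '.' does not match '\n' unless re.DOTALL is given, whereas slicing treats every character alike.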
-- 
Robin Becker
