

Groups > comp.lang.python > #70672 > unrolled thread

Re: possible bug in re expression?

Started by: Robin Becker <robin@reportlab.com>
First post: 2014-04-28 10:47 +0100
Last post: 2014-04-28 14:06 +0100
Articles: 3 (2 participants)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.


Contents

  Re: possible bug in re expression? Robin Becker <robin@reportlab.com> - 2014-04-28 10:47 +0100
    Re: possible bug in re expression? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2014-04-28 11:49 +0000
      Re: possible bug in re expression? Robin Becker <robin@reportlab.com> - 2014-04-28 14:06 +0100

#70672 — Re: possible bug in re expression?

From: Robin Becker <robin@reportlab.com>
Date: 2014-04-28 10:47 +0100
Subject: Re: possible bug in re expression?
Message-ID: <mailman.9545.1398678492.18130.python-list@python.org>
On 25/04/2014 19:32, Terry Reedy wrote:
..........
> I suppose that one could argue that '{' alone should be treated as special
> immediately, and not just when a matching '}' is found, and should disable other
> special meanings. I wonder what JS does if there is no matching '}'?
>
Well, in fact I suspect this is my mistranslation of the original:

new RegExp('.{1,' + (+size) + '}', 'g')

My hacked-up translator doesn't know what that means. I suspect that (+size) is
an attempt to force size to a number before it is converted to a string. I
used to believe that conversion was always written 0-x, but experimentally
(+"3") ends up as 3, not "3".
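
For reference, a minimal sketch of how that JavaScript line might be carried over to Python, assuming size can arrive as either a whole number or a numeric string. The helper name chunk_pattern is hypothetical, and int() only approximates the unary plus (JavaScript's + yields a number, which need not be an integer).

```python
import re

def chunk_pattern(size):
    """Build a Python counterpart of new RegExp('.{1,' + (+size) + '}', 'g')."""
    # int() plays the role of JavaScript's unary plus here (assumption:
    # size is a whole number or a numeric string such as "3").
    return re.compile('.{1,%d}' % int(size))

# findall() returns every non-overlapping match, much like the 'g' flag.
pieces = chunk_pattern("3").findall("abcdefgh")
print(pieces)
```

Formatting the bound with %d means a non-numeric size fails loudly in int() instead of producing a pattern that re quietly reinterprets.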

Naively, I imagined that re would complain about ambiguous regular expressions, 
but in the regexp world n problems --> n+1 problems almost surely so I should 
have anticipated it :)

Does this in fact mean that almost any broken regexp specification will silently
fail because re will reset and consider any metacharacter as literal?
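
A small sketch of the silent case. The exact pattern the translator produced is not shown in the thread, so '.{1,+3}' below is only a guess for illustration. In CPython's re, a '{' that does not begin a well-formed {m,n} quantifier is taken as a literal character and no error is raised:

```python
import re

# Hypothetical mistranslation: '+3' is not a valid upper bound, so re does
# not treat '{1,+3}' as a quantifier, and compiling raises no error.
broken = re.compile('.{1,+3}')

# It no longer splits text into chunks of up to three characters...
no_chunks = broken.findall('abcdefgh')

# ...and instead matches: any char, '{', '1', one or more ',', '3', '}'.
literal_hit = broken.fullmatch('x{1,,3}')

print(no_chunks, literal_hit is not None)
```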
-- 
Robin Becker



#70675

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2014-04-28 11:49 +0000
Message-ID: <535e4037$0$29965$c3e8da3$5496439d@news.astraweb.com>
In reply to: #70672
On Mon, 28 Apr 2014 10:47:54 +0100, Robin Becker wrote:

> Does this in fact mean that almost any broken regexp specification will
> silently fail because re will reset and consider any metacharacter as
> literal?

Well, I don't know about "almost any", but at least some broken regexes 
will explicitly fail:



py> import re
py> re.search('*', "123*4")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.3/re.py", line 161, in search
    return _compile(pattern, flags).search(string)
  [...]
  File "/usr/local/lib/python3.3/sre_parse.py", line 552, in _parse
    raise error("nothing to repeat")
sre_constants.error: nothing to repeat

(For brevity I have abbreviated the traceback.)
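
The same check can be sketched programmatically, assuming all you want is to know whether re accepts a pattern at all. The helper name is_valid_regex is hypothetical; re.error is the exception the re module raises for patterns it rejects.

```python
import re

def is_valid_regex(pattern):
    """Return True if re can compile pattern, False if it raises re.error."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True

print(is_valid_regex('*'))       # rejected: nothing to repeat
print(is_valid_regex('(abc'))    # rejected: unbalanced parenthesis
print(is_valid_regex('.{1,3}'))  # accepted
print(is_valid_regex('a{'))      # accepted: the lone '{' is read as a literal
```

This only detects patterns that re refuses to compile. A pattern that compiles but means something other than intended still passes.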

-- 
Steven D'Aprano
http://import-that.dreamwidth.org/



#70680

From: Robin Becker <robin@reportlab.com>
Date: 2014-04-28 14:06 +0100
Message-ID: <mailman.9548.1398690389.18130.python-list@python.org>
In reply to: #70675
On 28/04/2014 12:49, Steven D'Aprano wrote:
......
>
> Well, I don't know about "almost any", but at least some broken regexes
> will explicitly fail:
>
>
>
> py> import re
........
> sre_constants.error: nothing to repeat
>
> (For brevity I have abbreviated the traceback.)
>
So there is an intent to catch some specification errors.

I've abandoned this translation anyhow, as all that was intended was to split the
string into non-overlapping strings of size at most k. I find this works faster
than the regexp, even if the regexp is pre-compiled:

[p[i:i+k] for i in xrange(0,len(p),k)]
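
For readers on Python 3, a minimal sketch of the same slicing approach next to the regexp it replaces. The function name chunks and the sample values are assumptions for illustration, and no timing claim is made here.

```python
import re

def chunks(p, k):
    """Split p into consecutive pieces of length at most k."""
    # range() is the Python 3 spelling of Python 2's xrange().
    return [p[i:i + k] for i in range(0, len(p), k)]

p, k = 'abcdefgh', 3
sliced = chunks(p, k)

# Regexp version for comparison; DOTALL lets '.' match newlines too,
# otherwise the two approaches differ on multi-line input.
matched = re.findall('.{1,%d}' % k, p, re.DOTALL)

print(sliced, matched)
```

Slicing treats every character alike, whereas the regexp without re.DOTALL would stop a chunk at each newline and drop the newline itself.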
-- 
Robin Becker
