Re: Weird problem matching with REs

Date 2011-05-29 08:41 -0500
From Andrew Berg <bahamutzero8825@gmail.com>
Subject Re: Weird problem matching with REs
References <mailman.2220.1306669538.9059.python-list@python.org> <4de2459b$0$29996$c3e8da3$5496439d@news.astraweb.com>
Newsgroups comp.lang.python
Message-ID <mailman.2222.1306676482.9059.python-list@python.org>

On 2011.05.29 08:09 AM, Steven D'Aprano wrote:
> On Sun, 29 May 2011 06:45:30 -0500, Andrew Berg wrote:
>
> > I have an RE that should work (it even works in Kodos [1], but not in my
> > code), but it keeps failing to match characters after a newline.
>
> Not all regexes are the same. Different regex engines accept different 
> symbols, and sometimes behave differently, or have different default 
> behavior. That your regex works in Kodos but not Python might mean you're 
> writing a Kodos regex instead of a Python regex.
Kodos is written in Python and uses Python's regex engine. In fact, it
is specifically intended to debug Python regexes.
> Firstly, most of the code you show is irrelevant to the problem. Please 
> simplify it to the shortest, most simple example you can give. That would 
> be a simplified piece of text (not the entire web page!), the regex, and 
> the failed attempt to use it. The rest of your code is just noise for the 
> purposes of solving this problem.
I wasn't sure how much would be relevant since it could've been a
problem with other code. I do apologize for not putting more effort into
trimming it down, though.
> Secondly, you probably should use a proper HTML parser, rather than a 
> regex. Resist the temptation to use regexes to rip out bits of text from 
> HTML, it almost always goes wrong eventually.
I find this a much simpler approach, especially since I'm dealing with
broken HTML. I guess I don't see how the effort put into learning a
parser and adding the extra code to use it pays off in this particular
endeavor.
> > I was able to make a regex that matches in my code, but it shouldn't:
> > http://x264.nl/x264/64bit/8bit_depth/revision.\n{1,3}[0-9]{4}.\n{1,3}/x264.\n{1,3}.\n{1,3}.exe
>
> What makes you think it shouldn't match?
AFAIK, dots aren't supposed to match carriage returns or any other
whitespace characters.
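
For reference, a quick check suggests the dot is broader than that. In Python's re module, `.` by default matches any character except a newline, so carriage returns, spaces, and tabs all match, and the re.DOTALL flag extends it to newlines as well. A minimal sketch (dot_matches is only an illustrative helper, not part of the code under discussion):

```python
import re

def dot_matches(ch, flags=0):
    """Return True if the pattern '.' matches the one-character string ch."""
    return re.fullmatch(r".", ch, flags) is not None

# Default behaviour: everything except "\n" is matched by ".".
for name, ch in [("CR", "\r"), ("space", " "), ("tab", "\t"), ("LF", "\n")]:
    print(name, dot_matches(ch))

# With re.DOTALL the dot matches a newline too.
print("LF with DOTALL", dot_matches("\n", re.DOTALL))
```

This would explain why a pattern like `.\n{1,3}` can match a "\r\n" line ending: the dot consumes the carriage return and `\n` matches the line feed.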
> By the way, you probably should escape the dots, otherwise it will match 
> strings containing any arbitrary character, rather than *just* dots:
You're right; I overlooked the dots in the URL.
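
As a rough illustration of the difference escaping makes, here is a small sketch using only the host part of the URL. The test strings are made up for the example, and re.escape is just one way to get a literal match (writing `\.` by hand works too):

```python
import re

literal = "x264.nl"

# Unescaped: the dot is a wildcard, so any character is accepted in its place.
unescaped = re.compile(literal)

# Escaped: re.escape() backslash-escapes the dot so it matches only ".".
escaped = re.compile(re.escape(literal))

print(bool(unescaped.search("http://x264Xnl/")))  # wildcard dot accepts "X"
print(bool(escaped.search("http://x264Xnl/")))    # literal dot rejects "X"
print(bool(escaped.search("http://x264.nl/")))    # literal dot accepts "."
```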
