Re: What's the best way to write this regular expression?

Date Tue, 6 Mar 2012 14:52:10 -0800
Subject Re: What's the best way to write this regular expression?
From Chris Rebert <clp2@rebertia.com>
To John Salerno <johnjsal@gmail.com>
Cc python-list@python.org
Newsgroups comp.lang.python
Message-ID <mailman.442.1331074333.3037.python-list@python.org>

On Tue, Mar 6, 2012 at 2:43 PM, John Salerno <johnjsal@gmail.com> wrote:
> I sort of have to work with what the website gives me (as you'll see below), but today I encountered an exception to my RE. Let me just give all the specific information first. The point of my script is to go to the specified URL and extract song information from it.
>
> This is my RE:
>
> song_pattern = re.compile(r'([0-9]{1,2}:[0-9]{2} [a|p].m.).*?<a.*?>(.*?)</a>.*?<a.*?>(.*?)</a>', re.DOTALL)

I would advise against using regular expressions to "parse" HTML:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

lxml is a popular choice for parsing HTML in Python: http://lxml.de
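
To illustrate the parser-based approach, here is a minimal sketch. It uses the standard library's html.parser in place of lxml so that it runs without installing anything; lxml would follow the same idea with a richer API. The sample markup, the SongParser class, and the extract_songs function are all hypothetical, since the real page's HTML is not shown in this thread. The only assumption carried over from the quoted RE is that each entry is a time stamp followed by two links. The time pattern is written as [ap]\.m\. because the quoted [a|p].m. would also accept a literal "|" and any character in place of each dot.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the real page's structure is not shown in the thread.
SAMPLE_HTML = """
<table>
  <tr><td>2:43 p.m.</td>
      <td><a href="/song/1">First Song</a></td>
      <td><a href="/artist/1">First Artist</a></td></tr>
  <tr><td>11:05 a.m.</td>
      <td><a href="/song/2">Second Song</a></td>
      <td><a href="/artist/2">Second Artist</a></td></tr>
</table>
"""

# A regex is still fine for the time stamp itself, which is plain text.
TIME_RE = re.compile(r"[0-9]{1,2}:[0-9]{2} [ap]\.m\.")


class SongParser(HTMLParser):
    """Collect (time, first link text, second link text) records."""

    def __init__(self):
        super().__init__()
        self.rows = []         # finished records
        self._current = None   # record being filled: [time, link1, link2]
        self._in_link = False  # True while inside an <a> element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            # Open a new text slot only if a time stamp was seen first.
            if self._current is not None and len(self._current) < 3:
                self._current.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._in_link = False
            # Two links collected after a time stamp: the record is complete.
            if self._current is not None and len(self._current) == 3:
                self.rows.append(tuple(self._current))
                self._current = None

    def handle_data(self, data):
        if self._in_link:
            if self._current is not None:
                self._current[-1] += data
        else:
            match = TIME_RE.search(data)
            if match:
                self._current = [match.group()]


def extract_songs(html):
    """Return a list of (time, link1_text, link2_text) tuples."""
    parser = SongParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in extract_songs(SAMPLE_HTML):
        print(row)
```

Because the parser tracks tags itself, extra attributes, line breaks, or unrelated markup between the time stamp and the links do not require changing a pattern; only the small state machine in the handler methods needs to know what an entry looks like.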

Cheers,
Chris

