| From | Chris Rebert <clp2@rebertia.com> |
|---|---|
| To | John Salerno <johnjsal@gmail.com> |
| Cc | python-list@python.org |
| Date | Tue, 6 Mar 2012 14:52:10 -0800 |
| Subject | Re: What's the best way to write this regular expression? |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.442.1331074333.3037.python-list@python.org> |
On Tue, Mar 6, 2012 at 2:43 PM, John Salerno <johnjsal@gmail.com> wrote:
> I sort of have to work with what the website gives me (as you'll see below), but today I encountered an exception to my RE. Let me just give all the specific information first. The point of my script is to go to the specified URL and extract song information from it.
>
> This is my RE:
>
> song_pattern = re.compile(r'([0-9]{1,2}:[0-9]{2} [a|p].m.).*?<a.*?>(.*?)</a>.*?<a.*?>(.*?)</a>', re.DOTALL)

I would advise against using regular expressions to "parse" HTML:
http://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags

lxml is a popular choice for parsing HTML in Python: http://lxml.de

Cheers,
Chris
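
To illustrate the advice above, here is a minimal sketch of the parser-based approach. It deliberately uses the standard library's `html.parser` instead of lxml so that it runs without extra installs; the same idea carries over to lxml. The sample markup, the `SongParser` class, and the `extract_songs` helper are all hypothetical, since the real page is not shown in this message. The regular expression is kept only for the timestamp text, written as `[ap]\.m\.` because the quoted `[a|p].m.` would also accept a literal `|` and lets each `.` match any character.

```python
import re
from html.parser import HTMLParser

# Hypothetical markup: the real page is not shown in this message.
SAMPLE_HTML = """
<table>
  <tr><td>1:05 p.m.</td><td><a href="/x/1">First Link</a></td><td><a href="/y/1">Second Link</a></td></tr>
  <tr><td>12:47 a.m.</td><td><a href="/x/2">Third Link</a></td><td><a href="/y/2">Fourth Link</a></td></tr>
</table>
"""

# The regex is used only on plain text, never on tags.
TIME_RE = re.compile(r"[0-9]{1,2}:[0-9]{2} [ap]\.m\.")


class SongParser(HTMLParser):
    """Collect (time, first link text, second link text) tuples."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._time = None      # most recent timestamp seen outside a link
        self._links = []       # link texts collected since that timestamp
        self._in_link = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._in_link = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "a" and self._in_link:
            self._in_link = False
            if self._time is not None:
                self._links.append("".join(self._buf).strip())
                if len(self._links) == 2:
                    self.rows.append((self._time, *self._links))
                    self._time, self._links = None, []

    def handle_data(self, data):
        if self._in_link:
            self._buf.append(data)
        else:
            match = TIME_RE.search(data)
            if match:
                # A new timestamp starts a new record.
                self._time = match.group(0)
                self._links = []


def extract_songs(html):
    parser = SongParser()
    parser.feed(html)
    parser.close()
    return parser.rows


if __name__ == "__main__":
    for row in extract_songs(SAMPLE_HTML):
        print(row)
```

Because the parser tracks tags itself, extra attributes or additional markup between the timestamp and the links should not break the extraction the way they can with a single long pattern.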