


Re: What's the best way to write this regular expression?

From: Ian Kelly <ian.g.kelly@gmail.com>
Date: Tue, 6 Mar 2012 16:35:32 -0700



On Tue, Mar 6, 2012 at 4:05 PM, John Salerno <johnjsal@gmail.com> wrote:
>> Anything that allows me NOT to use REs is welcome news, so I look forward to learning about something new! :)
>
> I should ask though...are there alternatives already bundled with Python that I could use? Now that you mention it, I remember something called HTMLParser (or something like that) and I have no idea why I never looked into that before I messed with REs.

HTMLParser is pretty basic, although it may be sufficient for your
needs.  It just converts an HTML document into a stream of start tags,
end tags, and text, with no guarantee that the tags will actually
correspond in any meaningful way.  lxml can be used to output an
actual hierarchical structure that may be easier to manipulate and
extract data from.
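
To make the "stream of start tags, end tags, and text" point concrete, here
is a minimal sketch using only the standard library. It assumes Python 3,
where the module is html.parser (in Python 2 it was named HTMLParser). The
names EventCollector and collect_events are made up for illustration, and
the sample markup is deliberately broken to show that the parser reports
events as it sees them without checking that the tags pair up.

```python
from html.parser import HTMLParser  # Python 3 name; the Python 2 module was HTMLParser


class EventCollector(HTMLParser):
    """Hypothetical helper: records each parser event in the order it arrives."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # Skip whitespace-only runs so the event list stays readable.
        if data.strip():
            self.events.append(("text", data.strip()))


def collect_events(markup):
    """Feed markup to a fresh parser and return the flat list of events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Deliberately mismatched markup: <b> is never closed, </i> was never opened.
events = collect_events("<p>Hello <b>world</i></p>")
for event in events:
    print(event)
```

The result is a flat list in which the stray </i> shows up as an ordinary
end-tag event. Any nesting or matching has to be tracked by your own code,
which is the part a tree-building library such as lxml takes care of.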

Cheers,
Ian



