Re: What's the best way to write this regular expression?

From Paul Rubin <no.email@nospam.invalid>
Newsgroups comp.lang.python
Subject Re: What's the best way to write this regular expression?
References <12783654.1174.1331073814011.JavaMail.geo-discussion-forums@yner4> <0c1a1890-dc80-41b6-abea-f90324dd7d75@2g2000yqk.googlegroups.com>
Date 2012-03-07 02:36 -0800
Message-ID <7x7gywofzh.fsf@ruckus.brouhaha.com>
Organization Nightsong/Fort GNOX


John Salerno <johnjsal@gmail.com> writes:
> The Beautiful Soup 4 documentation was very clear, and BS4 itself is
> so simple and Pythonic. And best of all, since version 4 no longer
> does the parsing itself, you can choose your own parser, and it works
> with lxml, so I'll still be using lxml, but with a nice, clean overlay
> for navigating the tree structure.

I haven't used BS4 but have made good use of earlier versions.

The main thing to understand is that an awful lot of HTML in the real world
is malformed and will break an XML parser or anything else that expects
syntactically valid HTML.  People tend to write HTML that works well
enough to render decently in browsers, whose parsers therefore have to
be tolerant of errors.  Beautiful Soup also tries to make sense of
crappy, malformed HTML.  Partly as a result, it's dog slow compared to
any serious XML parser.  But it works very well if you don't mind the
low speed.
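
To make the strict-versus-tolerant distinction concrete, here is a minimal
sketch using only the standard library rather than Beautiful Soup itself.
The sample markup and the names BAD_HTML, strict_parse_ok, TextCollector and
lenient_text are made up for illustration; xml.etree.ElementTree stands in
for "an XML parser" and html.parser for a tolerant HTML parser.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample: unclosed <p> and <b> tags, typical of real-world pages.
BAD_HTML = "<html><body><p>first<p>second <b>bold</body></html>"


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class TextCollector(HTMLParser):
    """Collect non-empty text chunks without requiring balanced tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.chunks.append(text)


def lenient_text(markup):
    """Feed markup to the tolerant parser and return the text it found."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return collector.chunks


if __name__ == "__main__":
    print("strict parser accepts it:", strict_parse_ok(BAD_HTML))
    print("tolerant parser text:", lenient_text(BAD_HTML))
```

The strict parser gives up at the first mismatched tag, while the tolerant
one still recovers the text. Beautiful Soup goes further and rebuilds a
navigable tree from such input, which is presumably part of where the extra
time goes.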
