| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: What's the best way to write this regular expression? |
| References | <12783654.1174.1331073814011.JavaMail.geo-discussion-forums@yner4> <0c1a1890-dc80-41b6-abea-f90324dd7d75@2g2000yqk.googlegroups.com> |
| Date | 2012-03-07 02:36 -0800 |
| Message-ID | <7x7gywofzh.fsf@ruckus.brouhaha.com> |
| Organization | Nightsong/Fort GNOX |

John Salerno <johnjsal@gmail.com> writes:
> The Beautiful Soup 4 documentation was very clear, and BS4 itself is
> so simple and Pythonic. And best of all, since version 4 no longer
> does the parsing itself, you can choose your own parser, and it works
> with lxml, so I'll still be using lxml, but with a nice, clean overlay
> for navigating the tree structure.

I haven't used BS4 but have made good use of earlier versions. Main thing to understand is that an awful lot of HTML in the real world is malformed and will break an XML parser or anything that expects syntactically valid HTML. People tend to write HTML that works well enough to render decently in browsers, whose parsers therefore have to be tolerant of errors.

Beautiful Soup also tries to make sense of crappy, malformed HTML. Partly as a result, it's dog slow compared to any serious XML parser. But it works very well if you don't mind the low speed.
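
To make the point about malformed HTML concrete, here is a minimal sketch using only the standard library. It is not Beautiful Soup itself: `html.parser.HTMLParser` stands in for a tolerant parser and `xml.etree.ElementTree` for a strict XML parser. The sample markup and the names `BAD_HTML`, `strict_parse_ok`, `LinkCollector`, and `tolerant_links` are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical malformed markup: unclosed <p>, <b> and <br>, unquoted attribute.
BAD_HTML = (
    "<html><body><p>First<p>Second <b>bold<br>"
    "<a href=x.html>link</a></body></html>"
)


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class LinkCollector(HTMLParser):
    """Tolerant parser that records the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            # attrs is a list of (name, value) pairs; value may be None.
            self.links.extend(v for k, v in attrs if k == "href" and v)


def tolerant_links(markup):
    """Feed markup to the tolerant parser and return the links it found."""
    collector = LinkCollector()
    collector.feed(markup)
    collector.close()
    return collector.links


if __name__ == "__main__":
    print("strict XML parser accepts it:", strict_parse_ok(BAD_HTML))
    print("links found by tolerant parser:", tolerant_links(BAD_HTML))
```

The strict parser rejects the snippet outright, while the tolerant one still recovers the link. Beautiful Soup goes further than this sketch by building a navigable tree from the same kind of input, which is part of where the extra cost comes from.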