| From | Paul Rubin <no.email@nospam.invalid> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Find relative url in mixed text/html |
| Date | 2015-11-27 21:11 -0800 |
| Organization | A noiseless patient Spider |
| Message-ID | <8737vqyag1.fsf@jester.gateway.pace.com> |
| References | <mailman.182.1448678122.20593.python-list@python.org> |
Rob Hills <rhills@medimorphosis.com.au> writes:

> Note, in the beginning of this project, I looked at using "Beautiful
> Soup" but my reading and limited testing led me to believe that it is
> designed for well-formed HTML/XML and therefore was unsuitable for the
> text/html soup I have. If that belief is incorrect, I'd be grateful for
> general tips about using Beautiful Soup in this scenario...

Beautiful Soup can deal with badly formed HTML pretty well, or at least it could in earlier versions. It now lets you choose among several parsers: "html.parser" from the standard library, "lxml", and "html5lib". I think it uses lxml by default when lxml is installed; that one is fast but maybe stricter. html5lib is the loose, slow one, so try it if the others choke on your markup. It really is pretty slow, so plan on a big computation task if you're converting a large forum.

phpBB gets a bad rap that's maybe well-deserved, but I don't know what to suggest instead.
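To illustrate the point that a lenient parser copes with unclosed and misnested tags, here is a minimal stdlib-only sketch. It does not use Beautiful Soup itself; it subclasses `html.parser.HTMLParser`, the same standard-library parser Beautiful Soup uses when you pass "html.parser". The names `RelativeLinkFinder` and `find_relative_urls` are hypothetical, and treating "no scheme and no host" as the definition of a relative URL is an assumption you may want to adjust (it also counts bare `#fragment` links).

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class RelativeLinkFinder(HTMLParser):
    """Hypothetical helper: collect href/src values that look relative.

    HTMLParser does not require well-formed markup; unclosed tags,
    stray text and misnested elements are simply passed through.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.relative = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            # value is None for bare attributes such as <a href>
            if name in ("href", "src") and value:
                parts = urlsplit(value)
                # Assumption: no scheme and no network location means relative,
                # e.g. "viewtopic.php?t=12" or "/images/smile.gif".
                if not parts.scheme and not parts.netloc:
                    self.relative.append(value)

    # Self-closing tags (<img ... />) go through handle_startendtag,
    # whose default implementation calls handle_starttag, so no override is needed.


def find_relative_urls(text):
    """Return relative href/src values found in mixed text/HTML."""
    parser = RelativeLinkFinder()
    parser.feed(text)
    parser.close()
    return parser.relative


# Mixed text/HTML with an unclosed <p> and misnested <b>/<i> tags.
sample = (
    'Some text <p>unclosed <a href="viewtopic.php?t=12">link</a> '
    '<img src="/images/smile.gif"> and <a href="http://example.com/">abs</a> '
    '<b>bold <i>mis</b>nested</i>'
)
print(find_relative_urls(sample))
```

For a large forum dump you would feed each post's text through this in a loop; because it is event-driven rather than building a tree, it stays fairly cheap compared with html5lib.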
Find relative url in mixed text/html Rob Hills <rhills@medimorphosis.com.au> - 2015-11-28 10:35 +0800
Re: Find relative url in mixed text/html Paul Rubin <no.email@nospam.invalid> - 2015-11-27 21:11 -0800
Re: Find relative url in mixed text/html Rob Hills <rhills@medimorphosis.com.au> - 2015-11-29 00:25 +0800
Re: Find relative url in mixed text/html Laura Creighton <lac@openend.se> - 2015-11-28 18:04 +0100
Re: Find relative url in mixed text/html Rob Hills <rhills@medimorphosis.com.au> - 2015-11-29 01:40 +0800
Re: Find relative url in mixed text/html Paul Rubin <no.email@nospam.invalid> - 2015-11-28 10:10 -0800
Re: Find relative url in mixed text/html Grobu <snailcoder@retrosite.invalid> - 2015-11-28 08:07 +0100
Re: Find relative url in mixed text/html Rob Hills <rhills@medimorphosis.com.au> - 2015-11-29 01:44 +0800