Re: Find relative url in mixed text/html

From Paul Rubin <no.email@nospam.invalid>
Newsgroups comp.lang.python
Subject Re: Find relative url in mixed text/html
Date 2015-11-27 21:11 -0800
Organization A noiseless patient Spider
Message-ID <8737vqyag1.fsf@jester.gateway.pace.com>
References <mailman.182.1448678122.20593.python-list@python.org>


Rob Hills <rhills@medimorphosis.com.au> writes:
> Note, in the beginning of this project, I looked at using "Beautiful
> Soup" but my reading and limited testing led me to believe that it is
> designed for well-formed HTML/XML and therefore was unsuitable for the
> text/html soup I have.  If that belief is incorrect, I'd be grateful for
> general tips about using Beautiful Soup in this scenario...

Beautiful Soup can deal with badly formed HTML pretty well, or at least
it could in earlier versions.  It gives you several different parsing
options to choose from now: html.parser from the standard library, lxml,
and html5lib.  I think the default is lxml if it's installed, which is
fast but maybe more strict.  html5lib is the loose slow one.  It really
is pretty slow so plan on a big computation task if you're converting a
large forum.
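
As a rough illustration of lenient parsing on text/html soup, here is a
minimal sketch that uses only the standard library's html.parser module
(the same parser Beautiful Soup wraps when you ask for "html.parser"),
not Beautiful Soup itself.  The names RelativeLinkFinder, is_relative and
find_relative_urls are made up for this example, the sample text is
invented, and treating "no scheme and no host" as "relative" is an
assumption about what the original poster needs.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


def is_relative(url):
    """Assumption: a URL is 'relative' if it has neither scheme nor host."""
    parts = urlsplit(url)
    return not parts.scheme and not parts.netloc


class RelativeLinkFinder(HTMLParser):
    """Hypothetical helper: collect relative href/src attribute values."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.relative_urls = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        for name, value in attrs:
            if name in ("href", "src") and value and is_relative(value):
                self.relative_urls.append(value)


def find_relative_urls(text):
    finder = RelativeLinkFinder()
    finder.feed(text)
    finder.close()
    return finder.relative_urls


# Invented sample: plain text, unclosed tags, an unquoted attribute.
sample = (
    "Plain text first, then <a href='/viewtopic.php?t=42'>a link\n"
    "<b>unclosed bold <img src=images/smile.gif>\n"
    "and <a href=\"http://example.com/\">an absolute one</a>"
)
print(find_relative_urls(sample))
```

If Beautiful Soup is installed, passing the parser name explicitly, as in
BeautifulSoup(text, "html.parser"), avoids depending on whichever parser
happens to be available; whether its speed is acceptable for a whole
forum is something to measure on a sample of the real data.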

phpBB gets a bad rap that's maybe well-deserved but I don't know what to
suggest instead.
