RE: vBulletin scraper -- feasible?

From Nick Cash <nick.cash@npcinternational.com>
Subject RE: vBulletin scraper -- feasible?
Date 2012-06-25 19:44 +0000
References <jsad2l$2eu$1@dont-email.me>
Newsgroups comp.lang.python
Message-ID <mailman.1495.1340654376.4697.python-list@python.org>

You may want to look into http://www.crummy.com/software/BeautifulSoup/
It's made for parsing (potentially bad) HTML, and is quite easy to use. I'd say it's quite feasible.
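
To give a rough idea of what the extraction step involves, here is a minimal sketch. It uses only the standard library's html.parser as a dependency-free stand-in; BeautifulSoup would likely express the same thing more compactly and cope better with broken markup. The sample markup, the class names ("post", "author", "body") and the function name extract_posts are all hypothetical; real vBulletin templates differ between versions and skins, so the selectors would need to be adapted.

```python
from html.parser import HTMLParser

# Hypothetical markup, NOT taken from a real vBulletin page.
# The last <div class="post"> is deliberately left unclosed.
SAMPLE_HTML = """
<div class="post"><span class="author">alice</span>
<div class="body">First post<br>second line</div></div>
<div class="post"><span class="author">bob</span>
<div class="body">A reply</div>
"""


class PostParser(HTMLParser):
    """Collect author and body text for each element with class "post"."""

    def __init__(self):
        super().__init__()
        self.posts = []      # one dict per post, in page order
        self._field = None   # field currently receiving text, if any

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class == "post":
            # A new post container starts a new record.
            self.posts.append({"author": "", "body": ""})
            self._field = None
        elif css_class in ("author", "body") and self.posts:
            self._field = css_class
        elif tag == "br" and self._field:
            # Keep line breaks inside the field being collected.
            self.posts[-1][self._field] += "\n"

    def handle_endtag(self, tag):
        # Any closing span/div ends the field being collected.
        if tag in ("span", "div"):
            self._field = None

    def handle_data(self, data):
        if self._field:
            self.posts[-1][self._field] += data


def extract_posts(html):
    """Return a list of {"author": ..., "body": ...} dicts."""
    parser = PostParser()
    parser.feed(html)
    parser.close()
    return parser.posts


if __name__ == "__main__":
    for post in extract_posts(SAMPLE_HTML):
        print(post["author"], "wrote:", repr(post["body"]))
```

The parser keys on class attributes rather than on tag nesting, so the missing closing tag in the sample does not matter. That is the main reason forum HTML tends to be workable in practice.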

Thanks,
Nick Cash
NPC International

-----Original Message-----
From: python-list-bounces+nick.cash=npcinternational.com@python.org [mailto:python-list-bounces+nick.cash=npcinternational.com@python.org] On Behalf Of Andrew D'Angelo
Sent: Monday, June 25, 2012 14:10
To: python-list@python.org
Subject: vBulletin scraper -- feasible?

Taking a look through vBulletin's HTML, I was wondering whether it would be overly difficult to parse it into nice, manipulable data.
My ultimate goal would be to dynamically parse a vBulletin forum and feed it into a locally hosted NNTP server.
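
For the second half of that goal, one possible shape is to turn each scraped post into a news article before handing it to the server. The sketch below only builds the article with the standard library's email package; it does not talk to an NNTP server. The post dictionary layout, the post_to_article name, the "forum.invalid" domain and the Message-ID scheme are all assumptions made for illustration.

```python
from datetime import datetime, timezone
from email.message import EmailMessage
from email.utils import format_datetime


def post_to_article(post, newsgroup, domain="forum.invalid"):
    """Build a news-style article from one scraped post (hypothetical layout).

    `post` is assumed to have the keys: id, author, subject, date (an aware
    datetime), body, and optionally parent_id for replies.
    """
    msg = EmailMessage()
    msg["From"] = f'{post["author"]} <{post["author"]}@{domain}>'
    msg["Newsgroups"] = newsgroup
    msg["Subject"] = post["subject"]
    msg["Date"] = format_datetime(post["date"])
    # Deriving the Message-ID from the forum's post id keeps it stable
    # across repeated scrapes, so the same post is not injected twice.
    msg["Message-ID"] = f'<post-{post["id"]}@{domain}>'
    if post.get("parent_id") is not None:
        # References is what lets newsreaders thread the replies.
        msg["References"] = f'<post-{post["parent_id"]}@{domain}>'
    msg.set_content(post["body"])
    return msg


if __name__ == "__main__":
    reply = {
        "id": 2,
        "parent_id": 1,
        "author": "bob",
        "subject": "Re: Hello",
        "date": datetime(2012, 6, 25, 19, 44, tzinfo=timezone.utc),
        "body": "A reply",
    }
    print(post_to_article(reply, "local.vbulletin"))
```

Whether the resulting article is accepted as-is depends on the NNTP server; some require additional headers such as Path.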


--
http://mail.python.org/mailman/listinfo/python-list

Thread

vBulletin scraper -- feasible? "Andrew D'Angelo" <excel@pharcyde.org> - 2012-06-25 14:10 -0500
  RE: vBulletin scraper -- feasible? Nick Cash <nick.cash@npcinternational.com> - 2012-06-25 19:44 +0000
    Re: vBulletin scraper -- feasible? "Andrew D'Angelo" <excel@pharcyde.org> - 2012-06-26 10:20 -0500
