| From | Bent C Dalager <bcd@pvv.ntnu.no> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: Looking for Java web crawler api |
| Date | 2011-07-12 09:44 +0000 |
| Organization | Norwegian university of science and technology |
| Message-ID | <slrnj1o5s6.e9i.bcd@microbel.pvv.ntnu.no> |
| References | <4e1bf464$0$314$14726298@news.sunsite.dk> |
I found jsoup (jsoup.org) to be a fine library for web scraping. It
lets you easily set cookies and headers, fetches the URL for you, and
converts the tangled mess of HTML you tend to receive into a
well-formed, DOM-style document model.
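
To illustrate the points above, here is a minimal sketch of how that might
look with jsoup. It assumes the jsoup jar is on the classpath; the class
name JsoupSketch, the user-agent string, the header and cookie values, and
the example.com URLs are all made up for illustration and are not from the
original thread.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class JsoupSketch {

    // Fetch a page over HTTP with a custom header and cookie, then parse it.
    // All values below are placeholders (assumptions), not real credentials.
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("example-crawler/0.1")   // hypothetical user agent
                .header("Accept-Language", "en")    // example request header
                .cookie("session", "placeholder")   // example cookie
                .timeout(10_000)                    // milliseconds
                .get();
    }

    // Collect the absolute target of every link in the parsed document.
    static List<String> extractLinks(Document doc) {
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            // absUrl resolves relative hrefs against the document's base URI
            links.add(a.absUrl("href"));
        }
        return links;
    }

    public static void main(String[] args) {
        // Messy input: the <li> elements are never closed. jsoup still
        // builds a well-formed tree from it. No network access is needed.
        String messy = "<ul><li><a href='/a'>A</a><li><a href='/b'>B</a></ul>";
        Document doc = Jsoup.parse(messy, "http://example.com/");

        for (String link : extractLinks(doc)) {
            System.out.println(link);
        }
    }
}
```

For a crawler you would call fetch(url) instead of Jsoup.parse(...), feed
the extracted links back into a queue of pages to visit, and keep a set of
already-visited URLs so you do not loop.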
Cheers,
Bent D.
--
Bent Dalager - bcd@pvv.org - http://www.pvv.org/~bcd
powered by emacs