


Re: Looking for Java web crawler api

Path csiph.com!x330-a1.tempe.blueboxinc.net!usenet.pasdenom.info!aioe.org!feeder.news-service.com!xlned.com!feeder7.xlned.com!newsfeed.xs4all.nl!newsfeed5.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!news.tele.dk!news.tele.dk!small.news.tele.dk!feed118.news.tele.dk!dotsrc.org!filter.dotsrc.org!news.dotsrc.org!not-for-mail
From Durango2011 <el_durango@yah00.c0m>
Subject Re: Looking for Java web crawler api
Newsgroups comp.lang.java.programmer
References <4e1bf464$0$314$14726298@news.sunsite.dk> <slrnj1o5s6.e9i.bcd@microbel.pvv.ntnu.no>
User-Agent Pan/0.135 (Tomorrow I'll Wake Up and Scald Myself with Tea; GIT 30dc37b master)
MIME-Version 1.0
Content-Type text/plain; charset=UTF-8
Content-Transfer-Encoding 8bit
Date 13 Jul 2011 06:12:20 GMT
Lines 11
Message-ID <4e1d3744$0$312$14726298@news.sunsite.dk> (permalink)
Organization SunSITE.dk - Supporting Open source
NNTP-Posting-Host 66.229.12.209
X-Trace news.sunsite.dk DXC=TX]Wo<9la@1RV[Kf06ReB9YSB=nbEKnk;0F39eVnbh>2;L11_OIPWI9F9e3NA7Te_65[f9[bmol[?PP@?gfKK362]^<>^jnVP>7
X-Complaints-To staff@sunsite.dk
Xref x330-a1.tempe.blueboxinc.net comp.lang.java.programmer:6150



On Tue, 12 Jul 2011 09:44:38 +0000, Bent C Dalager wrote:

> I found JSoup (jsoup.org) to be a fine library for web scraping. It lets
> you easily set cookies and headers, fetches the URL for you, and
> converts the tangled mess of HTML you tend to receive into a well-formed
> XML document model.
> 
> Cheers,
> 	Bent D.

Thank you very much; that looks like what I am looking for.
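For anyone finding this thread later, here is a minimal sketch of the workflow Bent describes: set headers and a cookie, let jsoup fetch the URL, then walk the cleaned-up tree (jsoup's own Document model) with a CSS selector. It assumes the org.jsoup:jsoup jar is on the classpath. The class name, User-Agent string, cookie name/value and sample HTML are hypothetical placeholders, not anything from the posts above.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

/** Minimal jsoup sketch; requires the org.jsoup:jsoup jar on the classpath. */
public class JsoupScrapeSketch {

    /** Fetches a page with a custom User-Agent, an extra header and a cookie. */
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("ExampleCrawler/0.1")          // hypothetical agent string
                .header("Accept-Language", "en")
                .cookie("session", "hypothetical-value")  // placeholder cookie
                .timeout(10_000)                          // milliseconds
                .get();                                   // HTTP GET, then parse the response
    }

    /** Collects the absolute URL of every <a href> element in the document. */
    static List<String> links(Document doc) {
        List<String> result = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            result.add(a.absUrl("href")); // resolves relative links against the base URI
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Document doc;
        if (args.length > 0) {
            doc = fetch(args[0]);
        } else {
            // Offline demo: jsoup repairs the unclosed tags while parsing.
            String messy = "<title>Demo</title><p>Unclosed paragraph <a href=/docs>Docs</a>"
                    + "<p>Next <a href='https://example.org/x'>X";
            doc = Jsoup.parse(messy, "https://example.com/");
        }
        System.out.println(doc.title());
        links(doc).forEach(System.out::println);
    }
}
```

Run it with a URL argument to fetch a live page, or with no argument to parse the built-in sample offline. For an actual crawler you would feed the collected links back into a queue of pages still to visit, remembering which ones you have already seen.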



Thread

Looking for Java web crawler api pm <el_durango@yah00.c0m> - 2011-07-12 07:14 +0000
  Re: Looking for Java web crawler api Bent C Dalager <bcd@pvv.ntnu.no> - 2011-07-12 09:44 +0000
    Re: Looking for Java web crawler api Durango2011 <el_durango@yah00.c0m> - 2011-07-13 06:12 +0000
  Re: Looking for Java web crawler api Roedy Green <see_website@mindprod.com.invalid> - 2011-07-14 12:52 -0700
  Re: Looking for Java web crawler api iadb <freeinternetarticles@gmail.com> - 2011-07-18 16:06 -0700
  Re: Looking for Java web crawler api Durango2011 <el_durango@yah00.c0m> - 2011-07-21 05:49 +0000
  Re: Looking for Java web crawler api Arne Vajhøj <arne@vajhoej.dk> - 2011-07-21 17:10 -0400
