Re: Looking for Java web crawler api

From Bent C Dalager <bcd@pvv.ntnu.no>
Newsgroups comp.lang.java.programmer
Subject Re: Looking for Java web crawler api
Date 2011-07-12 09:44 +0000
Organization Norwegian university of science and technology
Message-ID <slrnj1o5s6.e9i.bcd@microbel.pvv.ntnu.no>
References <4e1bf464$0$314$14726298@news.sunsite.dk>


I found JSoup (jsoup.org) to be a fine library for web scraping. It
lets you easily set cookies and headers, fetches the URL for you, and
converts the tangled mess of HTML you tend to receive into a
well-formed XML document model.
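
To give a rough idea of what that looks like in practice, here is a
minimal sketch, assuming jsoup is on the classpath. The class name, the
user-agent string, the header and the cookie values are all made up for
illustration. Note that jsoup hands back its own Document/Element tree
rather than a org.w3c.dom one.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

public class JsoupSketch {

    // Fetch a page, sending an illustrative header and cookie with the request.
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(url)
                .userAgent("example-crawler/0.1")   // hypothetical agent string
                .header("Accept-Language", "en")    // hypothetical header
                .cookie("session", "placeholder")   // hypothetical cookie
                .timeout(10_000)                    // milliseconds
                .get();
    }

    // Collect the href attribute of every link in an already-parsed document.
    static List<String> links(Document doc) {
        List<String> out = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            out.add(a.attr("href"));
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // With a URL argument, fetch it; otherwise parse a sloppy inline
        // snippet (unquoted attribute, unclosed <p> tags) without any network.
        Document doc = args.length > 0
                ? fetch(args[0])
                : Jsoup.parse("<p>Unclosed paragraph <a href=/next>next</a><p>Another");
        System.out.println(links(doc));
    }
}
```

For an actual crawler you would feed the collected links back into
fetch(), probably using a.absUrl("href") instead of a.attr("href") so
that relative links get resolved against the page URL.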

Cheers,
	Bent D.
-- 
Bent Dalager - bcd@pvv.org - http://www.pvv.org/~bcd
                                    powered by emacs


Thread

Looking for Java web crawler api pm <el_durango@yah00.c0m> - 2011-07-12 07:14 +0000
  Re: Looking for Java web crawler api Bent C Dalager <bcd@pvv.ntnu.no> - 2011-07-12 09:44 +0000
    Re: Looking for Java web crawler api Durango2011 <el_durango@yah00.c0m> - 2011-07-13 06:12 +0000
  Re: Looking for Java web crawler api Roedy Green <see_website@mindprod.com.invalid> - 2011-07-14 12:52 -0700
  Re: Looking for Java web crawler api iadb <freeinternetarticles@gmail.com> - 2011-07-18 16:06 -0700
  Re: Looking for Java web crawler api Durango2011 <el_durango@yah00.c0m> - 2011-07-21 05:49 +0000
  Re: Looking for Java web crawler api Arne Vajhøj <arne@vajhoej.dk> - 2011-07-21 17:10 -0400