

Newsgroup: comp.lang.java.softwaretools, article #69

Apache Nutch original HTML

Started by: Hollow Quincy <hollow.quincy@gmail.com>
First post: 2011-11-07 08:19 -0800
Last post: 2011-11-07 08:19 -0800
Articles: 1 (1 participant)




From: Hollow Quincy <hollow.quincy@gmail.com>
Date: 2011-11-07 08:19 -0800
Subject: Apache Nutch original HTML
Message-ID: <136102cb-c1ab-45dc-91e8-27af37603857@g1g2000vbd.googlegroups.com>

Hi,

I installed Apache Nutch (http://nutch.apache.org/) on my Linux machine
and followed every step of the tutorial:
http://wiki.apache.org/nutch/NutchTutorial
I crawled some pages by running the command:
bin/nutch crawl urls -dir crawl -depth 3 -topN 5
After a few minutes it reported that the crawl had finished.

I would like to extract the original HTML of the crawled pages from the
files that Apache Nutch produced.
How can I get the original content out of Apache Nutch?

Thank you for your help.
