comp.lang.java.softwaretools, thread #69
| Started by | Hollow Quincy <hollow.quincy@gmail.com> |
|---|---|
| First post | 2011-11-07 08:19 -0800 |
| Last post | 2011-11-07 08:19 -0800 |
| Articles | 1 — 1 participant |
Apache Nutch original HTML, Hollow Quincy <hollow.quincy@gmail.com>, 2011-11-07 08:19 -0800
| From | Hollow Quincy <hollow.quincy@gmail.com> |
|---|---|
| Date | 2011-11-07 08:19 -0800 |
| Subject | Apache Nutch original HTML |
| Message-ID | <136102cb-c1ab-45dc-91e8-27af37603857@g1g2000vbd.googlegroups.com> |
Hi,

I installed Apache Nutch (http://nutch.apache.org/) on Linux and followed every step of the tutorial at http://wiki.apache.org/nutch/NutchTutorial. I crawled some pages with this command:

    bin/nutch crawl urls -dir crawl -depth 3 -topN 5

After a few minutes it reported that the crawl had finished. Now I would like to get the original crawled HTML files out of Nutch's data files. How can I retrieve the original content from Apache Nutch?

Thank you for your help.
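One possible way to get the raw pages back out, assuming a Nutch 1.x install like the one in the tutorial: each fetch round of `bin/nutch crawl ... -dir crawl` writes a timestamped segment under `crawl/segments/`, and Nutch's segment reader (`bin/nutch readseg -dump <segment> <output>`) can write a segment's stored data to a plain-text dump. The Java sketch below only assembles and runs that command. The class and method names are hypothetical, and the segment path you pass must be one of the directories your own crawl actually created.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: wraps "bin/nutch readseg -dump" (Nutch 1.x) so that
// only the fetched raw content of one segment is written to outputDir.
public class DumpSegmentContent {

    static List<String> buildDumpCommand(String nutchHome, String segmentDir, String outputDir) {
        List<String> cmd = new ArrayList<>();
        cmd.add(nutchHome + "/bin/nutch");
        cmd.add("readseg");
        cmd.add("-dump");
        cmd.add(segmentDir);   // e.g. crawl/segments/<timestamp> (placeholder)
        cmd.add(outputDir);
        // Suppress every segment part except the stored content.
        cmd.add("-nofetch");
        cmd.add("-nogenerate");
        cmd.add("-noparse");
        cmd.add("-noparsedata");
        cmd.add("-noparsetext");
        return cmd;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 3) {
            System.err.println("usage: DumpSegmentContent <nutchHome> <segmentDir> <outputDir>");
            System.exit(1);
        }
        List<String> cmd = buildDumpCommand(args[0], args[1], args[2]);
        // inheritIO() forwards Nutch's console output to this process.
        Process p = new ProcessBuilder(cmd).inheritIO().start();
        int exit = p.waitFor();
        System.out.println("readseg exited with code " + exit);
    }
}
```

For example, `java DumpSegmentContent /path/to/nutch crawl/segments/<segment> dump` should leave a text dump under `dump/` containing the fetched pages. Shelling out keeps the sketch independent of Nutch and Hadoop jars; reading the segment's `content` data directly through the Hadoop API would avoid the text dump but ties the code to a specific Nutch/Hadoop version.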