From: Tom Anderson
Newsgroups: comp.lang.java.programmer
Subject: Re: JavaScript and Screenscraping
Date: Thu, 31 Mar 2011 00:28:57 +0100

On Wed, 30 Mar 2011, Michal Kleczek wrote:

> Roedy Green wrote:
>
>> I am working on a screenscraping project that is turning out to be much
>> more time-consuming than I thought it would be. I am trying to gather
>> a database of information about all the motherboards sold by major
>> manufacturers. The idea is to eventually create a comparison shopper
>> to help you narrow down models that fit your needs.
>>
>> Oddly motherboard manufacturers don't use a database and generate
>> their specification pages. These are all hand-compiled with theme and
>> a dozen variations on every field. This I can handle.
>>
>> However, Asus decided to obfuscate their web pages with JavaScript.
>> There are no data on them.
>>
>> I wondered if there exists a tool that is like a browser in that it will
>> read a page and render the JavaScript, but unlike a browser, it would
>> not show the information on the screen, just dump the generated HTML
>> or raw text and accept a script of pages to analyse.
>
> http://htmlunit.sourceforge.net/

Finally, someone else who knows about it!

tom

-- 
For the first few years I ate lunch with the mathematicians.
I soon found that they were more interested in fun and games than in serious work, so I shifted to eating with the physics table. There I stayed for a number of years until the Nobel Prize, promotions, and offers from other companies, removed most of the interesting people. So I shifted to the corresponding chemistry table where I had a friend. At first I asked what were the important problems in chemistry, then what important problems they were working on, or problems that might lead to important results. One day I asked, "if what they were working on was not important, and was not likely to lead to important things, then why were they working on them?" After that I had to eat with the engineers! -- R. W. Hamming