From: Roedy Green
Newsgroups: comp.lang.java.programmer
Subject: JavaScript and Screenscraping
Date: Wed, 30 Mar 2011 06:51:29 -0700
Organization: Canadian Mind Products

I am working on a screenscraping project that is turning out to be
much more time-consuming than I thought it would be.

I am trying to gather a database of information about all the
motherboards sold by major manufacturers. The idea is to eventually
create a comparison shopper to help you narrow down models that fit
your needs.

Oddly, motherboard manufacturers don't generate their specification
pages from a database. The pages are all hand-compiled, with a theme
and a dozen variations on every field. This I can handle.

However, Asus decided to obfuscate their web pages with JavaScript.
There are no data on them.

I wondered if there exists a tool that is like a browser in that it
will read a page and execute the JavaScript, but unlike a browser, it
would not show the information on the screen. It would just dump the
generated HTML or raw text, and accept a script of pages to analyse.
-- 
Roedy Green Canadian Mind Products
http://mindprod.com
There are only two industries that refer to their customers as "users".
~ Edward Tufte