From: Dr J R Stockton
Newsgroups: comp.lang.java.programmer
Subject: Re: JavaScript and Screenscraping
Date: Sun, 3 Apr 2011 17:27:03 +0100

In comp.lang.java.programmer, Fri, 1 Apr 2011 20:00:27, Roedy Green posted:
>On Fri, 1 Apr 2011 23:39:32 +0100, Dr J R Stockton
> wrote, quoted or indirectly quoted
>someone who said :
>
>>But JavaScript used as you describe does not necessarily generate HTML,
>>but can manipulate the DOM tree directly.
>>
>>Or are you thinking of server-side scripting with .php?
>
>I am just trying to go to motherboard manufacturer websites and
>collect specs from the webpages. The webpages often contain a lot of
>Javascript. The data does not appear in any form. Presumably the Java
>script loads more Java script or resources then formats it.

Probably but not entirely presumably; if using an iframe, there could be
no need for reformatting.

Given a URL or two as examples, and a clear indication of what is to be
scraped, one might be able to understand the situation better.

-- 
 (c) John Stockton, nr London, UK. ?@merlyn.demon.co.uk Turnpike v6.05.
 Website - w. FAQish topics, links, acronyms
 PAS EXE etc. : - see in 00index.htm
 Dates - miscdate.htm estrdate.htm js-dates.htm pas-time.htm critdate.htm etc.
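
To illustrate the iframe case mentioned in the reply: if the specs sit in a separate document that the page pulls in through an iframe, a scraper could locate the iframe's src and fetch that document directly, with no script execution needed. The following is only a minimal sketch under that assumption. The class name IframeFinder, the method findIframeSources, and the example.com addresses are hypothetical, and the regular expression is a crude stand-in for a proper HTML parser (it would, for instance, also pick up a data-src attribute).

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: lists the documents a page embeds via <iframe src=...>.
final class IframeFinder {

    // Crude match for <iframe ... src="..."> or src='...'; not a full HTML parser.
    private static final Pattern IFRAME_SRC = Pattern.compile(
            "<iframe\\b[^>]*?\\bsrc\\s*=\\s*([\"'])(.*?)\\1",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns each iframe src found in html, resolved against the page's own URL.
    static List<String> findIframeSources(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> result = new ArrayList<>();
        Matcher m = IFRAME_SRC.matcher(html);
        while (m.find()) {
            // group(2) is the attribute value between the matching quotes
            result.add(base.resolve(m.group(2).trim()).toString());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up page content, standing in for a downloaded manufacturer page.
        String html = "<html><body><h1>Board X</h1>"
                + "<iframe width=\"600\" src=\"../specs/table.html\"></iframe>"
                + "</body></html>";
        for (String url : findIframeSources(html, "http://example.com/boards/x.html")) {
            System.out.println(url);
        }
    }
}
```

Each URL returned could then be fetched and scraped as ordinary static HTML. If the list comes back empty, the data is presumably being built by script instead, and this approach does not help.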