Re: Parsing XML pages

From	Peter Flynn <peter@silmaril.ie>
Newsgroups	comp.text.xml
Subject	Re: Parsing XML pages
Date	2016-08-01 13:30 +0100
Organization	Silmaril Consultants
Message-ID	<e08tmuF72r5U1@mid.individual.net>
References	<dab6c93a-17f7-4edb-bad5-213db52ee0fd@googlegroups.com>

On 24/07/16 03:59, paolopiace@gmail.com wrote:
> This url
> 
> http://finance.yahoo.com/quote/GE/history?period1=0&period2=1469170800&interval=div|split&filter=split&frequency=1d
> 
> outputs a page which at its bottom has this content:
> 
> https://1drv.ms/i/s!AhvJcZiY8TTdhWx_35S5R2hZ99BX
> 
> I save the source page html and search some strings in it.
> I search "3/1", "Stock Split", "May 16, 1994" and so on.
> 
> Well, nothing like this is in the source page!

Unsurprising, given that it's financial.

> Where the hell are those info? 

They are being inserted in real time from an external source, probably
via JavaScript.
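
One way to confirm this for a saved page is to parse the static source and
check whether the strings occur anywhere in its text outside the scripts.
The following is only a rough sketch using Python's standard html.parser
module; the names TextAndScripts and missing_from_source, and the tiny
sample page, are made up for illustration and are not taken from the Yahoo
page.

```python
from html.parser import HTMLParser


class TextAndScripts(HTMLParser):
    """Collect text outside <script> elements and count the scripts."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.script_count = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.script_count += 1
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script bodies also arrive here, so skip them explicitly.
        if not self._in_script:
            self.text_parts.append(data)


def missing_from_source(html, needles):
    """Return (needles absent from the static text, number of scripts)."""
    parser = TextAndScripts()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so line wrapping does not hide a match.
    text = " ".join(" ".join(parser.text_parts).split())
    return [n for n in needles if n not in text], parser.script_count


# Hypothetical stand-in for a saved page whose table is filled in by a script.
sample = ('<html><body><div id="app"></div>'
          '<script>render("app")</script></body></html>')
missing, scripts = missing_from_source(sample, ["3/1", "Stock Split"])
print(missing, scripts)
```

If every search string comes back as missing while the script count is
non-zero, that is consistent with the values being filled in after the page
loads, although it does not prove where they come from.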

> If I see them on the browser, they must be stored somewhere. 

Or they might be being calculated on-the-fly, from *data* stored elsewhere.

> If not in the html source page, where are they?

They have been deliberately obfuscated so that you can't steal them.

> May I have some directions, please?

Use a browser which has an Inspection mode. Right-click one of the
values and look at the pseudo-HTML:

<td class="Ta(c) Py(10px)" colspan="5"
data-reactid=".1kvth1ckyua.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2.$history-table.1.$0.1"><strong
data-reactid=".1kvth1ckyua.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2.$history-table.1.$0.1.0">3/1</strong><span
data-reactid=".1kvth1ckyua.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2.$history-table.1.$0.1.1">
</span><span
data-reactid=".1kvth1ckyua.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2.$history-table.1.$0.1.2">Stock
Split</span></td>

etc. Now go find the code which recognises this, unscramble it, and find
out what machine it's coming from. Then break into the machine to get at
the source data (just kidding, NSA :-)
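
Less drastically: once the rendered markup has been copied out of the
inspector, an ordinary HTML parser will pull the values out of it. A minimal
sketch, assuming the fragment has been saved as a string; the data-reactid
attributes are dropped here for brevity, and CellText and cell_texts are
illustrative names only.

```python
from html.parser import HTMLParser


class CellText(HTMLParser):
    """Collect the whitespace-normalised text of each <td> element."""

    def __init__(self):
        super().__init__()
        self.cells = []
        self._buf = None  # None while outside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "td":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._buf is not None:
            # Join the pieces, then collapse newlines and repeated spaces.
            self.cells.append(" ".join("".join(self._buf).split()))
            self._buf = None


def cell_texts(fragment):
    """Return the text of every table cell found in an HTML fragment."""
    parser = CellText()
    parser.feed(fragment)
    parser.close()
    return parser.cells


# Simplified copy of the cell shown above (data-reactid attributes omitted).
fragment = ('<td class="Ta(c) Py(10px)" colspan="5"><strong>3/1</strong>'
            '<span>\n</span><span>Stock\nSplit</span></td>')
print(cell_texts(fragment))  # ['3/1 Stock Split']
```

This only works on markup taken from the live page in the browser; run
against the saved source it would find nothing, for the reasons given above.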

Good luck...

///Peter
