


Re: Parsing XML pages

From Peter Flynn <peter@silmaril.ie>
Newsgroups comp.text.xml
Subject Re: Parsing XML pages
Date 2016-08-01 13:30 +0100
Organization Silmaril Consultants
Message-ID <e08tmuF72r5U1@mid.individual.net>
References <dab6c93a-17f7-4edb-bad5-213db52ee0fd@googlegroups.com>



On 24/07/16 03:59, paolopiace@gmail.com wrote:
> This url
> 
> http://finance.yahoo.com/quote/GE/history?period1=0&period2=1469170800&interval=div|split&filter=split&frequency=1d
> 
> outputs a page which at its bottom has this content:
> 
> https://1drv.ms/i/s!AhvJcZiY8TTdhWx_35S5R2hZ99BX
> 
> I save the source page html and search some strings in it.
> I search "3/1", "Stock Split", "May 16, 1994" and so on.
> 
> Well, nothing like this is in the source page!

Unsurprising, given that it's financial.

> Where the hell are those info? 

They are being inserted in real time from an external source, probably
via Javascript.
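
A rough way to confirm this from the saved page is to check whether the strings occur in the raw source at all and, if they do not, list the external scripts the page pulls in, since those are the first places to look for whatever fetches the data. A minimal Python sketch using only the standard library; the sample page, the example.com URL and the function names are hypothetical, and it only shows where to start looking, not where Yahoo actually gets its data.

```python
from html.parser import HTMLParser


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag in a page."""

    def __init__(self):
        super().__init__()
        self.script_srcs = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.script_srcs.append(src)


def check_static_source(html, wanted):
    """Return (strings missing from the raw source, external script URLs)."""
    missing = [s for s in wanted if s not in html]
    collector = ScriptCollector()
    collector.feed(html)
    collector.close()
    return missing, collector.script_srcs


# Hypothetical saved page: the table is empty until a script fills it in.
saved_page = """<html><head>
<script src="https://example.com/app.js"></script>
</head><body><table id="history-table"></table></body></html>"""

missing, scripts = check_static_source(saved_page, ["3/1", "Stock Split", "May 16, 1994"])
print("Not in the saved source:", missing)
print("Scripts that may load the data:", scripts)
```

If all the strings come back as missing, the values are being added after the page loads, and the script list is where to start reading.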

> If I see them on the browser, they must be stored somewhere. 

Or they might be calculated on the fly from *data* stored elsewhere.

> If not in the html source page, where are they?

They have been deliberately obfuscated so that you can't steal them.

> May I have some directions, please?

Use a browser that has an inspection mode (the developer tools in Firefox
or Chrome, for example). Right-click one of the values, choose Inspect,
and look at the pseudo-HTML:

<td class="Ta(c) Py(10px)" colspan="5"
    data-reactid=".1kvth1ckyua.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2.$history-table.1.$0.1">
  <strong data-reactid=".1kvth1ckyua.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2.$history-table.1.$0.1.0">3/1</strong>
  <span data-reactid=".1kvth1ckyua.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2.$history-table.1.$0.1.1"> </span>
  <span data-reactid=".1kvth1ckyua.1.$0.0.0.3.1.$main-0-Quote-Proxy.$main-0-Quote.0.2.0.2.$history-table.1.$0.1.2">Stock Split</span>
</td>

etc. Now go find the code which recognises this, unscramble it, and find
out what machine it's coming from. Then break into the machine to get at
the source data (just kidding, NSA :-)

Good luck...

///Peter



