From: Peter Flynn
Newsgroups: comp.text.xml
Subject: Re: Parsing XML pages
Date: Mon, 1 Aug 2016 13:30:22 +0100
Organization: Silmaril Consultants

On 24/07/16 03:59, paolopiace@gmail.com wrote:
> This url
>
> http://finance.yahoo.com/quote/GE/history?period1=0&period2=1469170800&interval=div|split&filter=split&frequency=1d
>
> outputs a page which at its bottom has this content:
>
> https://1drv.ms/i/s!AhvJcZiY8TTdhWx_35S5R2hZ99BX
>
> I save the source page html and search some strings in it.
> I search "3/1", "Stock Split", "May 16, 1994" and so on.
>
> Well, nothing like this is in the source page!

Unsurprising, given that it's financial.

> Where the hell are those info?

They are being inserted in real time from an external source, probably
via Javascript.

> If I see them on the browser, they must be stored somewhere.

Or they might be being calculated on-the-fly, from *data* stored elsewhere.

> If not in the html source page, where are they?

They have been deliberately obfuscated so that you can't steal them.

> May I have some directions, please?

Use a browser which has an Inspection mode. Right-click one of the
values and look at the pseudo-HTML:

3/1 Stock Split

etc. Now go find the code which recognises this, unscramble it, and find
out what machine it's coming from. Then break into the machine to get at
the source data (just kidding, NSA :-)

Good luck...

///Peter
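
For anyone wanting to check the diagnosis above against a saved page, here is a rough sketch in Python (standard library only). It reports whether each search string occurs in the ordinary markup, only inside a <script> element, or nowhere in the saved source at all, which would suggest it is fetched or computed after the page loads. The names SourceSplitter and locate, and the sample page, are made up for illustration; this is not how Yahoo's page actually works, and heavily obfuscated data would not be found by a plain substring search anyway.

```python
from html.parser import HTMLParser


class SourceSplitter(HTMLParser):
    """Collect text from ordinary markup and from <script> elements separately."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.markup_text = []
        self.script_text = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Route each text chunk to the bucket for where it was found.
        target = self.script_text if self.in_script else self.markup_text
        target.append(data)


def locate(html_source, needles):
    """Map each needle to 'markup', 'script', or 'absent' (hypothetical helper)."""
    splitter = SourceSplitter()
    splitter.feed(html_source)
    splitter.close()
    markup = "".join(splitter.markup_text)
    script = "".join(splitter.script_text)
    result = {}
    for needle in needles:
        if needle in markup:
            result[needle] = "markup"
        elif needle in script:
            result[needle] = "script"
        else:
            result[needle] = "absent"
    return result


# Invented stand-in for a saved source page; replace with the real file's text.
sample = """<html><body><h1>GE history</h1>
<div id="app"></div>
<script>var data = {"label": "Stock Split"};</script>
</body></html>"""

report = locate(sample, ["GE history", "Stock Split", "May 16, 1994"])
for needle, where in report.items():
    print(needle, "->", where)
```

A string reported as "absent" is the case described in the reply: the browser shows it, but it never was in the saved HTML, so the place to look is the network traffic and the scripts, not the page source.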