| References | <CAAXuHoeJ-YMQDB85qLDJ_o+9CrsfwLvm9wuOaRtbSj-i9kBaFA@mail.gmail.com> |
|---|---|
| Date | 2015-05-23 19:01 +1000 |
| Subject | Re: Extract email address from Java script in html source using python |
| From | Chris Angelico <rosuav@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.267.1432371718.17265.python-list@python.org> |
On Sat, May 23, 2015 at 4:46 PM, savitha devi <savithad8@gmail.com> wrote:
> I am developing a web scraper using HTMLParser. I need to extract a
> text/email address from JavaScript within the HTML code. I am a beginner
> in Python and totally lost here. I need some help with this. The
> JavaScript code is as below:
>
> <script type='text/javascript'>
> //<!--
> document.getElementById('cloak48218').innerHTML = '';
> var prefix = 'ma' + 'il' + 'to';
> var path = 'hr' + 'ef' + '=';
> var addy48218 = 'info' + '@';
> addy48218 = addy48218 + 'tsv-neuried' + '.' +
> 'de';
> document.getElementById('cloak48218').innerHTML += '<a ' + path + '\'' +
> prefix + ':' + addy48218 + '\'>' + addy48218+'<\/a>';
> //-->

This is deliberately being done to prevent scripted usage. What
exactly are you needing to do this for?

You're basically going to have to execute the entire block of
JavaScript code, and then decode the entities to get to what you want.
Doing it manually is pretty easy; doing it automatically will
virtually require a language interpreter.
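
For this one cloaking pattern, a small amount of Python can stand in for a real interpreter. The following is only a rough sketch, not a general solution: it assumes the address is always built by assigning concatenated single-quoted string literals to a variable named addy followed by digits, as in the quoted script. The names extract_addresses, SCRIPT, ASSIGN, TERM and TOKEN are made up for illustration, and anything beyond plain concatenation (function calls, other operators, double-quoted strings) would still need a proper JavaScript engine.

```python
import html
import re

# The quoted script body, with the "> " quote markers and comment lines removed.
SCRIPT = r"""
document.getElementById('cloak48218').innerHTML = '';
var prefix = 'ma' + 'il' + 'to';
var path = 'hr' + 'ef' + '=';
var addy48218 = 'info' + '@';
addy48218 = addy48218 + 'tsv-neuried' + '.' +
'de';
document.getElementById('cloak48218').innerHTML += '<a ' + path + '\'' +
prefix + ':' + addy48218 + '\'>' + addy48218+'<\/a>';
"""

# One term of a concatenation: a single-quoted literal or a bare identifier.
TERM = r"(?:'(?:[^'\\]|\\.)*'|[A-Za-z_]\w*)"
# An assignment such as:  addy48218 = term + term + ... ;
ASSIGN = re.compile(
    r"\b(addy\d+)\s*=\s*(" + TERM + r"(?:\s*\+\s*" + TERM + r")*)\s*;"
)
# Used to walk the terms of one expression in order.
TOKEN = re.compile(r"'((?:[^'\\]|\\.)*)'|([A-Za-z_]\w*)")


def extract_addresses(script):
    """Return {variable name: assembled string} for each addyNNN variable."""
    values = {}
    for name, expr in ASSIGN.findall(script):
        parts = []
        for literal, identifier in TOKEN.findall(expr):
            if identifier:
                # Reference to an earlier value; unknown names count as empty.
                parts.append(values.get(identifier, ""))
            else:
                # Simplistic unescaping: drop the backslash of \' or \/.
                parts.append(re.sub(r"\\(.)", r"\1", literal))
        # Decode HTML entities such as &#64; in case the page uses them.
        values[name] = html.unescape("".join(parts))
    return values


if __name__ == "__main__":
    print(extract_addresses(SCRIPT))
```

To use it from a scraper, collect the text of each script element (for example in HTMLParser's handle_data while inside a script tag) and pass that text to extract_addresses. If the site changes its obfuscation even slightly, this will quietly return nothing, which is the point of the cloaking.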
ChrisA