

Groups > comp.lang.python > #91131 > unrolled thread

Re: Extract email address from Java script in html source using python

Started by: savitha devi <savithad8@gmail.com>
First post: 2015-05-23 19:45 +0530
Last post: 2015-05-23 19:45 +0530
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#91131 — Re: Extract email address from Java script in html source using python

From: savitha devi <savithad8@gmail.com>
Date: 2015-05-23 19:45 +0530
Subject: Re: Extract email address from Java script in html source using python
Message-ID: <mailman.276.1432390523.17265.python-list@python.org>


To be precise: the JavaScript is embedded in the HTML code. I am looking for
a regular expression that finds the email address embedded within the
JavaScript.

On Sat, May 23, 2015 at 2:31 PM, Chris Angelico <rosuav@gmail.com> wrote:

> On Sat, May 23, 2015 at 4:46 PM, savitha devi <savithad8@gmail.com> wrote:
> > I am developing a web scraper using HTMLParser. I need to extract a
> > text/email address from JavaScript within the HTML code. I am a beginner
> > in Python coding and totally lost here. I need some help with this. The
> > JavaScript code is as below:
> >
> > <script type='text/javascript'>
> >  //<!--
> >  document.getElementById('cloak48218').innerHTML = '';
> >  var prefix = '&#109;a' + 'i&#108;' + '&#116;o';
> >  var path = 'hr' + 'ef' + '=';
> >  var addy48218 = '&#105;nf&#111;' + '&#64;';
> >  addy48218 = addy48218 + 'tsv-n&#101;&#117;r&#105;&#101;d' + '&#46;' + 'd&#101;';
> >  document.getElementById('cloak48218').innerHTML += '<a ' + path + '\'' + prefix + ':' + addy48218 + '\'>' + addy48218+'<\/a>';
> >  //-->
>
> This is deliberately being done to prevent scripted usage. What
> exactly are you needing to do this for?
>
> You're basically going to have to execute the entire block of
> JavaScript code, and then decode the entities to get to what you want.
> Doing it manually is pretty easy; doing it automatically will
> virtually require a language interpreter.
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>
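
For a script shaped exactly like the one quoted above, a full JavaScript interpreter can sometimes be avoided. The following is only a minimal sketch under that assumption, not a general solution: it looks for assignments to variables named `addy` plus digits, joins the single-quoted string pieces in order (substituting the variable's earlier value when it refers to itself), and then decodes the HTML entities with `html.unescape`. The sample input, the function name `extract_cloaked_addresses`, and the variable `addy12345` are hypothetical and only mirror the structure of the quoted code. Any change to the cloaking pattern (double quotes, escaped quotes, different variable names) will break it, and at that point running the script in a real interpreter, as suggested above, is the more robust route.

```python
import html
import re

# Hypothetical sample that mirrors the structure of the quoted script.
SAMPLE = """
<script type='text/javascript'>
 document.getElementById('cloak12345').innerHTML = '';
 var addy12345 = '&#117;s&#101;r' + '&#64;';
 addy12345 = addy12345 + '&#101;xampl&#101;' + '&#46;' +
'c&#111;m';
</script>
"""

# "addyNNN = <expr>;" where <expr> may span lines. Quoted strings are
# consumed as whole units so that the ';' inside entities such as '&#64;'
# does not end the statement early.
ASSIGNMENT = re.compile(r"(addy\d+)\s*=\s*((?:'[^'\\]*'|[^';])*);")

# One term of the expression: a single-quoted literal or an identifier.
TOKEN = re.compile(r"'([^'\\]*)'|([A-Za-z_$][\w$]*)")


def extract_cloaked_addresses(source):
    """Return {variable name: decoded address} for each addyNNN variable."""
    values = {}
    for name, expr in ASSIGNMENT.findall(source):
        pieces = []
        for literal, ident in TOKEN.findall(expr):
            # Identifiers are looked up among values built so far;
            # unknown identifiers contribute nothing.
            pieces.append(values.get(ident, "") if ident else literal)
        values[name] = "".join(pieces)
    # Turn numeric entities such as '&#64;' into real characters.
    return {name: html.unescape(value) for name, value in values.items()}


if __name__ == "__main__":
    print(extract_cloaked_addresses(SAMPLE))
```

The raw page source can be passed in directly, since the two patterns ignore everything that is not an `addy` assignment. If HTMLParser is already in use, the text of each `<script>` element could be passed to the function instead.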
