
Re: Extract email address from Java script in html source using python

Date Sat, 23 May 2015 19:01:55 +1000
Subject Re: Extract email address from Java script in html source using python
From Chris Angelico <rosuav@gmail.com>
Newsgroups comp.lang.python


On Sat, May 23, 2015 at 4:46 PM, savitha devi <savithad8@gmail.com> wrote:
> I am developing a web scraper using HTMLParser. I need to extract a
> text/email address from JavaScript within the HTML code. I am a beginner
> in Python coding and totally lost here. I need some help with this. The
> JavaScript code is as below:
>
> <script type='text/javascript'>
>  //<!--
>  document.getElementById('cloak48218').innerHTML = '';
>  var prefix = '&#109;a' + 'i&#108;' + '&#116;o';
>  var path = 'hr' + 'ef' + '=';
>  var addy48218 = '&#105;nf&#111;' + '&#64;';
>  addy48218 = addy48218 + 'tsv-n&#101;&#117;r&#105;&#101;d' + '&#46;' + 'd&#101;';
>  document.getElementById('cloak48218').innerHTML += '<a ' + path + '\'' + prefix + ':' + addy48218 + '\'>' + addy48218+'<\/a>';
>  //-->

This is deliberately being done to prevent scripted usage. What
exactly are you needing to do this for?

In the general case, you're basically going to have to execute the
entire block of JavaScript code, and then decode the HTML character
entities (&#109; and the like) to get to what you want. Doing it
manually is pretty easy; doing it automatically for arbitrary pages
will virtually require a JavaScript interpreter.
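
For this one cloaking pattern, the "manual" route can be roughed out in Python without running any JavaScript. This is only a sketch under some loud assumptions: the address lives in a variable named addy followed by digits, every assignment to it either starts or appends to the string, and the pieces are single-quoted literals with no escaped quotes inside. The names SCRIPT and decode_cloaked_address are made up for illustration; html.unescape and re are standard library.

```python
import html
import re

# The relevant lines from the quoted script (sample input for the sketch).
SCRIPT = r"""
 var prefix = '&#109;a' + 'i&#108;' + '&#116;o';
 var path = 'hr' + 'ef' + '=';
 var addy48218 = '&#105;nf&#111;' + '&#64;';
 addy48218 = addy48218 + 'tsv-n&#101;&#117;r&#105;&#101;d' + '&#46;' +
'd&#101;';
"""

# An assignment to addyNNN. The right-hand side is read as a run of
# single-quoted literals and other characters, up to the terminating
# semicolon. Semicolons inside literals (the entities) are skipped over.
ASSIGNMENT = re.compile(r"(?:var\s+)?(addy\d+)\s*=\s*((?:'[^']*'|[^';])*);")
LITERAL = re.compile(r"'([^']*)'")


def decode_cloaked_address(script):
    """Join the string literals assigned to addyNNN and decode entities.

    Assumes every assignment appends to the previous value, which holds
    for the script above but is not guaranteed in general.
    """
    parts = []
    for match in ASSIGNMENT.finditer(script):
        parts.extend(LITERAL.findall(match.group(2)))
    return html.unescape("".join(parts))


if __name__ == "__main__":
    print(decode_cloaked_address(SCRIPT))
```

Anything that builds the string differently (other variable names, escaped quotes, string methods, several addresses in one script) will break this, which is where a real interpreter comes in.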

ChrisA



