Re: Extract email address from Java script in html source using python

From VanguardLH <V@nguard.LH>
Newsgroups comp.lang.python, alt.spam
Subject Re: Extract email address from Java script in html source using python
Date 2015-05-24 01:17 -0500
Organization Usenet denizen
Message-ID <csd8neFidb7U1@mid.individual.net>
References <CAAXuHoeJ-YMQDB85qLDJ_o+9CrsfwLvm9wuOaRtbSj-i9kBaFA@mail.gmail.com> <mailman.267.1432371718.17265.python-list@python.org> <cq82ma9d4pqufo7u11m52eh4ca2hgmi25a@4ax.com>

Cross-posted to 2 groups.

Steve Hayes wrote:

> On Sat, 23 May 2015 19:01:55 +1000, Chris Angelico <rosuav@gmail.com>
> wrote:
> 
>>On Sat, May 23, 2015 at 4:46 PM, savitha devi <savithad8@gmail.com> wrote:
>>> I am developing a web scraper using HTMLParser. I need to extract a
>>> text/email address from JavaScript within the HTML code. I am a beginner
>>> in Python coding and totally lost here. I need some help with this. The
>>> JavaScript code is as below:
>>>
>>> <script type='text/javascript'>
>>>  //<!--
>>>  document.getElementById('cloak48218').innerHTML = '';
>>>  var prefix = '&#109;a' + 'i&#108;' + '&#116;o';
>>>  var path = 'hr' + 'ef' + '=';
>>>  var addy48218 = '&#105;nf&#111;' + '&#64;';
>>>  addy48218 = addy48218 + 'tsv-n&#101;&#117;r&#105;&#101;d' + '&#46;' +
>>> 'd&#101;';
>>>  document.getElementById('cloak48218').innerHTML += '<a ' + path + '\'' +
>>> prefix + ':' + addy48218 + '\'>' + addy48218+'<\/a>';
>>>  //-->
>>
>>This is deliberately being done to prevent scripted usage. What
>>exactly are you needing to do this for?
> 
> To sell addresses to spammers, of course.

The boob who uses this JavaScript obfuscation (slicing the string up
across variables and concatenating the pieces) hasn't a clue that the
script, or the user clicking on the link, still has to end up at the
destination eventually, so it will still get blocked.  Duh!
Nothing is actually cloaked by the JavaScript (it's just another means
of building up the <A> tag), and a URL string (even if it used a
decimal value instead of a dotted IP address) still has to connect to
somewhere, and that gets detected and blocked.  Slicing up a URL across
variables and concatenating within a variable is a child's ploy at
obfuscation.  Apparently savitha can't even distinguish between an
e-mail address and a URL string.
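
To illustrate how little the concatenation hides, here is a rough
sketch in Python that replays the variable assignments from the quoted
script and decodes the HTML entities.  It assumes the script uses only
single-quoted literals and plain names joined by +, as in the quoted
sample; SCRIPT and resolve_variables are names made up for this
example, not part of HTMLParser or any other library.

```python
import html
import re

# The variable assignments copied from the quoted script above.
SCRIPT = """
var prefix = '&#109;a' + 'i&#108;' + '&#116;o';
var path = 'hr' + 'ef' + '=';
var addy48218 = '&#105;nf&#111;' + '&#64;';
addy48218 = addy48218 + 'tsv-n&#101;&#117;r&#105;&#101;d' + '&#46;' +
'd&#101;';
"""

# One operand: a single-quoted literal (assumed to contain no escaped
# quotes) or a bare variable name.
OPERAND = r"(?:'[^']*'|\w+)"

# name = operand + operand + ... ;
ASSIGNMENT = re.compile(
    r"(\w+)\s*=\s*(" + OPERAND + r"(?:\s*\+\s*" + OPERAND + r")*)\s*;"
)
TOKEN = re.compile(r"'([^']*)'|(\w+)")


def resolve_variables(script):
    """Replay the assignments in order and return the decoded values."""
    values = {}
    for name, expression in ASSIGNMENT.findall(script):
        pieces = []
        for match in TOKEN.finditer(expression):
            if match.group(1) is not None:
                # String literal: keep its contents as written.
                pieces.append(match.group(1))
            else:
                # Variable reference: substitute the value seen so far.
                pieces.append(values.get(match.group(2), ""))
        values[name] = "".join(pieces)
    # Decode numeric character references such as &#64; only at the end.
    return {name: html.unescape(value) for name, value in values.items()}


if __name__ == "__main__":
    for name, value in resolve_variables(SCRIPT).items():
        print(name, "=", value)
```

This is not a JavaScript interpreter; anything beyond literal
concatenation (function calls, escaped quotes, double-quoted strings)
would need a real parser.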


Thread

Re: Extract email address from Java script in html source using python Chris Angelico <rosuav@gmail.com> - 2015-05-23 19:01 +1000
  Re: Extract email address from Java script in html source using python Steve Hayes <hayesstw@telkomsa.net> - 2015-05-24 03:04 +0200
    Re: Extract email address from Java script in html source using python VanguardLH <V@nguard.LH> - 2015-05-24 01:17 -0500
