


Re: BLANK PAGE when i try Filtering Adsense with abpy

Path csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!ecngs!feeder2.ecngs.de!novso.com!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <torriem+gmail@torriefamily.org>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.004
X-Spam-Evidence '*H*': 0.99; '*S*': 0.00; 'string.': 0.05; 'modify': 0.07; 'output,': 0.09; 'sanity': 0.09; 'itself.': 0.14; 'check.': 0.16; 'code?': 0.16; 'command,': 0.16; 'from:addr:torriem': 0.16; 'from:name:michael torrie': 0.16; 'somehow.': 0.16; 'stdout': 0.16; 'subject:when': 0.16; 'urllib': 0.16; 'skip:# 20': 0.16; 'wrote:': 0.18; 'looked': 0.18; 'all,': 0.19; 'bit': 0.19; 'seems': 0.21; 'code,': 0.22; 'import': 0.22; 'rules': 0.22; 'print': 0.22; 'header:User-Agent:1': 0.23; 'filtering': 0.24; 'replace': 0.24; 'tells': 0.24; "haven't": 0.24; 'source': 0.25; 'right.': 0.26; 'least': 0.26; 'code:': 0.26; 'header:In-Reply- To:1': 0.27; 'appear': 0.29; 'am,': 0.29; "doesn't": 0.30; "i'm": 0.30; 'code': 0.31; 'getting': 0.31; "skip:' 10": 0.31; 'object.': 0.31; 'prints': 0.31; 'though.': 0.31; 'supposed': 0.32; 'run': 0.32; 'text': 0.33; 'comment': 0.34; 'maybe': 0.34; 'could': 0.34; 'subject:with': 0.35; 'skip:u 20': 0.35; 'but': 0.35; 'add': 0.35; 'doing': 0.36; 'method': 0.36; 'should': 0.36; 'skip:o 20': 0.38; 'message-id:@gmail.com': 0.38; 'displays': 0.38; 'filter': 0.38; 'skip:[ 10': 0.38; 'to:addr:python-list': 0.38; 'rather': 0.38; 'does': 0.39; 'bad': 0.39; 'itself': 0.39; 'sure': 0.39; 'to:addr:python.org': 0.39; 'skip:p 20': 0.39; 'received:org': 0.40; 'skip:u 10': 0.60; "you're": 0.61; 'provide': 0.64; 'bothered': 0.68; 'glance': 0.84; 'subject:try': 0.84; 'thing,': 0.91
X-Virus-Scanned amavisd-new at torriefamily.org
Date Sun, 22 Dec 2013 11:08:39 -0700
From Michael Torrie <torriem@gmail.com>
User-Agent Mozilla/5.0 (X11; Linux i686; rv:10.0.12) Gecko/20130105 Thunderbird/10.0.12
MIME-Version 1.0
To python-list@python.org
Subject Re: BLANK PAGE when i try Filtering Adsense with abpy
References <bf7b5838-9f24-4a94-9c14-0976a21578ff@googlegroups.com>
In-Reply-To <bf7b5838-9f24-4a94-9c14-0976a21578ff@googlegroups.com>
Content-Type text/plain; charset=ISO-8859-1
Content-Transfer-Encoding 7bit
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.4497.1387735747.18130.python-list@python.org>
Lines 48
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1387735747 news.xs4all.nl 2832 [2001:888:2000:d::a6]:36412
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:62538



On 12/22/2013 10:20 AM, em rexhepi wrote:
> When I use my code it just displays nothing
> 
> My code:
> #!/usr/local/bin/python3.1
> 
> import cgitb;cgitb.enable()
> 
> import urllib.request
> response = urllib.request.build_opener()
> response.addheaders = [('User-agent', 'Mozilla/5.0')]
> response = urllib.request.urlopen("www.youtube.com";)
> 
> html = response.read()
> 
> from abpy import Filter
> with open("easylist.txt") as f:
> f = Filter(file('easylist.txt'))
> f.match(html)

What happens when you comment out the above four lines?  Does the web
page print without the filtering?  Just as a sanity check.  My hunch is
that html has no data in it.
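Here is a minimal sketch of that sanity check. It assumes a recent Python 3 and a URL that includes its scheme, because urlopen() rejects a bare "www.youtube.com" with a ValueError. It also keeps the opener from build_opener() in its own variable. In the quoted code, that opener is immediately overwritten by the urlopen() result, so the User-agent header is never sent. The helper name fetch_html is hypothetical and not part of urllib or abpy.

```python
import urllib.request


def fetch_html(url, user_agent="Mozilla/5.0"):
    """Fetch url and return its body as text (hypothetical helper)."""
    opener = urllib.request.build_opener()
    # Keep the opener in its own variable so this header is actually used.
    opener.addheaders = [("User-agent", user_agent)]
    with opener.open(url) as response:
        raw = response.read()  # bytes in Python 3, not str
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
    return raw.decode(charset, errors="replace")
```

Printing len(fetch_html("http://www.youtube.com/")) before any abpy code runs shows at a glance whether any HTML arrived at all.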

Also, what is "f.match(html)" supposed to return?  Is it supposed to
mutate html (seems unlikely), or does it return something?  Looking at
the source code, match() does not return anything; it prints to stdout,
which is weird, but at least that tells us that it doesn't actually
change the html object.
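If match() really only prints, as the source suggests, you can still inspect its output from Python. The following is a minimal sketch using contextlib.redirect_stdout, which requires Python 3.4 or later. capture_stdout is a hypothetical helper, not part of abpy.

```python
import contextlib
import io


def capture_stdout(func, *args, **kwargs):
    """Call func and return (its return value, everything it printed)."""
    buffer = io.StringIO()
    # Temporarily send print() output into the buffer instead of stdout.
    with contextlib.redirect_stdout(buffer):
        result = func(*args, **kwargs)
    return result, buffer.getvalue()
```

With the Filter object from the quoted code, result, printed = capture_stdout(f.match, html) would leave result as None, if the reading of the source above is right. The matched rules would end up in printed.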

> print("Content-type: text/html")
> print()
> print (html)

I'm not sure you're doing this right.  abpy seems a bit goofy, but since
f.match() does not appear to change html at all, you should get the same
html out that urllib grabbed.  So if you're not getting any output, that
means you're not getting the original html somehow.  Also, if f.match()
is doing its thing, I don't think you want to print out html after the
call, because f.match() is already printing to stdout itself.
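On the output side, a CGI script has to emit its header lines, then a blank line, then the body. The following is a minimal sketch that assumes UTF-8 output. In Python 3, response.read() returns bytes, so print(html) writes the b'...' representation rather than the page itself. The sketch therefore decodes first. write_cgi_page is a hypothetical name.

```python
import sys


def write_cgi_page(html, out=None, charset="utf-8"):
    """Write a minimal CGI response: header line, blank line, then the body."""
    if out is None:
        out = sys.stdout
    if isinstance(html, bytes):
        # print(some_bytes) would emit "b'...'" instead of the markup.
        html = html.decode(charset, errors="replace")
    out.write("Content-type: text/html; charset=%s\n" % charset)
    out.write("\n")  # the blank line that ends the CGI headers
    out.write(html)
```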

Have you looked over the abpy source code?  I haven't bothered to run
it, but a glance through the code suggests that it doesn't actually do
any filtering at all; rather, it just prints out the rules that the html
you provide would match.  I bet you could modify it to do filtering,
though.  Maybe add a method that uses rule.sub to replace the matched
text with an empty string.
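Here is a rough sketch of that modification. It is written against plain re patterns rather than abpy's own rule objects, whose attributes are assumed here rather than quoted. Each pattern's sub() replaces the matched text with an empty string. Adblock Plus rules are not plain regular expressions, and element-hiding rules target page elements rather than raw text. This only shows the shape of the idea. strip_matches is a hypothetical name.

```python
import re


def strip_matches(html, patterns):
    """Remove every match of each pattern from html (hypothetical helper).

    patterns is assumed to be an iterable of compiled regular expressions,
    standing in for whatever regex abpy's rule objects actually hold.
    """
    for pattern in patterns:
        # Same idea as rule.sub: replace the unwanted text with "".
        html = pattern.sub("", html)
    return html


# Example pattern (an assumption, not an EasyList rule): drop <ins> ad slots.
AD_SLOT = re.compile(r'<ins class="adsbygoogle".*?</ins>', re.DOTALL)
```

Calling strip_matches(html, [AD_SLOT]) before printing would then produce the filtered page instead of a list of matched rules.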



Thread

BLANK PAGE when i try Filtering Adsense with abpy em rexhepi <em.rexhepi@gmail.com> - 2013-12-22 09:20 -0800
  Re: BLANK PAGE when i try Filtering Adsense with abpy Chris Angelico <rosuav@gmail.com> - 2013-12-23 04:58 +1100
  Re: BLANK PAGE when i try Filtering Adsense with abpy Michael Torrie <torriem@gmail.com> - 2013-12-22 11:08 -0700
  Re: BLANK PAGE when i try Filtering Adsense with abpy Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-22 18:25 +0000
  Re: BLANK PAGE when i try Filtering Adsense with abpy MRAB <python@mrabarnett.plus.com> - 2013-12-22 20:28 +0000
  Re: BLANK PAGE when i try Filtering Adsense with abpy Terry Reedy <tjreedy@udel.edu> - 2013-12-22 16:13 -0500
