Path: csiph.com!usenet.pasdenom.info!aioe.org!news.stack.nl!newsfeed.xs4all.nl!newsfeed2.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Date: Sun, 22 Dec 2013 20:28:21 +0000
From: MRAB <python@mrabarnett.plus.com>
User-Agent: Mozilla/5.0 (Windows NT 6.3; WOW64; rv:24.0) Gecko/20100101 Thunderbird/24.2.0
MIME-Version: 1.0
To: python-list@python.org
Subject: Re: BLANK PAGE when i try Filtering Adsense with abpy
References: <bf7b5838-9f24-4a94-9c14-0976a21578ff@googlegroups.com> <52B72AA7.4050609@gmail.com>
In-Reply-To: <52B72AA7.4050609@gmail.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Precedence: list
Reply-To: python-list@python.org
Newsgroups: comp.lang.python
Message-ID: <mailman.4508.1387744110.18130.python-list@python.org>
Lines: 51
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:62551

On 22/12/2013 18:08, Michael Torrie wrote:
> On 12/22/2013 10:20 AM, em rexhepi wrote:
>> When I use my code it just displays nothing
>>
>> My code:
>> #!/usr/local/bin/python3.1
>>
>> import cgitb;cgitb.enable()
>>
>> import urllib.request
>> response = urllib.request.build_opener()
>> response.addheaders = [('User-agent', 'Mozilla/5.0')]
>> response = urllib.request.urlopen("www.youtube.com";)
>>
>> html = response.read()
>>
>> from abpy import Filter
>> with open("easylist.txt") as f:
>> f = Filter(file('easylist.txt'))
>> f.match(html)
>
> What happens when you comment out the above four lines?  Does the web
> page print without the filtering?  Just as a sanity check.  My hunch is
> that html has no data in it.
>
> Also what is "f.match(html)" supposed to return? Is it supposed to
> mutate html (seems unlikely) or does it return something? Looking at the
> source code, match() does not return anything, but prints to stdout,
> which is weird, but at least that tells us that it doesn't actually
> change the html object.
>
>> print("Content-type: text/html")
>> print()
>> print (html)
>
> I'm not sure you're doing this right.  adpy seems a bit goofy, but since
> f.match() does not appear to change html at all, you should get the same
> html out that urllib grabbed.  So if you're not getting any output, that
> means you're not getting the original html somehow.  Also if f.match()
> is doing its thing, I don't think you want to print out html after the
> command, because f.match itself is printing to stdout itself.
>
> Have you looked over the adpy source code?  I haven't bothered to run
> it, but a glance through the code would seem to indicate that it doesn't
> actually do the filtering at all, but rather just prints out the rules
> that the html code you provide would match.  I bet you could modify it
> to do filtering though.  Maybe add a method that uses rule.sub to
> replace the bad text with an empty string.
>
The urlopen call also contains a stray semicolon.