

Groups > comp.lang.python > #107966 > unrolled thread

You gotta love a 2-line python solution

Started by: DFS <nospam@dfs.com>
First post: 2016-05-01 23:39 -0400
Last post: 2016-05-03 08:14 -0400
Articles: 6 on this page of 26, 11 participants



Contents

  You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-01 23:39 -0400
    Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-01 21:31 -0700
      Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 00:51 -0400
        Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:02 -0700
          Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 01:08 -0400
            Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:21 -0700
              Re: You gotta love a 2-line python solution Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-02 15:51 +1000
          Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 01:23 -0400
            Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:37 -0700
              Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 02:13 -0400
    Re: You gotta love a 2-line python solution Terry Reedy <tjreedy@udel.edu> - 2016-05-02 02:46 -0400
    Re: You gotta love a 2-line python solution BartC <bc@freeuk.com> - 2016-05-02 10:26 +0100
      Re: You gotta love a 2-line python solution Marko Rauhamaa <marko@pacujo.net> - 2016-05-02 13:12 +0300
        Re: You gotta love a 2-line python solution Steven D'Aprano <steve@pearwood.info> - 2016-05-02 22:05 +1000
      Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 11:15 -0400
        Re: You gotta love a 2-line python solution Larry Martell <larry.martell@gmail.com> - 2016-05-02 11:24 -0400
        Re: You gotta love a 2-line python solution Manolo Martínez <manolo@austrohungaro.com> - 2016-05-02 17:32 +0200
    Re: You gotta love a 2-line python solution jfong@ms4.hinet.net - 2016-05-02 17:45 -0700
      Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 21:12 -0400
        Re: You gotta love a 2-line python solution jfong@ms4.hinet.net - 2016-05-02 20:27 -0700
          Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-02 20:49 -0700
            Re: You gotta love a 2-line python solution jfong@ms4.hinet.net - 2016-05-02 20:57 -0700
              Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-03 09:09 -0700
          Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 23:56 -0400
            Re: You gotta love a 2-line python solution Steven D'Aprano <steve@pearwood.info> - 2016-05-04 11:20 +1000
          Re: You gotta love a 2-line python solution Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-05-03 08:14 -0400



#108044

From: Stephen Hansen <me+python@ixokai.io>
Date: 2016-05-02 20:49 -0700
Message-ID: <mailman.334.1462247348.32212.python-list@python.org>
In reply to: #108043

On Mon, May 2, 2016, at 08:27 PM, jfong@ms4.hinet.net wrote:
> But when I try to get this forum page, it does get a html file but can't
> be viewed normally.

What does that mean?

-- 
Stephen Hansen
  m e @ i x o k a i . i o



#108046

From: jfong@ms4.hinet.net
Date: 2016-05-02 20:57 -0700
Message-ID: <21c6d3df-3346-49b5-ac66-76b977c5aef5@googlegroups.com>
In reply to: #108044

Stephen Hansen at 2016/5/3 11:49:22AM wrote:
> On Mon, May 2, 2016, at 08:27 PM, jfong@ms4.hinet.net wrote:
> > But when I try to get this forum page, it does get a html file but can't
> > be viewed normally.
> 
> What does that mean?
> 
> -- 
> Stephen Hansen
>   m e @ i x o k a i . i o

The page we are looking at:-)
https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A



#108080

From: Stephen Hansen <me+python@ixokai.io>
Date: 2016-05-03 09:09 -0700
Message-ID: <mailman.349.1462291780.32212.python-list@python.org>
In reply to: #108046

On Mon, May 2, 2016, at 08:57 PM, jfong@ms4.hinet.net wrote:
> Stephen Hansen at 2016/5/3 11:49:22AM wrote:
> > On Mon, May 2, 2016, at 08:27 PM, jfong@ms4.hinet.net wrote:
> > > But when I try to get this forum page, it does get a html file but can't
> > > be viewed normally.
> > 
> > What does that mean?
> > 
> > -- 
> > Stephen Hansen
> >   m e @ i x o k a i . i o
> 
> The page we are looking at:-)
> https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A

Try scraping Gmane instead. Google Groups is one big JavaScript application.

-- 
Stephen Hansen
  m e @ i x o k a i . i o
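
One reason the download looks nothing like the thread: everything after the `#` in that URL is a fragment, and HTTP clients do not send fragments to the server. The sketch below uses only the standard library's `urllib.parse`; the variable names are illustrative. It shows that the request is really just for the `/forum/` application page, which then relies on JavaScript to load the topic.

```python
from urllib.parse import urlsplit

url = "https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A"
parts = urlsplit(url)

# What the server is actually asked for: scheme, host and path only.
requested = f"{parts.scheme}://{parts.netloc}{parts.path}"

print(requested)       # https://groups.google.com/forum/
print(parts.fragment)  # !topic/comp.lang.python/jFl3GJbmR7A
```

The topic identifier lives entirely in the fragment, so a plain download cannot reach it without running the page's scripts.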



#108045

From: DFS <nospam@dfs.com>
Date: 2016-05-02 23:56 -0400
Message-ID: <ng97bn$4fi$1@dont-email.me>
In reply to: #108043

On 5/2/2016 11:27 PM, jfong@ms4.hinet.net wrote:
> DFS at 2016/5/3 9:12:24AM wrote:
>> try
>>
>> from urllib.request import urlretrieve
>>
>> http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
>>
>>
>> I'm running python 2.7.11 (32-bit)
>
> Alright, it works...someway.
>
> I try to get a zip file. It works, the file can be unzipped correctly.
>
> >>> from urllib.request import urlretrieve
> >>> urlretrieve("http://www.caprilion.com.tw/fed.zip", "d:\\temp\\temp.zip")
> ('d:\\temp\\temp.zip', <http.client.HTTPMessage object at 0x03102C50>)
> >>>
>
> But when I try to get this forum page, it does get a html file but can't be viewed normally.
>
> >>> urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", "d:\\temp\\temp.html")
> ('d:\\temp\\temp.html', <http.client.HTTPMessage object at 0x03102A90>)
> >>>
>
> I suppose the html is a much complex situation where more processes need to be done before it can be opened by a web browser:-)


Who knows what Google has done... it won't open in Opera.  The tab title 
shows up, but the page stays blank, and after 20-30 seconds the busy 
cursor stops.

The saved file is a mess. Try running it through BeautifulSoup's 
prettify() and it looks better.

------------------------------------------------------------
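# Python 3 with BeautifulSoup 4 (pip install beautifulsoup4)
from urllib.request import urlretrieve
from bs4 import BeautifulSoup

webfile = "D:\\afile.html"
# urlretrieve is imported by name, so call it directly
urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", webfile)
# assumes the page is UTF-8; the with-block closes the file
with open(webfile, encoding="utf-8") as f:
    soup = BeautifulSoup(f, "html.parser")
print(soup.prettify())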
------------------------------------------------------------
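
A rough way to check the "it's a mess" impression is to measure how much of a saved page is script rather than readable text. This is only a sketch using the standard library's `html.parser`. The class name `ScriptVsText`, the helper `summarize`, and the sample markup are made up for illustration; only the script `src` is copied from this thread, and the real page will differ.

```python
from html.parser import HTMLParser


class ScriptVsText(HTMLParser):
    """Tally characters inside <script> elements versus all other text."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_chars = 0
        self.text_chars = 0
        self.external_scripts = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            src = dict(attrs).get("src")
            if src:
                # Script code that lives in a separate file on the server.
                self.external_scripts.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        n = len(data.strip())  # ignore whitespace-only runs between tags
        if self.in_script:
            self.script_chars += n
        else:
            self.text_chars += n


def summarize(html):
    """Return (script characters, text characters, external script URLs)."""
    parser = ScriptVsText()
    parser.feed(html)
    parser.close()
    return parser.script_chars, parser.text_chars, parser.external_scripts


# Hypothetical stand-in for a saved page: a title, two scripts, an empty body.
sample = """<html><head><title>Google Groups</title>
<script src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js"></script>
<script>var boot = function () { return 1; };</script>
</head><body></body></html>"""

script_chars, text_chars, external = summarize(sample)
print(script_chars, text_chars, external)
```

To try it on a real download, read the saved file into a string and pass that to `summarize` instead of `sample`.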




#108108

From: Steven D'Aprano <steve@pearwood.info>
Date: 2016-05-04 11:20 +1000
Message-ID: <57294e5d$0$1606$c3e8da3$5496439d@news.astraweb.com>
In reply to: #108045

On Tue, 3 May 2016 01:56 pm, DFS wrote:

> On 5/2/2016 11:27 PM, jfong@ms4.hinet.net wrote:
>> DFS at 2016/5/3 9:12:24AM wrote:
>>> try
>>>
>>> from urllib.request import urlretrieve
>>>
>>> http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
>>>
>>>
>>> I'm running python 2.7.11 (32-bit)
>>
>> Alright, it works...someway.
>>
>> I try to get a zip file. It works, the file can be unzipped correctly.
>>
>> >>> from urllib.request import urlretrieve
>> >>> urlretrieve("http://www.caprilion.com.tw/fed.zip", "d:\\temp\\temp.zip")
>> ('d:\\temp\\temp.zip', <http.client.HTTPMessage object at 0x03102C50>)
>> >>>
>>
>> But when I try to get this forum page, it does get a html file but can't
>> be viewed normally.
>>
>> >>> urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", "d:\\temp\\temp.html")
>> ('d:\\temp\\temp.html', <http.client.HTTPMessage object at 0x03102A90>)
>> >>>
>>
>> I suppose the html is a much complex situation where more processes need
>> to be done before it can be opened by a web browser:-)
> 
> 
> Who knows what Google has done... it won't open in Opera.  The tab title
> shows up, but the page stays blank, and after 20-30 seconds the busy
> cursor stops.


Dennis has given the answer to this, but since he has X-No-Archive=Yes, his
useful and well-written answer will be lost forever.

So I've taken the liberty of copying his answer here:

Dennis Lee Bieber says:

        There's practically no HTML in that page -- just miles of Javascript.
The one obvious item is:

-=-=-=-=-=-
<script type="text/javascript" language="javascript"
src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js">
        
      </script>
-=-=-=-=-=-

which is a RELATIVE path. If you copied the file to your machine and then
load it in a browser, it will be looking for

/forum/C53652DA8B67255A46256B72F0D65A40.cache.js 

to be on your machine in a subdirectory of where you saved the main file.

        You'd have to recreate most of the Google environment and fetch
anything that was referenced through a relative path first, to get the
content to display. Of course, you may find, for example, that the
Javascript at some point is doing a database lookup -- and you'd maybe have
to now duplicate the database...



-- 
Steven



#108064

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2016-05-03 08:14 -0400
Message-ID: <mailman.343.1462277706.32212.python-list@python.org>
In reply to: #108043

On Mon, 2 May 2016 20:27:45 -0700 (PDT), jfong@ms4.hinet.net declaimed the
following:

>I suppose the html is a much complex situation where more processes need to be done before it can be opened by a web browser:-)

	There's practically no HTML in that page -- just miles of Javascript.
The one obvious item is:

-=-=-=-=-=-
<script type="text/javascript" language="javascript"
src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js">
        
      </script>
-=-=-=-=-=-

which is a RELATIVE path. If you copied the file to your machine and then
load it in a browser, it will be looking for

/forum/C53652DA8B67255A46256B72F0D65A40.cache.js 

to be on your machine in a subdirectory of where you saved the main file.

	You'd have to recreate most of the Google environment and fetch
anything that was referenced through a relative path first, to get the
content to display. Of course, you may find, for example, that the
Javascript at some point is doing a database lookup -- and you'd maybe have
to now duplicate the database...
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
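
The path problem described above can be sketched with the standard library's `urljoin`, which follows the usual URL resolution rules. The local path `D:/temp/temp.html` is an assumption based on the filename used earlier in the thread. One nuance: because the `src` starts with a slash, it is resolved against the root of whatever the page was loaded from, so for a copy opened from disk that would be the root of the local filesystem rather than a folder next to the saved file.

```python
from urllib.parse import urljoin

src = "/forum/C53652DA8B67255A46256B72F0D65A40.cache.js"

# Resolved against the live page, the script comes from Google's server.
online = urljoin("https://groups.google.com/forum/", src)

# Resolved against a saved copy (assumed local path), the leading slash
# discards the D:/temp part and points at the local filesystem instead.
offline = urljoin("file:///D:/temp/temp.html", src)

print(online)   # https://groups.google.com/forum/C53652DA8B67255A46256B72F0D65A40.cache.js
print(offline)
```

Either way the saved copy cannot find its script, which matches the blank page reported earlier.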


