Groups > comp.lang.python > #107966 > unrolled thread
| Started by | DFS <nospam@dfs.com> |
|---|---|
| First post | 2016-05-01 23:39 -0400 |
| Last post | 2016-05-03 08:14 -0400 |
| Articles | 6 on this page of 26 — 11 participants |
You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-01 23:39 -0400
Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-01 21:31 -0700
Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 00:51 -0400
Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:02 -0700
Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 01:08 -0400
Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:21 -0700
Re: You gotta love a 2-line python solution Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2016-05-02 15:51 +1000
Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 01:23 -0400
Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-01 22:37 -0700
Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 02:13 -0400
Re: You gotta love a 2-line python solution Terry Reedy <tjreedy@udel.edu> - 2016-05-02 02:46 -0400
Re: You gotta love a 2-line python solution BartC <bc@freeuk.com> - 2016-05-02 10:26 +0100
Re: You gotta love a 2-line python solution Marko Rauhamaa <marko@pacujo.net> - 2016-05-02 13:12 +0300
Re: You gotta love a 2-line python solution Steven D'Aprano <steve@pearwood.info> - 2016-05-02 22:05 +1000
Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 11:15 -0400
Re: You gotta love a 2-line python solution Larry Martell <larry.martell@gmail.com> - 2016-05-02 11:24 -0400
Re: You gotta love a 2-line python solution Manolo Martínez <manolo@austrohungaro.com> - 2016-05-02 17:32 +0200
Re: You gotta love a 2-line python solution jfong@ms4.hinet.net - 2016-05-02 17:45 -0700
Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 21:12 -0400
Re: You gotta love a 2-line python solution jfong@ms4.hinet.net - 2016-05-02 20:27 -0700
Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-02 20:49 -0700
Re: You gotta love a 2-line python solution jfong@ms4.hinet.net - 2016-05-02 20:57 -0700
Re: You gotta love a 2-line python solution Stephen Hansen <me+python@ixokai.io> - 2016-05-03 09:09 -0700
Re: You gotta love a 2-line python solution DFS <nospam@dfs.com> - 2016-05-02 23:56 -0400
Re: You gotta love a 2-line python solution Steven D'Aprano <steve@pearwood.info> - 2016-05-04 11:20 +1000
Re: You gotta love a 2-line python solution Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2016-05-03 08:14 -0400
| From | Stephen Hansen <me+python@ixokai.io> |
|---|---|
| Date | 2016-05-02 20:49 -0700 |
| Message-ID | <mailman.334.1462247348.32212.python-list@python.org> |
| In reply to | #108043 |
On Mon, May 2, 2016, at 08:27 PM, jfong@ms4.hinet.net wrote:
> But when I try to get this forum page, it does get a html file but can't
> be viewed normally.

What does that mean?

--
Stephen Hansen
m e @ i x o k a i . i o
| From | jfong@ms4.hinet.net |
|---|---|
| Date | 2016-05-02 20:57 -0700 |
| Message-ID | <21c6d3df-3346-49b5-ac66-76b977c5aef5@googlegroups.com> |
| In reply to | #108044 |
Stephen Hansen at 2016/5/3 11:49:22AM wrote:
> On Mon, May 2, 2016, at 08:27 PM, jfong@ms4.hinet.net wrote:
> > But when I try to get this forum page, it does get a html file but can't
> > be viewed normally.
>
> What does that mean?
>
> --
> Stephen Hansen
> m e @ i x o k a i . i o

The page we are looking at:-)
https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A
| From | Stephen Hansen <me+python@ixokai.io> |
|---|---|
| Date | 2016-05-03 09:09 -0700 |
| Message-ID | <mailman.349.1462291780.32212.python-list@python.org> |
| In reply to | #108046 |
On Mon, May 2, 2016, at 08:57 PM, jfong@ms4.hinet.net wrote:
> Stephen Hansen at 2016/5/3 11:49:22AM wrote:
> > On Mon, May 2, 2016, at 08:27 PM, jfong@ms4.hinet.net wrote:
> > > But when I try to get this forum page, it does get a html file but can't
> > > be viewed normally.
> >
> > What does that mean?
> >
> > --
> > Stephen Hansen
> > m e @ i x o k a i . i o
>
> The page we are looking at:-)
> https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A

Try scraping gmane. Google Groups is one big javascript application.

--
Stephen Hansen
m e @ i x o k a i . i o
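One way to see why the saved file carries no thread content is to look at the URL itself. Everything after `#` is a fragment, which the client keeps to itself and does not send to the server. The sketch below uses only the standard library's `urllib.parse`; the variable names are illustrative and do not come from the thread.

```python
from urllib.parse import urlsplit, urldefrag

# The Google Groups URL quoted in the thread
url = "https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A"

parts = urlsplit(url)
# urldefrag splits the URL into (url without fragment, fragment)
requested, fragment = urldefrag(url)

print("path sent to the server:", parts.path)
print("fragment kept on the client:", fragment)
print("URL actually requested:", requested)
```

If that reading is right, a plain download only ever asks for the generic `/forum/` page, and the topic named in the fragment is presumably loaded later by the page's own JavaScript.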
| From | DFS <nospam@dfs.com> |
|---|---|
| Date | 2016-05-02 23:56 -0400 |
| Message-ID | <ng97bn$4fi$1@dont-email.me> |
| In reply to | #108043 |
On 5/2/2016 11:27 PM, jfong@ms4.hinet.net wrote:
> DFS at 2016/5/3 9:12:24AM wrote:
>> try
>>
>> from urllib.request import urlretrieve
>>
>> http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
>>
>>
>> I'm running python 2.7.11 (32-bit)
>
> Alright, it works...someway.
>
> I try to get a zip file. It works, the file can be unzipped correctly.
>
>>>> from urllib.request import urlretrieve
>>>> urlretrieve("http://www.caprilion.com.tw/fed.zip", "d:\\temp\\temp.zip")
> ('d:\\temp\\temp.zip', <http.client.HTTPMessage object at 0x03102C50>)
>>>>
>
> But when I try to get this forum page, it does get a html file but can't be viewed normally.
>
>>>> urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", "d:\\temp\\temp.html")
> ('d:\\temp\\temp.html', <http.client.HTTPMessage object at 0x03102A90>)
>>>>
>
> I suppose the html is a much complex situation where more processes need to be done before it can be opened by a web browser:-)
Who knows what Google has done... it won't open in Opera. The tab title
shows up, but after 20-30 seconds the screen just stays blank and the
cursor quits loading.
It's a mess - try running it thru BeautifulSoup.prettify() and it looks
better.
------------------------------------------------------------
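# Python 2.7 with BeautifulSoup 3 (on Python 3, use urllib.request.urlretrieve and bs4)
import urllib
import BeautifulSoup

webfile = "D:\\afile.html"
# Save the page locally, then parse the saved copy
urllib.urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", webfile)
f = open(webfile)
soup = BeautifulSoup.BeautifulSoup(f)
f.close()
# Re-indent the markup so the nesting is readable
print soup.prettify()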
------------------------------------------------------------
| From | Steven D'Aprano <steve@pearwood.info> |
|---|---|
| Date | 2016-05-04 11:20 +1000 |
| Message-ID | <57294e5d$0$1606$c3e8da3$5496439d@news.astraweb.com> |
| In reply to | #108045 |
On Tue, 3 May 2016 01:56 pm, DFS wrote:
> On 5/2/2016 11:27 PM, jfong@ms4.hinet.net wrote:
>> DFS at 2016/5/3 9:12:24AM wrote:
>>> try
>>>
>>> from urllib.request import urlretrieve
>>>
>>> http://stackoverflow.com/questions/21171718/urllib-urlretrieve-file-python-3-3
>>>
>>>
>>> I'm running python 2.7.11 (32-bit)
>>
>> Alright, it works...someway.
>>
>> I try to get a zip file. It works, the file can be unzipped correctly.
>>
>>>>> from urllib.request import urlretrieve
>>>>> urlretrieve("http://www.caprilion.com.tw/fed.zip", "d:\\temp\\temp.zip")
>> ('d:\\temp\\temp.zip', <http.client.HTTPMessage object at 0x03102C50>)
>>>>>
>>
>> But when I try to get this forum page, it does get a html file but can't
>> be viewed normally.
>>
>>>>> urlretrieve("https://groups.google.com/forum/#!topic/comp.lang.python/jFl3GJbmR7A", "d:\\temp\\temp.html")
>> ('d:\\temp\\temp.html', <http.client.HTTPMessage object at 0x03102A90>)
>>>>>
>>
>> I suppose the html is a much complex situation where more processes need
>> to be done before it can be opened by a web browser:-)
>
>
> Who knows what Google has done... it won't open in Opera. The tab title
> shows up, but after 20-30 seconds the screen just stays blank and the
> cursor quits loading.
Dennis has given the answer to this, but since he has X-No-Archive=Yes, his
useful and well-written answer will be lost forever.
So I've taken the liberty of copying his answer here:
Dennis Lee Bieber says:
There's practically no HTML in that page -- just miles of
Javascript.
The one obvious item is:
-=-=-=-=-=-
<script type="text/javascript" language="javascript"
src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js">
</script>
-=-=-=-=-=-
which is a RELATIVE path (it names no host). If you copy the file to your
machine and then load it in a browser, the browser will look for
/forum/C53652DA8B67255A46256B72F0D65A40.cache.js
on your machine rather than on Google's server.
You'd have to recreate most of the Google environment and fetch
anything that was referenced through a relative path first, to get the
content to display. Of course, you may find, for example, that the
Javascript at some point is doing a database lookup -- and you'd maybe have
to now duplicate the database...
--
Steven
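The resolution step described in the copied answer can be reproduced with the standard library's `urllib.parse.urljoin`, which applies the usual URL-joining rules to a base and a reference. This is only a sketch: the local `file:` URL is a hypothetical stand-in for wherever the page was saved, and an actual browser may treat Windows drive letters differently.

```python
from urllib.parse import urljoin

# The script reference quoted from the downloaded page
src = "/forum/C53652DA8B67255A46256B72F0D65A40.cache.js"

# Where the page originally came from
remote_page = "https://groups.google.com/forum/"
# Hypothetical location of the saved copy (an assumption for illustration)
local_page = "file:///D:/temp/temp.html"

remote_script = urljoin(remote_page, src)
local_script = urljoin(local_page, src)

print("served from Google:", remote_script)
print("opened from disk:  ", local_script)
```

The same `src` value ends up pointing at Google's server in the first case and at the local machine in the second, which is why the saved copy cannot find its script.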
| From | Dennis Lee Bieber <wlfraed@ix.netcom.com> |
|---|---|
| Date | 2016-05-03 08:14 -0400 |
| Message-ID | <mailman.343.1462277706.32212.python-list@python.org> |
| In reply to | #108043 |
On Mon, 2 May 2016 20:27:45 -0700 (PDT), jfong@ms4.hinet.net declaimed the
following:
>I suppose the html is a much complex situation where more processes need to be done before it can be opened by a web browser:-)
There's practically no HTML in that page -- just miles of Javascript.
The one obvious item is:
-=-=-=-=-=-
<script type="text/javascript" language="javascript"
src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js">
</script>
-=-=-=-=-=-
which is a RELATIVE path (it names no host). If you copy the file to your
machine and then load it in a browser, the browser will look for
/forum/C53652DA8B67255A46256B72F0D65A40.cache.js
on your machine rather than on Google's server.
You'd have to recreate most of the Google environment and fetch
anything that was referenced through a relative path first, to get the
content to display. Of course, you may find, for example, that the
Javascript at some point is doing a database lookup -- and you'd maybe have
to now duplicate the database...
--
Wulfraed Dennis Lee Bieber AF6VN
wlfraed@ix.netcom.com HTTP://wlfraed.home.netcom.com/
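A rough way to check a saved page for the "no HTML, just Javascript" pattern is to count what is inside `<script>` elements versus ordinary text. The sketch below uses the standard library's `html.parser`; the class name `ScriptTextCounter` and the `sample` string are made up for illustration, and the sample is only a small stand-in built around the one tag quoted above, not the real Google Groups page.

```python
from html.parser import HTMLParser


class ScriptTextCounter(HTMLParser):
    """Tally characters inside <script> elements versus other text."""

    def __init__(self):
        super().__init__()
        self.in_script = False
        self.script_tags = 0
        self.external_scripts = []  # src values of scripts loaded by URL
        self.script_chars = 0
        self.text_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True
            self.script_tags += 1
            src = dict(attrs).get("src")
            if src:
                self.external_scripts.append(src)

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Ignore surrounding whitespace so blank lines do not count
        size = len(data.strip())
        if self.in_script:
            self.script_chars += size
        else:
            self.text_chars += size


# Stand-in document (an assumption, not the actual downloaded file)
sample = """<html><head><title>Google Groups</title>
<script type="text/javascript" language="javascript"
 src="/forum/C53652DA8B67255A46256B72F0D65A40.cache.js">
</script>
<script>var placeholder = "inline code stands in for the real page";</script>
</head><body></body></html>"""

counter = ScriptTextCounter()
counter.feed(sample)
counter.close()

print("script tags:", counter.script_tags)
print("external scripts:", counter.external_scripts)
print("script characters:", counter.script_chars)
print("text characters:", counter.text_chars)
```

To try it on a real download, read the saved file into a string and pass that to `feed()` instead of `sample`. A page whose text count is tiny next to its script count, with an empty body, is likely to need its JavaScript to run before anything is displayed.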