Groups > comp.lang.python > #51565 > unrolled thread
| Started by | cool1574@gmail.com |
|---|---|
| First post | 2013-07-30 07:49 -0700 |
| Last post | 2013-08-05 10:30 +1000 |
| Articles | 20 on this page of 24 — 12 participants |
Python script help cool1574@gmail.com - 2013-07-30 07:49 -0700
Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 16:38 +0100
Re: Python script help cool1574@gmail.com - 2013-07-30 08:49 -0700
Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 16:58 +0100
Re: Python script help cool1574@gmail.com - 2013-07-30 09:10 -0700
Re: Python script help cool1574@gmail.com - 2013-07-30 09:12 -0700
Re: Python script help Cameron Simpson <cs@zip.com.au> - 2013-07-31 07:47 +1000
Re: Python script help Joshua Landau <joshua@landau.ws> - 2013-07-31 07:24 +0100
Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 17:22 +0100
Re: Python script help Vincent Vande Vyvre <vincent.vandevyvre@swing.be> - 2013-07-30 18:58 +0200
Re: Python script help Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-07-30 17:32 +0200
Re: Python script help Denis McMahon <denismfmcmahon@gmail.com> - 2013-07-31 05:08 +0000
Re: Python script help cool1574@gmail.com - 2013-07-31 01:15 -0700
Re: Python script help alex23 <wuwei23@gmail.com> - 2013-08-01 10:57 +1000
Re: Python script help Alister <alister.ware@ntlworld.com> - 2013-08-01 10:39 +0000
Re: Python script help Piet van Oostrum <piet@vanoostrum.org> - 2013-08-23 22:37 -0400
Re: Python script help cool1574@gmail.com - 2013-08-01 09:02 -0700
Re: Python script help Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-08-02 10:44 +0200
Re: Python script help cool1574@gmail.com - 2013-08-02 02:46 -0700
Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-08-02 11:01 +0100
Re: Python script help cool1574@gmail.com - 2013-08-04 08:57 -0700
Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-08-04 17:20 +0100
Re: Python script help Michael Torrie <torriem@gmail.com> - 2013-08-04 16:58 -0600
Re: Python script help Jake Angulo <jake.angulo@gmail.com> - 2013-08-05 10:30 +1000
| From | cool1574@gmail.com |
|---|---|
| Date | 2013-07-30 07:49 -0700 |
| Subject | Python script help |
| Message-ID | <4566d0e7-2576-4d09-83f5-fca3b370710a@googlegroups.com> |
Hello, I am looking for a script that will be able to search an online document (by giving the script the URL) and find all the downloadable links in the document and then download them automatically. I appreciate your help, Thank you.
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-30 16:38 +0100 |
| Message-ID | <mailman.5315.1375198746.3114.python-list@python.org> |
| In reply to | #51565 |
On Tue, Jul 30, 2013 at 3:49 PM, <cool1574@gmail.com> wrote:
> Hello, I am looking for a script that will be able to search an online document (by giving the script the URL) and find all the downloadable links in the document and then download them automatically.
> I appreciate your help,
> Thank you.
import os
baseurl = "http://........"
options = "....."
os.system("wget "+options+" "+baseurl)
Sometimes the right tool for the job isn't Python.
ChrisA
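Chris's snippet builds a shell command by string concatenation and hands it to `os.system`, so a URL containing spaces or shell metacharacters can break the command. A minimal Python 3 sketch of the same idea passes an argument list to `subprocess.run` instead, which bypasses the shell. The helper names and the default flags (`-r -l 1 -nd`, i.e. follow links one level deep without recreating the site's directory tree) are assumptions for illustration, not part of the original post; `-P` sets wget's download directory.

```python
import subprocess
from typing import List, Optional, Sequence

# Assumed default flags (not from the thread):
# -r recurse, -l 1 only one level deep, -nd don't recreate directories.
DEFAULT_WGET_OPTIONS = ["-r", "-l", "1", "-nd"]


def build_wget_command(url: str, dest_dir: str,
                       options: Optional[Sequence[str]] = None) -> List[str]:
    """Return the wget argument list; no shell quoting is involved."""
    opts = list(DEFAULT_WGET_OPTIONS if options is None else options)
    # -P is wget's directory prefix: downloads are saved under dest_dir.
    return ["wget", *opts, "-P", dest_dir, url]


def run_wget(url: str, dest_dir: str,
             options: Optional[Sequence[str]] = None) -> None:
    """Run wget; raises CalledProcessError if wget exits non-zero."""
    subprocess.run(build_wget_command(url, dest_dir, options), check=True)
```

For example, `run_wget("http://www.example.com/", "downloads")` would save the page and the files it links to on the same host (wget does not follow links to other hosts by default) into `downloads/`.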
| From | cool1574@gmail.com |
|---|---|
| Date | 2013-07-30 08:49 -0700 |
| Message-ID | <c382dc59-d52c-47e6-8e96-16d4ef07e640@googlegroups.com> |
| In reply to | #51569 |
I know but I think using Python in this situation is good...is that the full script?
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-30 16:58 +0100 |
| Message-ID | <mailman.5318.1375199904.3114.python-list@python.org> |
| In reply to | #51571 |
On Tue, Jul 30, 2013 at 4:49 PM, <cool1574@gmail.com> wrote:
> I know but I think using Python in this situation is good...is that the full script?
That script just drops out to the system and lets wget do it. So don't bother with it.
ChrisA
| From | cool1574@gmail.com |
|---|---|
| Date | 2013-07-30 09:10 -0700 |
| Message-ID | <ea0a2387-4ce3-4441-ac8d-64eae6ca0aee@googlegroups.com> |
| In reply to | #51573 |
What if I want to use only Python? is that possible? using lib and lib2?
| From | cool1574@gmail.com |
|---|---|
| Date | 2013-07-30 09:12 -0700 |
| Message-ID | <7e9a50d5-3340-4cf0-a3b8-ef1109837529@googlegroups.com> |
| In reply to | #51576 |
** urllib, urllib2
| From | Cameron Simpson <cs@zip.com.au> |
|---|---|
| Date | 2013-07-31 07:47 +1000 |
| Message-ID | <mailman.5345.1375220892.3114.python-list@python.org> |
| In reply to | #51577 |
On 30Jul2013 09:12, cool1574@gmail.com <cool1574@gmail.com> wrote:
| ** urlib, urlib2
Sure. And I'd use BeautifulSoup to do the parse. You'll need to fetch that.
So: urllib[2] to fetch the document and BS to parse it for links,
then urllib[2] to fetch the links you want.
http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/
Cheers,
--
Cameron Simpson <cs@zip.com.au>
You can be psychotic and still be competent.
- John L. Young, American Academy of Psychiatry and the Law on Ted
Kaczynski, and probably most internet users
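Cameron's recipe is: fetch the document, parse it for links, then fetch the links. BeautifulSoup is a third-party package; as a dependency-free stand-in (a swapped-in technique, not what Cameron suggested), the standard library's `html.parser.HTMLParser` can handle the parsing step in Python 3. `LinkCollector` and `extract_links` are hypothetical names used for this sketch, which only looks at `href` attributes on `<a>` tags.

```python
from html.parser import HTMLParser
from typing import List
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute URL of every <a href=...> in a page."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names;
        # attrs is a list of (name, value) pairs and value may be None.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links such as "files/a.pdf" against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html: str, base_url: str) -> List[str]:
    """Return the absolute link targets found in html, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links
```

The page text itself would come from `urllib.request.urlopen(url).read()` decoded to `str` (in Python 2, `urllib2.urlopen`).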
| From | Joshua Landau <joshua@landau.ws> |
|---|---|
| Date | 2013-07-31 07:24 +0100 |
| Message-ID | <mailman.5352.1375251900.3114.python-list@python.org> |
| In reply to | #51577 |
On 30 July 2013 22:47, Cameron Simpson <cs@zip.com.au> wrote:
> On 30Jul2013 09:12, cool1574@gmail.com <cool1574@gmail.com> wrote:
> | ** urlib, urlib2
>
> Sure. And I'd use BeautifulSoup to do the parse. You'll need to fetch that.
> So: urllib[2] to fetch the document and BS to parse it for links,
> then urllib[2] to fetch the links you want.
>
> http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/
Personally BeautifulSoup + requests is a great combination. Maybe I'm just lazy ;).
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-07-30 17:22 +0100 |
| Message-ID | <mailman.5322.1375201340.3114.python-list@python.org> |
| In reply to | #51576 |
On Tue, Jul 30, 2013 at 5:10 PM, <cool1574@gmail.com> wrote:
> What if I want to use only Python? is that possible? using lib and lib2?
> --
> http://mail.python.org/mailman/listinfo/python-list
Sure, anything's possible. And a lot easier if you quote context in your posts.
But why do it? wget is exactly what you need.
ChrisA
| From | Vincent Vande Vyvre <vincent.vandevyvre@swing.be> |
|---|---|
| Date | 2013-07-30 18:58 +0200 |
| Message-ID | <mailman.5325.1375203985.3114.python-list@python.org> |
| In reply to | #51576 |
On 30/07/2013 18:10, cool1574@gmail.com wrote:
> What if I want to use only Python? is that possible? using lib and lib2?
Have a look here:
http://bazaar.launchpad.net/~vincent-vandevyvre/qarte/trunk/view/head:/parsers.py
This script gets a web page and parses it to find downloadable objects.
--
Vincent V.V.
Oqapy <https://launchpad.net/oqapy> . Qarte <https://launchpad.net/qarte> . PaQager <https://launchpad.net/paqager>
| From | Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> |
|---|---|
| Date | 2013-07-30 17:32 +0200 |
| Message-ID | <it2lca-4sl.ln1@satorlaser.homedns.org> |
| In reply to | #51565 |
On 30.07.2013 16:49, cool1574@gmail.com wrote:
> Hello, I am looking for a script that will be able to search an
> online document (by giving the script the URL) and find all the
> downloadable links in the document and then download them
> automatically.
Well, that's actually pretty simple. Using the URL, download the document. Then, parse it in order to extract embedded URLs and finally download the resulting URLs.
If you have specific problems, please provide more info about which part exactly you're having problems with, along with what you have already tried, etc. In short, show some effort yourself. In the meantime, I'd suggest reading a Python tutorial and Eric Raymond's essay on asking smart questions.
Greetings!
Uli
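Ulrich's first step ("using the URL, download the document") looks like this in Python 3, where `urllib2.urlopen` became `urllib.request.urlopen`. A minimal sketch; the name `fetch`, the 30-second timeout, and re-raising failures as `RuntimeError` with the URL in the message are illustrative choices, not anything from the thread.

```python
import urllib.error
import urllib.request


def fetch(url: str, timeout: float = 30.0) -> bytes:
    """Download url and return the raw body; raise RuntimeError on failure."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (404, 500, ...).
        raise RuntimeError(f"{url}: HTTP {exc.code}") from exc
    except urllib.error.URLError as exc:
        # No usable answer at all (DNS failure, refused connection,
        # unsupported URL scheme, ...).
        raise RuntimeError(f"{url}: {exc.reason}") from exc
```

`HTTPError` is a subclass of `URLError`, so it has to be caught first. Because `urlopen` also understands `data:` URLs, the function can be tried offline, e.g. `fetch("data:text/plain;base64,SGVsbG8=")` returns `b"Hello"`.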
| From | Denis McMahon <denismfmcmahon@gmail.com> |
|---|---|
| Date | 2013-07-31 05:08 +0000 |
| Message-ID | <kta64v$1ci$4@dont-email.me> |
| In reply to | #51565 |
On Tue, 30 Jul 2013 07:49:04 -0700, cool1574 wrote:
> Hello, I am looking for a script that will be able to search an online
> document (by giving the script the URL) and find all the downloadable
> links in the document and then download them automatically.
> I appreciate your help,
Why use Python? Just:
wget -m url
--
Denis McMahon, denismfmcmahon@gmail.com
| From | cool1574@gmail.com |
|---|---|
| Date | 2013-07-31 01:15 -0700 |
| Message-ID | <e47a83a9-14cc-4596-b17c-d38c5f300151@googlegroups.com> |
| In reply to | #51565 |
Here are some scripts. How do I put them together to create the script I want (to search an online document and download all the links in it)?
P.S.: Can I set a destination folder for the downloads?
urllib.urlopen("http://....")
possible_urls = re.findall(r'\S+:\S+', text)
import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()
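On the P.S. question: the downloads do not have to land in the current directory. As an alternative to changing the working directory, a minimal Python 3 sketch can write each downloaded body into a chosen folder with `pathlib`, creating the folder if needed. `save_to_folder` is a hypothetical helper name, and keeping only the last path component of the filename is an assumed safety measure.

```python
from pathlib import Path


def save_to_folder(data: bytes, filename: str, dest_dir: str) -> Path:
    """Write data to dest_dir/filename and return the path written."""
    # Keep only the last path component, so "../x" or "a/b" cannot point
    # outside dest_dir.
    name = Path(filename).name
    if name in ("", ".", ".."):
        raise ValueError(f"unusable filename: {filename!r}")
    folder = Path(dest_dir)
    folder.mkdir(parents=True, exist_ok=True)  # create the folder if missing
    target = folder / name
    target.write_bytes(data)
    return target
```

For example, `save_to_folder(html, "page.html", "downloads")` writes `downloads/page.html` without touching the process's working directory.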
| From | alex23 <wuwei23@gmail.com> |
|---|---|
| Date | 2013-08-01 10:57 +1000 |
| Message-ID | <ktcbcl$gc1$1@dont-email.me> |
| In reply to | #51631 |
On 31/07/2013 6:15 PM, cool1574@gmail.com wrote:
> Here are some scripts, how do I put them together to create the script I want? (to search a online document and download all the links in it)
1. Think about the requirements.
2. Write some code.
3. Test it.
4. Repeat until requirements are met.
> p.s: can I set a destination folder for the downloads?
Yes.
Show us you're actively trying to solve this yourself rather than just asking us to write the code for you.
| From | Alister <alister.ware@ntlworld.com> |
|---|---|
| Date | 2013-08-01 10:39 +0000 |
| Message-ID | <wVqKt.32469$FS6.25619@fx10.am4> |
| In reply to | #51710 |
On Thu, 01 Aug 2013 10:57:01 +1000, alex23 wrote:
> On 31/07/2013 6:15 PM, cool1574@gmail.com wrote:
>> Here are some scripts, how do I put them together to create the script
>> I want? (to search a online document and download all the links in it)
>
> 1. Think about the requirements.
> 2. Write some code.
> 3. Test it.
> 4. Repeat until requirements are met.
>
>> p.s: can I set a destination folder for the downloads?
>
> Yes.
>
> Show us you're actively trying to solve this yourself rather than just
> asking us to write the code for you.
Alternatively, I can provide a quotation to produce a product to your specification.
(My rates are extremely high)
--
Hand me a pair of leather pants and a CASIO keyboard -- I'm living for today!
| From | Piet van Oostrum <piet@vanoostrum.org> |
|---|---|
| Date | 2013-08-23 22:37 -0400 |
| Message-ID | <m2ioyv3jc5.fsf@cochabamba.vanoostrum.org> |
| In reply to | #51631 |
cool1574@gmail.com writes:
> Here are some scripts, how do I put them together to create the script
> I want? (to search a online document and download all the links in it)
> p.s: can I set a destination folder for the downloads?
You can use os.chdir to go to the desired folder.
>
> urllib.urlopen("http://....")
>
> possible_urls = re.findall(r'\S+:\S+', text)
>
> import urllib2
> response = urllib2.urlopen('http://www.example.com/')
> html = response.read()
If you insist on not using wget, here is a simple script with
BeautifulSoup (v4):
########################################################################
# Python 2 script (urllib2/urlparse); needs BeautifulSoup 4 (bs4).
from bs4 import BeautifulSoup
from urllib2 import urlopen
from urlparse import urljoin
import os
import re

# Save everything into the directory OUT, which must already exist.
os.chdir('OUT')

def generate_filename(url):
    # Strip the scheme (e.g. "http://") and flatten the path into one name.
    url = re.sub(r'^[a-zA-Z0-9+.-]+:/*', '', url)
    return url.replace('/', '_')

URL = "http://www.example.com/"
soup = BeautifulSoup(urlopen(URL).read())
links = soup.select('a[href]')          # every <a> element with an href
for link in links:
    url = urljoin(URL, link['href'])    # resolve relative links
    print url
    html = urlopen(url).read()
    fn = generate_filename(url)
    with open(fn, 'wb') as outfile:
        outfile.write(html)
########################################################################
You should add a more intelligent filename generator, filter out mailto:
URLs (and possibly others) and add exception handling for HTTP errors.
--
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
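Piet's closing suggestions (filtering out `mailto:` links and generating better filenames) could look like this in Python 3. This is a sketch under stated assumptions: the http/https allow-list, the `index.html` default for URLs ending in `/`, and the rule of replacing every character outside `A-Z a-z 0-9 . _ -` with `_` are choices made for illustration, and the helper names are hypothetical. Unlike `generate_filename` in Piet's script, it keeps the query string (so `?id=1` and `?id=2` don't overwrite each other) and drops any `#fragment`.

```python
import re
from urllib.parse import urlsplit

# Assumed allow-list: skip mailto:, javascript:, ftp: and anything else.
ALLOWED_SCHEMES = {"http", "https"}


def is_downloadable(url: str) -> bool:
    """True if the URL uses one of the allowed schemes."""
    # urlsplit already lower-cases the scheme.
    return urlsplit(url).scheme in ALLOWED_SCHEMES


def safe_filename(url: str, default: str = "index.html") -> str:
    """Flatten host + path (+ query) into one filesystem-friendly name."""
    parts = urlsplit(url)  # any #fragment lands in parts.fragment and is ignored
    path = parts.path.rstrip("/") or "/" + default
    name = parts.netloc + path
    if parts.query:
        name += "_" + parts.query
    # Replace "/" and any other character outside a conservative set with "_".
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name)
    return name or default
```

In the download loop, `is_downloadable(url)` would be checked before fetching, and `safe_filename(url)` would take the place of `generate_filename(url)`.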
| From | cool1574@gmail.com |
|---|---|
| Date | 2013-08-01 09:02 -0700 |
| Message-ID | <f6842c80-369b-4fcc-a555-1c953cc1a865@googlegroups.com> |
| In reply to | #51565 |
I know I should be testing out the script myself, but I did try, and since I am new to Python and I work for a security firm that asks me to scan hundreds of documents a day for unsafe links (by opening them), I thought writing a script would be much easier. I do not know how to combine those three scripts (the ones I wrote in my previous reply); that is why I came here for help. Please help me build a working script that will do the job. Thanks in advance. You can contact me at cool1574@gmail.com
| From | Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> |
|---|---|
| Date | 2013-08-02 10:44 +0200 |
| Message-ID | <a58sca-4n7.ln1@satorlaser.homedns.org> |
| In reply to | #51741 |
On 01.08.2013 18:02, cool1574@gmail.com wrote:
> I know I should be testing out the script myself but I did, I tried
> and since I am new in python and I work for a security firm that ask
> me to scan hundreds of documents a day for unsafe links (by opening
> them) I thought writing a script will be much easier. I do not know
> how to combine those three scripts together (the ones I wrote in my
> previous replay) that is why I cam to here for help. please help me
> build a working script that will do the job.
The first option is to hire a programmer, which should give you the quickest results. If the most important thing is getting the job done, then this should be your #1 approach.
Now, if you really want to do it yourself, you will have to do some learning yourself. Start with http://docs.python.org, which includes tutorials, references and a bunch of other links; in particular, go through the tutorials. Make sure you pick the documentation corresponding to your Python version, though; versions 2 and 3 are subtly different!
Then, read http://www.catb.org/esr/faqs/smart-questions.html. This is a bit metatopical but still important, and while it won't make you a programmer in an afternoon, it will help you understand the various reactions you received here.
Hope that gets you started
Uli
| From | cool1574@gmail.com |
|---|---|
| Date | 2013-08-02 02:46 -0700 |
| Message-ID | <1faf05ad-2cd4-497e-a605-db4650c04103@googlegroups.com> |
| In reply to | #51565 |
I do know some Python programming; I just don't know enough to put together the various scripts I need... I would really, really appreciate it if someone could help me with that...
| From | Chris Angelico <rosuav@gmail.com> |
|---|---|
| Date | 2013-08-02 11:01 +0100 |
| Message-ID | <mailman.111.1375437723.1251.python-list@python.org> |
| In reply to | #51790 |
On Fri, Aug 2, 2013 at 10:46 AM, <cool1574@gmail.com> wrote:
> I do know some Python programming, I just dont know enough to put together the various scripts I need...I would really really appreciate if some one can help me with that...
Be aware that you might be paying money for that. If you know "some" carpentry but not enough to put together a bookcase, and you ask a professional carpenter to make you a bookcase, you'll have to pay him. The same is true in programming, except that there are more people willing to work for nothing, hence the vague "might be" rather than the inevitable "shall" or the mighty "must" [1].
To get people to work for you for free, you have to make them (us) want to, which in the geeky arts generally means making it an interesting problem. Achieving this is described well in esr's essay on asking smart questions [2], which Ulrich also just pointed you to. We do this sort of thing for fun, for love, so if you make your problem appeal to us, there's a high chance that someone will provide you with code.
[1] http://www.youtube.com/watch?v=GVVTYII422k
[2] http://www.catb.org/esr/faqs/smart-questions.html
ChrisA