Groups > comp.lang.python > #51565 > unrolled thread

Python script help

Started by	cool1574@gmail.com
First post	2013-07-30 07:49 -0700
Last post	2013-08-05 10:30 +1000
Articles	20 on this page of 24 — 12 participants

  Python script help cool1574@gmail.com - 2013-07-30 07:49 -0700
    Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 16:38 +0100
      Re: Python script help cool1574@gmail.com - 2013-07-30 08:49 -0700
        Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 16:58 +0100
          Re: Python script help cool1574@gmail.com - 2013-07-30 09:10 -0700
            Re: Python script help cool1574@gmail.com - 2013-07-30 09:12 -0700
              Re: Python script help Cameron Simpson <cs@zip.com.au> - 2013-07-31 07:47 +1000
              Re: Python script help Joshua Landau <joshua@landau.ws> - 2013-07-31 07:24 +0100
            Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 17:22 +0100
            Re: Python script help Vincent Vande Vyvre <vincent.vandevyvre@swing.be> - 2013-07-30 18:58 +0200
    Re: Python script help Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-07-30 17:32 +0200
    Re: Python script help Denis McMahon <denismfmcmahon@gmail.com> - 2013-07-31 05:08 +0000
    Re: Python script help cool1574@gmail.com - 2013-07-31 01:15 -0700
      Re: Python script help alex23 <wuwei23@gmail.com> - 2013-08-01 10:57 +1000
        Re: Python script help Alister <alister.ware@ntlworld.com> - 2013-08-01 10:39 +0000
      Re: Python script help Piet van Oostrum <piet@vanoostrum.org> - 2013-08-23 22:37 -0400
    Re: Python script help cool1574@gmail.com - 2013-08-01 09:02 -0700
      Re: Python script help Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-08-02 10:44 +0200
    Re: Python script help cool1574@gmail.com - 2013-08-02 02:46 -0700
      Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-08-02 11:01 +0100
        Re: Python script help cool1574@gmail.com - 2013-08-04 08:57 -0700
          Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-08-04 17:20 +0100
      Re: Python script help Michael Torrie <torriem@gmail.com> - 2013-08-04 16:58 -0600
      Re: Python script help Jake Angulo <jake.angulo@gmail.com> - 2013-08-05 10:30 +1000

#51565 — Python script help

From	cool1574@gmail.com
Date	2013-07-30 07:49 -0700
Subject	Python script help
Message-ID	<4566d0e7-2576-4d09-83f5-fca3b370710a@googlegroups.com>

Hello, I am looking for a script that will be able to search an online document (by giving the script the URL) and find all the downloadable links in the document and then download them automatically.
I appreciate your help,
Thank you.

#51569

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-30 16:38 +0100
Message-ID	<mailman.5315.1375198746.3114.python-list@python.org>
In reply to	#51565

On Tue, Jul 30, 2013 at 3:49 PM,  <cool1574@gmail.com> wrote:
> Hello, I am looking for a script that will be able to search an online document (by giving the script the URL) and find all the downloadable links in the document and then download them automatically.
> I appreciate your help,
> Thank you.

import os

baseurl = "http://........"
options = "....."
os.system("wget "+options+" "+baseurl)
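
Building a shell command by string concatenation breaks as soon as the URL or the options contain spaces or shell metacharacters. A minimal sketch of the same idea with an argument list and no shell, written for Python 3; the helper names build_wget_command and run_wget are hypothetical, and the wget options in the usage note are only an example, not something the post specifies:

```python
import shlex
import subprocess


def build_wget_command(baseurl, options=""):
    """Return the wget argument list for the given URL and option string."""
    # shlex.split turns "-r -l 1" into ["-r", "-l", "1"], honouring quotes.
    return ["wget"] + shlex.split(options) + [baseurl]


def run_wget(baseurl, options=""):
    """Run wget without a shell; raises CalledProcessError if wget fails."""
    subprocess.run(build_wget_command(baseurl, options), check=True)
```

For example, run_wget("http://www.example.com/", "-r -l 1 -P downloads") would ask wget to follow links one level deep and save under a downloads folder, assuming wget is installed and on the PATH.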

Sometimes the right tool for the job isn't Python.

ChrisA

#51571

From	cool1574@gmail.com
Date	2013-07-30 08:49 -0700
Message-ID	<c382dc59-d52c-47e6-8e96-16d4ef07e640@googlegroups.com>
In reply to	#51569

I know, but I think using Python in this situation is good... Is that the full script?

#51573

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-30 16:58 +0100
Message-ID	<mailman.5318.1375199904.3114.python-list@python.org>
In reply to	#51571

On Tue, Jul 30, 2013 at 4:49 PM,  <cool1574@gmail.com> wrote:
> I know but I think using Python in this situation is good...is that the full script?

That script just drops out to the system and lets wget do it. So don't
bother with it.

ChrisA

#51576

From	cool1574@gmail.com
Date	2013-07-30 09:10 -0700
Message-ID	<ea0a2387-4ce3-4441-ac8d-64eae6ca0aee@googlegroups.com>
In reply to	#51573

What if I want to use only Python? Is that possible, using lib and lib2?

#51577

From	cool1574@gmail.com
Date	2013-07-30 09:12 -0700
Message-ID	<7e9a50d5-3340-4cf0-a3b8-ef1109837529@googlegroups.com>
In reply to	#51576

** urllib, urllib2

#51610

From	Cameron Simpson <cs@zip.com.au>
Date	2013-07-31 07:47 +1000
Message-ID	<mailman.5345.1375220892.3114.python-list@python.org>
In reply to	#51577

On 30Jul2013 09:12, cool1574@gmail.com <cool1574@gmail.com> wrote:
| ** urllib, urllib2

Sure. And I'd use BeautifulSoup to do the parse. You'll need to fetch that.
So: urllib[2] to fetch the document and BS to parse it for links,
then urllib[2] to fetch the links you want.

http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/
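
A minimal sketch of the "parse it for links" step, written for Python 3. To keep it runnable without installing anything, it uses the standard library's html.parser as a stand-in for BeautifulSoup; the names LinkCollector and extract_links are hypothetical, and fetching the page is left out so the example works on any HTML string:

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the absolute target of every <a href="..."> in a page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page's own URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return the absolute URLs of all links found in the html string."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links
```

The base URL matters because many pages use relative links such as /files/a.pdf, which cannot be fetched until they are joined with the address of the page they came from.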

Cheers,
-- 
Cameron Simpson <cs@zip.com.au>

You can be psychotic and still be competent.
        - John L. Young, American Academy of Psychiatry and the Law on Ted
          Kaczynski, and probably most internet users

#51625

From	Joshua Landau <joshua@landau.ws>
Date	2013-07-31 07:24 +0100
Message-ID	<mailman.5352.1375251900.3114.python-list@python.org>
In reply to	#51577

On 30 July 2013 22:47, Cameron Simpson <cs@zip.com.au> wrote:

> On 30Jul2013 09:12, cool1574@gmail.com <cool1574@gmail.com> wrote:
> | ** urllib, urllib2
>
> Sure. And I'd use BeautifulSoup to do the parse. You'll need to fetch that.
> So: urllib[2] to fetch the document and BS to parse it for links,
> then urllib[2] to fetch the links you want.
>
> http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/


Personally, I find BeautifulSoup + requests a great combination. Maybe I'm just
lazy ;).

#51580

From	Chris Angelico <rosuav@gmail.com>
Date	2013-07-30 17:22 +0100
Message-ID	<mailman.5322.1375201340.3114.python-list@python.org>
In reply to	#51576

On Tue, Jul 30, 2013 at 5:10 PM,  <cool1574@gmail.com> wrote:
> What if I want to use only Python? is that possible? using lib and lib2?
> --
> http://mail.python.org/mailman/listinfo/python-list

Sure, anything's possible. And a lot easier if you quote context in
your posts. But why do it? wget is exactly what you need.

ChrisA

#51583

From	Vincent Vande Vyvre <vincent.vandevyvre@swing.be>
Date	2013-07-30 18:58 +0200
Message-ID	<mailman.5325.1375203985.3114.python-list@python.org>
In reply to	#51576

On 30/07/2013 18:10, cool1574@gmail.com wrote:
> What if I want to use only Python? is that possible? using lib and lib2?

Have a look here:

http://bazaar.launchpad.net/~vincent-vandevyvre/qarte/trunk/view/head:/parsers.py

This script gets a web page and parses it to find downloadable objects.
-- 
Vincent V.V.
Oqapy <https://launchpad.net/oqapy> . Qarte 
<https://launchpad.net/qarte> . PaQager <https://launchpad.net/paqager>

#51579

From	Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com>
Date	2013-07-30 17:32 +0200
Message-ID	<it2lca-4sl.ln1@satorlaser.homedns.org>
In reply to	#51565

On 30.07.2013 16:49, cool1574@gmail.com wrote:
> Hello, I am looking for a script that will be able to search an
> online document (by giving the script the URL) and find all the
> downloadable links in the document and then download them
> automatically.

Well, that's actually pretty simple. Using the URL, download the 
document. Then, parse it in order to extract embedded URLs and finally 
download the resulting URLs.
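
The three steps above can be sketched roughly as follows, assuming Python 3 (where urllib2 became urllib.request). The names fetch and download_linked are hypothetical. How the links are extracted is deliberately left as a parameter, since that is the part with the most design choices (regular expression, HTML parser, BeautifulSoup):

```python
from urllib.request import urlopen


def fetch(url):
    """Return the raw bytes served at url (needs network access)."""
    with urlopen(url) as response:
        return response.read()


def download_linked(start_url, extract_links, fetch=fetch):
    """Fetch start_url, then fetch every link found in it.

    extract_links(document_bytes, base_url) must return a list of
    absolute URLs. The result maps each linked URL to its content.
    """
    document = fetch(start_url)                # step 1: download the document
    urls = extract_links(document, start_url)  # step 2: extract embedded URLs
    return {url: fetch(url) for url in urls}   # step 3: download each URL
```

Passing fetch in as an argument also makes it possible to try the logic on canned data before pointing it at a real site.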

If you have specific problems, please say exactly which part you're 
having problems with, along with what you have already tried. In short, 
show some effort yourself. In the meantime, I'd suggest reading a Python 
tutorial and Eric Raymond's essay on asking smart questions.

Greetings!

Uli

#51620

From	Denis McMahon <denismfmcmahon@gmail.com>
Date	2013-07-31 05:08 +0000
Message-ID	<kta64v$1ci$4@dont-email.me>
In reply to	#51565

On Tue, 30 Jul 2013 07:49:04 -0700, cool1574 wrote:

> Hello, I am looking for a script that will be able to search an online
> document (by giving the script the URL) and find all the downloadable
> links in the document and then download them automatically.
> I appreciate your help,

Why use Python? Just:

wget -m url

-- 
Denis McMahon, denismfmcmahon@gmail.com

#51631

From	cool1574@gmail.com
Date	2013-07-31 01:15 -0700
Message-ID	<e47a83a9-14cc-4596-b17c-d38c5f300151@googlegroups.com>
In reply to	#51565

Here are some scripts. How do I put them together to create the script I want (to search an online document and download all the links in it)?
P.S.: Can I set a destination folder for the downloads?

urllib.urlopen("http://....")

possible_urls = re.findall(r'\S+:\S+', text)

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

#51710

From	alex23 <wuwei23@gmail.com>
Date	2013-08-01 10:57 +1000
Message-ID	<ktcbcl$gc1$1@dont-email.me>
In reply to	#51631

On 31/07/2013 6:15 PM, cool1574@gmail.com wrote:
> Here are some scripts, how do I put them together to create the script I want? (to search a online document and download all the links in it)

1. Think about the requirements.
2. Write some code.
3. Test it.
4. Repeat until requirements are met.

> p.s: can I set a destination folder for the downloads?

Yes.

Show us you're actively trying to solve this yourself rather than just 
asking us to write the code for you.

#51727

From	Alister <alister.ware@ntlworld.com>
Date	2013-08-01 10:39 +0000
Message-ID	<wVqKt.32469$FS6.25619@fx10.am4>
In reply to	#51710

On Thu, 01 Aug 2013 10:57:01 +1000, alex23 wrote:

> On 31/07/2013 6:15 PM, cool1574@gmail.com wrote:
>> Here are some scripts, how do I put them together to create the script
>> I want? (to search a online document and download all the links in it)
> 
> 1. Think about the requirements.
> 2. Write some code.
> 3. Test it.
> 4. Repeat until requirements are met.
> 
>> p.s: can I set a destination folder for the downloads?
> 
> Yes.
> 
> Show us you're actively trying to solve this yourself rather than just
> asking us to write the code for you.

Alternatively, I can provide a quotation to produce a product to your 
specification.
(My rates are extremely high.)



-- 
Hand me a pair of leather pants and a CASIO keyboard -- I'm living for 
today!

#52923

From	Piet van Oostrum <piet@vanoostrum.org>
Date	2013-08-23 22:37 -0400
Message-ID	<m2ioyv3jc5.fsf@cochabamba.vanoostrum.org>
In reply to	#51631

cool1574@gmail.com writes:

> Here are some scripts, how do I put them together to create the script
> I want? (to search a online document and download all the links in it)
> p.s: can I set a destination folder for the downloads?

You can use os.chdir to go to the desired folder.
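
As an alternative to changing the working directory, the destination folder can be joined onto each file name. A minimal sketch for Python 3; the helper name save_download is hypothetical and not part of any library:

```python
from pathlib import Path


def save_download(data, filename, dest_dir):
    """Write data (bytes) to dest_dir/filename and return the full path."""
    folder = Path(dest_dir)
    # Create the folder (and any missing parents) if it is not there yet.
    folder.mkdir(parents=True, exist_ok=True)
    # Path(filename).name drops any directory part of the name.
    target = folder / Path(filename).name
    target.write_bytes(data)
    return target
```

This leaves the rest of the program free to use relative paths for other things, which os.chdir would silently redirect.
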
>
> urllib.urlopen("http://....")
>
> possible_urls = re.findall(r'\S+:\S+', text)
>
> import urllib2
> response = urllib2.urlopen('http://www.example.com/')
> html = response.read()

If you insist on not using wget, here is a simple script with
BeautifulSoup (v4):

########################################################################
from bs4 import BeautifulSoup
from urllib2 import urlopen
from urlparse import urljoin
import os
import re

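# Save everything into the OUT folder (it must already exist).
os.chdir('OUT')

def generate_filename(url):
    # Strip the scheme ("http://") and flatten the path into one name.
    url = re.sub('^[a-zA-Z0-9+.-]+:/*', '', url)
    return url.replace('/', '_')

URL = "http://www.example.com/"
soup = BeautifulSoup(urlopen(URL).read())

# Every <a> element that has an href attribute.
links = soup.select('a[href]')
for link in links:
    # Resolve relative links against the page URL.
    url = urljoin(URL, link['href'])
    print url
    html = urlopen(url).read()
    fn = generate_filename(url)
    with open(fn, 'wb') as outfile:
        outfile.write(html)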
########################################################################

You should add a more intelligent filename generator, filter out mailto:
URLs and possibly others, and add exception handling for HTTP errors.
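
A rough sketch of what those three improvements might look like, assuming Python 3 (urllib.request and urllib.parse instead of urllib2 and urlparse). The names is_downloadable and try_fetch, the choice to keep only http and https, and the "_index.html" fallback name are all illustrative assumptions:

```python
import posixpath
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import urlopen


def is_downloadable(url):
    """Keep only http(s) URLs; this drops mailto:, javascript: and the like."""
    return urlparse(url).scheme in ("http", "https")


def generate_filename(url):
    """Use the last path component, or a host-based name if there is none."""
    parts = urlparse(url)
    name = posixpath.basename(parts.path)
    return name or parts.netloc.replace(":", "_") + "_index.html"


def try_fetch(url):
    """Return the body as bytes, or None if the request fails."""
    try:
        with urlopen(url) as response:
            return response.read()
    except (HTTPError, URLError) as err:
        # Report the failure and carry on with the remaining links.
        print("skipping", url, "-", err)
        return None
```

Two different URLs can still end in the same file name (for example two index.html pages), so a real script would also need to handle name collisions.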
-- 
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]

#51741

From	cool1574@gmail.com
Date	2013-08-01 09:02 -0700
Message-ID	<f6842c80-369b-4fcc-a555-1c953cc1a865@googlegroups.com>
In reply to	#51565


I know I should be testing the script myself, and I did try. I am new to Python, and I work for a security firm that asks me to scan hundreds of documents a day for unsafe links (by opening them), so I thought writing a script would be much easier. I do not know how to combine those three scripts (the ones I posted in my previous reply), which is why I came here for help. Please help me build a working script that will do the job.
Thanks in advance.
You can contact me at cool1574@gmail.com

#51787

From	Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com>
Date	2013-08-02 10:44 +0200
Message-ID	<a58sca-4n7.ln1@satorlaser.homedns.org>
In reply to	#51741

On 01.08.2013 18:02, cool1574@gmail.com wrote:
> I know I should be testing out the script myself but I did, I tried
> and since I am new in python and I work for a security firm that ask
> me to scan hundreds of documents a day for unsafe links (by opening
> them) I thought writing a script will be much easier. I do not know
> how to combine those three scripts together (the ones I wrote in my
> previous replay) that is why I cam to here for help. please help me
> build a working script that will do the job.

The first option is to hire a programmer, which should give you the 
quickest results. If the most important thing is getting the job done, 
then this should be your #1 approach.

Now, if you really want to do it yourself, you will have to do some 
learning yourself. Start with http://docs.python.org, which includes 
tutorials, references and a bunch of other links; in particular, go 
through the tutorials. Make sure you pick the documentation 
corresponding to your Python version, though: versions 2 and 3 are subtly 
different!

Then, read http://www.catb.org/esr/faqs/smart-questions.html. This is a 
bit metatopical but still important, and while it won't make you a 
programmer in an afternoon, it will help you understand various 
reactions you received here.

Hope that gets you started.

Uli

#51790

From	cool1574@gmail.com
Date	2013-08-02 02:46 -0700
Message-ID	<1faf05ad-2cd4-497e-a605-db4650c04103@googlegroups.com>
In reply to	#51565

I do know some Python programming, I just don't know enough to put together the various scripts I need... I would really, really appreciate it if someone could help me with that.

#51791

From	Chris Angelico <rosuav@gmail.com>
Date	2013-08-02 11:01 +0100
Message-ID	<mailman.111.1375437723.1251.python-list@python.org>
In reply to	#51790

On Fri, Aug 2, 2013 at 10:46 AM,  <cool1574@gmail.com> wrote:
> I do know some Python programming, I just dont know enough to put together the various scripts I need...I would really really appreciate if some one can help me with that...

Be aware that you might be paying money for that. If you know "some"
carpentry but not enough to put together a bookcase, and you ask a
professional carpenter to make you a bookcase, you'll have to pay him.
The same is true in programming, except that there are more people
willing to work for nothing, hence the vague "might be" rather than
the inevitable "shall" or the mighty "must" [1]. To get people to work
for you for free, you have to make them (us) want to, which in the
geeky arts generally means making it an interesting problem. Achieving
this is described well in esr's essay on asking smart questions [2],
which Ulrich also just pointed you to. We do this sort of thing for
fun, for love, so if you make your problem appeal to us, there's a
high chance that someone will provide you with code.

[1] http://www.youtube.com/watch?v=GVVTYII422k
[2] http://www.catb.org/esr/faqs/smart-questions.html

ChrisA
