Newsgroup: comp.lang.python

Python script help

Started by: cool1574@gmail.com
First post: 2013-07-30 07:49 -0700
Last post: 2013-08-05 10:30 +1000
Articles 20 on this page of 24 — 12 participants

Contents

  Python script help cool1574@gmail.com - 2013-07-30 07:49 -0700
    Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 16:38 +0100
      Re: Python script help cool1574@gmail.com - 2013-07-30 08:49 -0700
        Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 16:58 +0100
          Re: Python script help cool1574@gmail.com - 2013-07-30 09:10 -0700
            Re: Python script help cool1574@gmail.com - 2013-07-30 09:12 -0700
              Re: Python script help Cameron Simpson <cs@zip.com.au> - 2013-07-31 07:47 +1000
              Re: Python script help Joshua Landau <joshua@landau.ws> - 2013-07-31 07:24 +0100
            Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-07-30 17:22 +0100
            Re: Python script help Vincent Vande Vyvre <vincent.vandevyvre@swing.be> - 2013-07-30 18:58 +0200
    Re: Python script help Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-07-30 17:32 +0200
    Re: Python script help Denis McMahon <denismfmcmahon@gmail.com> - 2013-07-31 05:08 +0000
    Re: Python script help cool1574@gmail.com - 2013-07-31 01:15 -0700
      Re: Python script help alex23 <wuwei23@gmail.com> - 2013-08-01 10:57 +1000
        Re: Python script help Alister <alister.ware@ntlworld.com> - 2013-08-01 10:39 +0000
      Re: Python script help Piet van Oostrum <piet@vanoostrum.org> - 2013-08-23 22:37 -0400
    Re: Python script help cool1574@gmail.com - 2013-08-01 09:02 -0700
      Re: Python script help Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com> - 2013-08-02 10:44 +0200
    Re: Python script help cool1574@gmail.com - 2013-08-02 02:46 -0700
      Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-08-02 11:01 +0100
        Re: Python script help cool1574@gmail.com - 2013-08-04 08:57 -0700
          Re: Python script help Chris Angelico <rosuav@gmail.com> - 2013-08-04 17:20 +0100
      Re: Python script help Michael Torrie <torriem@gmail.com> - 2013-08-04 16:58 -0600
      Re: Python script help Jake Angulo <jake.angulo@gmail.com> - 2013-08-05 10:30 +1000

#51565 — Python script help

From: cool1574@gmail.com
Date: 2013-07-30 07:49 -0700
Subject: Python script help
Message-ID: <4566d0e7-2576-4d09-83f5-fca3b370710a@googlegroups.com>
Hello, I am looking for a script that will be able to search an online document (by giving the script the URL) and find all the downloadable links in the document and then download them automatically.
I appreciate your help,
Thank you.

#51569

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-07-30 16:38 +0100
Message-ID: <mailman.5315.1375198746.3114.python-list@python.org>
In reply to: #51565
On Tue, Jul 30, 2013 at 3:49 PM,  <cool1574@gmail.com> wrote:
> Hello, I am looking for a script that will be able to search an online document (by giving the script the URL) and find all the downloadable links in the document and then download them automatically.
> I appreciate your help,
> Thank you.

import os

baseurl = "http://........"
options = "....."
os.system("wget " + options + " " + baseurl)

Sometimes the right tool for the job isn't Python.

ChrisA
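
For readers who do want to drive wget from Python, the call above can be sketched with subprocess and an argument list, which avoids building a shell command by string concatenation. This is only a sketch in Python 3 (the thread itself uses Python 2 modules); the function names, the `downloads` folder and the default depth of 1 are assumptions, not something the poster specified. The flags used (`-r`, `-l`, `-nd`, `-P`) are standard wget options.

```python
import shlex
import subprocess


def build_wget_command(url, dest_dir="downloads", depth=1):
    """Return the wget argument list for fetching a page and what it links to."""
    return [
        "wget",
        "-r",              # follow links found in the page
        "-l", str(depth),  # recursion depth: 1 = only links on the start page
        "-nd",             # do not recreate the remote directory tree
        "-P", dest_dir,    # directory to save downloads in (hypothetical default)
        url,
    ]


def run_wget(url, dest_dir="downloads", depth=1):
    """Run wget without going through a shell; returns its exit status."""
    return subprocess.run(build_wget_command(url, dest_dir, depth)).returncode


if __name__ == "__main__":
    # Print the command instead of running it, so nothing is downloaded by accident.
    command = build_wget_command("http://www.example.com/")
    print(" ".join(shlex.quote(arg) for arg in command))
```

Passing a list rather than one concatenated string means a URL containing spaces, `&` or quotes is handed to wget unchanged instead of being interpreted by the shell.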

#51571

From: cool1574@gmail.com
Date: 2013-07-30 08:49 -0700
Message-ID: <c382dc59-d52c-47e6-8e96-16d4ef07e640@googlegroups.com>
In reply to: #51569
I know, but I think using Python in this situation is good... Is that the full script?

#51573

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-07-30 16:58 +0100
Message-ID: <mailman.5318.1375199904.3114.python-list@python.org>
In reply to: #51571
On Tue, Jul 30, 2013 at 4:49 PM,  <cool1574@gmail.com> wrote:
> I know, but I think using Python in this situation is good... Is that the full script?

That script just drops out to the system and lets wget do it. So don't
bother with it.

ChrisA

#51576

From: cool1574@gmail.com
Date: 2013-07-30 09:10 -0700
Message-ID: <ea0a2387-4ce3-4441-ac8d-64eae6ca0aee@googlegroups.com>
In reply to: #51573
What if I want to use only Python? Is that possible, using lib and lib2?

#51577

From: cool1574@gmail.com
Date: 2013-07-30 09:12 -0700
Message-ID: <7e9a50d5-3340-4cf0-a3b8-ef1109837529@googlegroups.com>
In reply to: #51576
** urllib, urllib2

#51610

From: Cameron Simpson <cs@zip.com.au>
Date: 2013-07-31 07:47 +1000
Message-ID: <mailman.5345.1375220892.3114.python-list@python.org>
In reply to: #51577
On 30Jul2013 09:12, cool1574@gmail.com <cool1574@gmail.com> wrote:
| ** urllib, urllib2

Sure. And I'd use BeautifulSoup to do the parse. You'll need to fetch that.
So: urllib[2] to fetch the document and BS to parse it for links,
then urllib[2] to fetch the links you want.

http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/

Cheers,
-- 
Cameron Simpson <cs@zip.com.au>

You can be psychotic and still be competent.
        - John L. Young, American Academy of Psychiatry and the Law on Ted
          Kaczynski, and probably most internet users

#51625

From: Joshua Landau <joshua@landau.ws>
Date: 2013-07-31 07:24 +0100
Message-ID: <mailman.5352.1375251900.3114.python-list@python.org>
In reply to: #51577


On 30 July 2013 22:47, Cameron Simpson <cs@zip.com.au> wrote:

> On 30Jul2013 09:12, cool1574@gmail.com <cool1574@gmail.com> wrote:
> | ** urllib, urllib2
>
> Sure. And I'd use BeautifulSoup to do the parse. You'll need to fetch that.
> So: urllib[2] to fetch the document and BS to parse it for links,
> then urllib[2] to fetch the links you want.
>
> http://www.crummy.com/software/BeautifulSoup/bs4/download/4.0/


Personally, I find BeautifulSoup + requests a great combination. Maybe I'm
just lazy ;).

#51580

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-07-30 17:22 +0100
Message-ID: <mailman.5322.1375201340.3114.python-list@python.org>
In reply to: #51576
On Tue, Jul 30, 2013 at 5:10 PM,  <cool1574@gmail.com> wrote:
> What if I want to use only Python? Is that possible, using lib and lib2?

Sure, anything's possible. And a lot easier if you quote context in
your posts. But why do it? wget is exactly what you need.

ChrisA

#51583

From: Vincent Vande Vyvre <vincent.vandevyvre@swing.be>
Date: 2013-07-30 18:58 +0200
Message-ID: <mailman.5325.1375203985.3114.python-list@python.org>
In reply to: #51576
On 30/07/2013 18:10, cool1574@gmail.com wrote:
> What if I want to use only Python? Is that possible, using lib and lib2?

Have a look here:

http://bazaar.launchpad.net/~vincent-vandevyvre/qarte/trunk/view/head:/parsers.py

This script gets a web page and parses it to find downloadable objects.
-- 
Vincent V.V.
Oqapy <https://launchpad.net/oqapy> . Qarte 
<https://launchpad.net/qarte> . PaQager <https://launchpad.net/paqager>

#51579

From: Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com>
Date: 2013-07-30 17:32 +0200
Message-ID: <it2lca-4sl.ln1@satorlaser.homedns.org>
In reply to: #51565
On 30.07.2013 16:49, cool1574@gmail.com wrote:
> Hello, I am looking for a script that will be able to search an
> online document (by giving the script the URL) and find all the
> downloadable links in the document and then download them
> automatically.

Well, that's actually pretty simple. Using the URL, download the 
document. Then, parse it in order to extract embedded URLs and finally 
download the resulting URLs.

If you have specific problems, please say exactly which part you're 
having trouble with and what you have already tried. In short, show 
some effort yourself. In the meantime, I'd suggest reading a Python 
tutorial and Eric Raymond's essay on asking smart questions.

Greetings!

Uli
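
The three steps described above (download the document, extract the embedded URLs, download those URLs) might be sketched roughly as follows. This is a minimal Python 3 sketch using only the standard library, not code from the thread: the names `LinkCollector`, `extract_links`, `fetch` and `fetch_linked` are hypothetical, only `<a href>` links are considered, and the page is assumed to decode as UTF-8.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """Step 2: return absolute URLs for all links in an HTML string."""
    collector = LinkCollector()
    collector.feed(html)
    # Relative links such as "a.pdf" are resolved against the page URL.
    return [urljoin(base_url, href) for href in collector.hrefs]


def fetch(url):
    """Steps 1 and 3: return the raw bytes behind a URL."""
    with urlopen(url) as response:
        return response.read()


def fetch_linked(page_url):
    """Download a page, then everything it links to; returns {url: bytes}."""
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    html = fetch(page_url).decode("utf-8", errors="replace")
    return {url: fetch(url) for url in extract_links(html, page_url)}
```

Keeping the parsing step in its own function means it can be tried on a saved HTML string before any network access is involved.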

#51620

From: Denis McMahon <denismfmcmahon@gmail.com>
Date: 2013-07-31 05:08 +0000
Message-ID: <kta64v$1ci$4@dont-email.me>
In reply to: #51565
On Tue, 30 Jul 2013 07:49:04 -0700, cool1574 wrote:

> Hello, I am looking for a script that will be able to search an online
> document (by giving the script the URL) and find all the downloadable
> links in the document and then download them automatically.
> I appreciate your help,

Why use Python? Just:

wget -m url

-- 
Denis McMahon, denismfmcmahon@gmail.com

#51631

From: cool1574@gmail.com
Date: 2013-07-31 01:15 -0700
Message-ID: <e47a83a9-14cc-4596-b17c-d38c5f300151@googlegroups.com>
In reply to: #51565
Here are some scripts. How do I put them together to create the script I want (to search an online document and download all the links in it)?
P.S.: Can I set a destination folder for the downloads?

urllib.urlopen("http://....")

possible_urls = re.findall(r'\S+:\S+', text)

import urllib2
response = urllib2.urlopen('http://www.example.com/')
html = response.read()

#51710

From: alex23 <wuwei23@gmail.com>
Date: 2013-08-01 10:57 +1000
Message-ID: <ktcbcl$gc1$1@dont-email.me>
In reply to: #51631
On 31/07/2013 6:15 PM, cool1574@gmail.com wrote:
> Here are some scripts. How do I put them together to create the script I want (to search an online document and download all the links in it)?

1. Think about the requirements.
2. Write some code.
3. Test it.
4. Repeat until requirements are met.

> p.s: can I set a destination folder for the downloads?

Yes.

Show us you're actively trying to solve this yourself rather than just 
asking us to write the code for you.

#51727

From: Alister <alister.ware@ntlworld.com>
Date: 2013-08-01 10:39 +0000
Message-ID: <wVqKt.32469$FS6.25619@fx10.am4>
In reply to: #51710
On Thu, 01 Aug 2013 10:57:01 +1000, alex23 wrote:

> On 31/07/2013 6:15 PM, cool1574@gmail.com wrote:
>> Here are some scripts, how do I put them together to create the script
>> I want? (to search an online document and download all the links in it)
> 
> 1. Think about the requirements.
> 2. Write some code.
> 3. Test it.
> 4. Repeat until requirements are met.
> 
>> p.s: can I set a destination folder for the downloads?
> 
> Yes.
> 
> Show us you're actively trying to solve this yourself rather than just
> asking us to write the code for you.

Alternatively, I can provide a quotation to produce a product to your 
specification.
(My rates are extremely high.)



-- 
Hand me a pair of leather pants and a CASIO keyboard -- I'm living for 
today!

#52923

From: Piet van Oostrum <piet@vanoostrum.org>
Date: 2013-08-23 22:37 -0400
Message-ID: <m2ioyv3jc5.fsf@cochabamba.vanoostrum.org>
In reply to: #51631
cool1574@gmail.com writes:

> Here are some scripts, how do I put them together to create the script
> I want? (to search an online document and download all the links in it)
> p.s: can I set a destination folder for the downloads?

You can use os.chdir to go to the desired folder.
>
> urllib.urlopen("http://....")
>
> possible_urls = re.findall(r'\S+:\S+', text)
>
> import urllib2
> response = urllib2.urlopen('http://www.example.com/')
> html = response.read()

If you insist on not using wget, here is a simple script with
BeautifulSoup (v4):

########################################################################
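# Python 2 script: uses urllib2, urlparse and the print statement.
from bs4 import BeautifulSoup
from urllib2 import urlopen
from urlparse import urljoin
import os
import re

# Save everything into the OUT folder (it must already exist).
os.chdir('OUT')

def generate_filename(url):
    # Strip the scheme ("http://") and flatten the path into a single name.
    url = re.sub('^[a-zA-Z0-9+.-]+:/*', '', url)
    return url.replace('/', '_')

URL = "http://www.example.com/"
soup = BeautifulSoup(urlopen(URL).read())

# Every <a> element that has an href attribute.
links = soup.select('a[href]')
for link in links:
    # Resolve relative links against the page URL.
    url = urljoin(URL, link['href'])
    print url
    html = urlopen(url).read()
    fn = generate_filename(url)
    with open(fn, 'wb') as outfile:
        outfile.write(html)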
########################################################################

You should add a more intelligent filename generator, filter out mailto:
URLs and possibly others, and add exception handling for HTTP errors.
-- 
Piet van Oostrum <piet@vanoostrum.org>
WWW: http://pietvanoostrum.com/
PGP key: [8DAE142BE17999C4]
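
The improvements suggested at the end of this post (a better filename generator, filtering out mailto: and similar URLs, and handling HTTP errors) could look something like the sketch below. It is written for Python 3 and the standard library only, so it is not a drop-in patch for the Python 2 script above. The helper names, the set of allowed schemes and the fallback name `index` are all assumptions made for illustration.

```python
import os
import re
from urllib.error import HTTPError, URLError
from urllib.parse import urlparse
from urllib.request import urlopen

# Assumption: only these schemes are worth fetching; mailto: etc. are skipped.
ALLOWED_SCHEMES = {"http", "https", "ftp"}


def is_downloadable(url):
    """True for URLs whose scheme we are willing to fetch."""
    return urlparse(url).scheme.lower() in ALLOWED_SCHEMES


def generate_filename(url):
    """Build a flat, filesystem-friendly name from the host and path of a URL."""
    parts = urlparse(url)
    name = parts.netloc + parts.path
    # Replace every run of unusual characters (including "/") with "_".
    name = re.sub(r"[^A-Za-z0-9._-]+", "_", name).strip("_")
    return name or "index"


def download(url, dest_dir):
    """Save one URL into dest_dir; return the file path, or None if skipped."""
    if not is_downloadable(url):
        return None
    try:
        with urlopen(url) as response:
            data = response.read()
    except (HTTPError, URLError) as exc:
        # A broken link should not abort the whole run.
        print("skipping %s: %s" % (url, exc))
        return None
    os.makedirs(dest_dir, exist_ok=True)
    path = os.path.join(dest_dir, generate_filename(url))
    with open(path, "wb") as outfile:
        outfile.write(data)
    return path
```

Passing the destination folder explicitly also covers the earlier question about choosing where downloads go, without changing the working directory. Note that this filename scheme ignores query strings, so two URLs that differ only after the `?` would overwrite each other.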

#51741

From: cool1574@gmail.com
Date: 2013-08-01 09:02 -0700
Message-ID: <f6842c80-369b-4fcc-a555-1c953cc1a865@googlegroups.com>
In reply to: #51565

I know I should be testing out the script myself, and I did try. I am new to Python, and I work for a security firm that asks me to scan hundreds of documents a day for unsafe links (by opening them), so I thought writing a script would be much easier. I do not know how to combine those three scripts (the ones I posted in my previous reply), which is why I came here for help. Please help me build a working script that will do the job.
Thanks in advance.
You can contact me at cool1574@gmail.com

#51787

From: Ulrich Eckhardt <ulrich.eckhardt@dominolaser.com>
Date: 2013-08-02 10:44 +0200
Message-ID: <a58sca-4n7.ln1@satorlaser.homedns.org>
In reply to: #51741
On 01.08.2013 18:02, cool1574@gmail.com wrote:
> I know I should be testing out the script myself, and I did try. I am
> new to Python, and I work for a security firm that asks me to scan
> hundreds of documents a day for unsafe links (by opening them), so I
> thought writing a script would be much easier. I do not know how to
> combine those three scripts (the ones I posted in my previous reply),
> which is why I came here for help. Please help me build a working
> script that will do the job.

The first option is to hire a programmer, which should give you the 
quickest results. If the most important thing is getting the job done, 
then this should be your #1 approach.

Now, if you really want to do it yourself, you will have to do some 
learning. Start with http://docs.python.org, which includes tutorials, 
references and a bunch of other links; in particular, go through the 
tutorials. Make sure you pick the documentation corresponding to your 
Python version, though: versions 2 and 3 are subtly different!

Then, read http://www.catb.org/esr/faqs/smart-questions.html. This is a 
bit metatopical but still important, and while it won't make you a 
programmer in an afternoon, it will help you understand the various 
reactions you received here.

Hope that gets you started.

Uli


#51790

From: cool1574@gmail.com
Date: 2013-08-02 02:46 -0700
Message-ID: <1faf05ad-2cd4-497e-a605-db4650c04103@googlegroups.com>
In reply to: #51565
I do know some Python programming, I just don't know enough to put together the various scripts I need... I would really, really appreciate it if someone could help me with that.

#51791

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-08-02 11:01 +0100
Message-ID: <mailman.111.1375437723.1251.python-list@python.org>
In reply to: #51790
On Fri, Aug 2, 2013 at 10:46 AM,  <cool1574@gmail.com> wrote:
> I do know some Python programming, I just don't know enough to put together the various scripts I need... I would really, really appreciate it if someone could help me with that.

Be aware that you might be paying money for that. If you know "some"
carpentry but not enough to put together a bookcase, and you ask a
professional carpenter to make you a bookcase, you'll have to pay him.
The same is true in programming, except that there are more people
willing to work for nothing, hence the vague "might be" rather than
the inevitable "shall" or the mighty "must" [1]. To get people to work
for you for free, you have to make them (us) want to, which in the
geeky arts generally means making it an interesting problem. Achieving
this is described well in esr's essay on asking smart questions [2],
which Ulrich also just pointed you to. We do this sort of thing for
fun, for love, so if you make your problem appeal to us, there's a
high chance that someone will provide you with code.

[1] http://www.youtube.com/watch?v=GVVTYII422k
[2] http://www.catb.org/esr/faqs/smart-questions.html

ChrisA
