

Groups > comp.lang.python > #61759 > unrolled thread

Downloading multiple files based on info extracted from CSV

Started by: Matt Graves <tunacubes@gmail.com>
First post: 2013-12-12 13:43 -0800
Last post: 2013-12-12 20:52 -0500
Articles: 6 (5 participants)



Contents

  Downloading multiple files based on info extracted from CSV Matt Graves <tunacubes@gmail.com> - 2013-12-12 13:43 -0800
    Re: Downloading multiple files based on info extracted from CSV Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-12-12 22:19 +0000
    Re: Downloading multiple files based on info extracted from CSV Chris Angelico <rosuav@gmail.com> - 2013-12-13 09:20 +1100
      Re: Downloading multiple files based on info extracted from CSV Matt Graves <tunacubes@gmail.com> - 2013-12-16 06:17 -0800
    Re: Downloading multiple files based on info extracted from CSV John Gordon <gordon@panix.com> - 2013-12-12 22:21 +0000
    Re: Downloading multiple files based on info extracted from CSV Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-12-12 20:52 -0500

#61759 — Downloading multiple files based on info extracted from CSV

From: Matt Graves <tunacubes@gmail.com>
Date: 2013-12-12 13:43 -0800
Subject: Downloading multiple files based on info extracted from CSV
Message-ID: <88346903-2af8-48cd-9829-37cedb717ae5@googlegroups.com>
I have a CSV file containing the clients' names (column 0) and, for each client, a URL I have to download a file from (column 7). I tried making a script to go down the .csv file, download each file from column 7, and save it as [clientname].csv.

I am relatively new to python, so this may be way off but…






import urllib 
import csv
urls = []
clientname = []

###This will set column 7 to be a list of urls
with open('clients.csv', 'r') as f:
    reader = csv.reader(f)
    for column in reader:
        urls.append(column[7])

###And this will set column 0 as a list of client names
with open('clients.csv', 'r') as g:
    reader = csv.reader(g)
    for column in reader:
        clientname.append(column[0])

###This SHOULD plug in the URL for F, and the client name for G.
def downloadFile(urls, clientname):
    urllib.urlretrieve(f, "%g.csv") % clientname


downloadFile(f,g)



When I run it, I get : AttributeError: 'file' object has no attribute 'strip'



#61768

From: Mark Lawrence <breamoreboy@yahoo.co.uk>
Date: 2013-12-12 22:19 +0000
Message-ID: <mailman.4036.1386886796.18130.python-list@python.org>
In reply to: #61759
On 12/12/2013 21:43, Matt Graves wrote:
> I have a CSV file containing a bunch of URLs I have to download a file from for clients (Column 7) and the clients names (Column 0) I tried making a script to go down the .csv file and just download each file from column 7, and save the file as [clientname].csv
>
> I am relatively new to python, so this may be way off but…
>
> import urllib
> import csv
> urls = []
> clientname = []

I assume clientnames.

>
> ###This will set column 7 to be a list of urls
> with open('clients.csv', 'r') as f:
>      reader = csv.reader(f)
>      for column in reader:
>          urls.append(column[7])
>
> ###And this will set column 0 as a list of client names
> with open('clients.csv', 'r') as g:
>      reader = csv.reader(g)
>      for column in reader:
>          clientname.append(column[0])

You could do the above in one hit.

with open('clients.csv', 'r') as f:
      reader = csv.reader(f)
      for row in reader:
          urls.append(row[7])
          clientnames.append(row[0])

Note that you're reading rows, not columns.

>
> ###This SHOULD plug in the URL for F, and the client name for G.

What makes you think this? f and g are file handles.

> def downloadFile(urls, clientname):
>      urllib.urlretrieve(f, "%g.csv") % clientname
>

If you want one file at a time you'd want url, clientname.

>
> downloadFile(f,g)

I think you want something like:

for url, clientname in zip(urls, clientnames):
     downloadFile(url, clientname)
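
Putting the suggestions so far together, a minimal sketch might look like the following. It assumes Python 3, where urlretrieve lives in urllib.request (the code quoted in this thread is Python 2 and uses urllib.urlretrieve). The names read_clients, download_file and download_all, and the retrieve parameter that lets you swap in a stand-in downloader, are illustrative and not part of the original script.

```python
import csv
import urllib.request


def read_clients(f):
    """Collect URLs (column 7) and client names (column 0) in one pass."""
    urls, clientnames = [], []
    for row in csv.reader(f):
        urls.append(row[7])
        clientnames.append(row[0])
    return urls, clientnames


def download_file(url, clientname, retrieve=urllib.request.urlretrieve):
    """Download a single URL and save it as <clientname>.csv."""
    filename = "%s.csv" % clientname
    retrieve(url, filename)
    return filename


def download_all(f, retrieve=urllib.request.urlretrieve):
    """Pair each URL with its client name and download them one by one."""
    urls, clientnames = read_clients(f)
    return [download_file(url, name, retrieve)
            for url, name in zip(urls, clientnames)]
```

With a real file this would be called as download_all(f) inside a "with open('clients.csv', newline='') as f:" block.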

>
> When I run it, I get : AttributeError: 'file' object has no attribute 'strip'
>

When you get a traceback like this, please cut and paste all of it, not
just the last line.  Here it seems likely that your call to downloadFile
doesn't like you passing in the file handle, as I've explained above (I
hope :)

-- 
My fellow Pythonistas, ask not what our language can do for you, ask 
what you can do for our language.

Mark Lawrence



#61769

From: Chris Angelico <rosuav@gmail.com>
Date: 2013-12-13 09:20 +1100
Message-ID: <mailman.4037.1386886869.18130.python-list@python.org>
In reply to: #61759
On Fri, Dec 13, 2013 at 8:43 AM, Matt Graves <tunacubes@gmail.com> wrote:
> ###This SHOULD plug in the URL for F, and the client name for G.
> def downloadFile(urls, clientname):
>     urllib.urlretrieve(f, "%g.csv") % clientname
>
> downloadFile(f,g)
>
> When I run it, I get : AttributeError: 'file' object has no attribute 'strip'

When showing errors like this, you really need to copy and paste.
Fortunately, I can see where the problem is, here. You're referencing
the file object still in f, which is now a closed file object, instead
of the parameter urls.

But you're also passing f and g as parameters, instead of urls and
clientname. In fact, the downloadFile function isn't really achieving
much; you'd do better to simply inline its code into the main routine
and save yourself the hassle.

While you're at it, there are two more problems in that line of code.
Firstly, you're going to save everything into a file called "%g.csv",
and then try to modulo the return value of urlretrieve with the
clientname; I think you want the close parens at the very end of that
line. And secondly, %g is a floating-point encoder - you want %s here,
or simply use string concatenation:

urllib.urlretrieve(urls, clientname + ".csv")

Except that those are your lists, so that won't work without another
change. We'll fix that later...
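
To illustrate the %g versus %s point, here is a small sketch. The helper names client_filename and broken_filename are made up for the example; the behaviour shown is standard printf-style formatting, which works the same way in Python 2 and 3.

```python
def client_filename(clientname):
    """Build '<clientname>.csv' with %s, which accepts any object."""
    return "%s.csv" % clientname


def broken_filename(clientname):
    """Same idea with %g, which only accepts numbers."""
    return "%g.csv" % clientname


print(client_filename("acme"))

try:
    broken_filename("acme")
except TypeError as exc:
    # %g refuses a string argument, so the filename is never built.
    print("TypeError:", exc)
```

Plain concatenation, clientname + ".csv", gives the same result as the %s version when clientname is already a string.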

> ###This will set column 7 to be a list of urls
> with open('clients.csv', 'r') as f:
>     reader = csv.reader(f)
>     for column in reader:
>         urls.append(column[7])
>
> ###And this will set column 0 as a list of client names
> with open('clients.csv', 'r') as g:
>     reader = csv.reader(g)
>     for column in reader:
>         clientname.append(column[0])

You're reading the file twice. There's no reason to do that; you can
read both columns at once. (By the way, what you're iterating over is
actually rows; for each row that comes out of the reader, do something
with one element from it. So calling it "column" is a bit confusing.)
So now we come to a choice. Question: Is it okay to hold the CSV file
open while you do the downloading? If it is, you can simplify the code
way way down:

import urllib
import csv

# You actually could get away with not using a with
# block here, but may as well keep it for best practice
with open('clients.csv') as f:
    for client in csv.reader(f):
        urllib.urlretrieve(client[7], client[0] + ".csv")

Yep, that's it! That's all you need. But retrieving all that might
take a long time, so it might be better to do all your CSV reading
first and only *then* start downloading. In that case, I'd make a
single list of tuples:

import urllib
import csv

clients = []
with open('clients.csv') as f:
    for client in csv.reader(f):
        clients.append((client[7], client[0] + ".csv"))

for client in clients:
    urllib.urlretrieve(client[0], client[1])

And since the "iterate and append to a new list" idiom is so common,
it can be simplified down to a list comprehension; and since "call
this function with this tuple of arguments" is so common, it has its
own syntax. So the code looks like this:

import urllib
import csv

with open('clients.csv') as f:
    clients = [(client[7], client[0]+".csv") for client in csv.reader(f)]

for client in clients:
    urllib.urlretrieve(*client)

Again, it's really that simple! :)
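
For anyone on Python 3, where urlretrieve has moved to urllib.request, the same "read everything first, then download" idea can be sketched roughly as below. The function names load_jobs and run_jobs and the retrieve parameter are assumptions made for this example, chosen so the downloading step can be replaced by a stand-in when trying it out.

```python
import csv
import urllib.request


def load_jobs(f):
    """Return a list of (url, filename) tuples from an open CSV file."""
    return [(row[7], row[0] + ".csv") for row in csv.reader(f)]


def run_jobs(jobs, retrieve=urllib.request.urlretrieve):
    """Call retrieve(url, filename) for each job; *job unpacks the tuple."""
    for job in jobs:
        retrieve(*job)
```

Typical use would be to call load_jobs(f) inside "with open('clients.csv', newline='') as f:" and then run_jobs(jobs) after the file is closed; newline='' is what the csv module documentation recommends when opening files in Python 3.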

Enjoy!

ChrisA



#62065

From: Matt Graves <tunacubes@gmail.com>
Date: 2013-12-16 06:17 -0800
Message-ID: <1148ee9d-8439-4856-ab93-5b998845b7cc@googlegroups.com>
In reply to: #61769
On Thursday, December 12, 2013 5:20:59 PM UTC-5, Chris Angelico wrote:

> import urllib
> import csv
>
> # You actually could get away with not using a with
> # block here, but may as well keep it for best practice
> with open('clients.csv') as f:
>     for client in csv.reader(f):
>         urllib.urlretrieve(client[7], client[0] + ".csv")
>
> Yep, that's it! That's all you need.


Worked perfect. Thank you!



#61770

From: John Gordon <gordon@panix.com>
Date: 2013-12-12 22:21 +0000
Message-ID: <l8dctq$gb8$1@reader1.panix.com>
In reply to: #61759
In <88346903-2af8-48cd-9829-37cedb717ae5@googlegroups.com> Matt Graves <tunacubes@gmail.com> writes:

> import urllib
> import csv
> urls = []
> clientname = []

> ###This will set column 7 to be a list of urls
> with open('clients.csv', 'r') as f:
>     reader = csv.reader(f)
>     for column in reader:
>         urls.append(column[7])

> ###And this will set column 0 as a list of client names
> with open('clients.csv', 'r') as g:
>     reader = csv.reader(g)
>     for column in reader:
>         clientname.append(column[0])

> ###This SHOULD plug in the URL for F, and the client name for G.
> def downloadFile(urls, clientname):
>     urllib.urlretrieve(f, "%g.csv") % clientname

> downloadFile(f,g)

> When I run it, I get : AttributeError: 'file' object has no attribute
> 'strip'

I think you're passing the wrong arguments to downloadFile().  You're
calling downloadFile(f, g), but f and g are file objects.  Don't you want
to pass urls and clientname instead?

Even if the correct arguments are passed to downloadFile, I think you're
using them incorrectly.  You don't even use the urls argument, and
clientname is supposed to be a list, so why aren't you looping through
it?

You aren't using string interpolation correctly on the call to urlretrieve.
Assuming your intent was to build a string and pass it as the second
argument, you have the close-parenthesis in the wrong place.  The call
should look like this:

    urllib.urlretrieve(f, "%g.csv" % clientname)

("%g" formats a floating-point value.  Did you mean "%s" instead?)
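
The effect of the parenthesis placement can be seen without touching the network. In this sketch, fake_retrieve is a made-up stand-in that returns a (filename, headers) tuple the way urlretrieve does, and the URL is a placeholder.

```python
def fake_retrieve(url, filename):
    """Stand-in for urlretrieve: returns a (filename, headers) tuple."""
    return (filename, None)


clientname = "acme"

# Parenthesis closed too early: the literal "%s.csv" is passed as the
# filename, and % is then applied to the function's return value.
try:
    fake_retrieve("http://example.com/a", "%s.csv") % clientname
except TypeError as exc:
    print("TypeError:", exc)

# Parenthesis at the end: the string is formatted first, then passed in.
saved_as, _ = fake_retrieve("http://example.com/a", "%s.csv" % clientname)
print(saved_as)
```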

-- 
John Gordon         Imagine what it must be like for a real medical doctor to
gordon@panix.com    watch 'House', or a real serial killer to watch 'Dexter'.



#61781

From: Dennis Lee Bieber <wlfraed@ix.netcom.com>
Date: 2013-12-12 20:52 -0500
Message-ID: <mailman.4043.1386899560.18130.python-list@python.org>
In reply to: #61759
On Thu, 12 Dec 2013 13:43:50 -0800 (PST), Matt Graves <tunacubes@gmail.com>
declaimed the following:

>def downloadFile(urls, clientname):
>    urllib.urlretrieve(f, "%g.csv") % clientname
>
	The most blatant error is this line...

	You are calling urllib.urlretrieve passing it arguments of f and the
string "%g.csv".

	THEN you are doing a string interpolation on the RESULT.

	The line should probably be

	urllib.urlretrieve(f, "%s.csv" % clientname)

to apply the string interpolation first, and pass that as the second
argument (also note the %s for /string/ [which, in Python, tends to accept
any argument and produce a general string from it -- using any other format
specification tends to be for cases where you need to match a particular
format... %4.4x if you want zero-filled hex, for example])
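
A short sketch of the difference between %s and a specific format such as %4.4x follows. The helper names as_text and hex4 are invented for the illustration.

```python
def as_text(value):
    """%s produces a general string from almost any single value."""
    return "%s" % value


def hex4(number):
    """%4.4x produces hex padded with zeros to at least four digits."""
    return "%4.4x" % number


print(as_text("acme"))   # a string passes through unchanged
print(as_text(255))      # a number is converted to its usual text form
print(hex4(255))         # zero-filled hex
```

One caveat with %s: a tuple on the right-hand side is treated as the argument list, so a tuple value has to be wrapped in another one-element tuple to be formatted as a whole.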
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
    wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/




