
comp.lang.python: thread #43972 (unrolled)

There must be a better way

Started by: "Colin J. Williams" <cjw@ncf.ca>
First post: 2013-04-20 19:46 -0400
Last post: 2013-04-23 10:02 -0500
Articles: 19 (10 participants)


Contents

  There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-20 19:46 -0400
    Re: There must be a better way Chris Rebert <clp2@rebertia.com> - 2013-04-20 16:57 -0700
    Re: There must be a better way Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-04-21 00:06 +0000
      Re: There must be a better way Tim Chase <python.list@tim.thechases.com> - 2013-04-20 19:34 -0500
      Re: There must be a better way Terry Jan Reedy <tjreedy@udel.edu> - 2013-04-20 21:07 -0400
        Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 09:15 -0400
          Re: There must be a better way Jussi Piitulainen <jpiitula@ling.helsinki.fi> - 2013-04-21 16:39 +0300
            Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:17 -0400
          Re: There must be a better way Peter Otten <__peter__@web.de> - 2013-04-21 15:43 +0200
            Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-21 11:30 -0400
          Re: There must be a better way Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-22 15:32 +0100
          Re: There must be a better way Neil Cerutti <neilc@norwich.edu> - 2013-04-22 14:42 +0000
            Re: There must be a better way "Colin J. Williams" <cjw@ncf.ca> - 2013-04-22 13:44 -0400
              Re: There must be a better way Neil Cerutti <neilc@norwich.edu> - 2013-04-23 13:36 +0000
                Re: There must be a better way Oscar Benjamin <oscar.j.benjamin@gmail.com> - 2013-04-23 15:15 +0100
                Re: There must be a better way Tim Chase <python.list@tim.thechases.com> - 2013-04-23 09:30 -0500
                Re: There must be a better way Skip Montanaro <skip@pobox.com> - 2013-04-23 09:36 -0500
                Re: There must be a better way (correction) Tim Chase <python.list@tim.thechases.com> - 2013-04-23 10:02 -0500

#43972 — There must be a better way

From: "Colin J. Williams" <cjw@ncf.ca>
Date: 2013-04-20 19:46 -0400
Subject: There must be a better way
Message-ID: <kkv9bt$9bm$1@theodyn.ncf.ca>

Below is part of a script which shows the changes made to permit the 
script to run on either Python 2.7 or Python 3.2.

I was surprised to see that the CSV next method is no longer available.

Suggestions welcome.

Colin W.


def main():
     global inData, inFile
     if ver == '2':
        headerLine= inData.next()
     else:  # Python version 3.3
         inFile.close()
         inFile= open('Don Wall April 18 2013.csv', 'r', newline= '')
         inData= csv.reader(inFile)
         headerLine= inData.__next__()


#43974

From: Chris Rebert <clp2@rebertia.com>
Date: 2013-04-20 16:57 -0700
Message-ID: <mailman.866.1366502229.3114.python-list@python.org>
In reply to: #43972

On Sat, Apr 20, 2013 at 4:46 PM, Colin J. Williams <cjw@ncf.ca> wrote:
> Below is part of a script which shows the changes made to permit the script
> to run on either Python 2.7 or Python 3.2.
>
> I was surprised to see that the CSV next method is no longer available.
>
> Suggestions welcome.
<snip>
>     if ver == '2':
>        headerLine= inData.next()
>     else:  # Python version 3.3
<snip>
>         headerLine= inData.__next__()

Use the built-in next() function
(http://docs.python.org/2/library/functions.html#next ) instead:
    headerLine = next(iter(inData))

Cheers,
Chris
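A quick Python 3 illustration of this suggestion (io.StringIO stands in for the real CSV file, and the sample rows are made up): a csv.reader object is already its own iterator, so iter() hands back the same object and next(reader) behaves exactly like next(iter(reader)).

```python
import csv
import io

# In-memory stand-in for a real CSV file (illustrative data only).
sample = io.StringIO("name,qty\nwidget,3\ngadget,5\n")
reader = csv.reader(sample)

# A csv.reader is already an iterator: iter() returns the very same object.
assert iter(reader) is reader

header = next(reader)        # the portable spelling (Python 2.6+ and 3.x)
first = next(iter(reader))   # equivalent; the extra iter() is a no-op here
rest = list(reader)          # whatever rows remain

print(header, first, rest)   # ['name', 'qty'] ['widget', '3'] [['gadget', '5']]
```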


#43975

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2013-04-21 00:06 +0000
Message-ID: <51732d81$0$29977$c3e8da3$5496439d@news.astraweb.com>
In reply to: #43972

On Sat, 20 Apr 2013 19:46:07 -0400, Colin J. Williams wrote:

> Below is part of a script which shows the changes made to permit the
> script to run on either Python 2.7 or Python 3.2.
> 
> I was surprised to see that the CSV next method is no longer available.

This makes no sense. What's "the CSV next method"? Are you talking about 
the csv module? It has no "next method".

py> import csv
py> csv.next
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'next'


Please *define your terms*, otherwise we are flailing in the dark trying
to guess what your code is supposed to do. The code you provide cannot
possibly work -- you use variables before they are defined, use other
variables that are never defined at all, and reference mysterious globals.
You even close a file before it is opened!

Please read this:

http://sscce.org/

and provide a *short, self-contained, correct example* that we can 
actually run.

But in the meantime, I'm going to consult the entrails and try to guess 
what you are doing: you're complaining that iterators have a next method 
in Python 2, and __next__ in Python 3. Am I correct?

If so, this is true, but you should not be using the plain next method in 
Python 2. You should be using the built-in function next(), not calling 
the method directly. The plain next *method* was a mistake, only left in 
for compatibility with older versions of Python. Starting from Python 2.6 
the correct way to get the next value from an arbitrary iterator is with 
the built-in function next(), not by calling a method directly.

(In the same way that you get the length of a sequence by calling the 
built-in function len(), not by calling the __len__ method directly.)

So provided you are using Python 2.6 or better, you call:

next(inData)

to get the next value, regardless of whether it is Python 2.x or 3.x.

If you need to support older versions, you can do this:

try:
    next  # Does the built-in already exist?
except NameError:
    # No, we define our own.
    def next(iterator):
        return iterator.next()

then just use next(inData) as normal.
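The same advice carries over to iterator classes you write yourself. A common idiom for code shared between Python 2 and 3, sketched here with a hypothetical Countdown class, is to implement __next__, alias next to it for Python 2, and always call the built-in next() in client code:

```python
class Countdown(object):
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self                  # iterators return themselves

    def __next__(self):              # the Python 3 protocol method
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

    next = __next__                  # alias so Python 2's protocol finds it too


it = Countdown(3)
first = next(it)        # built-in next() dispatches to the right method
remaining = list(it)    # list() drives the same protocol until StopIteration
print(first, remaining)  # 3 [2, 1]
```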



-- 
Steven


#43977

From: Tim Chase <python.list@tim.thechases.com>
Date: 2013-04-20 19:34 -0500
Message-ID: <mailman.867.1366504376.3114.python-list@python.org>
In reply to: #43975

On 2013-04-21 00:06, Steven D'Aprano wrote:
> On Sat, 20 Apr 2013 19:46:07 -0400, Colin J. Williams wrote:
> 
> > Below is part of a script which shows the changes made to permit
> > the script to run on either Python 2.7 or Python 3.2.
> > 
> > I was surprised to see that the CSV next method is no longer
> > available.
> 
> This makes no sense. What's "the CSV next method"? Are you talking
> about the csv module? It has no "next method".

In 2.x, the csv.reader() class (and csv.DictReader() class) offered
a .next() method that is absent in 3.x.  For those who use(d) the
csv.reader object on a regular basis, this was a pretty common
usage, particularly if you had to do your own header parsing:

  f = open(...)
  r = csv.reader(f)
  try:
    headers = r.next()
    header_map = analyze(headers)
    for row in r:
      foo = row[header_map["FOO COLUMN"]]
      process(foo)
  finally:
    f.close()

(I did this for a number of cases where the client couldn't
send column headers with consistent capitalization/spacing, so my
header-processing function had to normalize the case/spaces and then
reference the normalized names.)

> So provided you are using Python 2.6 or better, you call:
> 
> next(inData)
> 
> to get the next value, regardless of whether it is Python 2.x or
> 3.x.
> 
> If you need to support older versions, you can do this:
> 
> try:
>     next  # Does the built-in already exist?
> except NameError:
>     # No, we define our own.
>     def next(iterator):
>         return iterator.next()
> 
> then just use next(inData) as normal.

This is a good expansion of Chris Rebert's suggestion to use next(),
as those of us that have to support pre-2.6 code lack the next()
function out of the box.

-tkc




#43979

From: Terry Jan Reedy <tjreedy@udel.edu>
Date: 2013-04-20 21:07 -0400
Message-ID: <mailman.869.1366506610.3114.python-list@python.org>
In reply to: #43975

On 4/20/2013 8:34 PM, Tim Chase wrote:
> In 2.x, the csv.reader() class (and csv.DictReader() class) offered
> a .next() method that is absent in 3.x

In Py 3, .next was renamed to .__next__ for *all* iterators. The 
intention is that one iterate with for item in iterable or use builtin 
functions iter() and next().
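To make the iteration protocol concrete, here is a simplified sketch of roughly what a for loop does with iter() and next(). It is an illustration only, ignoring details such as the loop's else clause, not the interpreter's actual implementation:

```python
def manual_for(iterable, action):
    """Roughly what `for item in iterable: action(item)` does."""
    iterator = iter(iterable)        # the for statement calls iter() once
    while True:
        try:
            item = next(iterator)    # then next() on every pass
        except StopIteration:        # exhaustion ends the loop quietly
            break
        action(item)


seen = []
manual_for(["a", "b", "c"], seen.append)
print(seen)  # ['a', 'b', 'c']
```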


#44001

From: "Colin J. Williams" <cjw@ncf.ca>
Date: 2013-04-21 09:15 -0400
Message-ID: <kl0opb$pcr$1@theodyn.ncf.ca>
In reply to: #43979

On 20/04/2013 9:07 PM, Terry Jan Reedy wrote:
> On 4/20/2013 8:34 PM, Tim Chase wrote:
>> In 2.x, the csv.reader() class (and csv.DictReader() class) offered
>> a .next() method that is absent in 3.x
>
> In Py 3, .next was renamed to .__next__ for *all* iterators. The
> intention is that one iterate with for item in iterable or use builtin
> functions iter() and next().
>
>
Thanks to Chris, Tim and Terry for their helpful comments.

I was seeking some code that would be acceptable to both Python 2.7 and 3.3.

In the end, I used:

inData= csv.reader(inFile)

def main():
     if ver == '2':
         headerLine= inData.next()
     else:
         headerLine= inData.__next__()
     ...
     for item in inData:
         assert len(dataStore) == len(item)
         j= findCardinal(item[10])
         ...

This is acceptable to both versions.

It is not usual to have a name with preceding and following
underscores in user code.

Presumably, there is a rationale for the change from csv.reader.next
to csv.reader.__next__.

If next is not acceptable for the version 3 csv.reader, perhaps __next__ 
could be added to the version 2 csv.reader, so that the same code can be 
used in the two versions.

This would avoid the kluge I used above.

Colin W.


#44002

From: Jussi Piitulainen <jpiitula@ling.helsinki.fi>
Date: 2013-04-21 16:39 +0300
Message-ID: <qot61zgxatg.fsf@ruuvi.it.helsinki.fi>
In reply to: #44001

Colin J. Williams writes:
...
> It is not usual to have a name with preceding and following
> underscores in user code.
> 
> Presumably, there is a rationale for the change from csv.reader.next
> to csv.reader.__next__.
...

I think the user code is supposed to be next(reader), where reader is
the csv.reader object. For example, current documentation contains the
following.

# csvreader.__next__()
# Return the next row of the reader’s iterable object as a list,
# parsed according to the current dialect. Usually you should call
# this as next(reader).


#44008

From: "Colin J. Williams" <cjw@ncf.ca>
Date: 2013-04-21 11:17 -0400
Message-ID: <kl0vsk$2u5$1@theodyn.ncf.ca>
In reply to: #44002

On 21/04/2013 9:39 AM, Jussi Piitulainen wrote:
> Colin J. Williams writes:
> ...
>> It is not usual to have a name with preceding and following
>> underscores in user code.
>>
>> Presumably, there is a rationale for the change from csv.reader.next
>> to csv.reader.__next__.
> ...
>
> I think the user code is supposed to be next(reader), where reader is
> the csv.reader object. For example, current documentation contains the
> following.
>
> # csvreader.__next__()
> # Return the next row of the reader’s iterable object as a list,
> # parsed according to the current dialect. Usually you should call
> # this as next(reader).
>
Thanks,

This works with both 2.7 and 3.3.

Colin W.


#44003

From: Peter Otten <__peter__@web.de>
Date: 2013-04-21 15:43 +0200
Message-ID: <mailman.878.1366551835.3114.python-list@python.org>
In reply to: #44001

Colin J. Williams wrote:

> I was seeking some code that would be acceptable to both Python 2.7 and
> 3.3.
> 
> In the end, I used:
> 
> inData= csv.reader(inFile)
> 
> def main():
>      if ver == '2':
>          headerLine= inData.next()
>      else:
>          headerLine= inData.__next__()
>      ...

I think it was mentioned before, but to be explicit:

def main():
    headerLine = next(inData)
    ...

works in Python 2.6, 2.7, and 3.x.


#44010

From: "Colin J. Williams" <cjw@ncf.ca>
Date: 2013-04-21 11:30 -0400
Message-ID: <5174062D.2090009@ncf.ca>
In reply to: #44003

On 21/04/2013 9:43 AM, Peter Otten wrote:
> Colin J. Williams wrote:
>
>> I was seeking some code that would be acceptable to both Python 2.7 and
>> 3.3.
>>
>> In the end, I used:
>>
>> inData= csv.reader(inFile)
>>
>> def main():
>>       if ver == '2':
>>           headerLine= inData.next()
>>       else:
>>           headerLine= inData.__next__()
>>       ...
>
> I think it was mentioned before, but to be explicit:
>
> def main():
>      headerLine = next(inData)
>      ...
>
> works in Python 2.6, 2.7, and 3.x.
>

Yes, the penny dropped eventually.  I've used your statement.

Chris's suggestion was slightly different:

> Use the built-in next() function
> (http://docs.python.org/2/library/functions.html#next ) instead:
>     headerLine = next(iter(inData))

Colin W.


#44078

From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: 2013-04-22 15:32 +0100
Message-ID: <mailman.918.1366641146.3114.python-list@python.org>
In reply to: #44001

On 21 April 2013 14:15, Colin J. Williams <cjw@ncf.ca> wrote:
> In the end, I used:
>
> inData= csv.reader(inFile)
>
> def main():
>
>     if ver == '2':
>         headerLine= inData.next()
>     else:
>         headerLine= inData.__next__()
>     ...
>     for item in inData:
>         assert len(dataStore) == len(item)
>         j= findCardinal(item[10])
>         ...

This may not be relevant for what you're doing but if you use
csv.DictReader there's no need to retrieve the top line separately:

$ cat tmp.csv
a,b,c
1,2,3
4,5,6
$ python
Python 2.7.1 (r271:86832, Nov 27 2010, 18:30:46) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import csv
>>> with open('tmp.csv', 'rb') as csvfile:
...   for row in csv.DictReader(csvfile):
...     print(row)
...
{'a': '1', 'c': '3', 'b': '2'}
{'a': '4', 'c': '6', 'b': '5'}
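A rough Python 3 equivalent of the session above, with io.StringIO standing in for tmp.csv (for a real file, open it in text mode with newline='' rather than 'rb'). The header row is consumed automatically and remains available as reader.fieldnames; since Python 3.8 each row is a plain dict in column order (3.6 and 3.7 return OrderedDict):

```python
import csv
import io

# Stand-in for tmp.csv; with a real file use open('tmp.csv', newline='').
csvfile = io.StringIO("a,b,c\n1,2,3\n4,5,6\n")

reader = csv.DictReader(csvfile)
rows = list(reader)          # the header line is read automatically
print(reader.fieldnames)     # ['a', 'b', 'c']
for row in rows:
    print(row)
```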


Oscar


#44079

From: Neil Cerutti <neilc@norwich.edu>
Date: 2013-04-22 14:42 +0000
Message-ID: <atl0i2Fto6uU2@mid.individual.net>
In reply to: #44001

On 2013-04-21, Colin J. Williams <cjw@ncf.ca> wrote:
> On 20/04/2013 9:07 PM, Terry Jan Reedy wrote:
>> On 4/20/2013 8:34 PM, Tim Chase wrote:
>>> In 2.x, the csv.reader() class (and csv.DictReader() class) offered
>>> a .next() method that is absent in 3.x
>>
>> In Py 3, .next was renamed to .__next__ for *all* iterators. The
>> intention is that one iterate with for item in iterable or use builtin
>> functions iter() and next().
>>
>>
> Thanks to Chris, Tim and Terry for their helpful comments.
>
> I was seeking some code that would be acceptable to both Python 2.7 and 3.3.
>
> In the end, I used:
>
> inData= csv.reader(inFile)
>
> def main():
>      if ver == '2':
>          headerLine= inData.next()
>      else:
>          headerLine= inData.__next__()
>      ...
>      for item in inData:
>          assert len(dataStore) == len(item)
>          j= findCardinal(item[10])
>          ...
>
> This is acceptable to both versions.
>
> It is not usual to have a name with preceding and following
> underscores in user code.
>
> Presumably, there is a rationale for the change from csv.reader.next
> to csv.reader.__next__.
>
> If next is not acceptable for the version 3 csv.reader, perhaps __next__ 
> could be added to the version 2 csv.reader, so that the same code can be 
> used in the two versions.
>
> This would avoid the kluge I used above.

Would using csv.DictReader instead of a csv.reader be an option?

-- 
Neil Cerutti


#44100

From: "Colin J. Williams" <cjw@ncf.ca>
Date: 2013-04-22 13:44 -0400
Message-ID: <kl3stb$5ck$1@theodyn.ncf.ca>
In reply to: #44079

On 22/04/2013 10:42 AM, Neil Cerutti wrote:
> On 2013-04-21, Colin J. Williams <cjw@ncf.ca> wrote:
>> On 20/04/2013 9:07 PM, Terry Jan Reedy wrote:
>>> On 4/20/2013 8:34 PM, Tim Chase wrote:
>>>> In 2.x, the csv.reader() class (and csv.DictReader() class) offered
>>>> a .next() method that is absent in 3.x
>>>
>>> In Py 3, .next was renamed to .__next__ for *all* iterators. The
>>> intention is that one iterate with for item in iterable or use builtin
>>> functions iter() and next().
>>>
>>>
>> Thanks to Chris, Tim and Terry for their helpful comments.
>>
>> I was seeking some code that would be acceptable to both Python 2.7 and 3.3.
>>
>> In the end, I used:
>>
>> inData= csv.reader(inFile)
>>
>> def main():
>>       if ver == '2':
>>           headerLine= inData.next()
>>       else:
>>           headerLine= inData.__next__()
>>       ...
>>       for item in inData:
>>           assert len(dataStore) == len(item)
>>           j= findCardinal(item[10])
>>           ...
>>
>> This is acceptable to both versions.
>>
>> It is not usual to have a name with preceding and following
>> underscores in user code.
>>
>> Presumably, there is a rationale for the change from csv.reader.next
>> to csv.reader.__next__.
>>
>> If next is not acceptable for the version 3 csv.reader, perhaps __next__
>> could be added to the version 2 csv.reader, so that the same code can be
>> used in the two versions.
>>
>> This would avoid the kluge I used above.
>
> Would using csv.DictReader instead of a csv.reader be an option?
>
Since I'm only interested in one or two columns, the simpler approach is 
probably better.

Colin W.


#44179

From: Neil Cerutti <neilc@norwich.edu>
Date: 2013-04-23 13:36 +0000
Message-ID: <atnh2jFgv8iU1@mid.individual.net>
In reply to: #44100

On 2013-04-22, Colin J. Williams <cjw@ncf.ca> wrote:
> Since I'm only interested in one or two columns, the simpler
> approach is probably better.

Here's a sketch of how one of my projects handles that situation.
I think the index variables are invaluable documentation, and
make it a bit more robust. (Python 3, so not every bit is
relevant to you).

import csv

with open("today.csv", encoding='UTF-8', newline='') as today_file:
    reader = csv.reader(today_file)
    header = next(reader)
    majr_index = header.index('MAJR')
    div_index = header.index('DIV')
    for rec in reader:
        major = rec[majr_index]
        rec[div_index] = DIVISION_TABLE[major]

But a csv.DictReader might still be more efficient. I never
tested. This is the only place I've used this "optimization".
It's fast enough. ;)
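A self-contained Python 3 version of the sketch above, assuming io.StringIO as a stand-in for today.csv and a made-up DIVISION_TABLE (the real data and table are not shown in the post). Updated rows are collected into a list so the effect is visible:

```python
import csv
import io

# Hypothetical lookup table; the real one is not shown in the post.
DIVISION_TABLE = {"MATH": "Science", "HIST": "Humanities"}

# Stand-in for today.csv; columns are found by name, so order doesn't matter.
today_file = io.StringIO("ID,DIV,MAJR\n1,,MATH\n2,,HIST\n")

reader = csv.reader(today_file)
header = next(reader)
majr_index = header.index('MAJR')   # list.index raises ValueError if missing
div_index = header.index('DIV')

updated = []
for rec in reader:
    major = rec[majr_index]
    rec[div_index] = DIVISION_TABLE[major]
    updated.append(rec)

print(updated)  # [['1', 'Science', 'MATH'], ['2', 'Humanities', 'HIST']]
```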

-- 
Neil Cerutti


#44180

From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: 2013-04-23 15:15 +0100
Message-ID: <mailman.972.1366726549.3114.python-list@python.org>
In reply to: #44179

On 23 April 2013 14:36, Neil Cerutti <neilc@norwich.edu> wrote:
> On 2013-04-22, Colin J. Williams <cjw@ncf.ca> wrote:
>> Since I'm only interested in one or two columns, the simpler
>> approach is probably better.
>
> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
>
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)

I once had a bug that took a long time to track down and was caused by
using next() without an enclosing try/except StopIteration (or the
optional default argument to next).

This is a sketch of how you can get the bug that I had:

$ cat next.py
#!/usr/bin/env python

def join(iterables):
    '''Join iterable of iterables, stripping first item'''
    for iterable in iterables:
        iterator = iter(iterable)
        header = next(iterator)  # Here's the problem
        for val in iterator:
            yield val

data = [
    ['foo', 1, 2, 3],
    ['bar', 4, 5, 6],
    [], # Whoops! Who put this empty iterable here?
    ['baz', 7, 8, 9],
]

for x in join(data):
    print(x)

$ ./next.py
1
2
3
4
5
6

The values 7, 8 and 9 are not printed but no error message is shown.
This is because calling next on the iterator over the empty list
raises a StopIteration that is not caught in the join generator. The
StopIteration is then "caught" by the for loop that iterates over
join() causing the loop to terminate prematurely. Since the exception
is caught and cleared by the for loop there's no practical way to get
a debugger to hook into the event that causes it.
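A caveat for readers on current Python: PEP 479, enabled by default since Python 3.7, changed this behaviour. A StopIteration escaping a generator body is now turned into a RuntimeError (with the StopIteration as its __cause__), so the bug above fails loudly instead of silently truncating the loop. A minimal sketch reusing the join() and data from the post:

```python
def join(iterables):
    '''Join iterable of iterables, stripping first item'''
    for iterable in iterables:
        iterator = iter(iterable)
        header = next(iterator)  # still unguarded, as in the original
        for val in iterator:
            yield val


data = [['foo', 1, 2, 3], ['bar', 4, 5, 6], [], ['baz', 7, 8, 9]]

printed = []
caught = None
try:
    for x in join(data):
        printed.append(x)
except RuntimeError as exc:
    # Python 3.7+: "generator raised StopIteration"
    caught = exc

print(printed, repr(caught))
```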

In my case this happened somewhere in the middle of a long-running
process. It was difficult to pin down what was causing it, as the
iteration was over non-constant data and I didn't know what I was
looking for. Because of the time I spent fixing this, whenever I call
next() I now stop to think about what a StopIteration would do in
that context.

In this case a StopIteration is raised when reading from an empty csv file:

>>> import csv
>>> with open('test.csv', 'w'): pass
...
>>> with open('test.csv') as csvfile:
...     reader = csv.reader(csvfile)
...     header = next(reader)
...
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
StopIteration

If that code were called from a generator then it would most likely be
susceptible to the problem I'm describing. The fix is to use
next(reader, None) or try/except StopIteration.
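Both fixes can be sketched in Python 3 as follows. The generator passes a sentinel default to next() so an empty iterable is skipped rather than ending everything, and the CSV case checks next(reader, None) explicitly. The _MISSING sentinel and the read_header() helper are illustrative names, not from the thread:

```python
import csv
import io

_MISSING = object()  # unique sentinel that can't be confused with real data


def join(iterables):
    '''Join iterable of iterables, stripping first item'''
    for iterable in iterables:
        iterator = iter(iterable)
        if next(iterator, _MISSING) is _MISSING:
            continue  # empty iterable: nothing to strip, nothing to yield
        for val in iterator:
            yield val


data = [['foo', 1, 2, 3], ['bar', 4, 5, 6], [], ['baz', 7, 8, 9]]
print(list(join(data)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9]


def read_header(csvfile):
    """Hypothetical helper: return the header row, or fail clearly if empty."""
    header = next(csv.reader(csvfile), None)
    if header is None:
        raise ValueError("CSV file is empty: no header row")
    return header


print(read_header(io.StringIO("a,b\n1,2\n")))  # ['a', 'b']
```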


Oscar


#44181

From: Tim Chase <python.list@tim.thechases.com>
Date: 2013-04-23 09:30 -0500
Message-ID: <mailman.973.1366727321.3114.python-list@python.org>
In reply to: #44179

On 2013-04-23 13:36, Neil Cerutti wrote:
> On 2013-04-22, Colin J. Williams <cjw@ncf.ca> wrote:
> > Since I'm only interested in one or two columns, the simpler
> > approach is probably better.
> 
> Here's a sketch of how one of my projects handles that situation.
> I think the index variables are invaluable documentation, and
> make it a bit more robust. (Python 3, so not every bit is
> relevant to you).
> 
> with open("today.csv", encoding='UTF-8', newline='') as today_file:
>     reader = csv.reader(today_file)
>     header = next(reader)
>     majr_index = header.index('MAJR')
>     div_index = header.index('DIV')
>     for rec in reader:
>         major = rec[majr_index]
>         rec[div_index] = DIVISION_TABLE[major]
> 
> But a csv.DictReader might still be more efficient. I never
> tested. This is the only place I've used this "optimization".
> It's fast enough. ;)

I believe the csv module does all the work at the C level rather
than in pure Python, so it should be notably faster.  The only times
I've had to do things by hand like that are when there are header
peculiarities that I can't control, such as mismatched case or
added/removed punctuation (client files are notorious for this).  So I
often end up doing something like

  import csv

  def normalize(header):
    return header.strip().upper() # other cleanup as needed

  reader = csv.reader(f)  # f: an already-opened CSV file
  headers = next(reader)
  header_map = dict(
    (normalize(header), i)
    for i, header
    in enumerate(headers)
    )
  item = lambda col: row[header_map[col]].strip()
  for row in reader:
    major = item("MAJR").upper()
    division = item("DIV")
    # ...

The function calls might add overhead, in which case one could
just use explicit indirect indexing for each value assignment:

  major = row[header_map["MAJR"]].strip().upper()

but I usually find that processing CSV files leaves me I/O bound
rather than CPU bound.
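A runnable Python 3 version of the header-normalizing approach above, using io.StringIO with deliberately messy, made-up headers in place of a client file. As a design choice it swaps the dict() call for a dict comprehension (Python 2.7+) and replaces the lambda with a small function that takes the row explicitly, so it no longer relies on the loop variable being in scope:

```python
import csv
import io


def normalize(header):
    return header.strip().upper()  # other cleanup as needed


# Messy client headers: stray spaces and mixed case (illustrative data).
f = io.StringIO(" majr ,Div,Name\n math ,Science, Ada \nhist,Humanities,Bob\n")

reader = csv.reader(f)
headers = next(reader)
header_map = {normalize(header): i for i, header in enumerate(headers)}


def item(row, col):
    """Look up a column by its normalized name and strip whitespace."""
    return row[header_map[col]].strip()


results = []
for row in reader:
    major = item(row, "MAJR").upper()
    division = item(row, "DIV")
    results.append((major, division))

print(results)  # [('MATH', 'Science'), ('HIST', 'Humanities')]
```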

-tkc



#44184

From: Skip Montanaro <skip@pobox.com>
Date: 2013-04-23 09:36 -0500
Message-ID: <mailman.975.1366727824.3114.python-list@python.org>
In reply to: #44179

> But a csv.DictReader might still be more efficient.

Depends on what efficiency you care about.  The DictReader class is
implemented in Python, and builds a dict for every row.  It will never
be more efficient CPU-wise than instantiating the csv.reader type
directly and only doing what you need.

OTOH, the DictReader class "just works" and its usage is more obvious
when you come back later to modify your code.  It also makes the code
insensitive to column ordering (though yours seems to be as well, if
I'm reading it correctly).  On the programmer efficiency axis, I score
the DictReader class higher than the reader type.
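A small Python 3 sketch of the column-ordering point, with made-up data: the same DictReader-based function reads two files whose columns appear in different orders without any change:

```python
import csv
import io


def majors(csvfile):
    # Columns are looked up by name, so their position in the file is irrelevant.
    return [row["MAJR"] for row in csv.DictReader(csvfile)]


file_a = io.StringIO("MAJR,DIV\nMATH,Science\n")
file_b = io.StringIO("DIV,MAJR\nScience,MATH\n")

print(majors(file_a), majors(file_b))  # ['MATH'] ['MATH']
```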

A simple test:

##########################
import csv
from timeit import Timer

setup = '''import csv
lst = ["""a,b,c,d,e,f,g"""]
lst.extend(["""05:38:24,0.6326,1,0,1.0,0.0,0.0"""] * 1000000)
reader = csv.reader(lst)
dreader = csv.DictReader(lst)
'''

t1 = Timer("for row in reader: pass", setup)
t2 = Timer("for row in dreader: pass", setup)

print(min(t1.repeat(number=10)))
print(min(t2.repeat(number=10)))
###############################

demonstrates that the raw reader is, indeed, much faster than the DictReader:

0.972723007202
8.29047989845

but that's for the basic iteration.  Whatever you need to add to the
raw reader to insulate yourself from changes to the structure of the
CSV file and improve readability will slow it down, while the
DictReader will never be worse than the above.
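One subtlety in the timing code above: setup runs once per repeat, so only the first of the ten executions in each timing iterates over real rows, and the other nine see an already-exhausted reader. Both timings are skewed the same way, so the comparison is still broadly meaningful, but the following Python 3 variant builds a fresh reader on every run so each timed execution parses the whole input. The data size is reduced (an arbitrary choice) to keep it quick, and absolute numbers will vary by machine and version:

```python
import csv
import timeit

# Arbitrary sample: a header plus 20,000 identical data rows.
lst = ["a,b,c,d,e,f,g"]
lst.extend(["05:38:24,0.6326,1,0,1.0,0.0,0.0"] * 20000)


def raw_pass():
    # Fresh reader on each call, so every timed run parses all rows.
    return sum(1 for row in csv.reader(lst))


def dict_pass():
    # Fresh DictReader on each call; it consumes the header itself.
    return sum(1 for row in csv.DictReader(lst))


t_raw = min(timeit.repeat(raw_pass, number=3, repeat=3))
t_dict = min(timeit.repeat(dict_pass, number=3, repeat=3))
print("reader:     %.3f s" % t_raw)
print("DictReader: %.3f s" % t_dict)
```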

Skip


#44187 — Re: There must be a better way (correction)

From: Tim Chase <python.list@tim.thechases.com>
Date: 2013-04-23 10:02 -0500
Subject: Re: There must be a better way (correction)
Message-ID: <mailman.977.1366729285.3114.python-list@python.org>
In reply to: #44179

On 2013-04-23 09:30, Tim Chase wrote:
> > But a csv.DictReader might still be more efficient. I never
> > tested. This is the only place I've used this "optimization".
> > It's fast enough. ;)
> 
> I believe the csv module does all the work at the C level rather
> than in pure Python, so it should be notably faster.

A little digging shows that csv.DictReader is pure Python, using the
underlying _csv.reader which is written in C for speed.
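On CPython this can be checked directly with the inspect module; comparing module names avoids depending on repr strings, which vary between versions. (Other implementations, such as PyPy, may implement csv differently.)

```python
import csv
import inspect

# csv.reader is re-exported from the C extension module _csv.
print(inspect.isbuiltin(csv.reader), csv.reader.__module__)         # True _csv

# csv.DictReader is an ordinary Python class defined in Lib/csv.py.
print(inspect.isclass(csv.DictReader), csv.DictReader.__module__)   # True csv
```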

-tkc

