To: python-list@python.org
From: Dave Angel <davea@davea.name>
Subject: Re: Processing large CSV files - how to maximise throughput?
Date: Fri, 25 Oct 2013 03:57:17 +0000 (UTC)
References: <b4737555-cb4f-457b-aed7-a1e6553fe6a5@googlegroups.com> <mailman.1494.1382667030.18130.python-list@python.org> <5269e6f6$0$29972$c3e8da3$5496439d@news.astraweb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
User-Agent: XPN/1.2.6 (Street Spirit ; Linux)
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.1497.1382673461.18130.python-list@python.org>

On 24/10/2013 23:35, Steven D'Aprano wrote:

> On Fri, 25 Oct 2013 02:10:07 +0000, Dave Angel wrote:
>
>>> If I have multiple large CSV files to deal with, and I'm on a
>>> multi-core machine, is there anything else I can do to boost
>>> throughput?
>> 
>> Start multiple processes.  For what you're doing, there's probably no
>> point in multithreading.
>
> Since the bottleneck will probably be I/O, reading and writing data from 
> files, I expect threading actually may help.

We approach the tradeoff from opposite sides.  I would use
multiprocessing to utilize multiple cores unless the communication costs
(between the processes) would get too high.

They won't in this case: each file can go to its own process, and very
little data has to pass between the processes.

But I concur that both will probably give about the same speedup.
I just detest the pain that multithreading can bring, and tend to avoid
it if at all possible.
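
For illustration, here is a minimal sketch of both approaches, assuming
each CSV file can be processed independently of the others.  The names
count_rows and process_all are hypothetical, and counting rows is only a
stand-in for whatever the real per-file work is.  With
concurrent.futures, switching between processes and threads is a
one-argument change, so it is cheap to time both on real data.

```python
import csv
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_rows(path):
    """Hypothetical per-file job: count the data rows in one CSV file."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the header row, if the file has one
        return sum(1 for _ in reader)


def process_all(paths, executor_cls=ProcessPoolExecutor, workers=None):
    """Run count_rows on every path, one file per task.

    Only a path is sent to each worker and only an int comes back, so
    the communication cost between processes stays small.
    """
    paths = list(paths)
    with executor_cls(max_workers=workers) as pool:
        return dict(zip(paths, pool.map(count_rows, paths)))


if __name__ == "__main__":
    # The guard matters for process pools on platforms that start
    # workers by re-importing this module.
    import sys

    files = sys.argv[1:]
    print(process_all(files))                                   # processes
    print(process_all(files, executor_cls=ThreadPoolExecutor))  # threads
```

Which one wins depends on whether the per-file work is dominated by
disk I/O or by parsing in the interpreter, so measure rather than guess.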

-- 
DaveA