Re: Processing large CSV files - how to maximise throughput?

From Stefan Behnel <stefan_ml@behnel.de>
Subject Re: Processing large CSV files - how to maximise throughput?
Date Fri, 25 Oct 2013 08:39:22 +0200
Newsgroups comp.lang.python
Message-ID <mailman.1499.1382683179.18130.python-list@python.org>


Chris Angelico, 25.10.2013 08:13:
> On Fri, Oct 25, 2013 at 2:57 PM, Dave Angel wrote:
>> But I would concur -- probably they'll both give about the same speedup.
>> I just detest the pain that multithreading can bring, and tend to avoid
>> it if at all possible.
> 
> I don't have a history of major pain from threading. Is this a Python
> thing, or have I just been really, really fortunate?

Likely the latter. Threads are fine when what they do is essentially what
you could just as easily do with multiple processes, i.e. process
independent data, maybe from/to independent files, and communicate only
through dedicated channels.
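
To illustrate the "independent data, dedicated channels" case, here is a
minimal sketch, not taken from the thread. The names count_rows and
count_all and the in-memory CSV payloads are made up for the example. Each
thread parses its own input and reports back only through a queue.Queue:

```python
import csv
import io
import queue
import threading


def count_rows(name, text, results):
    """Worker: parse one independent CSV payload and report via the queue."""
    rows = sum(1 for _ in csv.reader(io.StringIO(text)))
    results.put((name, rows))  # the queue is the only communication channel


def count_all(payloads):
    """Run one thread per payload; nothing is shared except the queue."""
    results = queue.Queue()
    threads = [
        threading.Thread(target=count_rows, args=(name, text, results))
        for name, text in payloads.items()
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # Every worker has put exactly one item, so collect one per thread.
    return dict(results.get() for _ in threads)


if __name__ == "__main__":
    # Hypothetical inputs; real code would read from separate files.
    demo = {"a.csv": "x,y\n1,2\n3,4\n", "b.csv": "x,y\n5,6\n"}
    print(count_all(demo))
```

Because no worker touches another worker's data, the same structure could
be moved to multiple processes with little change.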

As soon as you need them to share any state, however, it's really easy to
get it wrong and to run into concurrency issues that are difficult to
reproduce and debug.
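
A small sketch of what "sharing state" means in practice, again with
made-up names (RowCounter, run). The unlocked method is a read-modify-write
that threads can interleave; whether it actually loses updates on a given
run depends on the interpreter version and timing, which is exactly what
makes such bugs hard to reproduce. The locked method is the conventional
fix:

```python
import threading


class RowCounter:
    """A running total that several threads update."""

    def __init__(self):
        self.total = 0
        self._lock = threading.Lock()

    def add_unsafe(self, n):
        # Read-modify-write without a lock: two threads may interleave
        # between the read and the write, so updates can be lost.
        self.total += n

    def add_safe(self, n):
        # The lock makes the read-modify-write atomic with respect to
        # other threads that use the same lock.
        with self._lock:
            self.total += n


def run(method_name, workers=4, repeats=10_000):
    """Call the chosen method from several threads and return the total."""
    counter = RowCounter()
    add = getattr(counter, method_name)

    def work():
        for _ in range(repeats):
            add(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.total


if __name__ == "__main__":
    print("safe:  ", run("add_safe"))    # always workers * repeats
    print("unsafe:", run("add_unsafe"))  # may or may not match; not guaranteed
```

Note that a correct total from the unsafe variant on one machine proves
nothing about the next run.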

Basically, with multiple processes, you start with independent systems and
add connections specifically where needed, whereas with threads, you start
with completely shared state and then prune away interdependencies and
concurrency until it seems to work safely. That approach makes it
essentially impossible to prove that threading is safe in a given setup,
except for the really trivial cases.

Stefan
