Re: Processing large CSV files - how to maximise throughput?

Path	csiph.com!usenet.pasdenom.info!weretis.net!feeder1.news.weretis.net!feeder.erje.net!eu.feeder.erje.net!news.stack.nl!newsfeed.xs4all.nl!newsfeed3.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path	<rosuav@gmail.com>
X-Original-To	python-list@python.org
Delivered-To	python-list@mail.python.org
X-Spam-Status	OK 0.009
MIME-Version	1.0
X-Received	by 10.68.225.9 with SMTP id rg9mr193324pbc.122.1382685994897; Fri, 25 Oct 2013 00:26:34 -0700 (PDT)
In-Reply-To	<l4d3mg$hsf$1@ger.gmane.org>
References	<b4737555-cb4f-457b-aed7-a1e6553fe6a5@googlegroups.com> <mailman.1494.1382667030.18130.python-list@python.org> <5269e6f6$0$29972$c3e8da3$5496439d@news.astraweb.com> <l4cq6t$oq6$1@ger.gmane.org> <CAPTjJmqvjMMqd-JzaL3BtVu3=bwgYCdpFdMSHEa8kf5kdpVJyA@mail.gmail.com> <l4d3mg$hsf$1@ger.gmane.org>
Date	Fri, 25 Oct 2013 18:26:34 +1100
Subject	Re: Processing large CSV files - how to maximise throughput?
From	Chris Angelico <rosuav@gmail.com>
To	python-list@python.org
Content-Type	text/plain; charset=ISO-8859-1
X-BeenThere	python-list@python.org
X-Mailman-Version	2.1.15
Precedence	list
List-Id	General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe	<https://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive	<http://mail.python.org/pipermail/python-list/>
List-Post	<mailto:python-list@python.org>
List-Help	<mailto:python-list-request@python.org?subject=help>
List-Subscribe	<https://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups	comp.lang.python
Message-ID	<mailman.1500.1382686313.18130.python-list@python.org> (permalink)
Lines	27
NNTP-Posting-Host	2001:888:2000:d::a6
X-Trace	1382686313 news.xs4all.nl 15966 [2001:888:2000:d::a6]:41344
X-Complaints-To	abuse@xs4all.nl
Xref	csiph.com comp.lang.python:57499


On Fri, Oct 25, 2013 at 5:39 PM, Stefan Behnel <stefan_ml@behnel.de> wrote:
> Basically, with multiple processes, you start with independent systems and
> add connections specifically where needed, whereas with threads, you start
> with completely shared state and then prune away interdependencies and
> concurrency until it seems to work safely. That approach makes it
> essentially impossible to prove that threading is safe in a given setup,
> except for the really trivial cases.

Not strictly true. With multiple threads, you start with completely
shared global state and completely independent local state (in
assembly-language terms, a shared data segment and a separate stack per thread). If you
treat your globals as either read-only or carefully controlled, then
it makes little difference whether you're forking processes or
spinning off threads, except that with threads you don't need special
data structures (IPC-based ones) for the global state.
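A minimal sketch of that discipline, assuming Python's standard `threading` module. The names `CONFIG`, `total` and `worker` are hypothetical: `CONFIG` is a global that is only ever read, `total` is the one piece of carefully controlled shared state (guarded by a `Lock`), and everything else the worker touches is local.

```python
import threading

# Read-only global: set up once before any thread starts, never mutated afterwards.
CONFIG = {"multiplier": 3}

# The one piece of shared mutable state, only ever touched while holding the lock.
total = 0
total_lock = threading.Lock()

def worker(values):
    global total
    # All intermediate state lives in locals, i.e. on this thread's own stack.
    subtotal = 0
    for v in values:
        subtotal += v * CONFIG["multiplier"]
    # Touch the shared global only briefly, and only under the lock.
    with total_lock:
        total += subtotal

chunks = [range(0, 100), range(100, 200), range(200, 300)]
threads = [threading.Thread(target=worker, args=(chunk,)) for chunk in chunks]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(total)  # prints 134550
```

With processes instead of threads, the shared counter could not be a plain global; it would need an IPC-backed structure such as `multiprocessing.Value`, which is the "special data structures" cost mentioned above.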

For me, threading largely grew out of the same sorts of concerns as
recursion - as long as all your internal state is in locals, nothing
can hurt you. Of course, it's still far easier to shoot yourself in
the foot with threads than with processes, but for the tasks I've used
them for, I've never found footholes. That may, however, be inherent
to the simplicity of the two main jobs I used threads for: socket
handling (where nearly everything's I/O bound) and worker threads spun
off to keep the GUI responsive (posting a message back to the
main thread when there's a result).
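The second pattern (a worker thread that keeps all its state in locals and posts a message back to the main thread) can be sketched roughly as below. This is an illustration under assumptions: a real GUI toolkit has its own mechanism for handing work to its main loop, so a plain `queue.Queue` and a polling loop stand in for it here, and `slow_job` and `jobs` are hypothetical names.

```python
import queue
import threading

# Results travel back to the main thread through this queue;
# worker threads never touch the "GUI" state directly.
results = queue.Queue()

def slow_job(job_id, n):
    # Hypothetical long-running task; all of its state is local.
    acc = 0
    for i in range(n):
        acc += i * i
    # Post a message back to the main thread instead of mutating shared state.
    results.put((job_id, acc))

jobs = {"a": 10, "b": 1000}
for job_id, n in jobs.items():
    threading.Thread(target=slow_job, args=(job_id, n), daemon=True).start()

# Stand-in for the GUI event loop: the main thread stays free between
# checks and picks up each result as it arrives.
received = {}
while len(received) < len(jobs):
    try:
        job_id, value = results.get(timeout=0.1)
    except queue.Empty:
        continue  # a real event loop would redraw / handle input here
    received[job_id] = value

print(received)
```

The arrival order of the two results is not fixed, but the main thread only ever sees completed values, so no lock is needed around `received`.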

ChrisA

Thread

Processing large CSV files - how to maximise throughput? Victor Hooi <victorhooi@gmail.com> - 2013-10-24 18:38 -0700
  Re: Processing large CSV files - how to maximise throughput? Dave Angel <davea@davea.name> - 2013-10-25 02:10 +0000
    Re: Processing large CSV files - how to maximise throughput? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-25 03:35 +0000
      Re: Processing large CSV files - how to maximise throughput? Dave Angel <davea@davea.name> - 2013-10-25 03:57 +0000
      Re: Processing large CSV files - how to maximise throughput? Chris Angelico <rosuav@gmail.com> - 2013-10-25 17:13 +1100
      Re: Processing large CSV files - how to maximise throughput? Stefan Behnel <stefan_ml@behnel.de> - 2013-10-25 08:39 +0200
      Re: Processing large CSV files - how to maximise throughput? Chris Angelico <rosuav@gmail.com> - 2013-10-25 18:26 +1100
      Re: Processing large CSV files - how to maximise throughput? Dave Angel <davea@davea.name> - 2013-10-25 11:24 +0000
      Re: Processing large CSV files - how to maximise throughput? Chris Angelico <rosuav@gmail.com> - 2013-10-25 22:42 +1100
  Re: Processing large CSV files - how to maximise throughput? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2013-10-25 03:19 +0000
  Re: Processing large CSV files - how to maximise throughput? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2013-10-25 04:46 +0100
  Re: Processing large CSV files - how to maximise throughput? Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2013-10-25 19:44 -0400
    Re: Processing large CSV files - how to maximise throughput? Roy Smith <roy@panix.com> - 2013-10-25 20:22 -0400
  Re: Processing large CSV files - how to maximise throughput? Walter Hurry <walterhurry@lavabit.com> - 2013-10-26 08:53 +0000
