From: Chris Angelico
Newsgroups: comp.lang.python
Subject: Re: Processing large CSV files - how to maximise throughput?
Date: Fri, 25 Oct 2013 18:26:34 +1100
References: <5269e6f6$0$29972$c3e8da3$5496439d@news.astraweb.com>

On Fri, Oct 25, 2013 at 5:39 PM, Stefan Behnel wrote:
> Basically, with multiple processes, you start with independent systems and
> add connections specifically where needed, whereas with threads, you start
> with completely shared state and then prune away interdependencies and
> concurrency until it seems to work safely. That approach makes it
> essentially impossible to prove that threading is safe in a given setup,
> except for the really trivial cases.

Not strictly true. With multiple threads, you start with completely
shared global state and completely independent local state (in
assembly language, shared data segment and separate stack).
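
To illustrate the shared-global / independent-local split, here is a
minimal sketch. The names (shared_total, total_lock, sum_squares,
worker) are made up for the example and are not from the thread; it
only assumes the standard threading module.

```python
import threading

shared_total = 0                  # global: one copy, visible to every thread
total_lock = threading.Lock()     # keeps the global "carefully controlled"

def sum_squares(n):
    # acc and i are locals: each thread running this function has its
    # own copies, so no other thread can see or disturb them.
    acc = 0
    for i in range(n):
        acc += i * i
    return acc

def worker(n):
    global shared_total
    result = sum_squares(n)       # purely local work, no locking needed
    with total_lock:              # shared state is touched only under the lock
        shared_total += result

threads = [threading.Thread(target=worker, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_total)
```

All four threads run sum_squares at once without interfering, because
its state lives entirely in locals; the lock is needed only for the
one line that updates the global.
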
If you treat your globals as either read-only or carefully controlled,
then it makes little difference whether you're forking processes or
spinning off threads, except that with threads you don't need special
data structures (IPC-based ones) for the global state.

For me, threading largely grew out of the same sorts of concerns as
recursion - as long as all your internal state is in locals, nothing
can hurt you.

Of course, it's still far easier to shoot yourself in the foot with
threads than with processes, but for the tasks I've used them for,
I've never found footholes; that may, however, be inherent to the
simplicity of the two main jobs I used threads for: socket handling
(where nearly everything's I/O bound) and worker threads spun off to
let the GUI remain responsive (posting a message back to the main
thread when there's a result).

ChrisA
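
The "worker thread posts a message back to the main thread" pattern
mentioned above can be sketched roughly as follows. This is only an
illustration under stated assumptions: there is no real GUI here, the
loop at the bottom stands in for a GUI event loop, and slow_job,
worker, jobs and received are hypothetical names.

```python
import queue
import threading

results = queue.Queue()           # thread-safe mailbox read by the main thread

def slow_job(x):
    # Stand-in for real work; everything it touches is local.
    return x * x

def worker(job_id, x):
    # Do the work off the main thread, then post a message back.
    results.put((job_id, slow_job(x)))

jobs = {1: 3, 2: 4, 3: 5}
for job_id, x in jobs.items():
    threading.Thread(target=worker, args=(job_id, x), daemon=True).start()

# Stand-in for the main loop: a real GUI would poll the queue from a
# timer or idle callback rather than block like this.
received = {}
while len(received) < len(jobs):
    job_id, value = results.get(timeout=5)   # wait for the next posted result
    received[job_id] = value

print(received)
```

The workers never touch received or any other main-thread state; the
queue is the only shared object, which is what keeps the arrangement
simple.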