From: Abhishek Pratap
Date: Thu, 8 Sep 2011 15:49:51 -0700
Subject: Processing a file using multithreads
To: python-list@python.org
Newsgroups: comp.lang.python

Hi Guys,

I have been using Python for two days and I am looking for a slick way to use multi-threading to process a file. Here is what I would like to do, which is somewhat similar to MapReduce in concept.

# test case

1. My input file is 10 GB.
2. I want to open 10 file handles, each handling 1 GB of the file.
3. Each file handle is processed by an individual thread using the same function (so a total of 10 cores are assumed to be available on the machine).
4. There will be 10 different output files.
5. Once the 10 jobs are complete, a reduce kind of function will combine the output.

Could you give me some ideas? In general, given a file, I would like to read it in N chunks through N file handles and process each of them separately.

Best,
-Abhi
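
The five steps of the test case could be sketched roughly as below. This is only a minimal sketch under several assumptions: the input is line-oriented text, the per-chunk work is a placeholder that counts lines, and worker processes (multiprocessing.Pool) stand in for threads, because CPython's global interpreter lock generally keeps threads from running Python code on several cores at once. The names chunk_bounds, process_chunk, combine and run, and the part-N.out file names, are hypothetical and chosen for illustration.

```python
import os
from multiprocessing import Pool


def chunk_bounds(path, n_chunks):
    """Return n_chunks (start, end) byte ranges that end on line boundaries."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_chunks):
            f.seek(size * i // n_chunks)  # approximate cut position
            f.readline()                  # move forward to the end of that line
            cuts.append(f.tell())
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))


def process_chunk(job):
    """Map step: read one byte range through its own file handle."""
    path, start, end, out_path = job
    count = 0
    with open(path, "rb") as src, open(out_path, "w") as out:
        src.seek(start)
        while src.tell() < end:
            src.readline()
            count += 1  # placeholder work (assumption): count the lines
        out.write(f"{count}\n")
    return out_path


def combine(out_paths):
    """Reduce step: merge the per-chunk output files into one result."""
    total = 0
    for p in out_paths:
        with open(p) as f:
            total += int(f.read())
    return total


def run(path, n_chunks=10, out_dir="."):
    """Split the file, process each chunk in its own process, then combine."""
    jobs = [
        (path, start, end, os.path.join(out_dir, f"part-{i}.out"))
        for i, (start, end) in enumerate(chunk_bounds(path, n_chunks))
    ]
    with Pool(processes=n_chunks) as pool:
        out_paths = pool.map(process_chunk, jobs)
    return combine(out_paths)
```

To try it, call something like run("input.txt", 10) from under an `if __name__ == "__main__":` guard, where input.txt is a hypothetical file name. Aligning the cuts to line boundaries means no line is split between two workers, at the cost of chunks that are only approximately equal in size.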