


Re: Processing a file using multithreads

From Roy Smith <roy@panix.com>
Newsgroups comp.lang.python
Subject Re: Processing a file using multithreads
Date 2011-09-09 09:19 -0400
Organization PANIX Public Access Internet and UNIX, NYC
Message-ID <roy-77E2CD.09190709092011@news.panix.com>
References <mailman.885.1315522214.27778.python-list@python.org> <c6cbd486-7e5e-4d26-93b9-088d48a25dea@g9g2000yqb.googlegroups.com>



In article 
<c6cbd486-7e5e-4d26-93b9-088d48a25dea@g9g2000yqb.googlegroups.com>,
 aspineux <aspineux@gmail.com> wrote:

> On Sep 9, 12:49 am, Abhishek Pratap <abhishek....@gmail.com> wrote:
> > 1. My input file is 10 GB.
> > 2. I want to open 10 file handles each handling 1 GB of the file
> > 3. Each file handle is processed by an individual thread using the
> > same function (so a total of 10 cores is assumed to be available on
> > the machine)
> > 4. There will be 10 different output files
> > 5. Once the 10 jobs are complete, a reduce kind of function will
> > combine the output.
> >
> > Could you give some ideas ?
> 
> You can use the "multiprocessing" module instead of threads to bypass
> the GIL limitation.

I agree with this.

> First cut your file into 10 "equal" parts. If it is line-based, search
> for the first line close to each cut. Be sure to have a "start" and an
> "end" for each part: start is the offset of the first character of
> the first line, and end is one line past the last (== start of the
> next block).
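
A rough sketch of the cutting scheme described above, in case it helps.
The function name chunk_ranges and its parameters are made up for
illustration; it assumes a newline-terminated, line-based file and
returns (start, end) byte offsets where each end is the start of the
next block.

```python
import os


def chunk_ranges(path, n_parts):
    """Return (start, end) byte offsets for up to n_parts line-aligned chunks."""
    size = os.path.getsize(path)
    cuts = [0]
    with open(path, "rb") as f:
        for i in range(1, n_parts):
            target = size * i // n_parts
            if target <= cuts[-1]:
                continue  # a long line already carried us past this cut
            # Step back one byte so a cut that already sits on a line
            # start is kept as-is, then read up to the next newline.
            f.seek(target - 1)
            f.readline()
            pos = f.tell()  # first byte of the next line
            if cuts[-1] < pos < size:
                cuts.append(pos)
    cuts.append(size)
    return list(zip(cuts[:-1], cuts[1:]))
```

Each worker would then seek to its own start and read end - start
bytes. If some lines are very long, fewer than n_parts chunks may come
back.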

How much of the total time will be I/O, and how much will be actual
processing?  Unless your processing is trivial, the I/O time will be
relatively small.  In that case, you might do well to just use the Unix
command-line "split" utility to split the file into pieces first, then
process the pieces in parallel.  Why waste effort getting the
file-splitting-at-line-boundaries logic correct when somebody has
already done it for you?
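
A minimal sketch of the "process the pieces in parallel" step, assuming
the pieces were made beforehand with something like
"split -n l/10 input.txt piece_" (the -n l/10 form is GNU coreutils;
plain -l with a line count does a similar job elsewhere). The file
names, process_piece, combine, and run are all hypothetical, and line
counting stands in for whatever the real per-piece work is.

```python
import glob
from functools import reduce
from multiprocessing import Pool


def process_piece(path):
    """Hypothetical per-piece work: count the lines in one piece."""
    with open(path, "rb") as f:
        return sum(1 for _ in f)


def combine(a, b):
    """Hypothetical reduce step: add two partial results."""
    return a + b


def run(pattern="piece_*", workers=10):
    pieces = sorted(glob.glob(pattern))
    if not pieces:
        return 0
    # One process per piece, capped at `workers`; processes sidestep the GIL.
    with Pool(min(workers, len(pieces))) as pool:
        partials = pool.map(process_piece, pieces)
    return reduce(combine, partials, 0)


if __name__ == "__main__":
    print(run())
```

Here the partial results come back in memory rather than as 10 output
files; if each result is large, the workers could write their own
output file and return its name instead.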



