| Date | 2012-10-01 05:00 -0700 |
|---|---|
| From | Patricia Shanahan <pats@acm.org> |
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: Threading model for reading 1,000 files quickly? |
| References | <051fc3d6-d22c-438a-b4d3-84378e447733@googlegroups.com> <K-SdnU_ujNRnyvTNnZ2dnUVZ7q8AAAAA@bt.com> |
| Message-ID | <1YednWK6TvBYGPTNnZ2dnUVZ_uudnZ2d@earthlink.com> |
On 10/1/2012 1:43 AM, Chris Uppal wrote:
> leegee@gmail.com wrote:
>> I have directory with many sub-directories, each with many thousands of
>> files.
>>
>> I wish to process each file, which takes one or two seconds.
>>
>> I wish to simultaneously process as many files as possible.
>
> Your problem here is not threading, but disk IO. Specifically disk seeks. If
> you are using a rotating disk (as opposed to a SSD), and all the files are on
> the same spindle, then using > 1 thread will just slow things down as the
> different thread "fight" to position the disk heads over "their" files.
>
> If you are using more than one spindle (say in a RAID array) then you
> may find benefits in using a similar number of threads.
>
> If the processing is CPU bound rather than IO bound when you are
> processing just one file (doesn't sound like it, but may be true)
> then you can perhaps get benefits by using roughly as many threads
> and you have real cores available to compute.

I agree with the idea that the objective, for a rotating disk, should probably be to optimize use of the disk head's time. I disagree with the conclusion.

There is no reason to expect the files to be laid out on disk in the order of requests. It is entirely possible that files N+2, N+3, and N+4 are physically between files N and N+1 for some values of N.

The operating system, the disk drive, or both may be optimizing the request order to reduce head movement. If the scheduling algorithm knows that all of N through N+4 are needed, it can stop the head at each track that has one of them and read it as the head is moving from N to N+1.

If you feed the requests to the operating system one at a time, and wait for each to finish, the disk head will be forced to do the reads in First-Come-First-Served order, regardless of disk placement. That will probably not be the optimal order.

If you have too many requests outstanding, there is a risk of overloading the operating system's buffering.
I would suggest either using asynchronous I/O or a thread pool, so that the number of outstanding requests can be tuned based on measurements. I will be surprised if the optimal queue length is one.

Patricia
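
The thread-pool suggestion could be sketched roughly as follows. This is a minimal illustration, not code from the thread: the class name `TunableFileReader`, the `process` placeholder (which only counts bytes), and the pool sizes tried in `main` are all assumptions made for the example. The pool size caps how many reads are outstanding at once, and `main` times a few sizes so the choice can be based on measurement.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class TunableFileReader {

    // Hypothetical stand-in for the real per-file work: reads the whole
    // file and returns its length in bytes.
    static long process(Path file) throws IOException {
        return Files.readAllBytes(file).length;
    }

    // Reads every regular file under root with a fixed pool of
    // `outstanding` threads, so at most that many reads are in flight.
    // Returns the total number of bytes read.
    static long readAll(Path root, int outstanding)
            throws IOException, InterruptedException, ExecutionException {
        List<Path> files;
        try (Stream<Path> walk = Files.walk(root)) {
            files = walk.filter(Files::isRegularFile).collect(Collectors.toList());
        }
        ExecutorService pool = Executors.newFixedThreadPool(outstanding);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (Path f : files) {
                Callable<Long> task = () -> process(f);
                results.add(pool.submit(task));
            }
            long total = 0;
            for (Future<Long> r : results) {
                total += r.get(); // blocks until that file has been read
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.out.println("usage: java TunableFileReader <directory>");
            return;
        }
        Path root = Paths.get(args[0]);
        // Pool sizes to try are arbitrary; adjust to the hardware.
        for (int n : new int[] {1, 2, 4, 8, 16}) {
            long start = System.nanoTime();
            long bytes = readAll(root, n);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println(n + " threads: " + bytes + " bytes in " + ms + " ms");
        }
    }
}
```

When comparing pool sizes this way, keep in mind that the operating system's file cache will likely make later runs over the same directory faster than the first, so timings are only comparable if the cache state is similar for each run.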