Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.java.programmer > #21764
| From | Eric Sosman <esosman@comcast-dot-net.invalid> |
|---|---|
| Newsgroups | comp.lang.java.programmer |
| Subject | Re: The first 10 files |
| Date | 2013-01-26 20:42 -0500 |
| Organization | A noiseless patient Spider |
| Message-ID | <ke20lo$sh1$1@dont-email.me> (permalink) |
| References | (1 earlier) <eWPMs.128009$tG.112190@newsfe15.iad> <51041ff8$0$284$14726298@news.sunsite.dk> <amivb3Fag8iU1@mid.individual.net> <ke1k0f$nqj$1@dont-email.me> <1iop8bl8ysrfg$.rdxcxhgxuj1r$.dlg@40tude.net> |
On 1/26/2013 6:21 PM, Peter Duniho wrote:
> On Sat, 26 Jan 2013 17:06:07 -0500, Eric Sosman wrote:
>
>> On 1/26/2013 4:15 PM, Robert Klemme wrote:
>>> On 26.01.2013 19:26, Arne Vajhøj wrote:
>>>
>>>> But I am a bit skeptical about whether a String[] with 30K elements
>>>> is really the bottleneck.
>>>>
>>>> If the real bottleneck is the OS calls to get next file, then
>>>> a filter like this will not help.
>>>
>>> Why?
>>
>> Because the listFiles() method will fetch the information
>> for all 30K files from the O/S, will construct 30K File objects
>> to represent them, and will submit all 30K File objects to the
>> FileFilter, one by one. The FileFilter will (very quickly)
>> reject 29.99K of the 30K Files, but ...
>
> Will it?
Necessarily. As far as listFiles() knows, the FileFilter
might accept the very last File object given to it. Therefore,
listFiles() cannot fail to present that very last File -- and
every other File -- for inspection.
> It is plausible that the implementation of listFiles() uses an OS API that
> enumerates files one at a time. On Windows, getting the first file of the
> enumeration is faster than asking for all the files at once.
Meh.
> Indeed, I suppose one could throw an exception from the FileFilter accept()
> method to interrupt enumeration, if that's how listFiles() is implemented.
> That would avoid the need to enumerate more than the needed number of
> actual files.
It would also avoid the burden of returning anything from
listFiles() -- like, say, the array of accepted files ...
A seriously hackish approach might be to do the processing
of the files within the FileFilter itself, treating it as a
"visit this File" callback instead of as a predicate. Then if
the FileFilter threw an exception after processing the first N
files -- well, they'd already have been processed, and you were
going to ignore the listFiles() return value anyhow, so ...
But, as I said, that's pretty seriously hackish.
> Of course, this is all implementation-dependent and since it's not
> explicitly documented, could change at any time anyway.
The performance implications of retrieving information on 30K
files from the O/S are undocumented, true. But the necessity of
retrieving that information is deducible from what *is* documented.
> But unless you've
> actually examined the implementation details for listFiles(), it's not a
> foregone conclusion that the technique of using a FileFilter offers no way
> to improve latency.
Maybe this is the disconnect: I understood the O.P.'s concern as
"It's doing three thousand times too much work," not as "It takes
three thousand times as long as it should just to get to the first
File instance." Either way, though, I think a FileFilter (used in a
non-hackish way) cannot reduce either the total work or the latency.
Observe that listFiles() cannot return anything at all until it has
built the entire array of accepted files; Java's arrays have no way
to say "I hold five elements now, but might grow."
> All that said, I think John Matthews' comment about the question of what
> 30K files are doing in a single directory in the first place is perhaps one
> of the more useful points in this topic. One doesn't always have control
> over that, of course...but if one does, it's certainly worth rethinking
> that aspect of the design. There are reasons other than code latency to
> avoid so many files in a single directory.
Yeah. The O.P. said something about external processes dumping
files into the directory, possibly dumping many between (widely-
spaced?) executions of his program. That seems odd to me, though,
because if there's a backlog of thirty thousand it seems odd to want
to reduce it by only ten ...
If he's stuck with this overall design, though, I think the
walkFileTree() method of java.nio.file.Files would be a cleaner way
to proceed. His FileVisitor could return FileVisitResult.TERMINATE
after it had seen ten files, and that would be that. No hacks.
--
Eric Sosman
esosman@comcast-dot-net.invalid
Back to comp.lang.java.programmer | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
The first 10 files Wojtek <nowhere@a.com> - 2013-01-26 01:14 -0800
Re: The first 10 files Roedy Green <see_website@mindprod.com.invalid> - 2013-01-26 02:44 -0800
Re: The first 10 files Lew <lewbloch@gmail.com> - 2013-01-26 10:20 -0800
Re: The first 10 files "John B. Matthews" <nospam@nospam.invalid> - 2013-01-26 06:31 -0500
Re: The first 10 files Wojtek <nowhere@a.com> - 2013-01-26 15:42 -0800
Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 17:13 -0700
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:21 -0500
Re: The first 10 files "John B. Matthews" <nospam@nospam.invalid> - 2013-01-26 22:05 -0500
Re: The first 10 files Arved Sandstrom <asandstrom2@eastlink.ca> - 2013-01-26 08:24 -0400
Re: The first 10 files Arved Sandstrom <asandstrom2@eastlink.ca> - 2013-01-26 08:25 -0400
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 13:26 -0500
Re: The first 10 files Robert Klemme <shortcutter@googlemail.com> - 2013-01-26 22:15 +0100
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 16:25 -0500
Re: The first 10 files Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-01-26 17:06 -0500
Re: The first 10 files Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com> - 2013-01-26 15:21 -0800
Re: The first 10 files Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-01-26 20:42 -0500
Re: The first 10 files Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com> - 2013-01-26 17:56 -0800
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:29 -0500
Re: The first 10 files Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-01-26 21:56 -0500
Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 20:51 -0700
Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 20:47 -0700
Re: The first 10 files Arved Sandstrom <asandstrom2@eastlink.ca> - 2013-01-26 22:02 -0400
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:35 -0500
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:43 -0500
Re: The first 10 files Robert Klemme <shortcutter@googlemail.com> - 2013-01-27 13:55 +0100
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-02-24 17:50 -0500
Re: The first 10 files Robert Klemme <shortcutter@googlemail.com> - 2013-02-25 21:53 +0100
Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 20:57 -0700
Re: The first 10 files Wojtek <nowhere@a.com> - 2013-01-26 21:20 -0800
Re: The first 10 files Arved Sandstrom <asandstrom2@eastlink.ca> - 2013-01-27 07:23 -0400
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-27 20:36 -0500
Re: The first 10 files Wojtek <nowhere@a.com> - 2013-01-28 16:28 -0800
Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:23 -0500
Re: The first 10 files Roedy Green <see_website@mindprod.com.invalid> - 2013-01-26 19:09 -0800
Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 16:00 -0700
Re: The first 10 files Knute Johnson <nospam@knutejohnson.com> - 2013-01-26 18:37 -0800
Re: The first 10 files Wojtek <nowhere@a.com> - 2013-03-14 03:07 -0700
Re: The first 10 files lipska the kat <"nospam at neversurrender dot co dot uk"> - 2013-03-14 12:49 +0000
Re: The first 10 files Robert Klemme <shortcutter@googlemail.com> - 2013-03-15 11:38 +0100
Re: The first 10 files Wojtek <nowhere@a.com> - 2013-03-15 10:31 -0700
csiph-web