Newsgroup: comp.lang.java.programmer, article #21727 (unrolled thread)

The first 10 files

Started by: Wojtek <nowhere@a.com>
First post: 2013-01-26 01:14 -0800
Last post: 2013-03-15 10:31 -0700
Articles: 20 on this page of 40 (12 participants)



Contents

  The first 10 files Wojtek <nowhere@a.com> - 2013-01-26 01:14 -0800
    Re: The first 10 files Roedy Green <see_website@mindprod.com.invalid> - 2013-01-26 02:44 -0800
      Re: The first 10 files Lew <lewbloch@gmail.com> - 2013-01-26 10:20 -0800
    Re: The first 10 files "John B. Matthews" <nospam@nospam.invalid> - 2013-01-26 06:31 -0500
      Re: The first 10 files Wojtek <nowhere@a.com> - 2013-01-26 15:42 -0800
        Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 17:13 -0700
        Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:21 -0500
        Re: The first 10 files "John B. Matthews" <nospam@nospam.invalid> - 2013-01-26 22:05 -0500
    Re: The first 10 files Arved Sandstrom <asandstrom2@eastlink.ca> - 2013-01-26 08:24 -0400
      Re: The first 10 files Arved Sandstrom <asandstrom2@eastlink.ca> - 2013-01-26 08:25 -0400
      Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 13:26 -0500
        Re: The first 10 files Robert Klemme <shortcutter@googlemail.com> - 2013-01-26 22:15 +0100
          Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 16:25 -0500
          Re: The first 10 files Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-01-26 17:06 -0500
            Re: The first 10 files Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com> - 2013-01-26 15:21 -0800
              Re: The first 10 files Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-01-26 20:42 -0500
                Re: The first 10 files Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com> - 2013-01-26 17:56 -0800
                  Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:29 -0500
                  Re: The first 10 files Eric Sosman <esosman@comcast-dot-net.invalid> - 2013-01-26 21:56 -0500
                    Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 20:51 -0700
                  Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 20:47 -0700
                Re: The first 10 files Arved Sandstrom <asandstrom2@eastlink.ca> - 2013-01-26 22:02 -0400
                  Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:35 -0500
                    Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:43 -0500
                      Re: The first 10 files Robert Klemme <shortcutter@googlemail.com> - 2013-01-27 13:55 +0100
                        Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-02-24 17:50 -0500
                          Re: The first 10 files Robert Klemme <shortcutter@googlemail.com> - 2013-02-25 21:53 +0100
                  Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 20:57 -0700
                  Re: The first 10 files Wojtek <nowhere@a.com> - 2013-01-26 21:20 -0800
                    Re: The first 10 files Arved Sandstrom <asandstrom2@eastlink.ca> - 2013-01-27 07:23 -0400
                    Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-27 20:36 -0500
                      Re: The first 10 files Wojtek <nowhere@a.com> - 2013-01-28 16:28 -0800
              Re: The first 10 files Arne Vajhøj <arne@vajhoej.dk> - 2013-01-26 21:23 -0500
              Re: The first 10 files Roedy Green <see_website@mindprod.com.invalid> - 2013-01-26 19:09 -0800
    Re: The first 10 files Jim Janney <jjanney@shell.xmission.com> - 2013-01-26 16:00 -0700
    Re: The first 10 files Knute Johnson <nospam@knutejohnson.com> - 2013-01-26 18:37 -0800
      Re: The first 10 files Wojtek <nowhere@a.com> - 2013-03-14 03:07 -0700
        Re: The first 10 files lipska the kat <"nospam at neversurrender dot co dot uk"> - 2013-03-14 12:49 +0000
        Re: The first 10 files Robert Klemme <shortcutter@googlemail.com> - 2013-03-15 11:38 +0100
          Re: The first 10 files Wojtek <nowhere@a.com> - 2013-03-15 10:31 -0700



#21783

From: Jim Janney <jjanney@shell.xmission.com>
Date: 2013-01-26 20:47 -0700
Message-ID: <ydn38xn1e41.fsf@shell.xmission.com>
In reply to: #21766
Peter Duniho <NpOeStPeAdM@NnOwSlPiAnMk.com> writes:

> On Sat, 26 Jan 2013 20:42:16 -0500, Eric Sosman wrote:
>
>> [...]
>>>>       Because the listFiles() method will fetch the information
>>>> for all 30K files from the O/S, will construct 30K File objects
>>>> to represent them, and will submit all 30K File objects to the
>>>> FileFilter, one by one.  The FileFilter will (very quickly)
>>>> reject 29.99K of the 30K Files, but ...
>>>
>>> Will it?
>> 
>>      Necessarily.  As far as listFiles() knows, the FileFilter
>> might accept the very last File object given to it.  Therefore,
>> listFiles() cannot fail to present that very last File -- and
>> every other File -- for inspection.
>
> Except in the way I already noted, you mean.
>
>> [...]
>>> Indeed, I suppose one could throw an exception from the FileFilter accept()
>>> method to interrupt enumeration, if that's how listFiles() is implemented.
>>> That would avoid the need to enumerate more than the needed number of
>>> actual files.
>> 
>>      It would also avoid the burden of returning anything from
>> listFiles() -- like, say, the array of accepted files ...
>
> As you've already agreed, it is possible for the FileFilter implementation
> to store the results itself, obviating any need for the listFiles() method
> to return successfully.
>
> If it works (which is not assured...it depends on how listFiles() is
> implemented in the first place), then yes, maybe it's a bit of a kludge.
> But it's an easier, more portable kludge than writing some JNI-based
> component and would in fact get the job done.
>
> Sometimes, when the library you're using doesn't provide exactly the
> features you need, you wind up with a kludge. Oh well...shit happens.
>
> I'm not saying it's a great solution. But it's a far cry from a conclusion
> that it simply cannot be done with the Java API as it exists now.

It's an abuse of the notion of a filter, but yes, it can be made to
work.  I stand corrected.
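
For anyone following along, a minimal sketch of the kludge Peter describes might look like the code below. The names FirstNByException, LimitReached and firstN are made up for illustration. The filter stores the files itself and throws an unchecked exception once it has enough, so listFiles() never returns normally. Whether that actually saves any work at the OS level depends on how listFiles() is implemented, as already noted in the thread.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

final class FirstNByException {

    /** Unchecked signal used only to abort listFiles(); a hypothetical name. */
    private static final class LimitReached extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    /** Returns up to max entries of dir, in whatever order listFiles() offers them. */
    static List<File> firstN(File dir, final int max) {
        final List<File> seen = new ArrayList<File>();
        if (max <= 0) {
            return seen;
        }
        try {
            dir.listFiles(new FileFilter() {
                @Override
                public boolean accept(File f) {
                    seen.add(f);              // the filter keeps the results itself
                    if (seen.size() >= max) {
                        throw new LimitReached(); // abort the enumeration
                    }
                    return false;             // nothing needs to come back from listFiles()
                }
            });
        } catch (LimitReached expected) {
            // Reaching the limit is the normal way out when the directory is large.
        }
        return seen;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        for (File f : firstN(dir, 10)) {
            System.out.println(f);
        }
    }
}
```

If the directory holds fewer than max entries, listFiles() simply finishes and the list contains everything that was offered.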

-- 
Jim Janney



#21767

From: Arved Sandstrom <asandstrom2@eastlink.ca>
Date: 2013-01-26 22:02 -0400
Message-ID: <iV%Ms.125649$Id.75544@newsfe24.iad>
In reply to: #21764
On 01/26/2013 09:42 PM, Eric Sosman wrote:
> On 1/26/2013 6:21 PM, Peter Duniho wrote:
>> On Sat, 26 Jan 2013 17:06:07 -0500, Eric Sosman wrote:
>>
>>> On 1/26/2013 4:15 PM, Robert Klemme wrote:
>>>> On 26.01.2013 19:26, Arne Vajhøj wrote:
>>>>
>>>>> But I am a bit skeptical about whether a String[] with 30K elements
>>>>> is really the bottleneck.
>>>>>
>>>>> If the real bottleneck is the OS calls to get next file, then
>>>>> a filter like this will not help.
>>>>
>>>> Why?
>>>
>>>       Because the listFiles() method will fetch the information
>>> for all 30K files from the O/S, will construct 30K File objects
>>> to represent them, and will submit all 30K File objects to the
>>> FileFilter, one by one.  The FileFilter will (very quickly)
>>> reject 29.99K of the 30K Files, but ...
>>
>> Will it?
>
>      Necessarily.  As far as listFiles() knows, the FileFilter
> might accept the very last File object given to it.  Therefore,
> listFiles() cannot fail to present that very last File -- and
> every other File -- for inspection.
[ SNIP ]

I'd have to agree. A simple test shows this to be the case, but your 
reasoning precludes having to run such a test in the first place.

My code "gets" the first N files from listFiles(), for some definition 
of "first", but it certainly doesn't only get N files from the OS.
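
A test along these lines can be sketched as follows. This is not the code referred to above, only an illustration; CountingFilterDemo and CountingFilter are invented names, and the 25 files in a temporary directory are an arbitrary choice. The filter accepts the first 10 files it is offered and counts every call to accept(), so the call count can be compared with the number of files returned.

```java
import java.io.File;
import java.io.FileFilter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CountingFilterDemo {

    /** Accepts the first `limit` files offered and counts every call to accept(). */
    static final class CountingFilter implements FileFilter {
        private final int limit;
        int calls = 0;
        int accepted = 0;

        CountingFilter(int limit) {
            this.limit = limit;
        }

        @Override
        public boolean accept(File f) {
            calls++;
            if (accepted < limit) {
                accepted++;
                return true;
            }
            return false; // rejected quickly, but the File was still built and offered
        }
    }

    public static void main(String[] args) throws IOException {
        // Scratch directory with 25 empty files (left behind for the OS to clean up).
        Path dir = Files.createTempDirectory("countdemo");
        for (int i = 0; i < 25; i++) {
            Files.createFile(dir.resolve("event" + i + ".txt"));
        }
        CountingFilter filter = new CountingFilter(10);
        File[] result = dir.toFile().listFiles(filter);
        System.out.println("returned=" + result.length + " filter calls=" + filter.calls);
    }
}
```

The point of interest is that the number of filter calls tracks the size of the directory, not the number of files accepted.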

Based on Wojtek's later post, I'd be examining the entire problem in 
more detail before arriving at a decent solution. I don't think most of 
the problem pertaining to offering reasonable batches of files to a Java 
program for processing is something that I'd address in Java anyway.

AHS



#21771

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2013-01-26 21:35 -0500
Message-ID: <5104925e$0$284$14726298@news.sunsite.dk>
In reply to: #21767
On 1/26/2013 9:02 PM, Arved Sandstrom wrote:
> On 01/26/2013 09:42 PM, Eric Sosman wrote:
>> On 1/26/2013 6:21 PM, Peter Duniho wrote:
>>> On Sat, 26 Jan 2013 17:06:07 -0500, Eric Sosman wrote:
>>>
>>>> On 1/26/2013 4:15 PM, Robert Klemme wrote:
>>>>> On 26.01.2013 19:26, Arne Vajhøj wrote:
>>>>>
>>>>>> But I am a bit skeptical about whether a String[] with 30K elements
>>>>>> is really the bottleneck.
>>>>>>
>>>>>> If the real bottleneck is the OS calls to get next file, then
>>>>>> a filter like this will not help.
>>>>>
>>>>> Why?
>>>>
>>>>       Because the listFiles() method will fetch the information
>>>> for all 30K files from the O/S, will construct 30K File objects
>>>> to represent them, and will submit all 30K File objects to the
>>>> FileFilter, one by one.  The FileFilter will (very quickly)
>>>> reject 29.99K of the 30K Files, but ...
>>>
>>> Will it?
>>
>>      Necessarily.  As far as listFiles() knows, the FileFilter
>> might accept the very last File object given to it.  Therefore,
>> listFiles() cannot fail to present that very last File -- and
>> every other File -- for inspection.
> [ SNIP ]
>
> I'd have to agree. A simple test shows this to be the case, but your
> reasoning precludes having to run such a test in the first place.
>
> My code "gets" the first N files from listFiles(), for some definition
> of "first", but it certainly doesn't only get N files from the OS.
>
> Based on Wojtek's later post, I'd be examining the entire problem in
> more detail before arriving at a decent solution. I don't think most of
> the problem pertaining to offering reasonable batches of files to a Java
> program for processing is something that I'd address in Java anyway.

If OP happens to be on Java 7, then I would suggest using:

java.nio.file.Files.newDirectoryStream(dir)

It is a straightforward way of getting the first N files.

And it is as likely as the exception hack not to read
all filenames from the OS.

Arne



#21773

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2013-01-26 21:43 -0500
Message-ID: <51049469$0$293$14726298@news.sunsite.dk>
In reply to: #21771
On 1/26/2013 9:35 PM, Arne Vajhøj wrote:
> On 1/26/2013 9:02 PM, Arved Sandstrom wrote:
>> On 01/26/2013 09:42 PM, Eric Sosman wrote:
>>> On 1/26/2013 6:21 PM, Peter Duniho wrote:
>>>> On Sat, 26 Jan 2013 17:06:07 -0500, Eric Sosman wrote:
>>>>
>>>>> On 1/26/2013 4:15 PM, Robert Klemme wrote:
>>>>>> On 26.01.2013 19:26, Arne Vajhøj wrote:
>>>>>>
>>>>>>> But I am a bit skeptical about whether a String[] with 30K elements
>>>>>>> is really the bottleneck.
>>>>>>>
>>>>>>> If the real bottleneck is the OS calls to get next file, then
>>>>>>> a filter like this will not help.
>>>>>>
>>>>>> Why?
>>>>>
>>>>>       Because the listFiles() method will fetch the information
>>>>> for all 30K files from the O/S, will construct 30K File objects
>>>>> to represent them, and will submit all 30K File objects to the
>>>>> FileFilter, one by one.  The FileFilter will (very quickly)
>>>>> reject 29.99K of the 30K Files, but ...
>>>>
>>>> Will it?
>>>
>>>      Necessarily.  As far as listFiles() knows, the FileFilter
>>> might accept the very last File object given to it.  Therefore,
>>> listFiles() cannot fail to present that very last File -- and
>>> every other File -- for inspection.
>> [ SNIP ]
>>
>> I'd have to agree. A simple test shows this to be the case, but your
>> reasoning precludes having to run such a test in the first place.
>>
>> My code "gets" the first N files from listFiles(), for some definition
>> of "first", but it certainly doesn't only get N files from the OS.
>>
>> Based on Wojtek's later post, I'd be examining the entire problem in
>> more detail before arriving at a decent solution. I don't think most of
>> the problem pertaining to offering reasonable batches of files to a Java
>> program for processing is something that I'd address in Java anyway.
>
> If OP happens to be on Java 7, then I will suggest using:
>
> java.nio.file.Files.newDirectoryStream(dir)
>
> It is a straightforward way of getting the first N files.
>
> And it is as likely as the exception hack not to read
> all filenames from the OS.

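import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Iterator;

public class ListFilesWithLimit {
	public static void main(String[] args) throws IOException {
		// try-with-resources closes the directory handle when we stop early
		try (DirectoryStream<Path> stream = Files.newDirectoryStream(Paths.get("/work"))) {
			Iterator<Path> dir = stream.iterator();
			int n = 0;
			// print at most 10 entries; without n++ the limit never applies
			while (dir.hasNext() && n < 10) {
				System.out.println(dir.next());
				n++;
			}
		}
	}
}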

Arne




#21797

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2013-01-27 13:55 +0100
Message-ID: <amkmdqFlumnU1@mid.individual.net>
In reply to: #21773
On 27.01.2013 03:43, Arne Vajhøj wrote:
> On 1/26/2013 9:35 PM, Arne Vajhøj wrote:
>> On 1/26/2013 9:02 PM, Arved Sandstrom wrote:

>> If OP happens to be on Java 7, then I will suggest using:
>>
>> java.nio.file.Files.newDirectoryStream(dir)
>>
>> It is a straightforward way of getting the first N files.
>>
>> And it is as likely as the exception hack not to read
>> all filenames from the OS.
>
> import java.io.IOException;
> import java.nio.file.Files;
> import java.nio.file.Path;
> import java.nio.file.Paths;
> import java.util.Iterator;
>
> public class ListFilesWithLimit {
>      public static void main(String[] args) throws IOException {
>          Iterator<Path> dir =
>              Files.newDirectoryStream(Paths.get("/work")).iterator();
>          int n = 0;
>          while(dir.hasNext() && n < 10) {
>              System.out.println(dir.next());
>              n++;
>          }
>      }
> }

For earlier Java versions we could emulate that with a second thread.

package file;

import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.TimeUnit;

public final class ListFileTestThreaded2 {

   private static final class CountFilterThread extends Thread
       implements FileFilter {

     private final File dir;
     private final int maxFiles;
     private final BlockingQueue<List<File>> queue;
     private List<File> filesSeen = new ArrayList<File>();

     public CountFilterThread(File dir, int maxFiles,
         BlockingQueue<List<File>> queue) {
       this.dir = dir;
       this.maxFiles = maxFiles;
       this.queue = queue;
     }

     @Override
     public void run() {
       try {
         dir.listFiles(this);

         if (filesSeen != null) {
           send();
         }
       } catch (InterruptedException e) {
         e.printStackTrace();
       }
     }

     private void send() throws InterruptedException {
       queue.put(filesSeen);
       filesSeen = null;
     }

     @Override
     public boolean accept(final File f) {
       try {
         if (filesSeen != null) {
           filesSeen.add(f);

           if (filesSeen.size() == maxFiles) {
             send();
             assert filesSeen == null;
           }
         }

         return false;
       } catch (InterruptedException e) {
         throw new IllegalStateException(e);
       }
     }
   }

   private static final int[] LIMITS = { 10, 100, 1000, 10000,
       Integer.MAX_VALUE };

   public static void main(String[] args) throws InterruptedException {
     for (final String s : args) {
       System.out.println("Testing: " + s);
       final File dir = new File(s);

       if (dir.isDirectory()) {
         for (final int limit : LIMITS) {
           final SynchronousQueue<List<File>> queue =
               new SynchronousQueue<List<File>>();
           final CountFilterThread cf =
               new CountFilterThread(dir, limit, queue);
           cf.setDaemon(true);
           final long t1 = System.nanoTime();
           cf.start();
           final List<File> entries = queue.take();
           final long delta = System.nanoTime() - t1;
           System.out.printf(
               "It took %20dus to retrieve %20d files, %20.5fus/file.\n",
               TimeUnit.NANOSECONDS.toMicros(delta), entries.size(),
               (double) TimeUnit.NANOSECONDS.toMicros(delta) / entries.size());
         }
       } else {
         System.out.println("Not a directory.");
       }
     }

     System.out.println("done");
   }

}

https://gist.github.com/4648256

It's not guaranteed though that this will be faster.  And it's 
definitely not simpler than the straightforward approach. :-)

Cheers

	robert



-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



#22489

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2013-02-24 17:50 -0500
Message-ID: <512a993e$0$289$14726298@news.sunsite.dk>
In reply to: #21797
On 1/27/2013 7:55 AM, Robert Klemme wrote:
> On 27.01.2013 03:43, Arne Vajhøj wrote:
>> On 1/26/2013 9:35 PM, Arne Vajhøj wrote:
>>> On 1/26/2013 9:02 PM, Arved Sandstrom wrote:
>
>>> If OP happens to be on Java 7, then I will suggest using:
>>>
>>> java.nio.file.Files.newDirectoryStream(dir)
>>>
>>> It is a straightforward way of getting the first N files.
>>>
>>> And it is as likely as the exception hack not to read
>>> all filenames from the OS.
>>
> For earlier Java versions we could emulate that with a second thread.
>
> https://gist.github.com/4648256
>
> It's not guaranteed though that this will be faster.  And it's
> definitely not simpler than the straightforward approach. :-)

Is that much different from the throw-exception-in-filter solution,
except that it requires a lot more code?

Arne



#22514

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2013-02-25 21:53 +0100
Message-ID: <ap21b9F52mbU2@mid.individual.net>
In reply to: #22489
On 24.02.2013 23:50, Arne Vajhøj wrote:
> On 1/27/2013 7:55 AM, Robert Klemme wrote:
>> On 27.01.2013 03:43, Arne Vajhøj wrote:
>>> On 1/26/2013 9:35 PM, Arne Vajhøj wrote:
>>>> On 1/26/2013 9:02 PM, Arved Sandstrom wrote:
>>
>>>> If OP happens to be on Java 7, then I will suggest using:
>>>>
>>>> java.nio.file.Files.newDirectoryStream(dir)
>>>>
>>>> It is a straightforward way of getting the first N files.
>>>>
>>>> And it is as likely as the exception hack not to read
>>>> all filenames from the OS.
>>>
>> For earlier Java versions we could emulate that with a second thread.
>>
>> https://gist.github.com/4648256
>>
>> It's not guaranteed though that this will be faster.  And it's
>> definitely not simpler than the straightforward approach. :-)
>
> Is that much different from the throw-exception-in-filter solution,
> except that it requires a lot more code?

No.

	robert



-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



#21785

From: Jim Janney <jjanney@shell.xmission.com>
Date: 2013-01-26 20:57 -0700
Message-ID: <ydntxq3z3ao.fsf@shell.xmission.com>
In reply to: #21767
Arved Sandstrom <asandstrom2@eastlink.ca> writes:

> On 01/26/2013 09:42 PM, Eric Sosman wrote:
>> On 1/26/2013 6:21 PM, Peter Duniho wrote:
>>> On Sat, 26 Jan 2013 17:06:07 -0500, Eric Sosman wrote:
>>>
>>>> On 1/26/2013 4:15 PM, Robert Klemme wrote:
>>>>> On 26.01.2013 19:26, Arne Vajhøj wrote:
>>>>>
>>>>>> But I am a bit skeptical about whether a String[] with 30K elements
>>>>>> is really the bottleneck.
>>>>>>
>>>>>> If the real bottleneck is the OS calls to get next file, then
>>>>>> a filter like this will not help.
>>>>>
>>>>> Why?
>>>>
>>>>       Because the listFiles() method will fetch the information
>>>> for all 30K files from the O/S, will construct 30K File objects
>>>> to represent them, and will submit all 30K File objects to the
>>>> FileFilter, one by one.  The FileFilter will (very quickly)
>>>> reject 29.99K of the 30K Files, but ...
>>>
>>> Will it?
>>
>>      Necessarily.  As far as listFiles() knows, the FileFilter
>> might accept the very last File object given to it.  Therefore,
>> listFiles() cannot fail to present that very last File -- and
>> every other File -- for inspection.
> [ SNIP ]
>
> I'd have to agree. A simple test shows this to be the case, but your
> reasoning precludes having to run such a test in the first place.
>
> My code "gets" the first N files from listFiles(), for some definition
> of "first", but it certainly doesn't only get N files from the OS.
>
> Based on Wojtek's later post, I'd be examining the entire problem in
> more detail before arriving at a decent solution. I don't think most
> of the problem pertaining to offering reasonable batches of files to a
> Java program for processing is something that I'd address in Java
> anyway.

There's also the problem of starvation, since we have no guarantees
concerning the order of entries in the directory.
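
One way to sidestep starvation, at the cost of looking at every entry, is to pick the N oldest files rather than the first N the OS happens to return. The sketch below assumes Java 7 and assumes that last-modified time is an acceptable ordering for the event files; OldestFirst and oldest() are invented names. It keeps only N entries in memory by using a bounded heap, but it still pays for a full directory scan plus one attribute lookup per file.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.FileTime;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

final class OldestFirst {

    /** A path together with the modification time read when it was listed. */
    private static final class Entry {
        final Path path;
        final FileTime modified;

        Entry(Path path, FileTime modified) {
            this.path = path;
            this.modified = modified;
        }
    }

    private static final Comparator<Entry> OLDEST_FIRST = new Comparator<Entry>() {
        @Override
        public int compare(Entry a, Entry b) {
            return a.modified.compareTo(b.modified);
        }
    };

    /** Returns the max oldest entries of dir, oldest first. */
    static List<Path> oldest(Path dir, int max) throws IOException {
        List<Path> result = new ArrayList<Path>();
        if (max <= 0) {
            return result;
        }
        // Heap ordered newest-first, so poll() discards the newest entry kept so far.
        PriorityQueue<Entry> newestOnTop =
            new PriorityQueue<Entry>(max, Collections.reverseOrder(OLDEST_FIRST));
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                newestOnTop.add(new Entry(p, Files.getLastModifiedTime(p)));
                if (newestOnTop.size() > max) {
                    newestOnTop.poll();
                }
            }
        }
        List<Entry> kept = new ArrayList<Entry>(newestOnTop);
        Collections.sort(kept, OLDEST_FIRST);
        for (Entry e : kept) {
            result.add(e.path);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path p : oldest(dir, 10)) {
            System.out.println(p);
        }
    }
}
```

A file deleted between being listed and having its timestamp read would make this throw, so real code would need to decide how to handle that race.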

-- 
Jim Janney



#21788

From: Wojtek <nowhere@a.com>
Date: 2013-01-26 21:20 -0800
Message-ID: <mn.d5007dd1cea6abb1.70216@a.com>
In reply to: #21767
Arved Sandstrom wrote :
> I'd be examining the entire problem in more detail before arriving at a 
> decent solution. I don't think most of the problem pertaining to offering 
> reasonable batches of files to a Java program for processing is something 
> that I'd address in Java anyway.

Events are on a per-user basis, that is to say each user has their own 
event list.

The events are observed when the user logs in. Might be today or next 
week.

To keep server processing reasonable I want to limit the number of 
events sent back to the user at a time (10 was just a number I pulled 
out of the air, obviously some tuning is required, and might even be 
dynamic depending on how busy the rest of the system is).

I have no control over the number of events, how often they occur, nor 
how often a user logs in to look at them. 30K might be the high end, 
though I need to cover it if I get a busy event set and a lazy user.

I might even set up a DB table for each user and store each event file 
as it comes in. Then use the DB to get the file names.

Still white-boarding this...

-- 
Wojtek :-)



#21795

From: Arved Sandstrom <asandstrom2@eastlink.ca>
Date: 2013-01-27 07:23 -0400
Message-ID: <278Ns.137021$pV4.59710@newsfe21.iad>
In reply to: #21788
On 01/27/2013 01:20 AM, Wojtek wrote:
> Arved Sandstrom wrote :
>> I'd be examining the entire problem in more detail before arriving at
>> a decent solution. I don't think most of the problem pertaining to
>> offering reasonable batches of files to a Java program for processing
>> is something that I'd address in Java anyway.
>
> Events are on a per-user basis, that is to say each user has their own
> event list.
>
> The events are observed when the user logs in. Might be today or next week.
>
> To keep server processing reasonable I want to limit the number of
> events sent back to the user at a time (10 was just a number I pulled
> out of the air, obviously some tuning is required, and might even be
> dynamic depending on how busy the rest of the system is).
>
> I have no control over the number of events, how often they occur, nor
> how often a user logs in to look at them. 30K might be the high end,
> though I need to cover it if I get a busy event set and a lazy user.
>
> I might even set up a DB table for each user and store each event file
> as it comes in. Then use the DB to get the file names.
>
> Still white-boarding this...
>
A file is not actually an unreasonable place to keep info for one event. 
You want to store that information *someplace*, and a file is not worse 
than a row in a DB table or a message on a queue somewhere. It's just 
that we don't want to have tens or hundreds of thousands of files in one 
directory.

SIDE NOTE: don't set up a DB table for each user. :-)

Why not use the NIO2 watch service, and observe the event file input 
directory for file creation events? On each such event do something with 
the event file. There are a number of options here:

1. Move it into a user-specific directory;
2. Append it to a user-specific event file;
3. Put it in a DB.
etc.

I sort of like (2) myself.
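
A rough sketch of option (2) with the Java 7 watch service could look like this. Everything specific here is an assumption made for illustration: the class name EventInbox, the "<user>-<id>.evt" file naming convention used by userOf(), and the per-user ".log" files. A creation event can also arrive before the writer has finished the file, so real code would need some convention (write elsewhere and rename into the inbox, for instance) to avoid reading partial files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

final class EventInbox {

    /** Extracts the user from a name like "alice-0001.evt" (hypothetical convention). */
    static String userOf(Path eventFile) {
        String name = eventFile.getFileName().toString();
        int dash = name.indexOf('-');
        return dash > 0 ? name.substring(0, dash) : "unknown";
    }

    /** Appends the event file's bytes to the user's log, then removes the event file. */
    static void appendToUserLog(Path eventFile, Path userLog) throws IOException {
        byte[] data = Files.readAllBytes(eventFile);
        Files.write(userLog, data, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
        Files.delete(eventFile);
    }

    /** Blocks, folding each newly created file in inbox into a per-user log in userDir. */
    static void watch(Path inbox, Path userDir) throws IOException, InterruptedException {
        try (WatchService ws = inbox.getFileSystem().newWatchService()) {
            inbox.register(ws, StandardWatchEventKinds.ENTRY_CREATE);
            while (true) {
                WatchKey key = ws.take(); // waits for the next batch of events
                for (WatchEvent<?> ev : key.pollEvents()) {
                    if (ev.kind() == StandardWatchEventKinds.OVERFLOW) {
                        continue; // events were lost; a rescan of the inbox would be needed
                    }
                    Path created = inbox.resolve((Path) ev.context());
                    appendToUserLog(created, userDir.resolve(userOf(created) + ".log"));
                }
                if (!key.reset()) {
                    break; // the inbox directory is no longer accessible
                }
            }
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        if (args.length != 2) {
            System.err.println("usage: EventInbox <inbox-dir> <user-log-dir>");
            return;
        }
        watch(Paths.get(args[0]), Paths.get(args[1]));
    }
}
```

With one growing file per user, the inbox directory stays small, and serving "the next 10 events" becomes a matter of reading from an offset in that file.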

What do you mean by keeping server processing reasonable?

AHS



#21812

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2013-01-27 20:36 -0500
Message-ID: <5105d604$0$281$14726298@news.sunsite.dk>
In reply to: #21788
On 1/27/2013 12:20 AM, Wojtek wrote:
> Arved Sandstrom wrote :
>> I'd be examining the entire problem in more detail before arriving at
>> a decent solution. I don't think most of the problem pertaining to
>> offering reasonable batches of files to a Java program for processing
>> is something that I'd address in Java anyway.
>
> Events are on a per-user basis, that is to say each user has their own
> event list.
>
> The events are observed when the user logs in. Might be today or next week.
>
> To keep server processing reasonable I want to limit the number of
> events sent back to the user at a time (10 was just a number I pulled
> out of the air, obviously some tuning is required, and might even be
> dynamic depending on how busy the rest of the system is).
>
> I have no control over the number of events, how often they occur, nor
> how often a user logs in to look at them. 30K might be the high end,
> though I need to cover it if I get a busy event set and a lazy user.
>
> I might even set up a DB table for each user and store each event file
> as it comes in. Then use the DB to get the file names.
>
> Still white-boarding this...

Java 6 no DB:

Spread files out over some subdirs.

Java 7 no DB:

Use new NIO capabilities.

DB:

Single table for all users and just use index.
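
A rough sketch of the "spread files out over some subdirs" idea, sticking to java.io.File so it would also work on Java 6. The class name BucketedStore, the hash-of-the-name bucketing rule and the bucket count are all assumptions made for illustration; any stable rule (user id, date, first characters of the name) would do as well.

```java
import java.io.File;
import java.io.IOException;

class BucketedStore {
    // Map a file name to one of `buckets` subdirectories, named in hex ("00", "01", ...).
    static File bucketFor(File root, String fileName, int buckets) {
        int bucket = (fileName.hashCode() & 0x7fffffff) % buckets; // mask keeps it non-negative
        return new File(new File(root, String.format("%02x", bucket)), fileName);
    }

    // Move an incoming file into its bucket directory, creating the directory if needed.
    static File store(File incoming, File root, int buckets) throws IOException {
        File target = bucketFor(root, incoming.getName(), buckets);
        File parent = target.getParentFile();
        if (!parent.mkdirs() && !parent.isDirectory()) {
            throw new IOException("cannot create " + parent);
        }
        // renameTo may fail across file systems; this sketch assumes one volume
        if (!incoming.renameTo(target)) {
            throw new IOException("cannot move " + incoming + " to " + target);
        }
        return target;
    }
}
```

With 256 buckets, 30K files come out at a bit over a hundred files per directory on average, which keeps any single listFiles() call small.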

Arne



#21824

From: Wojtek <nowhere@a.com>
Date: 2013-01-28 16:28 -0800
Message-ID: <mn.e3dc7dd1c4cbc071.70216@a.com>
In reply to: #21812

Arne Vajhøj wrote :

> DB:
>
> Single table for all users and just use index.

Sigh, this is what comes out when I am really tired and the fermented 
grape juice takes effect.

I have to stop thinking about this stuff on weekends...

-- 
Wojtek :-)



#21769

From: Arne Vajhøj <arne@vajhoej.dk>
Date: 2013-01-26 21:23 -0500
Message-ID: <51048f97$0$284$14726298@news.sunsite.dk>
In reply to: #21760

On 1/26/2013 6:21 PM, Peter Duniho wrote:
> On Sat, 26 Jan 2013 17:06:07 -0500, Eric Sosman wrote:
>
>> On 1/26/2013 4:15 PM, Robert Klemme wrote:
>>> On 26.01.2013 19:26, Arne Vajhøj wrote:
>>>
>>>> But I am a bit skeptical about whether a String[] with 30K elements
>>>> is really the bottleneck.
>>>>
>>>> If the real bottleneck is the OS calls to get next file, then
>>>> a filter like this will not help.
>>>
>>> Why?
>>
>>       Because the listFiles() method will fetch the information
>> for all 30K files from the O/S, will construct 30K File objects
>> to represent them, and will submit all 30K File objects to the
>> FileFilter, one by one.  The FileFilter will (very quickly)
>> reject 29.99K of the 30K Files, but ...
>
> Will it?
>
> It is plausible that the implementation of listFiles() uses an OS API that
> enumerates files one at a time. On Windows, getting the first file of the
> enumeration is faster than asking for all the files at once.
>
> Indeed, I suppose one could throw an exception from the FileFilter accept()
> method to interrupt enumeration, if that's how listFiles() is implemented.
> That would avoid the need to enumerate more than the needed number of
> actual files.
>
> Of course, this is all implementation-dependent and since it's not
> explicitly documented, could change at any time anyway.  But unless you've
> actually examined the implementation details for listFiles(), it's not a
> foregone conclusion that the technique of using a FileFilter offers no way
> to improve latency.

It is a foregone conclusion that the posted code that Eric commented
on would read all files, because it did not throw an exception.

Code with a different logic could behave differently.
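
For illustration, here is a sketch of what such "different logic" might look like: a FileFilter that collects the first few files and then throws an unchecked exception to abandon the listing. The names EarlyExitFilter, Enough and firstN are made up. Whether this saves any directory reading is implementation-dependent; in OpenJDK, listFiles(FileFilter) appears to fetch the whole name array before calling accept(), in which case only the remaining File constructions and filter calls are skipped.

```java
import java.io.File;
import java.io.FileFilter;
import java.util.ArrayList;
import java.util.List;

class EarlyExitFilter {
    // Private marker exception used only to break out of listFiles().
    private static final class Enough extends RuntimeException {
        private static final long serialVersionUID = 1L;
    }

    // Returns up to max files from dir, in whatever order listFiles() offers them.
    static List<File> firstN(File dir, final int max) {
        final List<File> found = new ArrayList<File>();
        if (max <= 0) {
            return found;
        }
        try {
            dir.listFiles(new FileFilter() {
                @Override
                public boolean accept(File f) {
                    found.add(f);
                    if (found.size() >= max) {
                        throw new Enough(); // abandon the rest of the listing
                    }
                    return false; // keep the result array empty; we collect ourselves
                }
            });
        } catch (Enough expected) {
            // reached the limit; nothing else to do
        }
        return found;
    }
}
```

Using an exception for control flow is ugly, and the behaviour relies on the exception propagating out of listFiles(), which the documentation does not promise.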

Arne



#21778

From: Roedy Green <see_website@mindprod.com.invalid>
Date: 2013-01-26 19:09 -0800
Message-ID: <2469g89u7vpchs8lo0lbc7dh7lrtldslor@4ax.com>
In reply to: #21760

On Sat, 26 Jan 2013 15:21:53 -0800, Peter Duniho
<NpOeStPeAdM@NnOwSlPiAnMk.com> wrote, quoted or indirectly quoted
someone who said :

>Indeed, I suppose one could throw an exception from the FileFilter accept()
>method to interrupt enumeration, if that's how listFiles() is implemented.
>That would avoid the need to enumerate more than the needed number of
>actual files.

You could resolve that question with some System.nanoTime() dumps: how
long does the first file take to show up relative to the others? IIRC it
builds the array then feeds it to the Filter, but that could just have
been someone explaining how it works conceptually.


I do know that Java takes a lot longer to span a disk than C.
Building the array first means less native code needed for
multiplatform implementation.
For most applications, you need to run every file name through the
filter so it does not matter which you do first. You would save
building File objects for items not passing the Filter.
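
The nanoTime experiment could be sketched along these lines. The class name ListFilesProbe and the shape of the returned array are assumptions for illustration; the only library calls are File.listFiles(FileFilter) and System.nanoTime().

```java
import java.io.File;
import java.io.FileFilter;

class ListFilesProbe {
    // Returns { nanos until the first accept() call,
    //           nanos until listFiles() returned,
    //           number of entries offered to the filter }.
    static long[] probe(File dir) {
        final long[] firstCall = { -1L };
        final long[] seen = { 0L };
        final long start = System.nanoTime();
        dir.listFiles(new FileFilter() {
            @Override
            public boolean accept(File f) {
                if (firstCall[0] < 0) {
                    firstCall[0] = System.nanoTime() - start;
                }
                seen[0]++;
                return false; // we only want the timestamps, not the array
            }
        });
        long total = System.nanoTime() - start;
        return new long[] { firstCall[0], total, seen[0] };
    }

    public static void main(String[] args) {
        long[] r = probe(new File(args.length > 0 ? args[0] : "."));
        System.out.println("first accept() after " + r[0] + " ns");
        System.out.println("listFiles() done after " + r[1] + " ns, " + r[2] + " entries");
    }
}
```

If the first accept() call arrives only after most of the total time has passed, the names were presumably gathered up front; if it arrives almost immediately, the enumeration is more likely incremental.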
-- 
Roedy Green Canadian Mind Products http://mindprod.com
The first 90% of the code accounts for the first 90% of the development time.
The remaining 10% of the code accounts for the other 90% of the development 
time. 
~ Tom Cargill  Ninety-ninety Law 



#21758

From: Jim Janney <jjanney@shell.xmission.com>
Date: 2013-01-26 16:00 -0700
Message-ID: <ydnbocb1rdx.fsf@shell.xmission.com>
In reply to: #21727

Wojtek <nowhere@a.com> writes:

> Using:
>
> int max = 10;
> int count = 0;
>
> for (File thisFile : aDir.listFiles())
> {
>  doSomething(thisFile);
>
>  if ( ++count >= max )
>    break;
> }
>
> gives me the first ten files in aDir. But if aDir contains 30K files,
> then the listFiles() will run for a long time as it builds an array
> for the 30K files.
>
> Is there a way to have Java only get the first "max" files?

As Roedy says, in pure Java there's no way to avoid reading the entire
directory, whether it builds the entire array or not.  And if you want
them in any particular order it's necessary to read them all and sort
them anyway.
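
To make the ordering point concrete, a small sketch (the names SortedFirst and firstSorted are made up): if the "first" files are meant to be first in some order, every name has to be read before the smallest ones are known.

```java
import java.io.File;
import java.util.Arrays;

class SortedFirst {
    // Returns the first max names in natural String order.
    static String[] firstSorted(File dir, int max) {
        String[] names = dir.list();      // reads the whole directory
        if (names == null) {
            return new String[0];         // not a directory, or an I/O error
        }
        Arrays.sort(names);               // natural String order, so "10" sorts before "2"
        return Arrays.copyOf(names, Math.max(0, Math.min(max, names.length)));
    }
}
```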

-- 
Jim Janney



#21772

From: Knute Johnson <nospam@knutejohnson.com>
Date: 2013-01-26 18:37 -0800
Message-ID: <ke23sm$afb$1@dont-email.me>
In reply to: #21727

On 1/26/2013 1:14 AM, Wojtek wrote:
> Using:
>
> int max = 10;
> int count = 0;
>
> for (File thisFile : aDir.listFiles())
> {
>   doSomething(thisFile);
>
>   if ( ++count >= max )
>     break;
> }
>
> gives me the first ten files in aDir. But if aDir contains 30K files,
> then the listFiles() will run for a long time as it builds an array for
> the 30K files.
>
> Is there a way to have Java only get the first "max" files?
>

import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class FileSystemsTest {
     public static void main(String[] args) throws IOException {
         long start = System.currentTimeMillis();
         Path dir = FileSystems.getDefault().getPath(".");
         int i=10;
         // try-with-resources closes the directory handle, even on early break
         try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
             for (Path path : stream) {
                 System.out.println(path.getFileName());
                 if (--i <= 0)
                     break;
             }
         }
         long stop = System.currentTimeMillis();
         System.out.println(stop - start);
     }
}

30003 files in the directory, almost 1.7GB of files, Windows XP, Java 7 
and it takes 16 ms to run.  Somebody else should try this out on their 
computer to see if it works as fast.

.
.
.
01/26/2013  05:46 PM            58,890 9998.txt
01/26/2013  05:46 PM            58,890 9999.txt
01/26/2013  06:31 PM             1,316 FileSystemsTest.class
01/26/2013  06:29 PM               636 FileSystemsTest.java
01/26/2013  05:44 PM               650 MakeFiles.java
            30003 File(s)  1,766,702,602 bytes
                2 Dir(s)  49,387,085,824 bytes free

C:\Documents and Settings\Knute Johnson\bigdirectory>java FileSystemsTest
0.txt
1.txt
10.txt
100.txt
1000.txt
10000.txt
10001.txt
10002.txt
10003.txt
10004.txt
16

C:\Documents and Settings\Knute Johnson\bigdirectory>


-- 

Knute Johnson



#22952

From: Wojtek <nowhere@a.com>
Date: 2013-03-14 03:07 -0700
Message-ID: <mn.70bb7dd3ff682a5b.70216@a.com>
In reply to: #21772

Knute Johnson wrote :
>
> 30003 files in the directory, almost 1.7GB of files, Windows XP, Java 7 and 
> it takes 16 ms to run.  Somebody else should try this out on their computer 
> to see if it works as fast.

Ok, I'm back :-)

I am using WinXP and Java 7, and the directory holds 30,001 32K files,
920MBytes

The Code:
----------------------------------------------------
package tester;

import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class NewsGroup
{
  public static void main( String[] args ) throws IOException
  {
    int maxFiles = 10;

    System.out.println( "Large File Number Tester" );

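    // guard against a missing argument before looking at args[0]
    if (args.length == 1 && args[0].equals( "nio" ))
      nioRun( "C:\\apps\\test", maxFiles );

    else if (args.length == 1 && args[0].equals( "io" ))
      ioRun( "C:\\apps\\test", maxFiles );

    else
      System.out.println( "NewsGroup io|nio" );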

  }

  private static void ioRun( String filePath, int maxFiles ) throws IOException
  {
    int i = 1;

    System.out.println( "IO run" );
    long start = System.currentTimeMillis();

    File folder = new File( filePath );
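    File[] listOfFiles = folder.listFiles();

    // listFiles() returns null if the path is not a readable directory
    if (listOfFiles == null)
      throw new IOException( "Cannot list " + filePath );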

    for (File file : listOfFiles)
    {
      System.out.println( "" + i + ": " + file.getName() );

      if (++i > maxFiles)
        break;
    }

    long stop = System.currentTimeMillis();

    System.out.println( "Elapsed: " + (stop - start) + " ms" );
  }

  private static void nioRun( String filePath, int maxFiles ) throws IOException
  {
    int i = 1;

    System.out.println( "NIO run" );
    long start = System.currentTimeMillis();

    Path dir = FileSystems.getDefault().getPath( filePath );
    // try-with-resources closes the directory handle, even on early break
    try (DirectoryStream<Path> stream = Files.newDirectoryStream( dir ))
    {
      for (Path path : stream)
      {
        System.out.println( "" + i + ": " + path.getFileName() );

        if (++i > maxFiles)
          break;
      }
    }

    long stop = System.currentTimeMillis();

    System.out.println( "Elapsed: " + (stop - start) + " ms" );
  }
}
----------------------------------------------------

A batch file to run it:
----------------------------------------------------
@echo off
java -jar NewsGroup.jar %1
----------------------------------------------------

And the results:
----------------------------------------------------
C:\apps>run io
Large File Number Tester
IO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 156 ms

C:\apps>run io
Large File Number Tester
IO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 140 ms

C:\apps>run io
Large File Number Tester
IO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 156 ms

C:\apps>run nio
Large File Number Tester
NIO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 219 ms

C:\apps>run nio
Large File Number Tester
NIO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 31 ms

C:\apps>run nio
Large File Number Tester
NIO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 31 ms

C:\apps>run nio
Large File Number Tester
NIO run
1: F_000000.jpg
2: F_000001.jpg
3: F_000002.jpg
4: F_000003.jpg
5: F_000004.jpg
6: F_000005.jpg
7: F_000006.jpg
8: F_000007.jpg
9: F_000008.jpg
10: F_000009.jpg
Elapsed: 78 ms

C:\apps>
----------------------------------------------------

So NIO is about 4-5 times faster than IO. The first NIO run looks like 
an anomaly, might be some JRE loading happening.

All the runs produce different timings, might be a Windows caching 
effect. However the NIO is consistently much faster overall.

-- 
Wojtek :-)



#22953

From: lipska the kat <"nospam at neversurrender dot co dot uk">
Date: 2013-03-14 12:49 +0000
Message-ID: <RvudnYCk24DlWtzMnZ2dnUVZ8iqdnZ2d@bt.com>
In reply to: #22952

On 14/03/13 10:07, Wojtek wrote:
> Knute Johnson wrote :
>>
>> 30003 files in the directory, almost 1.7GB of files, Windows XP, Java
>> 7 and it takes 16 ms to run. Somebody else should try this out on
>> their computer to see if it works as fast.
>
> Ok, I'm back :-)

[snip]

Ubuntu Linux 12.04 64 bit, default kernel
Intel® Pentium(R) CPU B960 @ 2.20GHz × 2
java-7-openjdk-amd64 64-Bit Server VM (build 22.0-b10, mixed mode)
4GB RAM
ls /var/images | wc -l
1292

Find below the results of 10 runs for a count of 10 files. The results of
100 runs for a count of 100 files, omitting the filename output but still
executing file.getName() or path.getFileName(), are available here:

http://pastebin.com/tyFni9xA


Large File Number Tester

============= Run 1 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 36 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 9 ms

============= Run 2 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 13 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 2 ms

============= Run 3 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 19 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 1 ms

============= Run 4 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 17 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 2 ms

============= Run 5 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 8 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 1 ms

============= Run 6 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 7 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 0 ms

============= Run 7 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 7 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 1 ms

============= Run 8 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 9 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 2 ms

============= Run 9 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 6 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 2 ms

============= Run 10 ====================
IO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 6 ms
----------
NIO run
1: tn_c0DgCQGwMklC.jpg
2: tn_7O3dRExnb1z5.jpg
3: tn_rljSB5H8yWGr.jpg
4: BpBBTq5FmOem.jpg
5: tn_FSc7Cn8S2KQP.jpg
6: tn_qRTnPNB7BdQq.jpg
7: IGbwhM3DsMlu.jpg
8: tn_VSetEaWqbuMD.jpg
9: IR081OywEjqb.jpg
10: cvNziQ9ecLeq.jpg
Elapsed: 2 ms


lipska

-- 
Lipska the Kat©: Troll hunter, sandbox destroyer
and farscape dreamer of Aeryn Sun



#22958

From: Robert Klemme <shortcutter@googlemail.com>
Date: 2013-03-15 11:38 +0100
Message-ID: <aqgc14FilbeU1@mid.individual.net>
In reply to: #22952

On 14.03.2013 11:07, Wojtek wrote:

> So NIO is about 4-5 times faster than IO. The first NIO run looks like
> an anomaly, might be some JRE loading happening.
>
> All the runs produce different timings, might be a Windows caching
> effect. However the NIO is consistently much faster overall.

I am not convinced that this conclusion is warranted.  There are a few 
factors which I believe make your conclusion questionable:

- You included class loading time in your measurement.  For example, 
assuming that all io functionality is implemented on top of nio it would 
be logical to expect more classes to be loaded.  There are a number of 
use cases where it matters - but there are also use cases where it 
doesn't matter (long running servers).

- Generally we are dealing with quite low timings (around 100ms) and 
relatively high variations.  Also the test was made on Windows and the 
System.currentTimeMillis() is known to be imprecise on that platform (in 
the order of tens of milliseconds).

- Your io approach does not use FileFilter which some have suggested to 
be a way to avoid constructing a large result array.

- The test is an artificial situation.  With all factors like JVM 
involved it may be that in a realistic application things look different 
to an extent that the differences you measured here do not matter any more.
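
A sketch of a measurement that tries to address the first two points: it runs some untimed warm-up iterations so that class loading is out of the way, uses System.nanoTime() instead of System.currentTimeMillis(), and reports the median of several runs. The class and method names (ListingBench, ioFirst, nioFirst, medianNanos) and the run counts are assumptions for illustration. It does nothing about OS file caching, and no particular outcome is implied.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;

class ListingBench {
    // Old API: the full array is built, then we look at up to max entries.
    static int ioFirst(File dir, int max) {
        File[] all = dir.listFiles();
        return all == null ? 0 : Math.min(max, all.length);
    }

    // NIO.2: iterate the directory stream and stop after max entries.
    static int nioFirst(Path dir, int max) throws IOException {
        int n = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path ignored : stream) {
                if (n >= max) {
                    break;
                }
                n++;
            }
        }
        return n;
    }

    // Median of `runs` timed calls (runs >= 1), after `warmup` untimed ones.
    static long medianNanos(Path dir, int max, boolean nio, int warmup, int runs)
            throws IOException {
        long[] samples = new long[runs];
        for (int r = -warmup; r < runs; r++) {
            long t0 = System.nanoTime();
            if (nio) {
                nioFirst(dir, max);
            } else {
                ioFirst(dir.toFile(), max);
            }
            long dt = System.nanoTime() - t0;
            if (r >= 0) {
                samples[r] = dt; // negative r values are warm-up and not recorded
            }
        }
        Arrays.sort(samples);
        return samples[runs / 2];
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("io  median: " + medianNanos(dir, 10, false, 20, 101) + " ns");
        System.out.println("nio median: " + medianNanos(dir, 10, true, 20, 101) + " ns");
    }
}
```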

Kind regards

	robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/



#22960

From: Wojtek <nowhere@a.com>
Date: 2013-03-15 10:31 -0700
Message-ID: <mn.7a777dd3df9ee7cf.70216@a.com>
In reply to: #22958

Robert Klemme wrote :
> On 14.03.2013 11:07, Wojtek wrote:
>
>> So NIO is about 4-5 times faster than IO. The first NIO run looks like
>> an anomaly, might be some JRE loading happening.
>>
>> All the runs produce different timings, might be a Windows caching
>> effect. However the NIO is consistently much faster overall.
>
> I am not convinced that this conclusion is warranted.  There are a few 
> factors which I believe make your conclusion questionable:
>
> - You included class loading time in your measurement.  For example, assuming 
> that all io functionality is implemented on top of nio it would be logical to 
> expect more classes to be loaded.  There are a number of use cases where it 
> matters - but there are also use cases where it doesn't matter (long running 
> servers).

The class loading would be part of a real project. The alternative is 
to keep an object around which holds a link to a directory. Actually 
many objects linked to many directories. Also the file list will change 
as files are added and deleted.

> - Generally we are dealing with quite low timings (around 100ms) and 
> relatively high variations.  Also the test was made on Windows and the 
> System.currentTimeMillis() is known to be imprecise on that platform (in the 
> order of tens of milliseconds).

Fair enough.

> - Your io approach does not use FileFilter which some have suggested to be a 
> way to avoid constructing a large result array.

Yes, but to filter the result still means loading each file name, then 
checking to see if it matches the filter.
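
For comparison, filtering on the NIO side can be sketched as below. Files.newDirectoryStream(Path, String) takes a glob, and the filter is applied as the stream is iterated, so the loop can stop as soon as enough matches have turned up. The names FirstMatching and firstMatching are made up; how far the platform reads ahead underneath is implementation-dependent.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

class FirstMatching {
    // Returns up to max entries of dir whose file name matches the glob, e.g. "*.jpg".
    static List<Path> firstMatching(Path dir, String glob, int max) throws IOException {
        List<Path> result = new ArrayList<>();
        if (max <= 0) {
            return result;
        }
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path p : stream) {
                result.add(p);
                if (result.size() >= max) {
                    break; // stop iterating; entries after this are never examined by us
                }
            }
        }
        return result;
    }
}
```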

> - The test is an artificial situation.  With all factors like JVM involved it 
> may be that in a realistic application things look different to an extent 
> that the differences you measured here do not matter any more.

While the absolute times may be questionable, the relative times are 
consistent.

-- 
Wojtek :-)
