Groups > comp.lang.java.programmer > #16700 > unrolled thread

How can you make idle processors pick up java work?

Started by	qwertmonkey@syberianoutpost.ru
First post	2012-07-31 07:14 +0000
Last post	2012-07-31 06:33 -0700
Articles	4 — 4 participants


  How can you make idle processors pick up java work? qwertmonkey@syberianoutpost.ru - 2012-07-31 07:14 +0000
    Re: How can you make idle processors pick up java work? Joerg Meier <joergmmeier@arcor.de> - 2012-07-31 12:13 +0200
    Re: How can you make idle processors pick up java work? Joshua Cranmer <Pidgeot18@verizon.invalid> - 2012-07-31 07:41 -0400
    Re: How can you make idle processors pick up java work? Patricia Shanahan <pats@acm.org> - 2012-07-31 06:33 -0700

#16700 — How can you make idle processors pick up java work?

From	qwertmonkey@syberianoutpost.ru
Date	2012-07-31 07:14 +0000
Subject	How can you make idle processors pick up java work?
Message-ID	<jv80k4$9kk$1@speranza.aioe.org>

~ 
> How slow is the NL processing?
~ 
> Does it make any sense to read lines in one thread and pass each off 
> to one of the iPrx-1 other threads that might run on separate processors? 
~ 
 I don't think this would make sense. All sentences are short, and all I 
need to do is basically scan them and use look-up tables to do some tinkering 
with the code points. The scheduling of threads and constant context switching
will most probably make things slower.
~ 
 OK, this is the piece of code I am trying to optimize, and the results
I get using a large enough file of sentences:
~ 
 http://corpora.informatik.uni-leipzig.de/download.html
~ 
 http://corpora.uni-leipzig.de/downloads/deu_news_2008_10M-text.tar.gz
~ 
 Inside the tarball there is a file with just sentences:
~ 
$ ls -l deu_news_2008_10M-sentences.txt
-rw-r--r-- 1 knoppix knoppix 1235804164 May 28  2011 deu_news_2008_10M-sentences.txt

$ md5sum -b deu_news_2008_10M-sentences.txt
23041587b6414d1a1a56c9c389d3c18f *deu_news_2008_10M-sentences.txt

$ wc -l deu_news_2008_10M-sentences.txt
10000000 deu_news_2008_10M-sentences.txt
~ 
 Again, do you know of any faster way to go about reading the sentences of
such large files and getting their code points?
 lbrtchx
~ 
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Files;
import java.nio.charset.Charset;

import java.io.BufferedReader;
import java.io.IOException;

// __ 
public class NIO2_newBufferedReader02Test{
 private static final String aNWLn = System.getProperty("line.separator");
// __ 
 public static void main(String[] aArgs){

  if((aArgs != null) && (aArgs.length == 1)){
   long lTm00 = System.currentTimeMillis();
   long lLns = 0;
   int iTtlRdKdPnts = 0;
   BufferedReader BfR = null;
   Path IFlPth = FileSystems.getDefault().getPath(aArgs[0]);
   long lIFlL = IFlPth.toFile().length();
   int iKdPnt, iSxL;

   StringBuilder aBldr = new StringBuilder(1024);
// __ 
   try{
    BfR = Files.newBufferedReader(IFlPth, Charset.forName("UTF-8"));
    String aSx = BfR.readLine();
    while(aSx != null){
     iSxL = aSx.length();
     if(iSxL > 0){
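      // Advance by the width of each code point so that surrogate pairs
      // (characters outside the BMP) are read once, not twice.
      for(int i = 0; (i < iSxL); i += Character.charCount(iKdPnt)){
       iKdPnt = aSx.codePointAt(i); ++iTtlRdKdPnts;
       aBldr.appendCodePoint(iKdPnt);
      }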
// __ 
      aBldr.delete(0, aBldr.length());
     }// (iSxL > 0)
     ++lLns;
     aSx = BfR.readLine();
    }// (aSx != null)

    BfR.close();
// __ 
    System.err.println("// __ reading |" + lIFlL  + "|  bytes long text file with |"
     + lLns + "| lines took |" + (System.currentTimeMillis() - lTm00) + "| (ms)");
    System.err.println("// __ iTtlRdKdPnts: |" + iTtlRdKdPnts + "|");
   }catch(IOException IOX) { IOX.printStackTrace(System.err); }
  }
  else{ System.err.println("// __ usage:" + aNWLn + aNWLn + 
" java NIO2_newBufferedReader02Test \"<text file>\"" + aNWLn); }
 }
}

~ 
$ java -version
java version "1.7.0_02"
Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
Java HotSpot(TM) Server VM (build 22.0-b10, mixed mode)
~ 
$ free
             total       used       free     shared    buffers     cached
Mem:       4051236     719224    3332012          0      22008     408260
-/+ buffers/cache:     288956    3762280
Swap:      3038424          0    3038424
~ 
$ javac -encoding utf8 NIO2_newBufferedReader02Test.java
~ 
$ date; java -Xms256m -Xmx1024m -Xincgc -Dfile.encoding=utf8 \
 NIO2_newBufferedReader02Test /media/sdb1/tmp/eng_news_2006_10M-sentences.txt; \
 date;
~ 
Tue Jul 31 02:05:04 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |41922| (ms)
Tue Jul 31 02:05:46 UTC 2012
~ 
Tue Jul 31 02:05:51 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |27299| (ms)
Tue Jul 31 02:06:19 UTC 2012
~ 
Tue Jul 31 02:06:22 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |28180| (ms)
Tue Jul 31 02:06:50 UTC 2012
~ 
Tue Jul 31 02:26:43 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |35388| (ms)
Tue Jul 31 02:27:18 UTC 2012
~
Tue Jul 31 02:27:21 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |38155| (ms)
Tue Jul 31 02:28:00 UTC 2012
~
Tue Jul 31 02:30:40 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |41099| (ms)
Tue Jul 31 02:31:21 UTC 2012


#16701

From	Joerg Meier <joergmmeier@arcor.de>
Date	2012-07-31 12:13 +0200
Message-ID	<1tfjicp56idnk$.9wz2r0rvroy8.dlg@40tude.net>
In reply to	#16700

You might have more luck reading the whole file at once and then looping
through it once it's read. I/O is generally slow and COULD be your
bottleneck. But really, your first step should be to use a profiler so you
don't have to guess what's slow.
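
A minimal sketch of the "read everything, then loop" idea, assuming the file fits in the heap. The class name ReadAllThenScan and the method scan are made up for illustration and are not from the original post. Note that a Java array cannot hold more than about 2 GB, and that a 1.2 GB file held as both a byte[] and a decoded String would need far more heap than the -Xmx1024m used in the original run, so this is only practical for smaller inputs or a larger heap.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ReadAllThenScan {

    // Reads the whole file in one bulk call, then walks the decoded text.
    // Returns { number of '\n' characters, number of other code points }.
    // Carriage returns are skipped so that CRLF files count the same way.
    public static long[] scan(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);               // single bulk read
        String text = new String(bytes, StandardCharsets.UTF_8);
        long newlines = 0;
        long codePoints = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);                      // 1 or 2 chars
            if (cp == '\n') {
                ++newlines;
            } else if (cp != '\r') {
                ++codePoints;
            }
        }
        return new long[] { newlines, codePoints };
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        long[] result = scan(Paths.get(args[0]));
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.err.println("newlines: " + result[0] + ", code points: "
                + result[1] + ", took " + elapsedMs + " ms");
    }
}
```

Whether this is faster than the line-by-line loop depends on where the time actually goes, which is why measuring first is the safer order.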

Kind regards,
		Joerg

-- 
I don't read my email, so replies sent by email will unfortunately go
unread.


#16704

From	Joshua Cranmer <Pidgeot18@verizon.invalid>
Date	2012-07-31 07:41 -0400
Message-ID	<jv8g91$kr1$1@dont-email.me>
In reply to	#16700

On 7/31/2012 3:14 AM, qwertmonkey@syberianoutpost.ru wrote:
>   I don't think this would make sense. All sentences are short and all I
> need to do is basically scan them and use look-up tables to do some tinkering
> with the code points. The scheduling of threads and constant context switching
> will most probably make things slower

In this case, the limiting factor is probably not going to be your CPU 
but your disk drive.

-- 
Beware of bugs in the above code; I have only proved it correct, not 
tried it. -- Donald E. Knuth


#16706

From	Patricia Shanahan <pats@acm.org>
Date	2012-07-31 06:33 -0700
Message-ID	<c_idnTmcjekFQ4rNnZ2dnUVZ_hGdnZ2d@earthlink.com>
In reply to	#16700

On 7/31/2012 12:14 AM, qwertmonkey@syberianoutpost.ru wrote:
...
>   I don't think this would make sense. All sentences are short and all I
> need to do is basically scan them and use look-up tables to do some tinkering
> with the code points. The scheduling of threads and constant context switching
> will most probably make things slower

What context switching? This started out as a question about spare
processors sitting idle while there is work to do.

Step 1 is to decide whether this workload is CPU bound or I/O bound. I
assumed CPU bound because of the initial question about idle processors,
but now it sounds as though the processing is trivial so the workload
may be I/O bound.
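
One rough way to make that decision is to time a pass that only pulls raw bytes against a pass that also decodes UTF-8 and walks the code points. This is a sketch under assumptions, not a substitute for a profiler: the class name IoOrCpuBound and both method names are hypothetical, and the operating system's page cache can make a second run over the same file much faster than the first, so the order of the passes and repeated runs matter.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class IoOrCpuBound {

    // Pass 1: read raw bytes and do nothing with them. Returns bytes read.
    public static long readRaw(Path path) throws IOException {
        byte[] buf = new byte[1 << 16];
        long total = 0;
        try (InputStream in = Files.newInputStream(path)) {
            int n;
            while ((n = in.read(buf)) != -1) {
                total += n;
            }
        }
        return total;
    }

    // Pass 2: decode UTF-8, split into lines, count code points per line.
    // Returns the total number of code points, excluding line terminators.
    public static long readAndScan(Path path) throws IOException {
        long codePoints = 0;
        try (BufferedReader reader =
                Files.newBufferedReader(path, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                codePoints += line.codePointCount(0, line.length());
            }
        }
        return codePoints;
    }

    public static void main(String[] args) throws IOException {
        Path path = Paths.get(args[0]);

        long t0 = System.nanoTime();
        long bytes = readRaw(path);
        long t1 = System.nanoTime();
        long codePoints = readAndScan(path);
        long t2 = System.nanoTime();

        System.err.println("raw read:      " + bytes + " bytes in "
                + (t1 - t0) / 1_000_000 + " ms");
        System.err.println("read + decode: " + codePoints + " code points in "
                + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

If the two times are close, most of the cost is in getting the bytes and extra CPU threads are unlikely to help. If the second pass is much slower, decoding and scanning dominate and the workload leans CPU bound.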

If it is I/O bound, you may not need more than one thread, but you may
want to look at using non-blocking I/O to manage your own prefetches.
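
One possible reading of "manage your own prefetches" is double buffering with Java 7's AsynchronousFileChannel: request the next chunk before processing the current one, so the disk and the CPU overlap. This is only a sketch of that interpretation. The class name PrefetchingReader, the method countNewlines and the chunk size are assumptions for illustration, and the per-chunk work here just counts newline bytes. Decoding UTF-8 across chunk boundaries would need a CharsetDecoder that carries leftover bytes from one chunk to the next, which is left out.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Future;

public class PrefetchingReader {

    // Counts '\n' bytes while always keeping one read request in flight.
    public static long countNewlines(Path path, int chunkSize)
            throws IOException, InterruptedException, ExecutionException {
        try (AsynchronousFileChannel channel =
                AsynchronousFileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer current = ByteBuffer.allocate(chunkSize);
            ByteBuffer next = ByteBuffer.allocate(chunkSize);
            long position = 0;
            long newlines = 0;

            // Kick off the first read; 'pending' always fills 'current'.
            Future<Integer> pending = channel.read(current, position);
            while (true) {
                int n = pending.get();          // wait for the chunk to arrive
                if (n <= 0) {
                    break;                      // -1 signals end of file
                }
                position += n;

                // Start fetching the following chunk before touching this one.
                next.clear();
                pending = channel.read(next, position);

                // Process the chunk that has just arrived.
                current.flip();
                while (current.hasRemaining()) {
                    if (current.get() == '\n') {
                        ++newlines;
                    }
                }

                // Swap roles: the buffer being filled becomes 'current'.
                ByteBuffer tmp = current;
                current = next;
                next = tmp;
            }
            return newlines;
        }
    }

    public static void main(String[] args) throws Exception {
        long start = System.nanoTime();
        long lines = countNewlines(Paths.get(args[0]), 1 << 20);
        System.err.println(lines + " newlines in "
                + (System.nanoTime() - start) / 1_000_000 + " ms");
    }
}
```

Whether this beats a plain BufferedReader depends on the operating system's own read-ahead, which already prefetches sequential reads on many systems, so it is worth measuring before adopting.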

Patricia
