How can you make idle processors pick up java work?

Started by: "qwertmonkey" <qwertmonkey@1:261/38.remove-x1c-this>
First post: 2012-07-31 20:07 +0000
Last post: 2012-07-31 20:07 +0000
Articles: 4, 4 participants

#16884 — How can you make idle processors pick up java work?

From: "qwertmonkey" <qwertmonkey@1:261/38.remove-x1c-this>
Date: 2012-07-31 20:07 +0000
Subject: How can you make idle processors pick up java work?
Message-ID: <50182C82.55852.calajapr@time.synchro.net>
From: qwertmonkey@syberianoutpost.ru

~
> How slow is the NL processing?
~
> Does it make any sense to read lines in one thread and pass each off
> to one of the iPrx-1 other threads that might run on separate processors?
~
 I don't think this would make sense. All sentences are short, and all I
need to do is scan them and use look-up tables to do some tinkering
with the code points. The scheduling of threads and constant context switching
will most probably make things slower.
~
 OK, this is the piece of code I am trying to optimize, and the results
I get using a large enough file of sentences:
~
 http://corpora.informatik.uni-leipzig.de/download.html
~
 http://corpora.uni-leipzig.de/downloads/deu_news_2008_10M-text.tar.gz
~
 Inside the tarball there is a file with just sentences:
~
$ ls -l deu_news_2008_10M-sentences.txt
-rw-r--r-- 1 knoppix knoppix 1235804164 May 28  2011 deu_news_2008_10M-sentences.txt

$ md5sum -b deu_news_2008_10M-sentences.txt
23041587b6414d1a1a56c9c389d3c18f *deu_news_2008_10M-sentences.txt

$ wc -l deu_news_2008_10M-sentences.txt
10000000 deu_news_2008_10M-sentences.txt
~
 Again, do you know of any faster way to read the sentences of
such large files and get their code points?
 lbrtchx
~
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Files;
import java.nio.charset.Charset;

import java.io.BufferedReader;
import java.io.IOException;

// __
public class NIO2_newBufferedReader02Test{
 private static final String aNWLn = System.getProperty("line.separator");
// __
 public static void main(String[] aArgs){

  if((aArgs != null) && (aArgs.length == 1)){
   long lTm00 = System.currentTimeMillis();
   long lLns = 0;
   int iTtlRdKdPnts = 0;
   BufferedReader BfR = null;
   Path IFlPth = FileSystems.getDefault().getPath(aArgs[0]);
   long lIFlL = IFlPth.toFile().length();
   int iKdPnt, iSxL;

   StringBuilder aBldr = new StringBuilder(1024);
// __
   try{
    BfR = Files.newBufferedReader(IFlPth, Charset.forName("UTF-8"));
    String aSx = BfR.readLine();
    while(aSx != null){
     iSxL = aSx.length();
     if(iSxL > 0){
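      // Advance by Character.charCount so a surrogate pair is read as one code point.
      for(int i = 0; (i < iSxL); i += Character.charCount(iKdPnt)){
       iKdPnt = aSx.codePointAt(i); ++iTtlRdKdPnts;
       aBldr.appendCodePoint(iKdPnt);
      }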
// __
      aBldr.delete(0, aBldr.length());
     }// (iSxL > 0)
     ++lLns;
     aSx = BfR.readLine();
    }// (aSx != null)

    BfR.close();
// __
    System.err.println("// __ reading |" + lIFlL + "|  bytes long text file with |"
     + lLns + "| lines took |" + (System.currentTimeMillis() - lTm00) + "| (ms)");
    System.err.println("// __ iTtlRdKdPnts: |" + iTtlRdKdPnts + "|");
   }catch(IOException IOX) { IOX.printStackTrace(System.err); }
  }
  else{ System.err.println("// __ usage:" + aNWLn + aNWLn +
" java NIO2_newBufferedReader02Test \"<text file>\"" + aNWLn); }
 }
}
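
A note on the scan loop in the listing above: `String.codePointAt(i)` yields a whole code point only when `i` is at the start of a surrogate pair, so the index should advance by `Character.charCount(cp)` rather than by one. A minimal sketch of the difference follows; the names `CodePointScan` and `countCodePoints` are illustrative and not part of the original program.

```java
final class CodePointScan {
    /** Counts code points in a line, stepping over each surrogate pair as a unit. */
    static int countCodePoints(String line) {
        int count = 0;
        for (int i = 0; i < line.length(); ) {
            int cp = line.codePointAt(i);
            i += Character.charCount(cp); // 1 for BMP characters, 2 for supplementary ones
            ++count;
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "a\uD83D\uDE00b"; // 'a', U+1F600 (a surrogate pair), 'b'
        System.out.println(s.length());         // prints 4 (UTF-16 code units)
        System.out.println(countCodePoints(s)); // prints 3 (code points)
    }
}
```

For text that stays within the Basic Multilingual Plane the two ways of stepping give the same result, so the distinction only matters once supplementary characters appear in the input.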

~
$ java -version
java version "1.7.0_02"
Java(TM) SE Runtime Environment (build 1.7.0_02-b13)
Java HotSpot(TM) Server VM (build 22.0-b10, mixed mode)
~
$ free
             total       used       free     shared    buffers     cached
Mem:       4051236     719224    3332012          0      22008     408260
-/+ buffers/cache:     288956    3762280
Swap:      3038424          0    3038424
~
$ javac -encoding utf8 NIO2_newBufferedReader02Test.java
~
$ date; java -Xms256m -Xmx1024m -Xincgc -Dfile.encoding=utf8 \
    NIO2_newBufferedReader02Test /media/sdb1/tmp/eng_news_2006_10M-sentences.txt; \
    date;
~
Tue Jul 31 02:05:04 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |41922| (ms)
Tue Jul 31 02:05:46 UTC 2012
~
Tue Jul 31 02:05:51 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |27299| (ms)
Tue Jul 31 02:06:19 UTC 2012
~
Tue Jul 31 02:06:22 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |28180| (ms)
Tue Jul 31 02:06:50 UTC 2012
~
Tue Jul 31 02:26:43 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |35388| (ms)
Tue Jul 31 02:27:18 UTC 2012
~
Tue Jul 31 02:27:21 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |38155| (ms)
Tue Jul 31 02:28:00 UTC 2012
~
Tue Jul 31 02:30:40 UTC 2012
// __ reading |1280939143|  bytes long text file with |10000000| lines took |41099| (ms)
Tue Jul 31 02:31:21 UTC 2012

-+- BBBS/Li6 v4.10 Dada-1
 + Origin: Prism bbs (1:261/38)
-+- Synchronet 3.16a-Win32 NewsLink 1.98
Time Warp of the Future BBS - telnet://time.synchro.net:24


#16885

From: "Joerg Meier" <joerg.meier@1:261/38.remove-x1c-this>
Date: 2012-07-31 20:07 +0000
Message-ID: <50182C82.55853.calajapr@time.synchro.net>
In reply to: #16884
  To: qwertmonkey
From: Joerg Meier <joergmmeier@arcor.de>

You might have more luck reading the whole file at once and then looping
through it once it's read. I/O is generally slow, and COULD be your bottleneck.
But really, your first step should be to use a profiler so you don't have to
guess what's slow.
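
The read-everything-first suggestion could be sketched roughly as follows. This is an illustration, not a measured improvement: `ReadWholeFile` and `scan` are hypothetical names, and for a file of about 1.2 GB the byte array plus the decoded `String` would need far more heap than the `-Xmx1024m` used in the original post, so as written it is only practical for smaller inputs or larger heaps.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ReadWholeFile {
    /**
     * Reads the entire file into memory, then scans it in one pass.
     * Returns {newline count, code points other than '\n' and '\r'}.
     */
    static long[] scan(Path path) throws IOException {
        byte[] bytes = Files.readAllBytes(path);                 // one bulk read
        String text = new String(bytes, StandardCharsets.UTF_8); // one bulk decode
        long lines = 0;
        long codePoints = 0;
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            if (cp == '\n') {
                ++lines;        // counts newline characters, as wc -l does
            } else if (cp != '\r') {
                ++codePoints;
            }
        }
        return new long[] { lines, codePoints };
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java ReadWholeFile <text file>");
            return;
        }
        long t0 = System.currentTimeMillis();
        long[] result = scan(Paths.get(args[0]));
        System.err.println(result[0] + " lines, " + result[1] + " code points, "
            + (System.currentTimeMillis() - t0) + " ms");
    }
}
```

Timing the `readAllBytes` call separately from the loop would show how much of the total is spent on I/O, which is the question a profiler would answer more thoroughly.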

Kind regards,
                Joerg

--
I don't read my emails, so replies by email will unfortunately go unread.



#16888

From: "Joshua Cranmer" <joshua.cranmer@1:261/38.remove-x1c-this>
Date: 2012-07-31 20:07 +0000
Message-ID: <50182C83.55856.calajapr@time.synchro.net>
In reply to: #16884
  To: qwertmonkey
From: Joshua Cranmer <Pidgeot18@verizon.invalid>

On 7/31/2012 3:14 AM, qwertmonkey@syberianoutpost.ru wrote:
> I don't think this would make sense. All sentences are short and all I
> need to do is basically scan them and use look-up tables to do some tinkering
> with the code points. The scheduling of threads and constant context
> switching will most probably make things slower

In this case, the limiting factor is probably not going to be your CPU but your 
disk drive.
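
The timings in the original post give a rough sense of this: 1,280,939,143 bytes in 27,299 ms is about 47 MB/s, and the same file in 41,922 ms is about 31 MB/s. One way to check whether the drive is the limit is to time a raw read with no decoding or per-line work and compare it with the full run. The sketch below does that; `RawReadTiming`, `readAll`, and `megabytesPerSecond` are hypothetical names, and repeated runs may be served partly from the operating system's file cache, so the first run after a cold start is the more telling one.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

final class RawReadTiming {
    /** Reads the file sequentially in 1 MiB chunks and returns the byte count. */
    static long readAll(Path path) throws IOException {
        long total = 0;
        ByteBuffer buf = ByteBuffer.allocateDirect(1 << 20);
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            int n;
            while ((n = ch.read(buf)) != -1) {
                total += n;
                buf.clear(); // discard the data; only the read itself is being timed
            }
        }
        return total;
    }

    /** Throughput in decimal megabytes (10^6 bytes) per second. */
    static double megabytesPerSecond(long bytes, long millis) {
        return (bytes / 1e6) / (millis / 1000.0);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java RawReadTiming <file>");
            return;
        }
        long t0 = System.nanoTime();
        long bytes = readAll(Paths.get(args[0]));
        long millis = Math.max((System.nanoTime() - t0) / 1_000_000, 1);
        System.out.printf("%d bytes in %d ms (%.1f MB/s)%n",
            bytes, millis, megabytesPerSecond(bytes, millis));
    }
}
```

If the raw read takes nearly as long as the full program, the per-line processing is not where the time goes, and adding threads for it is unlikely to help.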

--
Beware of bugs in the above code; I have only proved it correct, not tried it.
-- Donald E. Knuth



#16890

From: "Patricia Shanahan" <patricia.shanahan@1:261/38.remove-x1c-this>
Date: 2012-07-31 20:07 +0000
Message-ID: <50182C83.55858.calajapr@time.synchro.net>
In reply to: #16884
  To: qwertmonkey
From: Patricia Shanahan <pats@acm.org>

On 7/31/2012 12:14 AM, qwertmonkey@syberianoutpost.ru wrote: ...
> I don't think this would make sense. All sentences are short and all I
> need to do is basically scan them and use look-up tables to do some tinkering
> with the code points. The scheduling of threads and constant context
> switching will most probably make things slower

What context switching? This started out as a question about spare processors 
sitting idle while there is work to do.

Step 1 is to decide whether this workload is CPU bound or I/O bound. I assumed 
CPU bound because of the initial question about idle processors, but now it 
sounds as though the processing is trivial so the workload may be I/O bound.
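
One rough way to make that decision is to compare the CPU time a thread consumed with the wall-clock time that elapsed: a ratio near 1 suggests CPU-bound work, a ratio near 0 suggests the thread mostly waited. The sketch below assumes the JVM supports per-thread CPU time through `ThreadMXBean`; the names `CpuVsWall`, `measure`, and `cpuFraction` are illustrative only.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

final class CpuVsWall {
    /** Fraction of wall-clock time spent on the CPU; 0.0 if no time elapsed. */
    static double cpuFraction(long cpuNanos, long wallNanos) {
        return wallNanos <= 0 ? 0.0 : (double) cpuNanos / wallNanos;
    }

    /** Runs the work on the calling thread and returns its CPU/wall ratio. */
    static double measure(Runnable work) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long cpu0 = mx.getCurrentThreadCpuTime(); // nanoseconds; -1 if measurement is disabled
        long wall0 = System.nanoTime();
        work.run();
        long wall = System.nanoTime() - wall0;
        long cpu = mx.getCurrentThreadCpuTime() - cpu0;
        return cpuFraction(cpu, wall);
    }

    public static void main(String[] args) {
        // Stand-in for I/O wait: the thread sleeps, so it uses almost no CPU.
        double waiting = measure(new Runnable() {
            public void run() {
                try {
                    Thread.sleep(200);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        // Stand-in for computation: a busy loop that keeps the CPU occupied.
        double computing = measure(new Runnable() {
            public void run() {
                long x = 0;
                for (long i = 0; i < 200_000_000L; ++i) { x += i ^ (x >>> 3); }
                if (x == 42) { System.out.println(x); } // keeps the loop from being optimized away
            }
        });
        System.out.printf("waiting: %.2f, computing: %.2f%n", waiting, computing);
    }
}
```

Wrapping the file-reading loop of the program under test in `measure` would give a first indication before reaching for a full profiler.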

If it is I/O bound, you may not need more than one thread, but you should perhaps
look at using non-blocking I/O to manage your own prefetches.
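
A minimal sketch of one possible reading of "manage your own prefetches", using Java 7's `AsynchronousFileChannel` with two buffers: the read of the next chunk is started before the current chunk is processed. The post does not name an API, so this choice is an assumption, as are the names `PrefetchRead` and `countNewlines`; the per-chunk work here is just counting newline bytes, which is safe on UTF-8 because the byte 0x0A never occurs inside a multi-byte sequence.

```java
import java.nio.ByteBuffer;
import java.nio.channels.AsynchronousFileChannel;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.Future;

final class PrefetchRead {
    /** Counts '\n' bytes while the next chunk is being read in the background. */
    static long countNewlines(Path path, int chunkSize) throws Exception {
        long newlines = 0;
        try (AsynchronousFileChannel ch =
                 AsynchronousFileChannel.open(path, StandardOpenOption.READ)) {
            ByteBuffer current = ByteBuffer.allocate(chunkSize);
            ByteBuffer next = ByteBuffer.allocate(chunkSize);
            long position = 0;
            Future<Integer> pending = ch.read(current, position);
            while (true) {
                int n = pending.get();             // wait for the chunk in flight
                if (n <= 0) {
                    break;                         // -1 signals end of file
                }
                position += n;
                pending = ch.read(next, position); // start the prefetch of the following chunk
                current.flip();
                while (current.hasRemaining()) {
                    if (current.get() == '\n') {
                        ++newlines;
                    }
                }
                current.clear();
                ByteBuffer tmp = current;          // swap roles for the next round
                current = next;
                next = tmp;
            }
        }
        return newlines;
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java PrefetchRead <file>");
            return;
        }
        long t0 = System.currentTimeMillis();
        long lines = countNewlines(Paths.get(args[0]), 1 << 20);
        System.err.println(lines + " lines in " + (System.currentTimeMillis() - t0) + " ms");
    }
}
```

Whether this beats a plain `BufferedReader` depends on the drive and on the operating system's own read-ahead, so it is worth measuring both before committing to the extra complexity.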

Patricia
