Re: Need for speed -> a C extension?

From Martin Hansen <mail@maasha.dk>
Newsgroups comp.lang.ruby
Subject Re: Need for speed -> a C extension?
Date 2011-04-18 13:30 -0500
Organization Service de news de lacave.net
Message-ID <4638db940d54573d9643ab0a369c8c7e@ruby-forum.com> (permalink)
References <c094098c0ea21c2b9618d1b8d7a4b176@ruby-forum.com> <iohsms02e2q@enews2.newsguy.com>


WJ wrote in post #993576:
> Martin Hansen wrote:
>
>> The below code is too slow for practical use. I need it to run at least
>> 20 times faster. Perhaps that is possible with some C code? I have no
>> experience with writing Ruby extensions. What are the pitfalls? Which
>> part of the code should be ported? Any pointers to get me started?
>
> Please give a clear description of the algorithm, and then
> give some sample input and output.


Here is a working version of the code that can be profiled (though it 
will take forever with 20M iterations):

http://pastie.org/1808127

According to the profiler, the slow parts are:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 29.39     2.66      2.66     1521     1.75    11.78  Range#each
 15.80     4.09      1.43    33000     0.04     0.06  Seq#match?
 10.72     5.06      0.97    78940     0.01     0.03  Kernel.dup
  9.28     5.90      0.84    78940     0.01     0.01  Kernel.initialize_dup
  6.63     6.50      0.60   142380     0.00     0.00  Seq::Score#edit_distance
  5.30     6.98      0.48    22220     0.02     0.03  Seq#deletion?
  3.54     7.30      0.32    66016     0.00     0.00  String#ord
  3.43     7.61      0.31    14680     0.02     0.04  Seq#mismatch?
  3.31     7.91      0.30     8300     0.04     0.05  Seq#insertion?


The input is DNA sequences: strings of A, T, C, G and N, 50-100 
characters long. They come in files with 20M-30M sequences per file. I've 
got ~50 of these files and more incoming. The output will be truncated 
sequences based on the match position located with this bit of code.

The algorithm is of the dynamic programming flavor and was inspired by 
the paper by Bruno Woltzenlogel Paleo (page 197):

http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf

Locating variable length matches is tricky!
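
As a rough illustration of the dynamic programming idea (this is not the 
code from the pastie link, and not necessarily the method in the paper), 
approximate substring search can be sketched as below. The method name 
fuzzy_find and the max_edits parameter are made up for this example, and N 
is treated as an ordinary character, which may not be what the real code 
does.

```ruby
# Sketch only: Sellers-style approximate substring search.
# Finds where +pattern+ occurs in +text+ with at most +max_edits+
# mismatches, insertions or deletions. The match may start anywhere.
# Returns [end_index, distance] for the lowest-distance match
# (the leftmost one on ties), or nil if no match is within the limit.
# Hypothetical helper, not part of the original Seq class.
def fuzzy_find(text, pattern, max_edits)
  m = pattern.length
  # prev[i] = cost of aligning pattern[0, i] against text ending
  # at the previous position. Before any text is read, that is i.
  prev = (0..m).to_a
  best = nil

  text.each_char.with_index do |tc, j|
    # curr[0] stays 0 because a match is allowed to start at any position.
    curr = Array.new(m + 1, 0)
    (1..m).each do |i|
      cost = pattern[i - 1] == tc ? 0 : 1
      curr[i] = [
        prev[i - 1] + cost, # match or mismatch
        prev[i] + 1,        # extra character in text
        curr[i - 1] + 1     # pattern character missing from text
      ].min
    end
    if curr[m] <= max_edits && (best.nil? || curr[m] < best[1])
      best = [j, curr[m]]
    end
    prev = curr
  end

  best
end

pos, dist = fuzzy_find("TTTACGAACGGGG", "ACGTACG", 1)
puts "match ends at #{pos} with #{dist} edit(s)" if pos
```

Only two rows of the matrix are kept at a time, so no per-cell objects 
need to be duplicated. A pure Ruby loop like this is still O(text length 
x pattern length) per sequence, so with 20M+ sequences the inner loop is 
the natural candidate for a C port.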



Cheers,

Martin

-- 
Posted via http://www.ruby-forum.com/.
