Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.ruby > #3159

Re: Need for speed -> a C extension?

From Robert Klemme <shortcutter@googlemail.com>
Newsgroups comp.lang.ruby
Subject Re: Need for speed -> a C extension?
Date 2011-04-19 07:21 -0500
Organization Service de news de lacave.net
Message-ID <BANLkTimhvc-DHOiJdxEhJN4Y9gwr_rownQ@mail.gmail.com> (permalink)
References <c094098c0ea21c2b9618d1b8d7a4b176@ruby-forum.com> <iohsms02e2q@enews2.newsguy.com> <4638db940d54573d9643ab0a369c8c7e@ruby-forum.com> <2CC4ECA8-9286-4431-AB2D-E7F0A67698FC@zenspider.com> <620c883309d2336f7741bf716b688aaf@ruby-forum.com>

Show all headers | View raw


On Tue, Apr 19, 2011 at 12:30 PM, Martin Hansen <mail@maasha.dk> wrote:
>>>  def match?(char1, char2)
>>>    (EQUAL[char1.upcase.ord] & EQUAL[char2.upcase.ord]) != 0
>>>  end
>>
>> That would be really easy to write tests and a benchmark against and
>> then rewrite in C using RubyInline... without tests and a benchmark tho,
>> you won't know that you've done it correctly and provides a measurable
>> benefit.
>
> They bottlenecks are the iteration over the sequence (while loop at line
> 68) and the vector_update (line 120). I am a bit surprised that match?
> is that slow - I expect it to be almost instantaneous in C. I would like
> to test that with a benchmark and Inline C.

Frankly, I find your code has a design issue: it seems you mix data
and iteration in a single class.  This is visible from how #match
works

def match(pattern, pos = 0, max_edit_distance = 0)
    @pattern           = pattern
    @pos               = pos
    @max_edit_distance = max_edit_distance
    @vector            = vector_init

..

IMHO it would be better to separate representation of the sequence and
the matching process.  The matcher then would only carry a reference
to the sequence and all the data it needs to do matching.

Also #vector_update creates a lot of objects and does so for each
position in the sequence.  That's likely where you can improve things.

I am not sure what the matching algorithm is exactly.  Can you summarize it?

Ah, and one thing: if you add lowercase entries

EQUAL['a'.ord] = BIT_A
EQUAL['t'.ord] = BIT_T
EQUAL['u'.ord] = BIT_T
EQUAL['c'.ord] = BIT_C
EQUAL['g'.ord] = BIT_G
EQUAL['m'.ord] = (BIT_A|BIT_C)
EQUAL['r'.ord] = (BIT_A|BIT_G)
EQUAL['w'.ord] = (BIT_A|BIT_T)
EQUAL['s'.ord] = (BIT_C|BIT_G)
EQUAL['y'.ord] = (BIT_C|BIT_T)
EQUAL['k'.ord] = (BIT_G|BIT_T)
EQUAL['b'.ord] = (BIT_C|BIT_G|BIT_T)
EQUAL['d'.ord] = (BIT_A|BIT_G|BIT_T)
EQUAL['h'.ord] = (BIT_A|BIT_C|BIT_T)
EQUAL['v'.ord] = (BIT_A|BIT_C|BIT_G)
EQUAL['n'.ord] = (BIT_A|BIT_C|BIT_G|BIT_T)

you can make matching simpler

def match?(char1, char2)
    (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
end

You might as well consider changing the Array into a Hash.  Then you
can even get rid of the #ord call.

A different approach would just use regular expressions, e.g.

MAP = {
  'a' => 1,
  't' => 2,
  'w' => '[12]',
..
}

seq = Array.new(20) { %w{1 2 4 8}.sample }
pat = "acgw"
rx = Regexp.new(pat.chars.map {|c| MAP[c.downcase]}.join)

seq.scan pat do |m|
  p m
end

Cheers

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

Back to comp.lang.ruby | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 10:15 -0500
  Re: Need for speed -> a C extension? Chuck Remes <cremes.devlist@mac.com> - 2011-04-18 11:10 -0500
  Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-18 11:10 -0500
  Re: Need for speed -> a C extension? "WJ" <w_a_x_man@yahoo.com> - 2011-04-18 17:34 +0000
    Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 13:30 -0500
      Re: Need for speed -> a C extension? Ryan Davis <ryand-ruby@zenspider.com> - 2011-04-18 14:15 -0500
        Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 05:30 -0500
          Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 07:21 -0500
            Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 08:13 -0500
              Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 09:56 -0500
              Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 10:19 -0500
          Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 08:35 -0500
            Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 09:12 -0500
              Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 13:51 -0500
              Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 18:13 -0500
                Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 02:04 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:33 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:40 -0500
                Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 07:55 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 08:42 -0500
                Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 10:18 -0500
                Re: Need for speed -> a C extension? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-20 10:24 -0500
                Re: Need for speed -> a C extension? Eric Christopherson <echristopherson@gmail.com> - 2011-04-20 17:08 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:34 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:39 -0500
              Re: Need for speed -> a C extension? Colin Bartlett <colinb2r@googlemail.com> - 2011-04-20 22:39 -0500
  Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-05-15 04:16 -0500
    Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-05-15 13:46 +0200

csiph-web