Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.ruby > #3159
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Newsgroups | comp.lang.ruby |
| Subject | Re: Need for speed -> a C extension? |
| Date | 2011-04-19 07:21 -0500 |
| Organization | Service de news de lacave.net |
| Message-ID | <BANLkTimhvc-DHOiJdxEhJN4Y9gwr_rownQ@mail.gmail.com> (permalink) |
| References | <c094098c0ea21c2b9618d1b8d7a4b176@ruby-forum.com> <iohsms02e2q@enews2.newsguy.com> <4638db940d54573d9643ab0a369c8c7e@ruby-forum.com> <2CC4ECA8-9286-4431-AB2D-E7F0A67698FC@zenspider.com> <620c883309d2336f7741bf716b688aaf@ruby-forum.com> |
On Tue, Apr 19, 2011 at 12:30 PM, Martin Hansen <mail@maasha.dk> wrote:
>>> def match?(char1, char2)
>>> (EQUAL[char1.upcase.ord] & EQUAL[char2.upcase.ord]) != 0
>>> end
>>
>> That would be really easy to write tests and a benchmark against and
>> then rewrite in C using RubyInline... without tests and a benchmark tho,
>> you won't know that you've done it correctly and provides a measurable
>> benefit.
>
> They bottlenecks are the iteration over the sequence (while loop at line
> 68) and the vector_update (line 120). I am a bit surprised that match?
> is that slow - I expect it to be almost instantaneous in C. I would like
> to test that with a benchmark and Inline C.
Frankly, I find your code has a design issue: it seems you mix data
and iteration in a single class. This is visible from how #match
works
def match(pattern, pos = 0, max_edit_distance = 0)
@pattern = pattern
@pos = pos
@max_edit_distance = max_edit_distance
@vector = vector_init
..
IMHO it would be better to separate representation of the sequence and
the matching process. The matcher then would only carry a reference
to the sequence and all the data it needs to do matching.
Also #vector_update creates a lot of objects and does so for each
position in the sequence. That's likely where you can improve things.
I am not sure what the matching algorithm is exactly. Can you summarize it?
Ah, and one thing: if you add lowercase entries
EQUAL['a'.ord] = BIT_A
EQUAL['t'.ord] = BIT_T
EQUAL['u'.ord] = BIT_T
EQUAL['c'.ord] = BIT_C
EQUAL['g'.ord] = BIT_G
EQUAL['m'.ord] = (BIT_A|BIT_C)
EQUAL['r'.ord] = (BIT_A|BIT_G)
EQUAL['w'.ord] = (BIT_A|BIT_T)
EQUAL['s'.ord] = (BIT_C|BIT_G)
EQUAL['y'.ord] = (BIT_C|BIT_T)
EQUAL['k'.ord] = (BIT_G|BIT_T)
EQUAL['b'.ord] = (BIT_C|BIT_G|BIT_T)
EQUAL['d'.ord] = (BIT_A|BIT_G|BIT_T)
EQUAL['h'.ord] = (BIT_A|BIT_C|BIT_T)
EQUAL['v'.ord] = (BIT_A|BIT_C|BIT_G)
EQUAL['n'.ord] = (BIT_A|BIT_C|BIT_G|BIT_T)
you can make matching simpler
def match?(char1, char2)
(EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
end
You might as well consider changing the Array into a Hash. Then you
can even get rid of the #ord call.
A different approach would just use regular expressions, e.g.
MAP = {
'a' => 1,
't' => 2,
'w' => '[12]',
..
}
seq = Array.new(20) { %w{1 2 4 8}.sample }
pat = "acgw"
rx = Regexp.new(pat.chars.map {|c| MAP[c.downcase]}.join)
seq.scan pat do |m|
p m
end
Cheers
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
Back to comp.lang.ruby | Previous | Next — Previous in thread | Next in thread | Find similar | Unroll thread
Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 10:15 -0500
Re: Need for speed -> a C extension? Chuck Remes <cremes.devlist@mac.com> - 2011-04-18 11:10 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-18 11:10 -0500
Re: Need for speed -> a C extension? "WJ" <w_a_x_man@yahoo.com> - 2011-04-18 17:34 +0000
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 13:30 -0500
Re: Need for speed -> a C extension? Ryan Davis <ryand-ruby@zenspider.com> - 2011-04-18 14:15 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 05:30 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 07:21 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 08:13 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 09:56 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 10:19 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 08:35 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 09:12 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 13:51 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 18:13 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 02:04 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:33 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:40 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 07:55 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 08:42 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 10:18 -0500
Re: Need for speed -> a C extension? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-20 10:24 -0500
Re: Need for speed -> a C extension? Eric Christopherson <echristopherson@gmail.com> - 2011-04-20 17:08 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:34 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:39 -0500
Re: Need for speed -> a C extension? Colin Bartlett <colinb2r@googlemail.com> - 2011-04-20 22:39 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-05-15 04:16 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-05-15 13:46 +0200
csiph-web