Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.ruby > #3166

Re: Need for speed -> a C extension?

From Martin Hansen <mail@maasha.dk>
Newsgroups comp.lang.ruby
Subject Re: Need for speed -> a C extension?
Date 2011-04-19 08:13 -0500
Organization Service de news de lacave.net
Message-ID <4a23edd33fda1c9e1bcf42858c32d298@ruby-forum.com> (permalink)
References (1 earlier) <iohsms02e2q@enews2.newsguy.com> <4638db940d54573d9643ab0a369c8c7e@ruby-forum.com> <2CC4ECA8-9286-4431-AB2D-E7F0A67698FC@zenspider.com> <620c883309d2336f7741bf716b688aaf@ruby-forum.com> <BANLkTimhvc-DHOiJdxEhJN4Y9gwr_rownQ@mail.gmail.com>

Show all headers | View raw


> IMHO it would be better to separate representation of the sequence and
> the matching process.  The matcher then would only carry a reference
> to the sequence and all the data it needs to do matching.

I am not sure if I understand this. I have tried to copy the behavior of 
String#match.

> Also #vector_update creates a lot of objects and does so for each
> position in the sequence.  That's likely where you can improve things.

Yes, that is quite possible. I might be able to skip .dup on line 128 
and 130. That will require some thinking and testing on my side.

> I am not sure what the matching algorithm is exactly.  Can you summarize
> it?

Well, it is a dynamic programming algorithm to do fuzzy searches of 
patterns in strings - allowing for custom matching rules (A==N, etc) and 
a maximum edit distance. Inspired by the paper by Bruno Woltzenlogel 
Paleo (page 197):

http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf

A short example:

http://pastie.org/1811496


> you can make matching simpler
>
> def match?(char1, char2)
>     (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
> end

Yes, but that should not give any significant speed increase?

> You might as well consider changing the Array into a Hash.  Then you
> can even get rid of the #ord call.

Actually, I started with a hash for this - and it was slightly faster. 
However, I think this bit field is very elegant - and since I was 
preparing for porting to C - I think this is the way to go!



Cheers


Martin

-- 
Posted via http://www.ruby-forum.com/.

Back to comp.lang.ruby | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 10:15 -0500
  Re: Need for speed -> a C extension? Chuck Remes <cremes.devlist@mac.com> - 2011-04-18 11:10 -0500
  Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-18 11:10 -0500
  Re: Need for speed -> a C extension? "WJ" <w_a_x_man@yahoo.com> - 2011-04-18 17:34 +0000
    Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 13:30 -0500
      Re: Need for speed -> a C extension? Ryan Davis <ryand-ruby@zenspider.com> - 2011-04-18 14:15 -0500
        Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 05:30 -0500
          Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 07:21 -0500
            Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 08:13 -0500
              Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 09:56 -0500
              Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 10:19 -0500
          Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 08:35 -0500
            Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 09:12 -0500
              Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 13:51 -0500
              Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 18:13 -0500
                Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 02:04 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:33 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:40 -0500
                Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 07:55 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 08:42 -0500
                Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 10:18 -0500
                Re: Need for speed -> a C extension? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-20 10:24 -0500
                Re: Need for speed -> a C extension? Eric Christopherson <echristopherson@gmail.com> - 2011-04-20 17:08 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:34 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:39 -0500
              Re: Need for speed -> a C extension? Colin Bartlett <colinb2r@googlemail.com> - 2011-04-20 22:39 -0500
  Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-05-15 04:16 -0500
    Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-05-15 13:46 +0200

csiph-web