Path: csiph.com!x330-a1.tempe.blueboxinc.net!feeder1.hal-mli.net!weretis.net!feeder4.news.weretis.net!news.cgarbs.de!de-l.enfer-du-nord.net!feeder1.enfer-du-nord.net!feeder.news-service.com!xlned.com!feeder7.xlned.com!novso.com!news.skynet.be!talisker.lacave.net!lacave.net!not-for-mail
From: Martin Hansen <mail@maasha.dk>
Newsgroups: comp.lang.ruby
Subject: Re: Need for speed -> a C extension?
Date: Tue, 19 Apr 2011 08:13:02 -0500
Organization: Service de news de lacave.net
Lines: 53
Message-ID: <4a23edd33fda1c9e1bcf42858c32d298@ruby-forum.com>
References: <c094098c0ea21c2b9618d1b8d7a4b176@ruby-forum.com> <iohsms02e2q@enews2.newsguy.com> <4638db940d54573d9643ab0a369c8c7e@ruby-forum.com> <2CC4ECA8-9286-4431-AB2D-E7F0A67698FC@zenspider.com> <620c883309d2336f7741bf716b688aaf@ruby-forum.com> <BANLkTimhvc-DHOiJdxEhJN4Y9gwr_rownQ@mail.gmail.com>
NNTP-Posting-Host: bristol.highgroove.com
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Trace: talisker.lacave.net 1303219099 17894 65.111.164.187 (19 Apr 2011 13:18:19 GMT)
X-Complaints-To: abuse@lacave.net
NNTP-Posting-Date: Tue, 19 Apr 2011 13:18:19 +0000 (UTC)
In-Reply-To: <BANLkTimhvc-DHOiJdxEhJN4Y9gwr_rownQ@mail.gmail.com>
X-Received-From: This message has been automatically forwarded from the ruby-talk mailing list by a gateway at comp.lang.ruby. If it is SPAM, it did not originate at comp.lang.ruby. Please report the original sender, and not us. Thanks! For more details about this gateway, please visit: http://blog.grayproductions.net/categories/the_gateway
X-Mail-Count: 381846
X-Ml-Name: ruby-talk
X-Rubymirror: Yes
X-Ruby-Talk: <4a23edd33fda1c9e1bcf42858c32d298@ruby-forum.com>
Xref: x330-a1.tempe.blueboxinc.net comp.lang.ruby:3166

> IMHO it would be better to separate representation of the sequence and
> the matching process.  The matcher then would only carry a reference
> to the sequence and all the data it needs to do matching.

I am not sure if I understand this. I have tried to copy the behavior of 
String#match.

> Also #vector_update creates a lot of objects and does so for each
> position in the sequence.  That's likely where you can improve things.

Yes, that is quite possible. I might be able to skip .dup on line 128 
and 130. That will require some thinking and testing on my side.

> I am not sure what the matching algorithm is exactly.  Can you summarize
> it?

Well, it is a dynamic programming algorithm to do fuzzy searches of 
patterns in strings - allowing for custom matching rules (A==N, etc) and 
a maximum edit distance. Inspired by the paper by Bruno Woltzenlogel 
Paleo (page 197):

http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf

A short example:

http://pastie.org/1811496


> you can make matching simpler
>
> def match?(char1, char2)
>     (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
> end

Yes, but that should not give any significant speed increase?

> You might as well consider changing the Array into a Hash.  Then you
> can even get rid of the #ord call.

Actually, I started with a hash for this - and it was slightly faster. 
However, I think this bit field is very elegant - and since I was 
preparing for porting to C - I think this is the way to go!



Cheers


Martin

-- 
Posted via http://www.ruby-forum.com/.