Path: csiph.com!x330-a1.tempe.blueboxinc.net!feeder1.hal-mli.net!weretis.net!feeder4.news.weretis.net!news.cgarbs.de!de-l.enfer-du-nord.net!feeder1.enfer-du-nord.net!feeder.news-service.com!xlned.com!feeder7.xlned.com!novso.com!news.skynet.be!talisker.lacave.net!lacave.net!not-for-mail From: Martin Hansen Newsgroups: comp.lang.ruby Subject: Re: Need for speed -> a C extension? Date: Tue, 19 Apr 2011 08:13:02 -0500 Organization: Service de news de lacave.net Lines: 53 Message-ID: <4a23edd33fda1c9e1bcf42858c32d298@ruby-forum.com> References: <4638db940d54573d9643ab0a369c8c7e@ruby-forum.com> <2CC4ECA8-9286-4431-AB2D-E7F0A67698FC@zenspider.com> <620c883309d2336f7741bf716b688aaf@ruby-forum.com> NNTP-Posting-Host: bristol.highgroove.com Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 7bit X-Trace: talisker.lacave.net 1303219099 17894 65.111.164.187 (19 Apr 2011 13:18:19 GMT) X-Complaints-To: abuse@lacave.net NNTP-Posting-Date: Tue, 19 Apr 2011 13:18:19 +0000 (UTC) In-Reply-To: X-Received-From: This message has been automatically forwarded from the ruby-talk mailing list by a gateway at comp.lang.ruby. If it is SPAM, it did not originate at comp.lang.ruby. Please report the original sender, and not us. Thanks! For more details about this gateway, please visit: http://blog.grayproductions.net/categories/the_gateway X-Mail-Count: 381846 X-Ml-Name: ruby-talk X-Rubymirror: Yes X-Ruby-Talk: <4a23edd33fda1c9e1bcf42858c32d298@ruby-forum.com> Xref: x330-a1.tempe.blueboxinc.net comp.lang.ruby:3166 > IMHO it would be better to separate representation of the sequence and > the matching process. The matcher then would only carry a reference > to the sequence and all the data it needs to do matching. I am not sure if I understand this. I have tried to copy the behavior of String#match. > Also #vector_update creates a lot of objects and does so for each > position in the sequence. That's likely where you can improve things. Yes, that is quite possible. I might be able to skip .dup on line 128 and 130. That will require some thinking and testing on my side. > I am not sure what the matching algorithm is exactly. Can you summarize > it? Well, it is a dynamic programming algorithm to do fuzzy searches of patterns in strings - allowing for custom matching rules (A==N, etc) and a maximum edit distance. Inspired by the paper by Bruno Woltzenlogel Paleo (page 197): http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf A short example: http://pastie.org/1811496 > you can make matching simpler > > def match?(char1, char2) > (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0 > end Yes, but that should not give any significant speed increase? > You might as well consider changing the Array into a Hash. Then you > can even get rid of the #ord call. Actually, I started with a hash for this - and it was slightly faster. However, I think this bit field is very elegant - and since I was preparing for porting to C - I think this is the way to go! Cheers Martin -- Posted via http://www.ruby-forum.com/.