Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]
Groups > comp.lang.ruby > #4560
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Newsgroups | comp.lang.ruby |
| Subject | Re: Need for speed -> a C extension? |
| Date | 2011-05-15 13:46 +0200 |
| Message-ID | <939sphF22pU1@mid.individual.net> (permalink) |
| References | <c094098c0ea21c2b9618d1b8d7a4b176@ruby-forum.com> <c16430d6dfe5b07ba3cf5e525e65ff30@ruby-forum.com> |
On 15.05.2011 11:16, Martin Hansen wrote: > Sorry guys, I have been busy with a few other things before continuing > on this one (also solving the pesky issue with my RubyInline install). > > @Robert. > > I have been thinking hard about your comments - these contain a lot of > programming insight of the kind you dont get from a "learnings > <programming language> book". However, I struggle with grasping the > wisdom: > > "Frankly, I find your code has a design issue: it seems you mix data > and iteration in a single class. This is visible from how #match > works > > def match(pattern, pos = 0, max_edit_distance = 0) > @pattern = pattern > @pos = pos > @max_edit_distance = max_edit_distance > @vector = vector_init > > .. > > IMHO it would be better to separate representation of the sequence and > the matching process. The matcher then would only carry a reference > to the sequence and all the data it needs to do matching. > " > > What is this "mixing data with iteration in a single class"? To me you > have data and then you iterate - what exactly is the problem? What > should I do for separating these? I am referring to http://pastie.org/1808127 : The issue with the code presented lies in the *state*: an instance of Seq represents a sequence of items (in your case amino acids, I guess). You need state to represent this sequence. This state is stored in instance variables of an instance of class Seq. The first thing that your method #match (see above) does, is to set some other instance variables of Seq. This poses a problem if - the Seq instance is frozen, i.e. immutable, - more than one matching processes are under way concurrently (i.e. #match is invoked from more than one thread at a time). The reason is that you mixed state needed to represent your sequence with state needed to execute the matching process. Apart from the problems listed above this also makes code harder to read and more error prone. For example, you might modify the Seq implementation and accidentally reuse the name of an instance variable which then can make your code break in unpredictable ways. > "Maybe on the interface, but you create side effects on the String (Seq > in your case). This is neither a clean separation (makes classes > bigger and harder to understand) nor is it thread safe (e.g. if you > want to try to concurrently match several patterns against the same > sequence)." > > Again, "separation", but what is the problem with big classes? When is a > class too big? And how to divide your code in the best way? There is no easy answer to that. In this case you violate modularity by lumping too many different functionalities together in a single class. Apart from the advantage to avoid mischief laid out above by decoupling the matching process from the sequence representation you also gain modularity if you let the matching process only rely on the public interface of Seq. That way you can even have different implementations of sequences which can all be scanned by the same representation. I am not saying you should necessarily do this as efficient matching might also need some knowledge of Seq internals, but this is to illustrate what kind of things you should consider when designing classes. > Also, I managed to get down to a single vector, but I think I may have > more .dup's than needed - though removing any causes erroneous output: > > http://pastie.org/1902844 That code still stores matching state in the Seq instance. And you have quite a few #dups which have object creation overhead. I think you should better work with two Array of Score instances and swap them at each sequence position. IMHO you do not even need to reinitialize Score instances because of your dynamic programming approach it is guaranteed that you access instances sequentially from index 0 on and need to access at most last[i - 1], last[i] and current[i - 1]. Kind regards robert -- remember.guy do |as, often| as.you_can - without end http://blog.rubybestpractices.com/
Back to comp.lang.ruby | Previous | Next — Previous in thread | Find similar | Unroll thread
Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 10:15 -0500
Re: Need for speed -> a C extension? Chuck Remes <cremes.devlist@mac.com> - 2011-04-18 11:10 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-18 11:10 -0500
Re: Need for speed -> a C extension? "WJ" <w_a_x_man@yahoo.com> - 2011-04-18 17:34 +0000
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 13:30 -0500
Re: Need for speed -> a C extension? Ryan Davis <ryand-ruby@zenspider.com> - 2011-04-18 14:15 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 05:30 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 07:21 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 08:13 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 09:56 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 10:19 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 08:35 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 09:12 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 13:51 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 18:13 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 02:04 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:33 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:40 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 07:55 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 08:42 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 10:18 -0500
Re: Need for speed -> a C extension? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-20 10:24 -0500
Re: Need for speed -> a C extension? Eric Christopherson <echristopherson@gmail.com> - 2011-04-20 17:08 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:34 -0500
Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:39 -0500
Re: Need for speed -> a C extension? Colin Bartlett <colinb2r@googlemail.com> - 2011-04-20 22:39 -0500
Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-05-15 04:16 -0500
Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-05-15 13:46 +0200
csiph-web