Groups > comp.lang.ruby > #3105 > unrolled thread

Need for speed -> a C extension?

Started by	Martin Hansen <mail@maasha.dk>
First post	2011-04-18 10:15 -0500
Last post	2011-05-15 13:46 +0200
Articles	20 on this page of 28 — 9 participants

Back to article view | Back to comp.lang.ruby

  Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 10:15 -0500
    Re: Need for speed -> a C extension? Chuck Remes <cremes.devlist@mac.com> - 2011-04-18 11:10 -0500
    Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-18 11:10 -0500
    Re: Need for speed -> a C extension? "WJ" <w_a_x_man@yahoo.com> - 2011-04-18 17:34 +0000
      Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-18 13:30 -0500
        Re: Need for speed -> a C extension? Ryan Davis <ryand-ruby@zenspider.com> - 2011-04-18 14:15 -0500
          Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 05:30 -0500
            Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 07:21 -0500
              Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 08:13 -0500
                Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 09:56 -0500
                Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-04-19 10:19 -0500
            Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 08:35 -0500
              Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-19 09:12 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 13:51 -0500
                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-19 18:13 -0500
                  Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 02:04 -0500
                    Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:33 -0500
                      Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 07:40 -0500
                        Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 07:55 -0500
                          Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 08:42 -0500
                            Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-04-20 10:18 -0500
                              Re: Need for speed -> a C extension? Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-20 10:24 -0500
                                Re: Need for speed -> a C extension? Eric Christopherson <echristopherson@gmail.com> - 2011-04-20 17:08 -0500
                              Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:34 -0500
                                Re: Need for speed -> a C extension? brabuhr@gmail.com - 2011-04-20 10:39 -0500
                Re: Need for speed -> a C extension? Colin Bartlett <colinb2r@googlemail.com> - 2011-04-20 22:39 -0500
    Re: Need for speed -> a C extension? Martin Hansen <mail@maasha.dk> - 2011-05-15 04:16 -0500
      Re: Need for speed -> a C extension? Robert Klemme <shortcutter@googlemail.com> - 2011-05-15 13:46 +0200

Page 1 of 2 [1] 2 Next page →

#3105 — Need for speed -> a C extension?

From	Martin Hansen <mail@maasha.dk>
Date	2011-04-18 10:15 -0500
Subject	Need for speed -> a C extension?
Message-ID	<c094098c0ea21c2b9618d1b8d7a4b176@ruby-forum.com>

Hello all,


The below code is too slow for practical use. I need it to run at least
20 times faster. Perhaps that is possible with some C code? I have no
experience with writing Ruby extensions. What are the pitfalls? Which
part of the code should be ported? Any pointers to get me started?

Cheers,


Martin


#!/usr/bin/env ruby

# IUPAC nucleotide pair ambiguity equivalents are saved in an
# array of bit fields.

BIT_A = 1 << 0
BIT_T = 1 << 1
BIT_C = 1 << 2
BIT_G = 1 << 3

EQUAL          = Array.new(256, 0)
EQUAL['A'.ord] = BIT_A
EQUAL['T'.ord] = BIT_T
EQUAL['U'.ord] = BIT_T
EQUAL['C'.ord] = BIT_C
EQUAL['G'.ord] = BIT_G
EQUAL['M'.ord] = (BIT_A|BIT_C)
EQUAL['R'.ord] = (BIT_A|BIT_G)
EQUAL['W'.ord] = (BIT_A|BIT_T)
EQUAL['S'.ord] = (BIT_C|BIT_G)
EQUAL['Y'.ord] = (BIT_C|BIT_T)
EQUAL['K'.ord] = (BIT_G|BIT_T)
EQUAL['B'.ord] = (BIT_C|BIT_G|BIT_T)
EQUAL['D'.ord] = (BIT_A|BIT_G|BIT_T)
EQUAL['H'.ord] = (BIT_A|BIT_C|BIT_T)
EQUAL['V'.ord] = (BIT_A|BIT_C|BIT_G)
EQUAL['N'.ord] = (BIT_A|BIT_C|BIT_G|BIT_T)

# Module containing code to locate nucleotide patterns in sequences
allowing for
# ambiguity codes and a given maximum edit distance.
# Insertions are nucleotides found in the pattern but not in the
sequence.
# Deletions are nucleotides found in the sequence but not in the
pattern.
#
# Inspired by the paper by Bruno Woltzenlogel Paleo (page 197):
# http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf
module PatternMatcher
  #
------------------------------------------------------------------------------
  #   str.match(pattern[, pos[, max_edit_distance]])
  #   -> Match or nil
  #
  #
------------------------------------------------------------------------------
  # Method to locate the next pattern match starting from a given
position. A match
  # is allowed to contain a given maximum edit distance. If a match is
located a
  # Match object will be returned otherwise nil.
  def match(pattern, pos = 0, max_edit_distance = 0)
    @pattern           = pattern
    @pos               = pos
    @max_edit_distance = max_edit_distance
    @vector            = vector_init

    while @pos < @seq.length
      vector_update

      return match_new if match_found?

      @pos += 1
    end
  end

  #
------------------------------------------------------------------------------
  #   str.scan(pattern[, pos[, max_edit_distance]])
  #   -> Array
  #   str.scan(pattern[, pos[, max_edit_distance]]) { |match|
  #     block
  #   }
  #   -> Match
  #
  #
------------------------------------------------------------------------------
  # Method to iterate through a sequence to locate pattern matches
starting
  # from a given position and allowing for a maximum edit distance.
  # Matches found in block context return the Match object. Otherwise
matches are
  # returned in an Array.
  def scan(pattern, pos = 0, max_edit_distance = 0)
    matches = []

    while match = match(pattern, pos, max_edit_distance)
      if block_given?
        yield match
      else
        matches << match
      end

      pos = match.pos + 1
    end

    return matches unless block_given?
  end

  private

  # Method to initailize the score vector and return this.
  def vector_init
    vector = []

    (0 ... @pattern.length + 1).each do |i|
      vector[i] = Score.new(matches = 0, mismatches = 0, insertions = i)
    end

    vector
  end

  # Method to update the score vector.
  def vector_update
    new_vector = @vector.dup

    (0 ... @pattern.length).each do |i|
      if match?(@seq[@pos], @pattern[i])
        new_vector[i + 1] = @vector[i].dup
        new_vector[i + 1].matches += 1
      else
        mismatch  = @vector[i].dup
        insertion = new_vector[i].dup
        deletion  = @vector[i + 1].dup

        if deletion?(mismatch, insertion, deletion)
          deletion.deletions += 1
          new_vector[i + 1] = deletion
        elsif mismatch?(mismatch, insertion, deletion)
          mismatch.mismatches += 1
          new_vector[i + 1] = mismatch
        elsif insertion?(mismatch, insertion, deletion)
          insertion.insertions += 1
          new_vector[i + 1] = insertion
        end
      end
    end

    @vector = new_vector
  end

  # Method to determine if a match occurred.
  def match?(char1, char2)
    (EQUAL[char1.upcase.ord] & EQUAL[char2.upcase.ord]) != 0
  end

  # Method to determine if a mismatch occured.
  def mismatch?(mismatch, insertion, deletion)
    if mismatch.edit_distance <= insertion.edit_distance and
       mismatch.edit_distance <= deletion.edit_distance
      true
    end
  end

  # Method to determine if an insertion occured.
  def insertion?(mismatch, insertion, deletion)
    if insertion.edit_distance <= mismatch.edit_distance and
       insertion.edit_distance <= deletion.edit_distance
      true
    end
  end

  # Method to determine if a deletion occured.
  def deletion?(mismatch, insertion, deletion)
    if deletion.edit_distance <= mismatch.edit_distance and
       deletion.edit_distance <= insertion.edit_distance
      true
    end
  end

  # Method to print the score vector.
  def vector_print
    @vector.each do |s|
      puts s
    end

    puts
  end

  # Method that returns a Match object initialized with
  # information from the score vector.
  def match_new
    matches    = @vector.last.matches
    mismatches = @vector.last.mismatches
    insertions = @vector.last.insertions
    deletions  = @vector.last.deletions
    length     = @pattern.length - insertions + deletions
    pos        = @pos - length + 1
    match      = @seq[pos ... pos + length]

    Match.new(pos, match, matches, mismatches, insertions, deletions,
length)
  end

  # Method that determines if a match was found by analyzing the score
vector.
  def match_found?
    if @vector.last.edit_distance <= @max_edit_distance
      true
    end
  end

  # Class to instantiate Score objects that holds score information.
  class Score
    attr_accessor :matches, :mismatches, :insertions, :deletions

    def initialize(matches = 0, mismatches = 0, insertions = 0,
deletions = 0)
      @matches    = matches
      @mismatches = mismatches
      @insertions = insertions
      @deletions  = deletions
    end

    # Method to calculate and return the edit distance.
    def edit_distance
      self.mismatches + self.insertions + self.deletions
    end

    private

    def to_s
      "(#{[self.matches, self.mismatches, self.insertions,
self.deletions].join(',')})"
    end
  end

  # Class for creating Match objects which contain the description of a
  # match between a nucleotide sequence and a pattern.
  class Match
    attr_reader :pos, :match, :matches, :mismatches, :insertions,
:deletions, :edit_distance, :length

    def initialize(pos, match, matches, mismatches, insertions,
deletions, length)
      @pos           = pos
      @match         = match
      @matches       = matches
      @mismatches    = mismatches
      @insertions    = insertions
      @deletions     = deletions
      @edit_distance = mismatches + insertions + deletions
      @length        = length
    end
  end
end

-- 
Posted via http://www.ruby-forum.com/.

[toc] | [next] | [standalone]

#3106

From	Chuck Remes <cremes.devlist@mac.com>
Date	2011-04-18 11:10 -0500
Message-ID	<C651B062-8EDC-49A6-BFFC-5C2B6604C9F1@mac.com>
In reply to	#3105

On Apr 18, 2011, at 10:15 AM, Martin Hansen wrote:

> Hello all,
> 
> 
> The below code is too slow for practical use. I need it to run at least
> 20 times faster. Perhaps that is possible with some C code? I have no
> experience with writing Ruby extensions. What are the pitfalls? Which
> part of the code should be ported? Any pointers to get me started?

I pasted the code into a local file and ran it. It completed instantly.

It would be super helpful if you provided *all* of the code to make this run and do some useful work. By doing so, people who want to help you can run it through a profiler and identify hot spots.

cr

[toc] | [prev] | [next] | [standalone]

#3107

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-04-18 11:10 -0500
Message-ID	<BANLkTikfTqajopuNPVmhhvbMrXnm+O9fow@mail.gmail.com>
In reply to	#3105

On Mon, Apr 18, 2011 at 5:15 PM, Martin Hansen <mail@maasha.dk> wrote:

> The below code is too slow for practical use. I need it to run at least
> 20 times faster. Perhaps that is possible with some C code? I have no
> experience with writing Ruby extensions. What are the pitfalls? Which
> part of the code should be ported? Any pointers to get me started?

Can you be a bit more specific about the slow part or even extract a
simple use case which demonstrates the issue you are having?  Also,
copying it from your email and then fixing all the line breaks is
tedious.  Please use a more appropriate means to distribute the code
in such cases (e.g. pastie, gist).

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#3110

From	"WJ" <w_a_x_man@yahoo.com>
Date	2011-04-18 17:34 +0000
Message-ID	<iohsms02e2q@enews2.newsguy.com>
In reply to	#3105

Martin Hansen wrote:

> The below code is too slow for practical use. I need it to run at least
> 20 times faster. Perhaps that is possible with some C code? I have no
> experience with writing Ruby extensions. What are the pitfalls? Which
> part of the code should be ported? Any pointers to get me started?

Please give a clear description of the algorithm, and then
give some sample input and output.

[toc] | [prev] | [next] | [standalone]

#3114

From	Martin Hansen <mail@maasha.dk>
Date	2011-04-18 13:30 -0500
Message-ID	<4638db940d54573d9643ab0a369c8c7e@ruby-forum.com>
In reply to	#3110

WJ wrote in post #993576:
> Martin Hansen wrote:
>
>> The below code is too slow for practical use. I need it to run at least
>> 20 times faster. Perhaps that is possible with some C code? I have no
>> experience with writing Ruby extensions. What are the pitfalls? Which
>> part of the code should be ported? Any pointers to get me started?
>
> Please give a clear description of the algorithm, and then
> give some sample input and output.

Here is a working version of the code that can be profiled (though it 
will take forever with 20M iterations):

http://pastie.org/1808127

The slow part according to profiler is:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 29.39     2.66      2.66     1521     1.75    11.78  Range#each
 15.80     4.09      1.43    33000     0.04     0.06  Seq#match?
 10.72     5.06      0.97    78940     0.01     0.03  Kernel.dup
  9.28     5.90      0.84    78940     0.01     0.01 
Kernel.initialize_dup
  6.63     6.50      0.60   142380     0.00     0.00 
Seq::Score#edit_distance
  5.30     6.98      0.48    22220     0.02     0.03  Seq#deletion?
  3.54     7.30      0.32    66016     0.00     0.00  String#ord
  3.43     7.61      0.31    14680     0.02     0.04  Seq#mismatch?
  3.31     7.91      0.30     8300     0.04     0.05  Seq#insertion?

The input is DNA sequences. Basically strings of ATCG and Ns of length 
50-100. These comes in files with 20M-30M sequences per file. I've got 
~50 of these files and more incoming. The output will be truncated 
sequences based on the match position located with this bit of code.

The algorithm is of the dynamic programming flavor and was inspired by 
the paper by Bruno Woltzenlogel Paleo (page 197):

http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf

Locating variable length matches is tricky!

Cheers,

Martin

-- 
Posted via http://www.ruby-forum.com/.

[toc] | [prev] | [next] | [standalone]

#3116

From	Ryan Davis <ryand-ruby@zenspider.com>
Date	2011-04-18 14:15 -0500
Message-ID	<2CC4ECA8-9286-4431-AB2D-E7F0A67698FC@zenspider.com>
In reply to	#3114

On Apr 18, 2011, at 11:30 , Martin Hansen wrote:

> WJ wrote in post #993576:
>> Martin Hansen wrote:
>> 
>>> The below code is too slow for practical use. I need it to run at least
>>> 20 times faster. Perhaps that is possible with some C code? I have no
>>> experience with writing Ruby extensions. What are the pitfalls? Which
>>> part of the code should be ported? Any pointers to get me started?
>> 
>> Please give a clear description of the algorithm, and then
>> give some sample input and output.
> 
> 
> Here is a working version of the code that can be profiled (though it 
> will take forever with 20M iterations):
> 
> http://pastie.org/1808127
> 
> The slow part according to profiler is:

Profiling is great, but you also need to write a benchmark in order to objectively measure the work you're doing. You might also want to write microbenchmarks against the bottlenecks you identify in your profiling work. Eg:

>  def match?(char1, char2)
>    (EQUAL[char1.upcase.ord] & EQUAL[char2.upcase.ord]) != 0
>  end

That would be really easy to write tests and a benchmark against and then rewrite in C using RubyInline... without tests and a benchmark tho, you won't know that you've done it correctly and provides a measurable benefit.

[toc] | [prev] | [next] | [standalone]

#3155

From	Martin Hansen <mail@maasha.dk>
Date	2011-04-19 05:30 -0500
Message-ID	<620c883309d2336f7741bf716b688aaf@ruby-forum.com>
In reply to	#3116

>>  def match?(char1, char2)
>>    (EQUAL[char1.upcase.ord] & EQUAL[char2.upcase.ord]) != 0
>>  end
>
> That would be really easy to write tests and a benchmark against and
> then rewrite in C using RubyInline... without tests and a benchmark tho,
> you won't know that you've done it correctly and provides a measurable
> benefit.

They bottlenecks are the iteration over the sequence (while loop at line
68) and the vector_update (line 120). I am a bit surprised that match?
is that slow - I expect it to be almost instantaneous in C. I would like
to test that with a benchmark and Inline C.

And I just discovered RubyInline! Inline code is so much easier to deal
with that an external library IMHO. I have been messing a bit around
with "inline", but that turns out to be some old crappy gem!!! Its not
the first time I have missed the right gem :o( ...

However, I get a peculiar error when testing RubyInline with the first
of the examples:

http://pastie.org/1811057

I am on Ruby 1.9.2.


Cheers,


Martin

-- 
Posted via http://www.ruby-forum.com/.

[toc] | [prev] | [next] | [standalone]

#3159

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-04-19 07:21 -0500
Message-ID	<BANLkTimhvc-DHOiJdxEhJN4Y9gwr_rownQ@mail.gmail.com>
In reply to	#3155

On Tue, Apr 19, 2011 at 12:30 PM, Martin Hansen <mail@maasha.dk> wrote:
>>>  def match?(char1, char2)
>>>    (EQUAL[char1.upcase.ord] & EQUAL[char2.upcase.ord]) != 0
>>>  end
>>
>> That would be really easy to write tests and a benchmark against and
>> then rewrite in C using RubyInline... without tests and a benchmark tho,
>> you won't know that you've done it correctly and provides a measurable
>> benefit.
>
> They bottlenecks are the iteration over the sequence (while loop at line
> 68) and the vector_update (line 120). I am a bit surprised that match?
> is that slow - I expect it to be almost instantaneous in C. I would like
> to test that with a benchmark and Inline C.

Frankly, I find your code has a design issue: it seems you mix data
and iteration in a single class.  This is visible from how #match
works

def match(pattern, pos = 0, max_edit_distance = 0)
    @pattern           = pattern
    @pos               = pos
    @max_edit_distance = max_edit_distance
    @vector            = vector_init

..

IMHO it would be better to separate representation of the sequence and
the matching process.  The matcher then would only carry a reference
to the sequence and all the data it needs to do matching.

Also #vector_update creates a lot of objects and does so for each
position in the sequence.  That's likely where you can improve things.

I am not sure what the matching algorithm is exactly.  Can you summarize it?

Ah, and one thing: if you add lowercase entries

EQUAL['a'.ord] = BIT_A
EQUAL['t'.ord] = BIT_T
EQUAL['u'.ord] = BIT_T
EQUAL['c'.ord] = BIT_C
EQUAL['g'.ord] = BIT_G
EQUAL['m'.ord] = (BIT_A|BIT_C)
EQUAL['r'.ord] = (BIT_A|BIT_G)
EQUAL['w'.ord] = (BIT_A|BIT_T)
EQUAL['s'.ord] = (BIT_C|BIT_G)
EQUAL['y'.ord] = (BIT_C|BIT_T)
EQUAL['k'.ord] = (BIT_G|BIT_T)
EQUAL['b'.ord] = (BIT_C|BIT_G|BIT_T)
EQUAL['d'.ord] = (BIT_A|BIT_G|BIT_T)
EQUAL['h'.ord] = (BIT_A|BIT_C|BIT_T)
EQUAL['v'.ord] = (BIT_A|BIT_C|BIT_G)
EQUAL['n'.ord] = (BIT_A|BIT_C|BIT_G|BIT_T)

you can make matching simpler

def match?(char1, char2)
    (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
end

You might as well consider changing the Array into a Hash.  Then you
can even get rid of the #ord call.

A different approach would just use regular expressions, e.g.

MAP = {
  'a' => 1,
  't' => 2,
  'w' => '[12]',
..
}

seq = Array.new(20) { %w{1 2 4 8}.sample }
pat = "acgw"
rx = Regexp.new(pat.chars.map {|c| MAP[c.downcase]}.join)

seq.scan pat do |m|
  p m
end

Cheers

robert


-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#3166

From	Martin Hansen <mail@maasha.dk>
Date	2011-04-19 08:13 -0500
Message-ID	<4a23edd33fda1c9e1bcf42858c32d298@ruby-forum.com>
In reply to	#3159

> IMHO it would be better to separate representation of the sequence and
> the matching process.  The matcher then would only carry a reference
> to the sequence and all the data it needs to do matching.

I am not sure if I understand this. I have tried to copy the behavior of 
String#match.

> Also #vector_update creates a lot of objects and does so for each
> position in the sequence.  That's likely where you can improve things.

Yes, that is quite possible. I might be able to skip .dup on line 128 
and 130. That will require some thinking and testing on my side.

> I am not sure what the matching algorithm is exactly.  Can you summarize
> it?

Well, it is a dynamic programming algorithm to do fuzzy searches of 
patterns in strings - allowing for custom matching rules (A==N, etc) and 
a maximum edit distance. Inspired by the paper by Bruno Woltzenlogel 
Paleo (page 197):

http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf

A short example:

http://pastie.org/1811496


> you can make matching simpler
>
> def match?(char1, char2)
>     (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
> end

Yes, but that should not give any significant speed increase?

> You might as well consider changing the Array into a Hash.  Then you
> can even get rid of the #ord call.

Actually, I started with a hash for this - and it was slightly faster. 
However, I think this bit field is very elegant - and since I was 
preparing for porting to C - I think this is the way to go!



Cheers


Martin

-- 
Posted via http://www.ruby-forum.com/.

[toc] | [prev] | [next] | [standalone]

#3174

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-04-19 09:56 -0500
Message-ID	<BANLkTikwV+OP_eSGh_nWoQD5enjnXkdHqg@mail.gmail.com>
In reply to	#3166

On Tue, Apr 19, 2011 at 3:13 PM, Martin Hansen <mail@maasha.dk> wrote:
>> IMHO it would be better to separate representation of the sequence and
>> the matching process.  The matcher then would only carry a reference
>> to the sequence and all the data it needs to do matching.
>
> I am not sure if I understand this. I have tried to copy the behavior of
> String#match.

Maybe on the interface, but you create side effects on the String (Seq

>> Also #vector_update creates a lot of objects and does so for each
>> position in the sequence.  That's likely where you can improve things.
>
> Yes, that is quite possible. I might be able to skip .dup on line 128
> and 130. That will require some thinking and testing on my side.
>
>> I am not sure what the matching algorithm is exactly.  Can you summarize
>> it?
>
> Well, it is a dynamic programming algorithm to do fuzzy searches of
> patterns in strings - allowing for custom matching rules (A==N, etc) and
> a maximum edit distance. Inspired by the paper by Bruno Woltzenlogel
> Paleo (page 197):
>
> http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf
>
> A short example:
>
> http://pastie.org/1811496
>
>
>> you can make matching simpler
>>
>> def match?(char1, char2)
>>     (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
>> end
>
> Yes, but that should not give any significant speed increase?
>
>> You might as well consider changing the Array into a Hash.  Then you
>> can even get rid of the #ord call.
>
> Actually, I started with a hash for this - and it was slightly faster.
> However, I think this bit field is very elegant - and since I was
> preparing for porting to C - I think this is the way to go!

I did not say you should get rid of the bit field!  Plus, if you are
in need for speed and if it was slow then you should get rid of it
regardless of elegance.

Cheers

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#3176

From	Robert Klemme <shortcutter@googlemail.com>
Date	2011-04-19 10:19 -0500
Message-ID	<BANLkTimf3LPhycM4=_r_TsaG1b+ASQ2Ung@mail.gmail.com>
In reply to	#3166

Sorry for the half message, thick fingers hit the wrong key accidentally.

On Tue, Apr 19, 2011 at 3:13 PM, Martin Hansen <mail@maasha.dk> wrote:
>> IMHO it would be better to separate representation of the sequence and
>> the matching process.  The matcher then would only carry a reference
>> to the sequence and all the data it needs to do matching.
>
> I am not sure if I understand this. I have tried to copy the behavior of
> String#match.

Maybe on the interface, but you create side effects on the String (Seq
in your case).  This is neither a clean separation (makes classes
bigger and harder to understand) nor is it thread safe (e.g. if you
want to try to concurrently match several patterns against the same
sequence).

Btw, in your case I would probably implement #scan and derive
implement match like

def match(*args)
  scan(*args) do |match_data|
    return match_data
  end

  nil # no match
end

That's pretty easy and you can still replace this implementation with
something else if it should be necessary.

>> Also #vector_update creates a lot of objects and does so for each
>> position in the sequence.  That's likely where you can improve things.
>
> Yes, that is quite possible. I might be able to skip .dup on line 128
> and 130. That will require some thinking and testing on my side.

You could at least switch to only two arrays which you use alternating
with relatively moderate effort.  You might even get away with a
single array.

>> I am not sure what the matching algorithm is exactly.  Can you summarize
>> it?
>
> Well, it is a dynamic programming algorithm to do fuzzy searches of
> patterns in strings - allowing for custom matching rules (A==N, etc) and
> a maximum edit distance. Inspired by the paper by Bruno Woltzenlogel
> Paleo (page 197):
>
> http://www.logic.at/people/bruno/Papers/2007-GATE-ESSLLI.pdf
>
> A short example:
>
> http://pastie.org/1811496

Ah, interesting.  Thanks!

>> you can make matching simpler
>>
>> def match?(char1, char2)
>>     (EQUAL[char1.ord] & EQUAL[char2.ord]) != 0
>> end
>
> Yes, but that should not give any significant speed increase?

Why not?  You get at least rid of a Ruby method call.  Did you measure
differences?

>> You might as well consider changing the Array into a Hash.  Then you
>> can even get rid of the #ord call.
>
> Actually, I started with a hash for this - and it was slightly faster.
> However, I think this bit field is very elegant - and since I was
> preparing for porting to C - I think this is the way to go!

I did not say you should get rid of the bit field!  Plus, if you are
in need for speed and if it was slow then you should get rid of it
regardless of elegance.

Kind regards

robert

-- 
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/

[toc] | [prev] | [next] | [standalone]

#3169

From	brabuhr@gmail.com
Date	2011-04-19 08:35 -0500
Message-ID	<BANLkTinzsz-ohWmE=SNMgUK8zzjN98sJLA@mail.gmail.com>
In reply to	#3155

On Tue, Apr 19, 2011 at 6:30 AM, Martin Hansen <mail@maasha.dk> wrote:
> And I just discovered RubyInline! Inline code is so much easier to deal
> with that an external library IMHO. I have been messing a bit around
> with "inline", but that turns out to be some old crappy gem!!! Its not
> the first time I have missed the right gem :o( ...
>
> However, I get a peculiar error when testing RubyInline with the first
> of the examples:
>
> http://pastie.org/1811057
>
> I am on Ruby 1.9.2.

What exact version? what platform? how was it installed?

$ uname -a
Linux linux116.ctc.com 2.6.32-71.24.1.el6.x86_64 #1 SMP Fri Apr 8
01:07:23 CDT 2011 x86_64 x86_64 x86_64 GNU/Linux

$ ruby -v j.rb
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-linux]
120

$ cat j.rb
require "inline"

class MyTest
  inline do |builder|
    builder.c "
      long factorial(int max) {
        int i=max, result=1;
        while (i >= 2) { result *= i--; }
        return result;
      }"
  end
end

t = MyTest.new()

puts factorial_5 = t.factorial(5)

[toc] | [prev] | [next] | [standalone]

#3173

From	Martin Hansen <mail@maasha.dk>
Date	2011-04-19 09:12 -0500
Message-ID	<553b3269b5914bead36e20f5fbe421da@ruby-forum.com>
In reply to	#3169

unknown wrote in post #993757:
> On Tue, Apr 19, 2011 at 6:30 AM, Martin Hansen <mail@maasha.dk> wrote:
>> I am on Ruby 1.9.2.
> What exact version? what platform? how was it installed?

$ uname -a
Darwin mel.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29 
15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386 i386

$ ruby -v j.rb
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.6.0]
<internal:lib/rubygems/custom_require>:29:in `require': undefined 
class/module Digest::Base (ArgumentError)
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from 
/Users/maasha/maasha_install/lib/ruby/gems/1.9.1/gems/RubyInline-3.9.0/lib/inline.rb:51:in 
`<top (required)>'
        from <internal:lib/rubygems/custom_require>:33:in `require'
        from <internal:lib/rubygems/custom_require>:33:in `rescue in 
require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from j.rb:1:in `<main>'


Ruby I compiled and installed myself. Inline was installed: gem install 
RubyInline

:o(


Martin

-- 
Posted via http://www.ruby-forum.com/.

[toc] | [prev] | [next] | [standalone]

#3186

From	brabuhr@gmail.com
Date	2011-04-19 13:51 -0500
Message-ID	<BANLkTimdXngw+GBuW1CQD-uu9dBYizF-wg@mail.gmail.com>
In reply to	#3173

On Tue, Apr 19, 2011 at 10:12 AM, Martin Hansen <mail@maasha.dk> wrote:
> unknown wrote in post #993757:
>> On Tue, Apr 19, 2011 at 6:30 AM, Martin Hansen <mail@maasha.dk> wrote:
>>> I am on Ruby 1.9.2.
>> What exact version? what platform? how was it installed?
>
> $ uname -a
> Darwin mel.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29
> 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386 i386
>
> $ ruby -v j.rb
> ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.6.0]
> <internal:lib/rubygems/custom_require>:29:in `require': undefined
> class/module Digest::Base (ArgumentError)
>        from <internal:lib/rubygems/custom_require>:29:in `require'
>        from
> /Users/maasha/maasha_install/lib/ruby/gems/1.9.1/gems/RubyInline-3.9.0/lib/inline.rb:51:in
> `<top (required)>'
>        from <internal:lib/rubygems/custom_require>:33:in `require'
>        from <internal:lib/rubygems/custom_require>:33:in `rescue in
> require'
>        from <internal:lib/rubygems/custom_require>:29:in `require'
>        from j.rb:1:in `<main>'
>
>
> Ruby I compiled and installed myself. Inline was installed: gem install
> RubyInline

Thanks; I'll try to give it a try to replicate this from home later.

[toc] | [prev] | [next] | [standalone]

#3195

From	brabuhr@gmail.com
Date	2011-04-19 18:13 -0500
Message-ID	<BANLkTinXCsv7KiTcJ5fmYaLcx9LVZZcVZg@mail.gmail.com>
In reply to	#3173

On Tue, Apr 19, 2011 at 10:12 AM, Martin Hansen <mail@maasha.dk> wrote:
> unknown wrote in post #993757:
>> On Tue, Apr 19, 2011 at 6:30 AM, Martin Hansen <mail@maasha.dk> wrote:
>>> I am on Ruby 1.9.2.
>> What exact version? what platform? how was it installed?
>
> $ uname -a
> Darwin mel.local 10.7.0 Darwin Kernel Version 10.7.0: Sat Jan 29
> 15:17:16 PST 2011; root:xnu-1504.9.37~1/RELEASE_I386 i386 i386
>
> $ ruby -v j.rb
> ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.6.0]
> <internal:lib/rubygems/custom_require>:29:in `require': undefined
> class/module Digest::Base (ArgumentError)

$ uname -a
Darwin Mac-mini.local 10.6.0 Darwin Kernel Version 10.6.0: Wed Nov 10
18:13:17 PST 2010; root:xnu-1504.9.26~3/RELEASE_I386 i386

$ ruby -v j.rb
ruby 1.9.2p180 (2011-02-18 revision 30909) [x86_64-darwin10.6.0]
120

ruby installed via homebrew

> Ruby I compiled and installed myself.

Your Ruby was compiled under 10.7 or 10.6?

[toc] | [prev] | [next] | [standalone]

#3213

From	Martin Hansen <mail@maasha.dk>
Date	2011-04-20 02:04 -0500
Message-ID	<994a6271738aaece938a5d75f19de823@ruby-forum.com>
In reply to	#3195

> Your Ruby was compiled under 10.7 or 10.6?

I keep a log of the ruby compile. It looks like it was build under 
10.6.0. The Mac OS here is 10.6.7 (there is no 10.7 out yet).

Is there a better place for this "bug report" than this thread?

Cheers


Martin

-- 
Posted via http://www.ruby-forum.com/.

[toc] | [prev] | [next] | [standalone]

#3239

From	brabuhr@gmail.com
Date	2011-04-20 07:33 -0500
Message-ID	<BANLkTi=NY_kXSnuwhGacv+kxfRbYyK7W+A@mail.gmail.com>
In reply to	#3213

Martin Hansen <mail@maasha.dk> wrote:
> > Your Ruby was compiled under 10.7 or 10.6?
>
> I keep a log of the ruby compile. It looks like it was build under
> 10.6.0. The Mac OS here is 10.6.7 (there is no 10.7 out yet).

Oh, okay.  I wanted to check since your uname above said "10.7.0".

> Is there a better place for this "bug report" than this thread?

Ruby Issue Tracking System: http://redmine.ruby-lang.org/

Can you run your test script with "ruby -d"?

[toc] | [prev] | [next] | [standalone]

#3240

From	brabuhr@gmail.com
Date	2011-04-20 07:40 -0500
Message-ID	<BANLkTimCc+zggMx49c02XeBKkRPs_vaJkQ@mail.gmail.com>
In reply to	#3239

> Martin Hansen <mail@maasha.dk> wrote:
>> > Your Ruby was compiled under 10.7 or 10.6?
>>
>> I keep a log of the ruby compile. It looks like it was build under
>> 10.6.0. The Mac OS here is 10.6.7 (there is no 10.7 out yet).
>
> Oh, okay.  I wanted to check since your uname above said "10.7.0".
>
>> Is there a better place for this "bug report" than this thread?
>
> Ruby Issue Tracking System: http://redmine.ruby-lang.org/

I'm not sure about RubyInline

> Can you run your test script with "ruby -d"?

Also, can you just try "require 'digest/md5'" in irb?

[toc] | [prev] | [next] | [standalone]

#3241

From	Martin Hansen <mail@maasha.dk>
Date	2011-04-20 07:55 -0500
Message-ID	<c59649941b47df0040952e438ffd7f4c@ruby-forum.com>
In reply to	#3240

unknown wrote in post #994006:
>> Ruby Issue Tracking System: http://redmine.ruby-lang.org/
> I'm not sure about RubyInline
>
>> Can you run your test script with "ruby -d"?

$ ruby -d j.rb ->

http://pastie.org/1815275


> Also, can you just try "require 'digest/md5'" in irb?

irb(main):001:0> require 'digest/md5'
ArgumentError: undefined class/module Digest::Base
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from <internal:lib/rubygems/custom_require>:29:in `require'
        from (irb):1
        from /Users/maasha/maasha_install/bin/irb:12:in `<main>'


M

-- 
Posted via http://www.ruby-forum.com/.

[toc] | [prev] | [next] | [standalone]

#3242

From	brabuhr@gmail.com
Date	2011-04-20 08:42 -0500
Message-ID	<BANLkTin+XfUhR-LfXW4UkfbAh=wQviEx8Q@mail.gmail.com>
In reply to	#3241

On Wed, Apr 20, 2011 at 8:55 AM, Martin Hansen <mail@maasha.dk> wrote:
> $ ruby -d j.rb ->
>
> http://pastie.org/1815275

"Exception `ArgumentError' at
/Users/maasha/maasha_install/lib/ruby/1.9.1/x86_64-darwin10.6.0/digest/md5.bundle
- undefined class/module Digest::Base"

>> Also, can you just try "require 'digest/md5'" in irb?
>
> irb(main):001:0> require 'digest/md5'
> ArgumentError: undefined class/module Digest::Base
>        from <internal:lib/rubygems/custom_require>:29:in `require'
>        from <internal:lib/rubygems/custom_require>:29:in `require'
>        from (irb):1
>        from /Users/maasha/maasha_install/bin/irb:12:in `<main>'

Okay, so the issue is in your ruby not RubyInline.

I updated my machine to 10.6.7 (uname: 10.7.0 :-) and my previously
compiled 1.9.2 still works correctly.  Can you try to install a fresh
build in another prefix (by hand, RVM, or homebrew) and see if the
same behavior is still present?

Thanks.

[toc] | [prev] | [next] | [standalone]

Page 1 of 2 [1] 2 Next page →

csiph-web

Need for speed -> a C extension?

Contents

#3105 — Need for speed -> a C extension?

#3106

#3107

#3110

#3114

#3116

#3155

#3159

#3166

#3174

#3176

#3169

#3173

#3186

#3195

#3213

#3239

#3240

#3241

#3242