


Match a pattern multiple times, returning matches, captures and offset?

From Markus Fischer <markus@fischer.name>
Newsgroups comp.lang.ruby
Subject Match a pattern multiple times, returning matches, captures and offset?
Date 2011-04-05 12:22 -0500
Organization Service de news de lacave.net
Message-ID <4D9B4FBD.9020602@fischer.name> (permalink)



Hi,

I'm used to being able to do the following in PHP. What it basically
does is return all matches, including the captures, ordered by match
set, together with their offsets.

$ php -r 'preg_match_all("/_(\w+)_/", "_foo_ _bar_", $matches,
PREG_SET_ORDER|PREG_OFFSET_CAPTURE); var_dump($matches);'
array(2) {
  [0]=>
  array(2) {
    [0]=>
    array(2) {
      [0]=>
      string(5) "_foo_"
      [1]=>
      int(0)
    }
    [1]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      int(1)
    }
  }
  [1]=>
  array(2) {
    [0]=>
    array(2) {
      [0]=>
      string(5) "_bar_"
      [1]=>
      int(6)
    }
    [1]=>
    array(2) {
      [0]=>
      string(3) "bar"
      [1]=>
      int(7)
    }
  }
}

I've found two ways in Ruby to get in this direction, String#match and
String#scan, but each only gives me partial information. I guess I could
combine what both provide, but before attempting that I wanted to check
that I hadn't overlooked something. Here are my Ruby attempts:

ruby-1.9.2-p180 :001 > m = "_foo_ _bar_".match(/_(\w+)_/)
 => #<MatchData "_foo_" 1:"foo">
ruby-1.9.2-p180 :002 > [ m[0], m[1] ]
 => ["_foo_", "foo"]
ruby-1.9.2-p180 :003 > [ m.begin(0), m.begin(1) ]
 => [0, 1]
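
(Editor's note.) A single MatchData already carries every group and its offset; only the iteration over further matches is missing. A minimal sketch, reusing the string and pattern above, that turns one MatchData into PHP-style [text, offset] pairs. The variable name `pairs` is just for illustration:

```ruby
m = "_foo_ _bar_".match(/_(\w+)_/)

# Group 0 is the whole match; groups 1..n are the captures.
pairs = (0...m.size).map { |i| [m[i], m.begin(i)] }
p pairs        # => [["_foo_", 0], ["foo", 1]]

# MatchData#offset(i) returns [start, end] (end exclusive) for group i.
p m.offset(1)  # => [1, 4]
```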

But here I'm missing the further matches, "_bar_" and "bar". Then there
is the #scan approach:

ruby-1.9.2-p180 :004 > m = "_foo_ _bar_".scan(/_(\w+)_/)
 => [["foo"], ["bar"]]

But in this case I have even less information: the full matches "_foo_"
and "_bar_" are not present, and I can't get the offsets either.
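
(Editor's note.) The missing pieces can be recovered from #scan itself: when #scan is given a block, Regexp.last_match (also available as $~) inside the block is the MatchData of the current match. A sketch that collects one MatchData per match; `all_matches` is a hypothetical variable name:

```ruby
str = "_foo_ _bar_"
all_matches = []

# Inside the block, Regexp.last_match is the MatchData for this match,
# so the full match, the captures and their offsets are all available.
str.scan(/_(\w+)_/) { all_matches << Regexp.last_match }

all_matches.each do |m|
  puts "#{m[0]} at #{m.begin(0)}, capture #{m[1]} at #{m.begin(1)}"
end
# prints:
#   _foo_ at 0, capture foo at 1
#   _bar_ at 6, capture bar at 7
```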

I re-read the documentation for Regexp#match and found out that you can
pass a starting offset into the string as the second parameter, so I
guess I could loop over the string until no further matches are found?
Based on this I came up with:

$ cat test_match_all.rb
require 'pp'

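class String
    # Return every MatchData for pattern, scanning left to right.
    def match_all(pattern)
        matches = []
        offset = 0
        while (m = match(pattern, offset))
            matches << m
            # Resume at the end of this match; step one character past
            # an empty match so the loop cannot stall at the same offset.
            offset = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
        end
        matches
    end
end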

pp "_foo_ _bar_ _baz_".match_all(/_(\w+)_/)


$ ruby test_match_all.rb
[#<MatchData "_foo_" 1:"foo">,
 #<MatchData "_bar_" 1:"bar">,
 #<MatchData "_baz_" 1:"baz">]


I have lots of data to parse, so I can foresee this approach becoming a
bottleneck. Is there a more direct solution?
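
(Editor's note.) One more direct alternative is to wrap #scan in an enumerator and map each iteration to Regexp.last_match, so the regexp engine advances through the string instead of a hand-written offset loop. The sketch also reshapes the result into the set-ordered [text, offset] structure that preg_match_all produces with PREG_SET_ORDER|PREG_OFFSET_CAPTURE. The method name `match_all_with_offsets` is hypothetical, and whether this is measurably faster than the loop depends on the data, so it is worth benchmarking:

```ruby
# Hypothetical helper mirroring PHP's
# preg_match_all(..., PREG_SET_ORDER | PREG_OFFSET_CAPTURE).
def match_all_with_offsets(str, pattern)
  # enum_for(:scan, pattern) yields once per match; Regexp.last_match
  # is the MatchData of the match currently being yielded.
  str.enum_for(:scan, pattern).map do
    m = Regexp.last_match
    # One [text, offset] pair per group; group 0 is the whole match.
    (0...m.size).map { |i| [m[i], m.begin(i)] }
  end
end

p match_all_with_offsets("_foo_ _bar_", /_(\w+)_/)
# => [[["_foo_", 0], ["foo", 1]], [["_bar_", 6], ["bar", 7]]]
```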

thanks,
- Markus



Thread

Match a pattern multiple times, returning matches, captures and offset? Markus Fischer <markus@fischer.name> - 2011-04-05 12:22 -0500
  Re: Match a pattern multiple times, returning matches, captures and offset? Brian Candler <b.candler@pobox.com> - 2011-04-05 13:07 -0500
  Re: Match a pattern multiple times, returning matches, captures and offset? 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-05 20:37 -0500
    Re: Match a pattern multiple times, returning matches, captures and offset? Robert Klemme <shortcutter@googlemail.com> - 2011-04-06 04:42 -0500
  Re: Match a pattern multiple times, returning matches, captures and offset? 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-06 18:58 -0500
    Re: Match a pattern multiple times, returning matches, captures and offset? Robert Klemme <shortcutter@googlemail.com> - 2011-04-07 02:13 -0500
    Re: Match a pattern multiple times, returning matches, captures and offset? Brian Candler <b.candler@pobox.com> - 2011-04-07 03:39 -0500
      Re: Match a pattern multiple times, returning matches, captures and offset? 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-07 14:04 -0500
        Re: Match a pattern multiple times, returning matches, captures and offset? Brian Candler <b.candler@pobox.com> - 2011-04-08 02:19 -0500
          Re: Match a pattern multiple times, returning matches, captures and offset? 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-08 14:53 -0500
