| From | Markus Fischer <markus@fischer.name> |
|---|---|
| Newsgroups | comp.lang.ruby |
| Subject | Match a pattern multiple times, returning matches, captures and offset? |
| Date | 2011-04-05 12:22 -0500 |
| Organization | Service de news de lacave.net |
| Message-ID | <4D9B4FBD.9020602@fischer.name> (permalink) |
Hi,
I'm used to being able to use the following in PHP. What it basically
does is return all matches, including the captures, ordered by match
set, together with their offsets.
$ php -r 'preg_match_all("/_(\w+)_/", "_foo_ _bar_", $matches,
PREG_SET_ORDER|PREG_OFFSET_CAPTURE); var_dump($matches);'
array(2) {
  [0]=>
  array(2) {
    [0]=>
    array(2) {
      [0]=>
      string(5) "_foo_"
      [1]=>
      int(0)
    }
    [1]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      int(1)
    }
  }
  [1]=>
  array(2) {
    [0]=>
    array(2) {
      [0]=>
      string(5) "_bar_"
      [1]=>
      int(6)
    }
    [1]=>
    array(2) {
      [0]=>
      string(3) "bar"
      [1]=>
      int(7)
    }
  }
}
I've found two ways in Ruby that go in this direction, String#match and
String#scan, but each gives me only part of the information. I guess I
could combine the two, but before attempting that I wanted to check
whether I've overlooked something. Here are my Ruby attempts:
ruby-1.9.2-p180 :001 > m = "_foo_ _bar_".match(/_(\w+)_/)
=> #<MatchData "_foo_" 1:"foo">
ruby-1.9.2-p180 :002 > [ m[0], m[1] ]
=> ["_foo_", "foo"]
ruby-1.9.2-p180 :003 > [ m.begin(0), m.begin(1) ]
=> [0, 1]
But here I'm missing the further matches, "_bar_" and "bar". Then there
is the #scan approach:
ruby-1.9.2-p180 :004 > m = "_foo_ _bar_".scan(/_(\w+)_/)
=> [["foo"], ["bar"]]
But in this case I have even less information: the full matches _foo_
and _bar_ are not present, and I can't get the offsets either.
I re-read the documentation for Regexp#match and found that you can
pass an offset into the string as the second parameter, so I guess I can
iterate over the string in a loop until there are no further matches?
With that in mind I came up with:
$ cat test_match_all.rb
require 'pp'
class String
  def match_all(pattern)
    matches = []
    offset = 0
    while m = match(pattern, offset) do
      matches << m
      offset = m.begin(0) + m[0].length
    end
    matches
  end
end
pp "_foo_ _bar_ _baz_".match_all(/_(\w+)_/)
$ ruby test_match_all.rb
[#<MatchData "_foo_" 1:"foo">,
#<MatchData "_bar_" 1:"bar">,
#<MatchData "_baz_" 1:"baz">]
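One caveat with this loop: if the pattern can match the empty string (say /x*/), the offset never advances and the loop never terminates. A minimal sketch of a guarded variant follows; the name match_all_safe and the standalone-method form are only illustrative, not an existing API:

```ruby
# Hypothetical variant of match_all that also terminates on empty matches.
def match_all_safe(string, pattern)
  matches = []
  offset = 0
  while offset <= string.length && (m = string.match(pattern, offset))
    matches << m
    # m.end(0) equals m.begin(0) + m[0].length. After an empty match,
    # step one extra character so the loop always advances.
    offset = m.end(0) == m.begin(0) ? m.end(0) + 1 : m.end(0)
  end
  matches
end

p match_all_safe("_foo_ _bar_", /_(\w+)_/).map { |m| [m[0], m.begin(0)] }
# → [["_foo_", 0], ["_bar_", 6]]
p match_all_safe("ab", /x*/).map { |m| m.begin(0) }
# → [0, 1, 2]
```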
I have lots of data to parse, so I can foresee this approach becoming a
bottleneck. Is there a more direct solution?
thanks,
- Markus
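For reference, one possible way to combine the two approaches: String#scan accepts a block, and inside that block Regexp.last_match returns the MatchData of the current match, so matches, captures and offsets are all available in a single pass. The sketch below assumes a hypothetical helper name, match_sets, and returns [text, offset] pairs grouped per match, similar to PREG_SET_ORDER|PREG_OFFSET_CAPTURE. Note that MatchData#begin gives character offsets, whereas PHP reports byte offsets, so the numbers can differ on multibyte input. Whether this is actually faster than the loop would need to be benchmarked on real data.

```ruby
# Hypothetical helper: one entry per match, each entry holding
# [text, offset] for the whole match (index 0) and every capture group.
def match_sets(string, pattern)
  sets = []
  string.scan(pattern) do
    m = Regexp.last_match # MatchData for the match just found by scan
    sets << (0...m.size).map { |i| [m[i], m.begin(i)] }
  end
  sets
end

p match_sets("_foo_ _bar_", /_(\w+)_/)
# → [[["_foo_", 0], ["foo", 1]], [["_bar_", 6], ["bar", 7]]]
```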