| From | Markus Fischer <markus@fischer.name> |
|---|---|
| Newsgroups | comp.lang.ruby |
| Subject | Match a pattern multiple times, returning matches, captures and offset? |
| Date | Tue, 5 Apr 2011 12:22:20 -0500 |
| Message-ID | <4D9B4FBD.9020602@fischer.name> |
| X-Ml-Name | ruby-talk |
Hi,
I'm used to being able to use the following in PHP. What it basically does
is return all matches, including the captures, ordered by match set,
and provide the offsets.
$ php -r 'preg_match_all("/_(\w+)_/", "_foo_ _bar_", $matches,
PREG_SET_ORDER|PREG_OFFSET_CAPTURE); var_dump($matches);'
array(2) {
  [0]=>
  array(2) {
    [0]=>
    array(2) {
      [0]=>
      string(5) "_foo_"
      [1]=>
      int(0)
    }
    [1]=>
    array(2) {
      [0]=>
      string(3) "foo"
      [1]=>
      int(1)
    }
  }
  [1]=>
  array(2) {
    [0]=>
    array(2) {
      [0]=>
      string(5) "_bar_"
      [1]=>
      int(6)
    }
    [1]=>
    array(2) {
      [0]=>
      string(3) "bar"
      [1]=>
      int(7)
    }
  }
}
I've found two ways in Ruby that go in this direction, String#match and
String#scan, but each gives me only part of the information. I guess I
could combine the two, but before attempting this I wanted to check
that I haven't overlooked something. Here are my Ruby attempts:
ruby-1.9.2-p180 :001 > m = "_foo_ _bar_".match(/_(\w+)_/)
=> #<MatchData "_foo_" 1:"foo">
ruby-1.9.2-p180 :002 > [ m[0], m[1] ]
=> ["_foo_", "foo"]
ruby-1.9.2-p180 :003 > [ m.begin(0), m.begin(1) ]
=> [0, 1]
But here I'm missing the further possible matches, "_bar_" and "bar". Or
the #scan approach:
ruby-1.9.2-p180 :004 > m = "_foo_ _bar_".scan(/_(\w+)_/)
=> [["foo"], ["bar"]]
But in this case I have even less information: the full matches _foo_
and _bar_ are not present, and I can't get the offsets either.
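One idiom that may fill this gap: when String#scan is given a block, the MatchData of the current match is available inside the block through Regexp.last_match (or $~), which exposes the full match, the captures and their offsets. A minimal sketch; the variable name `matches` is only illustrative:

```ruby
# Collect one MatchData object per match by reading Regexp.last_match
# inside the block passed to String#scan.
matches = []
"_foo_ _bar_".scan(/_(\w+)_/) { matches << Regexp.last_match }

matches.each do |m|
  # Full match, first capture, and the start offset of each.
  p [m[0], m[1], m.begin(0), m.begin(1)]
end
# prints ["_foo_", "foo", 0, 1]
# prints ["_bar_", "bar", 6, 7]
```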
I re-read the documentation for Regexp#match and found out that you can
pass an offset into the string as second parameter, so I guess I can
iterate over the string in a loop until I find no further matches ...?
Considering this I came up with:
$ cat test_match_all.rb
require 'pp'
class String
  def match_all(pattern)
    matches = []
    offset = 0
    while m = match(pattern, offset) do
      matches << m
      offset = m.begin(0) + m[0].length
    end
    matches
  end
end
pp "_foo_ _bar_ _baz_".match_all(/_(\w+)_/)
$ ruby test_match_all.rb
[#<MatchData "_foo_" 1:"foo">,
#<MatchData "_bar_" 1:"bar">,
#<MatchData "_baz_" 1:"baz">]
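To get closer to the nested PHP result, each MatchData can be flattened into [text, offset] pairs, one pair per group. The sketch below is an assumption about how that could look, not an established API: `match_sets` is a hypothetical helper, and it also advances by one character after an empty match, since the loop above would otherwise never terminate for a pattern such as /x*/.

```ruby
# Hypothetical helper: returns one array per match, each holding
# [text, start_offset] pairs for group 0 (the whole match) and every capture.
def match_sets(string, pattern)
  sets = []
  pos = 0
  while pos <= string.length && (m = pattern.match(string, pos))
    sets << (0...m.size).map { |i| [m[i], m.begin(i)] }
    # Step past an empty match so the loop always makes progress.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.end(0) + 1
  end
  sets
end

p match_sets("_foo_ _bar_", /_(\w+)_/)
# prints [[["_foo_", 0], ["foo", 1]], [["_bar_", 6], ["bar", 7]]]
```

Note that MatchData#begin reports character offsets, while PHP's PREG_OFFSET_CAPTURE reports byte offsets, so the numbers can differ for multibyte input.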
I have lots of data to parse, so I can foresee this approach becoming a
bottleneck. Is there a more direct solution?
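Whether the loop really is a bottleneck is probably easiest to settle by measuring. The following is a rough sketch under stated assumptions: the input is synthetic, the helper names (`loop_matches`, `scan_matches`, `elapsed`) are made up for illustration, and the pattern is assumed never to match the empty string. It compares the offset loop with a block-form String#scan that collects Regexp.last_match; no particular outcome is implied.

```ruby
# Synthetic input: 50,000 "_wordN_" tokens separated by spaces (made-up data).
text = (1..50_000).map { |i| "_word#{i}_" }.join(" ")
pattern = /_(\w+)_/

# Approach 1: repeated match calls with a moving start offset.
# Assumes the pattern never produces an empty match.
def loop_matches(string, pattern)
  found = []
  pos = 0
  while (m = pattern.match(string, pos))
    found << m
    pos = m.end(0)
  end
  found
end

# Approach 2: scan with a block, keeping the MatchData of each match.
def scan_matches(string, pattern)
  found = []
  string.scan(pattern) { found << Regexp.last_match }
  found
end

# Wall-clock time of a block, in seconds, using a monotonic clock.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

a = b = nil
t_loop = elapsed { a = loop_matches(text, pattern) }
t_scan = elapsed { b = scan_matches(text, pattern) }
puts format("loop: %.4fs  scan: %.4fs  matches: %d", t_loop, t_scan, a.size)
```

Timings depend on the Ruby version, the pattern and the data, so a run on the real input is more telling than this synthetic one.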
thanks,
- Markus