Newsgroup: comp.lang.ruby, thread #2088
| Started by | Ruby Ruby <ruby@ayni.com> |
|---|---|
| First post | 2011-04-01 02:20 -0500 |
| Last post | 2011-04-01 03:23 -0500 |
| Articles | 3 — 3 participants |
Unwanted collector in regular expression Ruby Ruby <ruby@ayni.com> - 2011-04-01 02:20 -0500
Re: Unwanted collector in regular expression Brian Candler <b.candler@pobox.com> - 2011-04-01 03:17 -0500
Re: Unwanted collector in regular expression Robert Klemme <shortcutter@googlemail.com> - 2011-04-01 03:23 -0500
| From | Ruby Ruby <ruby@ayni.com> |
|---|---|
| Date | 2011-04-01 02:20 -0500 |
| Subject | Unwanted collector in regular expression |
| Message-ID | <10d78ef55fdd56f820bd1bef075b3562@ruby-forum.com> |
Hi everyone,

My installation: ruby-1.8.6.420-2.fc13.x86_64

I wanted to highlight some text fields (change their background color) in an XHTML page. To do that, I had to skip all XHTML markup, because it is not advisable to put markup inside markup.

I created the following function:
<snip>
def highlight(what, crix)
  if crix.empty? then return what end
  @tix.puts "WHAT: "+what
  ccir = Regexp.new("(?:\<[^\>]+\>)|(?:("+crix.join("|")+"))", "i")
  return what.gsub(ccir) {|s|
    @tix.puts "COLL: "+s
    if ! s.empty? && ! s.match(/^\</)
      "<span style='background-color:"+CCOL+";'>"+s+"</span>"
    else s
    end
  }
end # highlight
</snip>
The main part of the function is the ccir regular expression. My thinking was that the first group of the regular expression would skip all markup, and the second group would collect the fields listed in the crix array. When the first group matched, I expected an empty string to be passed to the block; otherwise, the element to be highlighted would be passed to the block.

NOPE.

When the first group matched, the test file showed me that the entire markup sequence was passed to the block, even though there is no collector (capturing group) in the first group. When the second group matched, the expected string was passed to the block.
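The behaviour described above is how String#gsub works with a block: the block always receives the entire match (also available as $&), whether or not the pattern contains capturing groups, and each group's text is available separately as $1, $2, and so on (nil for a group that did not take part in the match). A minimal sketch; the sample string and the search term "foo" are made up for illustration:

```ruby
# Pattern shaped like ccir: a non-capturing markup branch, then one capturing group.
pattern = /(?:<[^>]+>)|(foo)/i

yielded = []   # what gsub passes to the block
groups  = []   # what capture group 1 held for each match

"<p class='x'>Foo bar</p>".gsub(pattern) do |s|
  yielded << s
  groups  << $1   # nil when the markup branch matched
  s               # return the match unchanged
end

p yielded  # => ["<p class='x'>", "Foo", "</p>"]
p groups   # => [nil, "Foo", nil]
```

So the markup branch still produces a match, and that whole match is what the block sees; the group only decides what lands in $1.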

This is why I had to avoid markup-in-markup by checking again inside the block whether the string started with "<", i.e. whether it was markup.
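An alternative to re-checking for "<" is to test the capture group itself: when the markup branch matches, $1 is nil, so the block can return the tag untouched. This is a sketch under assumptions, not the original function: it is a standalone method with a hypothetical name (`highlight_terms`) and hypothetical parameters `terms` and `color` standing in for crix and CCOL, the @tix trace output is left out, and each term is passed through Regexp.escape so characters such as '.' or '+' match literally.

```ruby
# Hypothetical standalone variant of highlight: `terms` plays the role of crix,
# `color` the role of CCOL.
def highlight_terms(text, terms, color = "yellow")
  return text if terms.empty?

  # Escape each term so regexp metacharacters in it are matched literally.
  alternatives = terms.map { |t| Regexp.escape(t) }.join("|")
  # Branch 1 consumes a whole tag; branch 2 captures a search term.
  pattern = Regexp.new("<[^>]+>|(#{alternatives})", Regexp::IGNORECASE)

  text.gsub(pattern) do |match|
    if $1
      # The term branch matched: wrap it in a highlighting span.
      "<span style='background-color:#{color};'>#{match}</span>"
    else
      # The markup branch matched ($1 is nil): leave the tag untouched.
      match
    end
  end
end

puts highlight_terms("<td class='foo'>Foo and bar</td>", ["foo"])
# prints: <td class='foo'><span style='background-color:yellow;'>Foo</span> and bar</td>
```

Note that the "foo" inside the class attribute is not highlighted, because the markup branch consumes the whole tag first.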

So be warned: if you use groups in regular expressions, they may not return what you expect.
suomi
--
Posted via http://www.ruby-forum.com/.
| From | Brian Candler <b.candler@pobox.com> |
|---|---|
| Date | 2011-04-01 03:17 -0500 |
| Message-ID | <ef9f018241a2b9774b77b7c1743deed9@ruby-forum.com> |
| In reply to | #2088 |
Why don't you make a simple, standalone test program which demonstrates the behaviour you're describing? Then we can run it ourselves, and can perhaps explain what's happening.
--
Posted via http://www.ruby-forum.com/.
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2011-04-01 03:23 -0500 |
| Message-ID | <BANLkTi=AjvaaONUWHFZ4yVFJfK_AU=qcqg@mail.gmail.com> |
| In reply to | #2088 |
On Fri, Apr 1, 2011 at 9:20 AM, Ruby Ruby <ruby@ayni.com> wrote:
> [...]
> ccir = Regexp.new("(?:\<[^\>]+\>)|(?:("+crix.join("|")+"))", "i")
You should use Regexp.union(crix) instead, to ensure proper escaping. Note that you can use string interpolation in a regexp literal, e.g.:
irb(main):007:0> s=123
=> 123
irb(main):008:0> /foo#{s}/
=> /foo123/
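To illustrate the escaping point: Regexp.union escapes each string it is given, whereas joining raw strings with "|" (as the original ccir does) lets characters such as '.' and '+' act as regexp metacharacters. One caveat that matters here because ccir is case-insensitive: interpolating a Regexp object into another pattern embeds it in its to_s form, (?-mix:...), which carries its own flags, so an outer /i does not reach inside it; interpolating its #source keeps the escaping and lets the outer flag apply. A sketch with made-up terms:

```ruby
terms = ["a.b", "1+1"]

# Naive join, as in the original ccir: '.' matches any character, '+' is a quantifier.
naive = Regexp.new(terms.join("|"))
# Regexp.union escapes every string it is given.
safe = Regexp.union(*terms)

p(naive =~ "axb")   # => 0   ('.' matched the 'x')
p(safe =~ "axb")    # => nil
p(safe =~ "1+1")    # => 0
p(safe.source)      # => "a\\.b|1\\+1"

# Interpolating the Regexp object embeds its own (case-sensitive) flags:
p(/x#{safe}/i =~ "XA.B")              # => nil
# Interpolating its source (grouped) lets the outer /i apply:
p(/x(?:#{safe.source})/i =~ "XA.B")   # => 0
```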
Also, I don't see why you use \< because in a double-quoted string the backslash just disappears:
irb(main):022:0> puts "\<"
<
=> nil
irb(main):023:0> puts "<"
<
=> nil
> [...]
> so be warned, if you use groups in regular expressions, as they may not
> return what you expected.
Yes, groups can be tricky, but I believe yours is rather a case of a malformed regexp. I second what Brian said: write a small program that demonstrates the effect, and describe the desired output.
Apart from that, I recommend "Mastering Regular Expressions" (O'Reilly).
Cheers
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/