String.gsub with regex and block

Started by	Alexey Petrushin <axyd80@gmail.com>
First post	2011-04-07 10:25 -0500
Last post	2011-04-08 07:31 -0500
Articles	11 — 7 participants

  String.gsub with regex and block Alexey Petrushin <axyd80@gmail.com> - 2011-04-07 10:25 -0500
    Re: String.gsub with regex and block Reid Thompson <Reid.Thompson@ateb.com> - 2011-04-07 10:36 -0500
      Re: String.gsub with regex and block Brian Candler <b.candler@pobox.com> - 2011-04-08 02:20 -0500
    Re: String.gsub with regex and block Brian Candler <b.candler@pobox.com> - 2011-04-07 10:50 -0500
      Re: String.gsub with regex and block 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-07 13:22 -0500
    Re: String.gsub with regex and block Brian Candler <b.candler@pobox.com> - 2011-04-07 10:57 -0500
    Re: String.gsub with regex and block 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-07 13:10 -0500
      Re: String.gsub with regex and block Phillip Gawlowski <cmdjackryan@googlemail.com> - 2011-04-07 13:39 -0500
    Re: String.gsub with regex and block jake kaiden <jakekaiden@yahoo.com> - 2011-04-07 15:57 -0500
      Re: String.gsub with regex and block Sergey Avseyev <sergey.avseyev@gmail.com> - 2011-04-07 22:09 -0700
    Re: String.gsub with regex and block Alexey Petrushin <axyd80@gmail.com> - 2011-04-08 07:31 -0500

#2465 — String.gsub with regex and block

From	Alexey Petrushin <axyd80@gmail.com>
Date	2011-04-07 10:25 -0500
Subject	String.gsub with regex and block
Message-ID	<0be3e165974797b12de3053911b6f7e2@ruby-forum.com>

Probably a stupid question, but is there a way to use :gsub replacement
without $0 $1 $2 $3 (and without "\0\1\2\3")?

I would prefer something like:

    "John Smith".gsub /(.+)\s(.+)/ do |name, family|
      p [name, family]

      # instead of this
      p [$1, $2]
    end

-- 
Posted via http://www.ruby-forum.com/.

#2467

From	Reid Thompson <Reid.Thompson@ateb.com>
Date	2011-04-07 10:36 -0500
Message-ID	<1302190574.20448.1.camel@raker.ateb.com>
In reply to	#2465

On Fri, 2011-04-08 at 00:25 +0900, Alexey Petrushin wrote:
> "John Smith".gsub /(.+)\s(.+)/ do |name, family|
>       p [name, family]
> 
>       # instead of this
>       p [$1, $2]
>     end 
is it a requirement that you use gsub?

irb(main):008:0> name, family = "John Smith".split
=> ["John", "Smith"]
irb(main):009:0> p [name, family]
["John", "Smith"]
=> nil

#2515

From	Brian Candler <b.candler@pobox.com>
Date	2011-04-08 02:20 -0500
Message-ID	<2c0ba6e5af22438debe7a3697d89a2c8@ruby-forum.com>
In reply to	#2467

Chad Perrin wrote in post #991517:
> . . . so, is there some way to use more descriptive variable names than
> the default $1, $2, et cetera, for captures from within a regex?  I'm
> not
> aware of any, but I too would find that agreeable.

Yes, ruby 1.9 has named capture groups. I posted an example earlier in 
this thread.

-- 
Posted via http://www.ruby-forum.com/.

#2469

From	Brian Candler <b.candler@pobox.com>
Date	2011-04-07 10:50 -0500
Message-ID	<8af1e658a94490797feb1e89a54b7a53@ruby-forum.com>
In reply to	#2465

Alexey Petrushin wrote in post #991484:
> Probably a stupid question, but is there a way to use :gsub replacement
> without $0 $1 $2 $3 (and without "\0\1\2\3")?

There is also $~ (Regexp.last_match); $1/$2/etc are just a facade.

> I would prefer something like:
>
>     "John Smith".gsub /(.+)\s(.+)/ do |name, family|
>       p [name, family]
>
>       # instead of this
>       p [$1, $2]
>     end

"John Smith".gsub /(.+)\s(.+)/ do
  name, family = $~.captures
  p [name, family]
end

Not pretty, but you can wrap it up in your own method:

class String
  def gsubcap(*arg)
    gsub(*arg) { yield $~.captures }
  end
end

"John Smith".gsubcap /(.+)\s(.+)/ do |name, family|
  p [name, family]
end
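
A minimal sketch of the point above, assuming Ruby 1.9 or later: inside a gsub block, Regexp.last_match (the same object as $~) carries the groups that $1 and $2 expose, and the block's return value becomes the replacement. The variable names `seen` and `result` are only illustrative.

```ruby
# Sketch only: $1/$2 and $~ describe the same match inside a gsub block.
seen = []

result = "John Smith".gsub(/(.+)\s(.+)/) do
  md = Regexp.last_match            # same object as $~
  seen << [md[1] == $1, md[2] == $2, md.captures]
  name, family = md.captures
  "#{family}, #{name}"              # the block's return value is the replacement
end

p seen
p result
```

Using parentheses around the pattern also avoids the "ambiguous first argument" warning that the unparenthesized form can trigger.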

-- 
Posted via http://www.ruby-forum.com/.

#2480

From	7stud -- <bbxx789_05ss@yahoo.com>
Date	2011-04-07 13:22 -0500
Message-ID	<bd4f08a899f7957c6f4de796256a539b@ruby-forum.com>
In reply to	#2469

Brian Candler wrote in post #991490:
> Alexey Petrushin wrote in post #991484:
>> Probably a stupid question, but is there a way to use :gsub replacement
>> without $0 $1 $2 $3 (and without "\0\1\2\3")?
>
> There is also $~ (Regexp.last_match); $1/$2/etc are just a facade.
>
>> I would prefer something like:
>>
>>     "John Smith".gsub /(.+)\s(.+)/ do |name, family|
>>       p [name, family]
>>
>>       # instead of this
>>       p [$1, $2]
>>     end
>
> "John Smith".gsub /(.+)\s(.+)/ do
>   name, family = $~.captures
>   p [name, family]
> end
>

And if you want to avoid writing code in perl:

str = "John Smith"
pattern = /(.+)\s(.+)/

result = str.gsub(pattern) do
  md_obj = Regexp.last_match
  first_name, last_name = md_obj[1], md_obj[2]

  p first_name, last_name
end

Or to avoid any indexing, you could do this:

str = "John Smith"
pattern = /(.+)\s(.+)/

result = str.gsub(pattern) do |match|
  first_name, last_name = match.split
  p first_name, last_name

  "some replacement"
end

puts result

--output:--
"John"
"Smith"
some replacement

-- 
Posted via http://www.ruby-forum.com/.

#2471

From	Brian Candler <b.candler@pobox.com>
Date	2011-04-07 10:57 -0500
Message-ID	<b2307c024c813c369034901374a8eab4@ruby-forum.com>
In reply to	#2465

Or if you are a ruby 1.9 user, you could use named capture groups 
instead. I'm not sure they make the regexp itself any clearer in this 
case though:

"John Smith".gsub /(?<name>.+)\s(?<family>.+)/ do
  p [$~[:name],$~[:family]]
end
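
As a sketch of how named groups can drive an actual replacement (assuming Ruby 1.9 or later; the \w+ pattern and the variable names are illustrative, not from the post): the block form reads the groups by name from Regexp.last_match, and the string form refers to them with \k<...>.

```ruby
# Sketch only: named groups used for a real replacement.
pattern = /(?<name>\w+)\s(?<family>\w+)/

# Block form: look the groups up by name on the MatchData.
by_block = "John Smith".gsub(pattern) do
  md = Regexp.last_match
  "#{md[:family]}, #{md[:name]}"
end

# String form: \k<...> refers to a named group (single quotes keep the backslash).
by_string = "John Smith".gsub(pattern, '\k<family>, \k<name>')

p by_block
p by_string
```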

-- 
Posted via http://www.ruby-forum.com/.

#2479

From	7stud -- <bbxx789_05ss@yahoo.com>
Date	2011-04-07 13:10 -0500
Message-ID	<2aaa7891436b722893ab4c050c9de9d0@ruby-forum.com>
In reply to	#2465

Alexey Petrushin wrote in post #991484:
> Probably a stupid question, but is there a way to use :gsub replacement
> without $0 $1 $2 $3 (and without "\0\1\2\3")?

Where are you replacing anything?

> I would prefer something like:
>
>     "John Smith".gsub /(.+)\s(.+)/ do |name, family|
>       p [name, family]
>
>       # instead of this
>       p [$1, $2]
>     end


"John Smith".scan(/\S+/) do |match|
  puts match
end

--output:--
John
Smith

-- 
Posted via http://www.ruby-forum.com/.

#2482

From	Phillip Gawlowski <cmdjackryan@googlemail.com>
Date	2011-04-07 13:39 -0500
Message-ID	<BANLkTin4Efzor7KmVUVc1mDQCcqb7prjZA@mail.gmail.com>
In reply to	#2479

On Thu, Apr 7, 2011 at 8:10 PM, 7stud -- <bbxx789_05ss@yahoo.com> wrote:
>
> "John Smith".scan(/\S+/) do |match|
>  puts match
> end

irb(main):001:0> "John;Smith".scan /\S+/ do |match|
irb(main):002:1* puts match
irb(main):003:1> end
John;Smith
=> "John;Smith"

Oops.

Better:

irb(main):04:0> "John;Smith".scan /\w+/ do |match|
irb(main):05:1* puts match
irb(main):06:1> end
John
Smith

The code still makes assumptions about the data, though: it assumes the
input is uniform, so that exactly the first n parts are the name, and
never n+1 or n-1.
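
If no replacement is needed, one possible alternative (not proposed in the thread in this form; the sample string is made up) is to give scan a pattern with capture groups. In that case scan yields each match's groups to the block, so the block parameters can be named directly.

```ruby
# Sketch only: scan with capture groups yields the groups, not the whole match.
pairs = []

"John Smith; Jane Doe".scan(/(\w+)\s(\w+)/) do |name, family|
  pairs << [name, family]
end

p pairs
```

Unlike gsub, scan only reads the string and never builds a replacement, and the pattern still assumes exactly two words per name.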

-- 
Phillip Gawlowski

Though the folk I have met,
(Ah, how soon!) they forget
When I've moved on to some other place,
There may be one or two,
When I've played and passed through,
Who'll remember my song or my face.

#2486

From	jake kaiden <jakekaiden@yahoo.com>
Date	2011-04-07 15:57 -0500
Message-ID	<a04876ce1e733c5b5c15fdc57c231aad@ruby-forum.com>
In reply to	#2465

Alexey Petrushin wrote in post #991484:
>
> I would prefer something like:
>
>     "John Smith".gsub /(.+)\s(.+)/ do |name, family|
>       p [name, family]
>
>       # instead of this
>       p [$1, $2]
>     end

  i also wonder if gsub is necessary...  there's no replacement here as 
far as i can tell.  my oversimplified monkey brain comes up with this:


irb(main):001:0> str = "John Smith - Minister of Funny Walks."
=> "John Smith - Minister of Funny Walks."
irb(main):002:0> arr = str.split(" ")
=> ["John", "Smith", "-", "Minister", "of", "Funny", "Walks."]
irb(main):003:0> name, family = arr[0], arr[1]
=> ["John", "Smith"]
irb(main):004:0> puts name
John
=> nil
irb(main):005:0> puts family
Smith
=> nil

  this also makes some assumptions about the data, of course...

  -j

-- 
Posted via http://www.ruby-forum.com/.

#2505

From	Sergey Avseyev <sergey.avseyev@gmail.com>
Date	2011-04-07 22:09 -0700
Message-ID	<6123c33e-694e-484a-b2b5-d4658eda2415@w6g2000vbo.googlegroups.com>
In reply to	#2486

On Apr 7, 11:57 pm, jake kaiden <jakekai...@yahoo.com> wrote:
> irb(main):001:0> str = "John Smith - Minister of Funny Walks."
> => "John Smith - Minister of Funny Walks."
> irb(main):002:0> arr = str.split(" ")
> => ["John", "Smith", "-", "Minister", "of", "Funny", "Walks."]
> irb(main):003:0> name, family = arr[0], arr[1]

name, family, = arr
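
A short sketch of what that one-liner does, reusing the sample string from the quoted post. The names `first`, `last`, `given`, `surname` and `rest` are illustrative.

```ruby
# Sketch only: multiple assignment from an array.
arr = "John Smith - Minister of Funny Walks.".split(" ")

name, family, = arr           # takes the first two elements, ignores the rest
first, last = arr             # with two targets the trailing comma is optional
given, surname, *rest = arr   # a splat keeps the leftover elements

p [name, family]
p rest
```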

#2526

From	Alexey Petrushin <axyd80@gmail.com>
Date	2011-04-08 07:31 -0500
Message-ID	<f5538c33624d85f5e6f07bfa139bcc49@ruby-forum.com>
In reply to	#2465

Thanks for the advice. I didn't know that $~ holds the match results;
adding a new 'substitute' method is probably the best solution.

> This solution is not very generalizable.  It only works as presented for
> cases where all the stuff you want to discard looks the same.
No, it's no less generalizable than the $X approach; use splats if you
have a varying number of matches.
"John Smith".substitute{|*tokens| ...}

And yes, the sample I provided was unclear: there was actually no
replacement. Maybe it should be something like this:

    "John Smith".gsub /(.+)\s(.+)/ do |name, family|
      "#{name[0..0]}. #{family}"
    end


Here's the complete solution:

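    class String
      # Splat the captures so that both |name, family| and |*tokens|
      # blocks receive a flat list of groups.
      def substitute(*args)
        gsub(*args) { yield(*Regexp.last_match.captures) }
      end

      def substitute!(*args)
        gsub!(*args) { yield(*Regexp.last_match.captures) }
      end
    end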
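
A usage sketch for these methods, under a few assumptions: the methods are re-declared so the example runs on its own, the captures are splatted into `yield` so that a `|*tokens|` block receives a flat list rather than one nested array, and the date string and variable names are made up for illustration.

```ruby
# Sketch only: re-declared here so the example is self-contained.
class String
  def substitute(*args)
    gsub(*args) { yield(*Regexp.last_match.captures) }
  end

  def substitute!(*args)
    gsub!(*args) { yield(*Regexp.last_match.captures) }
  end
end

# Named block parameters, one per capture group.
short = "John Smith".substitute(/(.+)\s(.+)/) { |name, family| "#{name[0..0]}. #{family}" }

# Splat form for patterns whose number of groups varies.
joined = "2011-04-08".substitute(/(\d+)-(\d+)-(\d+)/) { |*tokens| tokens.reverse.join("/") }

# In-place variant; dup gives a mutable copy of the literal.
text = "John Smith".dup
text.substitute!(/(.+)\s(.+)/) { |name, family| "#{family}, #{name}" }

p short
p joined
p text
```

Reopening String affects every string in the program, so a plain helper method or a refinement may be a safer home for this in shared code.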

Thanks for help!

-- 
Posted via http://www.ruby-forum.com/.
