


String#each_*slice* methods (like Enumerable#each_slice)

Started by: "Aaron D. Gifford" <astounding@gmail.com>
First post: 2011-04-06 12:52 -0500
Last post: 2011-04-06 18:33 -0500
Articles: 5 (3 participants)



Contents

  String#each_*slice* methods (like Enumerable#each_slice) "Aaron D. Gifford" <astounding@gmail.com> - 2011-04-06 12:52 -0500
    Re: String#each_*slice* methods (like Enumerable#each_slice) Quintus <sutniuq@gmx.net> - 2011-04-06 13:42 -0500
      Re: String#each_*slice* methods (like Enumerable#each_slice) "Aaron D. Gifford" <astounding@gmail.com> - 2011-04-06 13:49 -0500
        Re: String#each_*slice* methods (like Enumerable#each_slice) "Aaron D. Gifford" <astounding@gmail.com> - 2011-04-06 14:09 -0500
    Re: String#each_*slice* methods (like Enumerable#each_slice) 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-06 18:33 -0500

#2389 — String#each_*slice* methods (like Enumerable#each_slice)

From: "Aaron D. Gifford" <astounding@gmail.com>
Date: 2011-04-06 12:52 -0500
Subject: String#each_*slice* methods (like Enumerable#each_slice)
Message-ID: <BANLkTinMv0-L01+pwiiLc0Gruk+56B1GeQ@mail.gmail.com>

Hi,

I find I periodically need to iterate over slices of a string.
Enumerable has the useful each_slice method, but in Ruby 1.9, I don't
see an equivalent for the String class.

So I've monkey-patched String a bit like this:



## Monkeypatch String to add some each_*slice* methods:
class String
  ## Like Enumerable#each_slice() only it yields a string
  ## of chars characters (the slice):
  def each_slice(chars)
    self.scan(/.{1,#{chars}}/m).each do |s|
      yield s
    end
  end

  ## Like Enumerable#each_slice() only it yields an array
  ## of Fixnum bytes from the string (the slice):
  def each_byteslice(bytes)
    self.bytes.to_a.each_slice(bytes) do |s|
      yield s
    end
  end

  ## Like Enumerable#each_slice() only it yields a binary
  ## string of specified bytes (the slice):
  def each_bslice(bytes)
    if encoding == Encoding::BINARY
      str = self
    else
      str = self.dup.force_encoding(Encoding::BINARY)
    end
    str.scan(/.{1,#{bytes}}/m).each do |s|
      yield s
    end
  end

end



So now for the question.  Is there a better way to accomplish
something similar?  I'm not debating whether to do it as a monkey
patch or not--that's irrelevant to me. But is there a more efficient
way to slice up strings and iterate over fixed sized chunks?

One alternative each_bslice implementation I tried used
str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
slower in benchmarks versus the str.scan method.

Aaron out.
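
One more way to get binary slices without a regexp is to step through byte offsets. The sketch below is not from the original post: byte_chunks is a hypothetical helper name, and it assumes Ruby 1.9.3 or later, where String#byteslice is available. Whether it beats str.scan would have to be measured.

```ruby
# Hypothetical helper (not from the thread): returns the string cut into
# chunks of at most `size` bytes, using byte offsets instead of a regexp.
# Assumes Ruby 1.9.3+, where String#byteslice exists.
def byte_chunks(str, size)
  raise ArgumentError, "size must be positive" unless size > 0

  # byteslice counts in bytes whatever the string's encoding is, so a
  # multibyte character may be split across two chunks.
  (0...str.bytesize).step(size).map { |offset| str.byteslice(offset, size) }
end

p byte_chunks("hello world", 4)   # => ["hell", "o wo", "rld"]
```

Each chunk keeps the encoding of the source string, so a chunk that ends in the middle of a multibyte character is not a valid string in that encoding. Calling force_encoding(Encoding::BINARY) on the chunks, as each_bslice does on the whole string, avoids that.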



#2392

From: Quintus <sutniuq@gmx.net>
Date: 2011-04-06 13:42 -0500
Message-ID: <4D9CB445.8000504@gmx.net>
In reply to: #2389

On 06.04.2011 19:52, Aaron D. Gifford wrote:
> So now for the question.  Is there a better way to accomplish
> something similar?  I'm not debating whether to do it as a monkey
> patch or not--that's irrelevant to me. But is there a more efficient
> way to slice up strings and iterate over fixed sized chunks?
> 
> One alternative each_bslice implementation I tried used
> str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
> slower in benchmarks versus the str.scan method.
> 
> Aaron out.
> 
> 

Use Enumerators:
================================
irb(main):001:0> str = "ÄÄÄÖÖÖÜÜÜ"
=> "ÄÄÄÖÖÖÜÜÜ"
irb(main):002:0> str.chars.each_slice(3){|x| p x}
["Ä", "Ä", "Ä"]
["Ö", "Ö", "Ö"]
["Ü", "Ü", "Ü"]
=> nil
irb(main):003:0> str.bytes.each_slice(3){|x| p x}
[195, 132, 195]
[132, 195, 132]
[195, 150, 195]
[150, 195, 150]
[195, 156, 195]
[156, 195, 156]
=> nil
irb(main):004:0>
================================

Vale,
Marvin
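
The enumerator calls above yield arrays. If substrings are wanted instead, the arrays can be joined or packed back into strings. This is a sketch rather than part of the original reply: char_slices and byte_slices are illustrative names, and the source file is assumed to be UTF-8.

```ruby
str = "ÄÄÄÖÖÖÜÜÜ"

# Character slices: join each array of one-character strings.
char_slices = str.chars.each_slice(3).map(&:join)

# Byte slices: pack each array of integers into a binary (ASCII-8BIT) string.
byte_slices = str.bytes.each_slice(3).map { |bytes| bytes.pack("C*") }

p char_slices                    # => ["ÄÄÄ", "ÖÖÖ", "ÜÜÜ"]
p byte_slices.map(&:bytesize)    # => [3, 3, 3, 3, 3, 3]
```

The extra join or pack step allocates one more string per slice, which is probably where this approach loses time against str.scan when substrings are the goal.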



#2393

From: "Aaron D. Gifford" <astounding@gmail.com>
Date: 2011-04-06 13:49 -0500
Message-ID: <BANLkTi=HCJY8KW0VR6qZ+cRmXFLBFy0OUg@mail.gmail.com>
In reply to: #2392

Quintus <sutniuq@gmx.net> replied:
> Use Enumerators:
> ================================
> irb(main):001:0> str = "ÄÄÄÖÖÖÜÜÜ"
> => "ÄÄÄÖÖÖÜÜÜ"
> irb(main):002:0> str.chars.each_slice(3){|x| p x}
> ["Ä", "Ä", "Ä"]
> ["Ö", "Ö", "Ö"]
> ["Ü", "Ü", "Ü"]
> => nil
> irb(main):003:0> str.bytes.each_slice(3){|x| p x}
> [195, 132, 195]
> [132, 195, 132]
> [195, 150, 195]
> [150, 195, 150]
> [195, 156, 195]
> [156, 195, 156]
> => nil
> irb(main):004:0>
> ================================
>
> Vale,
> Marvin

Yes, I agree, that can work.

As I said in my original post:
>> One alternative each_bslice implementation I tried used
>> str.bytes.to_a.map(&:chr).each_slice(x){|c| p c.join} but it was a bit
>> slower in benchmarks versus the str.scan method.

That implementation did use enumerators.  But it was slower than
str.scan.  Hence my asking if there was a better (faster/more
efficient) way.

I didn't try benchmarking str.chars.each_slice vs str.scan.  I'll have
to check that out.  Thanks for pointing that out to me!

Aaron out.



#2396

From: "Aaron D. Gifford" <astounding@gmail.com>
Date: 2011-04-06 14:09 -0500
Message-ID: <BANLkTi=7bSbp32tPbDOL1vSBTOR=m+Ce+w@mail.gmail.com>
In reply to: #2393

Looking more closely at str.scan vs. str.chars.each_slice for string
slicing, it appears that the best one to use depends on what form of
slice one needs.

If I need each slice yielded as a substring rather than as an array of
characters or an array of bytes, then the scan method is consistently
faster on my machine.  However, if I want an array of characters or
bytes, then str.chars.each_slice or str.bytes.each_slice is faster.

Most of the time for me, however, I need a substring slice.

Aaron out.
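
A comparison along these lines can be set up with the standard Benchmark library. The script below is only a sketch, not the benchmark actually run for this thread: the test string, the slice width n, and the report labels are arbitrary choices, and timings will vary by machine and Ruby version.

```ruby
require "benchmark"

str = "abcdefghij" * 10_000   # arbitrary 100,000-character test string
n   = 7                       # arbitrary slice width
re  = /.{1,#{n}}/m            # same pattern as in String#each_slice above

# Sanity check: both approaches should produce the same substrings.
scan_slices   = str.scan(re)
joined_slices = str.chars.each_slice(n).map(&:join)
raise "slices differ" unless scan_slices == joined_slices

Benchmark.bm(24) do |bm|
  # Substring slices
  bm.report("scan")                    { str.scan(re) { |s| s } }
  bm.report("chars.each_slice + join") { str.chars.each_slice(n) { |a| a.join } }
  # Array slices
  bm.report("chars.each_slice")        { str.chars.each_slice(n) { |a| a } }
  bm.report("bytes.each_slice")        { str.bytes.each_slice(n) { |a| a } }
end
```

The first two reports cover the substring case and the last two the array case, matching the two situations described above.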



#2416

From: 7stud -- <bbxx789_05ss@yahoo.com>
Date: 2011-04-06 18:33 -0500
Message-ID: <14aa0a86278a4755ef0050004fb530c9@ruby-forum.com>
In reply to: #2389

Aaron D. Gifford wrote in post #991274:
> Hi,
>
> I find I periodically need to iterate over slices of a string.
> Enumerable has the useful each_slice method, but in Ruby 1.9, I don't
> see an equivalent for the String class.
>

How about:

str = "hello world"

while str.size > 0
  substr = str.slice!(0, 3)  #(offset, length)
  puts "-->#{substr}<--"
end

--output:--
-->hel<--
-->lo <--
-->wor<--
-->ld<--
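
Note that slice! removes each chunk from str, so the string is empty once the loop finishes. If the original has to survive, a non-destructive variant can index by offset instead. This is a sketch, not part of the post above, and slices is just an illustrative name.

```ruby
str = "hello world"

# Walk the character offsets 0, 3, 6, ... and copy out up to 3 characters
# at each one; str itself is never modified.
slices = (0...str.size).step(3).map { |offset| str[offset, 3] }

slices.each { |substr| puts "-->#{substr}<--" }
```

This prints the same four lines as the loop above and leaves str unchanged.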

-- 
Posted via http://www.ruby-forum.com/.


