Groups > comp.lang.ruby > #3539 > unrolled thread

pattern matching and array methods

Started by	Mfer Dez <emphxl@yahoo.com>
First post	2011-04-26 23:07 -0500
Last post	2011-04-27 13:31 -0500
Articles	8 — 7 participants


  pattern matching and array methods Mfer Dez <emphxl@yahoo.com> - 2011-04-26 23:07 -0500
    Re: pattern matching and array methods Josh Cheek <josh.cheek@gmail.com> - 2011-04-26 23:27 -0500
      Re: pattern matching and array methods Josh Cheek <josh.cheek@gmail.com> - 2011-04-27 13:00 -0500
    Re: pattern matching and array methods Christopher Dicely <cmdicely@gmail.com> - 2011-04-27 00:19 -0500
    Re: pattern matching and array methods Dhruva Sagar <dhruva.sagar@gmail.com> - 2011-04-27 01:32 -0500
      Re: pattern matching and array methods Brian Candler <b.candler@pobox.com> - 2011-04-27 02:57 -0500
        Re: pattern matching and array methods Jesús Gabriel y Galán <jgabrielygalan@gmail.com> - 2011-04-27 03:52 -0500
    Re: pattern matching and array methods 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-27 13:31 -0500

#3539 — pattern matching and array methods

From	Mfer Dez <emphxl@yahoo.com>
Date	2011-04-26 23:07 -0500
Subject	pattern matching and array methods
Message-ID	<d4d716c8f8db27cf9490dd380e656a03@ruby-forum.com>

I have a text file that is structured like so:

1:1 abcdefg
1:2 abcdefg
1:3 abcdefg
1:4 abcdefg
1:5 abcdefg

I would like to be able to print out a subset of the file, i.e. print the
line beginning with 1:2 through the line beginning with 1:4.

So far, I've started with this:

lines = File.readlines("file.txt")

This puts each line of the text file into an array, so the lines array
looks like this:

lines[0] is 1:1 abcdefg
lines[1] is 1:2 abcdefg
lines[2] is 1:3 abcdefg
etc.

If I want to print out the lines that start with 1:2 through 1:4, how
should I proceed? Some of the text files won't be "aligned", in that
lines[0] won't always be 1:1. If a user would like to print the lines
containing 3:9 - 3:31, how can I scan each line of the array and pattern
match the boundaries (3:9 - 3:31 in this example)? What array methods
are available to me?

Thanks in advance for any information that could point me in the right
direction.

-- 
Posted via http://www.ruby-forum.com/.


#3540

From	Josh Cheek <josh.cheek@gmail.com>
Date	2011-04-26 23:27 -0500
Message-ID	<BANLkTinq1Y7tAac+cnRK1owyabfqoyX15Q@mail.gmail.com>
In reply to	#3539

[Note:  parts of this message were removed to make it a legal post.]

On Tue, Apr 26, 2011 at 11:07 PM, Mfer Dez <emphxl@yahoo.com> wrote:

> If I want to print out the lines that start with 1:2 through 1:4, how
> should I proceed?

I suppose you could use a flip-flop... or awk :P



$ cat file.txt
1:1 abcdefg
1:2 abcdefg
1:3 abcdefg
1:4 abcdefg
1:5 abcdefg

$ ruby -e '
> File.foreach ARGV.first do |line|
>   puts line if line.start_with?("1:2")..line.start_with?("1:4")
> end
> ' file.txt
1:2 abcdefg
1:3 abcdefg
1:4 abcdefg

$ awk '$1 == "1:2", $1 == "1:4"' file.txt
1:2 abcdefg
1:3 abcdefg
1:4 abcdefg
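In a Ruby conditional, `cond1..cond2` between two non-literal expressions is a flip-flop, not a Range. As a rough sketch of what it does (the method name `flip_flop_select` is hypothetical, not a library API), the state it keeps between lines can be written out by hand. With the two-dot form, the end condition is also tested on the line that switched it on.

```ruby
# Explicit equivalent of `line if start..stop` (two-dot flip-flop):
# once a line matches start_pat, every line is kept up to and including
# the first later line that matches stop_pat.
def flip_flop_select(lines, start_pat, stop_pat)
  on = false
  lines.select do |line|
    if on
      # Already inside the range: keep this line, switch off at the end marker.
      on = false if line =~ stop_pat
      true
    elsif line =~ start_pat
      # Two-dot semantics: the same line may also close the range.
      on = line !~ stop_pat
      true
    else
      false
    end
  end
end

lines = ["1:1 abcdefg", "1:2 abcdefg", "1:3 abcdefg", "1:4 abcdefg", "1:5 abcdefg"]
puts flip_flop_select(lines, /^1:2\b/, /^1:4\b/)
```

The three-dot form (`...`) differs only in that it does not test the end condition on the line that starts the range.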


#3567

From	Josh Cheek <josh.cheek@gmail.com>
Date	2011-04-27 13:00 -0500
Message-ID	<BANLkTinybmNKANH_bTTarVUb5gV5t8c4JA@mail.gmail.com>
In reply to	#3540

[Note:  parts of this message were removed to make it a legal post.]

On Tue, Apr 26, 2011 at 11:27 PM, Josh Cheek <josh.cheek@gmail.com> wrote:

> $ ruby -e '
> > File.foreach ARGV.first do |line|
> >   puts line if line.start_with?("1:2")..line.start_with?("1:4")
> > end
> > ' file.txt

Actually, you need a regex here, because start_with?("1:2") will also match
"1:23 abcdefg", for example. With a regex you can use \b to mark a word
boundary, and if the lines can have leading whitespace, a regex can deal
with that too.
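A quick way to see the difference (the strings are just examples): `start_with?` compares a literal prefix, while `\b` requires a word boundary right after the tag, so a following digit rules the line out.

```ruby
tagged = "1:23 abcdefg"

# A literal prefix test also accepts "1:23" when looking for "1:2".
prefix_hit = tagged.start_with?("1:2")

# Anchored at the line start and followed by a word boundary, "1:2" no
# longer matches, because "2" and "3" are both word characters.
regex_hit = !!(tagged =~ /^1:2\b/)

# Leading whitespace can be allowed explicitly with \s*.
indented_hit = !!("   1:2 abcdefg" =~ /^\s*1:2\b/)

p prefix_hit    # => true
p regex_hit     # => false
p indented_hit  # => true
```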


$ ruby -e '(1..40).each { |big| (1..40).each { |small| puts "#{big}:#{small} blah" } }' |
> ruby -e '$stdin.each { |line| puts line if line[/^1:2\b/]..line[/^2:2\b/] }'
1:2 blah
1:3 blah
1:4 blah
1:5 blah
1:6 blah
1:7 blah
1:8 blah
1:9 blah
1:10 blah
1:11 blah
1:12 blah
1:13 blah
1:14 blah
1:15 blah
1:16 blah
1:17 blah
1:18 blah
1:19 blah
1:20 blah
1:21 blah
1:22 blah
1:23 blah
1:24 blah
1:25 blah
1:26 blah
1:27 blah
1:28 blah
1:29 blah
1:30 blah
1:31 blah
1:32 blah
1:33 blah
1:34 blah
1:35 blah
1:36 blah
1:37 blah
1:38 blah
1:39 blah
1:40 blah
2:1 blah
2:2 blah


I don't really understand why everyone else is parsing the numbers. Perhaps
they assume these lines might not be in order?




On Tue, Apr 26, 2011 at 11:07 PM, Mfer Dez <emphxl@yahoo.com> wrote:

> If a user would like to print the lines
> containing 3:9 - 3:31, how can I scan each line of the array and pattern
> match the boundaries (3:9 - 3:31 in this example)? What array methods
> are available to me.
>
>

For custom boundaries, you can just interpolate them into the regex

$ ruby -e '(1..40).each { |big| (1..40).each { |small| puts "#{big}:#{small} blah" } }' |
> ruby -e '$stdin.each { |line| puts line if line[/^#{ARGV[0]}\b/]..line[/^#{ARGV[1]}\b/] }' 3:9 3:31
3:9 blah
3:10 blah
3:11 blah
3:12 blah
3:13 blah
3:14 blah
3:15 blah
3:16 blah
3:17 blah
3:18 blah
3:19 blah
3:20 blah
3:21 blah
3:22 blah
3:23 blah
3:24 blah
3:25 blah
3:26 blah
3:27 blah
3:28 blah
3:29 blah
3:30 blah
3:31 blah
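Interpolating raw command-line arguments works for tags like 3:9, but a regex metacharacter in an argument (a `.` or `*`, say) would be read as pattern syntax. A slightly safer sketch, assuming the same major:minor tags, escapes the argument first with `Regexp.escape`; the helper name `tag_pattern` is made up for illustration.

```ruby
# Build an anchored pattern that matches one literal tag at the start of a line.
# Regexp.escape backslash-escapes characters such as "." so they match literally.
def tag_pattern(tag)
  /^#{Regexp.escape(tag)}\b/
end

from = tag_pattern("3:9")
p "3:9 blah" =~ from    # => 0
p "3:91 blah" =~ from   # => nil

# Unescaped, /^3.9\b/ would also match "3x9"; the escaped pattern does not.
p "3x9 blah" =~ tag_pattern("3.9")  # => nil
```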


I'm reading these line by line from the input, which is the most efficient
approach (what if your file is enormous? Do you really want to read it all
into an array?). But the interface to an array is exactly the same: instead
of iterating over the file, you just iterate over the array. Just change
$stdin.each to $stdin.readlines.each; everything works the same, but it uses
an array now.

$ ruby -e '(1..40).each { |big| (1..40).each { |small| puts "#{big}:#{small} blah" } }' |
> ruby -e '$stdin.readlines.each { |line| puts line if line[/^#{ARGV[0]}\b/]..line[/^#{ARGV[1]}\b/] }' 3:9 3:31
3:9 blah
3:10 blah
3:11 blah
3:12 blah
3:13 blah
3:14 blah
3:15 blah
3:16 blah
3:17 blah
3:18 blah
3:19 blah
3:20 blah
3:21 blah
3:22 blah
3:23 blah
3:24 blah
3:25 blah
3:26 blah
3:27 blah
3:28 blah
3:29 blah
3:30 blah
3:31 blah
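Because File, $stdin, StringIO and Array all respond to `each`, one method can serve both the streaming and the array case. A sketch (the method name `select_range` is hypothetical) that uses the same flip-flop as the one-liners above and takes the two boundary patterns as arguments:

```ruby
require "stringio"

# Collect the lines from the first match of from_pat through the next match
# of to_pat. Works with any source whose #each yields lines
# (File, $stdin, StringIO, or an Array of strings).
def select_range(source, from_pat, to_pat)
  selected = []
  source.each do |line|
    # A two-dot range inside a condition is a flip-flop, not a Range.
    selected << line if (line =~ from_pat)..(line =~ to_pat)
  end
  selected
end

text = "3:8 blah\n3:9 blah\n3:10 blah\n3:31 blah\n3:32 blah\n"
from, to = /^3:9\b/, /^3:31\b/

streamed = select_range(StringIO.new(text), from, to)            # one line at a time
buffered = select_range(StringIO.new(text).readlines, from, to)  # whole array first
p streamed == buffered  # => true
```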


#3542

From	Christopher Dicely <cmdicely@gmail.com>
Date	2011-04-27 00:19 -0500
Message-ID	<BANLkTimmGOh6wPdx52WJxgSQx-mV02x0Gg@mail.gmail.com>
In reply to	#3539

On Tue, Apr 26, 2011 at 9:07 PM, Mfer Dez <emphxl@yahoo.com> wrote:
> If a user would like to print the lines
> containing 3:9 - 3:31, how can I scan each line of the array and pattern
> match the boundaries (3:9 - 3:31 in this example)? What array methods
> are available to me.


If you just want to print the selected lines, the array methods
available aren't the interesting ones (Array#each is probably enough).
The interesting part is parsing the tag at the start of each line, for
which you probably want to consider using regular expressions.
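As an illustration of parsing the tag rather than matching its text, here is a minimal sketch using named captures. The `TAG` constant and the `parse_tag` name are assumptions, and the tag is taken to be two integers separated by a colon at the start of the line.

```ruby
# Two integers separated by a colon at the start of the line, e.g. "3:17".
TAG = /\A(?<major>\d+):(?<minor>\d+)\b/

# Returns [major, minor] as Integers, or nil when the line has no tag.
def parse_tag(line)
  m = TAG.match(line)
  return nil unless m
  [m[:major].to_i, m[:minor].to_i]
end

p parse_tag("3:17 abcdefg")  # => [3, 17]
p parse_tag("no tag here")   # => nil
```

Once each line yields a pair of integers, selecting a range is ordinary numeric comparison inside Array#each.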


#3550

From	Dhruva Sagar <dhruva.sagar@gmail.com>
Date	2011-04-27 01:32 -0500
Message-ID	<BANLkTikqzThxF5ufpz-VExi9DZ17LxTkiw@mail.gmail.com>
In reply to	#3539

[Note:  parts of this message were removed to make it a legal post.]

Something like this should do it:

line.each {|l| puts l if l =~ /^(1:2)|(1:3)|(1:4)/}



-- 
Thanks & Regards,
Dhruva Sagar <http://dhruvasagar.net>
----------------------------
Technical Developer - Mentor,
Artha42 Innovations Pvt. Ltd. <http://www.artha42.com/>

Become an expert in Rails. Join our 3 day Rails workshop and learn Ruby,
Rails 3, Cucumber and Git.
http://www.railspundit.com


#3554

From	Brian Candler <b.candler@pobox.com>
Date	2011-04-27 02:57 -0500
Message-ID	<379e9ad9628af6f9972a8255e435d28d@ruby-forum.com>
In reply to	#3550

Dhruva Sagar wrote in post #995263:
> Something like this should do it :
>
> line.each {|l| puts l if l =~ /^(1:2)|(1:3)|(1:4)/}

That's a poor answer, because your regexp isn't anchored properly. It 
would match "5:6 abc1:3def" and "1:23 foobar" for example.
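The underlying issue is precedence: `|` binds most loosely in a regular expression, so the pattern reads as `^(1:2)`, or `(1:3)` anywhere, or `(1:4)` anywhere. A sketch of a grouped, anchored version, checked against the two counter-examples above (the variable names are illustrative):

```ruby
# As posted: ^ applies only to (1:2); (1:3) and (1:4) may match anywhere.
loose = /^(1:2)|(1:3)|(1:4)/

# Grouping puts every alternative after ^, and \b rejects "1:23".
tight = /^(?:1:2|1:3|1:4)\b/

samples = ["1:3 abcdefg", "5:6 abc1:3def", "1:23 foobar"]
p samples.map { |s| !!(s =~ loose) }  # => [true, true, true]
p samples.map { |s| !!(s =~ tight) }  # => [true, false, false]
```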

I suggest using the regexp to parse the line, then using numeric
testing. This makes it easier to solve the other example, 3:9 to 3:31:

lines.each do |line|
  if line =~ /^(\d+):(\d+)/
    major, minor = $1.to_i, $2.to_i
    puts line if major == 3 and (9..31).include?(minor)
  end
end

Note that you don't need to read the whole file in at once using 
readlines; you can read and process it one line at a time. This lets it 
work on huge files which are too big to fit into RAM.

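File.open("...") do |file|
  file.each_line do |line|
    # same parsing and test as in the lines.each version above
    if line =~ /^(\d+):(\d+)/
      major, minor = $1.to_i, $2.to_i
      puts line if major == 3 and (9..31).include?(minor)
    end
  end
end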
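`File.foreach` is a shorter way to write the same one-line-at-a-time read. A sketch, assuming the major:minor format and using a temporary file only so the example runs on its own (`tagged_lines` is a hypothetical name):

```ruby
require "tempfile"

# Reads path one line at a time and returns the lines whose tag has the given
# major number and a minor number inside minors (a Range).
def tagged_lines(path, major, minors)
  File.foreach(path).select do |line|
    line =~ /^(\d+):(\d+)/ && $1.to_i == major && minors.include?($2.to_i)
  end
end

Tempfile.create("tags") do |f|
  f.write("3:8 a\n3:9 b\n3:31 c\n3:32 d\n4:9 e\n")
  f.flush
  p tagged_lines(f.path, 3, 9..31)  # => ["3:9 b\n", "3:31 c\n"]
end
```

Only the matching lines are kept in memory; for very large results, printing inside the block instead of collecting with `select` avoids building the array at all.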

-- 
Posted via http://www.ruby-forum.com/.


#3555

From	Jesús Gabriel y Galán <jgabrielygalan@gmail.com>
Date	2011-04-27 03:52 -0500
Message-ID	<BANLkTi=Z0YawE7T0i+0gkBm8Zix8A3Kjog@mail.gmail.com>
In reply to	#3554

On Wed, Apr 27, 2011 at 9:57 AM, Brian Candler <b.candler@pobox.com> wrote:
> Dhruva Sagar wrote in post #995263:
>> Something like this should do it :
>>
>> line.each {|l| puts l if l =~ /^(1:2)|(1:3)|(1:4)/}
>
> That's a poor answer, because your regexp isn't anchored properly. It
> would match "5:6 abc1:3def" and "1:23 foobar" for example.
>
> I suggest using the regexp to parse the line, then using numeric
> testing. This makes it easier to solve the other example of 3:9 to 3:31
>
> lines.each do |line|
>  if line =~ /^(\d+):(\d+)/
>    major, minor = $1.to_i, $2.to_i
>    puts line if major == 3 and (9..31).include?(minor)
>  end
> end

Generalizing a bit more:

lower_major, lower_minor = "3:9".split(":").map {|x| x.to_i}
upper_major, upper_minor = "3:31".split(":").map {|x| x.to_i}
major_range = lower_major..upper_major
minor_range = lower_minor..upper_minor

lines.each do |line|
  if line =~ /^(\d+):(\d+)/
    major, minor = $1.to_i, $2.to_i
    puts line if major_range.include?(major) and minor_range.include?(minor)
  end
end
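One caveat with testing the two ranges separately: it only works when both bounds share a major number. For 1:38 to 2:3 the minor range becomes 38..3, which is empty, so nothing is printed. A sketch that compares [major, minor] pairs instead, relying on Array#<=> comparing element by element and on Range#cover? using <=> (the method name `lines_in_tag_range` is hypothetical):

```ruby
# Select lines whose "major:minor" tag lies between from and to, inclusive,
# comparing tags as [major, minor] pairs so a range may span major numbers.
def lines_in_tag_range(lines, from, to)
  low   = from.split(":").map(&:to_i)  # "1:38" -> [1, 38]
  high  = to.split(":").map(&:to_i)    # "2:3"  -> [2, 3]
  range = low..high                    # ordered by Array#<=>
  lines.select do |line|
    line =~ /^(\d+):(\d+)/ && range.cover?([$1.to_i, $2.to_i])
  end
end

lines = ["1:37 a", "1:38 b", "1:40 c", "2:1 d", "2:3 e", "2:4 f"]
p lines_in_tag_range(lines, "1:38", "2:3")  # => ["1:38 b", "1:40 c", "2:1 d", "2:3 e"]
```

`cover?` is used rather than `include?` because a Range of Arrays cannot be iterated; `cover?` only needs the endpoints to be comparable.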

Jesus.


#3568

From	7stud -- <bbxx789_05ss@yahoo.com>
Date	2011-04-27 13:31 -0500
Message-ID	<aae8b2e2509739ac0e9fbc259ce1f30c@ruby-forum.com>
In reply to	#3539

Mfer Dez wrote in post #995245:
> I have a text file that is structured like so:
>
> 1:1 abcdefg
> 1:2 abcdefg
> 1:3 abcdefg
> 1:4 abcdefg
> 1:5 abcdefg
>
> I would like to be able to print out a subset of the file ie: print the
> line beginning with 1:2 through the line beginning with 1:4
>

If you do some work to save the lines in an easily accessible structure,
you can make the lookup much easier:

lines = [
  '1:1 xxxxxx',
  '1:2 xxxxxx',
  '1:3 xxxxxx',
  '1:4 xxxxxx',
  '1:5 xxxxxx',
  '2:1 xxxxxx',
  '2:2 xxxxxx',
  '2:3 xxxxxx',
  '2:4 xxxxxx',
  '2:5 xxxxxx'
]


#Create a hash whose non-existent keys
#are automatically assigned an empty array:
h = Hash.new {|hash, key| hash[key] = []}

lines.each do |line|
  numbers, str = line.split(' ', 2)
  key, index = numbers.split(':')
  h[key][index.to_i] = line
  #If h[key] does not exist it will automatically
  #be assigned an empty array, which you can then
  #index into.
end

target = '2:2 - 2:5'
#Split on the dash, ignoring any spaces around it:
start, stop = target.split(/\s*-\s*/)
key1, index1 = start.split(':')
key2, index2 = stop.split(':')

index1, index2 = index1.to_i, index2.to_i
p h[key1][index1..index2]

--output:--
["2:2 xxxxxx", "2:3 xxxxxx", "2:4 xxxxxx", "2:5 xxxxxx"]
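The hash lookup above only handles a span within one major number, since h[key1] is a single array. If the file is known to be in order, the plain array from File.readlines is enough: Array#index with a block finds both boundaries, and a range slice returns everything between them. A sketch under that ordering assumption (`slice_between` is a hypothetical helper):

```ruby
# Return lines from the first line tagged from_tag through the first line
# tagged to_tag, assuming the lines are already in order.
def slice_between(lines, from_tag, to_tag)
  from_re = /^#{Regexp.escape(from_tag)}\b/
  to_re   = /^#{Regexp.escape(to_tag)}\b/
  first = lines.index { |l| l =~ from_re }
  last  = lines.index { |l| l =~ to_re }
  return [] unless first && last && first <= last
  lines[first..last]
end

lines = ["1:1 xxxxxx", "1:2 xxxxxx", "1:3 xxxxxx", "1:4 xxxxxx", "1:5 xxxxxx",
         "2:1 xxxxxx", "2:2 xxxxxx", "2:3 xxxxxx", "2:4 xxxxxx", "2:5 xxxxxx"]
p slice_between(lines, "2:2", "2:5")  # => ["2:2 xxxxxx", "2:3 xxxxxx", "2:4 xxxxxx", "2:5 xxxxxx"]
p slice_between(lines, "1:4", "2:1")  # => ["1:4 xxxxxx", "1:5 xxxxxx", "2:1 xxxxxx"]
```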

-- 
Posted via http://www.ruby-forum.com/.
