| Started by | Cee Joe <cyril_jose@ymail.com> |
|---|---|
| First post | 2011-04-27 15:02 -0500 |
| Last post | 2011-04-29 16:10 -0500 |
| Articles | 20 on this page of 22 — 5 participants |
File position and buffers Cee Joe <cyril_jose@ymail.com> - 2011-04-27 15:02 -0500
Re: File position and buffers Jesús Gabriel y Galán <jgabrielygalan@gmail.com> - 2011-04-27 16:47 -0500
Re: File position and buffers jake kaiden <jakekaiden@yahoo.com> - 2011-04-27 17:33 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-27 19:08 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-27 19:50 -0500
Re: File position and buffers Robert Klemme <shortcutter@googlemail.com> - 2011-04-28 02:54 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-28 13:06 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-28 13:25 -0500
Re: File position and buffers Cee Joe <cyril_jose@ymail.com> - 2011-04-28 13:29 -0500
Re: File position and buffers Cee Joe <cyril_jose@ymail.com> - 2011-04-28 09:06 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-28 12:47 -0500
Re: File position and buffers Cee Joe <cyril_jose@ymail.com> - 2011-04-28 13:27 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-28 18:31 -0500
Re: File position and buffers Cee Joe <cyril_jose@ymail.com> - 2011-04-28 20:05 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-28 21:58 -0500
Re: File position and buffers Cee Joe <cyril_jose@ymail.com> - 2011-04-29 10:20 -0500
Re: File position and buffers jake kaiden <jakekaiden@yahoo.com> - 2011-04-28 22:36 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-29 12:50 -0500
Re: File position and buffers Cee Joe <cyril_jose@ymail.com> - 2011-04-29 13:32 -0500
Re: File position and buffers 7stud -- <bbxx789_05ss@yahoo.com> - 2011-04-29 17:45 -0500
Re: File position and buffers jake kaiden <jakekaiden@yahoo.com> - 2011-04-29 15:38 -0500
Re: File position and buffers Cee Joe <cyril_jose@ymail.com> - 2011-04-29 16:10 -0500
| From | Cee Joe <cyril_jose@ymail.com> |
|---|---|
| Date | 2011-04-27 15:02 -0500 |
| Subject | File position and buffers |
| Message-ID | <10d8ae57765e21626a7c64873dcba807@ruby-forum.com> |
Hi all,

In a bit of a rut. Have a file with a lot of text. I want to separate the text in this file as entries. Each entry that I would be separating would be done so using IO.pos, and when that cursor reaches a certain character in the file, it will ideally place all the content before that character into a buffer. Then the cursor will continue reading until it hits that same character again and put that content into a buffer, so on and so forth. (Character I'll be reading would be a greater than symbol)

Would I use a do iterator or use a while loop with a gets method? Or readlines perhaps?

File:
>entry 1
rubyrubyrubyrubyrubyrubyrubyruby
(newline here which I don't want)
>entry 2
rubyrubyrubyrubyrubyrubyrubyruby

Entry1 and entry2 will be in separate buffers which I would be able to access again.

buffer1 = >entry 1
rubyrubyrubyrubyrubyrubyrubyruby

buffer2 = >entry 2
rubyrubyrubyrubyrubyrubyrubyruby

PS. The file is huge, so I don't want to read it into memory. What is the best way to approach this? Any suggestions or comments would be helpful. Thanks!
--
Posted via http://www.ruby-forum.com/.
| From | Jesús Gabriel y Galán <jgabrielygalan@gmail.com> |
|---|---|
| Date | 2011-04-27 16:47 -0500 |
| Message-ID | <BANLkTimwmQpU3E73MNnGRzf+BZsBFtDqyw@mail.gmail.com> |
| In reply to | #3574 |
You could use foreach, checking if each line starts with '>'. If it doesn't, you accumulate it in a buffer; if it does, you do something with the current buffer and start a new one.
Jesus
On 27/04/2011 22:04, "Cee Joe" <cyril_jose@ymail.com> wrote:
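A minimal sketch of the line-by-line approach suggested above, assuming every entry starts with '>' at the beginning of a line. The helper name `each_entry` and the sample data are hypothetical, and a StringIO stands in for the real file; with a file on disk, `File.foreach(path)` yields lines the same way.

```ruby
require 'stringio'

# Hypothetical helper: yields one entry (header line plus the lines that
# follow it) at a time, so only the current entry is held in memory.
def each_entry(io)
  buffer = nil
  io.each_line do |line|
    if line.start_with?('>')
      yield buffer if buffer   # hand over the finished entry
      buffer = line.dup        # start a new entry with its header line
    elsif buffer
      buffer << line           # accumulate the body lines
    end
  end
  yield buffer if buffer       # the last entry has no '>' after it
end

sample = StringIO.new(">entry 1\nrubyruby\n>entry 2\nrubyrubyruby\n")
each_entry(sample) { |entry| p entry }
```

Because each entry is yielded and then dropped, memory use stays at roughly one entry regardless of the file size.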
| From | jake kaiden <jakekaiden@yahoo.com> |
|---|---|
| Date | 2011-04-27 17:33 -0500 |
| Message-ID | <e7fe81ff36d39f75e03940dd92433130@ruby-forum.com> |
| In reply to | #3574 |
hi Cee -
this may well be WAY too simple for your needs, but it seems to me you
could do something like this:
(0text.txt is a file with 7 lines that say rubyrubyrubyetc.)
f = "0text.txt"
file = File.open(f)
buffer = []
bufferindex = 0
file.each_line{|line|
  buffer[bufferindex] = line.chomp
  bufferindex += 1
}
p buffer[0]
p buffer[1]
p buffer[2]
#etc...
of course you could also set a maximum number of lines per buffer:
f = "0text.txt"
file = File.open(f)
buffer = Hash.new{|hash, key| hash[key] = []}
bufferkey = 0
maxbuflength = 3
file.each_line{|line|
  # move on to a new buffer once the current one is full
  bufferkey += 1 if buffer[bufferkey].length == maxbuflength
  buffer[bufferkey] << line.chomp
}
p buffer[0]
p buffer[1]
p buffer[2]
if the file's extremely long i guess you'd want to write a method to
dump the buffers at some point too.
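One possible way to handle "dumping" fixed-size buffers, sketched under the assumption that each batch can be processed and then thrown away. The helper name `each_batch` and the sample lines are made up, and a StringIO stands in for `File.open("0text.txt")`.

```ruby
require 'stringio'

# Hypothetical helper: yields arrays of at most `size` chomped lines.
# Only one batch exists at a time; nothing is accumulated across batches.
def each_batch(io, size)
  io.each_line.each_slice(size) { |lines| yield lines.map(&:chomp) }
end

input = StringIO.new("ruby1\nruby2\nruby3\nruby4\nruby5\nruby6\nruby7\n")
each_batch(input, 3) { |batch| p batch }
```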
maybe this is dumb, i hope not!
cheers,
-j
--
Posted via http://www.ruby-forum.com/.
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-27 19:08 -0500 |
| Message-ID | <5fcf83686c90d0c89ba3cdbb67b7255c@ruby-forum.com> |
| In reply to | #3574 |
Cee Joe wrote in post #995381:
> Hi all,
>
> In a bit of a rut. Have a file with a lot of text. I want to seperate
> the text in this file as entries. Each entry that I would be seperating,
> would be done so using IO.pos and when that cursor reaches a certain
> character in the file, it will ideally place all the content before that
> character into a buffer. Then the cursor will continue reading until it
> hits that same character again and put that content into a buffer, so on
> and so forth. (Character I'll be reading would be a greater than symbol)
>
There is absolutely no reason to use pos() to read that file.
> Would I use a do iterator or use a while loop with a gets method? Or
> readlines perhaps?
>
> File:
>>entry 1
> rubyrubyrubyrubyrubyrubyrubyruby
> (newline here which I don't want)
>
chomp() removes one newline, if present, at the end of a string.
>
> PS. The file is huge, so I don't want to read it into memory. What is
> the best way to approach this? Any suggestions or comments would be
> helpful. Thanks!
Well, then you have to tell us what you want to do with the segments of the file. If you store each chunk in a variable, then you will have read the whole file into memory.
You say your file looks like this:
>entry 1 <---WHAT'S AT THE END OF THIS LINE??
rubyrubyrubyrubyruby <---WHAT'S AT THE END OF THIS LINE??
(newline here which I don't want)
Those look like newlines. Are you saying that your data is organized into paragraphs, i.e. separated by two newlines? Like this:
>entry1\n
rubyrubyruby\n
\n
>entry2\n
rubyrubyruby\n
\n
>entry3
A paragraph is defined as two consecutive newlines between lines. Note that in ruby the default line separator is one newline. But you can change that to two newlines--or any other character:
require 'stringio'
str =<<ENDOFSTRING
>entry1
11111111111

>entry2
22222222222

>entry3
33333333333
ENDOFSTRING
input = StringIO.new(str)
$/ = "\n\n"
input.each do |para|
  p para.sub(/\n+ \z/xms, "")
end
--output:--
">entry1\n11111111111"
">entry2\n22222222222"
">entry3\n33333333333"
--
Posted via http://www.ruby-forum.com/.
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-27 19:50 -0500 |
| Message-ID | <bd42da2f59207f9958b6377fbdea7517@ruby-forum.com> |
| In reply to | #3574 |
This shows the output better:
e = input.enum_for(:each) #You can do this for a File too.
e.each_slice(2) do |buffer1, buffer2|
  puts "buffer1: #{buffer1.inspect}"
  puts "buffer2: #{buffer2.inspect}"
  puts "-" * 10
end
--output:--
buffer1: ">entry1\n11111111111\n\n"
buffer2: ">entry2\n22222222222\n\n"
----------
buffer1: ">entry3\n33333333333\n"
buffer2: nil
----------
Before doing the sub() on buffer2, you will have to check if it's nil:
if buffer2.nil?
  #don't do a sub()
else
  #do the sub()
end
--
Posted via http://www.ruby-forum.com/.
| From | Robert Klemme <shortcutter@googlemail.com> |
|---|---|
| Date | 2011-04-28 02:54 -0500 |
| Message-ID | <BANLkTi=SoLUqREXGZKu1yjzfik=OtUb+iQ@mail.gmail.com> |
| In reply to | #3574 |
On Wed, Apr 27, 2011 at 10:02 PM, Cee Joe <cyril_jose@ymail.com> wrote:
> Hi all,
>
> In a bit of a rut. Have a file with a lot of text. I want to seperate
> the text in this file as entries. Each entry that I would be seperating,
> would be done so using IO.pos and when that cursor reaches a certain
> character in the file, it will ideally place all the content before that
> character into a buffer. Then the cursor will continue reading until it
> hits that same character again and put that content into a buffer, so on
> and so forth. (Character I'll be reading would be a greater than symbol)
>
> Would I use a do iterator or use a while loop with a gets method? Or
> readlines perhaps?
>
> File:
>>entry 1
> rubyrubyrubyrubyrubyrubyrubyruby
> (newline here which I don't want)
>>entry 2
> rubyrubyrubyrubyrubyrubyrubyruby
>
> Entry1 and entry2 will be in seperate buffers which I would be able to
> access again.
>
> buffer1 = >entry 1
> rubyrubyrubyrubyrubyrubyrubyruby
>
> buffer2 = >entry 2
> rubyrubyrubyrubyrubyrubyrubyruby
>
>
> PS. The file is huge, so I don't want to read it into memory. What is
> the best way to approach this? Any suggestions or comments would be
> helpful. Thanks!
One of the simplest approaches is to use Ruby's ability to use arbitrary record delimiters:
File.foreach file_name, ">" do |chunk|
  chunk.chomp! ">"
  chunk.gsub! /\r\n?|\n/, '' # remove line terminators
  # if you need the leading ">":
  # chunk[0,0] = ">"
  p chunk
end
Kind regards
robert
--
remember.guy do |as, often| as.you_can - without end
http://blog.rubybestpractices.com/
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-28 13:06 -0500 |
| Message-ID | <6f3710886bbfa66d9bb6cb71ea04b6ee@ruby-forum.com> |
| In reply to | #3612 |
Robert K. wrote in post #995478:
> On Wed, Apr 27, 2011 at 10:02 PM, Cee Joe <cyril_jose@ymail.com> wrote:
>> Would I use a do iterator or use a while loop with a gets method? Or
>> access again.
>> helpful. Thanks!
> One of the simplest approaches is to use Ruby's ability to use
> arbitrary record delimiters:
>
> File.foreach file_name, ">" do |chunk|
>   chunk.chomp! ">"
>   chunk.gsub! /\r\n?|\n/, '' # remove line terminators
>
Cee Joe, are you reading the file in binary mode or text mode?
--
Posted via http://www.ruby-forum.com/.
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-28 13:25 -0500 |
| Message-ID | <95426fd6f1235f8f1a26cd8383acff10@ruby-forum.com> |
| In reply to | #3644 |
7stud -- wrote in post #995589:
>
> Cee Joe, are you reading the file in binary mode or
> text mode?
If you don't know, then show us the line in your code where you open the file.
--
Posted via http://www.ruby-forum.com/.
| From | Cee Joe <cyril_jose@ymail.com> |
|---|---|
| Date | 2011-04-28 13:29 -0500 |
| Message-ID | <591a0a6013482db87dd9625b6033fe62@ruby-forum.com> |
| In reply to | #3646 |
7stud -- wrote in post #995596:
> 7stud -- wrote in post #995589:
>>
>> Cee Joe, are you reading the file in binary mode or
>> text mode?
>
> If you don't know, then show us the line in your code where you open the
> file.
f = File.open("test.fasta", "r")
Where test.fasta contains the entries i posted earlier..
--
Posted via http://www.ruby-forum.com/.
| From | Cee Joe <cyril_jose@ymail.com> |
|---|---|
| Date | 2011-04-28 09:06 -0500 |
| Message-ID | <0af64604aa1f5420f16889a3f19dbd0b@ruby-forum.com> |
| In reply to | #3574 |
Thanks guys for your helpful comments. I will be more descriptive. I am an intern and my mentor wants me to use the IO.pos to read the characters of the file until the character reaches the ">" symbol. So upon the cursor reaching the ">" symbol (which is the start of a new entry), he wants me to place that previous entry in a buffer. Here is the actual test file I am working with:

>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA\n
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG\n
\n
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA\n
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG\n
CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG\n
\n
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA\n
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAG\n
TTAGTCGCTGACGCATGCACG\n
\n

7stud, you are right, there are two consecutive newlines, which I failed to mention. This should be the output of a buffer for one entry:

>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA\n
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG <-- no "\n"
CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG <-- no "\n"

Notice how the newlines are gone. So with the exception of the header in each entry, the newlines should be gone and the entry placed in a buffer. I am lost on how to use the IO.pos and a file iterator to make sure each respective entry goes into a buffer without the whole file being read into memory.

Thanks in advance, I'm new to the language and trying to wrap my head around it.
--
Posted via http://www.ruby-forum.com/.
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-28 12:47 -0500 |
| Message-ID | <97869b15e6f881b951b4b230011182e0@ruby-forum.com> |
| In reply to | #3574 |
You still have not told us what you are supposed to do with the stuff you read in?? You can read a file line by line and print out each line as you go and the maximum amount of memory used will be one line's worth. However, if you are supposed to store all the lines in an array, then you will read the whole file into memory.
> Thanks guys for your helpful comments. I will be more
> descriptive. I am an intern and my mentor wants me to
> use the IO.pos to read the characters of the file
> until the character reaches the ">" symbol.
What problems is that giving you? You can create a loop, read the character at pos(i), then increment i, and do what Jesús Gabriel y Galán suggested.
--
Posted via http://www.ruby-forum.com/.
| From | Cee Joe <cyril_jose@ymail.com> |
|---|---|
| Date | 2011-04-28 13:27 -0500 |
| Message-ID | <a8bfb33ef4ce932d8afc29b01bb29252@ruby-forum.com> |
| In reply to | #3642 |
7stud -- wrote in post #995581:
> You still have not told us what you are supposed to do with the stuff
> you read in?? You can read a file line by line and print out each line
> as you go and the maximum amount of memory used will be one line's
> worth. However, if you are supposed to store all the lines in an
> array, then you will read the whole file into memory.
>
>> Thanks guys for your helpful comments. I will be more
>> descriptive. I am an intern and my mentor wants me to
>> use the IO.pos to read the characters of the file
>> until the character reaches the ">" symbol.
>
I am extracting text from each entry I read in, something I have figured out already. I want to read the file line by line and just store each entry into a buffer when it reaches the ">" symbol, then extract specific info from it later. The entry lengths all vary, as there are long and short ones. File is in text mode.
> What problems is that giving you? You can create a loop, read the
> character at pos(i), then increment i, and do what Jesús Gabriel y Galán
> suggested.
Could you show me a simple example or refer me to a link?
--
Posted via http://www.ruby-forum.com/.
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-28 18:31 -0500 |
| Message-ID | <c73bca63732c4f7e0943455cdc55a935@ruby-forum.com> |
| In reply to | #3647 |
Cee Joe wrote in post #995597:
>
> my mentor wants me to use the IO.pos to read the
> characters of the file until the character reaches the ">" symbol.
>
IO.pos() does not read in data, so you are going to have to ask your mentor what he means. You should also ask your mentor if this is a lesson in how not to do things. If he doesn't reply in the affirmative, then you should find a new mentor.
> I am extracting text from each entry I read in, something I have figured
> out already. I want to read the file line by line and just store each
> entry into a buffer when it reaches the ">" symbol. THen extract
> specific info from it later.
>
You told us you were not supposed to read the whole file into memory. If you store every line in an array, then you will have read the whole file into memory. Once again, you are not being clear on what you want to do with the data. You need to tell us which of the following you want to do:
1) Store every entry in an array, and "extract specific info from it later".
2) Read one entry, do something to the entry, then discard it and read in the next entry.
> The entry lengths all vary as there long
> and short lengths. File is in text mode.
>
Ok.
>> What problems is that giving you? You can create a loop, read the
>> character at pos(i), then increment i, and do what Jesús Gabriel y Galán
>> suggested.
>
You could use each_byte to read the file char by char (that assumes your file contains all ascii characters), then when you find a '>', seek() back to the start of the file, and use IO.sysread() to read:
old_pos = 0
pos() - old_pos
number of characters. Then do something like:
old_pos = pos()
and keep doing that. But, you will be reading every entry twice, which is stupid.
--
Posted via http://www.ruby-forum.com/.
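Purely as an illustration of where pos() could fit in, here is a sketch that records the byte offset of every header line and later seeks back to read a single entry. This is only one guess at what the mentor may have meant, not something stated in the thread; the helper names `entry_offsets` and `read_entry` and the sample data are hypothetical, and a StringIO stands in for the file.

```ruby
require 'stringio'

# Hypothetical helper: returns the byte offset of every line starting with '>'.
def entry_offsets(io)
  offsets = []
  loop do
    start = io.pos              # position before the line is read
    line = io.gets
    break if line.nil?          # end of input
    offsets << start if line.start_with?('>')
  end
  offsets
end

# Hypothetical helper: reads entry number i using the recorded offsets.
def read_entry(io, offsets, i)
  io.seek(offsets[i])
  next_start = offsets[i + 1]
  # read up to the next header, or to the end for the last entry
  next_start ? io.read(next_start - offsets[i]) : io.read
end

io = StringIO.new(">entry 1\nAAAA\nCC\n\n>entry 2\nGGGG\n")
offsets = entry_offsets(io)
p offsets
p read_entry(io, offsets, 1)
```

As noted above, this reads the data twice (once to index, once to fetch), so it only pays off when entries need to be revisited in arbitrary order; only the small array of offsets stays in memory.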
| From | Cee Joe <cyril_jose@ymail.com> |
|---|---|
| Date | 2011-04-28 20:05 -0500 |
| Message-ID | <859f1a6ec8eae502d91f42f274c1d8aa@ruby-forum.com> |
| In reply to | #3664 |
> 2) Read one entry, do something to the entry, then discard it and read
> in the next entry.
This is what I want to do. Read one entry, extract information from it,
then read next entry. He says using an array will take up a lot of
memory so he said use a buffer.
> But, you will end up reading every entry twice, which
> is stupid. The easiest way to read in the file and prepare each entry
> is to set the input separator to "\n\n", then use each() to read in a
> paragraph, then use split("\n") to split each entry into lines, then add
> back a \n to the first line.
>
> Also, are you aware that this:
>
>>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA\n
> GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG <-- no "\n"
> CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG <-- no "\n"
>
> is equivalent to:
>
>>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
>
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATGCGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
Yes I am aware of that - I just put "no \n" for emphasis. Regarding the
pos(), I think he said to use it as a guide to help with the detection
of each ">" . Thanks for being patient and helping out.
--
Posted via http://www.ruby-forum.com/.
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-28 21:58 -0500 |
| Message-ID | <1d5c91d93983445a08235be2797f7f0b@ruby-forum.com> |
| In reply to | #3574 |
If you don't have to use pos(), then see my first post.
--
Posted via http://www.ruby-forum.com/.
| From | Cee Joe <cyril_jose@ymail.com> |
|---|---|
| Date | 2011-04-29 10:20 -0500 |
| Message-ID | <8279ca181d004aab4664d4fe4e1fc33c@ruby-forum.com> |
| In reply to | #3670 |
7stud -- wrote in post #995683:
> If you don't have to use pos(), then see my first post. At some point,
> you might ask him why he thinks that pos() would be of any help at all!
Thanks jake and 7stud for replying. I tried this in irb for your first
post:
>> e = File.open("test/test.fasta").enum_for(:each)
=> #<Enumerable::Enumerator:0x1005777a8>
>> $/ = "\n\n"
=> "\n\n"
>Before doing the sub() on buffer2, you will have to check if it's nil:
>if buffer2.nil?
> #don't do a sub()
> else
> #do the sub()
>end
>> e.each_slice(2) do |buf1, buf2|
?> p buf1, buf2
>> if buf2.nil?
>> puts "Done"
>> else
?> buf2.sub(/\n+ \z/xms, "")
>> end
>> end
Output:
">gi|329299107|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895),
mRNA\nAGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG\n\n"
">gi|329299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895),
mRNA\nGTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG\nCGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG\n\n"
">gi|329299107|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895),
mRNA\nCGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAG\nTTAGTCGCTGACGCATGCACG\n"
nil
Done
=> nil
It still returns nil, am I doing what you suggested wrong?
--
Posted via http://www.ruby-forum.com/.
| From | jake kaiden <jakekaiden@yahoo.com> |
|---|---|
| Date | 2011-04-28 22:36 -0500 |
| Message-ID | <79209ef84d9aba12fbb512ee6f18e427@ruby-forum.com> |
| In reply to | #3574 |
hi Cee -
copying the text you posted above into the file "0text.txt" and
running this:
f = "0text.txt"
file = File.open(f)
buffer = []
bufferindex = 0
file.each_line(sep=">"){|line|
  buffer[bufferindex] = line.chomp
  bufferindex += 1
}
p buffer[0]
p buffer[1]
p buffer[2]
p buffer[3]
i get this as output:
#=> ">"
#=> "gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895),
mRNA\\n\nAGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG\\n\n\\n\n>"
#=> "gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895),
mRNA\\n\nGTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG\\n\nCGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG\\n\n\\n\n>"
#=> "gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895),
mRNA\\n\nCGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAG\\n\nTTAGTCGCTGACGCATGCACG\\n\n\\n"
does this work for you? you could easily write ways to deal with,
dump, and reset the buffers when they fill up. you can of course also
clean up all the "\n"'s...
i agree with 7stud that using #.pos and #.gets seems like a long walk
off a short pier. i'm pretty green myself, and there are probably
better ways to iterate through the file, but #.each_line(sep=">") works
just fine, and doesn't eat up memory.
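A rough sketch of the cleanup mentioned above: read with ">" as the separator, drop the separator, and split each chunk into a header and a newline-free sequence. The helper name `each_record` and the sample data are made up, and the sketch assumes ">" never appears inside a header or sequence line.

```ruby
require 'stringio'

# Hypothetical helper: yields [header, sequence] one entry at a time.
def each_record(io)
  io.each_line('>') do |chunk|
    chunk = chunk.chomp('>')       # drop the separator that ends the chunk
    next if chunk.strip.empty?     # the text before the first '>' is empty
    header, *rest = chunk.split(/\r?\n/)
    yield ">#{header}", rest.join  # join the sequence lines without newlines
  end
end

sample = StringIO.new(">id1 first\nAGCT\nGGCC\n\n>id2 second\nTTAA\n")
each_record(sample) { |header, seq| puts "#{header} -> #{seq}" }
```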
- j
--
Posted via http://www.ruby-forum.com/.
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-29 12:50 -0500 |
| Message-ID | <12ea664fa8ebe548db95a756061e6489@ruby-forum.com> |
| In reply to | #3574 |
The first thing everyone in this thread needs to realize is that '>' is
not the separator you want to look for. That's because you don't care
what character marks the beginning of every entry, rather you care what
character marks the end of every entry. The end of every entry is
marked by the string "\n\n", so you should use that as your input line
terminator. Remember, ruby uses "\n" for the input line separator by
default, which means that when you read a file using IO#each, ruby reads
lines--where the end of a line is marked by a newline. However, you can
change the input line separator to the string "\n\n" (or any other
string):
$/ = "\n\n"
Once you have an entry, then you just need to do a little housekeeping
and remove some "\n" characters.
require 'stringio'
str =<<ENDOFSTRING
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG

>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG
CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG

>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAG
TTAGTCGCTGACGCATGCACG
ENDOFSTRING
input = StringIO.new(str) #Now input is just like a File
input.each(sep = "\n\n") do |para|
  buffer = ''
  lines = para.split("\n")
  buffer << lines.shift << "\n"
  lines.each do |line|
    buffer << line
  end
  puts buffer
  puts "-" * 20
end
p $/
--output:--
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
--------------------
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATGCGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
--------------------
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAGTTAGTCGCTGACGCATGCACG
--------------------
"\n"
Note that specifying the new input line separator as an argument to
each() leaves the global input line separator $/ untouched (p $/ still
prints "\n" after the block has finished)--which is a good thing.
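A related sketch, not taken from the thread: passing an empty string as the separator puts each_line into paragraph mode, where two or more consecutive newlines end a record, so an extra blank line between entries does no harm. The sample data below is made up and a StringIO stands in for the file.

```ruby
require 'stringio'

# Made-up sample; note the three newlines after the first entry.
input = StringIO.new(">e1 header\nAAAA\nCCCC\n\n\n>e2 header\nGGGG\n")

records = []
input.each_line('') do |para|          # '' selects paragraph mode
  header, *seq = para.split("\n")      # trailing newlines are dropped by split
  records << "#{header}\n#{seq.join}"  # keep one newline after the header only
end
puts records
```

Like the explicit "\n\n" argument, this does not modify $/.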
--
Posted via http://www.ruby-forum.com/.
| From | Cee Joe <cyril_jose@ymail.com> |
|---|---|
| Date | 2011-04-29 13:32 -0500 |
| Message-ID | <ae08ac328a97a170cbe366bc7fd10a2c@ruby-forum.com> |
| In reply to | #3703 |
7stud -- wrote in post #995821:
> I suggest that people never use irb because it has too many quirks.
>
> The first thing you need to realize is that '>' is
> not the separator you want to look for. That is the second bit of
> erroneous advice your mentor gave you. That's because you don't care
> what character marks the beginning of every entry, rather you care what
> character marks the end of every entry. The end of every entry in your
> file is marked by the string "\n\n", so you should use that as your
> input line terminator. Remember, ruby uses "\n" for the input line
> separator by default, which means that when you read a file using
> IO#each, ruby reads lines--where the end of a line is marked by a
> newline.
I understand the logic, it makes sense. What if the file looked like
this, where there is one newline separating the entries? :
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG
CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAG
TTAGTCGCTGACGCATGCACG
Would an if-else (regarding "\n" and "\n\n") do the trick? I wanted to
write my code so that it would handle both scenarios. Or maybe:
case
when "\n\n"
<code>
when "\n"
<code>
end
something to that extent? Suggestions?
--
Posted via http://www.ruby-forum.com/.
| From | 7stud -- <bbxx789_05ss@yahoo.com> |
|---|---|
| Date | 2011-04-29 17:45 -0500 |
| Message-ID | <43b7442097d0943daf8ee458d7329f8c@ruby-forum.com> |
| In reply to | #3707 |
Cee Joe wrote in post #995830:
> 7stud -- wrote in post #995821:
>> I suggest that people never use irb because it has too many quirks.
>>
>> The first thing you need to realize is that '>' is
>> not the separator you want to look for. That is the second bit of
>> erroneous advice your mentor gave you. That's because you don't care
>> what character marks the beginning of every entry, rather you care what
>> character marks the end of every entry. The end of every entry in your
>> file is marked by the string "\n\n", so you should use that as your
>> input line terminator. Remember, ruby uses "\n" for the input line
>> separator by default, which means that when you read a file using
>> IO#each, ruby reads lines--where the end of a line is marked by a
>> newline.
>
> I understand the logic, it makes sense. What if the file looked like
> this, where there is one newline seperating the entries? :
What if you had presented that possibility from the very beginning?
require 'stringio'
str =<<ENDOFSTRING
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATG
CGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTT
AATAGCGCGCCATCTGAGCAG
TTAGTCGCTGACGCATGCACG
ENDOFSTRING
input = StringIO.new(str)
buffer = ''
input.each do |line|
  if line[0, 1] == '>'
    if buffer != ''
      puts buffer #or do something else to buffer
      puts '-' * 20
    end
    buffer = ''
    buffer << line
  else
    buffer << line.sub(/ \n+ \z /xms, '')
  end
end
puts buffer #or do something else to buffer
--output:--
>gi|329295464|ref|NM_2005745.3Acc1| Def1 zgc:65895 (zgc:65895), mRNA
AGCTCGGGGGCTCTAGCGATTTAAGGAGCGATGCGATCGAGCTGACCGTCGCG
--------------------
>gi|456299107|ref|NM_2342343.3Acc2| Def2 zgc:65895 (zgc:65895), mRNA
GTCGCTGGGTCGAAAAGTGGTGCTATATCGCGGCTCGCGTCGATGTCGCGATGCGTGCGCGCGAGAGCGCGCTATGATGAAAGGATGAGAGAG
--------------------
>gi|3542945647|ref|NM_7453343.5Acc3| Def3 zgc:65895 (zgc:65895), mRNA
CGTGCGGGGABCCGTACGTGCCGTGGGGGTTTAATAGCGCGCCATCTGAGCAGTTAGTCGCTGACGCATGCACG
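The same grouping can also be expressed with Enumerable#slice_before, which starts a new group at every line for which the block returns true. This is an alternative sketch rather than the code from the post; it assumes Ruby 1.9.2 or later and that the input begins with a '>' line, and the helper name `fasta_entries` and the sample data are hypothetical.

```ruby
require 'stringio'

# Hypothetical helper: returns an enumerator of line groups, one per entry.
# Groups are produced one at a time as the input is read.
def fasta_entries(io)
  io.each_line.slice_before { |line| line.start_with?('>') }
end

input = StringIO.new(">e1\nAAAA\nCCCC\n>e2\nGGGG\n\n>e3\nTT\n")
fasta_entries(input).each do |header, *seq_lines|
  buffer = header + seq_lines.map(&:chomp).join  # header keeps its newline
  puts buffer                                    # or do something else to buffer
  puts '-' * 20
end
```

Blank lines between entries end up in `seq_lines` as "\n" and vanish after chomp, so this handles both the one-newline and two-newline layouts.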
--
Posted via http://www.ruby-forum.com/.