| Started by | Yigit Turgut <y.turgut@gmail.com> |
|---|---|
| First post | 2012-01-22 06:32 -0800 |
| Last post | 2012-01-22 20:55 +0000 |
| Articles | 15 — 6 participants |
Splitting a file from specific column content Yigit Turgut <y.turgut@gmail.com> - 2012-01-22 06:32 -0800
Re: Splitting a file from specific column content Roy Smith <roy@panix.com> - 2012-01-22 09:45 -0500
Re: Splitting a file from specific column content Roy Smith <roy@panix.com> - 2012-01-22 14:26 -0500
Re: Splitting a file from specific column content Tim Chase <python.list@tim.thechases.com> - 2012-01-22 13:34 -0600
Re: Splitting a file from specific column content Roy Smith <roy@panix.com> - 2012-01-22 14:37 -0500
Re: Splitting a file from specific column content Yigit Turgut <y.turgut@gmail.com> - 2012-01-22 12:16 -0800
Re: Splitting a file from specific column content MRAB <python@mrabarnett.plus.com> - 2012-01-22 15:19 +0000
Re: Splitting a file from specific column content Arnaud Delobelle <arnodel@gmail.com> - 2012-01-22 15:39 +0000
Re: Splitting a file from specific column content Yigit Turgut <y.turgut@gmail.com> - 2012-01-22 08:17 -0800
Re: Splitting a file from specific column content MRAB <python@mrabarnett.plus.com> - 2012-01-22 16:56 +0000
Re: Splitting a file from specific column content Yigit Turgut <y.turgut@gmail.com> - 2012-01-22 09:47 -0800
Re: Splitting a file from specific column content Eelco <hoogendoorn.eelco@gmail.com> - 2012-01-22 12:43 -0800
Re: Splitting a file from specific column content MRAB <python@mrabarnett.plus.com> - 2012-01-22 16:09 +0000
Re: Splitting a file from specific column content Arnaud Delobelle <arnodel@gmail.com> - 2012-01-22 19:58 +0000
Re: Splitting a file from specific column content MRAB <python@mrabarnett.plus.com> - 2012-01-22 20:55 +0000
| From | Yigit Turgut <y.turgut@gmail.com> |
|---|---|
| Date | 2012-01-22 06:32 -0800 |
| Subject | Splitting a file from specific column content |
| Message-ID | <e1f0636a-195c-4fbb-931a-4d619d5f0d18@g27g2000yqa.googlegroups.com> |
Hi all,

I have a text file approximately 20 MB in size that contains about one
million lines. I was doing some processing on the data, but then the
data rate increased and it now takes a very long time to process. I import
it using numpy.loadtxt; here is a fragment of the data:

0.000006 -0.0004
0.000071 0.0028
0.000079 0.0044
0.000086 0.0104
.
.
.

The first column is the timestamp in seconds and the second column is the
data. The file contains 8 seconds of measurement, and I would like to
split it into 3 parts separated at specific time locations. For example,
I want the first part to contain 3 seconds of data, the second 2 seconds
and the third 3 seconds. Splitting based on file size doesn't work
accurately for this data; some columns go missing, etc. I need to split
depending on the column content:

1 - read the file until the first character of column 1 is 3 (3 seconds)
2 - save this region to another file
3 - read the file where the first character of column 1 is between 3 and 5 (2 seconds)
4 - save this region to another file
5 - read the file where the first character of column 1 is between 5 and 8 (3 seconds)
6 - save this region to another file

I need to do this exactly because numpy.loadtxt and genfromtxt don't cope
well with missing columns / rows. I even tried the invalid_raise
parameter of genfromtxt, but no luck.

I am sure it's a few lines of code for experienced users, and I would
appreciate some guidance.
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-01-22 09:45 -0500 |
| Message-ID | <roy-125359.09450822012012@news.panix.com> |
| In reply to | #19213 |
In article <e1f0636a-195c-4fbb-931a-4d619d5f0d18@g27g2000yqa.googlegroups.com>,
Yigit Turgut <y.turgut@gmail.com> wrote:

> Hi all,
>
> I have a text file approximately 20mb in size and contains about one
> million lines. I was doing some processing on the data but then the
> data rate increased and it takes very long time to process. I import
> using numpy.loadtxt, here is a fragment of the data ;
>
> 0.000006 -0.0004
> 0.000071 0.0028
> 0.000079 0.0044
> 0.000086 0.0104
> .
> .
> .
>
> First column is the timestamp in seconds and second column is the
> data. File contains 8seconds of measurement, and I would like to be
> able to split the file into 3 parts seperated from specific time
> locations. For example I want to divide the file into 3 parts, first
> part containing 3 seconds of data, second containing 2 seconds of data
> and third containing 3 seconds.

I would do this with standard unix tools:

grep '^[012]' input.txt > first-three-seconds.txt
grep '^[34]' input.txt > next-two-seconds.txt
grep '^[567]' input.txt > next-three-seconds.txt

Sure, it makes three passes over the data, but for 20 MB of data, you
could have the whole job done in less time than it took me to type this.

As a sanity check, I would run "wc -l" on each of the files and confirm
that they add up to the original line count.
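
For readers who want the same idea in Python, here is a minimal sketch of the three prefix filters plus the line-count check. The helper name split_by_prefix, the bucket names and the sample lines are invented for illustration; it assumes the timestamp starts in the first column of each line. Any line whose first character is not one of 0-7 (a leading space, a sign, a timestamp of 8.0 or later) matches none of the three patterns, so the sketch collects such lines instead of dropping them.

```python
def split_by_prefix(lines, groups):
    """Bucket lines by their first character (hypothetical helper).

    groups maps a bucket name to the set of leading characters it accepts.
    Returns (buckets, unmatched) so that nothing is silently dropped.
    """
    buckets = {name: [] for name in groups}
    unmatched = []
    for line in lines:
        for name, prefixes in groups.items():
            if line[:1] in prefixes:
                buckets[name].append(line)
                break
        else:
            # No pattern matched: keep the line for inspection.
            unmatched.append(line)
    return buckets, unmatched


# Same character classes as the three grep patterns.
groups = {
    "first-three-seconds": set("012"),
    "next-two-seconds": set("34"),
    "next-three-seconds": set("567"),
}
sample = [
    "0.000006 -0.0004\n",
    "3.100000 0.0028\n",
    "5.200000 0.0044\n",
    "8.000001 0.0104\n",
]
buckets, unmatched = split_by_prefix(sample, groups)

# Equivalent of the "wc -l" check: every input line is accounted for.
total = sum(len(b) for b in buckets.values()) + len(unmatched)
```

If unmatched is non-empty, the three output files will not add up to the original line count, which is exactly what the "wc -l" check is meant to reveal.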
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-01-22 14:26 -0500 |
| Message-ID | <mailman.4933.1327260402.27778.python-list@python.org> |
| In reply to | #19214 |
I stand humbled.

On Jan 22, 2012, at 2:25 PM, Tim Chase wrote:

> On 01/22/12 08:45, Roy Smith wrote:
>> I would do this with standard unix tools:
>>
>> grep '^[012]' input.txt > first-three-seconds.txt
>> grep '^[34]' input.txt > next-two-seconds.txt
>> grep '^[567]' input.txt > next-three-seconds.txt
>>
>> Sure, it makes three passes over the data, but for 20 MB of data, you
>> could have the whole job done in less time than it took me to type this.
>
> If you wanted to do it in one pass using standard unix tools, you can use:
>
> sed -n -e'/^[0-2]/w first-three.txt' -e'/^[34]/w next-two.txt' -e'/^[5-7]/w next-three.txt'
>
> -tkc

--
Roy Smith
roy@panix.com
| From | Tim Chase <python.list@tim.thechases.com> |
|---|---|
| Date | 2012-01-22 13:34 -0600 |
| Message-ID | <mailman.4934.1327260861.27778.python-list@python.org> |
| In reply to | #19214 |
On 01/22/12 13:26, Roy Smith wrote:
>> If you wanted to do it in one pass using standard unix
>> tools, you can use:
>>
>> sed -n -e'/^[0-2]/w first-three.txt' -e'/^[34]/w
>> next-two.txt' -e'/^[5-7]/w next-three.txt'
>>
> I stand humbled.

In all likelihood, you stand *younger*, not so much humbled ;-)

-tkc
| From | Roy Smith <roy@panix.com> |
|---|---|
| Date | 2012-01-22 14:37 -0500 |
| Message-ID | <mailman.4935.1327261051.27778.python-list@python.org> |
| In reply to | #19214 |
On Jan 22, 2012, at 2:34 PM, Tim Chase wrote:

> On 01/22/12 13:26, Roy Smith wrote:
>>> If you wanted to do it in one pass using standard unix
>>> tools, you can use:
>>>
>>> sed -n -e'/^[0-2]/w first-three.txt' -e'/^[34]/w
>>> next-two.txt' -e'/^[5-7]/w next-three.txt'
>>>
>> I stand humbled.
>
> In all likelihood, you stand *younger*, not so much humbled ;-)

Oh, yeah? That must explain my grey hair and bifocals. I go back to
Unix v6 in 1977. Humbled it is.

--
Roy Smith
roy@panix.com
| From | Yigit Turgut <y.turgut@gmail.com> |
|---|---|
| Date | 2012-01-22 12:16 -0800 |
| Message-ID | <61ba8425-a793-4141-adf9-212cc01233f5@cf6g2000vbb.googlegroups.com> |
| In reply to | #19230 |
On Jan 22, 9:37 pm, Roy Smith <r...@panix.com> wrote:
> On Jan 22, 2012, at 2:34 PM, Tim Chase wrote:
>
> > On 01/22/12 13:26, Roy Smith wrote:
> >>> If you wanted to do it in one pass using standard unix
> >>> tools, you can use:
>
> >>> sed -n -e'/^[0-2]/w first-three.txt' -e'/^[34]/w
> >>> next-two.txt' -e'/^[5-7]/w next-three.txt'
>
> >> I stand humbled.
>
> > In all likelihood, you stand *younger*, not so much humbled ;-)
>
> Oh, yeah? That must explain my grey hair and bifocals. I go back to
> Unix v6 in 1977. Humbled it is.

Those times were much better IMHO (:
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2012-01-22 15:19 +0000 |
| Message-ID | <mailman.4923.1327245574.27778.python-list@python.org> |
| In reply to | #19213 |
On 22/01/2012 14:32, Yigit Turgut wrote:
> Hi all,
>
> I have a text file approximately 20mb in size and contains about one
> million lines. I was doing some processing on the data but then the
> data rate increased and it takes very long time to process. I import
> using numpy.loadtxt, here is a fragment of the data ;
>
> 0.000006 -0.0004
> 0.000071 0.0028
> 0.000079 0.0044
> 0.000086 0.0104
> .
> .
> .
>
> First column is the timestamp in seconds and second column is the
> data. File contains 8seconds of measurement, and I would like to be
> able to split the file into 3 parts seperated from specific time
> locations. For example I want to divide the file into 3 parts, first
> part containing 3 seconds of data, second containing 2 seconds of data
> and third containing 3 seconds. Splitting based on file size doesn't
> work that accurately for this specific data, some columns become
> missing and etc. I need to split depending on the column content ;
>
> 1 - read file until first character of column1 is 3 (3 seconds)
> 2 - save this region to another file
> 3 - read the file where first characters of column1 are between 3 to
> 5 (2 seconds)
> 4 - save this region to another file
> 5 - read the file where first characters of column1 are between 5 to
> 5 (3 seconds)
> 6 - save this region to another file
>
> I need to do this exactly because numpy.loadtxt or genfromtxt doesn't
> get well with missing columns / rows. I even tried the invalidraise
> parameter of genfromtxt but no luck.
>
> I am sure it's a few lines of code for experienced users and I would
> appreciate some guidance.
>
Here's a solution in Python 3:
input_path = "..."
section_1_path = "..."
section_2_path = "..."
section_3_path = "..."

with open(input_path) as input_file:
    try:
        line = next(input_file)

        # Copy section 1.
        with open(section_1_path, "w") as output_file:
            while line[0] < "3":
                output_file.write(line)
                line = next(input_file)

        # Copy section 2.
        with open(section_2_path, "w") as output_file:
            while line[5] < "5":
                output_file.write(line)
                line = next(input_file)

        # Copy section 3.
        with open(section_3_path, "w") as output_file:
            while True:
                output_file.write(line)
                line = next(input_file)
    except StopIteration:
        pass
| From | Arnaud Delobelle <arnodel@gmail.com> |
|---|---|
| Date | 2012-01-22 15:39 +0000 |
| Message-ID | <mailman.4924.1327246798.27778.python-list@python.org> |
| In reply to | #19213 |
On 22 January 2012 15:19, MRAB <python@mrabarnett.plus.com> wrote:
> Here's a solution in Python 3:
>
> input_path = "..."
> section_1_path = "..."
> section_2_path = "..."
> section_3_path = "..."
>
> with open(input_path) as input_file:
>     try:
>         line = next(input_file)
>
>         # Copy section 1.
>         with open(section_1_path, "w") as output_file:
>             while line[0] < "3":
>                 output_file.write(line)
>                 line = next(input_file)
>
>         # Copy section 2.
>         with open(section_2_path, "w") as output_file:
>             while line[5] < "5":
>                 output_file.write(line)
>                 line = next(input_file)
>
>         # Copy section 3.
>         with open(section_3_path, "w") as output_file:
>             while True:
>                 output_file.write(line)
>                 line = next(input_file)
>     except StopIteration:
>         pass
> --
> http://mail.python.org/mailman/listinfo/python-list
Or more succinctly (but not tested):

sections = [
    ("3", "section_1")
    ("5", "section_2")
    ("\xFF", "section_3")
]

with open(input_path) as input_file:
    lines = iter(input_file)
    for end, path in sections:
        with open(path, "w") as output_file:
            for line in lines:
                if line >= end:
                    break
                output_file.write(line)

--
Arnaud
| From | Yigit Turgut <y.turgut@gmail.com> |
|---|---|
| Date | 2012-01-22 08:17 -0800 |
| Message-ID | <849e46d1-b3bb-481d-8a8e-17cb51b0523f@cf6g2000vbb.googlegroups.com> |
| In reply to | #19216 |
On Jan 22, 4:45 pm, Roy Smith <r...@panix.com> wrote:
> In article
> <e1f0636a-195c-4fbb-931a-4d619d5f0...@g27g2000yqa.googlegroups.com>,
> Yigit Turgut <y.tur...@gmail.com> wrote:
> > Hi all,
>
> > I have a text file approximately 20mb in size and contains about one
> > million lines. I was doing some processing on the data but then the
> > data rate increased and it takes very long time to process. I import
> > using numpy.loadtxt, here is a fragment of the data ;
>
> > 0.000006 -0.0004
> > 0.000071 0.0028
> > 0.000079 0.0044
> > 0.000086 0.0104
> > .
> > .
> > .
>
> > First column is the timestamp in seconds and second column is the
> > data. File contains 8seconds of measurement, and I would like to be
> > able to split the file into 3 parts seperated from specific time
> > locations. For example I want to divide the file into 3 parts, first
> > part containing 3 seconds of data, second containing 2 seconds of data
> > and third containing 3 seconds.
>
> I would do this with standard unix tools:
>
> grep '^[012]' input.txt > first-three-seconds.txt
> grep '^[34]' input.txt > next-two-seconds.txt
> grep '^[567]' input.txt > next-three-seconds.txt
>
> Sure, it makes three passes over the data, but for 20 MB of data, you
> could have the whole job done in less time than it took me to type this.
>
> As a sanity check, I would run "wc -l" on each of the files and confirm
> that they add up to the original line count.
This works and is very fast but it missed a few hundred lines
unfortunately.
On Jan 22, 5:19 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 22/01/2012 14:32, Yigit Turgut wrote:
> > Hi all,
>
> > I have a text file approximately 20mb in size and contains about one
> > million lines. I was doing some processing on the data but then the
> > data rate increased and it takes very long time to process. I import
> > using numpy.loadtxt, here is a fragment of the data ;
>
> > 0.000006 -0.0004
> > 0.000071 0.0028
> > 0.000079 0.0044
> > 0.000086 0.0104
> > .
> > .
> > .
>
> > First column is the timestamp in seconds and second column is the
> > data. File contains 8seconds of measurement, and I would like to be
> > able to split the file into 3 parts seperated from specific time
> > locations. For example I want to divide the file into 3 parts, first
> > part containing 3 seconds of data, second containing 2 seconds of data
> > and third containing 3 seconds. Splitting based on file size doesn't
> > work that accurately for this specific data, some columns become
> > missing and etc. I need to split depending on the column content ;
>
> > 1 - read file until first character of column1 is 3 (3 seconds)
> > 2 - save this region to another file
> > 3 - read the file where first characters of column1 are between 3 to
> > 5 (2 seconds)
> > 4 - save this region to another file
> > 5 - read the file where first characters of column1 are between 5 to
> > 5 (3 seconds)
> > 6 - save this region to another file
>
> > I need to do this exactly because numpy.loadtxt or genfromtxt doesn't
> > get well with missing columns / rows. I even tried the invalidraise
> > parameter of genfromtxt but no luck.
>
> > I am sure it's a few lines of code for experienced users and I would
> > appreciate some guidance.
>
> Here's a solution in Python 3:
>
> input_path = "..."
> section_1_path = "..."
> section_2_path = "..."
> section_3_path = "..."
>
> with open(input_path) as input_file:
>     try:
>         line = next(input_file)
>
>         # Copy section 1.
>         with open(section_1_path, "w") as output_file:
>             while line[0] < "3":
>                 output_file.write(line)
>                 line = next(input_file)
>
>         # Copy section 2.
>         with open(section_2_path, "w") as output_file:
>             while line[5] < "5":
>                 output_file.write(line)
>                 line = next(input_file)
>
>         # Copy section 3.
>         with open(section_3_path, "w") as output_file:
>             while True:
>                 output_file.write(line)
>                 line = next(input_file)
>     except StopIteration:
>         pass

With the following correction:

while line[5] < "5":

should be

while line[0] < "5":

This works well.

On Jan 22, 5:39 pm, Arnaud Delobelle <arno...@gmail.com> wrote:
> On 22 January 2012 15:19, MRAB <pyt...@mrabarnett.plus.com> wrote:
> > Here's a solution in Python 3:
>
> > input_path = "..."
> > section_1_path = "..."
> > section_2_path = "..."
> > section_3_path = "..."
>
> > with open(input_path) as input_file:
> >     try:
> >         line = next(input_file)
>
> >         # Copy section 1.
> >         with open(section_1_path, "w") as output_file:
> >             while line[0] < "3":
> >                 output_file.write(line)
> >                 line = next(input_file)
>
> >         # Copy section 2.
> >         with open(section_2_path, "w") as output_file:
> >             while line[5] < "5":
> >                 output_file.write(line)
> >                 line = next(input_file)
>
> >         # Copy section 3.
> >         with open(section_3_path, "w") as output_file:
> >             while True:
> >                 output_file.write(line)
> >                 line = next(input_file)
> >     except StopIteration:
> >         pass
> > --
> >http://mail.python.org/mailman/listinfo/python-list
>
> Or more succinctly (but not tested):
>
> sections = [
>     ("3", "section_1")
>     ("5", "section_2")
>     ("\xFF", "section_3")
> ]
>
> with open(input_path) as input_file:
>     lines = iter(input_file)
>     for end, path in sections:
>         with open(path, "w") as output_file:
>             for line in lines:
>                 if line >= end:
>                     break
>                 output_file.write(line)
>
> --
> Arnaud
Good idea. Especially when dealing with variable numbers of sections.
But somehow I got ;
("5", "section_2")
TypeError: 'tuple' object is not callable
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2012-01-22 16:56 +0000 |
| Message-ID | <mailman.4928.1327251361.27778.python-list@python.org> |
| In reply to | #19220 |
On 22/01/2012 16:17, Yigit Turgut wrote:
[snip]
> On Jan 22, 5:39 pm, Arnaud Delobelle<arno...@gmail.com> wrote:
[snip]
>> Or more succinctly (but not tested):
>>
>> sections = [
>>     ("3", "section_1")
>>     ("5", "section_2")
>>     ("\xFF", "section_3")
>> ]
>>
>> with open(input_path) as input_file:
>>     lines = iter(input_file)
>>     for end, path in sections:
>>         with open(path, "w") as output_file:
>>             for line in lines:
>>                 if line >= end:
>>                     break
>>                 output_file.write(line)
>>
>> --
>> Arnaud
>
> Good idea. Especially when dealing with variable numbers of sections.
> But somehow I got ;
>
> ("5", "section_2")
> TypeError: 'tuple' object is not callable
>
That's due to missing commas:

sections = [
    ("3", "section_1"),
    ("5", "section_2"),
    ("\xFF", "section_3")
]
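
The reason the error mentions a callable: without a comma between them, Python reads two adjacent parenthesized groups as a call, so ("3", "section_1") ("5", "section_2") means "call the first tuple with two arguments". A small sketch that reproduces the message (the variable names first and message are only for illustration; the call is written out explicitly so the snippet does not depend on how a given Python version reports the literal form):

```python
first = ("3", "section_1")

try:
    # This is what `("3", "section_1") ("5", "section_2")` amounts to:
    # calling the first tuple with the contents of the second as arguments.
    first("5", "section_2")
except TypeError as exc:
    message = str(exc)

# With the commas in place, the list holds separate tuples.
sections = [
    ("3", "section_1"),
    ("5", "section_2"),
]
```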
| From | Yigit Turgut <y.turgut@gmail.com> |
|---|---|
| Date | 2012-01-22 09:47 -0800 |
| Message-ID | <729dc0f2-6bb8-4331-a13c-1cb5924519e4@o14g2000vbo.googlegroups.com> |
| In reply to | #19223 |
On Jan 22, 6:56 pm, MRAB <pyt...@mrabarnett.plus.com> wrote:
> On 22/01/2012 16:17, Yigit Turgut wrote:
> [snip]
>
> > On Jan 22, 5:39 pm, Arnaud Delobelle<arno...@gmail.com> wrote:
> [snip]
> >> Or more succinctly (but not tested):
>
> >> sections = [
> >>     ("3", "section_1")
> >>     ("5", "section_2")
> >>     ("\xFF", "section_3")
> >> ]
>
> >> with open(input_path) as input_file:
> >>     lines = iter(input_file)
> >>     for end, path in sections:
> >>         with open(path, "w") as output_file:
> >>             for line in lines:
> >>                 if line >= end:
> >>                     break
> >>                 output_file.write(line)
>
> >> --
> >> Arnaud
>
> > Good idea. Especially when dealing with variable numbers of sections.
> > But somehow I got ;
>
> > ("5", "section_2")
> > TypeError: 'tuple' object is not callable
>
> That's due to missing commas:
>
> sections = [
>     ("3", "section_1"),
>     ("5", "section_2"),
>     ("\xFF", "section_3")
> ]

Thank you.
| From | Eelco <hoogendoorn.eelco@gmail.com> |
|---|---|
| Date | 2012-01-22 12:43 -0800 |
| Message-ID | <d3725ab2-2a8f-43fc-9381-7bbba30510ac@k6g2000vbz.googlegroups.com> |
| In reply to | #19216 |
The grep solution is not cross-platform, and not really an answer to a
question about python.
The by-line iteration examples are inefficient and bad practice from a
numpy/vectorization perspective.
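I would advise doing it the numpythonic way (untested code):

import numpy as np

# Two breakpoints give three chunks: t < 3, 3 <= t < 5, t >= 5.
breakpoints = [3, 5]
data = np.loadtxt('data.txt')
time = data[:,0]
# Index of the first row at or after each breakpoint (time must be sorted).
indices = np.searchsorted(time, breakpoints)
chunks = np.split(data, indices, axis=0)
for i, d in enumerate(chunks):
    np.savetxt('data'+str(i)+'.txt', d)
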
Not sure how it compares to the grep solution in terms of performance,
but that should be quite a non-issue for 20 MB of data, and it's sure to
blow the by-line iteration out of the water. If you want to be more
efficient, you are going to have to cut the text-to-numeric parsing
out of the loop, which is the vast majority of the computational load
here; but whether that's possible at all depends on how structured your
timestamps are. There must be a really compelling performance gain to
justify throwing the elegance of the np.split based solution out of
the window, in my opinion.
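
To make the index logic concrete without needing numpy installed, here is a standard-library sketch of the same idea. bisect.bisect_left plays the role of np.searchsorted (whose default is also the left insertion point) and list slicing stands in for np.split. The helper name split_at and the sample rows are invented for illustration, and the timestamps are assumed to be sorted.

```python
from bisect import bisect_left


def split_at(rows, breakpoints):
    """Split time-sorted (time, value) rows at the given times (hypothetical helper)."""
    times = [t for t, _ in rows]
    # First index whose timestamp is >= each breakpoint.
    indices = [bisect_left(times, b) for b in breakpoints]
    # Consecutive pairs of bounds delimit the chunks.
    bounds = [0] + indices + [len(rows)]
    return [rows[lo:hi] for lo, hi in zip(bounds, bounds[1:])]


rows = [(0.5, 0.1), (2.9, 0.2), (3.0, 0.3), (4.7, 0.4), (5.0, 0.5), (7.9, 0.6)]
chunks = split_at(rows, [3, 5])
sizes = [len(chunk) for chunk in chunks]
```

Note that n breakpoints always produce n + 1 chunks, so two breakpoints are enough for a three-way split.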
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2012-01-22 16:09 +0000 |
| Message-ID | <mailman.4926.1327248722.27778.python-list@python.org> |
| In reply to | #19213 |
On 22/01/2012 15:39, Arnaud Delobelle wrote:
> On 22 January 2012 15:19, MRAB<python@mrabarnett.plus.com> wrote:
>
>> Here's a solution in Python 3:
>>
>> input_path = "..."
>> section_1_path = "..."
>> section_2_path = "..."
>> section_3_path = "..."
>>
>> with open(input_path) as input_file:
>>     try:
>>         line = next(input_file)
>>
>>         # Copy section 1.
>>         with open(section_1_path, "w") as output_file:
>>             while line[0] < "3":
>>                 output_file.write(line)
>>                 line = next(input_file)
>>
>>         # Copy section 2.
>>         with open(section_2_path, "w") as output_file:
>>             while line[5] < "5":
>>                 output_file.write(line)
>>                 line = next(input_file)
>>
>>         # Copy section 3.
>>         with open(section_3_path, "w") as output_file:
>>             while True:
>>                 output_file.write(line)
>>                 line = next(input_file)
>>     except StopIteration:
>>         pass
>> --
>> http://mail.python.org/mailman/listinfo/python-list
>
> Or more succinctly (but not tested):
>
> sections = [
>     ("3", "section_1")
>     ("5", "section_2")
>     ("\xFF", "section_3")
> ]
>
> with open(input_path) as input_file:
>     lines = iter(input_file)
>     for end, path in sections:
>         with open(path, "w") as output_file:
>             for line in lines:
>                 if line >= end:
>                     break
>                 output_file.write(line)
>
Consider the condition "line >= end".
If it's true, then control will break out of the inner loop and start
the inner loop again, getting the next line.
But what of the line which caused it to break out? It'll be lost.
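
The effect is easy to reproduce on a small in-memory list. In this sketch the sample lines are invented, and the sections are collected into lists rather than written to files:

```python
sample = ["1.0 a\n", "2.0 b\n", "3.0 c\n", "4.0 d\n", "5.0 e\n", "6.0 f\n"]
ends = ["3", "5", "\xFF"]

lines = iter(sample)
collected = []
for end in ends:
    section = []
    for line in lines:
        if line >= end:
            # `line` has already been consumed from the iterator here,
            # and nothing stores it before the next section starts.
            break
        section.append(line)
    collected.append(section)

kept = [line for section in collected for line in section]
lost = [line for line in sample if line not in kept]
```

Each boundary line ("3.0 c" and "5.0 e" in this sample) is pulled from the iterator by the inner loop and then discarded by the break.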
| From | Arnaud Delobelle <arnodel@gmail.com> |
|---|---|
| Date | 2012-01-22 19:58 +0000 |
| Message-ID | <mailman.4936.1327262289.27778.python-list@python.org> |
| In reply to | #19213 |
On 22 January 2012 16:09, MRAB <python@mrabarnett.plus.com> wrote:
> On 22/01/2012 15:39, Arnaud Delobelle wrote:
[...]
>> Or more succinctly (but not tested):
>>
>>
>> sections = [
>>     ("3", "section_1")
>>     ("5", "section_2")
>>     ("\xFF", "section_3")
>> ]
>>
>> with open(input_path) as input_file:
>>     lines = iter(input_file)
>>     for end, path in sections:
>>         with open(path, "w") as output_file:
>>             for line in lines:
>>                 if line >= end:
>>                     break
>>                 output_file.write(line)
>>
> Consider the condition "line >= end".
>
> If it's true, then control will break out of the inner loop and start
> the inner loop again, getting the next line.
>
> But what of the line which caused it to break out? It'll be lost.
Of course you're correct - my reply was too rushed. Here's a
hopefully working version (but still untested :).

sections = [
    ("3", "section_1")
    ("5", "section_2")
    ("\xFF", "section_3")
]

with open(input_path) as input_file:
    line, lines = "", iter(input_file)
    for end, path in sections:
        with open(path, "w") as output_file:
            output_file.write(line)
            for line in lines:
                if line >= end:
                    break
                output_file.write(line)

--
Arnaud
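
For anyone who wants a variant that has actually been run, below is a sketch of the same data-driven idea, restructured rather than copied from the post above. It compares timestamps as floats, because a string comparison such as line >= "3" is lexicographic and would sort "10.0" before "3". It also uses a single pass with a section index, so a boundary line is neither lost nor written twice when the input ends early. The function name split_file, the sample data and the temporary paths are all made up for illustration, and every line is assumed to start with a numeric timestamp.

```python
import os
import tempfile
from contextlib import ExitStack


def split_file(input_path, sections):
    """Copy each line to the first section whose end time exceeds its timestamp.

    sections: list of (end_time, output_path) pairs in ascending order;
    use float("inf") as the last end time. Hypothetical helper.
    """
    with ExitStack() as stack:
        input_file = stack.enter_context(open(input_path))
        outputs = [stack.enter_context(open(path, "w")) for _, path in sections]
        index = 0
        for line in input_file:
            timestamp = float(line.split()[0])
            # Advance to the section that this timestamp belongs to.
            while timestamp >= sections[index][0]:
                index += 1
            outputs[index].write(line)


with tempfile.TemporaryDirectory() as tmp:
    input_path = os.path.join(tmp, "input.txt")
    with open(input_path, "w") as f:
        f.write(
            "0.000006 -0.0004\n"
            "2.999999 0.0028\n"
            "3.000001 0.0044\n"
            "5.000000 0.0104\n"
            "7.500000 0.0001\n"
        )
    sections = [
        (3.0, os.path.join(tmp, "section_1")),
        (5.0, os.path.join(tmp, "section_2")),
        (float("inf"), os.path.join(tmp, "section_3")),
    ]
    split_file(input_path, sections)

    # Line counts per output file, as a sanity check.
    counts = []
    for _, path in sections:
        with open(path) as f:
            counts.append(len(f.readlines()))
```

A blank or malformed line would raise an exception in this sketch; whether to skip or report such lines depends on the data.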
| From | MRAB <python@mrabarnett.plus.com> |
|---|---|
| Date | 2012-01-22 20:55 +0000 |
| Message-ID | <mailman.4940.1327265707.27778.python-list@python.org> |
| In reply to | #19213 |
On 22/01/2012 19:58, Arnaud Delobelle wrote:
> On 22 January 2012 16:09, MRAB<python@mrabarnett.plus.com> wrote:
>> On 22/01/2012 15:39, Arnaud Delobelle wrote:
> [...]
>>> Or more succinctly (but not tested):
>>>
>>>
>>> sections = [
>>>     ("3", "section_1")
>>>     ("5", "section_2")
>>>     ("\xFF", "section_3")
>>> ]
>>>
>>> with open(input_path) as input_file:
>>>     lines = iter(input_file)
>>>     for end, path in sections:
>>>         with open(path, "w") as output_file:
>>>             for line in lines:
>>>                 if line >= end:
>>>                     break
>>>                 output_file.write(line)
>>>
>> Consider the condition "line>= end".
>>
>> If it's true, then control will break out of the inner loop and start
>> the inner loop again, getting the next line.
>>
>> But what of the line which caused it to break out? It'll be lost.
>
> Of course you're correct - my reply was too rushed. Here's a
> hopefully working version (but still untested :).
>
> sections = [
>     ("3", "section_1")
>     ("5", "section_2")
>     ("\xFF", "section_3")
> ]
>
[snip]
Missing commas! :-)