| Newsgroups | comp.lang.python |
|---|---|
| Date | 2012-09-16 21:39 -0700 |
| References | <CEE8C35195DB944D9C75ABB15A04193B14E77085@EHKG17P32001A.csfb.cs-group.com> <5056871E.7050206@davea.name> <mailman.818.1347849124.27098.python-list@python.org> <59f8c664-8f11-439e-8002-ca76ee24a632@g7g2000pbh.googlegroups.com> |
| Message-ID | <f0370abf-303a-47b7-81ea-a3d8e4f012bc@googlegroups.com> (permalink) |
| Subject | Re: Python garbage collector/memory manager behaving strangely |
| From | 88888 Dihedral <dihedral88888@googlemail.com> |
On Monday, September 17, 2012 at 11:25:06 AM UTC+8, alex23 wrote:
> On Sep 17, 12:32 pm, "Jadhav, Alok" <alok.jad...@credit-suisse.com> wrote:
>
> > - As you have seen, the line separator is not '\n' but '|\n'.
> > Sometimes the data itself has '\n' characters in the middle of the line,
> > and the only way to find the true end of the line is that the previous
> > character should be a bar '|'. I was not able to specify the end of line
> > using the readlines() function, but I could do it using the split()
> > function. (One hack would be to readlines and combine them until I find
> > '|\n'. Is there a cleaner way to do this?)
>
>
>
> You can use a generator to take care of your readlines requirements:
>
>
>
> def readlines(f):
>     lines = []
>     while True:
>         line = f.readline()
>         if not line:
>             break  # readline() returns '' at end of file
>         lines.append(line)
>         # A physical line ending in '|\n' closes the logical line.
>         if line.endswith('|\n'):
>             yield ''.join(lines)
>             lines = []
>
>
>
> > - Reading the whole file at once and processing it line by line was much
> > faster. Speed is not a very important issue here, but I think the
> > time it took to parse the complete file was reduced to one third of the
> > original time.
>
>
>
> With the readlines generator above, it'll read lines from the file
> until it has a complete "line" by your requirement, at which point
> it'll yield it. If you don't need the entire file in memory for the
> end result, you'll be able to process each "line" one at a time and
> perform whatever you need against it before asking for the next.
>
>
>
> with open(u'infile.txt','r') as infile:
>     for line in readlines(infile):
>         ...
>
>
>
> Generators are a very efficient way of processing large amounts of
> data. You can chain them together very easily:
>
>
>
> real_lines = readlines(infile)
> marker_lines = (l for l in real_lines if l.startswith('#'))
> every_second_marker = (l for i,l in enumerate(marker_lines) if (i+1) % 2 == 0)
> map(some_function, every_second_marker)
>
>
>
> The real_lines generator returns your definition of a line. The
> marker_lines generator filters out everything that doesn't start with
> #, while every_second_marker returns only half of those. (Yes, these
> could all be written as a single generator, but this is very useful
> for more complex pipelines).
>
>
>
> The big advantage of this approach is that nothing is read from the
> file into memory until map is called, and given the way they're
> chained together, only one of your lines should be in memory at any
> given time.
The basic question is whether producing the output items really
requires every line of the input text file to be buffered first.
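
If each output item depends on only one logical line, nothing needs to be buffered beyond the record currently being assembled. Here is a minimal sketch of that streaming approach, written for Python 3. The name read_records, the terminator parameter, and the sample data are my own illustrative assumptions, not something from the original poster's program, and itertools.islice is swapped in for the enumerate-based filter quoted above.

```python
import io
from itertools import islice

def read_records(f, terminator="|\n"):
    """Yield one logical record at a time from the file object f.

    A record ends at a physical line that ends with `terminator`;
    newlines embedded inside a record are preserved.
    """
    parts = []
    for line in f:  # file objects iterate lazily, one physical line at a time
        parts.append(line)
        if line.endswith(terminator):
            yield "".join(parts)
            parts = []
    if parts:
        # Trailing data with no terminator: yield it rather than drop it.
        yield "".join(parts)

# Hypothetical sample data standing in for infile.txt.
sample = io.StringIO("a|b\nstill a|\n#x|y|\nz|\n#m|\n#n|\n")

records = read_records(sample)                        # nothing read yet
markers = (r for r in records if r.startswith("#"))   # still nothing read
every_second = islice(markers, 1, None, 2)            # 2nd, 4th, ... marker
result = list(every_second)                           # input is consumed here
```

Only the parts list for the record in progress is held at any time, so memory use tracks the longest record, not the file size. If an output item does need to look at several records (sorting, for example), then some buffering is unavoidable and this sketch does not help.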