| Path | csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!feeds.phibee-telecom.net!newsfeed.xs4all.nl!newsfeed6.news.xs4all.nl!xs4all!post.news.xs4all.nl!not-for-mail |
|---|---|
| Return-Path | <alok.jadhav@credit-suisse.com> |
| X-Original-To | python-list@python.org |
| Delivered-To | python-list@mail.python.org |
| X-Spam-Status | OK 0.000 |
| Content-class | urn:content-classes:message |
| MIME-Version | 1.0 |
| Content-Type | text/plain; charset=US-ASCII |
| Content-Transfer-Encoding | 7bit |
| X-MimeOLE | Produced By Microsoft Exchange V6.5 |
| Subject | RE: Python garbage collector/memory manager behaving strangely |
| Date | Mon, 17 Sep 2012 19:00:46 +0800 |
| In-Reply-To | <5056FF9F.1020305@davea.name> |
| X-MS-Has-Attach | |
| X-MS-TNEF-Correlator | |
| Thread-Topic | Python garbage collector/memory manager behaving strangely |
| Thread-Index | Ac2UwhWSZGTq3DvBSv+ELxGBsZNl3wAAY21w |
| References | <CEE8C35195DB944D9C75ABB15A04193B14E77085@EHKG17P32001A.csfb.cs-group.com><5056871E.7050206@davea.name><mailman.818.1347849124.27098.python-list@python.org><59f8c664-8f11-439e-8002-ca76ee24a632@g7g2000pbh.googlegroups.com> <5056FF9F.1020305@davea.name> |
| From | "Jadhav, Alok" <alok.jadhav@credit-suisse.com> |
| To | <d@davea.name>, "alex23" <wuwei23@gmail.com> |
| X-OriginalArrivalTime | 17 Sep 2012 11:00:50.0623 (UTC) FILETIME=[B2FF6CF0:01CD94C3] |
| Cc | python-list@python.org |
| X-BeenThere | python-list@python.org |
| X-Mailman-Version | 2.1.15 |
| Precedence | list |
| List-Id | General discussion list for the Python programming language <python-list.python.org> |
| List-Unsubscribe | <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe> |
| List-Archive | <http://mail.python.org/pipermail/python-list/> |
| List-Post | <mailto:python-list@python.org> |
| List-Help | <mailto:python-list-request@python.org?subject=help> |
| List-Subscribe | <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.831.1347879875.27098.python-list@python.org> (permalink) |
| Lines | 100 |
| NNTP-Posting-Host | 2001:888:2000:d::a6 |
| X-Trace | 1347879876 news.xs4all.nl 6986 [2001:888:2000:d::a6]:54048 |
| X-Complaints-To | abuse@xs4all.nl |
| Xref | csiph.com comp.lang.python:29374 |
Thanks for your valuable inputs. This is very helpful.
-----Original Message-----
From: Python-list
[mailto:python-list-bounces+alok.jadhav=credit-suisse.com@python.org] On
Behalf Of Dave Angel
Sent: Monday, September 17, 2012 6:47 PM
To: alex23
Cc: python-list@python.org
Subject: Re: Python garbage collector/memory manager behaving strangely
On 09/16/2012 11:25 PM, alex23 wrote:
> On Sep 17, 12:32 pm, "Jadhav, Alok" <alok.jad...@credit-suisse.com>
> wrote:
>> - As you have seen, the line separator is not '\n' but '|\n'.
>> Sometimes the data itself has '\n' characters in the middle of the line,
>> and the only way to find the true end of the line is that the previous
>> character should be a bar '|'. I was not able to specify the end of line
>> using the readlines() function, but I could do it using the split()
>> function. (One hack would be to readlines and combine them until I find
>> '|\n'. Is there a cleaner way to do this?)
> You can use a generator to take care of your readlines requirements:
>
> def readlines(f):
>     lines = []
>     while "f is not empty":
>         line = f.readline()
>         if not line: break
>         if len(line) > 2 and line[-2:] == '|\n':
>             lines.append(line)
>             yield ''.join(lines)
>             lines = []
>         else:
>             lines.append(line)
There's a few changes I'd make:
I'd change the name to something else, so as not to shadow the built-in,
and to make it clear in caller's code that it's not the built-in one.
I'd replace that compound if statement with
    if line.endswith("|\n"):
I'd add a comment saying that partial lines at the end of file are
ignored.
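
Taken together, the three suggestions above might look like the following sketch. The name read_bar_terminated and the sample data are made up for illustration; they are not from the thread.

```python
import io

def read_bar_terminated(f):
    """Yield logical records from an open text file.

    A record ends at a bar immediately followed by a newline.
    Note: a partial record at end of file (one with no bar-newline
    terminator) is silently ignored.
    """
    parts = []
    while True:
        line = f.readline()
        if not line:              # readline() returns '' at end of file
            break
        parts.append(line)
        if line.endswith("|\n"):  # true end of a logical record
            yield "".join(parts)
            parts = []

# Hypothetical sample: first record contains an embedded newline,
# and the trailing "partial" has no terminator.
sample = io.StringIO("a|b\nc|\nd|\npartial")
records = list(read_bar_terminated(sample))
```

One behavioural difference from the quoted version: the original test required len(line) > 2, so a physical line consisting of only '|\n' did not end a record, whereas endswith() treats it as a terminator.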
>> - Reading the whole file at once and processing it line by line was much
>> faster. Though speed is not a very important issue here, I think the
>> time it took to parse the complete file was reduced to one third of the
>> original time.
You don't say what it was faster than. Chances are you went to the
other extreme, of doing a read() of 1 byte at a time. Try Alex's
approach instead: a generator which in turn calls the file's
readline() method.
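
As a rough illustration of the two approaches being compared, the sketch below checks that reading everything and calling split() gives the same records as a line-based generator. The sample data and the name bar_records are assumptions for the example; no timings are claimed here, and timeit on the real file would be needed to compare speed.

```python
import io

data = "a|b\nc|\nd|\n"   # hypothetical sample, records end in '|\n'

# Approach 1: read the whole file into memory, then split on the separator.
# The last piece is whatever follows the final '|\n' (empty or a partial
# record), so it is dropped.
whole = io.StringIO(data).read()
split_records = whole.split("|\n")[:-1]

# Approach 2: stream physical lines and join them into logical records.
def bar_records(f):
    parts = []
    for line in f:
        parts.append(line)
        if line.endswith("|\n"):
            yield "".join(parts)
            parts = []

# Strip the two-character terminator so both results are comparable.
streamed_records = [r[:-2] for r in bar_records(io.StringIO(data))]
```

Both lists hold the same records; the difference is that the first approach keeps the entire file in memory at once, while the second holds one logical record at a time.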
> With the readlines generator above, it'll read lines from the file
> until it has a complete "line" by your requirement, at which point
> it'll yield it. If you don't need the entire file in memory for the
> end result, you'll be able to process each "line" one at a time and
> perform whatever you need against it before asking for the next.
>
> with open(u'infile.txt','r') as infile:
>     for line in readlines(infile):
>         ...
>
> Generators are a very efficient way of processing large amounts of
> data. You can chain them together very easily:
>
> real_lines = readlines(infile)
> marker_lines = (l for l in real_lines if l.startswith('#'))
> every_second_marker = (l for i,l in enumerate(marker_lines)
>                        if (i+1) % 2 == 0)
> map(some_function, every_second_marker)
>
> The real_lines generator returns your definition of a line. The
> marker_lines generator filters out everything that doesn't start with
> #, while every_second_marker returns only half of those. (Yes, these
> could all be written as a single generator, but this is very useful
> for more complex pipelines).
>
> The big advantage of this approach is that nothing is read from the
> file into memory until map is called, and given the way they're
> chained together, only one of your lines should be in memory at any
> given time.
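
The laziness described in the quoted text can be checked with a small self-contained sketch. The in-memory file, the name bar_records, and the use of str.strip as the stand-in for some_function are all assumptions for the example. In Python 3, map() is itself lazy, so the result is wrapped in list() to drive the pipeline; in the Python 2 of this thread, map() ran eagerly.

```python
import io

def bar_records(f):
    """Yield records terminated by a bar followed by a newline."""
    parts = []
    for line in iter(f.readline, ""):   # call readline() until it returns ''
        parts.append(line)
        if line.endswith("|\n"):
            yield "".join(parts)
            parts = []

source = io.StringIO("#a|\nx|\n#b|\n#c|\ny|\n#d|\n")

real_lines = bar_records(source)
marker_lines = (l for l in real_lines if l.startswith("#"))
every_second_marker = (l for i, l in enumerate(marker_lines)
                       if (i + 1) % 2 == 0)

# Nothing has been read yet: building the pipeline consumes no input.
position_before = source.tell()

# Consuming the final stage pulls records through one at a time.
result = list(map(str.strip, every_second_marker))
position_after = source.tell()
```

Here position_before is still 0 after the pipeline is built, and the file is only read through once list() starts asking for items.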
--
DaveA