From: Pradipto Banerjee
To: python-list@python.org
Date: Sun, 21 Oct 2012 09:14:59 -0500
Subject: RE: Python does not take up available physical memory
Newsgroups: comp.lang.python

I tried this on a different PC with 12 GB of RAM. As expected, this time reading the data was not an issue. I noticed that for large files, Python takes up about 2.5x the file's on-disk size in memory when each line of the file is kept as a string in a Python list. Anecdotally, the comparable overhead in MATLAB is 2x, slightly lower than Python's, with each line of the file kept as a string in a MATLAB cell array. I'm curious: has anyone compared the in-memory overhead of data in other languages, such as Ruby?

-----Original Message-----
From: Python-list [mailto:python-list-bounces+pradipto.banerjee=adainvestments.com@python.org] On Behalf Of Steven D'Aprano
Sent: Friday, October 19, 2012 6:12 PM
To: python-list@python.org
Subject: Re: Python does not take up available physical memory

On Fri, 19 Oct 2012 14:03:37 -0500, Pradipto Banerjee wrote:

> Thanks, I tried that. Still got MemoryError, but at least this time
> python tried to use the physical memory. What I noticed is that before
> it gave me the error it used up to 1.5GB (of the 2.23 GB originally
> showed as available) - so in general, python takes up more memory than
> the size of the file itself.

Well of course it does. Once you read the data into memory, it has its own overhead for the object structure.

You haven't told us what the file is or how you are reading it. I'm going to assume it is ASCII text and you are using Python 2.

py> open("test file", "w").write("abcde")
py> os.stat("test file").st_size
5L
py> text = open("test file", "r").read()
py> len(text)
5
py> sys.getsizeof(text)
26

So that confirms that a five-byte ASCII string takes up five bytes on disk but 26 bytes in memory as an object.

That overhead will depend on what sort of object, whether Unicode or not, the version of Python, and how you read the data.
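The disk-versus-memory ratio discussed above can be approximated on a current Python 3 interpreter. The sketch below is an illustration, not the poster's actual measurement: the helper names (`memory_footprint`, `overhead_ratio`) and the sample file are hypothetical, and the numbers it prints will vary with Python version and platform. It sums `sys.getsizeof` over the list and every line string, because `sys.getsizeof` is shallow and does not count the objects a container refers to.

```python
import os
import sys
import tempfile


def memory_footprint(lines):
    """Approximate bytes held by a list of str: the list object plus each string.

    sys.getsizeof is shallow, so the list's own size does not include
    the strings it references; those are added one by one.
    """
    return sys.getsizeof(lines) + sum(sys.getsizeof(s) for s in lines)


def overhead_ratio(path):
    """Return (disk bytes, in-memory bytes, ratio) for a text file
    read into a list with one string per line (hypothetical helper)."""
    disk_size = os.stat(path).st_size
    with open(path, "r", encoding="ascii") as f:
        lines = f.readlines()
    mem_size = memory_footprint(lines)
    return disk_size, mem_size, mem_size / disk_size


if __name__ == "__main__":
    # Hypothetical sample data: 10,000 short ASCII lines.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="ascii"
    ) as tmp:
        for i in range(10_000):
            tmp.write(f"record {i:06d},some,ascii,fields\n")
        path = tmp.name
    try:
        disk, mem, ratio = overhead_ratio(path)
        print(f"disk: {disk} bytes, memory: {mem} bytes, ratio: {ratio:.1f}x")
    finally:
        os.remove(path)
```

Each string object carries a fixed header on top of its characters, and the list adds a pointer per line, so the ratio climbs as the average line gets shorter.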
In general, if you have a huge amount of data to work with, you should try to work with it one line at a time:

for line in open("some file"):
    process(line)

rather than reading the whole file into memory at once:

lines = open("some file").readlines()
for line in lines:
    process(line)

-- 
Steven
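Steven's advice can be checked empirically. The Python 3 sketch below uses the standard-library `tracemalloc` module to compare peak allocation for the two patterns. The helper names (`count_chars_streaming`, `count_chars_readlines`, `peak_bytes`) are hypothetical, and summing line lengths stands in for the unspecified `process()`. `tracemalloc` only traces memory allocated through Python's allocators, so it understates the process's total footprint; the point is the relative difference.

```python
import os
import tempfile
import tracemalloc


def count_chars_streaming(path):
    """Iterate over the file object: only the current line is kept alive."""
    total = 0
    with open(path, encoding="ascii") as f:
        for line in f:
            total += len(line)  # stand-in for process(line)
    return total


def count_chars_readlines(path):
    """Read every line into a list first: the whole file is held at once."""
    with open(path, encoding="ascii") as f:
        lines = f.readlines()
    total = 0
    for line in lines:
        total += len(line)  # stand-in for process(line)
    return total


def peak_bytes(func, *args):
    """Run func(*args); return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = func(*args)
        _, peak = tracemalloc.get_traced_memory()  # (current, peak)
    finally:
        tracemalloc.stop()  # also clears the collected traces
    return result, peak


if __name__ == "__main__":
    # Hypothetical sample data: 100,000 short ASCII lines.
    with tempfile.NamedTemporaryFile(
        "w", suffix=".txt", delete=False, encoding="ascii"
    ) as tmp:
        for i in range(100_000):
            tmp.write(f"record {i:06d},some,ascii,fields\n")
        path = tmp.name
    try:
        for func in (count_chars_streaming, count_chars_readlines):
            chars, peak = peak_bytes(func, path)
            print(f"{func.__name__}: {chars} chars, peak {peak / 1024:.0f} KiB")
    finally:
        os.remove(path)
```

Iterating over the file object keeps only a buffer-sized chunk and the current line alive, so its peak stays roughly flat as the file grows, while the `readlines()` version grows with the file.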