From: Dennis Lee Bieber
Subject: Re: Finding size of Variable
Date: Wed, 05 Feb 2014 22:14:53 -0500
Newsgroups: comp.lang.python

On Wed, 5 Feb 2014 22:44:47 +1100, Chris Angelico declaimed the following:

> On Wed, Feb 5, 2014 at 10:00 PM, Steven D'Aprano wrote:
> >> where stopWords.txt is a file of size 4KB
> >
> > My guess is that if you split a 4K file into words, then put the words
> > into a list, you'll probably end up with 6-8K in memory.
>
> I'd guess rather more; Python strings have a fair bit of fixed
> overhead, so with a whole lot of small strings, it will get more
> costly.
>
> >>> sys.version
> '3.4.0b2 (v3.4.0b2:ba32913eb13e, Jan 5 2014, 16:23:43) [MSC v.1600 32 bit (Intel)]'
> >>> sys.getsizeof("asdf")
> 29

>>> import sys
>>> indata = "221B or not to be seeing you again"
>>> sys.getsizeof(indata)
67
>>> worddata = indata.split()
>>> worddata
['221B', 'or', 'not', 'to', 'be', 'seeing', 'you', 'again']
>>> sys.getsizeof(worddata) + sum(sys.getsizeof(wd) for wd in worddata)
451

That's a 7X expansion for just splitting a single line into a list of
words.

-- 
Wulfraed    Dennis Lee Bieber    AF6VN
wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/
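
For anyone who wants to repeat the measurement on their own interpreter, here is a minimal sketch of the same arithmetic wrapped in a function. The name shallow_total_size is made up for illustration, and counting each distinct object only once is an added assumption that the session above does not make (it gives the same total here because the eight words are all distinct objects). The byte counts depend on the Python version and on a 32-bit versus 64-bit build, so no output is shown.

```python
import sys

def shallow_total_size(items):
    """Bytes for the list object plus each distinct element (hypothetical helper).

    Uses sys.getsizeof, so it only sees one level deep: nested containers
    would be counted without their contents.
    """
    seen = set()
    total = sys.getsizeof(items)          # the list itself (header + pointer array)
    for item in items:
        if id(item) not in seen:          # assumption: count shared objects once
            seen.add(id(item))
            total += sys.getsizeof(item)  # per-string fixed overhead + characters
    return total

indata = "221B or not to be seeing you again"
worddata = indata.split()

line_size = sys.getsizeof(indata)
list_size = shallow_total_size(worddata)

# Ratio of "list of words" to "single string"; varies by interpreter build.
print(line_size, list_size, round(list_size / line_size, 1))
```

Most of the growth comes from the fixed per-object overhead of each small string, which is the point made in the quoted text above.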