Path: csiph.com!usenet.pasdenom.info!news.redatomik.org!newsfeed.xs4all.nl!newsfeed3a.news.xs4all.nl!xs4all!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'broken': 0.04; 'elements.': 0.07; 'linux,': 0.07; 'arrays': 0.09; 'bytes,': 0.09; 'fixed,': 0.09; 'integers': 0.09; 'objects,': 0.09; 'worse': 0.09; 'cc:addr:python-list': 0.11; 'python': 0.11; "wouldn't": 0.14; 'amd64': 0.16; 'dwarfed': 0.16; 'from:addr:rosuav': 0.16; 'from:name:chris angelico': 0.16; 'integers,': 0.16; 'mean,': 0.16; 'sorting': 0.16; 'vary.': 0.16; 'elements': 0.16; 'fix': 0.17; 'wrote:': 0.18; 'figures': 0.19; 'thu,': 0.19; 'feb': 0.22; 'cc:addr:python.org': 0.22; 'bytes': 0.24; 'pointer': 0.24; 'stick': 0.24; 'cc:2**0': 0.24; 'sort': 0.25; 'references': 0.26; 'header:In-Reply-To:1': 0.27; 'rest': 0.29; '[1]': 0.29; 'am,': 0.29; 'absolute': 0.30; 'message-id:@mail.gmail.com': 0.30; 'that.': 0.31; 'calculated': 0.31; "they'll": 0.31; 'another': 0.32; 'actual': 0.34; 'could': 0.34; 'but': 0.35; 'received:google.com': 0.35; 'add': 0.35; 'really': 0.36; 'indexed': 0.36; 'ram': 0.36; 'two': 0.37; 'list': 0.37; 'how': 0.40; 'even': 0.60; "you're": 0.61; 'our': 0.64; 'taking': 0.65; '26,': 0.68; 'soon.': 0.71; '2015': 0.84; 'each,': 0.84; 'to:none': 0.92 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:date:message-id:subject:from:cc :content-type; bh=pvoxzfEYi64cHldpTk8z2GalDebhUhjvbaH4k6W8lwU=; b=UPTRQWFk5cJ1JTwFe5cjQqeNGZVtSj10NoJS30aSw9Y2LhFr+j1H1GLVitKerSwLW/ t94lhubMTEFR2v9mWxA49w9w+Zz24k9bEJwBASgUf06dMJFf40PuxXFn7J54tJosWVTW zR/0CIFkxMcl6M/0Uy7Tkb/OSFobKhSVMm8os8vuIviOqriai1A7mtkfsuVP9LrhthUV WGYsHFrl/5T4yCWcUd76oN6UvXdKO3i8keWsQpbeRTEFknVZiP3vgAKRl3C8TFJBPo7e ALmHJMm6O07O7WB9RwjSXDug4W0GCf/zpeYvSlFbPpHTQLIrioGbOPqfDIrevLOgFfYr f8qA== MIME-Version: 1.0 X-Received: by 10.107.160.212 with SMTP id j203mr4815320ioe.43.1424874835037; Wed, 25 Feb 2015 06:33:55 -0800 (PST) In-Reply-To: References: <1cf84559-3a63-4799-a879-ae8e513d387e@googlegroups.com> Date: Thu, 26 Feb 2015 01:33:54 +1100 Subject: Re: Bug in timsort!? From: Chris Angelico Cc: "python-list@python.org" Content-Type: text/plain; charset=UTF-8 X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 22 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1424874843 news.xs4all.nl 2854 [2001:888:2000:d::a6]:46880 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:86400 On Thu, Feb 26, 2015 at 12:58 AM, Sturla Molden wrote: > This is awful. It is broken for arrays longer than 2**49 elements. With 8 > bytes per PyObject* pointer this is >4096 terabytes of RAM. I don't see how > we can fix this in time. It's even worse than that. Unless you have a list of 2**49 references to the same few objects, you're going to need to have some actual content for each one. The absolute best you could do is to sort integers, which would take 32 bytes each [1]; if you're sorting strings, they'll be 90 bytes each, so the integers are our best bet. So add another *five* powers of two to the RAM requirements. Also, who has that many elements to sort, and wouldn't do better to stick 'em into an indexed database of some description? I mean, good to know that this'll be fixed, but it really isn't likely to come up any time soon. ChrisA [1] Calculated using Python 3.5 amd64 Linux, your figures may vary. Small integers (less than 2**30) take only 28 bytes, but that's dwarfed by the rest of them taking 32.