Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #86364

Re: Bug in timsort!?

References <1cf84559-3a63-4799-a879-ae8e513d387e@googlegroups.com> <CAKJDb-M82fBvFUVf=P65NnVFdcEUjd+932OJQKGQ1mH6E4vwFA@mail.gmail.com> <54ED0B7E.7030801@mrabarnett.plus.com> <CANc-5UzsujX-0X5tLCik+5LP=w5Xwx4N3cd8wxDfG5Ur-E=xiw@mail.gmail.com> <CAPTjJmo=wq-SbqH0RNt4F8RRCy80wCR+KeBPPvoX_TWqwAQ09A@mail.gmail.com>
From Chris Kaynor <ckaynor@zindagigames.com>
Date 2015-02-24 16:38 -0800
Subject Re: Bug in timsort!?
Newsgroups comp.lang.python
Message-ID <mailman.19160.1424824756.18130.python-list@python.org> (permalink)

Show all headers | View raw


[Multipart message — attachments visible in raw view] - view raw

On Tue, Feb 24, 2015 at 4:07 PM, Chris Angelico <rosuav@gmail.com> wrote:

> On Wed, Feb 25, 2015 at 10:50 AM, Skip Montanaro
> <skip.montanaro@gmail.com> wrote:
> > Even if/when we get to the point where machines can hold an array of
> > 2**49 elements, I suspect people won't be using straight Python to
> > wrangle them.
>
> Looking just at CPython, what is the absolute minimum storage space
> for a massive list like that? My guess is that a 64-bit CPython might
> get as good as eight bytes per element; playing around with smaller
> figures (up to circa a million elements) shows it ranging from 8 to 9
> bytes per element, bottoming out at just a smidge over 8, presumably
> at the moments when there's no slack space (there's a fixed overhead,
> which gets diminishingly significant). So an array of 2**49 elements
> would take 2**52 bytes (4 petabytes) of storage - addressable, to be
> sure, but an awful lot to be sorting.
>
> Would it be sufficient to stick a comment into the source saying "This
> may have problems with lists in excess of 2**49 elements", and leave
> it until that's actually likely to happen?
>

I would agree that it will be quite a while as it stands before the bug
appears, so documenting it MAY be appropriate, as it is likely to be far
enough in the future that a number of unforeseen things may block the bug
from being an issue (TimSort may be replaced by something better, CPython
could die, etc).

That said, from what I understand, their proposed fix would be reasonably
easy to use, and have minimal cost, so it may be worthwhile to use just
because the issue is likely to be in use for quite a while before somebody
remembers the issue exists. Once somebody does hit it, it may take a while
to realize that TimSort is at fault, as it will only occur on some data
sets (my understanding is that the elements must be in certain
configurations with at-least the minimum number), so it will likely appear
as just a random crash occurring.

If it were fully up to me, I would do one of the following, in order of
preference:
1) Implement and test the fix provided on the blog. This requires the most
work due to licensing, reviewing, and testing.
2) Add a noisy warning/error to the code when sorting lists large enough to
potentially hit the error (probably 2**48 to leave some wiggle room). This
is much easier to do, but merely pushes the can down the road to be dealt
with later. It does still make it obvious what is going long when it
finally breaks.
3) Add a comment explaining the issue (likely linking back to the blog).
While by far the easiest "solution", again, this merely pushes the can down
the road, and it may be very non-obvious what went wrong when it breaks,
despite the comment.

Chris

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Bug in timsort!? Roy Smith <roy@panix.com> - 2015-02-24 13:34 -0800
  Re: Bug in timsort!? Zachary Ware <zachary.ware+pylist@gmail.com> - 2015-02-24 15:40 -0600
  Re: Bug in timsort!? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-24 21:49 +0000
    Re: Bug in timsort!? sohcahtoa82@gmail.com - 2015-02-24 14:36 -0800
      Re: Bug in timsort!? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-25 00:38 +0000
  Re: Bug in timsort!? Grant Edwards <invalid@invalid.invalid> - 2015-02-24 22:45 +0000
    Re: Bug in timsort!? Ian Kelly <ian.g.kelly@gmail.com> - 2015-02-24 15:59 -0700
    Re: Bug in timsort!? Robert Kern <robert.kern@gmail.com> - 2015-02-25 11:34 +0000
      Re: Bug in timsort!? Grant Edwards <invalid@invalid.invalid> - 2015-02-25 15:49 +0000
  Re: Bug in timsort!? Chris Angelico <rosuav@gmail.com> - 2015-02-25 10:33 +1100
  Re: Bug in timsort!? MRAB <python@mrabarnett.plus.com> - 2015-02-24 23:38 +0000
  Re: Bug in timsort!? Skip Montanaro <skip.montanaro@gmail.com> - 2015-02-24 17:50 -0600
  Re: Bug in timsort!? Chris Angelico <rosuav@gmail.com> - 2015-02-25 11:07 +1100
    Re: Bug in timsort!? Paddy <paddy3118@gmail.com> - 2015-02-25 01:10 -0800
      Re: Bug in timsort!? Chris Angelico <rosuav@gmail.com> - 2015-02-25 21:11 +1100
  Re: Bug in timsort!? Chris Kaynor <ckaynor@zindagigames.com> - 2015-02-24 16:38 -0800
  Re: Bug in timsort!? Terry Reedy <tjreedy@udel.edu> - 2015-02-24 21:52 -0500
  Re: Bug in timsort!? Dave Angel <davea@davea.name> - 2015-02-24 21:41 -0500
  Re: Bug in timsort!? Ned Deily <nad@acm.org> - 2015-02-25 01:04 -0800
  Re: Bug in timsort!? Sturla Molden <sturla.molden@gmail.com> - 2015-02-25 14:58 +0100
    Re: Bug in timsort!? alister <alister.nospam.ware@ntlworld.com> - 2015-02-25 15:03 +0000
      Re: Bug in timsort!? Zachary Ware <zachary.ware+pylist@gmail.com> - 2015-02-25 09:23 -0600
      Re: Bug in timsort!? Terry Reedy <tjreedy@udel.edu> - 2015-02-25 16:08 -0500
    Re: Bug in timsort!? Roy Smith <roy@panix.com> - 2015-02-25 19:12 -0500
  Re: Bug in timsort!? Jonas Wielicki <jonas@wielicki.name> - 2015-02-25 15:07 +0100
  Re: Bug in timsort!? Chris Angelico <rosuav@gmail.com> - 2015-02-26 01:33 +1100
  Re: Bug in timsort!? Sturla Molden <sturla.molden@gmail.com> - 2015-02-25 16:05 +0100
  Re: Bug in timsort!? Chris Angelico <rosuav@gmail.com> - 2015-02-26 02:11 +1100
  Re: Bug in timsort!? Peter Otten <__peter__@web.de> - 2015-02-25 17:04 +0100
    Re: Bug in timsort!? Mario Figueiredo <marfig@gmail.com> - 2015-02-25 18:22 +0100
      Re: Bug in timsort!? Sturla Molden <sturla.molden@gmail.com> - 2015-02-25 19:41 +0100
      Re: Bug in timsort!? Jonas Wielicki <jonas@wielicki.name> - 2015-02-25 19:46 +0100
      Re: Bug in timsort!? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-25 20:07 +0000
  Re: Bug in timsort!? Sturla Molden <sturla.molden@gmail.com> - 2015-02-25 17:44 +0100
    Re: Bug in timsort!? Mario Figueiredo <marfig@gmail.com> - 2015-02-25 18:29 +0100
    Re: Bug in timsort!? Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2015-02-26 10:13 +1100
      Re: Bug in timsort!? Ian Kelly <ian.g.kelly@gmail.com> - 2015-02-25 22:51 -0700
  Re: Bug in timsort!? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-25 16:50 +0000
  Re: Bug in timsort!? Mark Lawrence <breamoreboy@yahoo.co.uk> - 2015-02-25 16:55 +0000
  Re: Bug in timsort!? Chris Kaynor <ckaynor@zindagigames.com> - 2015-02-25 08:56 -0800
  Re: Bug in timsort!? Chris Angelico <rosuav@gmail.com> - 2015-02-26 04:18 +1100
  Re: Bug in timsort!? Peter Otten <__peter__@web.de> - 2015-02-25 18:21 +0100
  Re: Bug in timsort!? Ian Kelly <ian.g.kelly@gmail.com> - 2015-02-25 12:42 -0700
  Re: Bug in timsort!? Serhiy Storchaka <storchaka@gmail.com> - 2015-02-26 09:33 +0200

csiph-web