Re: efficient partial sort in Python ?

References <51dfbe9b-f6e0-4532-bc2d-e7ce2fc282b5@googlegroups.com> <mailman.13116.1408398154.18130.python-list@python.org> <3fb3b4d1-a7e2-4912-a878-7d5e1798aee6@googlegroups.com>
Date 2014-08-19 16:05 -0700
Subject Re: efficient partial sort in Python ?
From Dan Stromberg <drsalists@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.13175.1408489553.18130.python-list@python.org> (permalink)


On Tue, Aug 19, 2014 at 12:37 PM, Chiu Hsiang Hsu <wdv4758h@gmail.com> wrote:
> On Tuesday, August 19, 2014 5:42:27 AM UTC+8, Dan Stromberg wrote:
>> On Mon, Aug 18, 2014 at 10:18 AM, Chiu Hsiang Hsu <wdv4758h@gmail.com> wrote:
>> > I know that Python uses Timsort as its default sorting algorithm and that it is efficient,
>> > but I just want a partial sort (the n largest/smallest elements).
>>
>> Perhaps heapq with PyPy?  Or with Nuitka?  Or with Numba?

> Another problem with heapq is memory usage: in CPython it uses a lot more memory than sorted (I tested it in 3.4 with 1000000 float numbers).

This surprises me.  I believe heapq keeps values in a plain python
list with no extra references, by making node i's left child and right
child be list elements 2*i+1 and 2*i+2, respectively (indexing is
0-based, so the root is element 0).
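As a quick check of that layout, here is a minimal sketch (not taken from
the heapq source; the helper name satisfies_heap_invariant is made up for
illustration).  It assumes the 0-based convention described in the heapq
documentation, where the children of element i sit at 2*i+1 and 2*i+2,
and confirms that heapify() rearranges a plain list in place without
building any other structure:

```python
import heapq
import random

def satisfies_heap_invariant(h):
    """Return True if every parent is <= both of its children.

    Hypothetical helper for illustration; assumes the 0-based layout
    where the children of index i live at 2*i + 1 and 2*i + 2.
    """
    n = len(h)
    for i in range(n):
        for child in (2 * i + 1, 2 * i + 2):
            if child < n and h[i] > h[child]:
                return False
    return True

values = [random.random() for _ in range(1000)]
heap = list(values)      # one shallow copy of the input
heapq.heapify(heap)      # rearranges the list in place, O(len(heap))

print(type(heap).__name__, len(heap) == len(values))
print(satisfies_heap_invariant(heap))
```

The heap is still an ordinary list of the same length, so any extra
memory most likely comes from how the heap is fed rather than from the
data structure itself.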

A heap of some sort probably is best algorithmically.  You're probably
just up against a high constant.  On the other hand, there are many
kinds of heaps.

> Out of curiosity: there are many ways to speed up Python (like Cython and PyPy). I haven't used Cython before;
> I guess PyPy is a more convenient way to speed up existing Python code (?),
> so how does Cython compare to PyPy? (speed, code, flexibility, or anything else)

PyPy is really fast for CPU-intensive workloads, but CPython is better for I/O.

I ran a single CPU-intensive microbenchmark on Cython and PyPy
(also Jython and CPython).  PyPy was fastest
(http://stromberg.dnsalias.org/~strombrg/backshift/documentation/performance/index.html).

I haven't yet compared Numba, Nuitka or Shed Skin.

When you use heapq, are you putting all the values in the heap, or
just up to n at a time (evicting the worst value, one at a time as you
go)?  If you're doing the former, it's basically a heapsort which
probably won't beat timsort.  If you're doing the latter, that should
be pretty good.
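To make the second approach concrete, here is a minimal sketch of the
"keep at most n values, evict the worst" idea.  The function name
n_largest is made up for illustration; the standard library's
heapq.nlargest() does the same job and is likely what you want in real
code.  With N inputs the heap never holds more than n items, so the
work is roughly O(N log n) and the extra memory is O(n):

```python
import heapq

def n_largest(iterable, n):
    """Return the n largest items, largest first.

    Illustrative sketch only: keeps a min-heap of at most n items, so
    heap[0] is always the worst value currently being kept.
    """
    if n <= 0:
        return []
    heap = []
    for value in iterable:
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Pop the current worst and push the new value in one step.
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)

data = [5, 1, 9, 3, 7, 2, 8]
print(n_largest(data, 3))
print(heapq.nlargest(3, data))
```

Both calls should print the same list.  For the n smallest elements,
heapq.nsmallest() is the stdlib counterpart.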
