| References | <51dfbe9b-f6e0-4532-bc2d-e7ce2fc282b5@googlegroups.com> <mailman.13116.1408398154.18130.python-list@python.org> <3fb3b4d1-a7e2-4912-a878-7d5e1798aee6@googlegroups.com> |
|---|---|
| Date | 2014-08-19 16:05 -0700 |
| Subject | Re: efficient partial sort in Python ? |
| From | Dan Stromberg <drsalists@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.13175.1408489553.18130.python-list@python.org> |
On Tue, Aug 19, 2014 at 12:37 PM, Chiu Hsiang Hsu <wdv4758h@gmail.com> wrote:
> On Tuesday, August 19, 2014 5:42:27 AM UTC+8, Dan Stromberg wrote:
>> On Mon, Aug 18, 2014 at 10:18 AM, Chiu Hsiang Hsu <wdv4758h@gmail.com> wrote:
>> > I know that Python uses Timsort as its default sorting algorithm and it is efficient,
>> > but I just want a partial sort (n-largest/smallest elements).
>>
>> Perhaps heapq with Pypy?  Or with nuitka?  Or with numba?
>
> Another problem with heapq is the memory usage: it costs a lot more memory with heapq in CPython (I tested it in 3.4 with 1000000 float numbers) compared to sorted.

This surprises me. I believe heapq probably keeps values in a Python list with no extra references, by making node i's left child and right child be list elements 2*i+1 and 2*i+2, respectively (indices are zero-based).

A heap of some sort probably is best algorithmically. You're probably just up against a high constant factor. On the other hand, there are many kinds of heaps.

> Out of curiosity, there are many speed-up solutions for Python (like Cython, PyPy). I haven't used Cython before,
> I guess PyPy is a more convenient way to speed up current Python code (?),
> so how does Cython compare to PyPy? (speed, code, flexibility, or anything else)

PyPy is really fast for CPU-intensive workloads, but CPython is better for I/O.

I tested a single CPU-intensive microbenchmark of Cython and PyPy (also Jython and CPython). PyPy was fastest (http://stromberg.dnsalias.org/~strombrg/backshift/documentation/performance/index.html).

I haven't yet compared numba or nuitka or Shedskin.

When you use heapq, are you putting all the values in the heap, or just up to n at a time (evicting the worst value, one at a time as you go)? If you're doing the former, it's basically a heapsort, which probably won't beat timsort. If you're doing the latter, that should be pretty good.
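To make the two heapq strategies in the last paragraph concrete, here is a minimal sketch of both. The function names `n_largest_bounded` and `n_largest_full_heap` are made up for illustration and are not from the thread. The bounded version keeps at most n values in the heap and evicts the smallest one as it goes, so its extra memory is proportional to n rather than to the input size. The full-heap version copies and heapifies everything, which is effectively a truncated heapsort. The standard library's `heapq.nlargest` is used only as a cross-check. No timings are claimed here; measure on your own data.

```python
import heapq
import random


def n_largest_bounded(values, n):
    """Return the n largest items, descending, using a heap of at most n items."""
    if n <= 0:
        return []
    heap = []  # min-heap: heap[0] is the smallest of the values kept so far
    for value in values:
        if len(heap) < n:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Evict the current worst (smallest) kept value in one step.
            heapq.heapreplace(heap, value)
    return sorted(heap, reverse=True)


def n_largest_full_heap(values, n):
    """Put every value in the heap, then pop n times (a truncated heapsort)."""
    heap = [-v for v in values]  # negate so the min-heap acts as a max-heap
    heapq.heapify(heap)
    return [-heapq.heappop(heap) for _ in range(min(n, len(heap)))]


if __name__ == "__main__":
    random.seed(0)
    data = [random.random() for _ in range(100_000)]
    expected = sorted(data, reverse=True)[:10]
    # All three approaches should agree with a full sort followed by a slice.
    assert n_largest_bounded(data, 10) == expected
    assert n_largest_full_heap(data, 10) == expected
    assert heapq.nlargest(10, data) == expected
```

Note that the full-heap version builds a second list as large as the input, which is one plausible (but unconfirmed) source of the extra memory reported above.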
efficient partial sort in Python ? Chiu Hsiang Hsu <wdv4758h@gmail.com> - 2014-08-18 10:18 -0700
Re: efficient partial sort in Python ? Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-18 12:48 -0600
Re: efficient partial sort in Python ? Dan Stromberg <drsalists@gmail.com> - 2014-08-18 14:42 -0700
Re: efficient partial sort in Python ? Chiu Hsiang Hsu <wdv4758h@gmail.com> - 2014-08-19 12:37 -0700
Re: efficient partial sort in Python ? Terry Reedy <tjreedy@udel.edu> - 2014-08-19 18:11 -0400
Re: efficient partial sort in Python ? Dan Stromberg <drsalists@gmail.com> - 2014-08-19 16:05 -0700
Re: efficient partial sort in Python ? Dan Stromberg <drsalists@gmail.com> - 2014-08-19 16:10 -0700
Re: efficient partial sort in Python ? Dan Stromberg <drsalists@gmail.com> - 2014-08-19 16:22 -0700
Re: efficient partial sort in Python ? Ian Kelly <ian.g.kelly@gmail.com> - 2014-08-19 18:00 -0600