


Re: Complex sort on big files

From Peter Otten <__peter__@web.de>
Newsgroups comp.lang.python
Subject Re: Complex sort on big files
Followup-To comp.lang.python
Date 2011-08-01 19:00 +0200
Organization None
Message-ID <j16m24$4q4$1@solani.org>
References <062306c6-3ee6-43ce-936f-ca9cc0f013c9@eb9g2000vbb.googlegroups.com>



aliman wrote:

> Apologies I'm sure this has been asked many times, but I'm trying to
> figure out the most efficient way to do a complex sort on very large
> files.
> 
> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and write
> sorted chunks to disk, then use heapq.merge to combine the chunks as
> you read them.
> 
> What I'm having trouble figuring out is what to do when I want to sort
> by one key ascending then another key descending (a "complex sort").
> 
> I understand that sorts are stable, so I could just repeat the whole
> sort process once for each key in turn, but that would involve going
> to and from disk once for each step in the sort, and I'm wondering if
> there is a better way.
> 
> I also thought you could apply the complex sort to each chunk before
> writing it to disk, so each chunk was completely sorted, but then the
> heapq.merge wouldn't work properly, because afaik you can only give it
> one key.

You can make that key as complex as needed:

>>> class Key(object):
...     def __init__(self, obj):
...             self.asc = obj[1]
...             self.desc = obj[2]
...     def __cmp__(self, other):
...             return cmp(self.asc, other.asc) or -cmp(self.desc, other.desc)
...
>>> sorted(["abc", "aba", "bbb", "aaa", "aab"], key=Key)
['aab', 'aaa', 'abc', 'bbb', 'aba']
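
The session above relies on __cmp__ and the built-in cmp(), which exist only in Python 2. A rough Python 3 equivalent, offered as a sketch rather than as part of the original reply, can use functools.cmp_to_key. The helper names cmp and compare below are illustrative assumptions, not anything defined by the standard library.

```python
from functools import cmp_to_key


def cmp(a, b):
    """Three-way comparison; Python 3 no longer has a built-in cmp()."""
    return (a > b) - (a < b)


def compare(x, y):
    # Second character ascending; ties broken by third character descending.
    return cmp(x[1], y[1]) or -cmp(x[2], y[2])


words = ["abc", "aba", "bbb", "aaa", "aab"]
result = sorted(words, key=cmp_to_key(compare))
print(result)
```

This should print the same ordering as the Python 2 session, since the comparison logic is unchanged.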

See also

http://docs.python.org/library/functools.html#functools.total_ordering
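
To tie this back to the chunked merge in the question, here is a minimal sketch that assumes Python 3.5 or later, where heapq.merge accepts a key argument. The key class is built with functools.total_ordering. The in-memory lists stand in for sorted chunk files on disk and are purely illustrative.

```python
import heapq
from functools import total_ordering


@total_ordering
class Key:
    """Sort key: second character ascending, third character descending."""

    def __init__(self, obj):
        self.asc = obj[1]
        self.desc = obj[2]

    def __eq__(self, other):
        return (self.asc, self.desc) == (other.asc, other.desc)

    def __lt__(self, other):
        if self.asc != other.asc:
            return self.asc < other.asc
        # Reversed comparison gives descending order on the second field.
        return self.desc > other.desc


# Hypothetical chunks; in practice each would be read back from a temp file.
chunks = [["abc", "aba", "bbb"], ["aaa", "aab"]]

# Sort every chunk with the same key that the merge will use.
sorted_chunks = [sorted(chunk, key=Key) for chunk in chunks]

# heapq.merge consumes its inputs lazily, so file iterators work here too.
merged = list(heapq.merge(*sorted_chunks, key=Key))
print(merged)
```

The important point is that each chunk is sorted and then merged with one and the same key, so a single pass over the data handles both sort directions.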


