| From | Peter Otten <__peter__@web.de> |
|---|---|
| Newsgroups | comp.lang.python |
| Subject | Re: Complex sort on big files |
| Followup-To | comp.lang.python |
| Date | 2011-08-01 19:00 +0200 |
| Organization | None |
| Message-ID | <j16m24$4q4$1@solani.org> |
| References | <062306c6-3ee6-43ce-936f-ca9cc0f013c9@eb9g2000vbb.googlegroups.com> |
Followups directed to: comp.lang.python
aliman wrote:

> Apologies I'm sure this has been asked many times, but I'm trying to
> figure out the most efficient way to do a complex sort on very large
> files.
>
> I've read the recipe at [1] and understand that the way to sort a
> large file is to break it into chunks, sort each chunk and write
> sorted chunks to disk, then use heapq.merge to combine the chunks as
> you read them.
>
> What I'm having trouble figuring out is what to do when I want to sort
> by one key ascending then another key descending (a "complex sort").
>
> I understand that sorts are stable, so I could just repeat the whole
> sort process once for each key in turn, but that would involve going
> to and from disk once for each step in the sort, and I'm wondering if
> there is a better way.
>
> I also thought you could apply the complex sort to each chunk before
> writing it to disk, so each chunk was completely sorted, but then the
> heapq.merge wouldn't work properly, because afaik you can only give it
> one key.

You can make that key as complex as needed:

>>> class Key(object):
...     def __init__(self, obj):
...         self.asc = obj[1]
...         self.desc = obj[2]
...     def __cmp__(self, other):
...         return cmp(self.asc, other.asc) or -cmp(self.desc, other.desc)
...
>>> sorted(["abc", "aba", "bbb", "aaa", "aab"], key=Key)
['aab', 'aaa', 'abc', 'bbb', 'aba']

See also
http://docs.python.org/library/functools.html#functools.total_ordering
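
For readers on Python 3, where cmp() and __cmp__ no longer exist, the same idea might be sketched with rich comparisons and functools.total_ordering. This is only an illustration, assuming Python 3.5 or later (heapq.merge accepts a key argument from that version on). The two in-memory lists stand in for sorted chunks that would really be read back from disk, and merge_sorted_chunks is a hypothetical helper name, not something from the original post.

```python
import heapq
from functools import total_ordering


@total_ordering
class Key:
    """Sort key: second character ascending, then third character descending."""

    def __init__(self, obj):
        self.asc = obj[1]
        self.desc = obj[2]

    def __eq__(self, other):
        return (self.asc, self.desc) == (other.asc, other.desc)

    def __lt__(self, other):
        if self.asc != other.asc:
            return self.asc < other.asc
        # Reversed comparison: a larger "desc" value sorts first.
        return self.desc > other.desc


def merge_sorted_chunks(chunks):
    """Merge chunks that were each sorted with key=Key (hypothetical helper)."""
    return list(heapq.merge(*chunks, key=Key))


# Stand-ins for chunks that were sorted and written to disk separately.
chunk_a = sorted(["abc", "aaa", "bbb"], key=Key)
chunk_b = sorted(["aba", "aab"], key=Key)

merged = merge_sorted_chunks([chunk_a, chunk_b])
print(merged)  # ['aab', 'aaa', 'abc', 'bbb', 'aba']
```

Because the same Key is used both to sort each chunk and to merge them, the merged result matches what a single sorted() call over all the data gives, so one pass to and from disk should be enough.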
Complex sort on big files aliman <alimanfoo@googlemail.com> - 2011-08-01 08:33 -0700
Re: Complex sort on big files Peter Otten <__peter__@web.de> - 2011-08-01 19:00 +0200
Re: Complex sort on big files Alistair Miles <alimanfoo@googlemail.com> - 2011-08-02 11:25 +0100
Re: python module to determine if a machine is idle/free Chris Rebert <clp2@rebertia.com> - 2011-08-03 21:38 -0700
Re: Complex sort on big files sturlamolden <sturlamolden@yahoo.no> - 2011-08-05 18:31 -0700
Re: Complex sort on big files Roy Smith <roy@panix.com> - 2011-08-05 22:54 -0400
Re: Complex sort on big files Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-08-06 13:30 +1000
Re: Complex sort on big files sturlamolden <sturlamolden@yahoo.no> - 2011-08-06 10:53 -0700
Re: Complex sort on big files John Nagle <nagle@animats.com> - 2011-08-09 15:20 -0700