Re: Custom alphabetical sort

References <roy-BEEA73.11183724122012@news.panix.com>
From Joshua Landau <joshua.landau.ws@gmail.com>
Date 2012-12-24 18:12 +0000
Subject Re: Custom alphabetical sort
Newsgroups comp.lang.python
Message-ID <mailman.1262.1356372812.29569.python-list@python.org>


On 24 December 2012 16:18, Roy Smith <roy@panix.com> wrote:

> In article <40d108ec-b019-4829-a969-c8ef513866f1@googlegroups.com>,
>  Pander Musubi <pander.musubi@gmail.com> wrote:
>
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C',
> > 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö',
> > 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R',
> > 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>

<snip>

> Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts.  One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>             '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á',
>             [...]
>             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}
>
> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>     "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>     return [map1[c] for c in s]
>
> def decode(l):
>     "decode([34, 19, 19]) ==> 'foo'"
>     return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>

This isn't needed; the not-so-new way to do this is through .sort's key
argument.

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

changes to

original_list.sort(key=encode)  # sorts the original strings in place

[Which happens to be faster </reasonable_guess>]
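To check that guess rather than take it on faith, here is a rough sketch that confirms both approaches give the same order and then times them with timeit. The shortened alphabet, the list size, the seed and the two wrapper function names are my own assumptions for illustration; the timings will vary by machine and Python version.

```python
import random
import timeit

# Shortened stand-in alphabet (assumption); the full tuple behaves the same way.
alphabet = (' ', '.', "'", '-', '0', '1', 'a', 'A', 'ä', 'Ä', 'b', 'B', 'c', 'C')

map1 = {c: n for n, c in enumerate(alphabet)}
map2 = {n: c for n, c in enumerate(alphabet)}

def encode(s):
    return [map1[c] for c in s]

def decode(l):
    return ''.join(map2[i] for i in l)

random.seed(0)  # fixed seed so repeated runs sort the same data
original_list = [
    ''.join(random.sample(alphabet, random.randint(4, 6)))
    for _ in range(20000)
]

def encode_sort_decode():
    # The quoted approach: encode everything, sort, decode everything.
    encoded_strings = [encode(s) for s in original_list]
    encoded_strings.sort()
    return [decode(l) for l in encoded_strings]

def sort_with_key():
    # The key-based approach: the strings themselves are sorted.
    return sorted(original_list, key=encode)

# Both approaches must agree on the final order.
assert encode_sort_decode() == sort_with_key()

print("encode/sort/decode:", timeit.timeit(encode_sort_decode, number=5))
print("sorted(key=encode):", timeit.timeit(sort_with_key, number=5))
```

sorted() is used here instead of .sort() only so that original_list is left untouched between timing runs.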

Hence you need neither map2 nor decode:

## CODE ##

alphabet = (
    ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å',
    'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D',
    'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
    'f', 'F', 'g', 'G', 'h', 'H',
    'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì',
    'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ',
    'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø',
    'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T',
    'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù',
    'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z'
)

hashindex = {character: index for index, character in enumerate(alphabet)}

def string2sortlist(string):
    # Map each character to its position in the custom alphabet.
    return [hashindex[s] for s in string]

# Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
import random
things_to_sort = [
    "".join(random.sample(alphabet, random.randint(4, 6)))
    for _ in range(200000)
]

print(things_to_sort[:15])

things_to_sort.sort(key=string2sortlist)

print(things_to_sort[:15])

## END CODE ##
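Since random strings make it hard to eyeball the result, here is a small sketch with a handful of hand-picked words, compared against the default sorted(). The word list is made up, and the alphabet is written as one string purely for brevity (same order as the tuple above; enumerate() treats it the same way).

```python
# Same collating order as the tuple above, written as a single string.
alphabet = (
    " .'-0123456789"
    "aAäÄáÁâÂàÀåÅbBcCçÇdDeEëËéÉêÊèÈfFgGhH"
    "iIïÏíÍîÎìÌjJkKlLmMnñNÑoOöÖóÓôÔòÒøØ"
    "pPqQrRsStTuUüÜúÚûÛùÙvVwWxXyYzZ"
)

hashindex = {character: index for index, character in enumerate(alphabet)}

def string2sortlist(string):
    # Each character becomes its position in the custom alphabet.
    return [hashindex[s] for s in string]

# Made-up sample words (assumption, not from the thread).
words = ["Zebra", "zebra", "éclair", "eclair", "Eclair", "apple", "Apple"]

custom_order = sorted(words, key=string2sortlist)
default_order = sorted(words)

print(custom_order)
# ['apple', 'Apple', 'eclair', 'Eclair', 'éclair', 'zebra', 'Zebra']
print(default_order)
# ['Apple', 'Eclair', 'Zebra', 'apple', 'eclair', 'zebra', 'éclair']

# A character outside the alphabet raises KeyError in this version.
try:
    string2sortlist("naïve!")
except KeyError as missing:
    print("not in alphabet:", missing)
```

Note the KeyError at the end: any character missing from the alphabet will stop the sort, so real data may need the alphabet extended or a fallback position chosen.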

Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to
Tomas Bach's method.
