Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #35477

Re: Custom alphabetical sort

Path csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!ecngs!feeder2.ecngs.de!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail
Return-Path <joshua.landau.ws@gmail.com>
X-Original-To python-list@python.org
Delivered-To python-list@mail.python.org
X-Spam-Status OK 0.000
X-Spam-Evidence '*H*': 1.00; '*S*': 0.00; 'method.': 0.05; 'next,': 0.07; 'python': 0.09; "'w',": 0.09; 'attribute.': 0.09; 'encode': 0.09; 'skip:[ 30': 0.09; 'skip:[ 40': 0.09; 'cc:addr:python-list': 0.10; 'def': 0.10; 'extension': 0.13; 'result.': 0.15; "'0',": 0.16; "'a',": 0.16; "'b',": 0.16; "'c',": 0.16; "'d',": 0.16; "'e',": 0.16; "'o',": 0.16; "'r',": 0.16; "'z')": 0.16; 'alphabet': 0.16; 'made-up': 0.16; 'ordinals': 0.16; 'roy': 0.16; 'sorted()': 0.16; 'wrote:': 0.17; 'changes': 0.20; 'email addr:gmail.com&gt;': 0.20; 'sort': 0.21; 'all,': 0.21; 'import': 0.21; 'back.': 0.22; 'cc:2**0': 0.23; '&gt;': 0.23; 'specified': 0.23; 'random': 0.24; 'cc:addr:python.org': 0.25; 'header:In- Reply-To:1': 0.25; 'skip:[ 10': 0.26; 'skip:& 60': 0.27; 'skip:e 30': 0.27; 'message-id:@mail.gmail.com': 0.27; 'this?': 0.28; 'index,': 0.29; 'character': 0.29; 'convert': 0.29; '8bit%:5': 0.29; 'skip:& 10': 0.29; "skip:' 10": 0.30; 'stuff': 0.30; 'lists': 0.31; 'code': 0.31; 'point': 0.31; '(and': 0.32; 'december': 0.32; 'quickly': 0.32; 'skip:& 20': 0.33; 'that,': 0.34; 'received:google.com': 0.34; 'needed': 0.35; 'faster': 0.35; 'sequence': 0.35; 'received:209.85': 0.35; 'something': 0.35; '(i.e.': 0.36; 'skip:{ 10': 0.36; 'skip:p 20': 0.36; 'skip:t 40': 0.37; 'does': 0.37; '(for': 0.37; 'received:209': 0.37; 'subject:: ': 0.38; 'some': 0.38; 'skip:" 10': 0.40; 'header:Received:5': 0.40; 'end': 0.40; 'your': 0.60; 'back': 0.62; 'smith': 0.71; 'article': 0.78; "'2',": 0.84; "'3',": 0.84; '==&gt;': 0.84; 'order:': 0.84; '\xa0one': 0.84
DKIM-Signature v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=uPC+waMzdjcSBXD9aoRsfF5uz3VBV94YhBVtYJ9K0M0=; b=LSfNiNrUIP9oWk9xfyBgJuIpIhsljpkCLl/vE7NsDxgC/smUz1M5o8T9MKHtliHQVI tkKW1JmRIXusbTu/c3kk7axvcc/nHE4LlItvhJq7DrIyjORH9fa10XSJxdJYqQ+CxQxk V6D0i11MweUZQzt+UCRnuh/qyeKSeeTPA/pudDW/NG9CrQ2XM9Xz6ef+KIFjDm0ynVu7 lQkNKYrzrUCLoJVPWqN3066Ag57eNWAM1nlQ4fnPjNq/NWC5jk8Own20iyY7m/jtKnU9 E6t7KdzU5rf2FrhCPATtueeBT2Ii0Vuh3b4SybhnosGvIagGC+c+lK7do8v37NFn9bI8 LZvA==
MIME-Version 1.0
In-Reply-To <roy-BEEA73.11183724122012@news.panix.com>
References <roy-BEEA73.11183724122012@news.panix.com>
From Joshua Landau <joshua.landau.ws@gmail.com>
Date Mon, 24 Dec 2012 18:12:43 +0000
Subject Re: Custom alphabetical sort
To Roy Smith <roy@panix.com>
Content-Type multipart/alternative; boundary=047d7b343e3c23f51a04d19d2745
Cc python-list <python-list@python.org>
X-BeenThere python-list@python.org
X-Mailman-Version 2.1.15
Precedence list
List-Id General discussion list for the Python programming language <python-list.python.org>
List-Unsubscribe <http://mail.python.org/mailman/options/python-list>, <mailto:python-list-request@python.org?subject=unsubscribe>
List-Archive <http://mail.python.org/pipermail/python-list/>
List-Post <mailto:python-list@python.org>
List-Help <mailto:python-list-request@python.org?subject=help>
List-Subscribe <http://mail.python.org/mailman/listinfo/python-list>, <mailto:python-list-request@python.org?subject=subscribe>
Newsgroups comp.lang.python
Message-ID <mailman.1262.1356372812.29569.python-list@python.org> (permalink)
Lines 300
NNTP-Posting-Host 2001:888:2000:d::a6
X-Trace 1356372812 news.xs4all.nl 6989 [2001:888:2000:d::a6]:54217
X-Complaints-To abuse@xs4all.nl
Xref csiph.com comp.lang.python:35477

Show key headers only | View raw


[Multipart message — attachments visible in raw view] - view raw

On 24 December 2012 16:18, Roy Smith <roy@panix.com> wrote:

> In article <40d108ec-b019-4829-a969-c8ef513866f1@googlegroups.com>,
>  Pander Musubi <pander.musubi@gmail.com> wrote:
>
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
> 'a',
> > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c',
> 'C',
> > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?',
> 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?',
> '?',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O',
> '?',
> > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r',
> 'R',
> > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?',
> 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>

<snip>

Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts.  One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>             '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?',
>             [...]
>             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}
>
> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>    "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>    return [map1[c] for c in s]
>
> def decode(l):
>    "decode([34, 19, 19]) ==> 'foo'"
>     return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>

This isn't needed and the not-so-new way to do this is through .sort's key
attribute.

encoded_strings = [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings = [decode(l) for l in encoded_strings]

changes to

encoded_strings.sort(key=encode)

[Which happens to be faster </reasonable_guess>]

Hence you neither need map2 or decode:

## CODE ##

alphabet = (
' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â',
 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë',
'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì',
'Ì', 'j', 'J', 'k', 'K', 'l', 'L',
 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò',
'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q',
 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù',
'Ù', 'v', 'V', 'w', 'W', 'x', 'X',
 'y', 'Y', 'z', 'Z'
)

hashindex = {character:index for index, character in enumerate(alphabet)}
def string2sortlist(string):
return [hashindex[s] for s in string]

# Quickly make some stuff to sort. Let's try 200k, as that's what's
suggested.
import random
things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6)))
for _ in range(200000)]

print(things_to_sort[:15])

things_to_sort.sort(key=string2sortlist)

print(things_to_sort[:15])

## END CODE ##

Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to
Tomas Bach's method.

Back to comp.lang.python | Previous | NextPrevious in thread | Next in thread | Find similar | Unroll thread


Thread

Re: Custom alphabetical sort Roy Smith <roy@panix.com> - 2012-12-24 11:18 -0500
  Re: Custom alphabetical sort Pander Musubi <pander.musubi@gmail.com> - 2012-12-24 08:40 -0800
    Re: Custom alphabetical sort Roy Smith <roy@panix.com> - 2012-12-24 12:40 -0500
      Re: Custom alphabetical sort Pander Musubi <pander.musubi@gmail.com> - 2012-12-24 09:53 -0800
      Re: Custom alphabetical sort Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-12-24 18:07 +0000
  Re: Custom alphabetical sort Joshua Landau <joshua.landau.ws@gmail.com> - 2012-12-24 18:12 +0000
    Re: Custom alphabetical sort Pander Musubi <pander.musubi@gmail.com> - 2012-12-24 15:19 -0800
      Re: Custom alphabetical sort Dave Angel <d@davea.name> - 2012-12-25 01:18 -0500
      Re: Custom alphabetical sort Joshua Landau <joshua.landau.ws@gmail.com> - 2012-12-27 01:13 +0000
    Re: Custom alphabetical sort Pander Musubi <pander.musubi@gmail.com> - 2012-12-24 15:19 -0800
  Re: Custom alphabetical sort Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-12-24 22:57 +0000

csiph-web