Path: csiph.com!usenet.pasdenom.info!weretis.net!feeder4.news.weretis.net!ecngs!feeder2.ecngs.de!newsfeed.freenet.ag!news2.euro.net!newsgate.cistron.nl!newsgate.news.xs4all.nl!post.news.xs4all.nl!not-for-mail Return-Path: X-Original-To: python-list@python.org Delivered-To: python-list@mail.python.org X-Spam-Status: OK 0.000 X-Spam-Evidence: '*H*': 1.00; '*S*': 0.00; 'method.': 0.05; 'next,': 0.07; 'python': 0.09; "'w',": 0.09; 'attribute.': 0.09; 'encode': 0.09; 'skip:[ 30': 0.09; 'skip:[ 40': 0.09; 'cc:addr:python-list': 0.10; 'def': 0.10; 'extension': 0.13; 'result.': 0.15; "'0',": 0.16; "'a',": 0.16; "'b',": 0.16; "'c',": 0.16; "'d',": 0.16; "'e',": 0.16; "'o',": 0.16; "'r',": 0.16; "'z')": 0.16; 'alphabet': 0.16; 'made-up': 0.16; 'ordinals': 0.16; 'roy': 0.16; 'sorted()': 0.16; 'wrote:': 0.17; 'changes': 0.20; 'email addr:gmail.com>': 0.20; 'sort': 0.21; 'all,': 0.21; 'import': 0.21; 'back.': 0.22; 'cc:2**0': 0.23; '>': 0.23; 'specified': 0.23; 'random': 0.24; 'cc:addr:python.org': 0.25; 'header:In- Reply-To:1': 0.25; 'skip:[ 10': 0.26; 'skip:& 60': 0.27; 'skip:e 30': 0.27; 'message-id:@mail.gmail.com': 0.27; 'this?': 0.28; 'index,': 0.29; 'character': 0.29; 'convert': 0.29; '8bit%:5': 0.29; 'skip:& 10': 0.29; "skip:' 10": 0.30; 'stuff': 0.30; 'lists': 0.31; 'code': 0.31; 'point': 0.31; '(and': 0.32; 'december': 0.32; 'quickly': 0.32; 'skip:& 20': 0.33; 'that,': 0.34; 'received:google.com': 0.34; 'needed': 0.35; 'faster': 0.35; 'sequence': 0.35; 'received:209.85': 0.35; 'something': 0.35; '(i.e.': 0.36; 'skip:{ 10': 0.36; 'skip:p 20': 0.36; 'skip:t 40': 0.37; 'does': 0.37; '(for': 0.37; 'received:209': 0.37; 'subject:: ': 0.38; 'some': 0.38; 'skip:" 10': 0.40; 'header:Received:5': 0.40; 'end': 0.40; 'your': 0.60; 'back': 0.62; 'smith': 0.71; 'article': 0.78; "'2',": 0.84; "'3',": 0.84; '==>': 0.84; 'order:': 0.84; '\xa0one': 0.84 DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20120113; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc:content-type; bh=uPC+waMzdjcSBXD9aoRsfF5uz3VBV94YhBVtYJ9K0M0=; b=LSfNiNrUIP9oWk9xfyBgJuIpIhsljpkCLl/vE7NsDxgC/smUz1M5o8T9MKHtliHQVI tkKW1JmRIXusbTu/c3kk7axvcc/nHE4LlItvhJq7DrIyjORH9fa10XSJxdJYqQ+CxQxk V6D0i11MweUZQzt+UCRnuh/qyeKSeeTPA/pudDW/NG9CrQ2XM9Xz6ef+KIFjDm0ynVu7 lQkNKYrzrUCLoJVPWqN3066Ag57eNWAM1nlQ4fnPjNq/NWC5jk8Own20iyY7m/jtKnU9 E6t7KdzU5rf2FrhCPATtueeBT2Ii0Vuh3b4SybhnosGvIagGC+c+lK7do8v37NFn9bI8 LZvA== MIME-Version: 1.0 In-Reply-To: References: From: Joshua Landau Date: Mon, 24 Dec 2012 18:12:43 +0000 Subject: Re: Custom alphabetical sort To: Roy Smith Content-Type: multipart/alternative; boundary=047d7b343e3c23f51a04d19d2745 Cc: python-list X-BeenThere: python-list@python.org X-Mailman-Version: 2.1.15 Precedence: list List-Id: General discussion list for the Python programming language List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Newsgroups: comp.lang.python Message-ID: Lines: 300 NNTP-Posting-Host: 2001:888:2000:d::a6 X-Trace: 1356372812 news.xs4all.nl 6989 [2001:888:2000:d::a6]:54217 X-Complaints-To: abuse@xs4all.nl Xref: csiph.com comp.lang.python:35477 --047d7b343e3c23f51a04d19d2745 Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable On 24 December 2012 16:18, Roy Smith wrote: > In article <40d108ec-b019-4829-a969-c8ef513866f1@googlegroups.com>, > Pander Musubi wrote: > > > Hi all, > > > > I would like to sort according to this order: > > > > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', > 'a', > > 'A', '?', '?', '?', '?', '?', '?', '?', '?', '?', '?', 'b', 'B', 'c', > 'C', > > '?', '?', 'd', 'D', 'e', 'E', '?', '?', '?', '?', '?', '?', '?', '?', > 'f', > > 'F', 'g', 'G', 'h', 'H', 'i', 'I', '?', '?', '?', '?', '?', '?', '?', > '?', > > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '?', 'N', '?', 'o', 'O', > '?', > > '?', '?', '?', '?', '?', '?', '?', '?', '?', 'p', 'P', 'q', 'Q', 'r', > 'R', > > 's', 'S', 't', 'T', 'u', 'U', '?', '?', '?', '?', '?', '?', '?', '?', > 'v', > > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z') > > > > How can I do this? The default sorted() does not give the desired resul= t. > Given all that, I would start by writing some code which turned your > alphabet into a pair of dicts. One maps from the code point to a > collating sequence number (i.e. ordinals), the other maps back. > Something like (for python 2.7): > > alphabet =3D (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', > '6', '7', '8', '9', 'a', 'A', '?', '?', '?', '?', > [...] > 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z') > > map1 =3D {c: n for n, c in enumerate(alphabet)} > map2 =3D {n: c for n, c in enumerate(alphabet)} > > Next, I would write some functions which encode your strings as lists of > ordinals (and back again) > > def encode(s): > "encode('foo') =3D=3D> [34, 19, 19]" # made-up ordinals > return [map1[c] for c in s] > > def decode(l): > "decode([34, 19, 19]) =3D=3D> 'foo'" > return ''.join(map2[i] for i in l) > > Use these to convert your strings to lists of ints which will sort as > per your specified collating order, and then back again: > > encoded_strings =3D [encode(s) for s in original_list] > encoded_strings.sort() > sorted_strings =3D [decode(l) for l in encoded_strings] > This isn't needed and the not-so-new way to do this is through .sort's key attribute. encoded_strings =3D [encode(s) for s in original_list] encoded_strings.sort() sorted_strings =3D [decode(l) for l in encoded_strings] changes to encoded_strings.sort(key=3Dencode) [Which happens to be faster ] Hence you neither need map2 or decode: ## CODE ## alphabet =3D ( ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', '=E4', '=C4', '=E1', '=C1', '=E2', '=C2', '=E0', '=C0', '=E5', '=C5', 'b', 'B', 'c', 'C', '=E7', '=C7', 'd', 'D', 'e= ', 'E', '=EB', '=CB', '=E9', '=C9', '=EA', '=CA', '=E8', '=C8', 'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', '=EF', '=CF', '=ED', '=CD', '=EE',= '=CE', '=EC', '=CC', 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', '=F1', 'N', '=D1', 'o', 'O', '=F6', '=D6', '=F3', '=D3', '= =F4', '=D4', '=F2', '=D2', '=F8', '=D8', 'p', 'P', 'q', 'Q', 'r', 'R', 's', 'S', 't', 'T', 'u', 'U', '=FC', '=DC', '=FA', '=DA', '=FB',= '=DB', '=F9', '=D9', 'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z' ) hashindex =3D {character:index for index, character in enumerate(alphabet)} def string2sortlist(string): return [hashindex[s] for s in string] # Quickly make some stuff to sort. Let's try 200k, as that's what's suggested. import random things_to_sort =3D ["".join(random.sample(alphabet, random.randint(4, 6))) for _ in range(200000)] print(things_to_sort[:15]) things_to_sort.sort(key=3Dstring2sortlist) print(things_to_sort[:15]) ## END CODE ## Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to Tomas Bach's method. --047d7b343e3c23f51a04d19d2745 Content-Type: text/html; charset=ISO-8859-1 Content-Transfer-Encoding: quoted-printable
On 24 December 2012 16:18, Roy Smith <= ;roy@panix.com> wrote:
In article <40d108ec-b019-4829-a969-c8ef513866f1@googlegroups.com>,=
=A0Pander Musubi <pander.musu= bi@gmail.com> wrote:

> Hi all,
>
> I would like to sort according to this order:
>
> (' ', '.', '\'', '-', '0',= '1', '2', '3', '4', '5', '6= 9;, '7', '8', '9', 'a',
> 'A', '?', '?', '?', '?',= '?', '?', '?', '?', '?', '?= 9;, 'b', 'B', 'c', 'C',
> '?', '?', 'd', 'D', 'e', '= E', '?', '?', '?', '?', '?', &#= 39;?', '?', '?', 'f',
> 'F', 'g', 'G', 'h', 'H', '= i', 'I', '?', '?', '?', '?', &#= 39;?', '?', '?', '?',
> 'j', 'J', 'k', 'K', 'l', '= L', 'm', 'M', 'n', '?', 'N', &#= 39;?', 'o', 'O', '?',
> '?', '?', '?', '?', '?', '= ?', '?', '?', '?', 'p', 'P', &#= 39;q', 'Q', 'r', 'R',
> 's', 'S', 't', 'T', 'u', '= U', '?', '?', '?', '?', '?', &#= 39;?', '?', '?', 'v',
> 'V', 'w', 'W', 'x', = 'X', 'y', 'Y', 'z', 'Z')
>
> How can I do this? The default sorted() does n= ot give the desired result.

<snip>=A0

Given all that, I would start by writing some code which turned your
alphabet into a pair of dicts. =A0One maps from the code point to a
collating sequence number (i.e. ordinals), the other maps back.
Something like (for python 2.7):

alphabet =3D (' ', '.', '\'', '-', '= ;0', '1', '2', '3', '4', '5', =A0 =A0 =A0 =A0 =A0 =A0 '6', '7', '8', '9',= 'a', 'A', '?', '?', '?', '?= 9;,
=A0 =A0 =A0 =A0 =A0 =A0 [...]
=A0 =A0 =A0 =A0 =A0 =A0 'v', 'V', 'w&= #39;, 'W', 'x', 'X', 'y', 'Y', '= ;z', 'Z')

map1 =3D {c: n for n, c in enumerate(alphabet)}
map2 =3D {n: c for n, c in enumerate(alphabet)}

Next, I would write some functions which encode your strings as lists of ordinals (and back again)

def encode(s):
=A0 =A0"encode('foo') =3D=3D> [34, 19, 19]" =A0# made-= up ordinals
=A0 =A0return [map1[c] for c in s]

def decode(l):
=A0 =A0"decode([34, 19, 19]) =3D=3D> 'foo'"
=A0 =A0 return ''.join(map2[i] for i in l)

Use these to convert your strings to lists of ints which will sort as
per your specified collating order, and then back again:

encoded_strings =3D [encode(s) for s in original_list]
encoded_strings.sort()
sorted_strings =3D [decode(l) for l in encoded_strings]

This isn't needed and the not-so-new way to do this is through .sort&#= 39;s key attribute.

encoded_strings =3D [encode(s) for s in original_list]
encoded_strings.s= ort()
sorted_strings =3D [decode(l) for l in encoded_strings]
<= div class=3D"gmail_extra" style>
= changes to

encoded_strings.sort(key=3Dencode)

[Which happens to be faster &= lt;/reasonable_guess>]

Hence you neither need map2 or decode:

## CODE ##

alpha= bet =3D (
' ', '.', '\'', '-', = '0', '1', '2', '3', '4', '5'= ;, '6', '7', '8', '9', 'a', 'A&= #39;, '=E4', '=C4', '=E1', '=C1', '=E2&= #39;, '=C2',
'=E0', '=C0', '=E5', '=C5', 'b',= 'B', 'c', 'C', '=E7', '=C7', '= d', 'D', 'e', 'E', '=EB', '=CB'= , '=E9', '=C9', '=EA', '=CA', '=E8'= , '=C8',
'f', 'F', 'g', 'G', 'h', 'H&= #39;, 'i', 'I', '=EF', '=CF', '=ED'= , '=CD', '=EE', '=CE', '=EC', '=CC'= , 'j', 'J', 'k', 'K', 'l', 'L&#= 39;,
'm', 'M', 'n', '=F1', 'N', '= =D1', 'o', 'O', '=F6', '=D6', '=F3&= #39;, '=D3', '=F4', '=D4', '=F2', '=D2&= #39;, '=F8', '=D8', 'p', 'P', 'q', = 'Q',
'r', 'R', 's', 'S', 't', 'T&= #39;, 'u', 'U', '=FC', '=DC', '=FA'= , '=DA', '=FB', '=DB', '=F9', '=D9'= , 'v', 'V', 'w', 'W', 'x', 'X&#= 39;,
'y', 'Y', 'z', 'Z'
)

hashindex =3D {character:index for index, character in enumerate(alphabet)}=
def string2sortlist(string):
return [has= hindex[s] for s in string]

# Quickly make some stuff to sort. Let's try 200k, as that's what&#= 39;s suggested.
import random
things_to_sort =3D ["".join(random.sample(alph= abet, random.randint(4, 6))) for _ in range(200000)]

print(thing= s_to_sort[:15])

things_to_sort.sort(key=3Dstring2sortlist)

print(things_to_sort[:15])

## END CODE= ##

Not-so-coincidentally, this is exactly the same as Ian Kelly's extensio= n to Tomas Bach's method.
--047d7b343e3c23f51a04d19d2745--