Newsgroups: comp.lang.python
Date: Mon, 24 Dec 2012 15:19:14 -0800 (PST)
Subject: Re: Custom alphabetical sort
From: Pander Musubi

On Monday, December 24, 2012 7:12:43 PM UTC+1, Joshua Landau wrote:
> On 24 December 2012 16:18, Roy Smith wrote:
> > In article <40d108ec-b019-4829-a969-c8ef513866f1@googlegroups.com>,
> >  Pander Musubi wrote:
> >
> > > Hi all,
> > >
> > > I would like to sort according to this order:
> > >
> > > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > > 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C',
> > > 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f',
> > > 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì',
> > > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö',
> > > 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R',
> > > 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v',
> > > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> > >
> > > How can I do this? The default sorted() does not give the desired result.
> >
> > Given all that, I would start by writing some code which turned your
> > alphabet into a pair of dicts.  One maps from the code point to a
> > collating sequence number (i.e. ordinals), the other maps back.
> > Something like (for Python 2.7):
> >
> > alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
> >             '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á',
> >             [...]
> >             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > map1 = {c: n for n, c in enumerate(alphabet)}
> > map2 = {n: c for n, c in enumerate(alphabet)}
> >
> > Next, I would write some functions which encode your strings as lists of
> > ordinals (and back again):
> >
> > def encode(s):
> >     "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
> >     return [map1[c] for c in s]
> >
> > def decode(l):
> >     "decode([34, 19, 19]) ==> 'foo'"
> >     return ''.join(map2[i] for i in l)
> >
> > Use these to convert your strings to lists of ints which will sort as
> > per your specified collating order, and then back again:
> >
> > encoded_strings = [encode(s) for s in original_list]
> > encoded_strings.sort()
> > sorted_strings = [decode(l) for l in encoded_strings]
>
> This isn't needed; the not-so-new way to do this is through .sort's key
> argument.
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> changes to
>
> original_list.sort(key=encode)
>
> [Which happens to be faster.]
>
> Hence you need neither map2 nor decode:
>
> ## CODE ##
>
> alphabet = (
>     ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â',
>     'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
>     'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L',
>     'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q',
>     'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X',
>     'y', 'Y', 'z', 'Z'
> )
>
> hashindex = {character: index for index, character in enumerate(alphabet)}
>
> def string2sortlist(string):
>     return [hashindex[s] for s in string]
>
> # Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
> import random
> things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6))) for _ in range(200000)]
>
> print(things_to_sort[:15])
>
> things_to_sort.sort(key=string2sortlist)
>
> print(things_to_sort[:15])
>
> ## END CODE ##
>
> Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to Tomas Bach's method.
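
For readers on Python 3, where every str literal is already Unicode, the key-function approach quoted above can be sketched as follows. It also checks the claim that sorting with key= gives the same order as the encode/sort/decode round trip. This is a minimal sketch, not code from the thread: the shortened ALPHABET subset, the names RANK, sort_key and decorate_sort_undecorate, and the sample words are all hypothetical, chosen for illustration; substitute the full tuple.

```python
# Python 3 sketch of the custom-collation sort discussed above.
# ALPHABET is a hypothetical, shortened subset of the full collating tuple.
ALPHABET = (' ', '.', "'", '-', '0', '1', '2', 'a', 'A', 'ä', 'Ä',
            'b', 'B', 'e', 'E', 'é', 'É', 'o', 'O', 'ö', 'Ö', 'z', 'Z')

# Map each character to its position in the collating sequence.
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def sort_key(s):
    """Encode a string as a list of collation ranks (KeyError on unknown characters)."""
    return [RANK[ch] for ch in s]


def decorate_sort_undecorate(words):
    """The encode / sort / decode round trip, for comparison with key=."""
    back = {i: ch for ch, i in RANK.items()}
    encoded = sorted(sort_key(w) for w in words)
    return [''.join(back[i] for i in e) for e in encoded]


words = ['Zoe', 'zoe', 'Abe', 'äbe', 'abe', 'öz', 'oz', 'Oz', 'a-b', 'a b']

by_key = sorted(words, key=sort_key)
print(by_key)
# ['a b', 'a-b', 'abe', 'Abe', 'äbe', 'oz', 'Oz', 'öz', 'zoe', 'Zoe']

print(sorted(words))  # default code-point order, for contrast
# ['Abe', 'Oz', 'Zoe', 'a b', 'a-b', 'abe', 'oz', 'zoe', 'äbe', 'öz']
```

Because each string maps to exactly one rank list, both routes produce the same order; key= simply skips building and decoding the intermediate list.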
With Python 2.7 I had to use

alphabet = (
    u' ', u'.', u'\'', u'-', u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'a', u'A', u'ä', u'Ä', u'á', u'Á', u'â', u'Â',
    u'à', u'À', u'å', u'Å', u'b', u'B', u'c', u'C', u'ç', u'Ç', u'd', u'D', u'e', u'E', u'ë', u'Ë', u'é', u'É', u'ê', u'Ê', u'è', u'È',
    u'f', u'F', u'g', u'G', u'h', u'H', u'i', u'I', u'ï', u'Ï', u'í', u'Í', u'î', u'Î', u'ì', u'Ì', u'j', u'J', u'k', u'K', u'l', u'L',
    u'm', u'M', u'n', u'ñ', u'N', u'Ñ', u'o', u'O', u'ö', u'Ö', u'ó', u'Ó', u'ô', u'Ô', u'ò', u'Ò', u'ø', u'Ø', u'p', u'P', u'q', u'Q',
    u'r', u'R', u's', u'S', u't', u'T', u'u', u'U', u'ü', u'Ü', u'ú', u'Ú', u'û', u'Û', u'ù', u'Ù', u'v', u'V', u'w', u'W', u'x', u'X',
    u'y', u'Y', u'z', u'Z'
)

to prevent

Traceback (most recent call last):
  File "./sort.py", line 23, in <module>
    things_to_sort.sort(key=string2sortlist)
  File "./sort.py", line 15, in string2sortlist
    return [hashindex[s] for s in string]
KeyError: '\xc3'

Thanks very much for this efficient code.
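
The '\xc3' in that traceback suggests the script was saved as UTF-8: in Python 2 a plain literal such as 'ä' is then the two-byte string '\xc3\xa4', so iterating over a joined string yields single bytes that are not keys of hashindex, while u'' literals iterate by character. Python 3 str literals avoid this, but the same mismatch can return if raw bytes (for example, a file read in binary mode) are fed to the key function. A minimal Python 3 sketch of that case follows; the small ALPHABET subset, the names sort_key_text and raw_words, and the UTF-8 input encoding are assumptions for illustration.

```python
# Python 3 sketch; ALPHABET is a hypothetical subset of the thread's tuple.
ALPHABET = ('a', 'A', 'ä', 'Ä', 'b', 'B', 'o', 'O', 'ö', 'Ö')
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def sort_key(s):
    """Rank each element of s; only correct when s is text (str)."""
    return [RANK[ch] for ch in s]


raw = 'ä'.encode('utf-8')  # b'\xc3\xa4': one character, two bytes
print(list(raw))           # [195, 164]; 195 == 0xc3

try:
    sort_key(raw)          # iterating bytes yields ints, never keys of RANK
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 195, the Python 3 analogue of '\xc3'


def sort_key_text(s):
    """Decode bytes first (assumed UTF-8) so the key sees whole characters."""
    if isinstance(s, bytes):
        s = s.decode('utf-8')
    return [RANK[ch] for ch in s]


raw_words = [w.encode('utf-8') for w in ('Ob', 'öb', 'ob', 'äb', 'ab')]
print([w.decode('utf-8') for w in sorted(raw_words, key=sort_key_text)])
# ['ab', 'äb', 'ob', 'Ob', 'öb']
```

Decoding at the boundary, rather than adding byte sequences to the alphabet, keeps one rank per character, which is what the collating order is defined over.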