| Header | Value |
|---|---|
| Newsgroups | comp.lang.python |
| Date | Mon, 24 Dec 2012 15:19:14 -0800 (PST) |
| In-Reply-To | <mailman.1262.1356372812.29569.python-list@python.org> |
| References | <roy-BEEA73.11183724122012@news.panix.com> <mailman.1262.1356372812.29569.python-list@python.org> |
| Subject | Re: Custom alphabetical sort |
| From | Pander Musubi <pander.musubi@gmail.com> |
| To | comp.lang.python@googlegroups.com |
| Cc | python-list <python-list@python.org>, Roy Smith <roy@panix.com> |
| Message-ID | <mailman.1267.1356391162.29569.python-list@python.org> |
On Monday, December 24, 2012 7:12:43 PM UTC+1, Joshua Landau wrote:
> On 24 December 2012 16:18, Roy Smith <r...@panix.com> wrote:
>
> In article <40d108ec-b019-4829-a969-c8ef513866f1@googlegroups.com>,
> Pander Musubi <pander...@gmail.com> wrote:
>
> > Hi all,
> >
> > I would like to sort according to this order:
> >
> > (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a',
> > 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â', 'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C',
> > 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È', 'f',
> > 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì',
> > 'j', 'J', 'k', 'K', 'l', 'L', 'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö',
> > 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q', 'r', 'R',
> > 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v',
> > 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
> >
> > How can I do this? The default sorted() does not give the desired result.
>
> <snip>
>
> Given all that, I would start by writing some code which turned your
> alphabet into a pair of dicts. One maps from the code point to a
> collating sequence number (i.e. ordinals), the other maps back.
> Something like (for python 2.7):
>
> alphabet = (' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5',
>             '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á',
>             [...]
>             'v', 'V', 'w', 'W', 'x', 'X', 'y', 'Y', 'z', 'Z')
>
> map1 = {c: n for n, c in enumerate(alphabet)}
> map2 = {n: c for n, c in enumerate(alphabet)}
>
> Next, I would write some functions which encode your strings as lists of
> ordinals (and back again)
>
> def encode(s):
>     "encode('foo') ==> [34, 19, 19]"  # made-up ordinals
>     return [map1[c] for c in s]
>
> def decode(l):
>     "decode([34, 19, 19]) ==> 'foo'"
>     return ''.join(map2[i] for i in l)
>
> Use these to convert your strings to lists of ints which will sort as
> per your specified collating order, and then back again:
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> This isn't needed; the not-so-new way to do this is through .sort's key argument.
>
> encoded_strings = [encode(s) for s in original_list]
> encoded_strings.sort()
> sorted_strings = [decode(l) for l in encoded_strings]
>
> changes to
>
> sorted_strings = sorted(original_list, key=encode)
>
> [Which happens to be faster </reasonable_guess>]
>
> Hence you need neither map2 nor decode:
>
> ## CODE ##
>
> alphabet = (
>     ' ', '.', '\'', '-', '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', 'a', 'A', 'ä', 'Ä', 'á', 'Á', 'â', 'Â',
>     'à', 'À', 'å', 'Å', 'b', 'B', 'c', 'C', 'ç', 'Ç', 'd', 'D', 'e', 'E', 'ë', 'Ë', 'é', 'É', 'ê', 'Ê', 'è', 'È',
>     'f', 'F', 'g', 'G', 'h', 'H', 'i', 'I', 'ï', 'Ï', 'í', 'Í', 'î', 'Î', 'ì', 'Ì', 'j', 'J', 'k', 'K', 'l', 'L',
>     'm', 'M', 'n', 'ñ', 'N', 'Ñ', 'o', 'O', 'ö', 'Ö', 'ó', 'Ó', 'ô', 'Ô', 'ò', 'Ò', 'ø', 'Ø', 'p', 'P', 'q', 'Q',
>     'r', 'R', 's', 'S', 't', 'T', 'u', 'U', 'ü', 'Ü', 'ú', 'Ú', 'û', 'Û', 'ù', 'Ù', 'v', 'V', 'w', 'W', 'x', 'X',
>     'y', 'Y', 'z', 'Z'
> )
>
> hashindex = {character: index for index, character in enumerate(alphabet)}
>
> def string2sortlist(string):
>     return [hashindex[s] for s in string]
>
> # Quickly make some stuff to sort. Let's try 200k, as that's what's suggested.
> import random
> things_to_sort = ["".join(random.sample(alphabet, random.randint(4, 6))) for _ in range(200000)]
>
> print(things_to_sort[:15])
>
> things_to_sort.sort(key=string2sortlist)
>
> print(things_to_sort[:15])
>
> ## END CODE ##
>
> Not-so-coincidentally, this is exactly the same as Ian Kelly's extension to Tomas Bach's method.

With Python 2.7 I had to use unicode literals for the alphabet:

alphabet = (
    u' ', u'.', u'\'', u'-', u'0', u'1', u'2', u'3', u'4', u'5', u'6', u'7', u'8', u'9', u'a', u'A', u'ä', u'Ä', u'á', u'Á', u'â', u'Â',
    u'à', u'À', u'å', u'Å', u'b', u'B', u'c', u'C', u'ç', u'Ç', u'd', u'D', u'e', u'E', u'ë', u'Ë', u'é', u'É', u'ê', u'Ê', u'è', u'È',
    u'f', u'F', u'g', u'G', u'h', u'H', u'i', u'I', u'ï', u'Ï', u'í', u'Í', u'î', u'Î', u'ì', u'Ì', u'j', u'J', u'k', u'K', u'l', u'L',
    u'm', u'M', u'n', u'ñ', u'N', u'Ñ', u'o', u'O', u'ö', u'Ö', u'ó', u'Ó', u'ô', u'Ô', u'ò', u'Ò', u'ø', u'Ø', u'p', u'P', u'q', u'Q',
    u'r', u'R', u's', u'S', u't', u'T', u'u', u'U', u'ü', u'Ü', u'ú', u'Ú', u'û', u'Û', u'ù', u'Ù', u'v', u'V', u'w', u'W', u'x', u'X',
    u'y', u'Y', u'z', u'Z'
)

to prevent this error:

Traceback (most recent call last):
  File "./sort.py", line 23, in <module>
    things_to_sort.sort(key=string2sortlist)
  File "./sort.py", line 15, in string2sortlist
    return [hashindex[s] for s in string]
KeyError: '\xc3'
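
A likely reading of the traceback, assuming sort.py was saved as UTF-8: on Python 2.7 a plain (non-u) literal for a-umlaut is a two-byte str, so iterating over a string built from such an alphabet yields single bytes like '\xc3', and those are not keys in hashindex. The sketch below reproduces the mismatch with explicit bytes so it runs on Python 2.7 and 3 alike; `byte_hashindex` is a hypothetical two-entry stand-in for the real table.

```python
# Sketch of the failure mode, not the original sort.py.
a_umlaut = u'\xe4'                   # one code point, U+00E4 (a-umlaut)
encoded = a_umlaut.encode('utf-8')   # the bytes a plain Python 2 literal would hold

# Hypothetical stand-in for a hashindex built from byte-string literals.
byte_hashindex = {b'a': 0, encoded: 1}

# Slicing keeps this a byte string on both Python 2 and Python 3.
first_byte = encoded[:1]

print(len(a_umlaut))                 # number of characters
print(len(encoded))                  # number of UTF-8 bytes
print(first_byte in byte_hashindex)  # the lone lead byte is not a key

# With unicode keys, the whole character is looked up at once.
unicode_hashindex = {u'a': 0, a_umlaut: 1}
print(a_umlaut in unicode_hashindex)
```

With u'...' literals each element of the alphabet is a single character, and iterating over a unicode string yields those same characters, so every lookup finds its key.
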
Thanks very much for this efficient code.
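
For a quick check of the resulting order, here is a minimal sketch written for Python 3, where every str is already Unicode. It reuses the names `hashindex` and `string2sortlist` from the quoted code; the alphabet is written as one string for brevity, and the five sample words are made up for illustration.

```python
# Same collating order as the quoted tuple, written as a single string.
alphabet = (
    " .'-0123456789"
    "aAäÄáÁâÂàÀåÅbBcCçÇdDeEëËéÉêÊèÈfFgGhH"
    "iIïÏíÍîÎìÌjJkKlLmMnñNÑoOöÖóÓôÔòÒøØpPqQ"
    "rRsStTuUüÜúÚûÛùÙvVwWxXyYzZ"
)

# Map each character to its position in the custom order.
hashindex = {character: index for index, character in enumerate(alphabet)}


def string2sortlist(string):
    """Turn a string into a list of positions that sorts in the custom order."""
    return [hashindex[character] for character in string]


# Hypothetical sample data; every character must appear in the alphabet.
words = ["Zebra", "ähnlich", "apple", "Apple", ".net"]

custom_order = sorted(words, key=string2sortlist)
default_order = sorted(words)

print(custom_order)
print(default_order)
```

The custom key puts 'apple' before 'Apple' and both before 'ähnlich', whereas the default sort orders by code point and places all uppercase ASCII letters ahead of the lowercase ones. A character missing from the alphabet still raises KeyError, as in the traceback above.
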