Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #32524 > unrolled thread

sort order for strings of digits

Started bydjc <djc@kangoo.invalid>
First post2012-10-31 15:17 +0000
Last post2012-11-01 01:52 -0700
Articles 14 — 10 participants

Back to article view | Back to comp.lang.python


Contents

  sort order for strings of digits djc <djc@kangoo.invalid> - 2012-10-31 15:17 +0000
    Re: sort order for strings of digits Hans Mulder <hansmu@xs4all.nl> - 2012-10-31 16:31 +0100
    Re: sort order for strings of digits Ian Kelly <ian.g.kelly@gmail.com> - 2012-10-31 09:44 -0600
    Re: sort order for strings of digits Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-10-31 14:17 -0400
    Re: sort order for strings of digits Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-10-31 21:33 +0000
    Re: sort order for strings of digits Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-10-31 19:05 -0400
      Re: sort order for strings of digits Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-31 23:44 +0000
        Re: sort order for strings of digits Chris Angelico <rosuav@gmail.com> - 2012-11-01 11:53 +1100
          Re: sort order for strings of digits Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-11-02 00:27 +0000
    Re: sort order for strings of digits Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-31 23:09 +0000
      Re: sort order for strings of digits DJC <djc@news.invalid> - 2012-10-31 23:45 +0000
      Re: sort order for strings of digits Arnaud Delobelle <arnodel@gmail.com> - 2012-11-01 00:59 +0000
    Re: sort order for strings of digits Mark Lawrence <breamoreboy@yahoo.co.uk> - 2012-11-01 00:30 +0000
    Re: sort order for strings of digits wxjmfauth@gmail.com - 2012-11-01 01:52 -0700

#32524 — sort order for strings of digits

Fromdjc <djc@kangoo.invalid>
Date2012-10-31 15:17 +0000
Subjectsort order for strings of digits
Message-ID<k6rfdu$rgn$1@news.albasani.net>
I learn lots of useful things from the list, some not always welcome. No 
sooner had I found a solution to a minor inconvenience in my code, than 
a recent thread here drew my attention to the fact that it will not work 
for python 3. So suggestions please:

     TODO 2012-10-22: sort order numbers first then alphanumeric
 >>> n
('1', '10', '101', '3', '40', '31', '13', '2', '2000')
 >>> s
('a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4')

 >>> sorted(n)
['1', '10', '101', '13', '2', '2000', '3', '31', '40']
 >>> sorted(s)
['1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
 >>> sorted(n+s)
['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40', 
'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']



Possibly there is a better way but for Python 2.7 this gives the 
required result

Python 2.7.3 (default, Sep 26 2012, 21:51:14)

 >>> sorted(int(x) if x.isdigit() else x for x in n+s)
[1, 2, 3, 10, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1', 'ab', 
'acd', 'b a 4', 'bcd']


[str(x) for x in sorted(int(x) if x.isdigit() else x for x in n+s)]
['1', '2', '3', '10', '13', '31', '40', '101', '2000', '1a', '222 bb', 
'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']


But not for Python 3
Python 3.2.3 (default, Oct 19 2012, 19:53:16)

 >>> sorted(n+s)
['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40', 
'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']

 >>> sorted(int(x) if x.isdigit() else x for x in n+s)
Traceback (most recent call last):
   File "<stdin>", line 1, in <module>
TypeError: unorderable types: str() < int()
 >>>

The best I can think of is to split the input sequence into two lists, 
sort each and then join them.


-- 
djc

[toc] | [next] | [standalone]


#32525

FromHans Mulder <hansmu@xs4all.nl>
Date2012-10-31 16:31 +0100
Message-ID<50914462$0$6982$e4fe514c@news2.news.xs4all.nl>
In reply to#32524
On 31/10/12 16:17:14, djc wrote:
> Python 3.2.3 (default, Oct 19 2012, 19:53:16)
> 
>>>> sorted(n+s)
> ['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40',
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
> 
>>>> sorted(int(x) if x.isdigit() else x for x in n+s)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> TypeError: unorderable types: str() < int()
>>>>

>>> sorted(n+s, key=lambda x:(x.__class__.__name__, x))
['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40',
'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>>

> The best I can think of is to split the input sequence into two lists,
> sort each and then join them.

That might well be the most readable solution.


Hope this helps,

-- HansM

[toc] | [prev] | [next] | [standalone]


#32526

FromIan Kelly <ian.g.kelly@gmail.com>
Date2012-10-31 09:44 -0600
Message-ID<mailman.3121.1351698283.27098.python-list@python.org>
In reply to#32524
On Wed, Oct 31, 2012 at 9:17 AM, djc <djc@kangoo.invalid> wrote:
> The best I can think of is to split the input sequence into two lists, sort
> each and then join them.

In the example you have given they already seem to be split, so you
could just do:

sorted(n, key=int) + sorted(s)

If that's not really the case, then you could construct (str, int)
tuples as sort keys:

sorted(n+s, key=lambda x: ('', int(x)) if x.isdigit() else (x, -1))

Note that the empty string sorts before all numbers here, which may or
may not be desirable.

[toc] | [prev] | [next] | [standalone]


#32529

FromDennis Lee Bieber <wlfraed@ix.netcom.com>
Date2012-10-31 14:17 -0400
Message-ID<mailman.3124.1351707476.27098.python-list@python.org>
In reply to#32524
On Wed, 31 Oct 2012 15:17:14 +0000, djc <djc@kangoo.invalid> declaimed
the following in gmane.comp.python.general:

> 
>      TODO 2012-10-22: sort order numbers first then alphanumeric
>  >>> n
> ('1', '10', '101', '3', '40', '31', '13', '2', '2000')
>  >>> s
> ('a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4')
> 
>  >>> sorted(n)
> ['1', '10', '101', '13', '2', '2000', '3', '31', '40']
>  >>> sorted(s)
> ['1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>  >>> sorted(n+s)
> ['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40', 
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>

	Both your subject line, and the above samples are not "sorting
'numbers'"... They are sorting STRINGS that contain values representing
the printable glyphs of decimal digits. 

	The above should work in Python 3 as all the data appears as strings
(probably Unicode in Python 3).

	However, you won't get a "numeric order" for the entries that are
numbers -- the sort is lexicographical {Using 2.7}... Is the requirement
that strings of digits are to be sorted AS integers rather than
lexicographic?

>>> data = [	'1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data)
['0015', '1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31',
'40', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [	1, 10, 101, 3, 40, 31, 13, 2, 2000, 0015,
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data)
[1, 2, 3, 10, 13, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1',
'ab', 'acd', 'b a 4', 'bcd']
>>> #note how "0015" sorted first, but 0015 is an octal value equal to decimal 13
>>> #also note how the string representation put sorted by the first character
>>> #but the mixed list sorted by ascending integer value
>>> 



> 
> 
> Possibly there is a better way but for Python 2.7 this gives the 
> required result
> 
> Python 2.7.3 (default, Sep 26 2012, 21:51:14)
> 
>  >>> sorted(int(x) if x.isdigit() else x for x in n+s)
> [1, 2, 3, 10, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1', 'ab', 
> 'acd', 'b a 4', 'bcd']
>
	This, however, is returning a mix of INTEGER and STRING. Is that
what is really wanted? Your input, I presume, was the "all string"
version -- shouldn't the output also be all string? (Okay -- that IS
your next item)

> 
> [str(x) for x in sorted(int(x) if x.isdigit() else x for x in n+s)]
> ['1', '2', '3', '10', '13', '31', '40', '101', '2000', '1a', '222 bb', 
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
> 
> 
> But not for Python 3
> Python 3.2.3 (default, Oct 19 2012, 19:53:16)
> 
>  >>> sorted(n+s)
> ['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40', 
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
> 
>  >>> sorted(int(x) if x.isdigit() else x for x in n+s)
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> TypeError: unorderable types: str() < int()
>  >>>
> 
> The best I can think of is to split the input sequence into two lists, 
> sort each and then join them.

	Why -- I doubt Python 3.x .sort() and sorted() have removed the
optional key and cmp keywords.

	Just supply your own comparison function that handles mixed types
and you should be able to replicate the 2.7 process.

	Something like (untested on Python 3.x)

>>> def cmpr(l, r):
... 	if type(l) == type(r):
... 		if l < r: return -1
... 		if l == r: return 0
... 		if l > r: return 1
... 	else:
... 		if type(l) < type(r): return -1
... 		if type(l) > type(r): return 1
... 		# equality is covered above block
... 		
>>> data = [	'1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data, cmp=cmpr)
['0015', '1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31',
'40', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [	1, 10, 101, 3, 40, 31, 13, 2, 2000, 0015,
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data, cmp=cmpr)
[1, 2, 3, 10, 13, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1',
'ab', 'acd', 'b a 4', 'bcd']
>>> 

	You could even expand the cmpr() function to incorporate the
conversion of decimal strings into numbers...

>>> def cmpr(l, r):
... 	myL = l if type(l) == type("") and not l.isdigit() else int(l)
... 	myR = r if type(r) == type("") and not r.isdigit() else int(r)
... 	if type(myL) == type(myR):
... 		if myL < myR: return -1
... 		if myL == myR: return 0
... 		if myL > myR: return 1
... 	else:
... 		if type(myL) < type(myR): return -1
... 		if type(myL) > type(myR): return 1
... 
>>> data = [	'1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data, cmp=cmpr)
['1', '2', '3', '10', '13', '0015', '31', '40', '101', '2000', '1a',
'222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [	1, 10, 101, 3, 40, 31, 13, 2, 2000, 0015,
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data, cmp=cmpr)
[1, 2, 3, 10, 13, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1',
'ab', 'acd', 'b a 4', 'bcd']
>>> 

	NOTE how the version with all character data has sorted the pure
decimals numerically while still returning them as original strings.
HOWEVER -- this probably needs to be expanded if you might have floating
point STRINGS... Observe:

>>> data = [	'1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4',
'3.14153', '2.718E0'	]
>>> sorted(data, cmp=cmpr)
['1', '2', '3', '10', '13', '0015', '31', '40', '101', '2000', '1a',
'2.718E0', '222 bb', '3.14153', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [	'1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4',
3.14153, 2.718E0	]
>>> sorted(data, cmp=cmpr)
['1', '2', 2.718, '3', 3.14153, '10', '13', '0015', '31', '40', '101',
'2000', '1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [	1, 10, 101, 3, 40, 31, 13, 2, 2000, 0015,
... 		'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4',
3.14153, 2.718E0]
>>> sorted(data, cmp=cmpr)
[1, 2, 2.718, 3, 3.14153, 10, 13, 13, 31, 40, 101, 2000, '1a', '222 bb',
'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> 

	"3.14153" and "2.718E0" do not pass the .isdigit() test, so remain
as strings for sort purposes.


-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

[toc] | [prev] | [next] | [standalone]


#32534

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2012-10-31 21:33 +0000
Message-ID<mailman.3127.1351719228.27098.python-list@python.org>
In reply to#32524
On 31/10/2012 18:17, Dennis Lee Bieber wrote:
>
> 	Why -- I doubt Python 3.x .sort() and sorted() have removed the
> optional key and cmp keywords.
>

Nope.  I'm busy porting my own code from 2.7 to 3.3 and cmp seems to be 
very dead.

This doesn't help either.

c:\Users\Mark\Cash\Python>2to3.py
Traceback (most recent call last):
   File "C:\Python33\Tools\Scripts\2to3.py", line 3, in <module>
     from lib2to3.main import main
ImportError: No module named main

-- 
Cheers.

Mark Lawrence.

[toc] | [prev] | [next] | [standalone]


#32537

FromDennis Lee Bieber <wlfraed@ix.netcom.com>
Date2012-10-31 19:05 -0400
Message-ID<mailman.3130.1351724723.27098.python-list@python.org>
In reply to#32524
On Wed, 31 Oct 2012 16:24:21 -0600, Ian Kelly <ian.g.kelly@gmail.com>
declaimed the following in gmane.comp.python.general:

> 
> Use functools.cmp_to_key for porting cmp functions.  "sort(x, my_cmp)"
> becomes "sort(x, key=cmp_to_key(my_cmp))"
> 
> The cmp builtin is also gone.  If you need it, the suggested replacement
> for "cmp(a, b)" is "(b < a) - (a < b)".
>
	OUCH... Just another reason for my to hang onto the 2.x series as
long as possible (I only installed 2.7 this summer, I'd been using
2.5... And a project at my former employment was still running 2.3 and
having conflicts with a program with a binary extension written for 2.2
-- and we couldn't find the source for the extension to rebuild it!)
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
        wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/

[toc] | [prev] | [next] | [standalone]


#32539

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-10-31 23:44 +0000
Message-ID<5091b7cd$0$29967$c3e8da3$5496439d@news.astraweb.com>
In reply to#32537
On Wed, 31 Oct 2012 19:05:17 -0400, Dennis Lee Bieber wrote:

>> The cmp builtin is also gone.  If you need it, the suggested
>> replacement for "cmp(a, b)" is "(b < a) - (a < b)".
>>
> 	OUCH... Just another reason for my to hang onto the 2.x series as
> long as possible 

On the contrary. If you are using cmp with sort, your sorts are slow, and 
you should upgrade to using a key function as soon as possible.

For small lists, you may not notice, but for large lists using a 
comparison function is a BAD IDEA.

Here's an example: sorting a list of numbers by absolute value.

py> L = [5, -6, 1, -2, 9, -8, 4, 3, -7, 2, -3]
py> sorted(L, key=abs)
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> sorted(L, lambda a, b: cmp(abs(a), abs(b)))
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]

But the amount of work done is radically different. Let's temporarily 
shadow the built-ins with patched versions:

py> _abs = abs
py> _abs, _cmp = abs, cmp
py> c1 = c2 = 0
py> def abs(x):
...     global c1
...     c1 += 1
...     return _abs(x)
...
py> def cmp(a, b):
...     global c2
...     c2 += 1
...     return _cmp(a, b)
...

Now we can see just how much work is done under the hood using a key 
function vs a comparison function:

py> sorted(L, key=abs)
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> c1
11

So the key function is called once for each item in the list. But:


py> c1 = 0  # reset the count
py> sorted(L, lambda a, b: cmp(abs(a), abs(b)))
[1, -2, 2, 3, -3, 4, 5, -6, -7, -8, 9]
py> c1, c2
(54, 27)

The comparison function is called 27 times for a list of nine items (a 
average of 2.5 calls to cmp per item), and abs is called twice for each 
call to cmp. (Well, duh.)

If the list is bigger, it gets worse:

py> c2 = 0
py> x = sorted(L*10, lambda a, b: cmp(abs(a), abs(b)))
py> c2
592

That's an average of 5.4 calls to cmp per item. And it gets even worse as 
the list gets bigger.

As your lists get bigger, the amount of work done calling the comparison 
function gets ever bigger still. Sorting large lists with a comparison 
function is SLOOOW.

py> del abs, cmp  # remove the monkey-patched versions
py> L = L*1000000
py> with Timer():
...     x = sorted(L, key=abs)
...
time taken: 9.165448 seconds
py> with Timer():
...     x = sorted(L, lambda a, b: cmp(abs(a), abs(b)))
...
time taken: 63.579679 seconds


The Timer() context manager used can be found here:

http://code.activestate.com/recipes/577896



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#32547

FromChris Angelico <rosuav@gmail.com>
Date2012-11-01 11:53 +1100
Message-ID<mailman.3134.1351731190.27098.python-list@python.org>
In reply to#32539
On Thu, Nov 1, 2012 at 10:44 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On the contrary. If you are using cmp with sort, your sorts are slow, and
> you should upgrade to using a key function as soon as possible.
>

But cmp_to_key doesn't actually improve anything. So I'm not sure how
Py3 has achieved anything; Py2 supported key-based sorting already.

ChrisA

[toc] | [prev] | [next] | [standalone]


#32583

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-11-02 00:27 +0000
Message-ID<50931362$0$29967$c3e8da3$5496439d@news.astraweb.com>
In reply to#32547
On Thu, 01 Nov 2012 11:53:06 +1100, Chris Angelico wrote:

> On Thu, Nov 1, 2012 at 10:44 AM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> On the contrary. If you are using cmp with sort, your sorts are slow,
>> and you should upgrade to using a key function as soon as possible.
>>
>>
> But cmp_to_key doesn't actually improve anything. So I'm not sure how
> Py3 has achieved anything; Py2 supported key-based sorting already.

Yes, but there is a lot of old code pre-dating key-based sorts. There's 
also some examples of code where it isn't obvious how to write it as a 
key-based sort, but a comparison function is simple. And people coming 
from other languages that only support comparison-based sorts (C?) will 
probably continue with what they know.

Even though key-based sorting is better, there's a lot of comparison 
sorting that falls under "if it ain't broke, don't fix it".

So even though key-based sorts are better, there are still comparison-
based sorts in the wild. Python 2 has to support them. Python 3, which is 
allowed to break backwards compatibility, does not. So when porting to 3, 
you have to change the sorts.

Most of the time it is simple to convert a comparison-based sort to a key-
based sort. For the cases where you either can't come up with a good key 
function yourself, or were you want to do so mechanically, Python 
provides cmp_to_key.


-- 
Steven

[toc] | [prev] | [next] | [standalone]


#32538

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-10-31 23:09 +0000
Message-ID<5091afc0$0$29967$c3e8da3$5496439d@news.astraweb.com>
In reply to#32524
On Wed, 31 Oct 2012 15:17:14 +0000, djc wrote:

> The best I can think of is to split the input sequence into two lists,
> sort each and then join them.

According to your example code, you don't have to split the input because 
you already have two lists, one filled with numbers and one filled with 
strings.

But I think that what you actually have is a single list of strings, and 
you are supposed to sort the strings such that they come in numeric order 
first, then alphanumerical. E.g.:

['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
=> ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']

At least that is what I would expect as the useful thing to do when 
sorting.

The trick is to take each string and split it into a leading number and a 
trailing alphanumeric string. Either part may be "empty". Here's a pure 
Python solution:

from sys import maxsize  # use maxint in Python 2
def split(s):
    for i, c in enumerate(s):
        if not c.isdigit():
            break
    else:  # aligned with the FOR, not the IF
        return (int(s), '')
    return (int(s[:i] or maxsize), s[i:])

Now sort using this as a key function:

py> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
py> sorted(L, key=split)
['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']


The above solution is not quite general:

* it doesn't handle negative numbers or numbers with a decimal point;

* it doesn't handle the empty string in any meaningful way;

* in practice, you may or may not want to ignore leading whitespace,
  or trailing whitespace after the number part;

* there's a subtle bug if a string contains a very large numeric prefix,
  finding and fixing that is left as an exercise.



-- 
Steven

[toc] | [prev] | [next] | [standalone]


#32540

FromDJC <djc@news.invalid>
Date2012-10-31 23:45 +0000
Message-ID<k6sd6j$ie7$1@dont-email.me>
In reply to#32538
On 31/10/12 23:09, Steven D'Aprano wrote:
> On Wed, 31 Oct 2012 15:17:14 +0000, djc wrote:
>
>> The best I can think of is to split the input sequence into two lists,
>> sort each and then join them.
>
> According to your example code, you don't have to split the input because
> you already have two lists, one filled with numbers and one filled with
> strings.

Sorry for the confusion, the pair of strings was just a way of testing 
variations on the input. So a sequence with any combination of strings 
that can be read as numbers and strings of chars that don't look like 
numbers (even if that string includes digits) is the expected input

>
> But I think that what you actually have is a single list of strings, and
> you are supposed to sort the strings such that they come in numeric order
> first, then alphanumerical. E.g.:
>
> ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
> => ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']

Not quite, what I want is to ensure that if the strings look like 
numbers they are placed in numerical order. ie 1 2 3 10 100 not 1 10 100 
2 3. Cases where a string has some leading digits can be treated as 
strings like any other.

> At least that is what I would expect as the useful thing to do when
> sorting.

Well it depends on the use case. In my case the strings are column and 
row labels for a report. I want them to be presented in a convenient to 
read sequence. Which the lexical sorting of the strings that look like 
numbers is not. I want a reasonable do-what-i-mean default sort order 
that can handle whatever strings are used.


>
> The trick is to take each string and split it into a leading number and a
> trailing alphanumeric string. Either part may be "empty". Here's a pure
> Python solution:
>
> from sys import maxsize  # use maxint in Python 2
> def split(s):
>      for i, c in enumerate(s):
>          if not c.isdigit():
>              break
>      else:  # aligned with the FOR, not the IF
>          return (int(s), '')
>      return (int(s[:i] or maxsize), s[i:])
>
> Now sort using this as a key function:
>
> py> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
> py> sorted(L, key=split)
> ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']
>
>
> The above solution is not quite general:
>
> * it doesn't handle negative numbers or numbers with a decimal point;
>
> * it doesn't handle the empty string in any meaningful way;
>
> * in practice, you may or may not want to ignore leading whitespace,
>    or trailing whitespace after the number part;
>
> * there's a subtle bug if a string contains a very large numeric prefix,
>    finding and fixing that is left as an exercise.

That looks more than  general enough for my purposes! I will experiment 
along those lines, thank you.

[toc] | [prev] | [next] | [standalone]


#32548

FromArnaud Delobelle <arnodel@gmail.com>
Date2012-11-01 00:59 +0000
Message-ID<mailman.3135.1351731559.27098.python-list@python.org>
In reply to#32538
On 31 October 2012 23:09, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> The trick is to take each string and split it into a leading number and a
> trailing alphanumeric string. Either part may be "empty". Here's a pure
> Python solution:
>
> from sys import maxsize  # use maxint in Python 2
> def split(s):
>     for i, c in enumerate(s):
>         if not c.isdigit():
>             break
>     else:  # aligned with the FOR, not the IF
>         return (int(s), '')
>     return (int(s[:i] or maxsize), s[i:])
>
> Now sort using this as a key function:
>
> py> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
> py> sorted(L, key=split)
> ['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']

You don't actually need to split the string, it's enough to return a
pair consisting of the number of leading digits followed by the string
as the key. Here's an implementation using takewhile:

>>> from itertools import takewhile
>>> def prefix(s):
...     return sum(1 for c in takewhile(str.isdigit, s)) or 1000, s
...
>>> L = ['9', '1000', 'abc2', '55', '1', 'abc', '55a', '1a']
>>> sorted(L, key=prefix)
['1', '1a', '9', '55', '55a', '1000', 'abc', 'abc2']

Here's why it works:

>>> map(prefix, L)
[(1, '9'), (4, '1000'), (1000, 'abc2'), (2, '55'), (1, '1'), (1000,
'abc'), (2, '55a'), (1, '1a')]

-- 
Arnaud

[toc] | [prev] | [next] | [standalone]


#32544

FromMark Lawrence <breamoreboy@yahoo.co.uk>
Date2012-11-01 00:30 +0000
Message-ID<mailman.3132.1351729868.27098.python-list@python.org>
In reply to#32524
On 31/10/2012 22:24, Ian Kelly wrote:
> On Wed, Oct 31, 2012 at 3:33 PM, Mark Lawrence <breamoreboy@yahoo.co.uk>wrote:
>
>> Nope.  I'm busy porting my own code from 2.7 to 3.3 and cmp seems to be
>> very dead.
>>
>> This doesn't help either.
>>
>> c:\Users\Mark\Cash\Python>**2to3.py
>>
>> Traceback (most recent call last):
>>    File "C:\Python33\Tools\Scripts\**2to3.py", line 3, in <module>
>>      from lib2to3.main import main
>> ImportError: No module named main
>>
>
> Perhaps you have a sys.path conflict?

Correct, now fixed, thanks.

>
> Use functools.cmp_to_key for porting cmp functions.  "sort(x, my_cmp)"
> becomes "sort(x, key=cmp_to_key(my_cmp))"
>
> The cmp builtin is also gone.  If you need it, the suggested replacement
> for "cmp(a, b)" is "(b < a) - (a < b)".

As it's my own small code base I've blown away all references to cmp, 
it's rich comparisons all the way.

>
> Cheers,
> Ian
>

-- 
Cheers.

Mark Lawrence.

[toc] | [prev] | [next] | [standalone]


#32550

Fromwxjmfauth@gmail.com
Date2012-11-01 01:52 -0700
Message-ID<9296aee8-23b4-4376-a77f-7a032d1e5d23@googlegroups.com>
In reply to#32524
Le mercredi 31 octobre 2012 16:17:19 UTC+1, djc a écrit :
> I learn lots of useful things from the list, some not always welcome. No 
> 
> sooner had I found a solution to a minor inconvenience in my code, than 
> 
> a recent thread here drew my attention to the fact that it will not work 
> 
> for python 3. So suggestions please:
> 
> 
> 
>      TODO 2012-10-22: sort order numbers first then alphanumeric
> 
>  >>> n
> 
> ('1', '10', '101', '3', '40', '31', '13', '2', '2000')
> 
>  >>> s
> 
> ('a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4')
> 
> 
> 
>  >>> sorted(n)
> 
> ['1', '10', '101', '13', '2', '2000', '3', '31', '40']
> 
>  >>> sorted(s)
> 
> ['1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
> 
>  >>> sorted(n+s)
> 
> ['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40', 
> 
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
> 
> 
> 
> 
> 
> 
> 
> Possibly there is a better way but for Python 2.7 this gives the 
> 
> required result
> 
> 
> 
> Python 2.7.3 (default, Sep 26 2012, 21:51:14)
> 
> 
> 
>  >>> sorted(int(x) if x.isdigit() else x for x in n+s)
> 
> [1, 2, 3, 10, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1', 'ab', 
> 
> 'acd', 'b a 4', 'bcd']
> 
> 
> 
> 
> 
> [str(x) for x in sorted(int(x) if x.isdigit() else x for x in n+s)]
> 
> ['1', '2', '3', '10', '13', '31', '40', '101', '2000', '1a', '222 bb', 
> 
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
> 
> 
> 
> 
> 
> But not for Python 3
> 
> Python 3.2.3 (default, Oct 19 2012, 19:53:16)
> 
> 
> 
>  >>> sorted(n+s)
> 
> ['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40', 
> 
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
> 
> 
> 
>  >>> sorted(int(x) if x.isdigit() else x for x in n+s)
> 
> Traceback (most recent call last):
> 
>    File "<stdin>", line 1, in <module>
> 
> TypeError: unorderable types: str() < int()
> 
>  >>>
> 
> 
> 
> The best I can think of is to split the input sequence into two lists, 
> 
> sort each and then join them.
> 
> 
> 
> 
> 
> -- 
> 
> djc

>>> # Py 3.2.3
>>> z = ['1', '10', '101', '13', '1a', '2', '2000',
...     '222 bb', '3', '31', '40', 'a', 'a1', 'ab',
...     'acd', 'b a 4', 'bcd'
... ]
>>> n, s = [], []
>>> for e in z:
...     if e.isdigit():
...         n.append(int(e))
...     else:
...         s.append(e)
...         
>>> n.sort()
>>> s.sort()
>>> ns = [str(e) for e in n]
>>> ns.extend(s)
>>> ns
['1', '2', '3', '10', '13', '31', '40', '101', '2000', '1a',
'222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']

jmf

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web