From: Dennis Lee Bieber
Subject: Re: sort order for strings of digits
Date: Wed, 31 Oct 2012 14:17:54 -0400
Newsgroups: comp.lang.python

On Wed, 31 Oct 2012 15:17:14 +0000, djc declaimed the following in gmane.comp.python.general:

>
> TODO 2012-10-22: sort order numbers first then alphanumeric
> >>> n
> ('1', '10', '101', '3', '40', '31', '13', '2', '2000')
> >>> s
> ('a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4')
>
> >>> sorted(n)
> ['1', '10', '101', '13', '2', '2000', '3', '31', '40']
> >>> sorted(s)
> ['1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
> >>> sorted(n+s)
> ['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40',
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>

Both your subject line and the above samples are not "sorting 'numbers'"... They are sorting STRINGS that contain values representing the printable glyphs of decimal digits.

The above should work in Python 3, as all the data appears as strings (Unicode in Python 3). However, you won't get a "numeric order" for the entries that are numbers -- the sort is lexicographic.

Is the requirement that strings of digits are to be sorted AS integers rather than lexicographically? {Using 2.7}

>>> data = [ '1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data)
['0015', '1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [ 1, 10, 101, 3, 40, 31, 13, 2, 2000, 0015,
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data)
[1, 2, 3, 10, 13, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> #note how the string "0015" sorted first, but the literal 0015 is an
>>> #octal value equal to decimal 13 (2.x only; a syntax error in 3.x)
>>> #also note how the all-string list sorted character by character
>>> #but the mixed list sorted by ascending integer value
>>>

>
>
> Possibly there is a better way but for Python 2.7 this gives the
> required result
>
> Python 2.7.3 (default, Sep 26 2012, 21:51:14)
>
> >>> sorted(int(x) if x.isdigit() else x for x in n+s)
> [1, 2, 3, 10, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1', 'ab',
> 'acd', 'b a 4', 'bcd']
>

This, however, is returning a mix of INTEGER and STRING. Is that what is really wanted? Your input, I presume, was the "all string" version -- shouldn't the output also be all string?
(Okay -- that IS your next item)

>
> [str(x) for x in sorted(int(x) if x.isdigit() else x for x in n+s)]
> ['1', '2', '3', '10', '13', '31', '40', '101', '2000', '1a', '222 bb',
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>
>
> But not for Python 3
> Python 3.2.3 (default, Oct 19 2012, 19:53:16)
>
> >>> sorted(n+s)
> ['1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40',
> 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>
> >>> sorted(int(x) if x.isdigit() else x for x in n+s)
> Traceback (most recent call last):
> File "", line 1, in
> TypeError: unorderable types: str() < int()
> >>>
>
> The best I can think of is to split the input sequence into two lists,
> sort each and then join them.

Why? Python 3.x .sort() and sorted() dropped the cmp keyword but kept the optional key keyword, and functools.cmp_to_key() turns a comparison function into a key. So supply your own comparison function that handles mixed types and you should be able to replicate the 2.7 process. Something like the following (run on 2.7; on Python 3.x pass key=functools.cmp_to_key(cmpr), and since type objects themselves can no longer be ordered with < and > there, compare something like type(l).__name__ instead)

>>> def cmpr(l, r):
...     if type(l) == type(r):
...         if l < r: return -1
...         if l == r: return 0
...         if l > r: return 1
...     else:
...         if type(l) < type(r): return -1
...         if type(l) > type(r): return 1
...         # equal types are covered by the block above
...
>>> data = [ '1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data, cmp=cmpr)
['0015', '1', '10', '101', '13', '1a', '2', '2000', '222 bb', '3', '31', '40', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [ 1, 10, 101, 3, 40, 31, 13, 2, 2000, 0015,
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data, cmp=cmpr)
[1, 2, 3, 10, 13, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>>

You could even expand the cmpr() function to incorporate the conversion of decimal strings into numbers...

>>> def cmpr(l, r):
...     myL = l if type(l) == type("") and not l.isdigit() else int(l)
...     myR = r if type(r) == type("") and not r.isdigit() else int(r)
...     if type(myL) == type(myR):
...         if myL < myR: return -1
...         if myL == myR: return 0
...         if myL > myR: return 1
...     else:
...         if type(myL) < type(myR): return -1
...         if type(myL) > type(myR): return 1
...
>>> data = [ '1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data, cmp=cmpr)
['1', '2', '3', '10', '13', '0015', '31', '40', '101', '2000', '1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [ 1, 10, 101, 3, 40, 31, 13, 2, 2000, 0015,
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4' ]
>>> sorted(data, cmp=cmpr)
[1, 2, 3, 10, 13, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>>

NOTE how the version with all character data has sorted the pure decimals numerically while still returning them as the original strings.

HOWEVER -- this probably needs to be expanded if you might have floating point STRINGS... Observe:

>>> data = [ '1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4', '3.14153', '2.718E0' ]
>>> sorted(data, cmp=cmpr)
['1', '2', '3', '10', '13', '0015', '31', '40', '101', '2000', '1a', '2.718E0', '222 bb', '3.14153', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [ '1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4', 3.14153, 2.718E0 ]
>>> sorted(data, cmp=cmpr)
['1', '2', 2.718, '3', 3.14153, '10', '13', '0015', '31', '40', '101', '2000', '1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>> data = [ 1, 10, 101, 3, 40, 31, 13, 2, 2000, 0015,
...         'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4', 3.14153, 2.718E0]
>>> sorted(data, cmp=cmpr)
[1, 2, 2.718, 3, 3.14153, 10, 13, 13, 31, 40, 101, 2000, '1a', '222 bb', 'a', 'a1', 'ab', 'acd', 'b a 4', 'bcd']
>>>

"3.14153" and "2.718E0" do not pass the .isdigit() test, so they remain strings for sort purposes. The real floats 3.14153 and 2.718E0, on the other hand, go through int() and compare as 3 and 2, which is why they land right after the 3 and 2 entries.

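For Python 3.x, where cmp= is gone and str/int comparisons raise TypeError, the same idea can be sketched with a key function. This is only an illustration, not tested against the original poster's real data: the names as_number() and sort_key() are made up here, and it assumes that anything float() parses as a finite number should count as a number (so '0015' sorts as 15 and '2.718E0' as 2.718, with very large integers subject to float precision). The cmpr() at the end is a Python 3 stand-in for the comparison-function route via functools.cmp_to_key().

```python
import functools
import math


def as_number(item):
    """Return item as a finite float if it is, or parses as, a number; else None."""
    try:
        value = float(item)
    except (TypeError, ValueError):
        return None
    # Assumption: 'nan' and 'inf' strings are treated as ordinary text.
    return value if math.isfinite(value) else None


def sort_key(item):
    """Numbers first, ordered by value; everything else after, ordered as text."""
    value = as_number(item)
    if value is not None:
        return (0, value, "")
    return (1, 0.0, str(item))


def cmpr(l, r):
    """Comparison-function form of the same ordering, for functools.cmp_to_key()."""
    kl, kr = sort_key(l), sort_key(r)
    return (kl > kr) - (kl < kr)


data = ['1', '10', '101', '3', '40', '31', '13', '2', '2000', '0015',
        'a', 'ab', 'acd', 'bcd', '1a', 'a1', '222 bb', 'b a 4',
        '3.14153', '2.718E0']

# The items themselves are untouched; only the sort keys are converted.
result = sorted(data, key=sort_key)
same_result = sorted(data, key=functools.cmp_to_key(cmpr))
print(result)
```

Because the key is a tuple whose first element separates the two groups, a number is never compared directly against a string, which is what Python 3 objects to. The output stays "all string" when the input is all string.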
-- 
	Wulfraed                 Dennis Lee Bieber         AF6VN
	wlfraed@ix.netcom.com    HTTP://wlfraed.home.netcom.com/