Re: Case-insensitive sorting of strings (Python newbie)

From Marko Rauhamaa <marko@pacujo.net>
Newsgroups comp.lang.python
Subject Re: Case-insensitive sorting of strings (Python newbie)
Date 2015-01-23 21:14 +0200
Organization A noiseless patient Spider
Message-ID <873871fgxk.fsf@elektro.pacujo.net>
References <54C27E13.5090808@ntlworld.com> <mailman.18046.1422035592.18130.python-list@python.org>


Peter Otten <__peter__@web.de>:

> The standard recommendation is to convert bytes to unicode as early as
> possible and only manipulate unicode.

Unicode doesn't get you off the hook (as you explain later in your
post). Upper/lowercase mapping and collation order are both ambiguous.
Python, even with decent locale support, can't be expected to do it all
for you.

Well, if Python can't, then who can? Probably nobody in the world, not
generically, anyway.

Example:

    >>> print("re\u0301sume\u0301")
    résumé
    >>> print("r\u00e9sum\u00e9")
    résumé
    >>> print("re\u0301sume\u0301" == "r\u00e9sum\u00e9")
    False
    >>> print("\ufb01nd")
    ﬁnd
    >>> print("find")
    find
    >>> print("\ufb01nd" == "find")
    False

If equality can't be determined, words really can't be sorted.
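
For what it's worth, here is a minimal sketch of how far the standard library gets with the two cases above. It assumes that NFKC normalization followed by str.casefold() is an acceptable notion of "equal"; that choice is an assumption for illustration, not a universal answer, and normalized_key is a hypothetical helper name.

```python
import unicodedata

def normalized_key(s):
    # Hypothetical helper: NFKC composes "e" + U+0301 into U+00E9 and
    # replaces compatibility characters such as the U+FB01 ligature
    # with "fi"; casefold() then removes case distinctions.
    return unicodedata.normalize("NFKC", s).casefold()

decomposed = "re\u0301sume\u0301"
precomposed = "r\u00e9sum\u00e9"
ligature = "\ufb01nd"

# Raw comparison still fails, as in the session above.
print(decomposed == precomposed)

# Comparison on the normalized keys succeeds for both examples.
print(normalized_key(decomposed) == normalized_key(precomposed))
print(normalized_key(ligature) == normalized_key("find"))

# The same key can be handed to sorted(); the ordering is by code point
# of the normalized text, not by any language's collation rules.
words = ["Zebra", "\ufb01nd", "apple", "Find"]
print(sorted(words, key=normalized_key))
```

This only settles equality for these particular inputs. The resulting order is plain code point order on the normalized strings, so locale-specific collation (where, say, accented letters sort differently per language) remains as ambiguous as before.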


Marko
