Groups | Search | Server Info | Keyboard shortcuts | Login | Register [http] [https] [nntp] [nntps]


Groups > comp.lang.python > #18937 > unrolled thread

NaN, Null, and Sorting

Started byEthan Furman <ethan@stoneleaf.us>
First post2012-01-13 11:04 -0800
Last post2012-01-16 11:07 +0000
Articles 6 — 6 participants

Back to article view | Back to comp.lang.python


Contents

  NaN, Null, and Sorting Ethan Furman <ethan@stoneleaf.us> - 2012-01-13 11:04 -0800
    Re: NaN, Null, and Sorting Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-01-14 04:54 +0000
    Re: NaN, Null, and Sorting jmfauth <wxjmfauth@gmail.com> - 2012-01-13 23:43 -0800
    Re: NaN, Null, and Sorting Eelco <hoogendoorn.eelco@gmail.com> - 2012-01-16 02:22 -0800
      Re: NaN, Null, and Sorting Chris Angelico <rosuav@gmail.com> - 2012-01-16 21:57 +1100
      Re: NaN, Null, and Sorting Robert Kern <robert.kern@gmail.com> - 2012-01-16 11:07 +0000

#18937 — NaN, Null, and Sorting

FromEthan Furman <ethan@stoneleaf.us>
Date2012-01-13 11:04 -0800
SubjectNaN, Null, and Sorting
Message-ID<mailman.4725.1326484286.27778.python-list@python.org>
With NaN, it is possible to get a list that will not properly sort:

--> NaN = float('nan')
--> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
--> sorted(spam)
[1, 2, nan, 3, nan, 4, 5, 7, nan]

I'm constructing a Null object with the semantics that if the returned 
object is Null, it's actual value is unknown.

 From a purist point of view if it is unknown then comparison results 
are also unknown since the actual value might be greater, lesser, or the 
same as the value being compared against.

 From a practical point of view a list with Nulls scattered throughout 
is a pain in the backside.

So I am strongly leaning towards implementing the comparisons such that 
Null objects are less than other objects so they will always sort together.

Thoughts/advice/criticisms/etc?

~Ethan~

[toc] | [next] | [standalone]


#18959

FromSteven D'Aprano <steve+comp.lang.python@pearwood.info>
Date2012-01-14 04:54 +0000
Message-ID<4f110a95$0$29988$c3e8da3$5496439d@news.astraweb.com>
In reply to#18937
On Fri, 13 Jan 2012 11:04:48 -0800, Ethan Furman wrote:

> With NaN, it is possible to get a list that will not properly sort:
> 
> --> NaN = float('nan')
> --> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN] --> sorted(spam)
> [1, 2, nan, 3, nan, 4, 5, 7, nan]
> 
> I'm constructing a Null object with the semantics that if the returned
> object is Null, it's actual value is unknown.
>
> From a purist point of view if it is unknown then comparison results
> are also unknown since the actual value might be greater, lesser, or the
> same as the value being compared against.

From a purist point of view, NANs are unordered with respect to numbers, 
and so one of two behaviours should occur:

(1) nan OP x should raise an exception, for all comparison operators 
except == and != 

(2) nan OP x should return False for all OPs except != 

I believe the current version of the standard supports operators for both 
sets of behaviour; the 1990s version of Apple's numeric framework (SANE) 
included both.

I think Python chooses the second behaviour, although it may be version 
and platform dependent. This is from Python 2.6:

>>> float('nan') < 0
False
>>> float('nan') > 0
False


I would expect the same behaviour for your Null objects. But as you say:


> From a practical point of view a list with Nulls scattered throughout
> is a pain in the backside.

And this is why sorting should be defined in terms of a separate sorting 
operator, not < or >, so that lists containing unordered values like NANs, 
Nulls, and complex numbers, can be sorted. "Sorted" is a property of the 
list, not the values within the list.


> So I am strongly leaning towards implementing the comparisons such that
> Null objects are less than other objects so they will always sort
> together.
> 
> Thoughts/advice/criticisms/etc?

Possibly the least-worst solution.


-- 
Steven

[toc] | [prev] | [next] | [standalone]


#18962

Fromjmfauth <wxjmfauth@gmail.com>
Date2012-01-13 23:43 -0800
Message-ID<3b5278a0-9612-4c34-a743-df463f5bedc4@w4g2000vbc.googlegroups.com>
In reply to#18937
On 13 jan, 20:04, Ethan Furman <et...@stoneleaf.us> wrote:
> With NaN, it is possible to get a list that will not properly sort:
>
> --> NaN = float('nan')
> --> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
> --> sorted(spam)
> [1, 2, nan, 3, nan, 4, 5, 7, nan]
>
> I'm constructing a Null object with the semantics that if the returned
> object is Null, it's actual value is unknown.
>

Short answer.

-  NaN != NA()

-  I find the actual implementation (Py3.2) quite satisfying. (M.
Dickinson's work)

jmf

[toc] | [prev] | [next] | [standalone]


#19036

FromEelco <hoogendoorn.eelco@gmail.com>
Date2012-01-16 02:22 -0800
Message-ID<67bd5e6b-a332-4d13-aad3-8de88b218ac5@z19g2000vbe.googlegroups.com>
In reply to#18937
On 13 jan, 20:04, Ethan Furman <et...@stoneleaf.us> wrote:
> With NaN, it is possible to get a list that will not properly sort:
>
> --> NaN = float('nan')
> --> spam = [1, 2, NaN, 3, NaN, 4, 5, 7, NaN]
> --> sorted(spam)
> [1, 2, nan, 3, nan, 4, 5, 7, nan]
>
> I'm constructing a Null object with the semantics that if the returned
> object is Null, it's actual value is unknown.
>
>  From a purist point of view if it is unknown then comparison results
> are also unknown since the actual value might be greater, lesser, or the
> same as the value being compared against.
>
>  From a practical point of view a list with Nulls scattered throughout
> is a pain in the backside.
>
> So I am strongly leaning towards implementing the comparisons such that
> Null objects are less than other objects so they will always sort together.
>
> Thoughts/advice/criticisms/etc?
>
> ~Ethan~

My suggestion would be thus: nans/nulls are unordered; sorting them is
fundamentally an ill defined notion. What you want, conceptually, is a
sorted list of the sortable entries, and a seperate list of the
unsorted entries. Translated into code, the most pure solution would
be to filter out the nanas/nulls in their own list first, and then
sort the rest. If the interface demands it, you can concatenate the
lists afterwards, but probably it is most convenient to keep them in
seperate lists.

Perhaps arbitrarily defining the ordering of nulls/nans is slightly
more efficient than the above, but it should not make a big
difference, and in terms of purity its no contest.

[toc] | [prev] | [next] | [standalone]


#19037

FromChris Angelico <rosuav@gmail.com>
Date2012-01-16 21:57 +1100
Message-ID<mailman.4790.1326711481.27778.python-list@python.org>
In reply to#19036
On Mon, Jan 16, 2012 at 9:22 PM, Eelco <hoogendoorn.eelco@gmail.com> wrote:
> What you want, conceptually, is a
> sorted list of the sortable entries, and a seperate list of the
> unsorted entries. Translated into code, the most pure solution would
> be to filter out the nanas/nulls in their own list first, and then
> sort the rest. If the interface demands it, you can concatenate the
> lists afterwards, but probably it is most convenient to keep them in
> seperate lists.

So... you split it into two lists, sort the two lists (one of which
can't be sorted), and then concatenate them. Sounds like the quicksort
algorithm.

ChrisA

[toc] | [prev] | [next] | [standalone]


#19038

FromRobert Kern <robert.kern@gmail.com>
Date2012-01-16 11:07 +0000
Message-ID<mailman.4791.1326712065.27778.python-list@python.org>
In reply to#19036
On 1/16/12 10:57 AM, Chris Angelico wrote:
> On Mon, Jan 16, 2012 at 9:22 PM, Eelco<hoogendoorn.eelco@gmail.com>  wrote:
>> What you want, conceptually, is a
>> sorted list of the sortable entries, and a seperate list of the
>> unsorted entries. Translated into code, the most pure solution would
>> be to filter out the nanas/nulls in their own list first, and then
>> sort the rest. If the interface demands it, you can concatenate the
>> lists afterwards, but probably it is most convenient to keep them in
>> seperate lists.
>
> So... you split it into two lists, sort the two lists (one of which
> can't be sorted), and then concatenate them. Sounds like the quicksort
> algorithm.

Not at all. The "split it into two lists" steps are entirely different in what 
Eelco suggested and quicksort. It's misleading to attempt to describe both using 
the same words.

-- 
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma
  that is made terrible by our own mad attempt to interpret it as though it had
  an underlying truth."
   -- Umberto Eco

[toc] | [prev] | [standalone]


Back to top | Article view | comp.lang.python


csiph-web