Newsgroup: comp.lang.python, thread #32176

a.index(float('nan')) fails

Started by: mamboknave@gmail.com
First post: 2012-10-25 18:46 -0700
Last post: 2012-10-27 13:33 -0500
Articles: 20 on this page of 26, 12 participants



Contents

  a.index(float('nan')) fails mamboknave@gmail.com - 2012-10-25 18:46 -0700
    Re: a.index(float('nan')) fails Terry Reedy <tjreedy@udel.edu> - 2012-10-25 22:04 -0400
      Re: a.index(float('nan')) fails Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-26 02:44 +0000
        Re: a.index(float('nan')) fails Terry Reedy <tjreedy@udel.edu> - 2012-10-26 03:54 -0400
          Re: a.index(float('nan')) fails Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-26 15:26 +0000
            Re: a.index(float('nan')) fails Terry Reedy <tjreedy@udel.edu> - 2012-10-26 14:49 -0400
      Re: a.index(float('nan')) fails Nobody <nobody@nowhere.com> - 2012-10-27 17:40 +0100
    Re: a.index(float('nan')) fails Cameron Simpson <cs@zip.com.au> - 2012-10-26 13:15 +1100
      Re: a.index(float('nan')) fails mamboknave@gmail.com - 2012-10-25 19:22 -0700
      Re: a.index(float('nan')) fails mamboknave@gmail.com - 2012-10-25 19:22 -0700
    Re: a.index(float('nan')) fails MRAB <python@mrabarnett.plus.com> - 2012-10-26 03:19 +0100
    Re: a.index(float('nan')) fails Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-26 02:33 +0000
      Re: a.index(float('nan')) fails Ethan Furman <ethan@stoneleaf.us> - 2012-10-28 06:07 -0700
    Re: a.index(float('nan')) fails Terry Reedy <tjreedy@udel.edu> - 2012-10-26 04:00 -0400
      Re: a.index(float('nan')) fails Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-26 16:23 +0000
        Re: a.index(float('nan')) fails MRAB <python@mrabarnett.plus.com> - 2012-10-26 17:43 +0100
        Re: a.index(float('nan')) fails Chris Angelico <rosuav@gmail.com> - 2012-10-27 03:45 +1100
          Re: a.index(float('nan')) fails Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2012-10-26 18:40 +0000
            Re: a.index(float('nan')) fails Devin Jeanpierre <jeanpierreda@gmail.com> - 2012-10-26 15:17 -0400
            Re: a.index(float('nan')) fails Chris Angelico <rosuav@gmail.com> - 2012-10-27 12:29 +1100
        Re: a.index(float('nan')) fails Terry Reedy <tjreedy@udel.edu> - 2012-10-26 14:58 -0400
        Re: a.index(float('nan')) fails Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-10-27 00:48 -0400
          Re: a.index(float('nan')) fails Thomas Rachel <nutznetz-0c1b6768-bfa9-48d5-a470-7603bd3aa915@spamschutz.glglgl.de> - 2012-10-27 08:56 +0200
            Re: a.index(float('nan')) fails Nobody <nobody@nowhere.com> - 2012-10-27 17:45 +0100
            Re: a.index(float('nan')) fails Dennis Lee Bieber <wlfraed@ix.netcom.com> - 2012-10-27 13:07 -0400
    Re: a.index(float('nan')) fails Mark Adam <dreamingforward@gmail.com> - 2012-10-27 13:33 -0500



#32176 — a.index(float('nan')) fails

From: mamboknave@gmail.com
Date: 2012-10-25 18:46 -0700
Subject: a.index(float('nan')) fails
Message-ID: <bd80bfd0-b423-418f-a338-fea626d50093@googlegroups.com>

>>> a = [float('nan'), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> a
[nan, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> a.index(float('nan'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: list.index(x): x not in list

That means the function .index() cannot detect nan values.
It happens on both Python 2.6 and Python 3.1.

Is this a bug? Or am I not using .index() correctly?

Thanks!



#32178

From: Terry Reedy <tjreedy@udel.edu>
Date: 2012-10-25 22:04 -0400
Message-ID: <mailman.2873.1351217113.27098.python-list@python.org>
In reply to: #32176

On 10/25/2012 9:46 PM, mamboknave@gmail.com wrote:
>>>> a = [float('nan'), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>> a
> [nan, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>> a.index(float('nan'))
> Traceback (most recent call last):
>    File "<stdin>", line 1, in <module>
> ValueError: list.index(x): x not in list
>
> That means, the function .index() cannot detect nan values.
> It happens on both Python 2.6 and Python 3.1
>
> Is this a bug? Or I am not using .index() correctly?

It is a consequence of the following, which some people (but not all) 
believe is mandated by the IEEE standard.

 >>> nan = float('nan')
 >>> nan is nan
True
 >>> nan == nan
False

 >>> nanlist = [nan]
 >>> nan in nanlist
True
 >>> nanlist.index(nan)
0

Containment of nan in collection is tested by is, not ==.

 >>> nan2 = float('nan')
 >>> nan2 is nan
False
 >>> nan2 == nan
False
 >>> nan2 in nanlist
False

-- 
Terry Jan Reedy



#32187

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-10-26 02:44 +0000
Message-ID: <5089f915$0$29984$c3e8da3$5496439d@news.astraweb.com>
In reply to: #32178

On Thu, 25 Oct 2012 22:04:52 -0400, Terry Reedy wrote:

> It is a consequence of the following, which some people (but not all)
> believe is mandated by the IEEE standard.
> 
>  >>> nan = float('nan')
>  >>> nan is nan
> True

The IEEE 754 standard says nothing about object identity. It only 
discusses value equality.

>  >>> nan == nan
> False

IEEE 754 states that all NANs compare unequal to everything, including 
NANs with the same bit value. It doesn't make an exception for 
comparisons with itself.

I'm not entirely sure why you suggest that there is an argument about 
what IEEE 754 says about NANs. As far as I can see, the argument is 
whether or not language designers should pick and choose which bits of 
the standard they want to follow, thus taking a step backwards to the 
chaos of numerical computing prior to the IEEE 754 standard.

Seriously, there are some real horror stories about numerical computing 
back in the Bad Old Days. Like computers where 1.0/x could give a
divide-by-zero error even though x == 0 returned False.



-- 
Steven



#32198

From: Terry Reedy <tjreedy@udel.edu>
Date: 2012-10-26 03:54 -0400
Message-ID: <mailman.2885.1351238073.27098.python-list@python.org>
In reply to: #32187

On 10/25/2012 10:44 PM, Steven D'Aprano wrote:
> On Thu, 25 Oct 2012 22:04:52 -0400, Terry Reedy wrote:
>
>> It is a consequence of the following, which some people (but not all)
>> believe is mandated by the IEEE standard.
>>
>>   >>> nan = float('nan')
>>   >>> nan is nan
>> True
>
> The IEEE 754 standard says nothing about object identity. It only
> discusses value equality.
>
>>   >>> nan == nan
>> False
>
> IEEE 754 states that all NANs compare unequal to everything, including
> NANs with the same bit value. It doesn't make an exception for
> comparisons with itself.
>
> I'm not entirely sure why you suggest that there is an argument about
> what IEEE 754 says about NANs.

I did not do so.

> As far as I can see, the argument is
> whether or not language designers should pick and choose which bits of
> the standard they want to follow, thus taking a step backwards to the
> chaos of numerical computing prior to the IEEE 754 standard.

There has been disagreement about whether the standard mandates that 
Python behave the way it does. That is a fact, but I have no interest in 
discussing the issue.

-- 
Terry Jan Reedy



#32216

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-10-26 15:26 +0000
Message-ID: <508aab92$0$29967$c3e8da3$5496439d@news.astraweb.com>
In reply to: #32198

On Fri, 26 Oct 2012 03:54:02 -0400, Terry Reedy wrote:

> On 10/25/2012 10:44 PM, Steven D'Aprano wrote:
>> On Thu, 25 Oct 2012 22:04:52 -0400, Terry Reedy wrote:
>>
>>> It is a consequence of the following, which some people (but not all)
>>> believe is mandated by the IEEE standard.
>>>
>>>   >>> nan = float('nan')
>>>   >>> nan is nan
>>> True
>>
>> The IEEE 754 standard says nothing about object identity. It only
>> discusses value equality.
>>
>>>   >>> nan == nan
>>> False
>>
>> IEEE 754 states that all NANs compare unequal to everything, including
>> NANs with the same bit value. It doesn't make an exception for
>> comparisons with itself.
>>
>> I'm not entirely sure why you suggest that there is an argument about
>> what IEEE 754 says about NANs.
> 
> I did not do so.

I'm afraid you did. Your quote is shown above, and repeated here:

"... some people (but not all) believe is mandated by the IEEE standard"

This suggests that there is a disagreement -- an argument -- about what 
the IEEE standard mandates about NANs. I don't know why you think this 
disagreement exists, or who these "some people" are. The standard is not 
ambiguous, and while it is not readily available at no cost, it is widely 
described by many secondary sources.

Every NAN must compare unequal to every float, including itself.


> There has been disagreement about whether the standard mandates that
> Python behave the way it does. That is a fact, but I have no interest in
> discussing the issue.

I'm not entirely sure which behaviour of Python you are referring to 
here. If you choose not to reply, of course I can't force you to. It's 
your right to make ambiguous statements and then refuse to clarify what 
you are talking about.

If you are referring to *identity comparisons*, the IEEE 754 says nothing 
about object identity, so it has no bearing on Python's `is` operator.

If you are referring to the fact that `nan != nan` in Python, that is 
mandated by the IEEE 754 standard. I can't imagine who maintains that the 
standard doesn't mandate that; as I said, the disagreement that I have 
seen is whether or not to follow the standard, not on what the standard 
says.

If you are referring to something else, I don't know what it is.



-- 
Steven



#32230

From: Terry Reedy <tjreedy@udel.edu>
Date: 2012-10-26 14:49 -0400
Message-ID: <mailman.2901.1351277411.27098.python-list@python.org>
In reply to: #32216

On 10/26/2012 11:26 AM, Steven D'Aprano wrote:
> On Fri, 26 Oct 2012 03:54:02 -0400, Terry Reedy wrote:
>
>> On 10/25/2012 10:44 PM, Steven D'Aprano wrote:
>>> On Thu, 25 Oct 2012 22:04:52 -0400, Terry Reedy wrote:
>>>
>>>> It is a consequence of the following, which some people (but not all)
>>>> believe is mandated by the IEEE standard.
>>>>
>>>>    >>> nan = float('nan')
>>>>    >>> nan is nan
>>>> True
>>>
>>> The IEEE 754 standard says nothing about object identity. It only
>>> discusses value equality.
>>>
>>>>    >>> nan == nan
>>>> False
>>>
>>> IEEE 754 states that all NANs compare unequal to everything, including
>>> NANs with the same bit value. It doesn't make an exception for
>>> comparisons with itself.
>>>
>>> I'm not entirely sure why you suggest that there is an argument about
>>> what IEEE 754 says about NANs.
>>
>> I did not do so.
>
> I'm afraid you did. Your quote is shown above, and repeated here:

The quote precedes and refers to Python code.

>
> "... some people (but not all) believe is mandated by the IEEE standard"
>
> This suggests that there is a disagreement -- an argument -- about what
> the IEEE standard mandates about NANs.

Disagreement about what Python should do has been expressed on the lists 
and even on the tracker. There was one discussion on python-ideas within 
the last month, another a year or so ago.

Python does not implement the full IEEE standard with signalling and 
non-signalling nans and multiple bit patterns.

When a nan is put in a Python collection, it is in effect treated as if 
it were equal to itself.
See the discussion in http://bugs.python.org/issue4296
including "I'm not sure that Python should be asked to guarantee
anything more than "b == b" returning False when b is
a NaN.  It wouldn't seem unreasonable to consider
behavior of nans in containers (sets, lists, dicts)
as undefined when it comes to equality and identity
checks."
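
A minimal sketch of what "treated as if it were equal to itself" looks like for sets and dicts, assuming CPython's usual identity-then-equality lookup. The variable names are only illustrative:

```python
nan = float('nan')      # one particular NaN object
other = float('nan')    # a second, distinct NaN object

nan_set = {nan}
nan_dict = {nan: 'found'}

# The identical object is found, because identity is checked before ==.
same_in_set = nan in nan_set
same_in_dict = nan_dict.get(nan)

# A different NaN object is neither identical nor equal to the stored one.
other_in_set = other in nan_set
other_in_dict = nan_dict.get(other)

print(same_in_set, same_in_dict, other_in_set, other_in_dict)
```

So a NaN can be retrieved from a container only through the very object that was stored.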



-- 
Terry Jan Reedy



#32285

From: Nobody <nobody@nowhere.com>
Date: 2012-10-27 17:40 +0100
Message-ID: <pan.2012.10.27.16.40.19.513000@nowhere.com>
In reply to: #32178

On Thu, 25 Oct 2012 22:04:52 -0400, Terry Reedy wrote:

> Containment of nan in collection is tested by is, not ==.

AFAICT, it isn't specific to NaN. The test used by .index() and "in"
appears to be equivalent to:

	def equal(a, b):
	    return a is b or a == b

IOW, it always checks for object identity before equality.

Replacing NaN with an instance of a user-defined class with a
non-reflexive __eq__() method supports this:

	> class Foo(object):
	=  def __eq__(self, other):
	=   return False
	= 
	> a = Foo()
	> b = Foo()
	> a in [1,2,a,3,4]
	True
	> b in [1,2,a,3,4]
	False
	> [1,2,a,3,4].index(a)
	2
	> [1,2,a,3,4].index(b)
	Traceback (most recent call last):
	  File "<stdin>", line 1, in <module>
	ValueError: <__main__.Foo object at 0x7fa7055b0550> is not in list
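
The same rule can be written out as a stand-alone search function. This is only a sketch of the observed behaviour, not CPython's actual implementation, and index_identity_first is a made-up name:

```python
def index_identity_first(seq, target):
    """Return the first i where seq[i] is target or seq[i] == target."""
    for i, item in enumerate(seq):
        # Identity is tried first, so an object always matches itself,
        # even when its __eq__ says otherwise (as with NaN).
        if item is target or item == target:
            return i
    raise ValueError("%r is not in sequence" % (target,))


nan = float('nan')
data = [1, 2, nan, 3]

print(index_identity_first(data, nan))   # same object: found
print(data.index(nan))                   # list.index agrees
```

Calling index_identity_first(data, float('nan')) raises ValueError, just as list.index does for a freshly created NaN.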



#32181

From: Cameron Simpson <cs@zip.com.au>
Date: 2012-10-26 13:15 +1100
Message-ID: <mailman.2875.1351217756.27098.python-list@python.org>
In reply to: #32176

On 25Oct2012 18:46, mamboknave@gmail.com <mamboknave@gmail.com> wrote:
| >>> a = [float('nan'), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
| >>> a
| [nan, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
| >>> a.index(float('nan'))
| Traceback (most recent call last):
|   File "<stdin>", line 1, in <module>
| ValueError: list.index(x): x not in list
| 
| That means, the function .index() cannot detect nan values.
| It happens on both Python 2.6 and Python 3.1
| 
| Is this a bug? Or I am not using .index() correctly?

The special NaN float value always compares unequal, even to itself.
IEEE floating point FTW!

You're using index incorrectly, but only because it relies on ==
returning True, which it won't.

You can use math.isnan:

  http://docs.python.org/library/math.html#math.isnan
  http://docs.python.org/py3k/library/math.html#math.isnan

for the test instead. NaN requires special handling.
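
For instance, a short sketch using math.isnan on the list from the original post (has_nan and nan_positions are names chosen here for illustration):

```python
import math

a = [float('nan'), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# True if at least one element is a NaN, whichever NaN object it is.
has_nan = any(math.isnan(x) for x in a)

# Positions of every NaN in the list.
nan_positions = [i for i, x in enumerate(a) if math.isnan(x)]

print(has_nan, nan_positions)
```

math.isnan accepts ints as well as floats, so the mixed list needs no special casing.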

Cheers,
-- 
Cameron Simpson <cs@zip.com.au>

I'm not making any of this up you know. - Anna Russell



#32182

From: mamboknave@gmail.com
Date: 2012-10-25 19:22 -0700
Message-ID: <69dfc5aa-e4f2-4953-a8c4-1adb0b449960@googlegroups.com>
In reply to: #32181

On Thursday, October 25, 2012 7:16:02 PM UTC-7, Cameron Simpson wrote:

Of course!! How could I get into that trap??

Thanks to you & to Terry



#32184

From: MRAB <python@mrabarnett.plus.com>
Date: 2012-10-26 03:19 +0100
Message-ID: <mailman.2877.1351218162.27098.python-list@python.org>
In reply to: #32176

On 2012-10-26 03:04, Terry Reedy wrote:
> On 10/25/2012 9:46 PM, mamboknave@gmail.com wrote:
>>>>> a = [float('nan'), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>>> a
>> [nan, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>>> a.index(float('nan'))
>> Traceback (most recent call last):
>>    File "<stdin>", line 1, in <module>
>> ValueError: list.index(x): x not in list
>>
>> That means, the function .index() cannot detect nan values.
>> It happens on both Python 2.6 and Python 3.1
>>
>> Is this a bug? Or I am not using .index() correctly?
>
> It is a consequence of the following, which some people (but not all)
> believe is mandated by the IEEE standard.
>
>   >>> nan = float('nan')
>   >>> nan is nan
> True
>   >>> nan == nan
> False
>
>   >>> nanlist = [nan]
>   >>> nan in nanlist
> True
>   >>> nanlist.index(nan)
> 0
>
> Containment of nan in collection is tested by is, not ==.
>
>   >>> nan2 = float('nan')
>   >>> nan2 is nan
> False
>   >>> nan2 == nan
> False
>   >>> nan2 in nanlist
> False
>
In summary, .index() looks for an item which is equal to its argument,
but it's a feature of NaN (as defined by the standard) that it doesn't
equal NaN, therefore .index() will never find it.

Another consequence is that the presence of a NaN in a list prevents
.sort() from sorting correctly.
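
One possible workaround, sketched here under the assumption that pushing the NaNs to the end is acceptable, is a sort key that never compares a NaN against anything. nan_last_key is a hypothetical helper name:

```python
import math

data = [3.0, float('nan'), 1.0, 2.0, float('nan'), 0.5]


def nan_last_key(x):
    # NaNs map to (True, 0.0), so they sort after every ordinary number
    # and the NaN value itself never takes part in a < comparison.
    if math.isnan(x):
        return (True, 0.0)
    return (False, x)


ordered = sorted(data, key=nan_last_key)
print(ordered)
```

The ordinary numbers come out in ascending order, followed by the NaNs in their original relative order.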



#32185

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-10-26 02:33 +0000
Message-ID: <5089f662$0$29984$c3e8da3$5496439d@news.astraweb.com>
In reply to: #32176

On Thu, 25 Oct 2012 18:46:20 -0700, mamboknave wrote:

> That means, the function .index() cannot detect nan values. It happens
> on both Python 2.6 and Python 3.1
> 
> Is this a bug? Or I am not using .index() correctly?

You shouldn't be using index() or == to detect NANs. The right way to 
detect NANs is with the math.isnan() function.

The list.index method tests for the item with equality. Since NANs are 
mandated to compare unequal to anything, including themselves, index 
cannot match them.

Try this instead:

from math import isnan

def find_nan(seq):
    """Return the index of the first NAN in seq, otherwise None."""
    for i, x in enumerate(seq):
        if isnan(x):
            return i
    return None  # no NAN found


For old versions of Python that don't provide an isnan function, you can 
do this:

def isnan(x):
    return x != x



-- 
Steven



#32311

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2012-10-28 06:07 -0700
Message-ID: <mailman.2969.1351430163.27098.python-list@python.org>
In reply to: #32185

Steven D'Aprano wrote:
> The list.index method tests for the item with equality. Since NANs are 
> mandated to compare unequal to anything, including themselves, index 
> cannot match them.

This is incorrect.  .index() uses identity first, then equality, and 
will match the same NaN in a list.  The OP's problem was in using a 
different NaN.

Having said that, your find_nan() solution is probably the one to use 
anyway.

> from math import isnan
> 
> def find_nan(seq):
>     """Return the index of the first NAN in seq, otherwise None."""
>     for i, x in enumerate(seq):
>         if isnan(x):
>             return i
> 
> 
> For old versions of Python that don't provide an isnan function, you can 
> do this:
> 
> def isnan(x):
>     return x != x



#32199

From: Terry Reedy <tjreedy@udel.edu>
Date: 2012-10-26 04:00 -0400
Message-ID: <mailman.2886.1351238424.27098.python-list@python.org>
In reply to: #32176

On 10/25/2012 10:19 PM, MRAB wrote:
> On 2012-10-26 03:04, Terry Reedy wrote:
>> On 10/25/2012 9:46 PM, mamboknave@gmail.com wrote:
>>>>>> a = [float('nan'), 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>>>> a
>>> [nan, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>>>>> a.index(float('nan'))

This is a second nan object, and it is not in the list.

>>> Traceback (most recent call last):
>>>    File "<stdin>", line 1, in <module>
>>> ValueError: list.index(x): x not in list
>>>
>>> That means, the function .index() cannot detect nan values.
>>> It happens on both Python 2.6 and Python 3.1
>>>
>>> Is this a bug? Or I am not using .index() correctly?
>>
>> It is a consequence of the following, which some people (but not all)
>> believe is mandated by the IEEE standard.
>>
>>   >>> nan = float('nan')
>>   >>> nan is nan
>> True
>>   >>> nan == nan
>> False
>>
>>   >>> nanlist = [nan]
>>   >>> nan in nanlist
>> True
>>   >>> nanlist.index(nan)
>> 0

.index found the nan.

>> Containment of nan in collection is tested by is, not ==.
>>
>>   >>> nan2 = float('nan')
>>   >>> nan2 is nan
>> False
>>   >>> nan2 == nan
>> False
>>   >>> nan2 in nanlist
>> False
>>
> In summary, .index() looks for an item which is equal to its argument,
> but it's a feature of NaN (as defined by the standard) that it doesn't
> equal NaN, therefore .index() will never find it.

Except that it *does* find the particular nan object that is in the 
collection. So nan in collection and list.index(nan) look for the nan by 
identity, not equality. This inconsistency is an intentional decision to 
not propagate the insanity of nan != nan to Python collections.

-- 
Terry Jan Reedy



#32224

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-10-26 16:23 +0000
Message-ID: <508ab917$0$29967$c3e8da3$5496439d@news.astraweb.com>
In reply to: #32199

On Fri, 26 Oct 2012 04:00:03 -0400, Terry Reedy wrote:

> On 10/25/2012 10:19 PM, MRAB wrote:

>> In summary, .index() looks for an item which is equal to its argument,
>> but it's a feature of NaN (as defined by the standard) that it doesn't
>> equal NaN, therefore .index() will never find it.
> 
> Except that it *does* find the particular nan object that is in the
> collection. So nan in collection and list.index(nan) look for the nan by
> identity, not equality.

So it does. I made the same mistake as MRAB, thank you for the correction.



> This inconsistency is an intentional decision to
> not propagate the insanity of nan != nan to Python collections.

That's a value judgement about NANs which is not shared by everyone.

Quite frankly, I consider it an ignorant opinion about NANs, despite what 
Bertrand Meyer thinks. Reflexivity is an important property, but it is 
not the only important property and it is not even the most important 
property of numbers. There are far worse problems with floats than the 
non-reflexivity of NANs.

Since it is impossible to have a fixed-size numeric type that satisfies 
*all* of the properties of real numbers, some properties must be broken. 
I can only imagine that the reason Meyer, and presumably you, think that 
the loss of reflexivity is more "insane" than the other violations of 
floating point numbers is due to unfamiliarity. (And note that I said 
*numbers*, not NANs.)

Anyone who has used a pocket calculator will be used to floating point 
calculations being wrong, so much so that most people don't even think 
about it. They just expect numeric calculations to be off by a little, 
and don't give it any further thought. But NANs freak them out because 
they're different.

In real life, you are *much* more likely to run into these examples of 
"insanity" of floats than to be troubled by NANs:

- associativity of addition is lost
- distributivity of multiplication is lost
- commutativity of addition is lost
- not all floats have an inverse

e.g. 

(0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)

1e6*(1.1 + 2.2) != 1e6*1.1 + 1e6*2.2

1e10 + 0.1 + -1e10 != 1e10 + -1e10 + 0.1

1/(1/49.0) != 49.0

Such violations of the rules of real arithmetic aren't even hard to find. 
They're everywhere.
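
The four comparisons above can be evaluated as written. A small sketch, assuming IEEE 754 double-precision floats (the dictionary keys are just labels chosen here):

```python
checks = {
    "associativity of addition": (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3),
    "distributivity": 1e6*(1.1 + 2.2) != 1e6*1.1 + 1e6*2.2,
    "commutativity of addition": 1e10 + 0.1 + -1e10 != 1e10 + -1e10 + 0.1,
    "multiplicative inverse": 1/(1/49.0) != 49.0,
}

# Each value is True when the real-number identity fails for floats.
for name, violated in checks.items():
    print(name, "violated:", violated)
```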

In practical terms, those sorts of errors are *far* more significant in 
computational mathematics than the loss of reflexivity. I can't think of 
the last time I've cared that x is not necessarily equal to x in a 
floating point calculation, but the types of errors shown above are 
*constantly* affecting computations and leading to loss of precision or 
even completely wrong answers.

Once NANs were introduced, keeping reflexivity would lead to even worse 
situations than x != x. It would lead to nonsense identities like 
log(-1) == log(-2), hence 1 == 2.
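
Python's math.log(-1) raises ValueError rather than returning a NAN, so the sketch below uses two other undefined operations to make the same point. It assumes IEEE 754 infinities behave as usual:

```python
import math

inf = float('inf')

a = inf - inf    # undefined result, so a NAN
b = inf * 0.0    # an unrelated undefined result, also a NAN

both_nan = math.isnan(a) and math.isnan(b)

# If NANs compared equal, this would claim inf - inf == inf * 0.
equal = (a == b)

print(both_nan, equal)
```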


-- 
Steven



#32226

From: MRAB <python@mrabarnett.plus.com>
Date: 2012-10-26 17:43 +0100
Message-ID: <mailman.2897.1351269785.27098.python-list@python.org>
In reply to: #32224

On 2012-10-26 17:23, Steven D'Aprano wrote:
> On Fri, 26 Oct 2012 04:00:03 -0400, Terry Reedy wrote:
>
>> On 10/25/2012 10:19 PM, MRAB wrote:
>
>>> In summary, .index() looks for an item which is equal to its argument,
>>> but it's a feature of NaN (as defined by the standard) that it doesn't
>>> equal NaN, therefore .index() will never find it.
>>
>> Except that it *does* find the particular nan object that is in the
>> collection. So nan in collection and list.index(nan) look for the nan by
>> identity, not equality.
>
> So it does. I made the same mistake as MRAB, thank you for the correction.
>
[snip]
Yes, I forgot that Python checks for identity before checking for
equality:

 >>> [float("nan")].index(float("nan"))
Traceback (most recent call last):
   File "<pyshell#0>", line 1, in <module>
     [float("nan")].index(float("nan"))
ValueError: nan is not in list
 >>> nan = float("nan")
 >>> [nan].index(nan)
0



#32227

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-10-27 03:45 +1100
Message-ID: <mailman.2898.1351269949.27098.python-list@python.org>
In reply to: #32224

On Sat, Oct 27, 2012 at 3:23 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> In real life, you are *much* more likely to run into these examples of
> "insanity" of floats than to be troubled by NANs:
>
> - associativity of addition is lost
> - distributivity of multiplication is lost
> - commutativity of addition is lost
> - not all floats have an inverse
>
> e.g.
>
> (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
>
> 1e6*(1.1 + 2.2) != 1e6*1.1 + 1e6*2.2
>
> 1e10 + 0.1 + -1e10 != 1e10 + -1e10 + 0.1
>
> 1/(1/49.0) != 49.0
>
> Such violations of the rules of real arithmetic aren't even hard to find.
> They're everywhere.

Actually, as I see it, there's only one principle to take note of: the
"HMS Pinafore Floating Point Rule"...

** Floating point expressions should never be tested for equality **
** What, never? **
** Well, hardly ever! **

The problem isn't with the associativity, it's with the equality
comparison. Replace "x == y" with "abs(x-y)<epsilon" for some epsilon
and all your statements fulfill people's expectations. (Possibly with
the exception of "1e10 + 0.1 + -1e10" as it's going to be hard for an
automated algorithm to pick a useful epsilon. But it still works.)
Ultimately, it's the old problem of significant digits. Usually it
only comes up with measured quantities, but this is ultimately the
same issue. Doing calculations to greater precision than the answer
warrants is fine, but when you come to compare, you effectively need
to round both values off to their actual precisions.
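
A sketch of the tolerance-based comparison, using math.isclose, which was added to Python (3.5) after this thread was written. The tolerances below are arbitrary choices for illustration, not recommendations:

```python
import math

x = (0.1 + 0.2) + 0.3
y = 0.1 + (0.2 + 0.3)

exactly_equal = (x == y)

# Relative tolerance: the allowed difference scales with the operands.
close_enough = math.isclose(x, y, rel_tol=1e-9)

# Near zero a relative tolerance is useless, so an absolute one is needed.
near_zero = math.isclose(1e-12, 0.0, abs_tol=1e-9)

print(exactly_equal, close_enough, near_zero)
```

Choosing rel_tol and abs_tol is still left to the caller.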

ChrisA



#32229

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2012-10-26 18:40 +0000
Message-ID: <508ad937$0$29967$c3e8da3$5496439d@news.astraweb.com>
In reply to: #32227

On Sat, 27 Oct 2012 03:45:46 +1100, Chris Angelico wrote:

> On Sat, Oct 27, 2012 at 3:23 AM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> In real life, you are *much* more likely to run into these examples of
>> "insanity" of floats than to be troubled by NANs:
>>
>> - associativity of addition is lost
>> - distributivity of multiplication is lost 
>> - commutativity of addition is lost
>> - not all floats have an inverse
>>
>> e.g.
>>
>> (0.1 + 0.2) + 0.3 != 0.1 + (0.2 + 0.3)
>>
>> 1e6*(1.1 + 2.2) != 1e6*1.1 + 1e6*2.2
>>
>> 1e10 + 0.1 + -1e10 != 1e10 + -1e10 + 0.1
>>
>> 1/(1/49.0) != 49.0
>>
>> Such violations of the rules of real arithmetic aren't even hard to
>> find. They're everywhere.
> 
> Actually, as I see it, there's only one principle to take note of: the
> "HMS Pinafore Floating Point Rule"...
> 
> ** Floating point expressions should never be tested for equality ** 
> ** What, never? **
> ** Well, hardly ever! **
> 
> The problem isn't with the associativity, it's with the equality
> comparison. Replace "x == y" with "abs(x-y)<epsilon" for some epsilon
> and all your statements fulfill people's expectations.

O RYLY?

Would you care to tell us which epsilon they should use?

Hint: *whatever* epsilon you pick, there will be cases where that is 
either stupidly too small, stupidly too large, or one that degenerates to 
float equality. And you may not be able to tell if you have one of those 
cases or not.

Here's a concrete example for you: 

What *single* value of epsilon should you pick such that the following 
two expressions evaluate correctly?

sum([1e20, 0.1, -1e20, 0.1]*1000) == 200
sum([1e20, 99.9, -1e20, 0.1]*1000) != 200
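
To see why no epsilon can help, here is a sketch that adds both lists naively and with math.fsum. An explicit loop stands in for plain left-to-right addition because newer CPython releases changed how the builtin sum() adds floats; naive_sum is a helper written for this example:

```python
import math


def naive_sum(values):
    """Plain left-to-right float accumulation."""
    total = 0.0
    for v in values:
        total += v
    return total


small = [1e20, 0.1, -1e20, 0.1] * 1000
large = [1e20, 99.9, -1e20, 0.1] * 1000

# Every value added while the total sits at 1e20 is absorbed, so both
# naive totals collapse to the last 0.1 that was added.
naive_small = naive_sum(small)
naive_large = naive_sum(large)

# math.fsum tracks the lost low-order parts and recovers the real totals.
exact_small = math.fsum(small)
exact_large = math.fsum(large)

print(naive_small, naive_large, exact_small, exact_large)
```

Since the two naive results are identical, no tolerance can call one of them "equal to 200" and the other "not equal to 200".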


The advice "never test floats for equality" is:

(1) pointless without a good way to know what epsilon they should use;

(2) sheer superstition since there are cases where testing floats for 
equality is the right thing to do (although I note you dodged that bullet 
with "hardly ever" *wink*);

and most importantly

(3) missing the point, since the violations of the rules of real-valued 
mathematics still occur regardless of whether you explicitly test for 
equality or not.

For instance, if you write:

result = a + (b + c)

some compilers may assume associativity and calculate (a + b) + c 
instead. But that is not guaranteed to give the same result! (K&R allowed 
C compilers to do that; the subsequent ANSI C standard prohibited
reordering, but in practice most C compilers provide a switch to allow it.)

A real-world example: Python's math.fsum is a high-precision summation 
with error compensation (it tracks exact partial sums, going a step 
beyond the simpler Kahan summation algorithm). Here's a pseudo-code 
version of Kahan summation:

http://en.wikipedia.org/wiki/Kahan_summation_algorithm

which includes the steps:

t = sum + y;
c = (t - sum) - y;

A little bit of algebra should tell you that c must equal zero. 
Unfortunately, in this case algebra is wrong, because floats are not real 
numbers. c is not necessarily zero.

An optimizing compiler, or an optimizing programmer, might very well 
eliminate those calculations and so inadvertently eliminate the error 
compensation. And not an equals sign in sight.
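
A runnable sketch of the Kahan steps, written out in Python for illustration (kahan_sum and naive_sum are names chosen here; CPython evaluates the expressions as written and does not simplify them away):

```python
def naive_sum(values):
    """Plain left-to-right accumulation."""
    total = 0.0
    for x in values:
        total += x
    return total


def kahan_sum(values):
    """Kahan compensated summation, following the two steps quoted above."""
    total = 0.0
    c = 0.0                    # running compensation term
    for x in values:
        y = x - c              # apply the correction from the last step
        t = total + y          # low-order bits of y may be lost here
        c = (t - total) - y    # algebraically zero; in floats it captures
                               # the rounding error of the addition
        total = t
    return total


# 1e-16 is too small to change 1.0 on its own, so the naive total never
# moves, while the compensated total accumulates the small terms.
values = [1.0] + [1e-16] * 100000
naive_result = naive_sum(values)
kahan_result = kahan_sum(values)

print(naive_result, kahan_result)
```

Replacing the line that computes c with c = 0.0, as the algebra suggests, turns kahan_sum back into naive_sum.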



-- 
Steven



#32233

From: Devin Jeanpierre <jeanpierreda@gmail.com>
Date: 2012-10-26 15:17 -0400
Message-ID: <mailman.2904.1351279099.27098.python-list@python.org>
In reply to: #32229

On Fri, Oct 26, 2012 at 2:40 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
>> The problem isn't with the associativity, it's with the equality
>> comparison. Replace "x == y" with "abs(x-y)<epsilon" for some epsilon
>> and all your statements fulfill people's expectations.
>
> O RYLY?
>
> Would you care to tell us which epsilon they should use?

I would assume some epsilon that bounds the error of their
computation. Which one to use would depend on the error propagation
their function incurs.

That said, I also disagree with the sentiment "all your statements
fulfill people's expectations". Comparing to be within some epsilon of
each other may mean that some things that are the result of
mathematically unequal expressions, will be called equal because they
are very close to each other by accident. Unless perhaps completely
tight bounds on error can be achieved? I've never seen anyone do this,
but maybe it's reasonable.

> Hint: *whatever* epsilon you pick, there will be cases where that is
> either stupidly too small, stupidly too large, or one that degenerates to
> float equality. And you may not be able to tell if you have one of those
> cases or not.
>
> Here's a concrete example for you:
>
> What *single* value of epsilon should you pick such that the following
> two expressions evaluate correctly?
>
> sum([1e20, 0.1, -1e20, 0.1]*1000) == 200
> sum([1e20, 99.9, -1e20, 0.1]*1000) != 200

Some computations have unbounded error, such as computations where
catastrophic cancellation can occur. That doesn't mean all
computations do. For many computations, you can find a single epsilon
that will always return True for things that "should" be equal, but
aren't -- for example, squaring a number does no worse than tripling
the relative error, so if you square a number that was accurate to
within machine epsilon, and want to compare it to a constant, you can
compare with relative epsilon = 3*machine_epsilon.
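
As a rough sketch of that squaring case, taking "machine epsilon" to mean sys.float_info.epsilon (2**-52 for doubles) and picking 0.1 as an arbitrary input:

```python
import sys

eps = sys.float_info.epsilon   # spacing between 1.0 and the next float

x = 0.1                        # already a rounded value
squared = x * x
expected = 0.01

exactly_equal = (squared == expected)

# Relative comparison with the 3 * machine_epsilon bound described above.
within_bound = abs(squared - expected) <= 3 * eps * abs(expected)

print(exactly_equal, within_bound)
```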

I'm not sure how commonly this occurs in real life, because I'm not a
numerical programmer. All I know is that your example is good, but
shows a not-universally-applicable problem.

It is, however, still pretty applicable and worth noting, so I'm not
unhappy you did. For example, how large can the absolute error of the
sin function applied to a float be? Answer: as large as 2, and the
relative error can be arbitrarily large. (Reason: error scales with
the input, but the frequency of the sin function does not.)
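
A small sketch of that effect, assuming IEEE 754 doubles. Because 2**53 < 1e16 < 2**54, neighbouring floats near 1e16 are 2.0 apart, which is a large fraction of the period of sin:

```python
import math

x = 1e16
neighbour = x + 2.0        # the next representable double above x

gap = neighbour - x        # spacing between adjacent floats here

# Two inputs that differ only in the last bit give unrelated results.
sin_x = math.sin(x)
sin_neighbour = math.sin(neighbour)

print(gap, sin_x, sin_neighbour)
```

An input that is off by one unit in the last place therefore lands at an essentially different point on the sine curve.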

(In case you can't tell, I only have studied this stuff as a student. :P)

-- Devin



#32259

From: Chris Angelico <rosuav@gmail.com>
Date: 2012-10-27 12:29 +1100
Message-ID: <mailman.2927.1351301389.27098.python-list@python.org>
In reply to: #32229

On Sat, Oct 27, 2012 at 5:40 AM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> On Sat, 27 Oct 2012 03:45:46 +1100, Chris Angelico wrote:
>>
>> Actually, as I see it, there's only one principle to take note of: the
>> "HMS Pinafore Floating Point Rule"...
>>
>> ** Floating point expressions should never be tested for equality **
>> ** What, never? **
>> ** Well, hardly ever! **
>>
>> The problem isn't with the associativity, it's with the equality
>> comparison. Replace "x == y" with "abs(x-y)<epsilon" for some epsilon
>> and all your statements fulfill people's expectations.
>
> O RYLY?
>
> Would you care to tell us which epsilon they should use?
>
> Hint: *whatever* epsilon you pick, there will be cases where that is
> either stupidly too small, stupidly too large, or one that degenerates to
> float equality. And you may not be able to tell if you have one of those
> cases or not.
>
> Here's a concrete example for you:
>
> What *single* value of epsilon should you pick such that the following
> two expressions evaluate correctly?
>
> sum([1e20, 0.1, -1e20, 0.1]*1000) == 200
> sum([1e20, 99.9, -1e20, 0.1]*1000) != 200

Your epsilon value needs to take into account the precisions of the
values involved, and each operation needs to modify the
precision/error value. That's how I was taught to do it in
mathematical calculations. Well, I was taught "significant digits",
counting decimal digits, and a computer would normally want to count
"bits of precision", but close enough.

So here's my heresy: When you add 1e20 and 0.1, the value should be
equal to the original 1e20 unless it has at least 21 significant
digits. Otherwise, you get stupidly accurate errors, like in the old
anecdote about the age of a museum piece: It's 1001 years, 2 months,
and 3 days old, because I asked last year how old it was and it was a
thousand years old.
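
For what it's worth, a quick sketch of what double-precision floats already do with that addition, assuming IEEE 754 doubles with 53 significant bits:

```python
big = 1e20

# 0.1 is far below the resolution of a double at this magnitude.
absorbed = (big + 0.1 == big)

# 2**66 < 1e20 < 2**67, so adjacent doubles here are 2**(66 - 52) apart.
spacing = 2.0 ** (66 - 52)
next_up = big + spacing

print(absorbed, spacing, next_up - big)
```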

Flame away!

ChrisA
