From: Johann Hibschman
Newsgroups: comp.lang.python
Subject: Re: NaN comparisons - Call For Anecdotes
Date: Thu, 17 Jul 2014 14:49:15 -0400

Chris Angelico writes:

> But you also don't know that he hasn't. NaN doesn't mean "unknown", it
> means "Not a Number". You need a more sophisticated system that allows
> for uncertainty in your data.

Regardless of whether this is the right design, it's still an example of
use.

As to the design, using NaN to implement NA is a hack with a long
history, see

  http://www.numpy.org/NA-overview.html

for some color. Using NaN gets us a hardware-accelerated implementation
with just about the right semantics. In a real example, these lists are
numpy arrays with tens of millions of elements, so this isn't a trivial
benefit. (Technically, that's what's in the database; a given analysis
may look at a sample of 100k or so.)

> You have a special business case here (the need to
> record information with a "maybe" state), and you need to cope with
> it, which means dedicated logic and planning and design and code.

Yes, in principle.
In practice, everyone is used to the semantics of R-style missing
data, which are reasonably well matched by NaN. In principle,
(NA == 1.0) should be an NA (missing) truth value, as should (NA == NA),
but in practice having it be False is more useful. As an example,
indexing R vectors by a boolean vector containing NA yields NA results,
which is a feature that I never want.

Cheers,
Johann
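
For anyone who wants to see the "comparison is False" behaviour
concretely, here is a minimal sketch in plain Python. The sample list and
the names `values`, `selected`, and `missing_count` are made up for
illustration and are not from the real data set. It uses only the
standard library; a numpy boolean mask such as a[a > 0.75] drops NaN
entries in the same way, because the mask is False wherever the array
holds NaN.

```python
import math

nan = float("nan")  # stand-in for a missing (NA) value

# Under IEEE 754, ordered and equality comparisons with NaN are False;
# only != evaluates to True.
print(nan == 1.0)
print(nan == nan)
print(nan != nan)

# Hypothetical sample data with two missing entries.
values = [0.5, nan, 1.0, 2.5, nan]

# A comparison-based filter silently skips the missing entries,
# because nan > 0.75 evaluates to False.
selected = [v for v in values if v > 0.75]
print(selected)

# Missing entries have to be asked for explicitly.
missing_count = sum(1 for v in values if math.isnan(v))
print(missing_count)
```

The filter returns only the real matches, and the missing values never
show up as NA placeholders in the result, which is the convenient part
of the hack.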