
None versus MISSING sentinel -- request for design feedback

Started by: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
First post: 2011-07-15 15:28 +1000
Last post: 2011-07-15 11:02 -0600
Articles 20 on this page of 23 — 12 participants



Contents

  None versus MISSING sentinel -- request for design feedback Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-15 15:28 +1000
    Re: None versus MISSING sentinel -- request for design feedback Chris Angelico <rosuav@gmail.com> - 2011-07-15 16:08 +1000
      Re: None versus MISSING sentinel -- request for design feedback "bruno.desthuilliers@gmail.com" <bruno.desthuilliers@gmail.com> - 2011-07-15 00:53 -0700
      Re: None versus MISSING sentinel -- request for design feedback Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-15 20:46 +1000
        Re: None versus MISSING sentinel -- request for design feedback Chris Angelico <rosuav@gmail.com> - 2011-07-15 21:04 +1000
          Re: None versus MISSING sentinel -- request for design feedback Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-15 22:48 +1000
            Re: None versus MISSING sentinel -- request for design feedback Mel <mwilson@the-wire.com> - 2011-07-15 09:16 -0400
              Re: None versus MISSING sentinel -- request for design feedback Ethan Furman <ethan@stoneleaf.us> - 2011-07-15 10:18 -0700
                Re: None versus MISSING sentinel -- request for design feedback Gregory Ewing <greg.ewing@canterbury.ac.nz> - 2011-07-16 14:24 +1200
                  Re: None versus MISSING sentinel -- request for design feedback Ethan Furman <ethan@stoneleaf.us> - 2011-07-16 17:31 -0700
    Re: None versus MISSING sentinel -- request for design feedback Rob Williscroft <rtw@rtw.me.uk> - 2011-07-15 07:43 +0000
      Re: None versus MISSING sentinel -- request for design feedback Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-15 20:19 +1000
        Re: None versus MISSING sentinel -- request for design feedback "OKB (not okblacke)" <brenNOSPAMbarn@NObrenSPAMbarn.net> - 2011-07-15 17:40 +0000
        Re: None versus MISSING sentinel -- request for design feedback Terry Reedy <tjreedy@udel.edu> - 2011-07-15 17:35 -0400
    Re: None versus MISSING sentinel -- request for design feedback Cameron Simpson <cs@zip.com.au> - 2011-07-15 17:44 +1000
      Re: None versus MISSING sentinel -- request for design feedback "bruno.desthuilliers@gmail.com" <bruno.desthuilliers@gmail.com> - 2011-07-15 02:58 -0700
      Re: None versus MISSING sentinel -- request for design feedback Steven D'Aprano <steve+comp.lang.python@pearwood.info> - 2011-07-15 20:17 +1000
        Re: None versus MISSING sentinel -- request for design feedback Cameron Simpson <cs@zip.com.au> - 2011-07-15 20:38 +1000
    Re: None versus MISSING sentinel -- request for design feedback "bruno.desthuilliers@gmail.com" <bruno.desthuilliers@gmail.com> - 2011-07-15 00:59 -0700
    Re: None versus MISSING sentinel -- request for design feedback Teemu Likonen <tlikonen@iki.fi> - 2011-07-15 11:28 +0300
      Re: None versus MISSING sentinel -- request for design feedback "bruno.desthuilliers@gmail.com" <bruno.desthuilliers@gmail.com> - 2011-07-15 03:02 -0700
        Re: None versus MISSING sentinel -- request for design feedback Teemu Likonen <tlikonen@iki.fi> - 2011-07-15 13:56 +0300
    Re: None versus MISSING sentinel -- request for design feedback Eric Snow <ericsnowcurrently@gmail.com> - 2011-07-15 11:02 -0600



#9505 — None versus MISSING sentinel -- request for design feedback

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-07-15 15:28 +1000
Subject: None versus MISSING sentinel -- request for design feedback
Message-ID: <4e1fd009$0$29986$c3e8da3$5496439d@news.astraweb.com>
Hello folks,

I'm designing an API for some lightweight calculator-like statistics
functions, such as mean, standard deviation, etc., and I want to support
missing values. Missing values should be just ignored. E.g.:

mean([1, 2, MISSING, 3]) => 6/3 = 2 rather than 6/4 or raising an error.

My question is, should I accept None as the missing value, or a dedicated
singleton?

In favour of None: it's already there, no extra code required. People may
expect it to work.

Against None: it's too easy to mistakenly add None to a data set by mistake,
because functions return None by default.

In favour of a dedicated MISSING singleton: it's obvious from context. It's
not a lot of work to implement compared to using None. Hard to accidentally
include it by mistake. If None does creep into the data by accident, you
get a nice explicit exception.

Against MISSING: users may expect to be able to choose their own sentinel by
assigning to MISSING. I don't want to support that.


I've considered what other packages do:-

R uses a special value, NA, to stand in for missing values. This is more or
less the model I wish to follow.

I believe that MATLAB treats float NANs as missing values. I consider this
an abuse of NANs and I won't be supporting that :-P

Spreadsheets such as Excel, OpenOffice and Gnumeric generally ignore blank
cells, and give you a choice between ignoring text and treating it as zero.
E.g. with cells set to [1, 2, "spam", 3] the AVERAGE function returns 2 and
the AVERAGEA function returns 1.5.

numpy uses masked arrays, which is probably over-kill for my purposes; I am
gratified to see it doesn't abuse NANs:

>>> import numpy as np
>>> a = np.array([1, 2, float('nan'), 3])
>>> np.mean(a)
nan

numpy also treats None as an error:

>>> a = np.array([1, 2, None, 3])
>>> np.mean(a)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.5/site-packages/numpy/core/fromnumeric.py", line 860, in mean
    return mean(axis, dtype, out)
TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'


I would appreciate any comments, advice or suggestions. 


-- 
Steven



#9507

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-07-15 16:08 +1000
Message-ID: <mailman.1044.1310710125.1164.python-list@python.org>
In reply to: #9505
On Fri, Jul 15, 2011 at 3:28 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> My question is, should I accept None as the missing value, or a dedicated
> singleton?
>
> In favour of None: it's already there, no extra code required. People may
> expect it to work.
>
> Against None: it's too easy to mistakenly add None to a data set by mistake,
> because functions return None by default.

I guess the question is: Why are the missing values there? If they're
there because some function returned None because it didn't have a
value to return, and therefore it's a missing value, then using None
as "missing" would make a lot of sense. But if it's a more explicit
concept of "here's a table of values, and the user said that this one
doesn't exist", it'd be better to have an explicit MISSING. (Which I
assume would be exposed as yourmodule.MISSING or something.)

Agreed that float('nan') and "" and "spam" are all bad values for
Missings. Possibly "" should come out as 0, but "spam" should
definitely fail.

Chris Angelico



#9520

From: "bruno.desthuilliers@gmail.com" <bruno.desthuilliers@gmail.com>
Date: 2011-07-15 00:53 -0700
Message-ID: <d82ab632-8709-44a4-8d8b-d2c9f6f9effc@l37g2000yqd.googlegroups.com>
In reply to: #9507
On Jul 15, 8:08 am, Chris Angelico <ros...@gmail.com> wrote:
>
> Agreed that float('nan') and "" and "spam" are all bad values for
> Missings. Possibly "" should come out as 0

"In the face of ambiguity, refuse the temptation to guess."

As far as I'm concerned, I'd expect this to raise a TypeError...



#9535

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-07-15 20:46 +1000
Message-ID: <4e201a9d$0$29989$c3e8da3$5496439d@news.astraweb.com>
In reply to: #9507
Chris Angelico wrote:

> On Fri, Jul 15, 2011 at 3:28 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> My question is, should I accept None as the missing value, or a dedicated
>> singleton?
>>
>> In favour of None: it's already there, no extra code required. People may
>> expect it to work.
>>
>> Against None: it's too easy to mistakenly add None to a data set by
>> mistake, because functions return None by default.
> 
> I guess the question is: Why are the missing values there? If they're
> there because some function returned None because it didn't have a
> value to return, and therefore it's a missing value, then using None
> as "missing" would make a lot of sense. But if it's a more explicit
> concept of "here's a table of values, and the user said that this one
> doesn't exist", it'd be better to have an explicit MISSING. (Which I
> assume would be exposed as yourmodule.MISSING or something.)

In general, you have missing values in statistics because somebody wouldn't
answer a question, and the Ethics Committee frowns on researchers torturing
their subjects to get information. They make you fill out forms.

Seriously, missing data is just missing. Unknown. Lost. Not available. Like:

Name    Age     Income     Years of schooling
==============================================
Bill    42      150,000    16
Susan   23      39,000     14
Karen   unknown 89,000     15
Bob     31      0          7
George  79      12,000     unknown
Sally   17      19,000     5
Fred    66      unknown    11

One might still like to calculate the average age as 43.
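
As a worked check of that figure, here is a small sketch that averages the Age column from the table while skipping the unknown entry. The `MISSING` object is a hypothetical stand-in for whichever sentinel is eventually chosen.

```python
MISSING = object()  # hypothetical sentinel for "unknown"

# Age column from the table above, in row order (Karen's age is unknown).
ages = [42, 23, MISSING, 31, 79, 17, 66]

known = [age for age in ages if age is not MISSING]
average_age = sum(known) / float(len(known))
print(average_age)  # 258 / 6 = 43.0
```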



-- 
Steven



#9537

From: Chris Angelico <rosuav@gmail.com>
Date: 2011-07-15 21:04 +1000
Message-ID: <mailman.1060.1310727853.1164.python-list@python.org>
In reply to: #9535
On Fri, Jul 15, 2011 at 8:46 PM, Steven D'Aprano
<steve+comp.lang.python@pearwood.info> wrote:
> In general, you have missing values in statistics because somebody wouldn't
> answer a question, and the Ethics Committee frowns on researchers torturing
> their subjects to get information. They make you fill out forms.
>

Which, then, is in support of an explicit "User chose not to answer
this question" MISSING value.

ChrisA



#9541

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-07-15 22:48 +1000
Message-ID: <4e203706$0$29972$c3e8da3$5496439d@news.astraweb.com>
In reply to: #9537
Chris Angelico wrote:

> On Fri, Jul 15, 2011 at 8:46 PM, Steven D'Aprano
> <steve+comp.lang.python@pearwood.info> wrote:
>> In general, you have missing values in statistics because somebody
>> wouldn't answer a question, and the Ethics Committee frowns on
>> researchers torturing their subjects to get information. They make you
>> fill out forms.
>>
> 
> Which, then, is in support of an explicit "User chose not to answer
> this question" MISSING value.

Well yes, but None is an explicit missing value too. The question I have is
if I should support None as that value, or something else. Or if anyone can
put a good case for it, both, or neither and do something completely
different.



-- 
Steven



#9544

From: Mel <mwilson@the-wire.com>
Date: 2011-07-15 09:16 -0400
Message-ID: <ivpeip$l9$1@speranza.aioe.org>
In reply to: #9541
Steven D'Aprano wrote:

> Well yes, but None is an explicit missing value too. The question I have
> is if I should support None as that value, or something else. Or if anyone
> can put a good case for it, both, or neither and do something completely
> different.

If it's any help, I think (some of?) the database interface packages already 
do just that, returning None when they find NULL fields.
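
The standard library's `sqlite3` module is one concrete case. A short sketch, with a made-up `people` table, showing that NULL comes back as None and that SQL's own AVG skips NULL rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO people VALUES (?, ?)",
    [("Bill", 42), ("Karen", None), ("Bob", 31)],  # None is stored as NULL
)

# NULL comes back to Python as None.
ages = [row[0] for row in conn.execute("SELECT age FROM people ORDER BY name")]
print(ages)

# SQL aggregates skip NULL, so AVG divides by 2 here, not 3.
average = conn.execute("SELECT AVG(age) FROM people").fetchone()[0]
print(average)
conn.close()
```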


	Mel.



#9557

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2011-07-15 10:18 -0700
Message-ID: <mailman.1071.1310749444.1164.python-list@python.org>
In reply to: #9544
Mel wrote:
> Steven D'Aprano wrote:
> 
>> Well yes, but None is an explicit missing value too. The question I have
>> is if I should support None as that value, or something else. Or if anyone
>> can put a good case for it, both, or neither and do something completely
>> different.
> 
> If it's any help, I think (some of?) the database interface packages already 
> do just that, returning None when they find NULL fields.

Indeed.  I'm adding Null support to my dbf package now, and while some 
of the return values (Logical, Date, DateTime, and probably Character) 
will have their own dedicated singletons (Null, NullDate, NullDateTime, 
NullChar -- which will all compare equal to None) the numeric values 
will be None... although, now that I've seen this thread, I'll add the 
ability to choose what the numeric Null is returned as.

~Ethan~



#9591

From: Gregory Ewing <greg.ewing@canterbury.ac.nz>
Date: 2011-07-16 14:24 +1200
Message-ID: <98cb2oF3vpU1@mid.individual.net>
In reply to: #9557
Ethan Furman wrote:
> some 
> of the return values (Logical, Date, DateTime, and probably Character) 
> will have their own dedicated singletons (Null, NullDate, NullDateTime, 
> NullChar -- which will all compare equal to None)

That doesn't seem like a good idea to me. It's common practice
to use 'is' rather than '==' when comparing things to None.
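
To make the hazard concrete, here is a sketch with a hypothetical `_NullDate` class (not taken from the dbf package) whose instance compares equal to None. Code that tests with `==` and code that tests with `is` then disagree about the same value:

```python
class _NullDate(object):
    """Hypothetical null marker that claims equality with None."""

    def __eq__(self, other):
        return other is None or isinstance(other, _NullDate)

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        return hash(None)


NullDate = _NullDate()

equal_to_none = (NullDate == None)  # True: the equality test is satisfied
same_as_none = (NullDate is None)   # False: the identity test is not
print(equal_to_none, same_as_none)
```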

Why do you want to use special null values for these types?

-- 
Greg



#9664

From: Ethan Furman <ethan@stoneleaf.us>
Date: 2011-07-16 17:31 -0700
Message-ID: <mailman.1146.1310862722.1164.python-list@python.org>
In reply to: #9591
Gregory Ewing wrote:
> Ethan Furman wrote:
>> some of the return values (Logical, Date, DateTime, and probably 
>> Character) will have their own dedicated singletons (Null, NullDate, 
>> NullDateTime, NullChar -- which will all compare equal to None)
> 
> That doesn't seem like a good idea to me. It's common practice
> to use 'is' rather than '==' when comparing things to None.
> 
> Why do you want to use special null values for these types?

Okay, after spending some time thinking about this question
I don't believe I have a good answer.  I think it was probably something 
I thought of back when I started this project (which is basically what I 
learned Python on) and I've since learned enough that whatever reason I 
had back then has been replaced with more thorough knowledge and better 
practices.

The best reason I have at this point is being able to know what the Null 
value is supposed to represent -- True/False, a Date, etc. -- however, 
even that is weakened by my decision to use None for Null in the case of 
Character and Numerics; so there is probably no reason to not use None 
in the case of Logicals, Dates, DateTimes, and Times.

Thank you for the question!

~Ethan~



#9516

From: Rob Williscroft <rtw@rtw.me.uk>
Date: 2011-07-15 07:43 +0000
Message-ID: <mailman.1052.1310715856.1164.python-list@python.org>
In reply to: #9505
Steven D'Aprano wrote in
news:4e1fd009$0$29986$c3e8da3$5496439d@news.astraweb.com in
gmane.comp.python.general:

> I'm designing an API for some lightweight calculator-like statistics
> functions, such as mean, standard deviation, etc., and I want to support
> missing values. Missing values should be just ignored. E.g.:
> 
> mean([1, 2, MISSING, 3]) => 6/3 = 2 rather than 6/4 or raising an error.

If you can't make your mind up then maybe you shouldn't:

MISSING = MissingObject()
def mean( sequence, missing = MISSING ):
  ...

Rob.



#9533

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-07-15 20:19 +1000
Message-ID: <4e201439$0$29983$c3e8da3$5496439d@news.astraweb.com>
In reply to: #9516
Rob Williscroft wrote:

> Steven D'Aprano wrote in
> news:4e1fd009$0$29986$c3e8da3$5496439d@news.astraweb.com in
> gmane.comp.python.general:
> 
>> I'm designing an API for some lightweight calculator-like statistics
>> functions, such as mean, standard deviation, etc., and I want to support
>> missing values. Missing values should be just ignored. E.g.:
>> 
>> mean([1, 2, MISSING, 3]) => 6/3 = 2 rather than 6/4 or raising an error.
> 
> If you can't make your mind up then maybe you shouldn't:

Heh, good point.

It's not so much that I can't make up my mind -- I have a preferred solution
in mind, but I want to hear what sort of interface for dealing with missing
values others expect, and I don't want to prejudice others too greatly.


> MISSING = MissingObject()
> def mean( sequence, missing = MISSING ):

So you think the right API is to allow the caller to specify what counts as
a missing value at runtime? Are you aware of any other statistics packages
that do that?


-- 
Steven



#9558

From: "OKB (not okblacke)" <brenNOSPAMbarn@NObrenSPAMbarn.net>
Date: 2011-07-15 17:40 +0000
Message-ID: <Xns9F236C55079E2OKB@88.198.244.100>
In reply to: #9533
Steven D'Aprano wrote:

> Rob Williscroft wrote:
>> MISSING = MissingObject()
>> def mean( sequence, missing = MISSING ):
> 
> So you think the right API is to allow the caller to specify what
> counts as a missing value at runtime? Are you aware of any other
> statistics packages that do that?

    	R does it, not in the stats functions themselves but in, for instance,
read.table.  When reading data from an external file, you can specify a
set of values that will be converted to NA in the resulting data frame.

    	I think it's worth considering this approach, namely separating the 
input of the data into your system from the calculations on that 
data.  You haven't said exactly how people are going to be using your 
API, but your example of "where missing data comes from" showed something
like a table of data from a survey.  If this is the case, and users are 
going to be importing sets of data from external files, it makes a lot 
of sense to let them specify "convert these particular values to MISSING 
when importing".

    	Either way, my answer to your original question would be: if you 
want to err on the side of caution, use your own MISSING value and just 
provide a simple function that will MISSING-ize specified values:

def cleanUp(data, missing=None):
    # Replace every value listed in `missing` with the library's MISSING.
    if missing is None:
        missing = []
    return [MISSING if d in missing else d for d in data]

(Yet another use of None here! :-)

    	Then if people find their functions are returning None (or any 
other value, such as an empty string) to mean a "genuine" missing value, 
they can just wrap the call in this cleanUp function.  The reverse is 
harder to do: if you use None as your missing-value sentinel, you 
irrevocably lose the ability to tell it apart from other uses of None.
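
A sketch of how such a wrapper might be used at the point where data enters the system. The `MISSING` object and the sample data are assumptions for illustration; only the values the caller lists are converted.

```python
MISSING = object()  # hypothetical library sentinel


def cleanUp(data, missing=None):
    """Replace every value listed in `missing` with MISSING."""
    if missing is None:
        missing = []
    return [MISSING if d in missing else d for d in data]


raw = [1, 2, None, "", 3]
cleaned = cleanUp(raw, missing=[None, ""])  # caller opts in to None and ""
untouched = cleanUp(raw)                    # nothing is treated as missing
```

One caveat of this sketch: `d in missing` compares with `==`, so listing `0` as a missing code would also catch `False` and `0.0`.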

-- 
--OKB (not okblacke)
Brendan Barnwell
"Do not follow where the path may lead.  Go, instead, where there is
no path, and leave a trail."
	--author unknown



#9572

From: Terry Reedy <tjreedy@udel.edu>
Date: 2011-07-15 17:35 -0400
Message-ID: <mailman.1079.1310765743.1164.python-list@python.org>
In reply to: #9533
On 7/15/2011 6:19 AM, Steven D'Aprano wrote:

Use None as default. Requiring users to use your special value would be 
a nuisance. They may have data prepared separately from your module.

> Rob Williscroft wrote:

>> MISSING = MissingObject()
>> def mean( sequence, missing = MISSING ):

This is also a good idea.

> So you think the right API is to allow the caller to specify what counts as
> a missing value at runtime? Are you aware of any other statistics packages
> that do that?

AFAIK, standard feature on major packages. BMDP, SAS and SPSS as I 
remember. Missing values could be specified on a per column basis.
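
A rough sketch of what per-column missing codes could look like in Python. This is not how BMDP, SAS or SPSS implement it; the column names, codes and the `recode` helper are all invented for illustration.

```python
MISSING = object()  # hypothetical sentinel

# Hypothetical per-column codes that mean "no answer" in the raw data.
missing_codes = {"age": [-1, 999], "income": [-1]}

rows = [
    {"age": 42, "income": 150000},
    {"age": 999, "income": 89000},
    {"age": 66, "income": -1},
]


def recode(row, codes):
    """Return a copy of row with each column's missing codes replaced."""
    return dict(
        (col, MISSING if value in codes.get(col, []) else value)
        for col, value in row.items()
    )


recoded = [recode(row, missing_codes) for row in rows]
```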

-- 
Terry Jan Reedy



#9517

From: Cameron Simpson <cs@zip.com.au>
Date: 2011-07-15 17:44 +1000
Message-ID: <mailman.1051.1310715854.1164.python-list@python.org>
In reply to: #9505
On 15Jul2011 15:28, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
| In favour of None: it's already there, no extra code required. People may
| expect it to work.

Broadly, I like this one for the reasons you cite.

| Against None: it's too easy to mistakenly add None to a data set by mistake,
| because functions return None by default.

This is a hazard everywhere, but won't such a circumstance normally
break lots of stuff anyway? What's an example scenario for getting None
by accident but still a bunch of non-None values? The main one I can
imagine is a function with a return path that accidentally omits the
return value, e.g.:

  def f(x):
    if blah:
      return 7
    ...
    if foo:
      return 0
    # whoops!


I suppose there's no scope for having the append-to-the-list step sanity
check for the sentinel (be it None or otherwise)?
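
For what such a check at the append step might look like, here is a sketch. `CheckedList` is a hypothetical helper, and the condition in `f` is a placeholder for the elided logic in the example above.

```python
class CheckedList(list):
    """Hypothetical list that refuses None at insertion time."""

    def append(self, value):
        if value is None:
            raise ValueError("refusing to append None; missing return?")
        list.append(self, value)


def f(x):
    if x > 0:
        return 7
    # whoops: this path falls off the end and returns None


data = CheckedList()
data.append(f(1))
try:
    data.append(f(-1))
    rejected = False
except ValueError:
    rejected = True
```

Only `append` is guarded in this sketch; `extend`, `insert` and slice assignment would still let None through.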

| In favour of a dedicated MISSING singleton: it's obvious from context. It's
| not a lot of work to implement compared to using None. Hard to accidentally
| include it by mistake. If None does creep into the data by accident, you
| get a nice explicit exception.

I confess to being about to discard None as a sentinel in a bit of my
own code, but only to allow None to be used as a valid value, using the
usual idiom:

  class IQ(Queue):
    def __init__(self, ...):
      self._sentinel = object()
      ...
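
One way the elided class might be filled in, assuming Python 3's `queue` module. The `close` and `__iter__` methods are hypothetical additions made for this sketch, not Cameron's actual code; the part taken from the post is the private `object()` sentinel, which leaves None free to be a legitimate item.

```python
from queue import Queue


class IQ(Queue):
    """Iterable queue that ends iteration at a private sentinel."""

    def __init__(self, maxsize=0):
        Queue.__init__(self, maxsize)
        self._sentinel = object()  # unique, so None stays a legal item

    def close(self):
        # Mark the end of the stream.
        self.put(self._sentinel)

    def __iter__(self):
        while True:
            item = self.get()
            if item is self._sentinel:
                return
            yield item


q = IQ()
for item in (1, None, 2):
    q.put(item)
q.close()
items = list(q)
```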

| Against MISSING: users may expect to be able to choose their own sentinel by
| assigning to MISSING. I don't want to support that.

Well, we don't have readonly values to play with :-(
Personally I'd do what I did above: give it a "private" name like
_MISSING so that people should expect to have inside (and unsupported,
unguaranteed) knowledge if they fiddle with it. Or are you publishing
the sentinel's name to your callers i.e. may they really return _MISSING
legitimately from their functions?

Cheers,
-- 
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

What's fair got to do with it? It's going to happen.    - Lawrence of Arabia



#9530

From: "bruno.desthuilliers@gmail.com" <bruno.desthuilliers@gmail.com>
Date: 2011-07-15 02:58 -0700
Message-ID: <a6f3ac2f-1a64-4b0f-a87c-dc01d0795958@g9g2000yqb.googlegroups.com>
In reply to: #9517
On Jul 15, 9:44 am, Cameron Simpson <c...@zip.com.au> wrote:
> On 15Jul2011 15:28, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
> | Against MISSING: users may expect to be able to choose their own sentinel by
> | assigning to MISSING. I don't want to support that.
>
> Well, we don't have readonly values to play with :-(
> Personally I'd do what I did above: give it a "private" name like
> _MISSING so that people should expect to have inside (and unsupported,
> unguaranteed) knowledge if they fiddle with it.

I think the point is to allow users to explicitly use MISSING in
their data sets, so it does have to be public. But anyway: ALL_UPPER
names are supposed to be treated as constants, so the "warranty void
if messed with" still applies.



#9532

From: Steven D'Aprano <steve+comp.lang.python@pearwood.info>
Date: 2011-07-15 20:17 +1000
Message-ID: <4e2013c6$0$29983$c3e8da3$5496439d@news.astraweb.com>
In reply to: #9517
Cameron Simpson wrote:

> On 15Jul2011 15:28, Steven D'Aprano <steve+comp.lang.python@pearwood.info>
> wrote:
> | In favour of None: it's already there, no extra code required. People
> | may expect it to work.
> 
> Broadly, I like this one for the reasons you cite.
> 
> | Against None: it's too easy to mistakenly add None to a data set by
> | mistake, because functions return None by default.
> 
> This is a hazard everywhere, but won't such a circumstance normally
> break lots of stuff anyway? 

Maybe, maybe not. Either way, it has nothing to do with me -- I only care
about what my library does if presented with None in a list of numbers.
Should I treat it as a missing value, and ignore it, or treat it as an
error?


> What's an example scenario for getting None 
> by accident but still a bunch of non-None values? The main one I can
> imagine is a function with a return path that accidentally omits the
> return value, e.g.:
[code snipped]

Yes, that's the main example I can think of. It doesn't really matter how it
happens though, only that it is more likely for None to accidentally get
inserted into a list than it is for a module-specific MISSING value.

My thoughts are, if my library gets presented with two lists:

[1, 2, 3, None, 5, 6]

[1, 2, 3, mylibrary.MISSING, 5, 6]

which is less likely to be an accident rather than deliberate? That's the
one I should accept as the missing value. Does anyone think that's the
wrong choice?


> I suppose there's no scope for having the append-to-the-list step sanity
> check for the sentinel (be it None or otherwise)?

It is not my responsibility to validate data during construction, only to do
the right thing when given that data. The right thing being, raise an
exception if values are not numeric, unless an explicit "missing" value
(whatever that ends up being).
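
That rule could be sketched roughly as follows. The `validated` helper, the use of `numbers.Real` as the test for "numeric", and the rejection of bool are all assumptions made for this illustration.

```python
from numbers import Real

MISSING = object()  # hypothetical sentinel


def validated(data):
    """Drop MISSING; raise TypeError for anything else that is not a real number."""
    values = []
    for x in data:
        if x is MISSING:
            continue
        # bool is a Real subclass; rejecting it here is a design assumption.
        if isinstance(x, bool) or not isinstance(x, Real):
            raise TypeError("non-numeric value in data: %r" % (x,))
        values.append(x)
    return values
```

Under this sketch `validated([1, 2.5, MISSING])` returns `[1, 2.5]`, while None or "spam" raises TypeError. Note that `decimal.Decimal` is not registered as `Real`, so it would be rejected too unless handled separately.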


> | Against MISSING: users may expect to be able to choose their own
> | sentinel by assigning to MISSING. I don't want to support that.
> 
> Well, we don't have readonly values to play with :-(
> Personally I'd do what I did above: give it a "private" name like
> _MISSING so that people should expect to have inside (and unsupported,
> unguaranteed) knowledge if they fiddle with it. Or are you publishing
> the sentinel's name to your callers i.e. may they really return _MISSING
> legitimately from their functions?

Assuming I choose against None, and go with MISSING, it will be a public
part of the library API. The idea being that callers will be responsible
for ensuring that if they have data with missing values, they insert the
correct sentinel, rather than whatever random non-numeric value they
started off with.



-- 
Steven



#9534

From: Cameron Simpson <cs@zip.com.au>
Date: 2011-07-15 20:38 +1000
Message-ID: <mailman.1059.1310726293.1164.python-list@python.org>
In reply to: #9532
On 15Jul2011 20:17, Steven D'Aprano <steve+comp.lang.python@pearwood.info> wrote:
| Cameron Simpson wrote:
| > I suppose there's no scope for having the append-to-the-list step sanity
| > check for the sentinel (be it None or otherwise)?
| 
| It is not my responsibility to validate data during construction, only to do
| the right thing when given that data. The right thing being, raise an
| exception if values are not numeric, unless an explicit "missing" value
| (whatever that ends up being).

Well there you go. You need to use MISSING, not None. As you say, None
can easily be a mistake and you want to be sure. If what you describe as
"right" is right, then I too would be using a special sentinel instead
of None.
-- 
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

The English language has a word to describe a group of anarcho-collectivists
without resorting to spiffy hyphenated coined phrases: a mob.
        - Tim Mefford, <tim@physics.orst.edu>



#9524

From: "bruno.desthuilliers@gmail.com" <bruno.desthuilliers@gmail.com>
Date: 2011-07-15 00:59 -0700
Message-ID: <fd654227-7fac-48ff-9e85-eb1058c731f0@m10g2000yqd.googlegroups.com>
In reply to: #9505
On Jul 15, 7:28 am, Steven D'Aprano <steve+comp.lang.pyt...@pearwood.info> wrote:
>
> I'm designing an API for some lightweight calculator-like statistics
> functions, such as mean, standard deviation, etc., and I want to support
> missing values. Missing values should be just ignored. E.g.:


(snip)

> Against None: it's too easy to mistakenly add None to a data set by mistake,
> because functions return None by default.

Yeps.

> In favour of a dedicated MISSING singleton: it's obvious from context. It's
> not a lot of work to implement compared to using None. Hard to accidentally
> include it by mistake. If None does creep into the data by accident, you
> get a nice explicit exception.
>
> Against MISSING: users may expect to be able to choose their own sentinel by
> assigning to MISSING. I don't want to support that.

What about allowing users to specify their own sentinel in the
simplest pythonic way:

# stevencalc.py
MISSING = object()

def mean(values, missing=MISSING):
    pass  # your code here


Or, if you want to make it easier to specify the sentinel once for the
whole API:

# stevencalc.py
MISSING = object()

class Calc(object):
    def __init__(self, missing=MISSING):
        self._missing = missing
    def mean(self, values):
        pass  # your code here


# default:
_calc = Calc()
mean = _calc.mean
# etc...
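
Filled in, the class-based variant might look like the sketch below. The body of `mean` and the `none_calc` instance are assumptions added for illustration; the structure follows the outline above.

```python
MISSING = object()


class Calc(object):
    def __init__(self, missing=MISSING):
        self._missing = missing

    def mean(self, values):
        # Identity test, so only the configured sentinel is skipped.
        kept = [v for v in values if v is not self._missing]
        if not kept:
            raise ValueError("no non-missing values")
        return sum(kept) / float(len(kept))


# Module-level default, plus a calculator configured to treat None as missing.
mean = Calc().mean
none_calc = Calc(missing=None)
```

Here `mean([1, 2, MISSING, 3])` and `none_calc.mean([1, None, 3])` both give 2.0, while the default `mean` still raises TypeError on a None.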

My 2 cents...



#9527

From: Teemu Likonen <tlikonen@iki.fi>
Date: 2011-07-15 11:28 +0300
Message-ID: <87pqlc3q2s.fsf@mithlond.arda>
In reply to: #9505
* 2011-07-15T15:28:41+10:00 * Steven D'Aprano wrote:

> I'm designing an API for some lightweight calculator-like statistics
> functions, such as mean, standard deviation, etc., and I want to
> support missing values. Missing values should be just ignored. E.g.:
>
> mean([1, 2, MISSING, 3]) => 6/3 = 2 rather than 6/4 or raising an
> error.
>
> My question is, should I accept None as the missing value, or a
> dedicated singleton?

How about accepting anything but ignoring all non-numbers?
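
A sketch of what that would mean in practice; `lenient_mean` is a hypothetical name, and excluding bool is an assumption. It also shows the cost of the approach: a number that arrived as a string is dropped without any warning.

```python
from numbers import Number


def lenient_mean(data):
    """Hypothetical mean that silently drops every non-number."""
    values = [
        x for x in data
        if isinstance(x, Number) and not isinstance(x, bool)
    ]
    return sum(values) / float(len(values))


with_none = lenient_mean([1, 2, None, 3])    # None is skipped
with_string = lenient_mean([1, 2, "30", 3])  # the string "30" is skipped too
print(with_none, with_string)
```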


