Date: Fri, 15 Jul 2011 11:02:08 -0600
Subject: Re: None versus MISSING sentinel -- request for design feedback
From: Eric Snow
To: "Steven D'Aprano"
Cc: python-list@python.org
In-Reply-To: <4e1fd009$0$29986$c3e8da3$5496439d@news.astraweb.com>
References: <4e1fd009$0$29986$c3e8da3$5496439d@news.astraweb.com>
List-Id: General discussion list for the Python
 programming language
Newsgroups: comp.lang.python

On Thu, Jul 14, 2011 at 11:28 PM, Steven D'Aprano wrote:
> Hello folks,
>
> I'm designing an API for some lightweight calculator-like statistics
> functions, such as mean, standard deviation, etc., and I want to support
> missing values. Missing values should be just ignored. E.g.:
>
> mean([1, 2, MISSING, 3]) => 6/3 = 2 rather than 6/4 or raising an error.
>
> My question is, should I accept None as the missing value, or a dedicated
> singleton?
>
> In favour of None: it's already there, no extra code required. People may
> expect it to work.
>
> Against None: it's too easy to mistakenly add None to a data set by mistake,
> because functions return None by default.

Good point.

>
> In favour of a dedicated MISSING singleton: it's obvious from context. It's
> not a lot of work to implement compared to using None. Hard to accidentally
> include it by mistake. If None does creep into the data by accident, you
> get a nice explicit exception.

Also good points.

>
> Against MISSING: users may expect to be able to choose their own sentinel by
> assigning to MISSING. I don't want to support that.
>
>
> I've considered what other packages do:-
>
> R uses a special value, NA, to stand in for missing values. This is more or
> less the model I wish to follow.
>
> I believe that MATLAB treats float NANs as missing values. I consider this
> an abuse of NANs and I won't be supporting that :-P

I was just thinking of this. :)

>
> Spreadsheets such as Excel, OpenOffice and Gnumeric generally ignore blank
> cells, and give you a choice between ignoring text and treating it as zero.
> E.g. with cells set to [1, 2, "spam", 3] the AVERAGE function returns 2 and
> the AVERAGEA function returns 1.5.
>
> numpy uses masked arrays, which is probably over-kill for my purposes; I am
> gratified to see it doesn't abuse NANs:
>
>>>> import numpy as np
>>>> a = np.array([1, 2, float('nan'), 3])
>>>> np.mean(a)
> nan
>
> numpy also treats None as an error:
>
>>>> a = np.array([1, 2, None, 3])
>>>> np.mean(a)
> Traceback (most recent call last):
>   File "", line 1, in
>   File "/usr/lib/python2.5/site-packages/numpy/core/fromnumeric.py", line 860, in mean
>     return mean(axis, dtype, out)
> TypeError: unsupported operand type(s) for +: 'int' and 'NoneType'
>
>
> I would appreciate any comments, advice or suggestions.
>

Too bad there isn't a good way to "freeze" a name, i.e. indicate that
any attempt to rebind it is an exception. Trying to rebind None is a
SyntaxError, but a NameError or something would be fine. Then the
downside of using your own sentinel here goes away.

In reality, using Missing may be your best bet anyway. If there were
a convention for indicating a name should not be re-bound (like a
single leading underscore indicates "private"), you could use that
(all caps?). Since "we're all consenting adults" it would probably be
good enough to make sure others know that Missing should not be
re-bound...

I might have said to use NotImplemented instead of None, but it can be
re-bound and the name isn't as helpful for your use case.

Another solution, perhaps ugly or confusing, is to use something like
two underscores as the name for your sentinel:

mean([1, 2, __, 3])

Still it seems like using Missing (or whatever) would be better than None.

-eric

>
> --
> Steven
>