MIME-Version: 1.0
In-Reply-To: <55530414-1cf0-46fa-bdce-890d8679b292@googlegroups.com>
References: <520592f9$0$30000$c3e8da3$5496439d@news.astraweb.com> <0d60fd90-eb19-4702-acd5-dd7ba0eddeda@googlegroups.com> <cb7878a8-7700-46a8-b4e8-3f12fdf2cf19@googlegroups.com> <mailman.3.1376673382.23369.python-list@python.org> <55530414-1cf0-46fa-bdce-890d8679b292@googlegroups.com>
From: Oscar Benjamin <oscar.j.benjamin@gmail.com>
Date: Fri, 16 Aug 2013 20:41:01 +0100
Subject: Re: PEP 450 Adding a statistics module to Python
To: Chris Barker - NOAA Federal <chris.barker@noaa.gov>
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: quoted-printable
Cc: Python List <python-list@python.org>
Precedence: list
Newsgroups: comp.lang.python
Message-ID: <mailman.6.1376682090.23369.python-list@python.org>
Lines: 46
NNTP-Posting-Host: 2001:888:2000:d::a6
Xref: csiph.com comp.lang.python:52606

On 16 August 2013 20:00,  <chris.barker@noaa.gov> wrote:
>>> One other point -- for performance reasons, it would be nice to have
>>> some compiled code in there -- this adds incentive to put it in the
>>> stdlib -- external packages that need compiling are what make numpy
>>> unacceptable to some folks.
>>
>> It might be good to have a C accelerator one day but actually I think
>> the pure-Python-ness of it is a strong reason to have it since it
>> provides accurate statistics functions to all Python implementations
>> (unlike numpy) at no additional cost.
>
> Well, I'd rather not have a package that is great for education and toy
> problems, but not-so-good for the real ones...

Again, it depends what you mean by "real". From the other lists where
we meet I'd guess that your problems are in the "needs a nuclear
reactor" camp. I doubt that the stdlib will ever be sufficiently
mathematically/computationally oriented to fully service either of our
needs (and I don't mean that as a criticism). I persuaded the IT guys
at my work that we needed the whole Enthought Python Distribution on
all machines just because I didn't want to have to argue about
individual packages.

However, in my real work, where I compute means, variances, etc., I
very often work with very small datasets, and I know a lot of others
who work almost exclusively with them (think e.g. clinical data where
N is often less than 100).
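
To make the small-dataset case concrete, here is a rough sketch. It
assumes the module ends up exposing mean() and variance() under the
name "statistics", as PEP 450 proposes; the readings are made up for
illustration and are not real data.

```python
import statistics

# Made-up measurements for a small study (N = 6); not real data.
readings = [4, 8, 15, 16, 23, 42]

sample_mean = statistics.mean(readings)
# Sample variance: divides the sum of squared deviations by N - 1.
sample_var = statistics.variance(readings)

print(sample_mean, sample_var)
```

At this size, pure-Python speed is not the limiting factor; getting
the right answer on any Python implementation is what matters.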

> I guess my point is this:
>
> This is a way to make the standard python distribution better for some
> common computational tasks. But rather than think of it as "we need some
> stats functions in the python stdlib", perhaps we should be thinking:
> "out of the box python should be better for computation" -- in which
> case, I'd start with a decent array object.

I think that, whether or not the statistics module gains a C
accelerator, if a fast numerical array type comes along then I'd
expect that the statistics module would use its methods as a fast
path. And if it provides a speed boost without compromising
boundedness or accuracy I'm sure that the array type would be used
internally where appropriate (just as numpy converts collections to
arrays before computation).
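
Something like the following is what I have in mind by a fast path.
This is only a sketch: the FastArray class and its mean() method are
hypothetical stand-ins for an array type that does not exist yet, and
this mean() is not the implementation proposed in the PEP.

```python
import math


def mean(data):
    """Arithmetic mean, preferring the container's own method if it has one."""
    # Hypothetical fast path: an array type that exposes its own mean().
    fast_mean = getattr(data, "mean", None)
    if callable(fast_mean):
        return fast_mean()
    # Generic fallback for any iterable of real numbers.
    values = list(data)
    if not values:
        raise ValueError("mean requires at least one data point")
    return math.fsum(values) / len(values)


class FastArray:
    """Stand-in for a future fast array type; purely illustrative."""

    def __init__(self, values):
        self._values = [float(v) for v in values]

    def mean(self):
        return math.fsum(self._values) / len(self._values)


print(mean([1, 2, 3, 4]))             # generic path
print(mean(FastArray([1, 2, 3, 4])))  # fast path
```

The caller sees the same function either way; only the internal route
changes when the container can do the work itself.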


Oscar