
Re: Python package statistics

References <CAFEUn8YPfRsqKSfB5SCUo5=PDfMLJ3iMQ2Uq=S2o=XRWJVQNXg@mail.gmail.com> <l3s04b$8v0$1@ger.gmane.org>
Date 2013-10-19 00:07 +0300
Subject Re: Python package statistics
From Yaşar Arabacı <yasar11732@gmail.com>
Newsgroups comp.lang.python
Message-ID <mailman.1248.1382142882.18130.python-list@python.org> (permalink)



Hi Terry,

Thanks for pointing it out. matplotlib's hist function wasn't broken
after all :) I published non-parametric statistics here:
http://ysar.net/python/python-package-statistics-additions.html

2013/10/18 Terry Reedy <tjreedy@udel.edu>:
> On 10/18/2013 8:41 AM, Yaşar Arabacı wrote:
>>
>> Hi people,
>>
>> I collected some data on PyPI and published some statistics about
>> packages on PyPI. I think you might find it an interesting read:
>>
>> http://ysar.net/python/python-package-statistics.html
>
>
> "b2gpopulate (36MB)
> ...
> Total sizes on packages in PyPI amounted to 4.2 GB. Average package size is
> 161 KB and standard deviation is 1MB."
>
> For such highly skewed data, the mean and especially the standard deviation
> and confidence intervals are meaningless. They are 'parametric' statistics,
> which is to say, they were designed for bell-shaped distributions. (I will not
> say 'normal' == Gaussian distributions because they are *not* normal for
> much raw data.)
>
>  A better summary is obtained from either 'non-parametric' statistics
> (median, inter-quartile range) or from 'normalizing' the data (if possible).
> For the latter, try taking the square root or log of the sizes and plot the
> distribution. If either works, take the mean and sd of the transformed
> values. Then report those and also the transformed back mean and mean+-sd.
>
> --
> Terry Jan Reedy
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
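
For anyone who wants to try both of Terry's suggestions on their own data,
here is a minimal sketch using only the standard library (Python 3.8+ for
statistics.quantiles). The function names and the sample sizes are made up
for illustration; they are not the code or data behind the linked pages, and
the log transform assumes every size is strictly positive.

```python
import math
import statistics


def robust_summary(sizes):
    """Non-parametric summary: median and inter-quartile range."""
    # quantiles(n=4) returns the three quartile cut points (Q1, Q2, Q3).
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    return {"median": statistics.median(sizes), "iqr": q3 - q1}


def log_summary(sizes):
    """Mean and sd of log(size), transformed back to the original units.

    Assumes all sizes are > 0 (log is undefined otherwise).
    """
    logs = [math.log(s) for s in sizes]
    mu = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample standard deviation of the logs
    return {
        "center": math.exp(mu),     # back-transformed mean (geometric mean)
        "low": math.exp(mu - sd),   # back-transformed mean - sd
        "high": math.exp(mu + sd),  # back-transformed mean + sd
    }


if __name__ == "__main__":
    # Hypothetical package sizes in KB, heavily right-skewed.
    sizes_kb = [1, 10, 100, 1000, 10000]
    print(robust_summary(sizes_kb))
    print(log_summary(sizes_kb))
```

Note that the back-transformed interval is not symmetric around the center,
which is expected for skewed data. Whether the log (or square root) is the
right transform still has to be checked by plotting the transformed values.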



-- 
http://ysar.net/
