| References | <CAFEUn8YPfRsqKSfB5SCUo5=PDfMLJ3iMQ2Uq=S2o=XRWJVQNXg@mail.gmail.com> <l3s04b$8v0$1@ger.gmane.org> |
|---|---|
| Date | 2013-10-19 00:07 +0300 |
| Subject | Re: Python package statistics |
| From | Yaşar Arabacı <yasar11732@gmail.com> |
| Newsgroups | comp.lang.python |
| Message-ID | <mailman.1248.1382142882.18130.python-list@python.org> |
Hi Terry,

Thanks for pointing it out. matplotlib's hist function wasn't broken after all :) I published non-parametric statistics here:

http://ysar.net/python/python-package-statistics-additions.html

2013/10/18 Terry Reedy <tjreedy@udel.edu>:
> On 10/18/2013 8:41 AM, Yaşar Arabacı wrote:
>>
>> Hi people,
>>
>> I collected some data on PyPI and published some statistics about
>> packages on PyPI. I think you might find it an interesting read:
>>
>> http://ysar.net/python/python-package-statistics.html
>
> "b2gpopulate (36MB)
> ...
> Total sizes on packages in PyPI amounted to 4.2 GB. Average package size is
> 161 KB and standard deviation is 1MB."
>
> For such highly skewed data, the mean and especially the standard deviation
> and confidence intervals are meaningless. They are 'parametric' statistics,
> which is to say, they were designed for bell-shaped distributions. (I will
> not say 'normal' == Gaussian distributions because they are *not* normal for
> much raw data.)
>
> A better summary is obtained from either 'non-parametric' statistics
> (median, inter-quartile range) or from 'normalizing' the data (if possible).
> For the latter, try taking the square root or log of the sizes and plot the
> distribution. If either works, take the mean and sd of the transformed
> values. Then report those and also the transformed-back mean and mean ± sd.
>
> --
> Terry Jan Reedy

--
http://ysar.net/
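Terry's two suggestions can be sketched with Python's standard `statistics` module (Python 3.8+ for `statistics.quantiles`). This is a minimal sketch, assuming the package sizes are already collected into a list of strictly positive numbers; the function names `robust_summary` and `log_summary` are hypothetical and not taken from the original articles. It uses the log transform; the square-root variant is analogous (apply `math.sqrt`, then square to transform back).

```python
import math
import statistics


def robust_summary(sizes):
    """Non-parametric summary: median and inter-quartile range (IQR)."""
    # statistics.quantiles (Python 3.8+) returns three quartile cut points for
    # n=4; its default 'exclusive' method may differ slightly from other tools.
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    return {"median": statistics.median(sizes), "iqr": q3 - q1}


def log_summary(sizes):
    """Mean and sd of log(size), plus those values transformed back."""
    # Assumes every size is strictly positive; math.log(0) raises ValueError.
    logs = [math.log(s) for s in sizes]
    m = statistics.mean(logs)
    sd = statistics.stdev(logs)  # sample sd; needs at least two values
    return {
        "log_mean": m,
        "log_sd": sd,
        # exp(mean of logs) is the geometric mean, not the arithmetic mean.
        "geometric_mean": math.exp(m),
        # Back-transformed mean - sd and mean + sd give an asymmetric range.
        "low": math.exp(m - sd),
        "high": math.exp(m + sd),
    }


if __name__ == "__main__":
    # Made-up sizes in KB, NOT the PyPI data from the article: one huge
    # outlier drags the arithmetic mean far above the typical package.
    sizes_kb = [12, 20, 35, 50, 80, 160, 36000]
    print(statistics.mean(sizes_kb), robust_summary(sizes_kb))
    print(log_summary(sizes_kb))
```

Because the back-transformed range is multiplicative (the geometric mean times or divided by `exp(sd)`), it stays positive and reflects the skew, which an arithmetic mean ± sd on raw sizes cannot do.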