


Re: Python package statistics

Started by: Yaşar Arabacı <yasar11732@gmail.com>
First post: 2013-10-19 00:07 +0300
Last post: 2013-10-19 00:07 +0300
Articles: 1 (1 participant)


This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by below is the oldest one visible, not the original post.



#57084 — Re: Python package statistics

From: Yaşar Arabacı <yasar11732@gmail.com>
Date: 2013-10-19 00:07 +0300
Subject: Re: Python package statistics
Message-ID: <mailman.1248.1382142882.18130.python-list@python.org>

Hi Terry,

Thanks for pointing it out. matplotlib's hist function wasn't broken
after all :) I published the non-parametric statistics here:
http://ysar.net/python/python-package-statistics-additions.html

2013/10/18 Terry Reedy <tjreedy@udel.edu>:
> On 10/18/2013 8:41 AM, Yaşar Arabacı wrote:
>>
>> Hi people,
>>
>> I collected some data on PyPI and published some statistics about
>> packages on PyPI. I think you might find it an interesting read:
>>
>> http://ysar.net/python/python-package-statistics.html
>
>
> "b2gpopulate (36MB)
> ...
> Total sizes on packages in PyPI amounted to 4.2 GB. Average package size is
> 161 KB and standard deviation is 1MB."
>
> For such highly skewed data, the mean and especially the standard deviation
> and confidence intervals are meaningless. They are 'parametric' statistics,
> which is to say, they were designed for bell-shaped distributions. (I will not
> say 'normal' == Gaussian distributions because they are *not* normal for
> much raw data.)
>
> A better summary is obtained from either 'non-parametric' statistics
> (median, inter-quartile range) or from 'normalizing' the data (if possible).
> For the latter, try taking the square root or log of the sizes and plot the
> distribution. If either works, take the mean and sd of the transformed
> values. Then report those and also the back-transformed mean and mean +/- sd.
>
> --
> Terry Jan Reedy
>
>
> --
> https://mail.python.org/mailman/listinfo/python-list
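
For anyone who wants to try the two summaries Terry describes, here is a
minimal sketch using only the standard library. It assumes Python 3.8 or
later (for statistics.quantiles), which is newer than this thread. The
function name summarize_skewed and the sizes in sizes_kb are made up for
illustration; they are not the PyPI data from the article.

```python
import math
import statistics


def summarize_skewed(sizes):
    """Summarize positive, right-skewed values two ways.

    1. Non-parametric: median and inter-quartile range.
    2. Log transform: mean and sd of log(size), then transformed back.
    """
    # Quartile cut points; the IQR is the distance between Q1 and Q3.
    q1, _, q3 = statistics.quantiles(sizes, n=4)

    # The log transform needs strictly positive values.
    logs = [math.log(s) for s in sizes]
    log_mean = statistics.mean(logs)
    log_sd = statistics.stdev(logs)

    return {
        "median": statistics.median(sizes),
        "iqr": q3 - q1,
        "log_mean": log_mean,
        "log_sd": log_sd,
        # Back-transformed mean (this is the geometric mean) and the
        # back-transformed mean-sd and mean+sd bounds.
        "back_mean": math.exp(log_mean),
        "back_low": math.exp(log_mean - log_sd),
        "back_high": math.exp(log_mean + log_sd),
    }


# Hypothetical package sizes in KB, chosen only to be heavily skewed.
sizes_kb = [4, 8, 12, 20, 35, 60, 150, 400, 2000, 36000]
summary = summarize_skewed(sizes_kb)

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Note that the back-transformed interval is asymmetric around the
back-transformed mean, which is what you would expect for skewed data. The
square root variant works the same way with math.sqrt and squaring in
place of math.log and math.exp.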



-- 
http://ysar.net/
