| Started by | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| First post | 2013-10-18 14:54 -0400 |
| Last post | 2013-10-18 14:54 -0400 |
| Articles | 1 — 1 participant |
This discussion starts older than the indexed window; earlier articles aren't shown. The article labeled Started by
below is the oldest one visible, not the original post.
Re: Python package statistics Terry Reedy <tjreedy@udel.edu> - 2013-10-18 14:54 -0400
| From | Terry Reedy <tjreedy@udel.edu> |
|---|---|
| Date | 2013-10-18 14:54 -0400 |
| Subject | Re: Python package statistics |
| Message-ID | <mailman.1240.1382122470.18130.python-list@python.org> |
On 10/18/2013 8:41 AM, Yaşar Arabacı wrote:
> Hi people,
>
> I collected some data on PyPI and published some statistics about
> packages on PyPI. I think you might find it an interesting read:
>
> http://ysar.net/python/python-package-statistics.html

"b2gpopulate (36MB) ... Total sizes on packages in PyPI amounted to 4.2 GB. Average package size is 161 KB and standard deviation is 1MB."

For such highly skewed data, the mean and especially the standard deviation and confidence intervals are meaningless. They are 'parametric' statistics, which is to say, they were designed for bell-shaped distributions. (I will not say 'normal' == Gaussian distributions because they are *not* normal for much raw data.)

A better summary is obtained from either 'non-parametric' statistics (median, inter-quartile range) or from 'normalizing' the data (if possible). For the latter, try taking the square root or log of the sizes and plot the distribution. If either works, take the mean and sd of the transformed values. Then report those and also the back-transformed mean and mean +/- sd.

--
Terry Jan Reedy
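
A minimal sketch of the suggestions above, using only the Python standard library (3.8 or later). The sizes are made-up illustrative values, not the PyPI data, and the name `summarize_skewed` is hypothetical. It assumes all sizes are positive, since the log transform needs that, and it does not check whether the log-transformed values actually look bell-shaped; that still has to be judged from a plot.

```python
import math
import statistics


def summarize_skewed(sizes):
    """Summarize positive, right-skewed data three ways.

    Returns the raw mean/sd (for comparison), the median and
    inter-quartile range, and the mean/sd on the log scale together
    with their back-transformed values.
    """
    # Non-parametric summary: median and inter-quartile range.
    q1, _, q3 = statistics.quantiles(sizes, n=4)
    median = statistics.median(sizes)

    # 'Normalizing' transform: take logs, then mean and sd of the logs.
    logs = [math.log(s) for s in sizes]
    log_mean = statistics.mean(logs)
    log_sd = statistics.stdev(logs)

    return {
        "mean": statistics.mean(sizes),
        "sd": statistics.stdev(sizes),
        "median": median,
        "iqr": q3 - q1,
        "log_mean": log_mean,
        "log_sd": log_sd,
        # Back-transformed values: exp(mean of logs) is the geometric mean,
        # and exp(mean -/+ sd) gives an asymmetric range around it.
        "back_mean": math.exp(log_mean),
        "back_low": math.exp(log_mean - log_sd),
        "back_high": math.exp(log_mean + log_sd),
    }


# Hypothetical package sizes in KB: mostly small, one very large outlier.
sizes_kb = [4, 6, 9, 12, 15, 20, 28, 40, 75, 36000]

summary = summarize_skewed(sizes_kb)
for name, value in summary.items():
    print(f"{name:>10}: {value:,.2f}")
```

With data like this, the single large value dominates the raw mean and sd, while the median and the back-transformed mean stay close to the typical package size.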